Text Mining, Analytics & More: Sentiment Analysis

Jan 11, 2015

Micropinions vs. Micro-reviews

I was recently asked "What's the difference between micropinions and micro-reviews?" based on my paper "Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions". The two concepts are actually fairly synonymous. Micropinion refers to a set of short phrases expressing opinions on a specific topic or entity. Here is an example of a micropinion summary of a restaurant:

As you can see, micropinions are small enough to fit easily on the display of a hand-held device. Constraints can be imposed to limit the overall size of the generated micropinions. Micro-reviews, on the other hand, typically refer to reviews posted on social media websites like Twitter and Foursquare, where the goal is to review a product or service (e.g. a restaurant) within the character limits imposed by these websites. Here are a few examples of micro-reviews from Twitter:

"A Song of Ice and Fire" series by George R R Martin. It's ok, but doesn't live up to hype
— Micro Book Review (@microbookreview) May 7, 2013

Scribblenauts (DS): As one person said, "The main menu screen is funner than the real game." Hit and Miss controls and too repetitious. 7/10
— Micro Game Reviews (@MicroGameReview) September 30, 2009

So in essence, the terms micro-reviews and micropinions refer to the same concept: very concise opinions about an entity or topic, often adhering to some constraint (e.g. a character limit or word limit). Hope this explains!
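To make the size constraint concrete, here is a minimal sketch in Python. It is not the method from the paper: it assumes candidate phrases have already been extracted and scored somehow (the phrases, the scores and the name `select_micropinions` are all hypothetical), and it simply picks the highest-scoring phrases greedily until a character budget is used up.

```python
def select_micropinions(scored_phrases, max_chars):
    """Greedily keep the best-scoring phrases that fit in max_chars.

    scored_phrases: list of (phrase, score) pairs; the scores are assumed
    to come from some earlier scoring step that is not shown here.
    """
    chosen = []
    used = 0
    # Visit candidates from highest to lowest score.
    for phrase, _score in sorted(scored_phrases, key=lambda p: p[1], reverse=True):
        # Skip any phrase that would push the summary over the budget.
        if used + len(phrase) <= max_chars:
            chosen.append(phrase)
            used += len(phrase)
    return chosen


# Hypothetical candidates for a restaurant, with made-up scores.
candidates = [
    ("great food", 0.9),
    ("friendly and attentive staff", 0.8),
    ("a bit pricey", 0.6),
]
summary = select_micropinions(candidates, max_chars=25)
```

With a 25-character budget the second phrase is too long to fit, so the sketch keeps the first and third. A greedy pass like this is only one way to respect the constraint; it does not guarantee the best possible combination of phrases.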

Dec 19, 2014

What is an Opinion-Driven Decision Support System?

Opinion Driven Decision Support System (ODSS) refers to the use of large amounts of online opinions to facilitate business and consumer decision making. The idea is to combine the strengths of search technologies with opinion mining and analysis tools to provide a synergistic decision making platform.

The research and engineering problems related to developing such a system include:
(1) opinion acquisition
(2) opinion based search
(3) opinion summarization
(4) presentation of results

Opinions in this case can be an aggregation of user reviews, blog comments, Facebook status updates, Tweets and so on. Essentially, any opinion-containing text on specific topics or entities qualifies as a candidate for building an ODSS platform. Here's a description of some of the research and engineering problems in developing an ODSS platform:

1. Search Capabilities Based on Opinions

The goal of opinion-based search is to help users find entities of interest based on their key requirements. Since a user is often interested in choosing an entity based on opinions about that entity, a system that ranks entities based on a user’s personal preferences provides more direct support for the user’s decision-making task. For example, when looking for hotels at a destination, a user may only want to consider hotels that other people thought were clean. Finding and ranking hotels based on how well they satisfy such a requirement would significantly reduce the number of entities under consideration, facilitating decision making. Unlike traditional search, the query in this case is a set of preferences and the result is a set of entities that match these preferences. The challenge is to accurately match the user’s preferences with existing opinions in order to recommend the best entities. This special ranking problem is referred to as Opinion-Based Entity Ranking. Many of the existing opinion mining techniques can potentially be used for this new ranking task. I have explored information retrieval based techniques to specifically solve this ranking problem, and there have been a few follow-up works (from other groups) trying other approaches.
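As a rough illustration of the "preferences in, ranked entities out" idea, here is a minimal Python sketch. It is only a toy term-matching baseline, not the retrieval models explored in my work: each entity's reviews are treated as one bag of words, and an entity scores higher when the preference terms make up a larger share of its review text. The function names and the hotel data are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rank_entities(reviews_by_entity, preferences):
    """Return (entity, score) pairs sorted from best to worst match.

    The score is the average, over all preferences, of the relative
    frequency of that preference's terms in the entity's reviews.
    """
    pref_terms = [tokenize(p) for p in preferences]
    scores = {}
    for entity, reviews in reviews_by_entity.items():
        counts = Counter(tokenize(" ".join(reviews)))
        total = sum(counts.values()) or 1  # avoid dividing by zero
        per_pref = [sum(counts[t] for t in terms) / total for terms in pref_terms]
        scores[entity] = sum(per_pref) / max(len(per_pref), 1)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical review data for two hotels.
hotels = {
    "Hotel A": ["The room was clean and quiet.", "Very clean bathroom."],
    "Hotel B": ["Dirty carpet and noisy street."],
}
ranked = rank_entities(hotels, ["clean", "quiet"])
```

Note that plain term matching ignores negation ("not clean" still counts as a match for "clean"), which is one reason real approaches need more than this.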

2. Opinion Summarization (i.e. Sentiment Analysis + Text Summarization)

Opinion summaries play a critical role in helping users better analyze entities in consideration (e.g. products, physicians, cars, politicians). Users are often looking out for major concerns or advantages when selecting a specific entity. Thus, a summary that can quickly highlight the key opinions about the entity would significantly help exploration of entities and aid decision making. The field of opinion summarization has long been explored, with most techniques focused on generating summaries on a fixed set of topics. These are referred to as structured summaries. In the last few years, textual summaries of opinions have been gaining more and more popularity. Bing Liu's Opinion Mining Tutorial covers some of these recent works, or you can refer to point (5) of this article.

3. Opinion Acquisition (i.e. Opinion or Sentiment Crawling)

To support accurate search and analysis based on opinions, opinionated content is imperative. Relying on opinions from just one specific source makes the information not only unreliable but also incomplete, due to variations in opinions as well as potential bias present in a specific source. Although many applications rely on large amounts of opinions, there has been very limited work on collecting and integrating a complete set of opinions. I recently explored a very simple method for collecting large amounts of opinions on arbitrary entities.

The idea of an Opinion Driven Decision Support System (ODSS) was developed as part of my thesis. For more information on this please see Kavita's thesis.

Oct 15, 2014

Text Mining, IR and NLP References

These are some Text Mining, IR and NLP related reference materials that would be useful to anyone who is doing research and development in the area of Text Data Mining, Retrieval and Analysis. I have found many of these resources particularly useful in getting me started. Please note that this page is periodically updated.

Opinion Analysis

Survey: Opinion Mining and Sentiment Analysis [ pdf ]

This is a fairly complete survey that covers some of the core techniques and approaches used in Opinion Mining (prior to 2008). Note that the techniques covered are the earlier ones that do not necessarily involve summarization of Tweets or short texts. In particular, it does not cover the newer body of work focusing on textual summarization of opinions.
(By Bo Pang and Lillian Lee - 2008)

Opinion Mining Tutorial [ pdf ]
This is a nice and easy-to-follow set of slides on Opinion Mining. The main focus in these slides is the use of heuristics / data mining based approaches to opinion mining. It does not really cover some of the more recent probabilistic / learning based approaches, but it gives a fairly good introduction to Opinion Mining. (By Bing Liu)

Survey: Opinion Mining and Summarization [pdf]
This survey zooms into recent research in the area of opinion summarization, which is related to generating effective summaries of opinions so that users can get a quick understanding of the underlying sentiments. Since there are various formats of summaries, the survey breaks down the approaches into the commonly studied aspect-based summarization and non-aspect based ones (which include visualization, contrastive summarization and text summarization of opinions). (By Kim et al - 2011)

Interesting tasks within Opinion Mining and Sentiment Analysis [link]
A one page summary of the various tasks within opinion mining and related areas. (By Kavita Ganesan)

Automatic Text Summarization

A Survey on Automatic Text Summarization [pdf]
This is a well written survey about text summarization. The focus of this survey is mainly on techniques in extractive summarization. The authors talk about techniques used in single document summarization, multi-document summarization and also include a nice section on evaluation methods. There is one section that talks about sentence compression which can be considered a form of abstractive summarization.
(Dipanjan Das and André F. T. Martins. 2007)

Text Summarization an Overview [pdf ]
This article contains nice descriptions of the various text summarization methods used. The explanation is fairly intuitive. It also attempts to classify the summarization methods in several ways (e.g. abstractive vs extractive) - which is very useful. (Elena Lloret, 2008)

Query Based Text Summarization Tutorial [view]
This is a nice deck of slides summarizing the area of text summarization. Easy to follow and has a lot of useful information.
(Mariana Damova)

Abstractive Summarization

More info here

Information Retrieval

Stanford IR/NLP Book [ read online ] [ pdf ]
A very good reference point for IR/NLP tasks. I would recommend this to anyone who is getting into the IR field. The concepts are well explained and easy to understand. (By Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze)

Information Retrieval - a Survey [ pdf ]
A good survey on classic IR approaches. Topics include VSM, Bayesian, Term Weighting, etc. (By Ed Greengrass 2000)

Statistical Language Models [ pdf ]
This book systematically reviews the large body of literature on applying statistical language models to information retrieval with an emphasis on the underlying principles, empirically effective language models, and language models developed for non-traditional retrieval tasks. Recommended to those who want to get in-depth knowledge on language model based retrieval approaches. (By ChengXiang Zhai 2008)

Search User Interfaces [ read online ] [ book ]
This book talks about the design of search user interfaces, how to evaluate search interfaces, effective methods of presentation and other useful tips that relate to users of a search system. This book is ideal for those who are interested in designing, studying or improving upon search systems from the user's perspective. (By Marti Hearst 2009)

Recommended Reading for IR Research Students [ pdf ]
This paper highlights some of the very core papers that should be read if you are in the IR field. This is a good place to start if you need a refresher or are a new student.

Faceted search [ pdf ]

(Daniel Tunkelang)

Jul 20, 2011

User Review Data Set for Sentiment Analysis, Opinion Mining and Summarization

If you are looking for user review data sets for opinion analysis / sentiment analysis tasks, there are quite a few out there. The datasets below contain reviews from Rotten Tomatoes, Amazon, TripAdvisor, Yelp, Edmunds.com and so on.

Here are some of the many datasets available:

Dataset	Domain	Description	Courtesy Of
Movie Reviews Data Set	Movies	This is a collection of movie reviews used for various opinion analysis tasks; You would find reviews split into positive and negative classes as well as reviews split into subjective and objective sentences. This dataset was initially used to predict polarity ratings (+ve/-ve).	Pang & Lee
Multi-Domain Sentiment Dataset	Products (books, dvds..)	Product reviews from Amazon.com covering various product types (such as books, dvds, musical instruments). The data has been split into positive and negative reviews. There are more than 100,000 reviews in this dataset. The reviews come with corresponding rating stars. This dataset was initially used to predict polarity ratings (+ve/-ve).	Blitzer et al.
LARA Review Dataset	Hotels & Products	Reviews from Amazon.com and TripAdvisor. It contains attributes such as author name, content, date and the ratings. This dataset was initially used to decompose user reviews to preference rating on aspects.	Wang et al.
Opinosis Review Dataset	Hotels, Cars, Electronics	Topic related sentences extracted from user reviews. You will find 51 topics with approximately 100 sentences each (on average). The reviews were obtained from multiple sources - Tripadvisor (hotels), Edmunds.com (cars) and Amazon.com (various electronics). This dataset was used for text summarization of opinions.	Ganesan et al.
OpinRank Tripadvisor and Edmunds.com Dataset	Hotels & Cars	Reviews of cars and hotels collected from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews). For cars, the extracted fields include dates, author names, favorites and the full textual review. For hotels, the fields include date, review title and the full review. The dataset also includes gold standard judgments for ranking. This dataset was initially used for opinion-based entity ranking.	Ganesan & Zhai
Restaurant Review Dataset	Restaurants	Contains a total of 52,077 reviews. The fields contain rating information, review counts, percent and cuisine type.	Elhadad
SNAP Review Dataset	Products	Contains 34,686,770 Amazon user reviews from 6,643,669 users. This dataset was initially used for recommendation systems.	McAuley
MovieLens Dataset	Movies	100,000 ratings (1-5) from 943 users on 1682 movies. Each user has rated at least 20 movies. Simple demographic info for the users (age, gender, occupation, zip). Please note that the review text is not available.	GroupLens Research Project at the University of Minnesota.
Micropinion Generation Dataset (CNET)	Electronics	330 review texts. The reviews are on products from various categories like tv, cell phones, gps etc. This dataset was used for text summarization of opinions.	Ganesan & Zhai

May 1, 2011

Interesting Research Topics in Opinion Mining and Sentiment Analysis

A friend once asked me "What do you guys do with opinions, you all seem to be working on the same thing?!".

When talking about the area of opinion analysis in general, the common misconception is that it is all about predicting the polarity of a piece of 'opinion text' as positive or negative. It is true that many people have studied the task of sentiment polarity prediction, but the area of opinion analysis is much broader than that. In fact, many of the tasks related to opinion analysis go unnoticed due to lack of 'popularity'. Here are some interesting tasks related to opinion analysis which I have discovered over the course of reading related literature. I also have pointers to some highly cited papers within each subtask.

1. Subjectivity Detection

This task is about determining if a piece of text actually contains opinions or not (i.e. subjective expression or objective?). It is not so much about determining the polarity of the text itself (see task 2). Here are some notable papers related to this task:

2. Sentiment Prediction

This task is specifically about predicting the polarity of a piece of text, usually positive or negative. People have studied sentiment prediction at the document level, sentence level and phrase level. This is an extremely popular task in the field of Opinion Analysis.
Some notable papers for this task can be found here.
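To give a feel for what polarity prediction means in practice, here is a very small lexicon-based sketch in Python. It is only an illustration under strong assumptions: the two word lists are tiny and made up for this example, and most of the published work on this task relies on learned models rather than simple counting like this.

```python
import re

# Tiny hand-made word lists, for illustration only (not a real lexicon).
POSITIVE = {"good", "great", "excellent", "love", "clean"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "dirty"}


def predict_polarity(text):
    """Label text as positive, negative or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = predict_polarity("Great screen and excellent sound")
```

The same function can be applied to a whole document, a sentence or a phrase, which corresponds to the different levels mentioned above. It will get negation and sarcasm wrong, since it only counts words.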

3. Aspect Based Sentiment Summarization

This task goes beyond sentiment prediction. The goal is to provide a nice little sentiment summary at the feature or aspect level. For example, for an iPhone you may have features like design, sound, screen and so on. The goal is to provide a summary in the form of star ratings or scores on each of these features. So the task involves finding features and then discovering the sentiments for each feature. This task is now quite popular as it solves a practical need.
Some notable papers for this task can be found here.
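The two steps (find the features, then find the sentiment for each feature) can be sketched roughly as follows in Python. This is a toy version with big assumptions: the aspect keywords and sentiment words are hard-coded and hypothetical, whereas in real systems discovering the features is itself a large part of the problem.

```python
import re
from collections import defaultdict

# Hypothetical aspect keywords and sentiment words, for illustration only.
ASPECTS = {
    "screen": {"screen", "display"},
    "sound": {"sound", "speaker", "audio"},
}
POSITIVE = {"good", "great", "excellent", "crisp"}
NEGATIVE = {"bad", "poor", "terrible", "dim"}


def summarize_aspects(sentences):
    """Return the average polarity score per aspect across the sentences."""
    scores = defaultdict(list)
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        # Sentence polarity: positive hits minus negative hits.
        polarity = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
        # Credit the polarity to every aspect the sentence mentions.
        for aspect, keywords in ASPECTS.items():
            if tokens & keywords:
                scores[aspect].append(polarity)
    return {aspect: sum(vals) / len(vals) for aspect, vals in scores.items()}


reviews = [
    "The screen is great",
    "The display is bad in sunlight",
    "Sound is excellent",
]
aspect_summary = summarize_aspects(reviews)
```

Here the screen gets one positive and one negative mention, so its average is neutral, while sound comes out positive. The averages could then be mapped to star ratings for display.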

4. Contrastive Viewpoint Summarization

This task is about trying to highlight contradictions in opinions where present. For example, some people may say the healthcare plan is a great idea and some may say that it is a failure waiting to happen. With contrastive viewpoints highlighted, people can get a better understanding of the opinions and the conditions under which they hold.
Some notable papers for this task can be found here.

5. Text Summarization for Opinions

Instead of generating structured summaries of opinions, another useful summary format is to generate textual summaries. For example, a few sentences summarizing the reviews of a product or a set of phrases acting as summaries.
Here are some related papers for this task:

6. Predicting Helpfulness of Online Comments/Reviews

Some comments or reviews may be more helpful or insightful compared to others. Instead of displaying these comments or user reviews in chronological order, sorting the reviews by their helpfulness would improve user productivity. This task thus aims at automatically predicting the helpfulness of user reviews instead of just relying on user votes.
Some related papers for this task can be found here.

7. Opinion-Based Entity Ranking

Opinion-based entity ranking is basically the task of ranking entities based on opinions. The query is essentially a set of "preferences" for the entity. The results are the entities ordered by how likely they are to match those preferences. So the more the opinions on an entity match the specified preferences, the higher its rank. This is very useful in finding, for example, attractions in a specific location that are considered to be "safe, close to the airport and child friendly". The first work to explore this can be found here: http://kavita-ganesan.com/opinion-based-entity-ranking . There have been some variations and improvements over this work by different groups:

Opinion-Based Entity Ranking (first work)

CONSENTO: a consensus search engine for answering subjective queries

Consento: a new framework for opinion based entity search and summarization

Improving Opinion-based Entity Ranking

Review Based Entity Ranking using Fuzzy Logic Algorithmic Approach: Analysis

Other Related Tasks

8. Product Feature Extraction

9. Opinion Retrieval

For more information about some of these tasks you can check these surveys.
