Monday, 23 February 2015

Extracting Insights from Consumer Reviews

Here is one more example of how we can extract insights from consumer reviews. This time we will use reviews written for several brands of Omega-3 fish oil supplements.

For this example we analyze 4018 reviews from consumers who bought Omega-3 supplements. Keep in mind that, in most cases, each product review has an associated rating (usually given as 1-5 stars) that signifies the consumer's overall satisfaction. Therefore, after collecting the reviews and ratings, we have a file with the following entries per row:

[Text of Review,Rating]

The fact that a customer also gives a score can be especially helpful, because it lets us identify the words and phrases that differentiate positive experiences (i.e. those with 5-star ratings) from negative ones (we assume that any review with a rating of 4 stars or less is negative). For example, positive reviews may contain mostly words and phrases such as "Great", "Happy" and "Will buy again", whereas negative reviews may contain words and phrases such as "Never buying again", "not happy" or "damaged".

The tools used for this example are NLTK and Python. The code simply reads each review's text and its associated rating and builds a matrix with the same [text, rating] representation as the input file.
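The original code is not shown, so the following is only a minimal sketch of this loading step. It assumes the reviews are stored as a two-column CSV file (review text, then an integer star rating) with no header row; the function names `load_reviews` and `label` are hypothetical. It also applies the labelling rule from above: 5 stars is positive, anything lower is negative.

```python
import csv
import io

def load_reviews(lines):
    """Parse rows of [text of review, rating] into (text, rating) tuples.

    Assumes a two-column CSV without a header row; rows with the wrong number
    of columns or a non-integer rating are skipped.
    """
    reviews = []
    for row in csv.reader(lines):
        if len(row) != 2:
            continue
        text, rating = row
        try:
            reviews.append((text.strip(), int(rating)))
        except ValueError:
            continue  # e.g. a missing or non-numeric rating
    return reviews

def label(rating):
    """5 stars is a positive review; 4 stars or less is treated as negative."""
    return "positive" if rating == 5 else "negative"

# Invented sample rows, in the same [text, rating] layout as the input file.
sample = io.StringIO(
    '"Great product, will buy again",5\n'
    '"Fishy after-taste, however it seems to work",3\n'
    '"Capsules arrived damaged",1\n'
)
reviews = load_reviews(sample)
for text, rating in reviews:
    print(label(rating), "|", text)
```

For a real file, pass an open file object instead of the `io.StringIO` sample, e.g. `open("reviews.csv", newline="", encoding="utf-8")` (the file name is hypothetical).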

Next, we want to identify which insights we can extract from this representation. For example:

-Identify which words commonly occur in 5-star reviews
-Identify which words commonly occur in reviews with a rating of 4 stars or lower
-Identify potentially interesting phrases and words
-Extract term co-occurrences

We start with the terms that occur more frequently in negative reviews for Omega-3 supplements. Here is what we found:

So it appears that people tend to give negative reviews when the taste (and possibly the after-taste) is not quite right. A lot of people complain about a fishy odor. Notice also that the third term is "sure", which we can assume comes from customers saying that they are not sure whether the product works (notice also that the fourth term is "yet"). Some more terms to consider:

however
rancid
krill (krill oil, an alternative to fish oil as an Omega-3 supplement)
soy
stick
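The post does not say how these terms were ranked. One simple way to produce such a list, sketched below with the Python standard library rather than NLTK so that it runs without extra downloads, is to compare how many negative and positive reviews contain each term and rank terms by the smoothed log ratio of those document frequencies. The tokenizer, the add-one smoothing, the `min_df` cut-off and the function name `distinguishing_terms` are all assumptions of this sketch, and the tiny review list is invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def distinguishing_terms(reviews, target="negative", top_n=10, min_df=2):
    """Rank terms by the smoothed log ratio of their document frequency in
    `target` reviews versus the other group (5 stars = positive, else negative)."""
    df = {"positive": Counter(), "negative": Counter()}
    n_docs = Counter()
    for text, rating in reviews:
        group = "positive" if rating == 5 else "negative"
        n_docs[group] += 1
        df[group].update(set(tokenize(text)))  # each term counted once per review
    other = "positive" if target == "negative" else "negative"
    scores = {}
    for term in df[target].keys() | df[other].keys():
        if df[target][term] + df[other][term] < min_df:
            continue  # too rare to say anything about
        # Add-one smoothing keeps terms that never occur in one group finite.
        p_target = (df[target][term] + 1) / (n_docs[target] + 2)
        p_other = (df[other][term] + 1) / (n_docs[other] + 2)
        scores[term] = math.log(p_target / p_other)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Invented sample data in the (text, rating) layout.
reviews = [
    ("Great quality and a great price", 5),
    ("Excellent, great for my cholesterol", 5),
    ("Fishy taste, however it works", 4),
    ("Rancid and fishy, not sure it works", 2),
    ("Not sure yet, the after-taste is fishy", 3),
]
for term, score in distinguishing_terms(reviews, target="negative", top_n=5):
    print(f"{term:10s} {score:+.2f}")
```

Calling it with `target="positive"` ranks the terms associated with 5-star reviews instead. Counting each term once per review keeps a single long, repetitive review from dominating the scores.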


Now let's look at the terms associated with positive reviews:

As expected, "great" and "excellent" are among the terms found in positive reviews. Some other terms to consider are:

price
quality
brain
triglycerides
cholesterol

We move on to identifying potentially interesting terms and phrases. Here is a screenshot from the software that I used:

I added a red rectangle wherever sensitive information (such as company names) appears, since it is not relevant for the purpose of this post (although it certainly is relevant in a different setting).

We immediately see some interesting mentions, for example: heavy metal poisoning, upset stomach incidents, cognitive function, joint pains, panic attacks, reasonably priced items, postpartum depression, allergic reactions, speedy delivery and soft gels that stick together.
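The screenshot comes from a separate tool whose method is not described. A common technique for surfacing phrases like these is to score adjacent word pairs (bigrams) by pointwise mutual information (PMI), which rewards pairs that occur together far more often than their individual frequencies would predict. The sketch below is a standard-library illustration of PMI, not a reproduction of that software; NLTK's `BigramCollocationFinder` in `nltk.collocations` offers the same kind of scoring. The sample texts, the `min_count` threshold and the function name `pmi_bigrams` are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def pmi_bigrams(texts, min_count=2, top_n=10):
    """Score adjacent word pairs by pointwise mutual information (log base 2)."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # pairs never span two reviews
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI overrates one-off pairs, so require a minimum count
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

# Invented sample review texts.
texts = [
    "Soft gels stick together in the heat",
    "The soft gels stick together, annoying",
    "Speedy delivery and a reasonably priced item",
    "Speedy delivery, great product",
    "The product is great",
]
for pair, score in pmi_bigrams(texts):
    print(" ".join(pair), round(score, 2))
```

Because PMI strongly favours rare pairs, the `min_count` filter matters: without it, one-off combinations and typos would crowd the top of the list.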

Recall that earlier we found that the term "however" occurs frequently within negative reviews. Some analysts may have chosen to treat this term as a stopword, which in this case would be a serious mistake. The reason is that the text following "however" very often tells us why a product or service is not receiving a perfect rating (and vice versa). Therefore, had a data scientist chosen to exclude this term from the analysis (stopwords are typically removed from the text), these potentially interesting insights would never have surfaced.
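To illustrate the point, here is a minimal stopword filter that keeps contrast and negation words even when the stopword list contains them. The stopword list is a small hypothetical one that deliberately includes "however" to show the problem; it is not NLTK's list, and the `KEEP` set is an assumption you would tune for your own data.

```python
import re

# A small, hypothetical stopword list that deliberately contains "however".
STOPWORDS = {"a", "an", "and", "the", "it", "is", "i", "to", "of", "but", "however", "not"}

# Contrast and negation words that often carry the reason behind a rating.
KEEP = {"however", "but", "not"}

def remove_stopwords(text, stopwords=STOPWORDS, keep=KEEP):
    """Drop stopwords from the tokenized text, except the words listed in `keep`."""
    effective = stopwords - keep
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in effective]

print(remove_stopwords("It works, however the after-taste is fishy"))
# ['works', 'however', 'after', 'taste', 'fishy']
```

Passing `keep=set()` reproduces the naive behaviour, where "however" silently disappears along with everything that would point to the clause after it.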

Ideally, we would like to know what context follows the term "however" whenever it occurs within a negative review. That will help us focus on the occurrences of "however" that carry negative sentiment. To do this, we only take into account reviews that contain the term "however" and have a rating of 3 stars or less (a stricter cut-off than the 4-star threshold used above). It appears that the most common terms occurring after "however" were "fishy odor" and "after-taste". In other words, the fishy odor seems to be the main reason keeping customers from giving a 5-star rating.
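A minimal sketch of this step, assuming the reviews are already loaded as (text, rating) pairs: keep reviews rated 3 stars or less, find each occurrence of "however", and count the word pairs in a short window after it. The window size of 4 tokens and the function name `context_after` are assumptions, and the sample reviews are invented. Note that this simple tokenizer splits "after-taste" into "after" and "taste".

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def context_after(reviews, pivot="however", max_rating=3, window=4):
    """Count word pairs in the `window` tokens that follow `pivot`,
    using only reviews rated `max_rating` stars or less."""
    counts = Counter()
    for text, rating in reviews:
        if rating > max_rating:
            continue
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == pivot:
                following = tokens[i + 1 : i + 1 + window]
                counts.update(zip(following, following[1:]))
    return counts

reviews = [
    ("Works well, however fishy odor when I open the bottle", 3),
    ("Good price, however fishy odor and after taste", 2),
    ("However, the after taste lingers", 3),
    ("Great, however fishy odor is mild", 5),  # ignored: rated above 3 stars
]
for pair, n in context_after(reviews).most_common(3):
    print(" ".join(pair), n)
```

A small window keeps the counts focused on the clause right after "however"; a larger one picks up more of the sentence at the cost of more noise.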

On the other hand, phrases such as "highly recommend" are interesting because we can use co-occurrence analysis to see which terms co-occur with a highly recommended product.
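A minimal sketch of review-level co-occurrence: for every review containing the phrase "highly recommend", count the other terms that appear in the same review. To also catch "highly recommended", each phrase token only has to be a prefix of the review token, which is a crude stand-in for proper stemming. The stopword set, the function names and the sample texts are all hypothetical.

```python
import re
from collections import Counter

# Small, hypothetical stopword set used only for this illustration.
STOPWORDS = frozenset({"a", "and", "i", "it", "my", "the", "to"})

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_phrase(tokens, phrase_tokens):
    """True if the phrase occurs contiguously; each review token only has to
    start with the phrase token, so "recommend" also matches "recommended"."""
    n = len(phrase_tokens)
    return any(
        all(tok.startswith(p) for tok, p in zip(tokens[i:i + n], phrase_tokens))
        for i in range(len(tokens) - n + 1)
    )

def cooccurring_terms(texts, phrase="highly recommend", stopwords=STOPWORDS):
    """Count, once per review, the other terms that appear alongside `phrase`."""
    phrase_tokens = tokenize(phrase)
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        if not contains_phrase(tokens, phrase_tokens):
            continue
        counts.update(
            t for t in set(tokens)
            if t not in stopwords and not any(t.startswith(p) for p in phrase_tokens)
        )
    return counts

texts = [
    "Highly recommend, no fishy burps and great price",
    "I highly recommended it to my mother, great quality",
    "Would recommend, great price",  # no "highly": not counted
    "Fishy burps, cannot recommend",
]
print(cooccurring_terms(texts).most_common(3))
```

Counting per review rather than per token means a term mentioned five times in one enthusiastic review still counts once, which is usually what you want when asking "what do recommenders talk about?".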

Of course, this is by no means the end of what we can do. To extract even better insights, we would have to spend significantly more time on proper preprocessing, use information extraction, and apply several other techniques to analyze text data in novel and potentially interesting ways.


