Monday, 27 September 2010

Inside a consumer's mind with Text Analytics



So far we have seen several examples of how Predictive Analytics applied to Social Media and blog posts can help us suggest better strategies in Marketing, Branding, Sales and PR. This post is a walk-through example of how we can choose a concept, extract what users write about this concept on Twitter, get insights into how consumers think and behave about it, and finally group similar consumer thoughts and experiences using Cluster Analysis. A "concept" could be :

- Any activity
- A Brand (e.g. Apple Inc.)
- A Product / Service
- A Politician


and almost anything else discussed in user Tweets.

What we will look at is work that was done specifically to understand what consumers think, like or dislike while visiting a shopping Mall. What do people feel when visiting a Mall? Which words are associated with a positive experience, or with the presence of a smiley, in Tweets about Malls? Using the Twitter API, approximately 36,000 distinct Tweets were collected about consumer experiences of visiting a shopping Mall (the sample below shows an example of a consumer's negative sentiment) :



So how can an analyst get into a consumer's mind by analyzing Tweets, and how would this information be useful? To find some answers I teamed up with Marketing Strategist Dr Nikos Dimitriadis, who helped me judge how actionable and how interesting each extracted insight is. Note that we capture thoughts from a biased sample, which means that we cannot make inferences about the general population. However, this work can be a great additional tool for finding new ideas and insights for Marketing initiatives, on top of more traditional methods such as focus groups, and it also enables us to form several hypotheses as to what is likely to work.

A number of pre-processing steps came first : cleaning the captured Tweets of irrelevant information (such as links), replacing words with their synonyms, removing frequently occurring words such as 'and', 'to', 'at', 'in' and 'mall', and filtering out all very short Tweets. After that I started performing frequency counts of the words contained in Tweets about Malls :
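The pre-processing and counting steps described above can be sketched roughly as follows. This is only an illustration and not the tooling used for the original analysis : the stop-word list, the synonym map, the minimum Tweet length and the decision to apply the length filter after cleaning are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed values for illustration only; the original lists were not published.
STOP_WORDS = {"and", "to", "at", "in", "mall", "the", "a", "i"}
SYNONYMS = {"shopping": "shop", "shops": "shop"}  # hypothetical synonym map
MIN_TOKENS = 3  # assumed cut-off for "very short" Tweets

URL_RE = re.compile(r"https?://\S+")
# Keep the ":-)" / ":)" smiley as a token of its own, next to plain words.
TOKEN_RE = re.compile(r":-?\)|[a-z']+")


def preprocess(tweet):
    """Lower-case a Tweet, drop links, map synonyms and remove stop words."""
    text = URL_RE.sub(" ", tweet.lower())
    tokens = [SYNONYMS.get(tok, tok) for tok in TOKEN_RE.findall(text)]
    return [tok for tok in tokens if tok not in STOP_WORDS]


def word_frequencies(tweets):
    """Count words over all Tweets that are long enough after cleaning."""
    counts = Counter()
    for tweet in tweets:
        tokens = preprocess(tweet)
        if len(tokens) >= MIN_TOKENS:
            counts.update(tokens)
    return counts


if __name__ == "__main__":
    sample = [
        "At the mall with my best friend :-) http://t.co/x",
        "Shopping for shoes at the mall lol",
        "mall",
    ]
    for word, count in word_frequencies(sample).most_common(5):
        print(word, count)
```

Calling `word_frequencies(...).most_common(20)` on the cleaned Tweets would give the kind of ranking a frequency chart is built from.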


We immediately notice how often LOL and :-) (smiley) appear in Tweets about being at, going to or returning from the Mall, which also gives us examples of consumers being in a specific mood. Here is what happens when we look at the most frequently occurring 2-word phrases :


and 3-word phrases (Note : ive = i've) :





Looking at the two charts we also notice that we frequently find the phrases :

- My best friend : consumers Tweet that they are visiting a Mall with their best friend.

- My nails done : appears to be one of the activities that women discuss most frequently.

We could then look at words and phrases that seem interesting for understanding consumer experiences and values when visiting a Mall, such as :
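Counting 2-word and 3-word phrases like the ones above amounts to counting adjacent word sequences (n-grams). A minimal sketch, assuming each Tweet has already been cleaned and split into a list of words; the function names are illustrative only :

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n adjacent words in a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_frequencies(tokenized_tweets, n):
    """Count n-word phrases over a collection of tokenized Tweets."""
    counts = Counter()
    for tokens in tokenized_tweets:
        counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    tweets = [
        ["with", "my", "best", "friend"],
        ["got", "my", "nails", "done"],
        ["my", "best", "friend", "made", "my", "day"],
    ]
    print(phrase_frequencies(tweets, 2).most_common(3))
    print(phrase_frequencies(tweets, 3).most_common(3))
```

Phrases are counted within a single Tweet only, so a phrase never spans the end of one Tweet and the start of the next.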

- Shop
- Shoes
- Parking Lot
- Food Court
- Need / Want
- Walk around
- Made my Day
- Post Picture FaceBook

and mine through all these words / phrases to understand what consumers think : What exactly made the day of consumers who used the phrase "Made my Day" in their Tweets? How do consumers feel when they visit the Mall with their best friend? When they are alone? Which activities trigger positive feelings? But more importantly : How could one use this information to better understand consumers and market a concept? More in the next post.

Tuesday, 21 September 2010

Social Media Insights from Predictive Analytics



Here is one more example of how Predictive Analytics may help professionals make better decisions. For this post a total of 3,000 Social Media post titles were analyzed to gain (hopefully) important insights for Social Media professionals. To achieve this, Text Mining was used to analyze the text of the titles, identify the most important subjects (do posts about Personal Branding tend to be re-tweeted more than posts about Social Media Monitoring?) and also try to prioritize the various areas of Social Media.

We start with the basics. Many Social Media pros read (and write) about various subjects : How-to's, things to avoid, adoption of Social Media, etc. The first goal was to identify the most frequently occurring subject areas in Social Media posts using simple keyword frequencies. The following chart shows this information :


It is not much of an insight that Social and Media are at the top of the list, or that Twitter appeared in posts more frequently than Facebook, but we also see that Brand is found more frequently than Marketing or Strategy.

However, there is a slight problem : the chart shown above is about single words, and measuring how often 2 adjacent words occur in Social Media posts, with the phrase Social Media itself omitted, could be more useful :



This shows us that most Social Media posts were found to be about How-to's (note that the phrases How to and ways to have similar meaning). One could dig deeper to identify the concepts to which the How-to's apply (How to monetize, How to be successful, How to avoid mistakes etc).


The next goal was to find words and phrases that are commonly found in posts with a high number of retweets (>40). To get this insight various Text Mining techniques were used. The following features were taken into consideration :

- Author of Post
- Title of Post
- Number of Retweets


and here are some of the results :



Words that have a negative weight tend to be found in SM posts with a low number of re-tweets (write, talk, trust, sentiment), while launch and America were commonly found in popular posts. Please notice (the reason will be explained later) that personal is one of the hot words, as are link and increase.

With this information, an analyst may then investigate why such words tend to appear in popular Social Media posts. Here are some insights :
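One simple way to obtain word weights of this kind is a smoothed log-odds ratio between titles above and below the retweet threshold. The sketch below is an assumption made for illustration : the post does not say which Text Mining technique produced the weights, and the function name and sample titles are invented. Only the threshold of 40 retweets comes from the text.

```python
import math
from collections import Counter

RETWEET_THRESHOLD = 40  # posts with more than 40 retweets count as popular


def word_weights(posts, threshold=RETWEET_THRESHOLD):
    """Score each title word by a smoothed log-odds ratio.

    posts is an iterable of (title, retweets) pairs. A positive weight means
    the word is relatively more common in popular titles, a negative weight
    that it is relatively more common in titles with few retweets.
    """
    high, low = Counter(), Counter()
    for title, retweets in posts:
        words = title.lower().split()
        (high if retweets > threshold else low).update(words)

    vocab = set(high) | set(low)
    # Add-one smoothing so words missing from one group do not cause log(0).
    high_total = sum(high.values()) + len(vocab)
    low_total = sum(low.values()) + len(vocab)
    return {
        word: math.log((high[word] + 1) / high_total)
        - math.log((low[word] + 1) / low_total)
        for word in vocab
    }


if __name__ == "__main__":
    # Invented sample titles, for illustration only.
    sample_posts = [
        ("launch your personal brand", 120),
        ("how to write about trust", 3),
        ("product launch in america", 85),
        ("write and talk more", 5),
    ]
    weights = word_weights(sample_posts)
    for word in sorted(weights, key=weights.get, reverse=True):
        print(f"{word:10s} {weights[word]:+.2f}")
```

A fuller model could also use the author of each post as a feature; this sketch looks at title words only.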

  • Personal Branding appears to be a hot area. People are primarily interested in the various ways they can increase their "personal worth" in the Social Media arena.
  • IWOM : Internet Word Of Mouth is also a concept that frequently occurs in SM posts with many re-tweets.
  • Positive & Possible : It appears that posts that discuss various possibilities in a positive way (use of the word could) were found to be re-tweeted more (recall the link and increase keywords discussed previously).

Sunday, 05 September 2010

"Ways to stop Social Media and Sentiment Mining"



While looking at my Google Analytics account I came across a keyword search originating from Australia that was different from the keywords that usually drive traffic to my blog. The keywords were the following :

"Ways to stop Social Media and Sentiment Mining"

I decided to write this post assuming that the person who submitted this search does not like the fact that machines are mining his or her points of view about people or products, or "understand" to some extent whether he or she feels happy or not.

One of the many interesting aspects of being a Data Miner is explaining to other people what a Data Miner does (this was also discussed by G. Piatetsky-Shapiro, if my memory serves me well). When asked, I sometimes say that I also "analyze emotions as these are expressed on the Web". At first people are very interested, but after a short while the next responses almost always go along these lines :

- Are you allowed to do this?
- Is this legal?
- Have you ever heard about Big Brother?

It's no big secret that emotions play a major role in our lives and drive our decisions. Many people are starting to realize that companies are already using Information Extraction and Data and Text Mining techniques to extract what we say about various products or people and to better understand our behavior. I believe that the most important thing in this area is not just Sentiment Mining, in other words whether we feel positive or negative about a Person, Product or Brand, but the ability of Analytics to extract our core values and analyze our emotions.



When applying Text Mining, or a mixture of Data and Text Mining methods, on Twitter for example, we are not only able to see the sentiment for a product. We can identify a user who is alone, feeling bored and watching television. We can form several hypotheses on whether users who have survived cancer express more positive thoughts than other user groups (see Surviving Cancer, Happiness and Twitter), find what sort of lifestyle makes a CEO happy, or whether a specific profession increases your chances of being single (see Twitter Analytics : Cluster Analysis reveals similar users). Cluster Analysis can also identify people's core values and what they want or are trying to avoid.

Some of the examples discussed above have a clear business value while others don't. The important fact, however, is that analysts now have the data to analyze our emotions and our responses to events in our lives on a much deeper level. This information has not been available on this scale before.

Should we stop extracting these insights and how dangerous can these insights become?