Saturday, 31 October 2009

Sentiment on the US Economy from Twitter

Is the economic crisis over? What is people's sentiment about the US economy and its future? Many people are asking these questions these days, and the signs are mixed. The Dow Jones is close to the 10,000 mark and some US economic indices suggest that the worst is behind us. But do people feel the same?

To answer these questions, 10,000 Tweets containing the word economy were collected to find out what people think and how they feel about the US economy and the economic crisis. The following web chart shows some of the results:



PositiveSentiment is an annotation type that covers all words suggesting positivity, such as good, better and advances, while the opposite annotation (NegativeSentiment) covers all keywords that suggest negativity.
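
As a rough illustration of keyword-based annotation, here is a minimal Python sketch. It is not the GATE setup used for this post: the two word lists are hypothetical stand-ins (only good, better and advances are named above), and the function name `annotate` is my own.

```python
import re

# Hypothetical keyword lists; the actual lexicons behind the
# PositiveSentiment / NegativeSentiment annotations are not published here.
POSITIVE_WORDS = {"good", "better", "advances", "great"}
NEGATIVE_WORDS = {"bad", "worse", "layoffs", "crisis"}

def annotate(tweet):
    """Return the set of sentiment annotation types triggered by a tweet."""
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z']+", tweet.lower())
    annotations = set()
    if any(token in POSITIVE_WORDS for token in tokens):
        annotations.add("PositiveSentiment")
    if any(token in NEGATIVE_WORDS for token in tokens):
        annotations.add("NegativeSentiment")
    return annotations
```

A Tweet can receive both annotations at once, which is why a plain keyword match is only a starting point and needs extra rules on top.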

The bolder the line between two words, the stronger the association. To get an idea of how people feel, look at the line that connects NegativeSentiment and the word still: it implies that the strongest sentiment is that the US economy is still in serious trouble.
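
One simple way to think about the strength of a line is as a co-occurrence count: how many Tweets contain both items. The Python sketch below assumes exactly that; it is not how the web chart itself was produced, and the sample data and the name `association_counts` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def association_counts(annotated_tweets):
    """Count how often each pair of items appears in the same tweet."""
    counts = Counter()
    for items in annotated_tweets:
        # Deduplicate and sort so each pair is counted once per tweet
        # and always stored in the same order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts

# Made-up sample: each inner list holds the annotations and keywords
# found in one tweet.
sample = [
    ["NegativeSentiment", "still", "economy"],
    ["NegativeSentiment", "still", "job"],
    ["PositiveSentiment", "economy"],
]
counts = association_counts(sample)
```

In this toy sample the pair (NegativeSentiment, still) has the highest count, so it would be drawn with the boldest line.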

Some other findings:

- The US President says that the economy is getting better, but people don't feel the same.

- The economy cannot be getting better while there are layoffs at the same time.

- People express very negative feelings after losing their jobs.


Notice also the association between NegativeSentiment and the words people, job, money and sales. Further insights emerge when brand names and product categories are taken into account: in this analysis, one specific brand was associated with the word sales and a good overall sentiment. Buying behavior, in the form of consumer intentions, can also be identified.

You will also find an association between finance_institution keywords (here, the keyword Fed) and PositiveSentiment. This association exists because a number of Re-Tweets are about the Fed signaling the start of the exit from recession and its impact on housing. Also interesting is the association between the word fool and the annotation PositiveSentiment (...)

Some Tweets were removed, such as spam Tweets (those that try to sell investment products). Re-Tweets were kept intact, on the assumption that someone who Re-Tweets, say, a positive-sentiment Tweet shares that positive sentiment. Tweets that were jokes were identified, marked accordingly and removed.
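
The cleaning step could be sketched in Python as follows. The spam phrases and the `is_joke` field are assumptions made for this example: I do not describe above how spam was detected, and the joke flag stands in for marking done by hand.

```python
import re

# Hypothetical spam cues; the real spam criteria are not described in the post.
SPAM_PATTERN = re.compile(
    r"\b(?:buy now|free trial|make money|investing tips)\b",
    re.IGNORECASE,
)

def clean(tweets):
    """Drop spam and joke tweets; keep everything else, including Re-Tweets."""
    kept = []
    for tweet in tweets:
        if SPAM_PATTERN.search(tweet["text"]):
            continue  # spam: tries to sell something
        if tweet.get("is_joke"):
            continue  # marked as a joke beforehand
        kept.append(tweet)  # Re-Tweets ("RT ...") are deliberately kept
    return kept
```

Keeping Re-Tweets means a widely shared Tweet is counted once per person who shared it, which matches the assumption that sharing signals agreement.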

As in many earlier examples, the software used was GATE (for annotating the unstructured text of the Tweets) together with SPSS Clementine (now PASW Modeler). Here is the setup from GATE:




Specific JAPE rules were used to identify and annotate negative and positive sentiment. Consider the following sentences:

- The economy is most likely bad at the moment
- If the economy is great then why so many people can't find a job?

The first sentence clearly has a negative sentiment because it contains the word bad. The second sentence, however, contains the word great, so a specific matching rule should take the word If into account and annotate the sentence as negative despite the presence of great.
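
The logic of such a rule can be approximated with regular expressions. The Python sketch below is not the JAPE grammar that was actually used: the word lists are hypothetical, and the rule (the word if followed by a positive word in the same sentence) is a simplified assumption.

```python
import re

# Hypothetical word lists used only for this example.
POSITIVE = r"(?:good|great|better)"
NEGATIVE_RE = re.compile(r"\b(?:bad|worse|terrible)\b", re.IGNORECASE)
POSITIVE_RE = re.compile(r"\b" + POSITIVE + r"\b", re.IGNORECASE)
# "if" followed, before the sentence ends, by a positive word.
IF_GOOD_RE = re.compile(r"\bif\b[^.?!]*\b" + POSITIVE + r"\b", re.IGNORECASE)

def classify(sentence):
    """Label a sentence, letting the If rule override a positive keyword."""
    if_good = bool(IF_GOOD_RE.search(sentence))
    if NEGATIVE_RE.search(sentence) or if_good:
        label = "NegativeSentiment"
    elif POSITIVE_RE.search(sentence):
        label = "PositiveSentiment"
    else:
        label = None
    return {"sentiment": label, "IfGood": if_good}
```

Applied to the two sentences above, both come out as NegativeSentiment, and only the second one sets the IfGood flag. A rule this simple will also mislabel sincere conditionals, so its output still needs to be reviewed.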

After running GATE, here is what the now structured data looks like for a smaller sample of the original dataset (notice the highlighted record and the IfGood flag):


With the data in a structured form like the one depicted above, we are ready to identify which Tweets carry positive or negative sentiment, spot erroneous annotations, take corrective action and finally analyze the information and extract knowledge from it.

Monday, 12 October 2009

Mining the Tweets

Through my Google Alerts I received a very interesting article: Twitter is in talks with Microsoft and Google regarding the use of Data Mining technology on user Tweets.

Although Twitter execs do not appear eager to close the deal as soon as possible, this news clearly shows where things are going. If and when the deal is finalized, it will be very interesting to see:


1) Which Data and Text Mining techniques will be used the most? Which of them will prove useful?

Many examples of what can be done by applying Data and Text Mining to Twitter have been given in this blog (starting from January 2009). In my opinion, the types of analysis that will prove interesting, apart from Sentiment Mining for Products and Services, which is already taking place, are Cluster Analysis (see the post "Clustering the Thoughts of Twitter Users" here) and Prediction of Virality.

Although Twitter will be able to monetize insights extracted from Cluster Analysis and Opinion/Sentiment Mining, perhaps the most important analysis is finding patterns in users' emotional states. Recall that everything needed for such an analysis exists in user Tweets: life events, thoughts and their associated emotional states. What emotions drive people's decisions, such as which product to buy or which politician to support? What kinds of feelings does a bad economy generate? Perhaps by analyzing Tweets we could understand people (and thus consumers) in entirely new ways, since this is the first time this information has been available to us.

2) How will Twitter users react when they know their Tweets are being analyzed?

My first impression is that Twitter users do not care much if companies extract the insights discussed above; however, this does not mean that people's opinion will stay that way. User reaction on this matter could change at any time and should be watched closely.

3) Which other technologies will be most sought after?

Although no one can give a definitive answer, I would expect Natural Language Processing (NLP) and Ontologies to be heavily used as well, and expertise in them to be sought after.