Wednesday, 25 January 2012

Text Analytics for Telecommunications - Part 1

As discussed in the previous post, performing Text Analytics for a language for which no tools exist is not an easy task. The Case Study which I will present at the European Text Analytics Summit is about analyzing and understanding thousands of non-English Facebook posts and Tweets about Telco Brands and their Topics, leading to what is known as Competitive Intelligence.


The Telcos used for the Case Study are Telenor, MT:S and VIP Mobile, all located in Serbia. The analysis aims to identify how Customers perceive each of the three Companies and to understand the Positive and Negative elements of each Telco, as captured from the Voice of the Customers (Subscribers).


By analyzing several thousand Tweets and Facebook posts and comments we can get a first glimpse of Competitive Intelligence. For example, when we wish to identify which words frequently co-occur with mentions of postpaid packages, this is what we find:




Red boxes show Telco Brands; notice "mts" and "mtsa", which point to the same Telco, namely mt:s. Blue boxes indicate similar words that should be merged. From a first look at the results above we see that:

a) mt:s is found more frequently when users mention PostPaid packages.

b) Telenor and VIP Mobile are not found as frequently as MT:S in PostPaid package conversations.

c) We see several problems caused by insufficient pre-processing: kredit and kredita (= credit) should merge into one word, and the same applies to telefon - telefona, internet - interneta and mts - mtsa.
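The kind of co-occurrence count behind this first look can be sketched in a few lines. This is only a minimal illustration in Python, not the tooling actually used for the Case Study; the function name `cooccurring_words` and the four sample posts are made up for the example.

```python
from collections import Counter

def cooccurring_words(posts, keyword, top_n=None):
    """Count the words that appear in the same post as `keyword`.

    Each word is counted once per post, so the result reads as
    "number of posts mentioning both the keyword and the word".
    """
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()
        if keyword in tokens:
            # A set avoids counting a repeated word twice in one post
            counts.update({t for t in tokens if t != keyword})
    return counts.most_common(top_n)

# Hypothetical sample posts, invented for illustration only
sample_posts = [
    "mts postpaid paket kredit",
    "postpaid mtsa telefon",
    "telenor internet",
    "postpaid mts internet",
]

for word, n in cooccurring_words(sample_posts, "postpaid"):
    print(word, n)
```

Note that "mts" and "mtsa" are counted as two different words here, which is exactly the pre-processing problem described in point c).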



Notice that we can perform the same high-level analysis for several Telco Topics such as Network, Billing, Customer Care, Promotions, Questions of subscribers and so on. The next task is to identify the reason(s) why MT:S was found to have more mentions about PostPaid packages. Note that at this point we do not know why this is so: it could be that MT:S prices of prepaid packages are high or very cheap, or something else is happening that needs to be identified.


The Serbian language requires extra work because it is highly inflected: even the endings of Brand names change according to usage. Consider the following:

U mts-u (at mts)
Sa mts-om (With mts)
Bez mts-a (Without mts)


A highly inflected language clearly explodes our feature space, and this is where R can come to the rescue with some success. We can use R to change several synonyms to one word, remove (Serbian) stop words, remove URLs and perform several other pre-processing steps that are necessary prior to an extensive analysis. More in the next post.
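To make these pre-processing steps concrete, here is a rough sketch. It is written in Python rather than R and is not the code used for the Case Study. The variant table contains only the word pairs mentioned in this post, the stop word list is a tiny illustrative sample rather than a real Serbian stop word list, and the names `VARIANTS`, `STOP_WORDS` and `preprocess` are assumptions made for the example.

```python
import re

# Matches http:// and https:// links up to the next whitespace
URL_PATTERN = re.compile(r"https?://\S+")

# Inflected or variant forms mapped to one canonical word
# (only the pairs mentioned in the post)
VARIANTS = {
    "mtsa": "mts", "mts-u": "mts", "mts-om": "mts", "mts-a": "mts",
    "mt:s": "mts",
    "kredita": "kredit",
    "telefona": "telefon",
    "interneta": "internet",
}

# Tiny illustrative sample, NOT a complete Serbian stop word list
STOP_WORDS = {"u", "sa", "bez", "i", "je"}

def preprocess(text):
    """Lowercase, drop URLs, merge variants and remove stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    # Keep ':' and '-' inside tokens so "mt:s" and "mts-om" survive
    raw_tokens = re.findall(r"[\w:-]+", text)
    tokens = [t.strip(":-") for t in raw_tokens]
    tokens = [VARIANTS.get(t, t) for t in tokens if t]
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Sa mts-om http://example.com bez kredita"))
```

A hand-written lookup table like this only scales to a handful of Brand names and frequent words; covering the whole vocabulary would need a stemmer or lemmatizer for Serbian.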

Monday, 09 January 2012

Case Study : Competitive Intelligence for Telecommunications

Telcos are a good example of a fast-moving business environment and a good candidate for Competitive Intelligence analysis from Social Media sources. The Case Study involves three major Telcos located in an Eastern European country and shows the results from the analysis of thousands of Tweets and Facebook wall posts to understand the following:


- How do subscribers perceive each Telco Brand?

- Which information do subscribers tend to Re-Tweet and "Like" on Facebook Wall Posts?

- Which words and Topics are commonly found with intense feelings / thoughts?

- Which topics are mostly discussed when subscribers compare two or more Telco operators?

- What do subscribers discuss about Network Quality and Speed, Billing, Promotions, Marketing Events, Customer Care, TV Commercials etc.?

- How do they prioritize these topics, and which of them are interesting and why?

- What do subscribers talk about in general (i.e. without any Telco Brand being mentioned) regarding Internet speed and Charges, and what would they like to see more of?

I will present the Case Study mentioned above at the forthcoming 9th Annual European Text Analytics Summit in April in London, UK. The Case Study is an example of applying Text Analytics to a language for which no tools currently exist, so the difficulties and possible solutions will also be discussed. Examples will also be given of analyzing information at different conceptual levels and of how this technique provides even more insight into consumer behavior.

The following tools were used for the analysis:

- GATE to annotate all Topics that occur within Telco conversations (such as "sms", "internet", "dropped call", "network", "promotion") and for setting up Conceptual Levels.

- R for pre-processing Text and performing Text Classification, Topic Detection and Cluster Analysis.

- WEKA for Feature Selection and Text Classification.

- Finally, Java is used to process the information generated by GATE, for example to understand how subscribers prioritize various Telco Concepts and Topics and to identify important phrases and/or words that frequently occur when these Topics are discussed.
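As a very rough idea of what annotating Topics and then ranking them involves, the sketch below uses a small lookup table in plain Python. It does not use GATE or its API, and it is not the Java code mentioned above. The topic terms are the examples listed for GATE; grouping "dropped call" under Network is an assumption made for the example, as are the names `TOPIC_TERMS` and `rank_topics`.

```python
from collections import Counter

# Topic -> terms that signal it. The terms come from the examples
# above; the grouping into topics is an assumption for illustration.
TOPIC_TERMS = {
    "SMS": ["sms"],
    "Internet": ["internet"],
    "Network": ["network", "dropped call"],
    "Promotion": ["promotion"],
}

def rank_topics(posts):
    """Count how many posts mention each topic, most frequent first."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for topic, terms in TOPIC_TERMS.items():
            # Plain substring match: crude, but it handles the
            # two-word term "dropped call" without tokenization
            if any(term in text for term in terms):
                counts[topic] += 1
    return counts.most_common()
```

Calling `rank_topics` on a list of posts returns (topic, number of posts) pairs, which gives a first approximation of how often subscribers bring up each Topic.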