Sentiment Analysis of Twitter for You and Your Competitors – Part 1

This post is split in two, primarily because I hit a roadblock half-way through the work – and I wanted to get the first part out. Second part to follow once I’ve fixed the difficult problems!

A lot of people follow the Twitter feeds for competitors or, of course, themselves. But, one of the things I’m sure you’d like to know is – are the tweets for any given company generally favourable or not? Does the company’s Twitter community love or hate the competitors? Or you!?

I’m going to try and find out. The process of investigating this problem was arduous, so below is a summary of the things that actually worked. I have a machine-learning background, but have done very little with natural language processing. There were a lot of dead-ends before I got here…

In summary, there are some standard techniques for doing something called “sentiment analysis” on sets of textual data. That is, if you have, say, 1,000 pieces of text written about something – a book, a film, or in this case a company – are they largely positive or negative?

Obviously I’m not inventing the methodology here, but stealing from various sources. Or rather “building on others’ work” (full acknowledgements below – I really am just using the hard work of others). Here are the steps I went through if you want to try this at home:

  1. Bought the book Web Data Mining by Bing Liu. I can’t remember who recommended it, but chapter 11 of this tome, “Opinion Mining”, is a nice overview of what we’re trying to do here and the steps you need to take.
  2. From this, I learnt that step 1 is a natural language parsing technique known as part-of-speech (POS) tagging. For something like sentiment analysis, the algorithm has to have some understanding of the structure of a sentence, i.e. which parts of speech it contains – what’s a verb, an adverb, a noun and so on. This is needed because adverbs and adjectives are particularly useful for sentiment analysis (e.g. “great film!”).
  3. So far, so good. But there’s a snag – Twitter is notorious for abbreviations, emoticons and so on. How does a standard POS tagger pick up LOL and :) ? As you’ll see below, this was the main issue that stopped this being an almost trivial exercise.
  4. During the course of my investigation into this problem, Google found me a group that has, essentially, produced a complete solution to (almost) the whole sentiment analysis problem I face. I’d almost feared this, as I was rather looking forward to piecing the problem together step by step. However, the Natural Language Processing group at Stanford have produced the Stanford CoreNLP library*, which does everything from tokenising to POS tagging to sentiment analysis. And there’s a .NET version of this which is very impressive. Thank you so much for the port, Sergey Tihon, and of course the whole team at Stanford.
  5. There’s a great bit of example code that comes with the library (available via NuGet) that lets you test out different parts of the process. Sentiment analysis is a series of steps: first you tokenise the sentence/tweet (not always trivial for Twitter), then you POS tag it as described above to identify the different parts of speech, and then you carry out the sentiment analysis itself. The example below shows why this library (and technique) is so powerful. If we carry out sentiment analysis on the short sentence “this isn’t good”, we get the following output:
    Sentence #1 (4 tokens, sentiment: Negative):
    This isn't good
    [Text=this SentimentClass=Neutral]
    [Text=is SentimentClass=Neutral]
    [Text=n't SentimentClass=Neutral]
    [Text=good SentimentClass=Positive]

    Obviously the clever thing here (apart from the tokenisation, which doesn’t simply treat “isn’t” as one word) is that the algorithm recognises the word “good” as positive in isolation, yet still scores the sentence as negative overall. As mentioned above, you need to see words in the context around them, and the algorithm is smart enough to know that the “not” inside “isn’t” negates the positive word “good” that follows it, making the overall sentence negative – very clever.
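
    If you want to run this yourself, a minimal sketch of the pipeline looks something like the following. Note this uses the original Java library rather than the .NET port, and the annotation class names are those in recent CoreNLP releases (older versions named them slightly differently):

    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;

    public class SentimentDemo {
        public static void main(String[] args) {
            // The sentiment annotator needs tokenising, sentence splitting,
            // POS tagging and parsing to have run first.
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation annotation = new Annotation("This isn't good");
            pipeline.annotate(annotation);

            // One sentiment label per sentence: Very negative .. Very positive
            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
                String text = sentence.get(CoreAnnotations.TextAnnotation.class);
                String sentiment = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
                System.out.println(text + " -> " + sentiment);
            }
        }
    }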

  6. So, what’s going on here? It’s fine and easy to take someone else’s code and run it on your data – but one of the things I’ve learnt from machine learning is that you have to understand what’s going on. There are no perfect algorithms for all scenarios – are these showing the right results for my data? Are they interpreting the (weird, shortened) English that is Twitter? What happens to non-English tweets (well, I know that – it doesn’t work)? And are tweets about companies different from the LOLZ and other phrases used more generally on Twitter?
  7. I’ll use an example to illustrate the first big problem I hit. I tried a number of tweets gathered for a given company and spotted something untoward going on – a number of the tweets didn’t seem to be getting categorised the way I expected. As an illustration, I used the following real tweet (company names and hashtags changed):
    company3: Half way through a great panel session with #Company1 and @Company2 @ #LaunchEvent launch event http://test.co/test
  8. Now for me, this is a pretty positive tweet. But running it through the vanilla models that ship with Stanford CoreNLP, it gets a rating of 1, where:
     0 = Very Negative
     1 = Negative
     2 = Neutral
     3 = Positive
     4 = Very Positive

    Perhaps at worst, this tweet might be classified as neutral, but “Negative” seems plain wrong. Note also that this is a statistical process – I know and expect a number of misclassifications, that’s part of the game – but this seems like quite a simple example to get wrong.
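
    For reference, the numeric 0–4 score comes straight off the sentiment-annotated parse tree. A minimal sketch (Java again, with the class names from recent CoreNLP releases) for a sentence that has already been run through the pipeline shown earlier:

    import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.util.CoreMap;

    public class SentimentScore {
        // 0 = Very Negative, 1 = Negative, 2 = Neutral, 3 = Positive, 4 = Very Positive
        static int score(CoreMap sentence) {
            Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
            return RNNCoreAnnotations.getPredictedClass(tree);
        }
    }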

  9. So I tried some variations on this tweet to see what other scores I got, in an attempt to find the cause of the problem. Here are the results:
    company3: Half way through a great panel session with #Company1 and @Company2 @ #LaunchEvent launch event http://test.co/test - score of 1
    
    company3: Half way through a great panel session with Company1 and Company2 launch event - score of 3
    
    company3: Half way through a great panel session with #Company1 and @Company2 @ #LaunchEvent launch event - score of 3
    
  10. Looking at these examples, it certainly seems that the algorithm is being thrown by the URL at the end – the @ and # symbols aren’t causing problems (in this specific case at least), but the URL is.
  11. So, hypothesis 1: it’s just the URLs that are causing the problem. The next step was to simply remove the URLs from tweets (with a bit of regex) and run with what was left. However, trying this new approach on a number of other tweets shows that URLs aren’t the whole problem. For example, the following should really be classified as neutral at worst:
    FirstName: RT @Company1: Free eBook: 50 Web Performance Tips for Developers (from @Company2) http://test.com/test - score of 1
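    The URL stripping itself is trivial – the exact regex isn’t shown here, but something as crude as the following illustrates the idea (an illustrative pattern only, not the precise one used):

    public class TweetCleaner {
        // Hypothesis 1: drop URLs before running sentiment analysis.
        // This pattern is illustrative only.
        static String stripUrls(String tweet) {
            return tweet.replaceAll("https?://\\S+", " ").replaceAll("\\s+", " ").trim();
        }
    }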
  12. So what seems to be going on here? Hypothesis 2: there’s a problem rooted in the POS tagging of the tweet. Looking at the tags assigned:
    (ROOT
      (NP
        (NP (NNP FirstName))
        (: :)
        (NP
          (NP
            (NP (NN RT) (NN @Company1))
            (: :)
            (NP (NNP Free) (NNP eBook)))
          (: :)
          (NP
            (NP (CD 50) (NN Web))
            (NP
              (NP (NNP Performance) (NNP Tips))
              (PP (IN for)
                (NP (NNP Developers)))
              (PRN (-LRB- -LRB-)
                (PP (IN from)
                  (NP (NN @Company2)))
                    (-RRB- -RRB-))
                  (NN http://test.com/test))))))))
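
    For anyone wanting to reproduce this, the constituency parse can be pulled off an annotated sentence and printed in the bracketed form above – a small sketch, again assuming the Java library rather than the .NET port:

    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;

    public class ParseDump {
        // Prints the Penn Treebank-style bracketed parse for an annotated sentence.
        static void printParse(CoreMap sentence) {
            Tree parse = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            parse.pennPrint();
        }
    }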
    
  13. If you look up what these tags mean (they come from the standard 45-tag Penn Treebank set – NN is a singular or mass noun, NNP a proper noun, CD a cardinal number, IN a preposition and -LRB-/-RRB- the brackets), the problem becomes apparent.
  14. There definitely seems to be an issue where many of the elements of tweets – URLs, hashtags, user IDs – aren’t being picked up as such, and are instead just being classified as regular nouns (NN).
  15. So my next port of call: is there a POS tagger which uses more Twitter-specific tags, for example for URLs, hashtags and so on? Yes – I found the following from the University of Sheffield – https://gate.ac.uk/wiki/twitter-postagger.html ‡. This provides a pre-trained model that plugs into Stanford CoreNLP and picks up the elements of tweets that weren’t recognised in the example above (a sketch of how to wire it in follows below). Using this model instead, the very same sentence above yields:
    (ROOT
      (NP
        (NP (NNP FirstName))
        (: :)
        (NP
          (NP (RT RT) (USR @Company1))
          (: :)
          (NP
            (NP (JJ Free) (NN eBook))
            (: :)
            (NP
              (NP (CD 50) (NN Web) (NN Performance) (NNS Tips))
              (PP (IN for)
                (NP
                  (NP (NNP Developers) (NNP -LRB-))
                  (PP (IN from)
                    (S
                      (VP (USR @Company2)
                        (NP (UH -RRB-))))))))))
        (URL http://test.com/test)))

    As can be seen, this is tagging elements more appropriately – USR for users, URL for URLs and so on. Seems great. However, if we again look at the classification…

    FirstName: RT @Company1: Free eBook: 50 Web Performance Tips for Developers (from @Company2) http://test.com/test - score of 1

    Sigh.
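
    For completeness, swapping in the GATE Twitter model should just be a case of pointing the pipeline’s POS annotator at the downloaded tagger file via the pos.model property. A sketch, again in Java – the local path and file name are assumptions, so point it at wherever you unpacked the GATE download:

    import java.util.Properties;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class TwitterPipeline {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment");
            // Use the pre-trained Twitter POS model instead of the default English tagger.
            // Path/file name below is assumed - use your local copy of the GATE model.
            props.setProperty("pos.model", "models/gate-EN-twitter.model");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            // ...then annotate tweets exactly as in the earlier sketch.
        }
    }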

So, I’m going to pause there. I’ve understood the problem a little better – tokenisation, followed by POS tagging, followed by sentiment analysis. I’ve found a standard framework – the Stanford CoreNLP codebase – which is incredible, plus a better Twitter POS tagger.

However, my results still aren’t quite there. Running this algorithm for an example company (for which I have 10,000+ tweets), I find that positive tweets are generally picked up clearly. The struggle is distinguishing between neutral and negative tweets – far too many neutral tweets are being categorised as negative. This could be down to a number of things – many of these tweets are, by nature, very neutral and specific to B2B companies. Is this the misapplication of a generic model to a specific data set? Do I need to train my own model for “company tweets”? How do I protect against over-fitting, and how can I add to the great work already done? To be continued…

*  Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. “The Stanford CoreNLP Natural Language Processing Toolkit”. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60.
‡ L. Derczynski, A. Ritter, S. Clarke, and K. Bontcheva. 2013. “Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data”. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, ACL.
