The main purpose of this project is to perform sentiment analysis on Twitter
data. A great deal of information is now gathered from micro-blogging
websites, because people use them to post real-time messages giving their
opinions on a variety of topics, discussing current issues, complaining, and
expressing positive sentiment about products they use in daily life. Many
companies that manufacture such products have started to poll these
microblogs to get a sense of the general sentiment towards their products.
These companies often study user reactions and reply to users on the
microblogs.
Through this project we look at one such popular microblog, Twitter, and build
models for classifying “tweets” by sentiment. The project is based on two
classification tasks: a binary task of classifying sentiment into positive and
negative classes, and a 3-way task of classifying sentiment into positive,
negative and neutral classes.
The aim is to analyse the sentiments people express when tweeting about a
particular topic, and to understand whether that issue has a positive,
negative or neutral impact on people's minds.
SOURCES OF DATA:
We will use the following R packages to extract and process
the tweets from Twitter:
twitteR : Provides an interface to the Twitter web API.
ROAuth : Provides an interface to the OAuth 1.0 specification,
allowing users to authenticate via OAuth to the server of their choice.
stringr : Provides fast and friendly string manipulation.
plyr : A set of clean and consistent tools that implement the
split-apply-combine pattern in R.
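
As a rough illustration of how these packages fit together, the sketch below wraps authentication and a search in one function. The function name fetch_tweets and its argument names are hypothetical; the four credentials are placeholders that would come from a registered Twitter application, and the calls only succeed if twitteR is installed and the API accepts those credentials.

```r
# Sketch only: authenticate and pull tweets with the twitteR package.
# fetch_tweets and all of its argument names are illustrative, not from the project.
fetch_tweets <- function(query, n = 100,
                         consumer_key, consumer_secret,
                         access_token, access_secret) {
  # Register the OAuth credentials for this R session.
  twitteR::setup_twitter_oauth(consumer_key, consumer_secret,
                               access_token, access_secret)
  # Search for up to n English-language tweets matching the query.
  tweets <- twitteR::searchTwitter(query, n = n, lang = "en")
  # Convert the list of status objects to a data frame; the tweet
  # text is in the `text` column.
  twitteR::twListToDF(tweets)
}
```

A call might look like fetch_tweets("#example", n = 200, ...) with the four credential strings filled in; the search term here is only a placeholder.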
The sentiments of the tweets will be classified based on the polarity of the
individual words. Each word will be given a score of +1 if classified as
positive, -1 if negative, and 0 if neutral. We will determine polarity using
the positive and negative lexicon lists compiled in the AFINN word list, which
has 2477 words and phrases rated from -5 (very negative) to +5 (very positive).
AFINN words are divided into four categories:
Very Negative (rating -5 or -4)
Negative (rating -3, -2, or -1)
Positive (rating 1, 2, or 3)
Very Positive (rating 4 or 5)
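
The grouping above can be sketched as a small lookup in base R. The function name afinn_category and the sample words are assumptions for illustration, and the sample ratings should be checked against the actual AFINN file. A rating of 0, or anything outside -5 to +5, falls into none of the four categories and is returned as NA.

```r
# Map an AFINN rating (-5..+5) to one of the four categories listed above.
# afinn_category is a hypothetical helper name.
afinn_category <- function(rating) {
  out <- rep(NA_character_, length(rating))
  out[rating %in% c(-5, -4)]     <- "Very Negative"
  out[rating %in% c(-3, -2, -1)] <- "Negative"
  out[rating %in% c(1, 2, 3)]    <- "Positive"
  out[rating %in% c(4, 5)]       <- "Very Positive"
  out
}

# Illustrative sample in the AFINN layout (word, integer rating).
afinn_sample <- data.frame(
  word   = c("catastrophic", "bad", "good", "superb"),
  rating = c(-4L, -3L, 3L, 5L),
  stringsAsFactors = FALSE
)
afinn_sample$category <- afinn_category(afinn_sample$rating)
print(afinn_sample)
```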
We will use a word cloud, a text-mining method that highlights the most
frequently used keywords in a body of text. It is a handy tool that gives a
quick visualization of the most commonly cited words. We will also create a
Twitter application so that we can perform the analysis by connecting our R
console to Twitter through the Twitter API.
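
A word cloud is driven by a word-frequency table, which can be computed in base R as in the sketch below. The three sample tweets are invented for illustration. Drawing the cloud itself is shown with the wordcloud package, which is an assumption (the text does not name a plotting package), so that call is skipped when the package is not installed.

```r
# Invented sample tweets, for illustration only.
tweets <- c("great phone great camera",
            "battery is bad",
            "great battery")

# Lower-case, split on anything that is not a letter, drop empty strings.
words <- unlist(strsplit(tolower(tweets), "[^a-z]+"))
words <- words[nzchar(words)]

# Frequency of each word, most frequent first.
word_freq <- sort(table(words), decreasing = TRUE)
print(word_freq)

# Optional: draw the cloud if the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(word_freq), as.integer(word_freq), min.freq = 1)
}
```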
We will use R to compute the sentiment scores. First we download the positive
and negative word lists and scan them into R; we can also add our own words to
either list. Once the tweets are ready, we apply a few functions to convert
them into useful information. The working principle of the sentiment analysis
is to find the words in each tweet that represent positive sentiment and the
words that represent negative sentiment. The analysis uses two packages, plyr
and stringr, to manipulate strings. We then put the tweets into a data frame,
apply the sentiment function to them, and generate a summary and a histogram
of the scores. We can also count the tweets by score. The resulting score file
shows the score of each tweet as an integer in front of the tweet.
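
A minimal sketch of this scoring step follows. It uses base R string functions in place of plyr and stringr so that it runs without extra packages; the function name score_sentiment, the tiny word lists, and the sample tweets are all assumptions for illustration. In the project, the word lists would be scanned from the downloaded lexicon files.

```r
# Hypothetical miniature lexicons standing in for the downloaded word lists.
pos_words <- c("good", "great", "love")
neg_words <- c("bad", "terrible", "hate")

# Score each sentence: +1 per positive word, -1 per negative word, 0 otherwise.
score_sentiment <- function(sentences, pos_words, neg_words) {
  vapply(sentences, function(sentence) {
    # Replace punctuation, control characters and digits with spaces, then lower-case.
    clean <- tolower(gsub("[[:punct:][:cntrl:][:digit:]]+", " ", sentence))
    words <- unlist(strsplit(clean, "\\s+"))
    sum(words %in% pos_words) - sum(words %in% neg_words)
  }, integer(1), USE.NAMES = FALSE)
}

# Invented sample tweets.
tweets <- c("I love this phone, great camera!",
            "Terrible battery, I hate it",
            "It arrived on Monday")

# Score first, tweet text second, as in the score file described above.
results <- data.frame(score = score_sentiment(tweets, pos_words, neg_words),
                      text  = tweets,
                      stringsAsFactors = FALSE)

print(results)
print(summary(results$score))   # summary of the scores
print(table(results$score))     # number of tweets per score
hist(results$score, main = "Tweet sentiment scores", xlab = "Score")
```

Words in neither list contribute 0, so a tweet with no lexicon matches scores 0 and can be treated as neutral.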
This will help us to count the positive, negative and neutral tweets that
users publish on Twitter, and from those counts to judge whether people have
positive, negative or neutral thoughts about a particular activity on the
internet. Sentiment analysis will also indicate how popular a particular topic
is, so that we can decide whether the topic is creating a positive or negative
impact on people's minds.
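
One way to turn per-tweet scores into these counts is sketched below, assuming that a score above zero means positive, below zero negative, and exactly zero neutral. The score vector is invented for illustration. Dropping the neutral tweets gives the labels for the binary task, while keeping them gives the 3-way task.

```r
# Invented per-tweet scores, for illustration only.
scores <- c(2L, -1L, 0L, 3L, 0L, -2L, 1L)

# 3-way labels from the sign of each score.
labels <- factor(sign(scores),
                 levels = c(-1, 0, 1),
                 labels = c("negative", "neutral", "positive"))

counts <- table(labels)                   # tweets per class
shares <- round(prop.table(counts), 2)    # proportion per class
print(counts)
print(shares)

# Binary task: keep only the positive and negative tweets.
binary <- droplevels(labels[labels != "neutral"])
print(table(binary))
```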