BACKGROUND AND RATIONALE:
The main purpose of this project is to perform sentiment analysis on Twitter data. Today, a great deal of information is gathered from micro-blogging websites, because people use them to post real-time messages about their opinions on a variety of topics, discuss current issues, complain, and express positive sentiment for products they use in daily life. Many companies that manufacture such products have started to poll these microblogs to get a sense of the general sentiment toward their products. These companies often study user reactions and reply to users on the microblogs.
Through this project we look at one such popular microblog, Twitter, and build models for classifying “tweets” into positive, negative and neutral sentiment. The project is based on two classification tasks: a binary task of classifying sentiment into positive and negative classes, and a 3-way task of classifying sentiment into positive, negative and neutral classes.

RESEARCH QUESTION:
To analyse the sentiments people express when tweeting about a particular topic, and to understand whether that topic has a positive, negative or neutral impact on people’s minds.

SOURCES OF DATA:
We will use the following R packages to extract the tweets from Twitter and process them:
· twitteR: provides an interface to the Twitter web API.
· ROAuth: provides an interface to the OAuth 1.0 specification, allowing users to authenticate via OAuth to the server of their choice.
· stringr: a fast and friendly set of string manipulation functions.
· plyr: a set of clean and consistent tools that implement the split-apply-combine pattern in R.

The sentiment of each tweet will be classified based on the polarity of its individual words. Each word will be given a score of +1 if classified as positive, -1 if negative, and 0 if classified as neutral.
We will determine this by using the positive and negative lexicon lists compiled in the AFINN wordlist, which has 2477 words and phrases rated from -5 (very negative) to +5 (very positive). AFINN words are divided into four categories:
· Very Negative (rating -5 or -4)
· Negative (rating -3, -2, or -1)
· Positive (rating 1, 2, or 3)
· Very Positive (rating 4 or 5)

We will use a word cloud, a text mining method that highlights the most frequently used keywords in a body of text. It is a handy tool for quickly visualizing the most commonly cited words in a text. We will also create a Twitter application, which will allow us to perform the analysis by connecting our R console to Twitter through the Twitter API.
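The four AFINN rating categories listed above can be expressed as a small lookup function. The following is a minimal sketch in base R; the function name `afinn_category` and the sample words and ratings are illustrative assumptions, not entries copied from the AFINN file.

```r
# Map AFINN-style ratings (integers from -5 to +5) to the four categories.
# A rating of 0 belongs to none of the four categories and yields NA.
afinn_category <- function(rating) {
  category <- rep(NA_character_, length(rating))
  category[rating %in% c(-5, -4)]     <- "very negative"
  category[rating %in% c(-3, -2, -1)] <- "negative"
  category[rating %in% c(1, 2, 3)]    <- "positive"
  category[rating %in% c(4, 5)]       <- "very positive"
  category
}

# Hypothetical sample lexicon in a word/rating layout (illustrative values).
sample_lexicon <- data.frame(
  word   = c("outstanding", "good", "bad", "catastrophic"),
  rating = c(5, 3, -3, -4),
  stringsAsFactors = FALSE
)
sample_lexicon$category <- afinn_category(sample_lexicon$rating)
print(sample_lexicon)
```

Returning NA for a rating of 0 keeps words that fall outside the four categories visible instead of silently assigning them to one.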
ROLE OF R PROGRAMMING:
We will use R to determine the sentiment scores by downloading the positive and negative word lists and loading them into the R console. First we scan the words into R; we can also add our own words to the positive and negative word lists. Once the tweets are ready, we apply a few functions to convert them into useful information.
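As a rough illustration of the scanning step, the sketch below reads two word lists with base R's `scan()` and then appends custom words. The file contents are small stand-ins written to temporary files so the example runs on its own; the real downloaded lists, their file names, and the `;` comment convention are assumptions.

```r
# Create small stand-in word-list files (assumption: one word per line,
# with comment lines starting with ';').
pos_file <- tempfile(fileext = ".txt")
neg_file <- tempfile(fileext = ".txt")
writeLines(c("; positive words", "good", "great", "happy"), pos_file)
writeLines(c("; negative words", "bad", "awful", "sad"), neg_file)

# Scan the words into character vectors, skipping the comment lines.
pos_words <- scan(pos_file, what = "character", comment.char = ";", quiet = TRUE)
neg_words <- scan(neg_file, what = "character", comment.char = ";", quiet = TRUE)

# Add our own words to each list (hypothetical additions).
pos_words <- c(pos_words, "upgrade")
neg_words <- c(neg_words, "wtf", "wait")

print(pos_words)
print(neg_words)
```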
The working principle of this sentiment analysis is to find the words in each tweet that represent positive sentiment and the words that represent negative sentiment. The analysis uses two packages: stringr to manipulate strings and plyr to apply the scoring over the set of tweets. We then put the tweets into a data frame, apply the sentiment function to them, and generate a summary and a histogram of the scores.
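The scoring principle described above can be sketched as a single function. This is a minimal version under stated assumptions: it uses base R (`gsub`, `strsplit`, `vapply`) as stand-ins for the stringr and plyr calls the project would use, and the function name `score_sentiment`, the word lists, and the sample tweets are all hypothetical.

```r
# Score each tweet as (number of positive words) - (number of negative words).
score_sentiment <- function(tweets, pos_words, neg_words) {
  scores <- vapply(tweets, function(tweet) {
    # Clean the text: drop punctuation, control characters and digits.
    tweet <- gsub("[[:punct:]]", "", tweet)
    tweet <- gsub("[[:cntrl:]]", "", tweet)
    tweet <- gsub("[[:digit:]]+", "", tweet)
    tweet <- tolower(tweet)
    # Split into words and count matches against each list.
    words <- unlist(strsplit(tweet, "\\s+"))
    sum(words %in% pos_words) - sum(words %in% neg_words)
  }, integer(1), USE.NAMES = FALSE)
  # One row per tweet: integer score followed by the tweet text.
  data.frame(score = scores, text = tweets, stringsAsFactors = FALSE)
}

# Hypothetical word lists and tweets for illustration.
pos_words <- c("love", "great", "good")
neg_words <- c("worst", "bad", "awful")
tweets <- c("I love this phone, it is great!",
            "Worst service ever, really bad.",
            "The parcel arrived on Tuesday.")

result <- score_sentiment(tweets, pos_words, neg_words)
print(result)
print(summary(result$score))
```

Calling `hist(result$score)` on the returned data frame would then draw the histogram of scores. Matching whole words after lower-casing is a deliberate simplification: it ignores negation ("not good") and word rating strength.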
We can also count the tweets by score. A snapshot of the generated score file shows the score of each tweet as an integer in front of the tweet text. This helps us count the positive, negative and neutral tweets that users publish from their Twitter accounts, which in turn helps us judge whether people have positive, negative or neutral thoughts about a particular activity on the internet. Sentiment analysis also indicates how popular a particular topic is, so we can decide whether that topic is having a positive or negative impact on people.
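Counting tweets by score, and by positive, negative or neutral class, could look like the sketch below. The score vector is hypothetical sample data, and labelling a tweet by the sign of its score is an assumed convention.

```r
# Hypothetical integer scores, one per tweet.
scores <- c(2L, -1L, 0L, 0L, 3L, -2L, 1L)

# Number of tweets at each distinct score.
score_counts <- table(scores)

# Label each tweet by the sign of its score, then count per label.
sentiment <- ifelse(scores > 0, "positive",
                    ifelse(scores < 0, "negative", "neutral"))
sentiment_counts <- table(factor(sentiment,
                                 levels = c("negative", "neutral", "positive")))

print(score_counts)
print(sentiment_counts)
```

Fixing the factor levels keeps all three classes in the output even when one of them has no tweets.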