Many studies in the literature apply linguistic and sentiment analysis to newspaper text, financial data service text, and microblogging messages. Because comprehensive datasets are easy to collect and the number of microblog users keeps growing, microblogging offers researchers a productive field of study. Bagehot (1971) was an early researcher on the effect of information on trading and introduced a new idea to the literature. He emphasized the concept of information asymmetry and the effect of information on trading, without formulating a mathematical model. This basic idea inspired many researchers in the following years to study information-based market activity (Storkenmaier, 2011). Glosten and Milgrom (1985) introduced a formal model that developed Bagehot’s (1971) idea. Their approach described how market participants with superior information convey that information to the market (Glosten & Milgrom, 1985). After this study, Jegadeesh (1990) presented new evidence. He showed that stock returns can be predicted from empirically observed systematic movements in stock returns. Following Jegadeesh’s work on the predictability of stock returns, Mitchell and Mulherin (1994) investigated the relationship between market volume and returns and the news reports announced by the US stock exchange and Dow Jones companies. They claimed a direct relation between the number of Dow Jones reports and market trading. Additionally, they emphasized that New York Times headlines and additional macroeconomic reports can be significant predictors. In another early study on this topic, Trahan and Bolster (1997) analyzed a popular financial magazine, Barron’s. They expected to find a relation between stock prices and the magazine’s investment news and views section, as well as other articles containing purchase recommendations.
In conclusion, they found that second-hand information in the media had a significant positive effect on stock prices. Another text mining approach was presented by Chan (2003). He analyzed the relationship between the monthly returns of individual companies and the headlines about those firms from journalists, publications, and newswires over the period from 1980 to 2000. In this way, he tested investors’ reaction to the news. Chan emphasized that investors react slowly to bad news, and that the post-event drift occurs mostly after bad news and is very strong. By contrast, stocks with good news exhibited less drift. Antweiler and Frank (2004) published an article in The Journal of Finance in which they investigated 1.5 million messages posted on the Yahoo Finance web site about 45 firms on the American stock exchange. According to Antweiler and Frank, stock-related messages are statistically useful for predicting the volatility of the stock market. Read (2005) noted that sentiment classification seeks to determine whether a piece of text is positive or negative according to its author’s general feeling. The author demonstrated that the match between training and test data with respect to domain and time is also important, and presented preliminary experiments with data labelled by emoticons such as “:-)”1 and “:-(”2 to form a training set for sentiment classification. Emoticons can be independent of field, subject, and date (Read, 2005). In addition to Read, Antweiler and Frank (2006) conducted an event study analysis. Using computational linguistic analysis, they classified 250,000 Wall Street Journal stories from 1973 to 2001. As a result, they found that stock prices are affected by news instantly and persistently.
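The emoticon-based labelling idea attributed to Read (2005) above can be illustrated with a minimal sketch. The emoticon sets, the function name label_by_emoticon, and the rule of discarding texts with mixed or no emoticons are assumptions made for illustration; they are not taken from the original study.

```python
# Hypothetical emoticon sets; the text above mentions only ":-)" and ":-(".
POSITIVE = (":-)", ":)")
NEGATIVE = (":-(", ":(")


def label_by_emoticon(text):
    """Return (label, cleaned_text), or None if the text has no usable emoticon.

    Texts containing both positive and negative emoticons are treated as
    ambiguous and skipped (an assumption of this sketch).
    """
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:  # neither kind, or both kinds
        return None
    cleaned = text
    for emoticon in POSITIVE + NEGATIVE:
        # Remove the emoticon so a classifier cannot simply memorise it.
        cleaned = cleaned.replace(emoticon, " ")
    cleaned = " ".join(cleaned.split())
    return ("positive" if has_pos else "negative", cleaned)


if __name__ == "__main__":
    for msg in ["great earnings today :-)", "bad day :(", "no emoticon here"]:
        print(msg, "->", label_by_emoticon(msg))
```

The labelled pairs produced this way could then serve as noisy training data for any text classifier.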
Likewise, Tetlock (2007) used the Wall Street Journal to investigate the effect of the media on the stock exchange. He designed a Pessimism Media Factor model to forecast the price, volume, and performance of DJIA stocks. The author concluded that negative news in the media influences returns, but that this effect is temporary. Moreover, stock volume is predictable when this negative effect is unusually large or small. In line with this work, Tetlock et al. (2008) applied linguistic analysis to predict the accounting earnings and stock returns of individual companies. They focused on stories about S&P 500 companies in The Wall Street Journal and Dow Jones News Service over a 24-year period, and used ordinary least squares (OLS) regressions for prediction. They reported three findings. First, negative words about a company predicted low company earnings. Second, companies’ stock returns are influenced by the information hidden in negative words. Third, negative words in stories are particularly useful predictors of both earnings and returns (Tetlock, et al., 2008). In the field of sentiment analysis, Choudhury et al. (2008) developed a model of communication dynamics in the blogosphere. Using these dynamics, they identified notable correlations with stock market movement: communication was significantly correlated with the stock market. They used two baseline methods and a Support Vector Machine3 (SVM), and reported an average accuracy of 78% in predicting the magnitude of stock market movement and 87% in predicting its direction. Another paper showed that stock prices can be influenced by mass media even when the media do not provide genuine news (Fang & Peress, 2009). The authors analyzed the relationship between mass media coverage and expected stock returns.
They found that stocks with no media coverage earn higher returns than stocks with media coverage, and that this effect is most evident among small stocks. Furthermore, stock returns are affected by the breadth of information dissemination. Giller (2009) examined a small dataset in an experiment on the use of Twitter to publicize a record of directional intraday index futures trades. The author applied a maximum likelihood ratio test and found a positive correlation between trading success and an increase in the number of followers. In line with this, Go et al. (2009) designed an algorithm that classifies Twitter messages as positive or negative with respect to a query term. They reported high accuracy in classifying the sentiment of Twitter messages using machine learning methods. Gloor et al. (2009) presented an unconventional study of social network analysis. The study was based on algorithms that mine the Web, blogs, and online forums to identify trends and to find the investors who start these new trends. The authors showed a correlation between Web buzz and real-world events. In this literature, word of mouth (WOM) is defined as the transfer of information from one individual to another through oral communication, and WOM is regarded as a more trustworthy source of information about brands and products. With developing technology, electronic WOM has become the online transfer of information from one internet user to another. Jansen et al. (2009) examined more than 150,000 microblog posts from social media sites containing branding discussions, sentiments, and perceptions. They set out to clarify the impact of microblog posts, as electronic WOM, on brand information and brand relationships. Within this framework, they analyzed the timing, frequency, and range of tweets. They concluded that microblogging is an online tool for both individuals and companies.
For individuals, microblogging is a form of word-of-mouth communication, and for companies it is a tool for observing the effect of their marketing strategy. Bollen et al. (2010) collected tweets over time and related them to the Dow Jones Industrial Average (DJIA). They aimed to measure collective mood states and, for this purpose, defined 6 dimensions4 of mood. Their results indicated that including public mood dimensions can significantly improve predictions of the DJIA. They reported an accuracy of 87.6% in predicting the daily fluctuations in the closing values of the Dow Jones index, and a reduction in the Mean Average Percentage Error of more than 6%. In the linguistic analysis literature, Sprenger et al. (2010) examined approximately 250,000 Twitter messages about S&P 100 companies on a daily basis, using computational linguistics and Naïve Bayesian classification. The authors stated that message volume, together with abnormal stock returns, contains valuable information for forecasting the next day’s trading volume. Moreover, they pointed out that users who give above-average investment advice are retweeted more often and have more followers. Correspondingly, Pak et al. (2010) presented a method for automatically collecting a corpus that can be used to train a sentiment classifier. They collected 300,000 text posts from Twitter, performed a linguistic analysis of the corpus for sentiment analysis and opinion mining purposes, and explained the phenomena they discovered. Using this linguistic analysis, they built a sentiment classifier that assigns positive, negative, or neutral sentiment to a text. Their analyses indicated that the proposed method is effective and performs better than previously proposed methods (Pak & Patrick, 2010). Davidov et al. (2010) set out to examine a sentiment classification method based on Twitter.
They categorized Twitter messages according to the tags5 and symbols6 that represent the writer’s emotional state. As a result, they were able to classify short texts by the feelings of their authors. O’Connor et al. (2010) compared several surveys on public opinion and consumer confidence with Twitter messages from 2008 to 2009. They reported a correlation of around 80% between the surveys and the sentiment scores of Twitter messages over the same period. In this way, they showed that sentiment analysis of Twitter can be used in a similar way to public surveys. Castillo et al. (2011) studied the credibility of news in a given set of tweets. They showed that messages about newsworthy subjects can be separated from other kinds of messages, and that the level of social media credibility of newsworthy topics can be assessed automatically. A few authors write credible news, and this news is spread by other online users through re-posting. There are measurable differences in the way messages propagate that can be used to classify them as credible or not credible, with precision and recall in the range of 70% to 80% (Castillo, et al., 2011). Bollen et al. (2011) applied sentiment mining to Twitter posts. As in their 2010 study, they used a psychometric instrument to extract six mood states, but this time the six moods were different (tension, depression, anger, vigor, fatigue, confusion). The Twitter messages were aggregated daily, and a six-dimensional mood vector was computed from the postings (Bollen, et al., 2011).
They analyzed the specific effects of economic, political, cultural, social, and other major events on the six-dimensional Profile of Mood States (POMS). They found that analyzing public mood around such events can reveal the emotive trend of society, and that this trend can provide indicators for predicting economic events. Following the existing theory, Zhang et al. (2011) analyzed Twitter posts in order to predict stock market indicators for US stock market indexes. They collected Twitter feeds for six months, calculated collective hope and fear on each day, and examined the relationship with stock market indicators. They reported a negative correlation between tweet sentiment measures and the Dow Jones, S&P 500, and NASDAQ indexes, and a significant positive correlation with the Chicago Board Options Exchange Volatility Index. Moreover, they showed that when emotions on Twitter run high, that is, when people express more hope, fear, and worry, the Dow Jones index decreases the next day. In contrast, when people express less hope, fear, and worry, the Dow Jones index increases the next day. Therefore, tracking opinion on Twitter is a useful predictor of the next day’s stock market (Zhang, et al., 2011). Rao and Srivastava (2012) investigated the relationship between Twitter messages and the price, volume, and volatility of the DJI, the NASDAQ-100 index, and 13 technology companies. They reported a correlation of 88% between Twitter sentiment and stock movement. The authors defined an equation to predict stock returns with a high R-squared value (95.2%). Mao et al. (2012) applied a regression model with exogenous input to forecast stock market indicators, using Twitter data as the exogenous input. The results showed that tweets are correlated with stock market indicators (Mao, et al., 2012).
Moreover, they found that Twitter is a useful tool for forecasting the stock market. Sprenger et al. (2014) presented a methodology for identifying news events from social media. They applied computational linguistics to more than 400,000 stock-related Twitter messages related to the S&P 500 and separated good news from bad news. They found that returns before good news events are more pronounced than returns before bad news events, and that the stock market effect of news events differs across categories. Türkmenoğlu and Tantuğ (2014) compared lexicon-based and machine learning methods. They analyzed a Turkish Twitter dataset and a movie dataset with both approaches in order to show their strengths and weaknesses. Çoban et al. (2015) collected Turkish Twitter messages to investigate classification methods for sentiment analysis. They assessed the Naive Bayes, Multinomial Naive Bayes, K-Nearest Neighbors, and Support Vector Machine learning algorithms. The n-gram model performed better than the bag-of-words model, and Naive Bayes showed the best performance among the algorithms (Çoban, et al., 2015). Ranco et al. (2015) collected 15 months of Twitter messages to examine the relations among Twitter sentiment, Twitter volume, and the abnormal returns of the 30 companies in the DJIA index. They reported a significant dependence between abnormal returns and Twitter sentiment when Twitter volume reached its peak level. Furthermore, the authors showed that Twitter volume at its peak level can forecast the direction of stock returns. Eliaçık and Erdoğan (2015) proposed a new user metrics method. To test the method, they calculated the sentiment polarity of the financial community on social media and analyzed the weekly correlation between the BIST100 index and the financial community’s sentiment.
Using this new method, they found a significant linear relationship between the market and the financial community’s sentiment. Souza et al. (2015) studied the interplay among 10,949 news stories from Dow Jones Newswires, Barron’s magazine, and the Wall Street Journal, nearly 42.8 million Twitter messages, and the stocks of 5 retail brands on the US stock exchange. They examined the relations among stock returns, volatility, and Twitter sentiment. They showed that social media is a more accessible and better-performing source for analyzing financial market dynamics than sentiment analysis of Dow Jones Newswires and the Wall Street Journal. Giudice (2015) selected the stocks of Microsoft, Apple, Google, and Facebook from American stock exchanges. He analyzed the sentiment of tweets containing the hashtags of these four stocks using a VAR model and Granger causality analysis. The author found that Twitter sentiment has no impact on stock returns and argued that sentiment analysis is not an adequate trading strategy for predicting future stock price movements. Dickinson and Hu (2015) performed opinion extraction on stock-related tweets in order to show a correlation between investors’ sentiment and stock price movement. They used n-gram and “word2vec” techniques to classify the tweets and found a significant correlation between the stock market and sentiment. For Microsoft and Walmart the correlation was positive, whereas for Goldman Sachs and Cisco Systems it was strongly negative. They claimed that consumer-facing companies interact differently with sentiment than other companies. Heston et al. (2016) used a dataset of more than 900,000 news stories. They tested whether news can predict stock returns, using a neural-network-based textual analysis of news stories. They confirmed that daily news can forecast stock returns for a few days, which also confirmed previous research.
By contrast, weekly news forecasts stock returns for one quarter (Heston & Sinha, 2016). After positive news stories, stock returns increase rapidly, whereas negative stories produce a longer delayed response (Heston & Sinha, 2016). Pagolu et al. (2016) collected 250,000 tweets over a period of about one year and applied sentiment analysis to them. In this way, they tested the correlation between Microsoft’s stock price and Twitter sentiment. They reported a strong and significant correlation of 71.82% between sentiment and stock price movement. Akgul et al. (2016) tested Twitter sentiment software that classifies each tweet as positive, negative, or neutral. They used the program to assess which of the n-gram and lexicon methods performs better, and showed that the lexicon method performs better. Kaynar et al. (2016) analyzed the performance of classification algorithms such as the Multi-Layer Perceptron, Support Vector Machines, the Centroid-Based Classifier, and Naive Bayes, using reviews from the Internet Movie Database (IMDb). They found that the neural network and SVM7 performed better. Kordonis et al. (2016) collected Twitter data and applied Naive Bayes Bernoulli and Support Vector Machine classifiers to analyze Twitter sentiment. As a result, they found a correlation between Twitter sentiment and stock prices. Joshi et al. (2016) studied the relationship between financial news articles and stock trends, aiming to capture the relation between stock trends and sentiment. They obtained good test performance with Support Vector Machines8 and the Random Forest model9. Naïve Bayes10 also performed relatively well, but not better than the other two (Joshi, et al., 2016). Overall, the authors stated that the accuracy of the prediction model is more than 80%.
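To make the lexicon method discussed above concrete, the following is a minimal sketch of a lexicon-based classifier that labels a tweet as positive, negative, or neutral. The tiny word list, the scoring rule, and the function name classify are hypothetical and chosen only for illustration; the cited studies use their own lexicons and software.

```python
# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "good": 1, "gain": 1, "strong": 1,
    "bad": -1, "loss": -1, "weak": -1,
}


def classify(tweet, lexicon=LEXICON):
    """Sum the polarity of known words and map the total to a label."""
    tokens = tweet.lower().split()
    # Strip common trailing/leading punctuation before the lexicon lookup.
    score = sum(lexicon.get(tok.strip(".,!?"), 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["Strong gain today!", "Weak results, bad loss", "Market opens at nine"]:
        print(tweet, "->", classify(tweet))
```

An n-gram approach would instead learn word or word-sequence weights from labelled examples rather than relying on a fixed word list.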
Baykara and Gürtürk (2017) performed sentiment analysis on Twitter using a Bayes algorithm. They determined whether Twitter posts are positive, negative, or neutral. In addition, they classified users’ Twitter messages as news, politics, or culture according to their content. Kebabcı and Diri (2017) developed a system to classify Turkish tweets. They used Naive Bayes and Support Vector Machines together for classification and applied the Hybrid TF-IDF method to summarize the classified tweets. In this way, the article was able to characterize the public opinion of Twitter users. Kürkçü (2017) aimed to determine the degree of interaction with online news media. She collected daily data from the official Twitter accounts of Turkish news agencies and Turkish newspapers, and computed a ratio of “retweets” and “likes” on tweets in the 48 hours after a news item appeared in order to measure users’ interaction levels. Zhang et al. (2017) aimed to handle tweet content in different languages. They therefore developed a multilingual tweet classification method that supports over 40 languages. They encoded the sequence of characters in a tweet according to their UTF-8 codes and used a character-based CNN for classification. They showed that the UniCNN model outperforms traditional methods and is fully language independent. In addition, the code-based system does not require any tokenization or translation. Social media may affect financial markets (Jadhav & Wakode, 2017). The authors therefore calculated Twitter sentiment scores in 2017 and tested different techniques for forecasting the stock market. Moreover, Jadhav and Wakode (2017) argued for improving and accelerating computational performance.
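The UTF-8 encoding step described for Zhang et al. (2017) can be sketched as follows. This is only an illustration of turning a tweet into a fixed-length sequence of integer codes without tokenization; the byte-level granularity, the maximum length of 280, the padding value, and the function name encode_utf8 are assumptions, not details of the UniCNN model.

```python
def encode_utf8(tweet, max_len=280, pad=0):
    """Encode a tweet as a fixed-length list of integer codes.

    Each UTF-8 byte value (0-255) is shifted by 1 so that 0 can be reserved
    for padding. Truncation is done on bytes, so a multi-byte character at
    the cut-off point may be split (acceptable for this sketch).
    """
    data = tweet.encode("utf-8")[:max_len]
    ids = [b + 1 for b in data]
    return ids + [pad] * (max_len - len(ids))


if __name__ == "__main__":
    # Works the same way for any language, with no tokenizer or translation.
    print(encode_utf8("borsa yükseldi", max_len=20))
    print(encode_utf8("stocks up", max_len=20))
```

A sequence of this form could then be passed to an embedding layer followed by convolutional layers in a character-based CNN.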
3 A Support Vector Machine is a learning algorithm that analyzes data for classification and regression analysis.
4 Calm, Alert, Sure, Vital, Kind, and Happy
5 50 Twitter tags
6 15 smileys
7 Support Vector Machines
8 90% correctly classified
9 80% correctly classified
10 75% correctly classified