Accelerating crime on the internet alerts law enforcement bodies to keep an eye on online activities, which involve huge volumes of data. This creates a requirement to detect suspicious activities on online discussion forums by optimizing the use of data mining tools. This paper highlights the data mining techniques that have been prototyped and implemented for closely studying discussion forum data for suspicious activities in different domains. Numerous mining methods have been implemented to date for detecting suspicious discussions in discussion forum datasets. Through these methods, doubtful activities can be revealed by analyzing the interests of all users.
The main obstacle faced by researchers in doing so is the lack of information retrieval and data analysis tools for real-time data from forum websites. The existing database is massive, so extracting the desired knowledge from such a large search space of social data requires an intelligent and interactive data mining algorithm. Moreover, the large number of parameters involved in the search space makes large-scale search impractical. Consequently, efficient search approaches are essential. Knowledge of data mining is necessary in order to discover information. Data mining is defined as the process of discovering, extracting, and analyzing meaningful patterns, structures, models, and rules from large quantities of data. Data mining is emerging as one of the tools for crime detection, clustering of crime locations to find crime hot spots, criminal profiling, prediction of crime trends, and many other related applications.
Many scientific studies have been done on the significance of crime data mining, and their results are reflected in new software applications for analyzing and detecting crime data. A framework that can be used for emotion detection from online forums, EmoTxt, has been developed by Fabio Calefato, Filippo Lanubile, and Nicole Novielli of the University of Bari “Aldo Moro”. EmoTxt identifies emotions in an input corpus provided as a comma-separated values (CSV) file, with one text per line, preceded by a unique identifier. The output is a CSV file containing the text id and the predicted label for each item of the input collection. Their model aims to recognize specific emotions, such as joy, love, and anger, whereas other proposed systems classify emotions only as positive, negative, or neutral. According to the research by Fabio Calefato, the framework defines a tree-structured hierarchical classification of emotions, where each level refines the granularity of the previous one, thus providing more indication of its nature.
The framework includes, at the top level, six basic emotions, namely love, joy, anger, sadness, fear, and surprise.
A research paper published in the Imperial Journal of Interdisciplinary Research by M. Suruthi Murugesan, R. Pavitha Devi, S. Deepthi, V. Sri Lavanya, and Dr. Annie Princy, “Automated Monitoring Suspicious Discussions on Online Forums Using Data Mining Statistical Corpus Based Approach”, suggests various techniques and algorithms that can be employed. The paper elaborates on stop-word selection, a stemming algorithm, a brute-force algorithm, a learning-based algorithm, and a matching algorithm.
Another paper, “Surveillance of Suspicious Discussions on Online Forums Using Text Data Mining” by Harika Upganlawar and Nilesh Sambhe, published in the International Journal of Advances in Electronics and Computer Science, describes a system that analyzes online plain text sources from selected discussion forums, classifies the text into different groups, and decides which posts are legal and which are illegal using the Levenshtein algorithm. The Levenshtein distance measures how similar two words are: it is the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into the other.
In our proposed system, we apply the steps of data mining.
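As an illustration of the Levenshtein distance mentioned in the survey above, the following is a minimal Python sketch and not the implementation used in the cited paper. The helper `flag_words`, the watch list, and the `max_distance` threshold are hypothetical names and values introduced only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def flag_words(post, watchlist, max_distance=1):
    """Hypothetical helper: pair each word of a post with the first
    watch-list term that lies within max_distance edits of it."""
    flagged = []
    for token in post.lower().split():
        for term in watchlist:
            if levenshtein(token, term) <= max_distance:
                flagged.append((token, term))
                break
    return flagged
```

Matching on edit distance rather than exact equality lets a filter catch deliberate misspellings of watched terms, at the cost of occasional false matches on short words.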
The data set is collected and explored from various online forums such as KD Nuggets, Reddit, and GitHub. The raw data is converted from unstructured to structured form using traditional analysis tools. We collect, cleanse, and format the data because some of the mining functions accept data only in a certain format. Typical tasks include preparing the data for the modeling tool by selecting tables, records, and attributes. The meaning of the data is not changed. The main techniques of crime data mining are clustering, association rule mining, classification, and sequential pattern mining.
Along with these techniques, we use advanced algorithms such as Stop-word Selection and the Emotional Algorithm to find clear and meaningful results and patterns.
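The cleansing and stop-word steps described above can be sketched roughly as follows. This is a minimal example under stated assumptions: the stop-word list is a tiny illustrative sample, and the function names `tokenize` and `remove_stop_words` are hypothetical rather than part of the proposed system.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch);
# a real system would use a much fuller list.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "to", "of", "and",
    "in", "on", "for", "we", "this",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the tokens of text that are not stop words, in order."""
    return [token for token in tokenize(text) if token not in stop_words]
```

For instance, `remove_stop_words("This is the plan for the attack on Monday")` keeps only the content-bearing tokens `plan`, `attack`, and `monday`, which is the form later mining steps would consume.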