Social media researchers widely agree that social media has contributed substantially to many sectors. Business development managers now devote considerable effort to planning efficient collaboration with customers and with other organizations through social media. Despite this visible impact, much of the information shared digitally has yet to be uncovered. Twitter has gradually become a main hub for many information systems (IS) researchers because tweets are freely accessible to anyone in real time. Motivated by earlier studies in which IS researchers addressed big-data analysis and management using content analysis techniques, this paper proposes a novel approach to the unsupervised classification of tweets into different labels. It introduces an algorithm that uses semantic similarity between texts, term frequency, and a determinant threshold to perform content analysis. The goal of this approach is to extract relevant features from a tweet, thereby reducing dimensionality and preparing training datasets that can be used to build classifiers.
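The abstract does not spell out the algorithm, so the following is only a minimal sketch of how term frequency, a text similarity score, and a threshold might combine to group tweets without labeled data. It is not the paper's method. Cosine similarity over term-frequency vectors stands in for the semantic similarity measure, the greedy grouping strategy is an assumption, the threshold value of 0.5 is arbitrary, and the names `term_frequency`, `cosine_similarity`, and `group_tweets` are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequency(text):
    """Return a bag-of-words term-frequency vector for one tweet."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    return Counter(tokens)


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two term-frequency vectors.

    Assumption: this lexical measure is a stand-in for the semantic
    similarity the abstract mentions, which is not specified there.
    """
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_tweets(tweets, threshold=0.5):
    """Greedily group tweets (hypothetical strategy, arbitrary threshold).

    Each tweet joins the existing group whose first member it resembles
    most, provided the similarity reaches the threshold; otherwise it
    starts a new group.
    """
    groups = []  # each entry: (term-frequency vector of first member, members)
    for tweet in tweets:
        tf = term_frequency(tweet)
        best_index, best_score = None, 0.0
        for i, (rep_tf, _) in enumerate(groups):
            score = cosine_similarity(tf, rep_tf)
            if score > best_score:
                best_index, best_score = i, score
        if best_index is not None and best_score >= threshold:
            groups[best_index][1].append(tweet)
        else:
            groups.append((tf, [tweet]))
    return [members for _, members in groups]


if __name__ == "__main__":
    sample = [
        "new phone launch today",
        "phone launch today is exciting",
        "heavy rain and flooding downtown",
    ]
    for members in group_tweets(sample):
        print(members)
```

In this sketch the groups play the role of unsupervised labels, and the threshold controls how readily tweets are merged: a higher value yields more, tighter groups.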


