
A Novel Method for Clustering Words in Micro-Blogs Texts and its Application to Event Discovery

B. Ramana Babu


This paper presents a novel method for clustering words in micro-blogs, based on the similarity of the related temporal series. Our method, named SAX*, uses the Symbolic Aggregate ApproXimation algorithm to discretize the temporal series of terms into a small set of levels, leading to a string for each term. We then define a subset of "interesting" strings, i.e. those representing patterns of collective attention. Sliding temporal windows are used to detect co-occurring clusters of tokens with the same or a similar string. To assess the performance of the method, we first tune the model parameters on a 2-month 1% Twitter stream, during which a number of worldwide events of differing type and duration (sports, politics, disasters, health, and celebrities) occurred. We then evaluate the quality of all discovered events in a 1-year stream, "googling" the most frequent cluster n-grams and manually assessing how many clusters correspond to published news in the same temporal slot. Finally, we perform a complexity evaluation and compare SAX* with three alternative methods for event discovery. Our evaluation shows that SAX* is at least one order of magnitude less complex than other temporal and non-temporal approaches to micro-blog clustering.
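The discretization and grouping steps described in the abstract can be illustrated with a minimal sketch. It is not the SAX* implementation: it applies plain Symbolic Aggregate ApproXimation (z-normalization, piecewise averaging, equiprobable Gaussian breakpoints) to one window of per-term counts and then groups terms whose strings are identical. The function names `sax_string` and `group_by_string`, the two-symbol alphabet, the toy counts, and the handling of flat series are all assumptions made for illustration; the filter for "interesting" strings is not shown because the abstract does not define it.

```python
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def sax_string(series, n_segments, alphabet="ab"):
    """Discretize one numeric series into a SAX string (hypothetical helper)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # Assumption: a flat series maps to the lowest symbol everywhere.
        return alphabet[0] * n_segments
    # Step 1: z-normalize the series.
    z = [(x - mu) / sigma for x in series]
    # Step 2: piecewise aggregate approximation, i.e. the mean of each
    # equal-width segment. Assumes len(series) is divisible by n_segments.
    width = len(z) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]
    # Step 3: breakpoints that split N(0, 1) into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    # Step 4: each segment gets the symbol of the region it falls in.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


def group_by_string(term_series, n_segments, alphabet="ab"):
    """Group terms whose series share the same SAX string within one window."""
    groups = defaultdict(list)
    for term, series in term_series.items():
        groups[sax_string(series, n_segments, alphabet)].append(term)
    return dict(groups)


# Toy counts for one temporal window (made-up numbers, not from the paper).
window = {
    "goal": [0, 0, 1, 0, 9, 10, 1, 0],
    "match": [1, 0, 0, 1, 8, 12, 0, 1],
    "coffee": [5, 6, 0, 1, 0, 0, 5, 6],
    "weather": [3, 3, 3, 3, 3, 3, 3, 3],
}
clusters = group_by_string(window, n_segments=4)
print(clusters)
```

In this toy window, "goal" and "match" both peak in the third segment and therefore share a string, while "coffee" and "weather" fall into separate groups. Sliding the window along the stream and repeating the grouping would give one candidate set of clusters per window.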


Event Detection • Temporal Mining • Symbolic Aggregate Approximation • Micro-Blog Analysis






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.