
An Information Management System for Crime against Women in India from News: A Big Data Solution

Saptarsi Goswami, Urmi Saha, Subhasree Bose


In this work, data mining techniques have been applied to crime data for various types of problems, such as identifying hot spots, identifying groups of offenders, and predicting crimes. The data qualify as big data because of their volume, their variety (spatial data and unstructured text such as news articles, blogs, and tweets), and the velocity at which decisions need to be taken. Crime data are important not only for law enforcement agencies but also for policy makers, social scientists, NGOs, and researchers. At present there is no system in India in which crime-related data are available at the incident level. Summary-level statistics are available from the National Crime Records Bureau (NCRB), but with a lag of at least six months. In those reports the geographical unit is the state, the district, or at most the city, and the unit of time is a year. To address this gap, we intend to build a news-corpus-based information management system from online newspapers and social media. The immediate focus of our system is ‘Crime against Women’ (CAW), which has risen at a higher rate than crime in general. The literature survey also reveals potential application areas for more advanced information extraction mechanisms and machine learning models. We also present a couple of analyses of CAW based on NCRB data as well as on data for West Bengal collected manually from a daily newspaper between June and December 2014.


Big Data, Crime, India, Spatial Analysis







This work is licensed under a Creative Commons Attribution 3.0 License.