An Information Management System for Crime against Women in India from News: A Big Data Solution

Saptarsi Goswami, Urmi Saha, Subhasree Bose


In this work, data mining techniques have been applied to crime data for various types of problems, such as identifying hot spots, identifying groups of offenders, and predicting crimes. The data qualify as big data because of their volume, their variety (spatial data and unstructured text in the form of news articles, blogs, tweets, etc.), and the velocity at which decisions need to be taken. Crime data are important not only for law enforcement agencies, but also for policy makers, social scientists, NGOs and researchers. Presently there is no system in India where crime-related data are available at the incident level. Reports of summary-level statistics are available from the National Crime Records Bureau (NCRB), but with a lag of at least 6 months. In these reports, the geographical unit is the state, the district or, at the finest, the city, and the unit of time is a year. To address this gap, we intend to build a news-corpus-based information management system from online newspapers and social media. The immediate focus of our system is ‘Crime against Women’ (CAW), which has risen at a higher rate than crime in general. The literature survey also reveals potential application areas for more advanced information extraction mechanisms and machine learning models. We also present a couple of analyses of CAW, one on NCRB data and one on data for West Bengal manually collected from a daily newspaper between June and December 2014.


Big Data, Crime, India, Spatial Analysis

