
An Improved Solution to Detect Credit Card Fraud Using Apache Hadoop in Big Data Environment

V. Nivedha, R. Sankar


This paper presents an improved approach for identifying patterns in, and detecting, online credit card fraud. Recent years have seen increasing amounts of data generated and stored in a geographically distributed manner across a wide variety of application domains. Examples include social networking, Web and Internet service providers, and the content delivery networks that serve content for many of these services. This paper focuses on designing an online credit card fraud detection framework that can process large amounts of data, perform detection in real time, and improve accuracy, based on an analysis of factors such as processing speed, latency, fault tolerance, performance, and scalability. Based on an evaluation of the candidate techniques, Apache Spark is proposed as performing better for credit card fraud detection than the other techniques and frameworks considered. Real-time analysis is highly desirable so that models can be updated when new events are detected.


Big Data, Fraud Detection, Hadoop, Map Reduce, Apache Spark.







This work is licensed under a Creative Commons Attribution 3.0 License.