
A Comparative Study of Malware Detection System from Hadoop Perspective

Nilesh Kisanrao Dengle, Shweta C. Dharmadhikari


The increasing volume of data to be analyzed poses new challenges for malware detection. Because data on computer servers is growing rapidly, analyzing these large volumes and finding anomalous fragments must be done within a reasonable amount of time. Malware collection and analysis are critical to the modern security industry. To cope with the increasing amount of data, however, new parallel methods, such as those built on Hadoop and MapReduce, need to be developed to make detection algorithms scalable. This survey summarizes previous attempts at building malware detection systems.


Malware, MapReduce, Hadoop







This work is licensed under a Creative Commons Attribution 3.0 License.