
Machine Learning Algorithm Used for Detecting Malicious PDF Document

Manju Dudy, Yogita Gigras, Anuradha Anuradha


In the field of computer security, malware is a persistent problem, and its prevalence is increasing rapidly. Cybercriminals make heavy use of PDF documents to launch attacks, and these attacks routinely result in the loss of confidential information. Attackers attach malicious PDF documents to emails to deliver malicious code to ordinary users, and they rely on social engineering to persuade recipients to open the email attachment. This article outlines a machine-learning-based approach for distinguishing malicious from benign PDF documents by analyzing the essential differences in the structural properties of the documents. We have compared the proposed system with other machine learning classifiers over 6000 real-world benign and malicious files. Finally, this work presents machine learning techniques for the detection of malicious PDF documents.


Portable Document Format (PDF), Malicious PDF Document, Machine learning, Malware Detection.
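
As a rough illustration of what "structural properties" can mean in practice, the following minimal sketch counts a handful of PDF name keywords and object markers in the raw bytes of a file. The keyword set, the function name `structural_features`, and the sample bytes are all assumptions made for this example; the abstract does not list the features the authors actually used. The resulting counts could serve as an input vector for any standard classifier.

```python
import re
from typing import Dict

# Hypothetical keyword set chosen for illustration only; the abstract does not
# specify which structural features the proposed system relies on.
STRUCTURAL_KEYWORDS = (
    b"/JS",
    b"/JavaScript",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
)


def structural_features(pdf_bytes: bytes) -> Dict[str, int]:
    """Count occurrences of selected structural keywords in raw PDF bytes."""
    features = {}
    for keyword in STRUCTURAL_KEYWORDS:
        # The negative lookahead stops a short name such as /AA from being
        # counted inside a longer name that merely starts with the same letters.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        features[keyword.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    # Count "obj" markers; the leading word boundary excludes "endobj".
    features["obj"] = len(re.findall(rb"\bobj\b", pdf_bytes))
    return features


if __name__ == "__main__":
    # Tiny hand-written fragment, not a complete or valid PDF file.
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    )
    for name, count in structural_features(sample).items():
        print(name, count)
```

This sketch scans raw bytes only, so it would miss keywords hidden inside compressed object streams or obfuscated with hex-escaped names; a real feature extractor would need to decode those first.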







This work is licensed under a Creative Commons Attribution 3.0 License.