WO2012091488A1 - System and method for detecting malicious content in non-PE file

Info

Publication number
WO2012091488A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
item
information
malicious content
malicious
Prior art date
Application number
PCT/KR2011/010309
Other languages
French (fr)
Inventor
Sun Young Sim
Original Assignee
Ahnlab., Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ahnlab., Inc.
Publication of WO2012091488A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562: Static detection

Definitions

  • the present invention relates to a system and a method for detecting malicious content in a non-PE (Portable Executable) file, and more particularly to a system and a method for determining whether a non-PE file includes malicious content using information about a portion within the non-PE file in which the malicious content can be inserted.
  • Malicious content included in files has been used to disturb or obstruct program execution, file utilization, computer operation and so on.
  • Vulnerabilities, malware, computer viruses, and the like are examples of such malicious content. Since malicious content can cause undesired operations, technologies have been developed for detecting, deactivating and deleting the malicious content before it is executed.
  • PE (Portable Executable) files refer to a file format which can be executed on a Win32 executable computer system regardless of platform.
  • PE files correspond to programs being executed on the computer system. Malicious content can also be included in such PE files.
  • when a PE file including malicious content is executed on a computer system, the malicious content is executed at the same time, so that the computer system is maliciously affected. For this reason, there has been plenty of research on how to detect malicious content included in PE files, and a variety of detection technologies have been developed.
  • recently, non-PE files such as documents, images and moving pictures have been transmitted and distributed quite often owing to the development of networks.
  • if malicious content is included in such a non-PE file, it may be difficult to detect the included malicious content without analyzing the configuration of the non-PE file.
  • moreover, objects including malicious content are sometimes hidden, which makes the malicious content even more difficult to detect.
  • the present invention provides a method and a system capable of detecting malicious content included in a non-PE file on the basis of the configuration of the non-PE file, and of identifying the malicious content.
  • a method for detecting whether malicious content is included in a non-PE file includes extracting information from a portion within the non-PE file in which the malicious content can be inserted; and determining whether the malicious content is included in the non-PE file on the basis of the extracted information.
  • a system for detecting whether malicious content is included in a non-PE file includes an information extraction unit for extracting information from a portion within the non-PE file in which the malicious content can be inserted; and a determination unit for determining whether the malicious content is included in the non-PE file on the basis of the extracted information.
  • Figs. 1a and 1b are views showing examples of the configuration of a non-PE file in which malicious contents can be included;
  • Fig. 2 is a view showing an example of a stream object within a PDF file
  • Fig. 3 is a view showing an example of a stream object in which malicious content is included
  • Fig. 4 is a flow chart illustrating a malicious content detection method in accordance with an embodiment of the present invention.
  • Fig. 5 is a flow chart illustrating a malicious content detection method in accordance with another embodiment of the present invention.
  • Fig. 6 is a block diagram showing a malicious content detection system in accordance with an embodiment of the present invention.
  • Figs. 1a and 1b are views showing examples of the configuration of a non-PE file in which malicious content can be inserted.
  • the non-PE file includes a header 10 representing the kind of the file or the like including information about the file, and a body 20 representing content of the file.
  • a variety of contents can be included in the body 20. Further, the contents may be included in the body 20 in the form of an object.
  • the body 20 may include a flash object 22 and PE objects 24, as can be seen in Fig. 1a. All the flash and PE objects 22 and 24 may include malicious contents.
  • the body 20 may include a TTF object 26 for representing font information of the file and a PE object 28 as can be seen in Fig. 1b.
  • the TTF and PE object 26 and 28 may also include malicious contents.
  • it can be detected whether the malicious contents are included in the non-PE file by considering each object within the non-PE file.
  • however, considering every object may increase the system computing load, thereby deteriorating the efficiency of detecting the malicious contents.
  • in some examples, the object includes encoded contents. In this case, when the object including the encoded contents is decoded for inspection, the memory resources of a system can be excessively consumed, which also deteriorates the efficiency of detecting the malicious contents.
  • the present disclosure provides methods and systems capable of efficiently detecting the malicious contents by selectively inspecting objects in which malicious contents can be inserted without using such a decoding process.
  • a PDF file will be exemplified in the description, but the scope of the present invention is not limited to this. In other words, the scope of the present invention includes every non-PE file to which the principle of the present invention is applied.
  • Fig. 2 shows an example of a stream object included in a PDF file.
  • the PDF file includes information in the form of a stream object, which is a sequence of bytes.
  • the stream object within the PDF file includes a label 110 identifying an object, and a keyword "stream" 130 indicating that the object is a stream object.
  • the keyword "stream" 130 indicates the start of a stream and forms a pair with a keyword "endstream" 140 indicating the end of the stream.
  • a sequence of bytes 150 is arranged between the keyword "stream" 130 and the keyword "endstream" 140.
  • the stream object includes a dictionary 120 representing encoding information, size information, content information of the object and so on.
  • the start and end of the dictionary 120 are indicated by the double angle brackets "<<" and ">>".
  • a keyword "endobj" 160 indicating the end of the object is disposed at the end of the object.
  • in general, the term "stream object" refers to a portion delimited by the keywords "stream" 130 and "endstream" 140.
  • in the present disclosure, however, the term "stream object" refers to the entire object, which includes not only the portion delimited by the keywords "stream" 130 and "endstream" 140, but also the dictionary 120, unless a separate comment is added.
  • the term "object" refers to a portion delimited by the keywords "obj" and "endobj", in order to distinguish it from the term "stream object".
  • some embodiments of the present invention suggest methods and systems capable of determining whether malicious content is included in a PDF file on the basis of information which is extracted from the stream object within the PDF file.
  • the dictionary 120 of the stream object may include a variety of information which may be identified by respective entries.
  • the dictionary 120 includes a DL item.
  • the DL item is identified by a keyword "/DL".
  • numerals representing the original length of the included stream in bytes (i.e., the length of the included stream before being encoded) follow the keyword "/DL".
  • when a long stream is included in the PDF file and a large number follows the keyword "/DL", a system for processing the PDF file can secure enough memory resources in advance on the basis of the DL item.
  • if the DL item is included in the dictionary 120, it may mean that the PDF file includes a long stream and has a high possibility of including malicious content. Therefore, the possibility of the existence of malicious content can be determined according to whether the DL item is included in the dictionary 120.
  • the dictionary 120 may include an EF item and a keyword "EmbeddedFiles", which represent another file embedded in the PDF file.
  • the EF item is identified by a keyword "/EF". If another file is embedded in the PDF file, the name of the other file follows the keyword "/EF".
  • the EF item included in the object indicates that another file is embedded in the PDF file. In this case, it can be determined on the basis of the existence of the EF item that the possibility of the existence of malicious content is high.
  • the keyword "EmbeddedFiles" is one of the keywords which may be included in a Type item.
  • the Type item represents the kind of an object and is identified by a keyword "/Type". If the object is a file, the keyword "EmbeddedFiles" follows the keyword "/Type". As such, when the keyword "EmbeddedFiles" is included, it may mean that the possibility of the existence of malicious content is high.
  • an EmbeddedFiles item identified by the keyword "/EmbeddedFiles" can be included in the dictionary 120, and an identifier representing another file can follow the EmbeddedFiles item. In this case, it can be determined on the basis of the existence of the EmbeddedFiles item that the possibility of the existence of malicious content is high.
  • the dictionary 120 may include a Params item representing information about another file (i.e., an embedded file) when the embedded file is included in the PDF file.
  • the Params item is identified by a keyword "/Params". Detailed information may follow the Params item.
  • the embedded file may be included in the object in a non-compressed file type.
  • the embedded file can be included in the object in a stream type, and a Checksum item, which includes a checksum value of the stream, can follow the keyword "/Params".
  • the previously calculated checksum value included in the Checksum item can be used for detecting the malicious content. As such, the system computing load caused by calculating the checksum value can be reduced by using the previously calculated checksum value.
  • a Size item following the keyword "/Params" represents the size of the included stream.
  • a CreationDate item and/or a ModDate item can follow the keyword "/Params".
  • the CreationDate item represents the date and time when the embedded file was created.
  • the ModDate item represents the date and time when the embedded file was altered.
  • the substance of the embedded file can be identified, and thus the malicious content can be detected.
  • a Subtype item representing a kind of the embedded file can also be included in the dictionary 120.
  • the Subtype item is identified by a keyword "/Subtype".
  • the kind of the embedded file can be identified from the kind of file that follows the keyword "/Subtype", and the identified kind of the embedded file can be used to determine whether the PDF file includes the malicious content.
  • Fig. 3 is a view showing an example of a stream object within a non-PE file which includes malicious content.
  • malicious content is included in the non-PE file in the form of a stream 250, and the stream 250 is identified by a keyword "stream".
  • a checksum value 270 is included in a Checksum item.
  • the Checksum item is included in a Params item which is identified by a keyword "/Params".
  • a Subtype item 280 is included in the stream 250.
  • the above-mentioned items can be used to detect malicious content. As such, the existence of malicious content can be easily detected.
  • an object ID (110 in Fig. 2) for identifying the object can be used to detect malicious content. If the same malicious content is inserted into several files, the malicious content may be inserted in the same objects within the several files. In accordance therewith, the malicious content can be detected by comparing an object ID or a characteristic of the object within a target file with an object ID or a characteristic previously derived from a file which had been determined to have malicious content.
  • information about predetermined items may be extracted from a portion within a non-PE file into which malicious content can be inserted.
  • information about the items can be extracted from a dictionary of an object within the PDF file.
  • a stream object within the PDF file is identified and then information included in a dictionary of the identified stream object can be extracted.
  • the items relating to the extracted information include at least one of a DL item, an EF item, a Type item, a Params item and a Subtype item which are included in an object within the non-PE file.
  • information about all the items may be extracted.
  • the non-existence of the items can be indicated for the items which are not included in the object.
  • all the items of the object can be considered in the detection of malicious content. Therefore, the number of items used for detecting the existence of the malicious content may increase, thereby enhancing accuracy for the determination of the malicious content.
  • the existence of items may be determined at a step 310.
  • the existence of at least one of the DL item, the EF item, the Params item and the Subtype item can be determined.
  • if the extracted information about the DL item exists, it can be determined that a large-sized stream is included in the non-PE file. As such, it can also be determined that the possibility of including malicious content is high.
  • if the extracted information about the EF item exists, it can be determined that an embedded file exists in the non-PE file. As such, it can also be determined that the possibility of including malicious content is high.
  • if the extracted information about the Params item or the Subtype item exists, it can be determined that the embedded file exists in the non-PE file.
  • values of the existing items may be compared to those relating to malicious contents at a step 320.
  • the checksum value of a checksum item within the existing Params item can be compared to a checksum value relating to malicious content.
  • the checksum value of the malicious content can be previously calculated and stored in a database.
  • the malicious content detection method in accordance with at least some embodiments described herein can use the checksum value received from the database. If the checksum value included in the object is not the same as that of the malicious content, a similarity between both of the checksum values can be calculated and compared to a reference similarity. When the calculated similarity is higher than the reference similarity, the non-PE file can be determined to include the malicious content.
  • the comparison process using the similarities can be performed using a well-known method. Accordingly, mutations of malicious contents can also be detected.
  • the Subtype item can be compared to types of well-known malicious contents.
  • if specific malicious content is well known to be of a flash type and the Subtype item indicates a flash type, it can be determined that the non-PE file includes the malicious content.
  • the value of a Type item can be used to determine whether "EmbeddedFiles" exists. As such, it can be determined whether the non-PE file includes an embedded file on the basis of the value of the Type item.
  • whether the non-PE file includes malicious content can be determined based on the result of the step 310 for the existence of items and the result of the step 320 for the comparison of the item values. In this manner, both the existence of a variety of items and the values of the items are used for determining whether malicious content is included. As such, the accuracy of detecting malicious content can become higher. Also, it can be easily determined whether the non-PE file includes malicious content without decoding the stream.
  • the malicious content detection method in accordance with at least some embodiments allows the step 320 for comparing the item values to those relating to malicious content to be performed only when the result of the step 310 indicates that the predetermined items exist.
  • a malicious content detection method in accordance with another embodiment of the present invention can be proposed.
  • the malicious content detection method may enable information about predetermined items to be extracted from a non-PE file at a step 400.
  • the existence of the predetermined items may be determined by inspecting the extracted information at a step 410. Then, whether at least one of a DL item, an EF item, a Params item and a Subtype item exists within an object of the non-PE file may be determined at a step 412. If the result of the step 412 indicates that none of the above-mentioned items exists, a step 414 is performed instead of a step 420, determining that no malicious content is included in the non-PE file. This is because neither a long stream nor an embedded file, either of which can be regarded as an indication of malicious content, exists.
  • the values of the existing items are compared to those relating to malicious contents at a step 422.
  • the checksum value of a checksum item within the existing Params item is compared to a checksum value relating to malicious content so as to determine whether the two checksum values are similar to each other. If the two checksum values are not similar to each other, the step 414 may be performed, determining that no malicious content is included in the non-PE file.
  • a step 424 is performed for comparing a type of an embedded file included in the non-PE file to each type of malicious contents.
  • the value of the Subtype item is compared to types of well-known malicious contents. If the Subtype value corresponds to one of the types of well-known malicious contents, the non-PE file is determined to include malicious content at a step 430. On the contrary, if the Subtype value does not correspond to any of the types of well-known malicious contents, it may mean that the non-PE file includes new malicious content, that an error occurred in the comparison of the checksum values, or the like. Therefore, a step 426 is performed to inform a user or an external device so that an additional procedure, e.g. analyzing the configuration of the new malicious content, can be carried out.
  • the malicious content detection method of Fig. 5 can reduce the number of steps to be substantially executed, to thereby provide a higher efficiency than that of Fig. 4.
  • the method of Fig. 5 is provided only as an example, and the scope of the present invention is not limited to this.
  • the aspects of the present invention, as generally described herein, can be modified or altered by combining, arranging, substituting, separating and designing in a wide variety of different configurations.
  • the comparison of the Subtype values can be performed before or in parallel with the comparison of the checksum values.
  • Fig. 6 is a block diagram showing a malicious content detection system in accordance with an embodiment of the present invention.
  • the malicious content detection system includes an information extraction unit 510 and a determination unit 520.
  • the determination unit 520 may include an existence determinator 522 and a comparator 524.
  • the information extraction unit 510 may extract information about at least one of predetermined items, such as a DL item, a Params item, an EF item, an EmbeddedFile item and a Subtype item, from a portion within a non-PE file in which malicious files can be inserted.
  • the extracted information by the information extraction unit 510 may be transmitted to the determination unit 520.
  • the determination unit 520 may inspect the extracted information, in order to not only determine the existence of the predetermined items but also identify the values of the predetermined items.
  • the existence determinator 522 can determine the existence of each of the DL item, the Params item, the EF item and the Subtype item which are included in an object within the non-PE file by inspecting the extracted information. Further, the existence determinator 522 can determine whether a stream or a file which can be malicious content, exists in the object within the non-PE file on the basis of the determined existence of each of the DL item, the Params item, the EF item and the Subtype item.
  • the comparator 524 can determine whether the malicious content is included in the non-PE file by comparing the values of the Checksum and Subtype items to those relating to malicious contents. As such, the determination unit 520 can determine whether the malicious content is included in the non-PE file on the basis of the results from the existence determinator 522 and the comparator 524.
  • the determination unit 520 can use a communication unit 530 in order to obtain information about kinds of malicious contents and checksum values for the malicious contents. Further, the communication unit 530 can be used for transmitting the determination results for the existence of malicious content to a user.
  • the malicious content detection method and system in accordance with embodiments of the present invention can determine whether the non-PE file includes malicious content on the basis of information from a portion within the non-PE file in which the malicious content can be inserted. As such, the malicious content can be accurately and efficiently detected. Particularly, since configuration of the non-PE file is considered for detecting the malicious content, attacks with non-PE files can be efficiently prevented. Moreover, the substance of malicious content can be easily identified because a variety of information included in the non-PE file is used for detecting the malicious content.
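
The decision flow of Fig. 5 described in the items above (steps 400 to 430) can be sketched roughly as follows. This is only an illustrative sketch: the names `ExtractedItems`, `Verdict` and `classify`, the default reference similarity of 0.9, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions, since the text only says that a well-known method is used for the similarity comparison.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from enum import Enum
from typing import Iterable, Optional


class Verdict(Enum):
    CLEAN = "clean"                    # step 414
    MALICIOUS = "malicious"            # step 430
    NEEDS_ANALYSIS = "needs_analysis"  # step 426


@dataclass
class ExtractedItems:
    """Hypothetical container for the information extracted at step 400."""
    dl: Optional[int] = None        # value following /DL, if present
    ef: Optional[str] = None        # value following /EF, if present
    has_params: bool = False        # whether a /Params item exists
    checksum: Optional[str] = None  # Checksum entry inside /Params, if present
    subtype: Optional[str] = None   # value following /Subtype, if present


def _similar_to_any(checksum: Optional[str], known: Iterable[str],
                    reference: float) -> bool:
    # Stand-in similarity measure (an assumption, not the patent's method).
    if checksum is None:
        return False
    return any(
        checksum == k or SequenceMatcher(None, checksum, k).ratio() > reference
        for k in known
    )


def classify(items: ExtractedItems, known_checksums: Iterable[str],
             known_subtypes: Iterable[str],
             reference_similarity: float = 0.9) -> Verdict:
    # Steps 410/412: does at least one of DL, EF, Params, Subtype exist?
    if (items.dl is None and items.ef is None
            and not items.has_params and items.subtype is None):
        return Verdict.CLEAN
    # Step 422: compare the stored checksum with known malicious checksums.
    if not _similar_to_any(items.checksum, known_checksums, reference_similarity):
        return Verdict.CLEAN
    # Step 424: compare the embedded file type with known malicious types.
    if items.subtype in set(known_subtypes):
        return Verdict.MALICIOUS
    # Step 426: similar checksum but unknown type, so report for analysis.
    return Verdict.NEEDS_ANALYSIS
```

Because the stream is never decoded, every input to `classify` comes from dictionary entries alone, which is the efficiency argument the text makes for this flow.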

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

There is provided a method for detecting whether malicious content is included in a non-PE (Portable Executable) file. The method includes extracting information from a portion within the non-PE file in which the malicious content can be inserted and determining whether the malicious content is included in the non-PE file on the basis of the extracted information.

Description

SYSTEM AND METHOD FOR DETECTING MALICIOUS CONTENT IN NON-PE FILE
The present invention relates to a system and a method for detecting malicious content in a non-PE (Portable Executable) file, and more particularly to a system and a method for determining whether a non-PE file includes malicious content using information about a portion within the non-PE file in which the malicious content can be inserted.
Malicious content included in files has been used to disturb or obstruct program execution, file utilization, computer operation and so on. Vulnerabilities, malware, computer viruses, and the like are examples of such malicious content. Since malicious content can cause undesired operations, technologies have been developed for detecting, deactivating and deleting the malicious content before it is executed.
Meanwhile, PE (Portable Executable) files refer to a file format which can be executed on a Win32 executable computer system regardless of platform. In other words, PE files correspond to programs being executed on the computer system. Malicious content can also be included in such PE files. When a PE file including malicious content is executed on a computer system, the malicious content is executed at the same time, so that the computer system is maliciously affected. For this reason, there has been plenty of research on how to detect malicious content included in PE files, and a variety of detection technologies have been developed.
On the other hand, comparatively little research has been done on non-PE files. Recently, non-PE files such as documents, images and moving pictures have been transmitted and distributed quite often owing to the development of networks. In this circumstance, if malicious content is included in such a non-PE file, it may be difficult to detect the included malicious content without analyzing the configuration of the non-PE file. Moreover, objects including malicious content are sometimes hidden, which makes the malicious content even more difficult to detect.
In addition, according to a statistical report by Symantec Corporation covering the period from April to June 2010, attacks with PDF files (which are non-PE files) including malicious contents, i.e. attacks with malicious PDF files, are rapidly increasing; in particular, the proportion of attacks with malicious PDF files including Flash contents is becoming higher. Therefore, it is necessary to develop methods and systems capable of detecting malicious contents in non-PE files, particularly in PDF files.
In view of the foregoing, the present invention provides a method and a system capable of detecting malicious content included in a non-PE file on the basis of the configuration of the non-PE file, and of identifying the malicious content.
In accordance with one aspect of the present invention, there is provided a method for detecting whether malicious content is included in a non-PE file. The method includes extracting information from a portion within the non-PE file in which the malicious content can be inserted; and determining whether the malicious content is included in the non-PE file on the basis of the extracted information.
In accordance with another aspect of the present invention, there is provided a system for detecting whether malicious content is included in a non-PE file. The system includes an information extraction unit for extracting information from a portion within the non-PE file in which the malicious content can be inserted; and a determination unit for determining whether the malicious content is included in the non-PE file on the basis of the extracted information.
The above and other objects and features of the present invention will become apparent from the following description of embodiments given in conjunction with the accompanying drawings, in which:
Figs. 1a and 1b are views showing examples of the configuration of a non-PE file in which malicious contents can be included;
Fig. 2 is a view showing an example of a stream object within a PDF file;
Fig. 3 is a view showing an example of a stream object in which malicious content is included;
Fig. 4 is a flow chart illustrating a malicious content detection method in accordance with an embodiment of the present invention;
Fig. 5 is a flow chart illustrating a malicious content detection method in accordance with another embodiment of the present invention; and
Fig. 6 is a block diagram showing a malicious content detection system in accordance with an embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that they can be readily implemented by those skilled in the art.
Figs. 1a and 1b are views showing examples of the configuration of a non-PE file in which malicious content can be inserted. Generally, the non-PE file includes a header 10 representing the kind of the file or the like including information about the file, and a body 20 representing content of the file. A variety of contents can be included in the body 20. Further, the contents may be included in the body 20 in the form of an object.
In some examples, the body 20 may include a flash object 22 and PE objects 24, as can be seen in Fig. 1a. Any of the flash and PE objects 22 and 24 may include malicious contents. Alternatively, the body 20 may include a TTF object 26 for representing font information of the file and a PE object 28, as can be seen in Fig. 1b. The TTF and PE objects 26 and 28 may also include malicious contents. As such, whether malicious contents are included in the non-PE file can be detected by considering each object within the non-PE file. However, considering every object may increase the system computing load, thereby deteriorating the efficiency of detecting the malicious contents. Further, in some examples, the object includes encoded contents. In this case, when the object including the encoded contents is decoded for inspection, the memory resources of a system can be excessively consumed, which also deteriorates the efficiency of detecting the malicious contents.
In view of the foregoing, the present disclosure provides methods and systems capable of efficiently detecting the malicious contents by selectively inspecting objects in which malicious contents can be inserted without using such a decoding process. As an example of the non-PE file, a PDF file will be exemplified in the description, but the scope of the present invention is not limited to this. In other words, the scope of the present invention includes every non-PE file to which the principle of the present invention is applied.
Fig. 2 shows an example of a stream object included in a PDF file. The PDF file includes information in the form of a stream object, which is a sequence of bytes. The stream object within the PDF file includes a label 110 identifying an object, and a keyword "stream" 130 indicating that the object is a stream object. The keyword "stream" 130 indicates the start of a stream and forms a pair with a keyword "endstream" 140 indicating the end of the stream. A sequence of bytes 150 is arranged between the keyword "stream" 130 and the keyword "endstream" 140. Further, the stream object includes a dictionary 120 representing encoding information, size information, content information of the object and so on. The start and end of the dictionary 120 are indicated by the double angle brackets "<<" and ">>". Further, a keyword "endobj" 160 indicating the end of the object is disposed at the end of the object.
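The layout described above (label, dictionary between "<<" and ">>", bytes between "stream" and "endstream", then "endobj") can be illustrated with a minimal Python sketch. The names `StreamObject` and `find_stream_objects` are hypothetical, and the parsing is deliberately simplified: it assumes the stream bytes never contain the keyword "endobj" and that the dictionary contains no hex strings ending directly before ">>". A real PDF parser would have to handle those cases.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class StreamObject(NamedTuple):
    label: bytes       # object label, e.g. b"7 0 obj" (label 110)
    dictionary: bytes  # bytes between the outer "<<" and ">>" (dictionary 120)
    data: bytes        # raw, still-encoded stream bytes (150)


# Simplification: assumes the stream data never contains b"endobj".
_OBJ = re.compile(rb"(\d+\s+\d+\s+obj)(.*?)endobj", re.DOTALL)


def _outer_dictionary(body: bytes) -> Optional[Tuple[bytes, int]]:
    """Return the first balanced << ... >> content and the offset just past it."""
    start = body.find(b"<<")
    if start < 0:
        return None
    depth, i = 0, start
    while i < len(body) - 1:
        pair = body[i:i + 2]
        if pair == b"<<":
            depth += 1
            i += 2
        elif pair == b">>":
            depth -= 1
            i += 2
            if depth == 0:
                return body[start + 2:i - 2], i
        else:
            i += 1
    return None


def find_stream_objects(pdf: bytes) -> List[StreamObject]:
    """Collect objects that have both a dictionary and a stream/endstream pair."""
    found = []
    for match in _OBJ.finditer(pdf):
        label, body = match.group(1), match.group(2)
        parsed = _outer_dictionary(body)
        if parsed is None:
            continue
        dictionary, end = parsed
        rest = body[end:]
        s = rest.find(b"stream")
        e = rest.rfind(b"endstream")
        if s < 0 or e < s:
            continue  # no well-formed stream/endstream pair
        # Simplification: trim the line breaks around the stream bytes.
        data = rest[s + len(b"stream"):e].strip(b"\r\n")
        found.append(StreamObject(label, dictionary, data))
    return found
```

Note that the stream bytes are only located, never decoded, so a detector built on this sketch could inspect the dictionary without paying the decoding cost mentioned earlier.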
In general, the term "stream object" refers to the portion delimited by the keywords "stream" 130 and "endstream" 140. However, in the present disclosure, unless otherwise noted, the term "stream object" refers to the entire object, which includes not only the portion delimited by the keywords "stream" 130 and "endstream" 140, but also the dictionary 120. Further, to distinguish it from the term "stream object", the term "object" refers to the portion delimited by the keywords "obj" and "endobj".
Since such a stream object may include a long sequence of bytes, malicious contents can be inserted in the stream object. Therefore, some embodiments of the present invention suggest methods and systems capable of determining whether malicious content is included in a PDF file on the basis of information which is extracted from the stream object within the PDF file.
Meanwhile, the dictionary 120 of the stream object may include a variety of information which may be identified by respective entries.
By way of example, but not limitation, the dictionary 120 includes a DL item. The DL item is identified by a keyword "/DL". Further, as can be seen in Fig. 2, the keyword "/DL" is followed by a number representing the original length of the included stream (i.e., the length of the included stream before being encoded) in bytes. When a long stream is included in the PDF file and the number is therefore large, a system for processing the PDF file can secure enough memory resources in advance on the basis of the DL item. As such, if the DL item is included in the dictionary 120, it may mean that the PDF file includes a long stream and has a high possibility of including malicious content. Therefore, the possibility of the existence of malicious content can be determined according to whether the DL item is included in the dictionary 120.
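As a minimal sketch of the DL check, the dictionary bytes can be searched for the keyword and the number after it. The function name `find_dl` and the sample dictionaries in the usage note are hypothetical.

```python
import re

# "/DL" followed by whitespace and an unsigned integer (length in bytes).
DL_RE = re.compile(rb"/DL\s+(\d+)")


def find_dl(dictionary):
    """Return the integer after /DL, or None when the DL item is absent."""
    m = DL_RE.search(dictionary)
    return int(m.group(1)) if m else None
```

For example, `find_dl(b"/Length 40 /DL 1048576")` yields the declared original length, while a dictionary without the item yields `None`, which can be treated as "DL item not present".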
Further, the dictionary 120 may include an EF item and a keyword "EmbeddedFiles", each of which indicates that another file is embedded in the PDF file. The EF item is identified by a keyword "/EF". If another file is embedded in the PDF file, the name of the other file follows the keyword "/EF". As such, the EF item included in the object (i.e., the dictionary 120) indicates that another file is embedded in the PDF file. In this case, it can be determined that the possibility of the existence of malicious content is high on the basis of the existence of the EF item.
Meanwhile, the keyword "EmbeddedFiles" is one of the keywords which may be included in a Type item. The Type item represents the kind of an object and is identified by a keyword "/Type". If the object is a file, the keyword "EmbeddedFiles" follows the keyword "/Type". As such, when the keyword "EmbeddedFiles" is included, it may mean that the possibility of the existence of malicious content is high. Alternatively, an EmbeddedFiles item identified by the keyword "/EmbeddedFiles" can be included in the dictionary 120, and an identifier representing another file can follow the EmbeddedFiles item. In this case, it can be determined that the possibility of the existence of malicious content is high on the basis of the existence of the EmbeddedFiles item.
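The EF and EmbeddedFiles checks can be sketched as simple name searches over the dictionary bytes. The function name and the patterns are illustrative assumptions; the pattern accepts both "EmbeddedFile" and "EmbeddedFiles", since either spelling may appear in practice.

```python
import re

# A PDF name ends at whitespace or a delimiter, so "/EF" must not match "/EFoo".
NAME_END = rb"(?=[\s/<>\[\]()]|$)"
EF_RE = re.compile(rb"/EF" + NAME_END)
EMBEDDED_RE = re.compile(rb"/EmbeddedFiles?" + NAME_END)


def has_embedded_file_hint(dictionary):
    """True when the dictionary carries an EF item or an EmbeddedFile(s) name."""
    return bool(EF_RE.search(dictionary) or EMBEDDED_RE.search(dictionary))
```

A positive result only signals that a file may be embedded; it is an input to the later determination rather than a verdict on its own.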
Furthermore, the dictionary 120 may include a Params item representing information about another file (i.e., an embedded file) when the embedded file is included in the PDF file. The Params item is identified by a keyword "/Params". Detailed information may follow the Params item. In some embodiments, the embedded file may be included in the object in a non-compressed form. In this case, the embedded file can be included in the object as a stream, and a Checksum item, which includes a checksum value of the stream, can follow the keyword "/Params". The previously calculated checksum value included in the Checksum item can be used for detecting the malicious content. As such, the system computing load caused by calculating the checksum value can be avoided by using the previously calculated checksum value.
Further, a Size item following the keyword "/Params" represents the size of the included stream. Besides the Size item, a CreationDate item and/or a ModDate item can follow the keyword "/Params". The CreationDate item represents the date and time when the embedded file was created, and the ModDate item represents the date and time when the embedded file was last modified. On the basis of the information in the Params item, the substance of the embedded file can be identified, and thus the malicious content can be detected.
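A minimal sketch of reading the Params item follows. The sample dictionary, the field names in the returned mapping, and the function name `parse_params` are assumptions for illustration. The PDF specification spells the key "/CheckSum", so the pattern accepts either capitalization, and the sketch assumes the checksum is written as a hexadecimal string separated from the closing ">>" by whitespace.

```python
import re

# Hypothetical dictionary of an embedded file stream.
SAMPLE_DICT = (
    b"/Type /EmbeddedFile /Params << /Size 2048 "
    b"/CreationDate (D:20101231120000) /ModDate (D:20110101090000) "
    b"/CheckSum <0123456789ABCDEF0123456789ABCDEF> >>"
)

PARAMS_RE = re.compile(rb"/Params\s*<<(.*?)>>", re.DOTALL)
FIELD_RES = {
    "checksum": re.compile(rb"/Check[Ss]um\s*<([0-9A-Fa-f\s]*)>"),
    "size": re.compile(rb"/Size\s+(\d+)"),
    "creation_date": re.compile(rb"/CreationDate\s*\((.*?)\)"),
    "mod_date": re.compile(rb"/ModDate\s*\((.*?)\)"),
}


def parse_params(dictionary):
    """Return the Params fields (None where missing), or None without /Params."""
    m = PARAMS_RE.search(dictionary)
    if m is None:
        return None
    body = m.group(1)
    out = {}
    for key, rx in FIELD_RES.items():
        hit = rx.search(body)
        out[key] = hit.group(1).decode("latin-1") if hit else None
    if out["size"] is not None:
        out["size"] = int(out["size"])
    return out
```

Because the checksum is read from the dictionary rather than computed, the stream itself does not have to be decoded or hashed.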
When the embedded file is included in the PDF file, a Subtype item representing the kind of the embedded file can also be included in the dictionary 120. The Subtype item is identified by a keyword "/Subtype". Here, the kind of the embedded file can be identified from the file kind that follows the keyword "/Subtype", and the identified kind of the embedded file can be used to determine whether the PDF file includes the malicious content.
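Reading the Subtype item can be sketched as follows. The function name `embedded_file_kind` is hypothetical; the decoding of "#xx" sequences reflects how PDF names escape characters such as "/".

```python
import re

SUBTYPE_RE = re.compile(rb"/Subtype\s*/([^\s/<>\[\]()]+)")
HEX_ESCAPE_RE = re.compile(r"#([0-9A-Fa-f]{2})")


def embedded_file_kind(dictionary):
    """Return the name after /Subtype with #xx escapes decoded, or None."""
    m = SUBTYPE_RE.search(dictionary)
    if m is None:
        return None
    raw = m.group(1).decode("latin-1")
    # PDF names write characters such as "/" as "#2F".
    return HEX_ESCAPE_RE.sub(lambda h: chr(int(h.group(1), 16)), raw)
```

The returned string can then be compared against the kinds of content that known malicious files have used.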
Fig. 3 is a view showing an example of a stream object within a non-PE file which includes malicious content. As shown in Fig. 3, malicious content is included in the non-PE file in the form of a stream 250, and the stream 250 is identified by a keyword "stream". A checksum value 270 is included in a Checksum item. Here, the Checksum item is included in a Params item which is identified by a keyword "/Params". Further, a Subtype item 280 is included in the stream 250. Thus, by considering the configuration of the stream 250, it can be determined that flash content is included in the non-PE file as an embedded file.
In this way, the above-mentioned items can be used to detect malicious content. As such, the existence of malicious content can be easily detected.
Furthermore, an object ID (110 in Fig. 2) for identifying the object can be used to detect malicious content. If the same malicious content is inserted into several files, the malicious content may be inserted in the same objects within the several files. In accordance therewith, the malicious content can be detected by comparing an object ID or a characteristic of the object within a target file with an object ID or a characteristic previously derived from a file which had been determined to have malicious content.
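The object ID comparison can be sketched as a set lookup. The record type `ObjectProfile`, its fields, and the sample entry in `KNOWN_BAD` are hypothetical; the description does not specify which characteristics of the object are stored.

```python
from typing import NamedTuple, Optional


class ObjectProfile(NamedTuple):
    object_number: int
    generation: int
    subtype: Optional[str]


# Hypothetical profiles derived earlier from files found to be malicious.
KNOWN_BAD = {ObjectProfile(12, 0, "application/x-shockwave-flash")}


def matches_known_object(profile, known=KNOWN_BAD):
    """True when the target object has the same ID and characteristic."""
    return profile in known
```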
A malicious content detection method in accordance with an embodiment of the present invention will now be described with reference to Fig. 4.
First, at a step 300, information about predetermined items may be extracted from a portion within a non-PE file into which malicious content can be inserted. In some embodiments, information about the items can be extracted from a dictionary of an object within the PDF file. Alternatively, a stream object within the PDF file is identified and then information included in a dictionary of the identified stream object can be extracted. Further, the items for which information is extracted include at least one of a DL item, an EF item, a Type item, a Params item and a Subtype item which are included in an object within the non-PE file. In some embodiments, information about all the items may be extracted. In this case, if some of the items are not included in the object when extracting information, the non-existence of those items can be indicated. As such, all the items of the object can be considered in the detection of malicious content. Therefore, the number of items used for detecting the existence of the malicious content may increase, thereby enhancing the accuracy of the determination.
Subsequently, the existence of items may be determined at a step 310. In some embodiments, by inspecting the extracted information, the existence of at least one of the DL item, the EF item, the Params item and the Subtype item can be determined. By way of example, but not limitation, if the extracted information about the DL item exists, it can be determined that a large-sized stream is included in the non-PE file. As such, it can also be determined that the possibility of including malicious content is high. Further, if the extracted information about the EF item exists, it can be determined that an embedded file exists in the non-PE file. As such, it can also be determined that the possibility of including malicious content is high. Furthermore, if the extracted information about the Params item or the Subtype item exists, it can be determined that the embedded file exists in the non-PE file.
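A minimal sketch of the extraction at step 300, in which absent items are explicitly marked, might look as follows. The function name `extract_items` and the boolean representation of existence are assumptions for illustration.

```python
import re

ITEMS = ("DL", "EF", "Type", "Params", "Subtype")
# A PDF name ends at whitespace or a delimiter.
NAME_END = r"(?=[\s/<>\[\]()]|$)"


def extract_items(dictionary):
    """Map every predetermined item to True (present) or False (absent)."""
    text = dictionary.decode("latin-1")
    return {
        name: re.search("/" + name + NAME_END, text) is not None
        for name in ITEMS
    }
```

Every item appears in the result, so later steps can consider the full set of items even when only some of them occur in the object.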
Thereafter, values of the existing items may be compared to those relating to malicious content at a step 320. In some embodiments, the checksum value of a Checksum item within the existing Params item can be compared to a checksum value relating to malicious content. The checksum value of the malicious content can be previously calculated and stored in a database. As such, the malicious content detection method in accordance with at least some embodiments described herein can use the checksum value received from the database. If the checksum value included in the object is not the same as that of the malicious content, a similarity between the two checksum values can be calculated and compared to a reference similarity. When the calculated similarity is higher than the reference similarity, the non-PE file can be determined to include the malicious content. The comparison using the similarities can be performed using a well-known method. In this way, variants of malicious content can also be detected.
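The comparison at step 320 can be sketched as an exact match followed by a similarity test against a reference value. The description does not name the similarity method, so `difflib.SequenceMatcher` is used here purely as a stand-in; the function names and the reference value 0.9 are assumptions. Note that a cryptographic checksum such as MD5 does not preserve similarity between inputs, so this kind of test is only meaningful for similarity-preserving checksums.

```python
from difflib import SequenceMatcher


def checksum_similarity(a, b):
    """Similarity in [0, 1] between two checksum strings (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches_known_checksum(value, known_checksums, reference=0.9):
    """True on an exact match or when similarity exceeds the reference."""
    for known in known_checksums:
        if value.lower() == known.lower():
            return True
        if checksum_similarity(value, known) > reference:
            return True
    return False
```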
Alternatively, the Subtype item can be compared to types of well-known malicious contents. By way of example, if specific malicious content is well known as a flash type and the Subtype item indicates a flash type, it can be determined that the non-PE file includes the malicious content.
In another different manner, the value of a Type item can be used to determine whether "EmbeddedFiles" exists. As such, it can be determined whether the non-PE file includes an embedded file on the basis of the value of the Type item.
Finally, at a step 330, whether the non-PE file includes malicious content can be determined based on the result of the step 310 regarding the existence of the items and the result of the step 320 regarding the values of the items. In this manner, both the existence of a variety of items and the values of the items are used for determining whether malicious content is included. As such, the accuracy of malicious content detection can be improved. Also, it can be easily determined whether the non-PE file includes malicious content without decoding the stream.
Meanwhile, the malicious content detection method in accordance with at least some embodiments may perform the step 320 of comparing the item values to those relating to malicious content only when the result of the step 310 indicates that the predetermined items exist. To this end, a malicious content detection method in accordance with another embodiment of the present invention is proposed.
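One way to combine the two results at step 330 is sketched below. The description does not specify the combining rule, so requiring both an existing item and a matching value is an assumption, as are the function name `decide` and the dictionary keys.

```python
SUSPICIOUS_ITEMS = ("DL", "EF", "Params", "Subtype")


def decide(existence, comparisons):
    """existence: item -> bool (step 310); comparisons: check -> bool (step 320)."""
    item_present = any(existence.get(name, False) for name in SUSPICIOUS_ITEMS)
    value_match = any(comparisons.values())
    # Assumed rule: flag the file only when both kinds of evidence agree.
    return item_present and value_match
```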
The malicious content detection method in accordance with another embodiment of the present invention will now be described with reference to Fig. 5.
As shown in Fig. 5, in the malicious content detection method, information about predetermined items may be extracted from a non-PE file at a step 400. The existence of the predetermined items may be determined by inspecting the extracted information at a step 410. Then, whether at least one of a DL item, an EF item, a Params item and a Subtype item exists within an object of the non-PE file may be determined at a step 412. If the result of the step 412 indicates that none of the above-mentioned items exists, a step 414, instead of a step 420, is performed to determine that no malicious content is included in the non-PE file. This is because neither a long stream nor an embedded file, either of which can be regarded as indicating the possible existence of malicious content, exists.
On the other hand, when the result of the step 412 indicates that at least one of the above-mentioned items exists, for example, that a long stream or an embedded file exists, the values of the existing items are compared to those relating to malicious content at a step 422. By way of example, but not limitation, at the step 422, the checksum value of a Checksum item within the existing Params item is compared to a checksum value relating to malicious content so as to determine whether the two checksum values are similar to each other. If the two checksum values are not similar to each other, the step 414 may be performed to determine that no malicious content is included in the non-PE file.
On the contrary, when the two checksum values are similar to each other, a step 424 is performed for comparing the type of an embedded file included in the non-PE file to each type of malicious content. By way of example, but not limitation, the value of the Subtype item is compared to the types of well-known malicious content. If the Subtype value corresponds to one of the types of well-known malicious content, the non-PE file is determined to include malicious content at a step 430. On the contrary, if the Subtype value does not correspond to any of the types of well-known malicious content, it may mean that the non-PE file includes new malicious content, that an error occurred in the comparison of the checksum values, or the like. Therefore, a step 426 is performed to inform a user or an external device so that an additional procedure, e.g., analyzing the configuration of the new malicious content, can be carried out.
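The flow of Fig. 5 can be sketched as a short decision function. The names `Verdict` and `fig5_flow`, the use of `None` for absent items, and the precomputed `checksum_similar` flag are assumptions for illustration; the step numbers in the comments follow the description.

```python
from enum import Enum


class Verdict(Enum):
    CLEAN = "step 414"
    MALICIOUS = "step 430"
    NEEDS_ANALYSIS = "step 426"


def fig5_flow(items, checksum_similar, known_types):
    """items maps item names to extracted values, with None meaning absent."""
    # Step 412: does at least one of the items exist?
    names = ("DL", "EF", "Params", "Subtype")
    if not any(items.get(name) is not None for name in names):
        return Verdict.CLEAN
    # Step 422: is the checksum similar to a known malicious checksum?
    if not checksum_similar:
        return Verdict.CLEAN
    # Step 424: does the Subtype match a known malicious type?
    if items.get("Subtype") in known_types:
        return Verdict.MALICIOUS
    # Step 426: report for additional analysis.
    return Verdict.NEEDS_ANALYSIS
```

The early returns mirror the point of this embodiment: later comparisons are skipped as soon as an earlier step rules the file out.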
The malicious content detection method of Fig. 5 can reduce the number of steps to be substantially executed, to thereby provide a higher efficiency than that of Fig. 4. However, since the method of Fig. 5 may be provided only as an example, the scope of the present invention is not limited to this. In other words, it will be readily understood that the aspects of the present invention, as generally described herein can be modified or altered by combining, arranging, substituting, separating and designing in a wide variety of different configurations. For example, the comparison of the Subtype values can be performed before or parallel to the comparison of the checksum values.
Fig. 6 is a block diagram showing a malicious content detection system in accordance with an embodiment of the present invention. The malicious content detection system includes an information extraction unit 510 and a determination unit 520. The determination unit 520 may include an existence determinator 522 and a comparator 524.
The information extraction unit 510 may extract information about at least one of predetermined items, such as a DL item, a Params item, an EF item, an EmbeddedFiles item and a Subtype item, from a portion within a non-PE file in which malicious content can be inserted. The information extracted by the information extraction unit 510 may be transmitted to the determination unit 520.
The determination unit 520 may inspect the extracted information in order not only to determine the existence of the predetermined items but also to identify the values of the predetermined items. By way of example, the existence determinator 522 can determine the existence of each of the DL item, the Params item, the EF item and the Subtype item which are included in an object within the non-PE file by inspecting the extracted information. Further, the existence determinator 522 can determine whether a stream or a file which can be malicious content exists in the object within the non-PE file on the basis of the determined existence of each of the DL item, the Params item, the EF item and the Subtype item. Further, the comparator 524 can determine whether the malicious content is included in the non-PE file by comparing the values of the Checksum item and the Subtype item to those relating to malicious content. As such, the determination unit 520 can determine whether the malicious content is included in the non-PE file on the basis of the results from the existence determinator 522 and the comparator 524.
The determination unit 520 can use a communication unit 530 in order to obtain information about kinds of malicious contents and checksum values for the malicious contents. Further, the communication unit 530 can be used for transmitting the determination results for the existence of malicious content to a user.
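The units of Fig. 6 can be sketched as two small classes. The class names, method names, and the `known_types` set (standing in for information that would be obtained through the communication unit 530) are assumptions for illustration, and only the Subtype comparison is shown.

```python
import re
from dataclasses import dataclass, field

NAME_END = r"(?=[\s/<>\[\]()]|$)"


@dataclass
class InformationExtractionUnit:  # unit 510
    items: tuple = ("DL", "Params", "EF", "Subtype")

    def extract(self, dictionary: bytes) -> dict:
        text = dictionary.decode("latin-1")
        info = {
            name: re.search("/" + name + NAME_END, text) is not None
            for name in self.items
        }
        m = re.search(r"/Subtype\s*/([^\s/<>\[\]()]+)", text)
        info["subtype_value"] = m.group(1) if m else None
        return info


@dataclass
class DeterminationUnit:  # unit 520
    # Assumed to be filled with data received via the communication unit 530.
    known_types: set = field(default_factory=set)

    def exists(self, info: dict) -> bool:  # existence determinator 522
        return any(info[name] for name in ("DL", "Params", "EF", "Subtype"))

    def compare(self, info: dict) -> bool:  # comparator 524
        return info["subtype_value"] in self.known_types

    def determine(self, info: dict) -> bool:
        return self.exists(info) and self.compare(info)
```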
As described above, the malicious content detection method and system in accordance with embodiments of the present invention can determine whether the non-PE file includes malicious content on the basis of information from a portion within the non-PE file in which the malicious content can be inserted. As such, the malicious content can be accurately and efficiently detected. Particularly, since configuration of the non-PE file is considered for detecting the malicious content, attacks with non-PE files can be efficiently prevented. Moreover, the substance of malicious content can be easily identified because a variety of information included in the non-PE file is used for detecting the malicious content.
While the invention has been shown and described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.

Claims (17)

  1. A method for detecting whether malicious content is included in a non-PE (Portable Executable) file, the method comprising:
    extracting information from a portion within the non-PE file in which the malicious content can be inserted; and
    determining whether the malicious content is included in the non-PE file on the basis of the extracted information.
  2. The method of claim 1, wherein the determining includes receiving information about malicious content from a database and comparing the extracted information to the received information.
  3. The method of claim 1, wherein the non-PE file is a PDF (Portable Document Format) file, and
    wherein the portion corresponds to a stream object within the PDF file.
  4. The method of claim 1, wherein the extracted information includes information about at least one of an object ID, a DL item, a Params item, an EF item, a Type item, a SubType item and an EmbeddedFiles item, which are included in an object within the non-PE file.
  5. The method of claim 1, wherein the extracted information includes a Checksum value within a Params item which is included in an object within the non-PE file.
  6. The method of claim 1, wherein the extracting extracts information about at least two items, and
    wherein the determining determines on the basis of the extracted information about at least two items.
  7. The method of claim 1, wherein the extracting includes extracting information about a predetermined item from a portion within the non-PE file and indicating, if information about the predetermined item is not included in the non-PE file, the non-existence of the information about the predetermined item.
  8. The method of claim 1, further comprises using the extracted information to obtain information about the malicious content when the determining determines the malicious content to be included in the non-PE file.
  9. A system for detecting whether malicious content is included in a non-PE (Portable Executable) file, the system comprising:
    an information extraction unit for extracting information from a portion within the non-PE file in which the malicious content can be inserted; and
    a determination unit for determining whether the malicious content is included in the non-PE file on the basis of the extracted information.
  10. The system of claim 9, wherein the determination unit includes a comparator for comparing the extracted information to information about malicious content which is received from a database.
  11. The system of claim 9, wherein the non-PE file is a PDF (Portable Document Format) file, and
    wherein the portion corresponds to a stream object within the PDF file.
  12. The system of claim 9, wherein the extracted information includes information about at least one of an object ID, a DL item, a Params item, an EF item, a Type item, a SubType item and an EmbeddedFiles item, which are included in an object within the non-PE file.
  13. The system of claim 9, wherein the extracted information includes a Checksum value within a Params item which is included in an object within the non-PE file.
  14. The system of claim 9, wherein the information extraction unit extracts information about at least two items, and
    wherein the determination unit determines on the basis of the extracted information about at least two items.
  15. The system of claim 9, wherein the information extraction unit extracts information about a predetermined item from a portion within the non-PE file and indicates, if information about the predetermined item is not included in the non-PE file, the non-existence of the information about the predetermined item.
  16. The system of claim 9, further comprises a unit for using the extracted information to obtain information about the malicious content when the determination unit determines the malicious content to be included in the non-PE file.
  17. A computer-readable storage medium storing therein a program which includes computer-executable instructions causing a processor to execute the method of claim 1.
PCT/KR2011/010309 2010-12-31 2011-12-29 System and method for detecting malicious content in non-pe file WO2012091488A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020100140190A KR101228900B1 (en) 2010-12-31 2010-12-31 System and method for detecting malicious content in a non-pe file
KR10-2010-0140190 2010-12-31

Publications (1)

Publication Number Publication Date
WO2012091488A1 true WO2012091488A1 (en) 2012-07-05

Family

ID=46383339

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2011/010309 WO2012091488A1 (en) 2010-12-31 2011-12-29 System and method for detecting malicious content in non-pe file

Country Status (2)

Country Link
KR (1) KR101228900B1 (en)
WO (1) WO2012091488A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8627478B2 (en) * 2012-05-11 2014-01-07 Ahnlab, Inc. Method and apparatus for inspecting non-portable executable files

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101311367B1 (en) 2013-04-09 2013-09-25 주식회사 안랩 Method and apparatus for diagnosing attack that bypass the memory protection
WO2023229062A1 (en) * 2022-05-25 2023-11-30 시큐레터 주식회사 Method and device for disarming ole object in ms-ooxml
KR102468428B1 (en) * 2022-05-25 2022-11-18 시큐레터 주식회사 Method and device for disarming of JavaScript in PDF or HWP
US20240184646A1 (en) * 2022-05-26 2024-06-06 SecuLetter Co.,Ltd. Methods and apparatus for disarming dynamic data exchange in ms excel
KR20240024686A (en) 2022-08-17 2024-02-26 한국과학기술원 Method for detecting malware from pdf files and system for performing the same

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070104761A (en) * 2006-04-24 2007-10-29 이병관 Hybrid-based Intrusion Detection System Using Signature Graph
JP2009157521A (en) * 2007-12-25 2009-07-16 Duaxes Corp Virus detection device
KR100945247B1 (en) * 2007-10-04 2010-03-03 한국전자통신연구원 Malware analysis method and device in non-executable file using virtual environment
KR100954357B1 (en) * 2008-06-13 2010-04-26 주식회사 안철수연구소 PPE file diagnosis system and method thereof and module applied thereto

Also Published As

Publication number Publication date
KR20120078030A (en) 2012-07-10
KR101228900B1 (en) 2013-02-06

Similar Documents

Publication Publication Date Title
WO2012091488A1 (en) System and method for detecting malicious content in non-pe file
CN106682505B (en) Virus detection method, terminal, server and system
KR101337874B1 (en) System and method for detecting malwares in a file based on genetic map of the file
RU2420791C1 (en) Method of associating previously unknown file with collection of files depending on degree of similarity
US7721334B2 (en) Detection of code-free files
KR100862187B1 (en) Network-based Internet Worm Detection Apparatus and Method Using Vulnerability Analysis and Attack Modeling
CN102521543B (en) Method for information semantic analysis based on dynamic taint analysis
US20170214704A1 (en) Method and device for feature extraction
US10013555B2 (en) System and method for detecting harmful files executable on a virtual stack machine based on parameters of the files and the virtual stack machine
WO2017012241A1 (en) File inspection method, device, apparatus and non-volatile computer storage medium
Li et al. FEPDF: a robust feature extractor for malicious PDF detection
JP6000465B2 (en) Process inspection apparatus, process inspection program, and process inspection method
WO2010024606A2 (en) System and method for providing a normal file database
US10747879B2 (en) System, method, and computer program product for identifying a file used to automatically launch content as unwanted
KR20180039830A (en) Apparatus and method for detecting code reuse attack
WO2022097898A1 (en) Malware detection model training method and malware detection method
CN110210216B (en) Virus detection method and related device
CN105793864A (en) System and method of detecting malicious multimedia files
CN110832488A (en) Normalizing entry point instructions in executable program files
CN114254069A (en) Method, device and storage medium for detecting similarity of domain name
KR101725399B1 (en) Apparatus and method for detection and execution prevention for malicious script based on host level
CN113360902B (en) Shellcode detection method, device, computer equipment and computer storage medium
CN110674501B (en) Malicious drive detection method, device, equipment and medium
CN104239800A (en) Detection method and device for bug trigger threat in PDF (Portable Document Format)
CN104462966B (en) The detection method and device that leak threatens are triggered in PDF

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11853826

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11853826

Country of ref document: EP

Kind code of ref document: A1