CN107368740A - Detection method and system for executable code in data files - Google Patents

Detection method and system for executable code in data files

Info

Publication number
CN107368740A
Authority
CN
China
Prior art keywords
executable code
sample
detection
file
testing result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610313139.0A
Other languages
Chinese (zh)
Other versions
CN107368740B (en
Inventor
聂眉宁
应凌云
苏璞睿
冯登国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN201610313139.0A
Publication of CN107368740A
Application granted
Publication of CN107368740B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a method and system for detecting executable code in data files. The method comprises: 1) parsing each sample to be analyzed and identifying the file format of the sample; 2) determining whether the file format is a configured safe file format and, if it is not, decoding the file content of the sample with each of several decoding methods selected according to the file format, yielding several copies of decoded data content; 3) applying each of several executable code detection schemes, selected according to the file format of the sample, to every copy of decoded data content obtained in step 2); 4) determining from the detection results of step 3) whether executable code is present in the sample. The invention can effectively detect unknown malicious code hidden in data files.

Description

Detection method and system for executable code in data files
Technical field
The invention belongs to the technical field of malicious code detection, and in particular relates to a method and system for detecting executable code in data files.
Background
With the continuous development of society, computers and networks are applied ever more widely in every field. At the same time, the harm caused by malicious code attacks grows more serious. In particular, attacks that hide executable code in data files such as Word documents, PowerPoint documents, and PDF documents have become increasingly common and are now a mainstream technique of the new generation of attacks, seriously affecting public security and information confidentiality. Detecting executable code in data files is therefore necessary.
Current techniques for detecting executable code in data files generally use the following methods:
1. Static binary scanning of the data file, comparing its content against the known malicious code signatures in a signature database. This method can only detect known malicious code, whereas the executable code in data files often includes unknown malicious code, metamorphic malicious code, or even special-purpose malicious code, and often attacks software vulnerabilities of a specific environment or 0-day vulnerabilities. In these cases the detection capability of this method is insufficient.
2. Dynamic debugging of the process that loads the data file, analyzing the exceptions that may arise while it runs. With this method it is hard to tell whether an exception was caused by executable code in the data file or by the loading process itself. The analysis also needs a large amount of expert manual intervention and must cope with anti-debugging techniques that the executable code may contain. It is therefore difficult to carry out in practice, and its accuracy is low.
3. Running the data file in a sandbox, dynamically analyzing its execution, extracting behavioral features, and comparing them against a behavior whitelist. The trigger conditions of executable code in data files depend heavily on the runtime environment, because such code usually targets a specific vulnerability in a specific version of a specific host program. To improve the accuracy of the analysis, a large number of virtual environments must be prepared and many repeated tests run, so the space and time complexity are high.
In summary, the main shortcomings of current methods for detecting executable code in data files are: insufficient ability to detect unknown attack code, the need for a large amount of expert manual analysis, high space and time complexity, and insufficient accuracy.
Summary of the invention
In view of the technical problems in the prior art, the object of the invention is to provide a method and system for detecting executable code in data files. The method first identifies the file type of the sample data file and then applies a set of detection schemes chosen according to that file type. The detection schemes include embedded executable module detection, relocation instruction detection, special sensitive instruction detection, sensitive API call detection, sensitive string detection, embedded executable script detection, and so on. Finally, the results of all schemes are analyzed together to assess whether the data file contains executable code, and the analysis result is recorded and reported. Applying these detection schemes directly to a data file is not appropriate, however: on the one hand, the final form of a data file in memory depends on how the host program parses and lays out the file; on the other hand, the malicious code may itself use various encodings to hide. The method therefore, on the one hand, parses the file content according to the file format to obtain the file's actual layout in memory, that is, memory content that includes layout information. (For a doc document, for example, an Office program cannot display the document merely by reading it into memory; it has to parse, decode, and lay it out according to the format standard.) On the other hand, the method applies several decoding methods to counter the common encryption techniques that malicious code may use. The known detection methods above are then applied on this basis to identify malicious code fragments, which makes the existing detection schemes applicable to data files.
A method and system for detecting executable code in data files, with the following steps:
1) Call the file format analysis module to parse the sample file: identify the file format of the sample from file features, and parse the file content according to the standard processing flow of that format, to obtain the layout the sample would have in memory if it were actually opened. The file format analysis module exists as an extension plug-in, and plug-ins can be added at any time;
2) According to the sample file format, call the data decoding module to decode the sample content in several ways, obtaining multiple copies of decoded data content, in order to counter the encodings that malicious code commonly uses to hide itself. Decoding algorithms include bitwise XOR, ASCII-to-binary conversion, and so on. The decoding module exists as an extension plug-in, and plug-ins can be added at any time;
3) Call the executable code detection module corresponding to the sample file format to examine the original sample content and the decoded copies. Typical schemes include embedded executable module detection, relocation instruction detection, special sensitive instruction detection, sensitive API call detection, sensitive string detection, embedded executable script detection, and so on. The executable code detection module exists as an extension plug-in, and plug-ins can be added at any time. That is, for one original sample content, each of N detection methods is applied to each of the M copies of decoded data content derived from it, giving N*M detection results. This is a static scan of memory content, so it performs better than the dynamic, instruction-by-instruction tracing analysis described in the background;
4) Based on the detection results of the detection schemes of the executable code detection module, analyze and assess whether the sample data file contains executable code; if it does, record the result and report it to the user. The invention considers any binary executable code in a data file to be dangerous, so it performs no further analysis (if the actual intent of the executable code needs to be analyzed further, known analysis methods can be combined with it). In theory a data file should contain no executable code; if binary executable code is present, the sample very likely intends to have that code executed by exploiting a vulnerability.
Furthermore, different schemes can be selected for data file samples of different formats during executable code detection, and the same scheme can apply different detection policies to samples of different formats. For example, the embedded executable script detection scheme focuses on embedded JavaScript scripts for samples in PDF format and on VBA scripts for samples in DOC format.
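The N*M arrangement described in step 3) can be sketched as follows. This is a minimal illustration in Python and not the patent's implementation: the function name `run_matrix`, the plug-in dictionaries, and the two toy decoders and detectors are hypothetical names chosen for this example. An identity decoder stands in for the original sample content.

```python
from typing import Callable, Dict, Tuple

Decoder = Callable[[bytes], bytes]
Detector = Callable[[bytes], bool]


def run_matrix(content: bytes,
               decoders: Dict[str, Decoder],
               detectors: Dict[str, Detector]) -> Dict[Tuple[str, str], bool]:
    """Apply each of N detectors to each of M decoded views of one sample."""
    # Build the M decoded views once, then scan every view with every detector.
    views = {name: decode(content) for name, decode in decoders.items()}
    return {(det_name, view_name): detect(view)
            for view_name, view in views.items()
            for det_name, detect in detectors.items()}


# Hypothetical plug-ins, for illustration only.
decoders = {
    "identity": lambda b: b,                           # the original content
    "xor_0x41": lambda b: bytes(x ^ 0x41 for x in b),  # single-byte XOR
}
detectors = {
    "mz_marker": lambda b: b"MZ" in b,
    "api_name": lambda b: b"LoadLibrary" in b,
}

# A toy sample whose payload is hidden with a single-byte XOR.
sample = bytes(x ^ 0x41 for x in b"...LoadLibraryA...")
results = run_matrix(sample, decoders, detectors)
```

With two decoders and two detectors the result holds four entries, and only the `api_name` detector on the `xor_0x41` view reports a hit, which is the situation the decoding step is meant to expose.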
The advantages of the invention are as follows:
1. On the basis of identifying the sample file format, the invention applies a combination of detection schemes chosen for each file format, so the scanning process is highly targeted.
2. When detecting executable code, the invention matches not only features of the instruction content of the executable code itself but also the way executable code is implanted and the features of that implantation, so it can effectively detect unknown malicious code hidden in data files, such as 0-day attack code, metamorphic malicious code, and special-purpose malicious code.
3. The invention is based on static scanning and does not need to monitor the triggering and execution of the executable code, so it has higher detection performance and lower space and time complexity.
4. The file format analysis module, data decoding module, and executable code detection module of the invention all exist as extension plug-ins that can be added at any time, which gives high extensibility.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention for detecting executable code in data files.
Detailed description
The technical solution of the invention is described in detail below with reference to the accompanying drawing:
As shown in Fig. 1, a method for detecting executable code in data files comprises the following steps:
1. Prepare the sample files to be analyzed, and configure plug-ins such as the sample file format analysis module, the sample data decoding module, and the executable code detection module.
2. Select a sample to be analyzed and call the file format analysis module to parse the sample file and identify its file format.
In this step, the sample file format analysis module exists as an extension plug-in and can be extended by the user. The file format can be identified by several methods, such as file extension identification, file header feature identification, and file content feature identification.
3. If the file format identified in step 2 is not a safe format, proceed to step 4. If it is a safe format, end the analysis of this sample, return to step 2, and select another sample to be analyzed. A "safe format" is a file format that the user has marked as not needing analysis. If all the sample files prepared in step 1 have been analyzed, end the whole detection process.
In this step, a safe file format is a file format that the user trusts completely, such as txt or jpg, and is configured by the user.
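Steps 2 and 3 can be sketched as a small Python example. It is an illustration under stated assumptions rather than the patented module: the names `identify_format`, `needs_analysis`, `MAGIC`, and `SAFE_FORMATS` are hypothetical, the signature table lists only a few well-known file header signatures, and file content feature identification is left out for brevity.

```python
from pathlib import Path

# A few well-known leading byte signatures (file header features).
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy DOC/XLS/PPT container
    (b"PK\x03\x04", "zip"),                          # also DOCX/XLSX/PPTX
    (b"\xff\xd8\xff", "jpg"),
    (b"{\\rtf", "rtf"),
]

# Formats the user trusts completely (user-configured; values are examples).
SAFE_FORMATS = {"txt", "jpg"}


def identify_format(name: str, content: bytes) -> str:
    """Prefer the file header signature; fall back to the file extension."""
    for magic, fmt in MAGIC:
        if content.startswith(magic):
            return fmt
    ext = Path(name).suffix.lower().lstrip(".")
    return ext or "unknown"


def needs_analysis(name: str, content: bytes) -> bool:
    """Return False for safe formats, which skip the remaining steps."""
    return identify_format(name, content) not in SAFE_FORMATS
```

Checking the header before the extension is a design choice of this sketch: a PDF renamed to `.txt` is still identified as `pdf` and is therefore not skipped as a safe format.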
4. Call the data decoding module to apply several decoding transformations to the sample file content, obtaining multiple copies of decoded data content.
In this step, the data decoding module exists as an extension plug-in and can be extended by the user. Its decoding algorithms include bitwise XOR, ASCII-to-binary conversion, and others; their purpose is to counter transformation and encryption of the executable code.
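The decoding step can be illustrated with the following Python sketch. The function names and the choice of XOR keys are hypothetical, and "ASCII-to-binary conversion" is interpreted here, as an assumption, as turning runs of ASCII hexadecimal digits back into raw bytes; the patent does not spell out the exact conversion.

```python
import binascii
import re
from typing import Dict, Iterable

# Runs of at least four hex digit pairs written as ASCII text.
HEX_RUN = re.compile(rb"(?:[0-9A-Fa-f]{2}){4,}")


def xor_decode(data: bytes, key: int) -> bytes:
    """Bitwise XOR of every byte with a single-byte key."""
    return bytes(b ^ key for b in data)


def hex_text_decode(data: bytes) -> bytes:
    """Convert runs of ASCII hex digits into raw bytes and join them."""
    return b"".join(binascii.unhexlify(run) for run in HEX_RUN.findall(data))


def decoded_views(data: bytes, xor_keys: Iterable[int] = (0x41, 0xFF)) -> Dict[str, bytes]:
    """Produce several decoded copies of one sample, keyed by decoder name."""
    views = {"raw": data}
    for key in xor_keys:
        views[f"xor_{key:02x}"] = xor_decode(data, key)
    hex_view = hex_text_decode(data)
    if hex_view:  # only keep the view when some hex text was found
        views["hex_text"] = hex_view
    return views


views = decoded_views(b"data 4d5a9000 end")
```

Each entry of `views` is one of the "copies of decoded data content" that the detection schemes scan afterwards; a real decoder plug-in set would presumably try more keys and encodings.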
5. Select a suitable combination of executable code detection schemes for the file format of the sample.
In this step, the executable code detection scheme is a combination of different detection modules selected for the sample file format, such as the embedded executable module scanning module and the relocation instruction scanning module. The same detection scheme can be configured with different detection policies for different sample formats. For example, the script types examined by the embedded executable script detection scheme vary with the sample type: for samples in PDF format it focuses on embedded JavaScript scripts, and for samples in DOC format it focuses on VBA scripts. In addition, depending on user configuration, script code extracted during scanning can be analyzed further by simulated execution. Specifically, a script parsing engine configured by the user is first selected according to the script type (for JavaScript, for example, a parsing engine modified from the Google V8 engine may be chosen); the engine then loads the script and parses its semantics to simulate execution, and malicious script behavior is detected during the simulated execution. Simulated execution does not actually open the sample file; it only verifies that the detected code fragment really is a semantically valid executable script. The executable code detection module also exists as an extension plug-in and can be extended by the user.
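The format-dependent selection of schemes and policies might be configured as in the sketch below. The scheme names, the `select_schemes` function, and the policy keys are hypothetical labels invented for this illustration; only the PDF/JavaScript and DOC/VBA pairing comes from the text.

```python
from typing import Dict, List, Tuple

# Schemes applied to every non-safe format in this sketch.
COMMON_SCHEMES = [
    "embedded_pe",
    "relocation",
    "sensitive_instruction",
    "sensitive_api",
    "sensitive_string",
]

# Script type that the embedded script scheme focuses on, per format.
SCRIPT_FOCUS = {"pdf": "javascript", "doc": "vba"}


def select_schemes(fmt: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (scheme name, detection policy) pairs for one file format."""
    plan: List[Tuple[str, Dict[str, str]]] = [(name, {}) for name in COMMON_SCHEMES]
    if fmt in SCRIPT_FOCUS:
        # Same scheme, different policy depending on the sample format.
        plan.append(("embedded_script", {"script_type": SCRIPT_FOCUS[fmt]}))
    return plan
```

Keeping the policy as data next to the scheme name lets a user add formats or retarget a scheme without touching the scanners themselves, which matches the plug-in style the description emphasizes.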
6. Apply the executable code detection schemes selected in step 5 to the sample content and to the decoded data content generated in step 4.
In this step, typical executable code detection schemes include embedded executable module scanning, relocation instruction scanning, special instruction scanning, sensitive API scanning, sensitive string scanning, and embedded script scanning. Embedded executable module scanning mainly looks for PE header format features to detect embedded PE structures in the sample. Relocation instruction scanning looks for several typical relocation (self-locating) loading patterns, such as the call/pop instruction combination and the fnstenv/pop instruction combination, to detect suspicious shellcode. Special instruction scanning mainly detects sensitive instructions commonly used in shellcode, such as instructions that read the TEB or PEB address through the fs register. Sensitive API scanning relies on the fact that shellcode usually looks up and calls system exported functions by API name: it scans the data file for typical hidden API name strings, such as LoadLibrary and GetProcAddress, as supporting evidence that related executable code is present. Sensitive string scanning detects sensitive strings embedded in the sample (such as URLs) as supporting evidence that related executable code (such as network access code) is present. Embedded script scanning mainly scans for executable scripts embedded in the data file, such as VBScript and JavaScript.
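Several of these scans reduce to byte-pattern searches, as the following Python sketch suggests. It assumes 32-bit x86 shellcode, and the specific patterns are common textbook encodings picked for illustration (`E8 00 00 00 00` followed by `pop r32`, `D9 74 24 F4` for `fnstenv [esp-0xc]`, `64 A1 30 00 00 00` for `mov eax, fs:[0x30]`); the patent does not list concrete byte patterns, and the names `has_embedded_pe` and `scan` are hypothetical.

```python
import re
import struct
from typing import Dict


def has_embedded_pe(data: bytes) -> bool:
    """Look for an MZ header whose e_lfanew field points at a PE signature."""
    pos = data.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(data):
            # e_lfanew is a 4-byte little-endian offset at 0x3C in the DOS header.
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe = pos + e_lfanew
            if data[pe:pe + 4] == b"PE\x00\x00":
                return True
        pos = data.find(b"MZ", pos + 1)
    return False


# call $+5 immediately followed by pop r32 (get-program-counter idiom).
CALL_POP = re.compile(rb"\xe8\x00\x00\x00\x00[\x58-\x5f]")
# fnstenv [esp-0x0c], then pop r32 within a few bytes.
FNSTENV_POP = re.compile(rb"\xd9\x74\x24\xf4.{0,8}?[\x58-\x5f]", re.DOTALL)
# mov eax, fs:[0x18] (TEB) or mov eax, fs:[0x30] (PEB).
FS_ACCESS = re.compile(rb"\x64\xa1[\x18\x30]\x00\x00\x00")
API_NAMES = (b"LoadLibrary", b"GetProcAddress")
URL = re.compile(rb"https?://[\x21-\x7e]{4,}")


def scan(data: bytes) -> Dict[str, bool]:
    """Run the static scans over one (decoded) copy of the sample content."""
    return {
        "embedded_pe": has_embedded_pe(data),
        "relocation": bool(CALL_POP.search(data) or FNSTENV_POP.search(data)),
        "sensitive_instruction": bool(FS_ACCESS.search(data)),
        "sensitive_api": any(name in data for name in API_NAMES),
        "sensitive_string": bool(URL.search(data)),
    }
```

Short patterns like these can also occur by chance in ordinary data, which is why the results are treated as evidence of differing strength rather than as verdicts.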
7. Based on the scan results of the detection schemes in step 6, comprehensively assess whether the sample contains executable code.
In this step, the detection results of the different schemes in step 6 differ in accuracy, particularly because plain data may be mistakenly interpreted as code. For example, if an embedded PE structure or an embedded executable script is detected, the sample can be determined to contain executable code; if only a sensitive string or a few isolated special instructions are detected, it cannot be concluded that the sample contains executable code. The results of the different schemes therefore need to be assessed and analyzed together to reach a final conclusion. In the assessment, each detection result is given a score that depends on the category of its scheme, and the sample data file is determined to contain executable code when the accumulated score exceeds a threshold. The threshold serves as the detection sensitivity of the invention and can be set by the user. Alternatively, the sample can be judged to contain executable code as soon as any single detection result finds executable code.
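Both assessment modes can be sketched in a few lines of Python. The weights and the default threshold below are invented for illustration; the patent only states that scores depend on the scheme category and that the threshold is set by the user.

```python
from typing import Dict

# Hypothetical per-scheme scores: strong evidence scores high, weak evidence low.
WEIGHTS = {
    "embedded_pe": 100,
    "embedded_script": 100,
    "relocation": 60,
    "sensitive_instruction": 30,
    "sensitive_api": 30,
    "sensitive_string": 10,
}


def assess(hits: Dict[str, bool], threshold: int = 50, any_hit: bool = False) -> bool:
    """Decide whether a sample contains executable code.

    hits maps a scheme name to whether that scheme reported a finding.
    With any_hit=True a single finding is enough; otherwise the scores of
    all findings are summed and must exceed the threshold.
    """
    if any_hit:
        return any(hits.values())
    score = sum(WEIGHTS.get(name, 0) for name, hit in hits.items() if hit)
    return score > threshold
```

Under these example weights an embedded PE structure alone is decisive, a lone URL is not, and two medium-strength findings together cross the threshold. Lowering the threshold makes detection more sensitive at the cost of more false positives.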
8. If step 7 determines that the sample contains executable code, record the detection result and report it to the user.
9. Repeat steps 2 to 8 until all the sample files prepared in step 1 have been analyzed, then end the whole detection process.
With the method and system for detecting executable code in data files proposed by the invention, those skilled in the art can select (or add) file format analysis modules, data decoding modules, and executable code detection modules as needed, and configure the safe file formats and the detection score threshold themselves, to perform targeted executable code detection on data files of many formats and thus carry out malicious code detection with high efficiency and accuracy.
Although specific embodiments and the accompanying drawing of the invention are disclosed for the purpose of illustration, in order to help the reader understand the content of the invention and implement it accordingly, those skilled in the art will appreciate that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to what is disclosed in the preferred embodiment and the drawing; the scope of protection is defined by the claims.

Claims (10)

1. A method for detecting executable code in data files, comprising the steps of:
1) parsing each sample to be analyzed and identifying the file format of the sample;
2) determining whether the file format is a configured safe file format and, if it is not, decoding the file content of the sample with each of several decoding methods selected according to the file format, to obtain several copies of decoded data content;
3) applying each of several executable code detection schemes, selected according to the file format of the sample, to every copy of decoded data content obtained in step 2);
4) determining from the detection results of step 3) whether executable code is present in the sample.
2. The method according to claim 1, wherein determining from the detection results of step 3) whether executable code is present in the sample comprises: scoring each detection result according to the type of its executable code detection scheme, accumulating the scores of the detection results of all executable code detection schemes, and determining that the sample contains executable code if the accumulated score exceeds a set threshold.
3. The method according to claim 1, wherein determining from the detection results of step 3) whether executable code is present in the sample comprises: determining that the sample contains executable code if the detection result of any executable code detection scheme is that executable code is present.
4. The method according to claim 1, wherein the executable code detection schemes include embedded executable module detection, relocation instruction detection, special sensitive instruction detection, sensitive API call detection, sensitive string detection, and embedded executable script detection.
5. The method according to claim 1, wherein the file format of the sample is identified by file extension identification, file header feature identification, or file content feature identification.
6. The method according to any one of claims 1 to 5, wherein each executable code detection scheme is configured with a detection policy.
7. A system for detecting executable code in data files, comprising a sample file format analysis module, a sample data decoding module, and an executable code detection module; wherein
the sample file format analysis module is configured to parse each sample to be analyzed, identify the file format of the sample, and determine whether the file format is a configured safe file format;
the sample data decoding module is configured to decode the file content of the sample with each of several decoding methods selected according to the file format, to obtain several copies of decoded data content;
the executable code detection module is configured to apply each of several executable code detection schemes, selected according to the file format of the sample, to every copy of decoded data content, and to determine from the detection results whether executable code is present in the sample.
8. The system according to claim 7, wherein each executable code detection scheme is configured with a detection policy.
9. The system according to claim 7 or 8, wherein the executable code detection module scores each detection result according to the type of its executable code detection scheme, accumulates the scores of the detection results of all executable code detection schemes, and determines that the sample contains executable code if the accumulated score exceeds a set threshold.
10. The system according to claim 7 or 8, wherein the executable code detection module determines that the sample contains executable code if the detection result of any executable code detection scheme is that executable code is present.
CN201610313139.0A 2016-05-12 2016-05-12 Detection method and system for executable codes in data file Active CN107368740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610313139.0A CN107368740B (en) 2016-05-12 2016-05-12 Detection method and system for executable codes in data file

Publications (2)

Publication Number Publication Date
CN107368740A true CN107368740A (en) 2017-11-21
CN107368740B CN107368740B (en) 2020-10-27

Family

ID=60303869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610313139.0A Active CN107368740B (en) 2016-05-12 2016-05-12 Detection method and system for executable codes in data file

Country Status (1)

Country Link
CN (1) CN107368740B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076540A (en) * 2021-04-16 2021-07-06 顶象科技有限公司 Attack detection method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102043915A (en) * 2010-11-03 2011-05-04 厦门市美亚柏科信息股份有限公司 Method and device for detecting malicious code contained in non-executable file
CN102902915A (en) * 2012-09-29 2013-01-30 北京奇虎科技有限公司 System for detecting behavior feature of file
CN103546448A (en) * 2012-12-21 2014-01-29 哈尔滨安天科技股份有限公司 Network virus detection method and system based on format parsing
CN104506495A (en) * 2014-12-11 2015-04-08 国家电网公司 Intelligent network APT attack threat analysis method

Also Published As

Publication number Publication date
CN107368740B (en) 2020-10-27

Similar Documents

Publication Publication Date Title
US9032516B2 (en) System and method for detecting malicious script
Kapravelos et al. Revolver: An automated approach to the detection of evasive web-based malware
US8549635B2 (en) Malware detection using external call characteristics
CN111639337B (en) Unknown malicious code detection method and system for massive Windows software
Carmony et al. Extract Me If You Can: Abusing PDF Parsers in Malware Detectors.
Saxena et al. FLAX: Systematic Discovery of Client-side Validation Vulnerabilities in Rich Web Applications.
Lu et al. De-obfuscation and detection of malicious PDF files with high accuracy
US20070152854A1 (en) Forgery detection using entropy modeling
KR100503387B1 (en) Method to decrypt and analyze the encrypted malicious scripts
CN103279710B (en) Method and system for detecting malicious codes of Internet information system
Liang et al. A behavior-based malware variant classification technique
Van Overveldt et al. FlashDetect: ActionScript 3 malware detection
CN105245495A (en) Similarity match based rapid detection method for malicious shellcode
Aebersold et al. Detecting obfuscated javascripts using machine learning
Tellenbach et al. Detecting obfuscated JavaScripts from known and unknown obfuscators using machine learning
CN116366377A (en) Malicious file detection method, device, equipment and storage medium
Bai et al. Dynamic k-gram based software birthmark
Layton et al. Authorship analysis of the Zeus botnet source code
KR101461051B1 (en) Method for detecting malignant code through web function analysis, and recording medium thereof
CN107368740A (en) A kind of detection method and system for being directed to executable code in data file
US8627099B2 (en) System, method and computer program product for removing null values during scanning
CN111131223B (en) Test method and device for click hijacking
US10515219B2 (en) Determining terms for security test
Wrench et al. Detecting derivative malware samples using deobfuscation-assisted similarity analysis
Seng et al. Automating penetration testing within an ambiguous testing environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant