CN107368740A - Detection method and system for executable code in data files - Google Patents
Detection method and system for executable code in data files
- Publication number: CN107368740A
- Authority: CN (China)
- Prior art keywords: executable code, sample, detection, file, detection result
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Storage Device Security (AREA)
Abstract
The invention discloses a detection method and system for executable code in data files. The method is: 1) parse each sample to be analyzed and identify the file format of the sample; 2) determine whether the file format is a configured safe file format; if not, select the decoding methods corresponding to the file format and decode the file content of the sample with each of them, obtaining several copies of decoded data content; 3) select the executable code detection schemes corresponding to the file format of the sample and apply each of them to every copy of decoded data content obtained in step 2); 4) determine from the detection results of step 3) whether executable code is present in the sample. The invention can effectively detect unknown malicious code hidden in data files.
Description
Technical field
The invention belongs to the technical field of malicious code detection, and in particular relates to a detection method and system for executable code in data files.
Background
With the continuous development of society, computer networks are applied ever more widely in every field. At the same time, the harm caused by malicious code attacks is growing more serious. In particular, attacks that hide executable code in data files such as Word documents, PowerPoint documents and PDF documents are increasingly common and have become a mainstream technique of the new generation of attacks, seriously affecting public safety and information confidentiality. Detecting executable code in data files is therefore necessary.
Current techniques for detecting executable code in data files usually use the following methods:
1. Static binary scanning of the data file, comparing its content against the known malicious code signatures in a signature database. This method can only detect known malicious code, whereas the executable code in data files often includes unknown malicious code, metamorphic malicious code, or even special-purpose malicious code, and often attacks software vulnerabilities of a specific environment or 0-day vulnerabilities. In those cases the detection capability of this method is insufficient.
2. Dynamic debugging of the process that loads the data file, analyzing the exceptions that may arise during loading. With this method it is hard to tell whether an exception comes from executable code in the data file or from the loading process itself. The analysis also requires a large amount of expert manual intervention, and the anti-debugging techniques that may be contained in the executable code must be handled as well. The method is therefore difficult to operate in practice and its accuracy is low.
3. Running the data file in a sandbox, dynamically analyzing its execution, extracting behavioral features and comparing them with a behavior whitelist. The trigger conditions of executable code in a data file depend heavily on the runtime environment, because such code usually targets a specific vulnerability in a specific version of a specific host application. To improve the accuracy of the analysis, a large number of virtual environments must be prepared and a large number of repeated tests run, so both space complexity and time complexity are high.
In summary, the main shortcomings of current methods for detecting executable code in data files are: insufficient detection capability for unknown attack code, the need for a large amount of expert manual analysis, high space and time complexity, and insufficient accuracy.
Summary of the invention
In view of the technical problems in the prior art, the object of the invention is to provide a detection method and system for executable code in data files. The method first identifies the file type of the sample data file, and then applies a combination of detection schemes chosen according to that file type. These detection schemes include embedded executable module detection, relocation instruction detection, special sensitive instruction detection, sensitive API call detection, sensitive string detection, embedded executable script detection, and so on. Finally, the detection results of all schemes are analyzed together to assess whether executable code is present in the data file, and the analysis result is recorded and reported.
Applying these detection schemes directly to the raw data file is not appropriate, however. On one hand, the final form of a data file in memory depends on how the host program parses and lays out the file. On the other hand, the malicious code may hide itself using various encodings. The method therefore does two things. First, it parses the file content according to the file format to obtain the actual in-memory layout of the file, that is, the memory content including layout information. For example, an Office program does not display a doc document simply by reading it into memory; it parses, decodes and lays out the document according to the format standard. Second, it applies multiple decoding methods to counter the common encoding and encryption techniques that malicious code may use. The known detection methods listed above are then applied on this basis to identify malicious code fragments, so that existing detection schemes become applicable to data files.
A detection method and system for executable code in data files, with the following steps:
1) Call the file format analysis module to parse the sample file: identify the file format of the sample file from its file features, and parse its content following the standard processing flow of that format, so as to obtain the layout the sample would have in memory if it were actually opened. File format analysis modules exist as extension plug-ins, and plug-ins can be added at any time;
2) According to the sample file format, call the data decoding module to transform and decode the sample content in multiple ways, obtaining multiple copies of decoded data content, in order to counter the common encodings that malicious code uses to hide itself. Decoding algorithms include bitwise XOR, ASCII-to-binary conversion, and so on. Data decoding modules exist as extension plug-ins, and plug-ins can be added at any time;
3) Call the executable code detection modules corresponding to the sample file format to examine the original sample content and the multiple copies of decoded data content. Typical schemes include embedded executable module detection, relocation instruction detection, special sensitive instruction detection, sensitive API call detection, sensitive string detection, embedded executable script detection, and so on. Executable code detection modules exist as extension plug-ins, and plug-ins can be added at any time. That is, for the same original sample content, N detection methods are each applied to the M copies of decoded data content corresponding to that content, yielding N*M detection results. This is a static scan of the memory content, and it performs better than the dynamic instruction-tracing analysis mentioned in the background.
4) Based on the detection results of the multiple detection schemes of the executable code detection module, analyze and assess whether executable code is present in the sample data file; if so, record the result and report it to the user. The invention considers any binary executable code in a data file to be dangerous, so no further analysis is performed (if the actual intent of the executable code needs to be analyzed further, known analysis methods can be combined). In theory a data file should not contain executable code; if binary executable code is present, the sample very likely intends to get that code executed through a vulnerability exploit.
Furthermore, for data file samples of different formats, different schemes can be selected in a targeted way during executable code detection. The same scheme can also have different detection strategies for samples of different formats. For example, the embedded executable script detection scheme focuses on scanning embedded JavaScript for data file samples in PDF format, and on scanning VBA scripts for data file samples in DOC format.
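The N*M relationship between detection methods and decoded copies described in step 3) can be illustrated with a minimal Python sketch. The plug-in types, the function name `run_detection_matrix` and the toy plug-ins are assumptions made for illustration; the patent does not specify a plug-in interface.

```python
from typing import Callable, Dict, Tuple

# Hypothetical plug-in types: a decoder maps bytes to bytes, a detector
# maps bytes to True/False ("an indicator of executable code was found").
Decoder = Callable[[bytes], bytes]
Detector = Callable[[bytes], bool]


def run_detection_matrix(
    content: bytes,
    decoders: Dict[str, Decoder],
    detectors: Dict[str, Detector],
) -> Dict[Tuple[str, str], bool]:
    """Apply each of N detectors to each of M decoded copies of `content`."""
    decoded = {name: decode(content) for name, decode in decoders.items()}
    return {
        (det_name, dec_name): detect(data)
        for det_name, detect in detectors.items()
        for dec_name, data in decoded.items()
    }


# Toy plug-ins, for illustration only.
decoders = {
    "identity": lambda b: b,  # the original content counts as one copy
    "xor_0x20": lambda b: bytes(x ^ 0x20 for x in b),
}
detectors = {
    "has_mz": lambda b: b"MZ" in b,
    "has_loadlibrary": lambda b: b"LoadLibrary" in b,
}

results = run_detection_matrix(b"...LoadLibraryA...", decoders, detectors)
```

With two decoders and two detectors, `results` holds four entries, one per (detector, decoded copy) pair. New plug-ins are added by extending the two dictionaries.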
The advantages of the invention are as follows:
1. After identifying the sample file format, the invention applies a combination of different detection schemes chosen for that file format, so the scanning process is highly targeted.
2. When detecting executable code, the invention matches not only on the instruction content of the executable code itself but also on the way the executable code is implanted and on the features of that implantation. It can therefore effectively detect unknown malicious code hidden in data files, such as 0-day attack code, metamorphic malicious code and special-purpose malicious code.
3. The invention is based on static scanning and does not need to monitor the triggering and execution of executable code, so it has higher detection performance and lower space and time complexity.
4. The file format analysis module, data decoding module and executable code detection module of the invention all exist as extension plug-ins that can be added at any time, which gives the system high extensibility.
Brief description of the drawings
Fig. 1 is a flow chart of the detection method for executable code in data files according to the invention.
Detailed description
The technical solution of the invention is described in detail below with reference to the accompanying drawing.
As shown in Fig. 1, a detection method for executable code in data files includes the following steps:
1. Prepare the sample files to be analyzed, and configure plug-ins such as the sample file format analysis module, the sample data decoding module and the executable code detection module.
2. Select one sample to be analyzed, call the file format analysis module to parse the format of the sample file, and identify the sample file format.
In this step, the sample file format analysis module exists as an extension plug-in and can be extended by the user. The file format can be identified by several methods, such as extension identification, file header feature identification and file content feature identification.
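A minimal Python sketch of file header identification with extension identification as a fallback is given below. The byte signatures are the well-known leading bytes of each format; the function name `identify_format`, the returned labels and the choice to trust the header over the extension are assumptions made for illustration.

```python
import os

# Well-known leading byte signatures ("magic numbers") of common formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy DOC/XLS/PPT container
    (b"PK\x03\x04", "zip"),                          # also DOCX/XLSX/PPTX
    (b"{\\rtf", "rtf"),
    (b"\xff\xd8\xff", "jpg"),
]


def identify_format(data: bytes, filename: str = "") -> str:
    """Identify a format from the file header, falling back to the extension."""
    for magic, fmt in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return fmt
    # Extension identification is only a fallback, since it is trivial to spoof.
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext or "unknown"
```

For example, a file named `x.doc` whose content starts with `%PDF-` is reported as `pdf`, because the header is checked before the extension.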
3. If the file format identified in step 2 is not a safe format, go to step 4. If it is a safe format, end the analysis of this sample, return to step 2 and select another sample to be analyzed. If all sample files prepared in step 1 have been analyzed, end the whole detection process.
In this step, a safe file format is a file format that the user trusts completely and has indicated need not be analyzed, such as txt or jpg. Safe formats are configured by the user.
4. Call the data decoding module to transform and decode the sample file content in multiple ways, obtaining multiple copies of decoded data content.
In this step, the data decoding module exists as an extension plug-in and can be extended by the user. Its decoding algorithms include bitwise XOR, ASCII-to-binary conversion and others, with the purpose of countering the transformation and encryption of executable code.
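The two decoding algorithms named above can be sketched in Python as follows. This is an illustration under stated assumptions: the XOR keys, the minimum hex run length of 16 digits and all function names are hypothetical, and "ASCII-to-binary conversion" is interpreted here as converting runs of ASCII hex digits to the bytes they encode.

```python
import binascii
import re
from typing import Dict


def xor_decode(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


# A run of at least 16 hex digits (8 encoded bytes); the length is an assumption.
_HEX_RUN = re.compile(rb"(?:[0-9A-Fa-f]{2}){8,}")


def hex_runs_to_binary(data: bytes) -> bytes:
    """Replace each long run of ASCII hex digits with the bytes it encodes."""
    return _HEX_RUN.sub(lambda m: binascii.unhexlify(m.group(0)), data)


def decode_variants(data: bytes, xor_keys=(0x01, 0xFF)) -> Dict[str, bytes]:
    """Return the original content plus one decoded copy per decoding method."""
    variants = {"original": data, "hex": hex_runs_to_binary(data)}
    for key in xor_keys:
        variants[f"xor_{key:#04x}"] = xor_decode(data, key)
    return variants
```

Each entry of the returned dictionary is one of the "multiple copies of decoded data content" that the detection schemes later scan. A real decoder set would likely cover more keys and encodings.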
5. Select a suitable combination of executable code detection schemes for the file format of the sample.
In this step, the selected executable code detection scheme is a combination of several different detection modules chosen specifically for the sample file format, such as the embedded executable module scanning module and the relocation instruction scanning module. Depending on the sample format, the same detection scheme can be configured with different detection strategies. For example, the script types examined by the embedded executable script detection scheme vary with the sample type: for data file samples in PDF format the scan focuses on embedded JavaScript, and for data file samples in DOC format it focuses on VBA scripts.
In addition, depending on user configuration, script code extracted during scanning can be analyzed further by simulated execution. Specifically, a user-configured script parsing engine is first selected according to the script type (for JavaScript, for example, a script parsing engine modified from the Google V8 engine may be chosen). The engine then loads the script and parses its semantics to simulate execution, and malicious script behavior is detected during the simulated execution. Simulated execution does not actually open the sample file; it only verifies that the detected code fragment really is an executable script that conforms to the language semantics. The executable code detection modules exist as extension plug-ins and can be extended by the user.
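The idea of one detection scheme with format-specific strategies can be sketched in Python as follows. The marker lists are illustrative assumptions rather than content of the patent: `/JavaScript` and `/JS` are the PDF names that introduce JavaScript actions, `_VBA_PROJECT` is a stream name found (as UTF-16LE) in OLE files that carry VBA projects, and `Attribute VB_Name` appears in decompressed VBA source. The sketch also undoes `#xx` hex escapes in PDF names over the whole buffer, which is a crude simplification.

```python
import re
from typing import Dict, List

# Hypothetical per-format strategies for the embedded script detection scheme.
SCRIPT_MARKERS: Dict[str, Dict[str, bytes]] = {
    "pdf": {"/JavaScript": b"/JavaScript", "/JS": b"/JS"},
    "doc": {
        "_VBA_PROJECT": "_VBA_PROJECT".encode("utf-16-le"),
        "Attribute VB_Name": b"Attribute VB_Name",
    },
}

# PDF names may hide characters as #xx hex escapes, e.g. /J#61vaScript.
_PDF_NAME_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_pdf_names(data: bytes) -> bytes:
    """Replace every #xx escape with the byte it encodes (applied to all data)."""
    return _PDF_NAME_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def scan_embedded_script(data: bytes, fmt: str) -> List[str]:
    """Return the labels of the script markers found, using the strategy for `fmt`."""
    if fmt == "pdf":
        data = normalize_pdf_names(data)
    markers = SCRIPT_MARKERS.get(fmt, {})
    return [label for label, marker in markers.items() if marker in data]
```

A marker hit only shows that a script is probably present. Extracting the script and checking it by simulated execution, as described above, would require a script engine and is not shown here.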
6. Use the executable code detection schemes selected in step 5 to examine the sample content and the decoded data content generated in step 4.
In this step, typical executable code detection schemes are embedded executable module scanning, relocation instruction scanning, special instruction scanning, sensitive API scanning, sensitive string scanning, embedded script scanning, and so on. Embedded executable module scanning mainly scans for PE header format features to detect whether the sample contains an embedded PE structure. Relocation instruction scanning targets several typical relocation loading patterns, such as the call/pop and fnstenv/pop instruction combinations, to detect suspicious shellcode. Special instruction scanning mainly detects sensitive instructions commonly used in shellcode, such as instructions that read the TEB or PEB address through the fs register. Sensitive API scanning relies on the fact that shellcode usually looks up and calls system exported functions by API name: it scans the data file for hidden typical API name strings, such as LoadLibrary and GetProcAddress, as supporting evidence that related executable code is present. Sensitive string scanning detects sensitive strings embedded in the sample (such as URLs) as supporting evidence that related executable code is present (such as code that accesses the network). Embedded script scanning mainly scans for executable scripts embedded in the data file, such as VBScript and JavaScript.
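Several of these scans can be sketched in Python as simple byte-pattern searches. The sketch assumes 32-bit x86 shellcode and uses only the most literal form of each idiom: `E8 00 00 00 00` (call to the next instruction) or `D9 74 24 F4` (fnstenv [esp-0xC]) immediately followed by a `pop` of a 32-bit register, and `64 A1 30 00 00 00` / `64 A1 18 00 00 00` for reading fs:[0x30] (PEB) and fs:[0x18] (TEB). The function and table names are hypothetical, and real shellcode uses many more variants than are matched here.

```python
import re
import struct
from typing import Dict, List


def find_embedded_pe(data: bytes) -> List[int]:
    """Offsets of 'MZ' headers whose e_lfanew field points at a 'PE\\0\\0' signature."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(data):
            # e_lfanew is a 32-bit little-endian offset stored at 0x3C in the DOS header.
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe = pos + e_lfanew
            if data[pe:pe + 4] == b"PE\x00\x00":
                hits.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return hits


POP_R32 = rb"[\x58-\x5f]"  # pop eax .. pop edi
GETPC_PATTERNS = {
    "call/pop": re.compile(re.escape(b"\xe8\x00\x00\x00\x00") + POP_R32),
    "fnstenv/pop": re.compile(re.escape(b"\xd9\x74\x24\xf4") + POP_R32),
}
FS_READS = {
    "fs:[0x30] (PEB)": b"\x64\xa1\x30\x00\x00\x00",  # mov eax, fs:[0x30]
    "fs:[0x18] (TEB)": b"\x64\xa1\x18\x00\x00\x00",  # mov eax, fs:[0x18]
}
API_NAMES = [b"LoadLibrary", b"GetProcAddress"]


def scan_indicators(data: bytes) -> Dict[str, List[str]]:
    """Run each scan over `data` and return the hits grouped by scheme."""
    return {
        "embedded_pe": [hex(off) for off in find_embedded_pe(data)],
        "relocation_instruction": [n for n, p in GETPC_PATTERNS.items() if p.search(data)],
        "special_instruction": [n for n, b in FS_READS.items() if b in data],
        "sensitive_api": [n.decode("ascii") for n in API_NAMES if n in data],
    }
```

Because short byte patterns also occur by chance in ordinary data, hits from the instruction and string scans are weak evidence on their own, which is why the results are assessed together in the next step.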
7. Based on the scan results of the detection schemes in step 6, comprehensively assess whether the sample contains executable code.
In this step, the detection results of the different detection schemes in step 6 have different accuracy, especially because data itself may be mistakenly interpreted as code. For example, if an embedded PE structure or an embedded executable script is detected, it can be concluded that the sample contains executable code; but if only sensitive strings or individual special instructions are detected, it cannot be fully concluded that the sample contains executable code. The detection results of the different schemes therefore need to be assessed and analyzed together to reach a final conclusion. The assessment method is: each detection result of each detection scheme is given a score according to the scheme category, and when the cumulative score exceeds a threshold, the sample data file is judged to contain executable code. The threshold serves as the detection sensitivity of the invention and can be set by the user. Alternatively, the sample is judged to contain executable code as soon as any single detection result finds executable code.
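Both assessment methods can be sketched in a few lines of Python. The score values and the default threshold are invented for illustration; the patent only states that scores differ by scheme category and that the threshold is set by the user.

```python
from typing import Iterable

# Hypothetical per-scheme scores: strong indicators alone exceed the threshold,
# weak indicators need to accumulate.
SCHEME_SCORES = {
    "embedded_pe": 100,
    "embedded_script": 100,
    "relocation_instruction": 40,
    "special_instruction": 20,
    "sensitive_api": 20,
    "sensitive_string": 10,
}


def assess_by_score(hits: Iterable[str], threshold: int = 50) -> bool:
    """`hits` lists the scheme name of every positive detection result."""
    total = sum(SCHEME_SCORES.get(scheme, 0) for scheme in hits)
    return total > threshold


def assess_any(hits: Iterable[str]) -> bool:
    """Alternative rule: a single positive detection result is enough."""
    return any(True for _ in hits)
```

With these example values, one embedded PE hit is decisive, a single sensitive string is not, and a relocation instruction hit combined with a special instruction hit (40 + 20) crosses the threshold of 50. Lowering the threshold makes detection more sensitive.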
8. If step 7 determines that the sample contains executable code, record the detection result and report it to the user.
9. Repeat steps 2 to 8 until all sample files prepared in step 1 have been analyzed, then end the whole detection process.
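The control flow of steps 2 to 9 can be summarized in a short Python sketch. The four callables stand in for the plug-in modules and are passed in as parameters; their signatures, the function name `analyze_samples` and the default safe format set are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def analyze_samples(
    samples: Iterable[Tuple[str, bytes]],
    identify: Callable[[bytes, str], str],
    decode: Callable[[bytes], Dict[str, bytes]],
    detect: Callable[[bytes, str], List[str]],
    assess: Callable[[List[str]], bool],
    safe_formats=frozenset({"txt", "jpg"}),
) -> Dict[str, List[str]]:
    """Return a report mapping each flagged sample name to its detection hits."""
    report: Dict[str, List[str]] = {}
    for name, data in samples:                 # steps 2 and 9: one sample at a time
        fmt = identify(data, name)             # step 2: identify the file format
        if fmt in safe_formats:                # step 3: skip user-trusted formats
            continue
        copies = decode(data)                  # step 4: original plus decoded copies
        hits = [h for copy in copies.values()  # steps 5 and 6: scan every copy
                for h in detect(copy, fmt)]
        if assess(hits):                       # step 7: comprehensive assessment
            report[name] = hits                # step 8: record for reporting
    return report
```

Passing the modules in as callables mirrors the plug-in design of the method: format analysis, decoding, detection and assessment can each be replaced without changing the loop.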
With the detection method and system for executable code in data files proposed by the invention, those skilled in the art can select (or add) file format analysis modules, data decoding modules and executable code detection modules as needed, and configure the safe file formats and the detection score threshold themselves, so as to perform targeted executable code detection on data files of many formats and thus carry out malicious code detection with high efficiency and high accuracy.
Although specific embodiments and the accompanying drawing of the invention are disclosed for the purpose of illustration, their purpose is to help the reader understand the content of the invention and implement it accordingly. Those skilled in the art will appreciate that various replacements, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to the content disclosed in the preferred embodiment and the drawing, and the scope of protection claimed by the invention is defined by the claims.
Claims (10)
1. A detection method for executable code in data files, comprising the steps of:
1) parsing each sample to be analyzed and identifying the file format of the sample;
2) determining whether the file format is a configured safe file format, and if not, selecting the decoding methods corresponding to the file format and decoding the file content of the sample with each of them, obtaining several copies of decoded data content;
3) selecting the executable code detection schemes corresponding to the file format of the sample and applying each of them to every copy of decoded data content obtained in step 2);
4) determining from the detection results of step 3) whether executable code is present in the sample.
2. The method of claim 1, wherein the method of determining from the detection results of step 3) whether executable code is present in the sample is: scoring each detection result according to the type of its executable code detection scheme, then accumulating the detection result scores of all executable code detection schemes, and if the accumulated score exceeds a set threshold, judging that the sample contains executable code.
3. The method of claim 1, wherein the method of determining from the detection results of step 3) whether executable code is present in the sample is: if the detection result of any executable code detection scheme is that executable code is present, judging that the sample contains executable code.
4. The method of claim 1, wherein the executable code detection schemes include embedded executable module detection, relocation instruction detection, special sensitive instruction detection, sensitive API call detection, sensitive string detection and embedded executable script detection.
5. The method of claim 1, wherein the file format of the sample is identified using an extension identification method, a file header feature identification method or a file content feature identification method.
6. The method of any one of claims 1 to 5, wherein each executable code detection scheme is configured with a detection strategy.
7. A detection system for executable code in data files, comprising a sample file format analysis module, a sample data decoding module and an executable code detection module; wherein
the sample file format analysis module is configured to parse each sample to be analyzed, identify the file format of the sample, and determine whether the file format is a configured safe file format;
the sample data decoding module is configured to select the decoding methods corresponding to the file format and decode the file content of the sample with each of them, obtaining several copies of decoded data content;
the executable code detection module is configured to select the executable code detection schemes corresponding to the file format of the sample, apply each of them to every copy of decoded data content, and determine from the detection results whether executable code is present in the sample.
8. The system of claim 7, wherein each executable code detection scheme is configured with a detection strategy.
9. The system of claim 7 or 8, wherein the executable code detection module scores each detection result according to the type of its executable code detection scheme, then accumulates the detection result scores of all executable code detection schemes, and if the accumulated score exceeds a set threshold, judges that the sample contains executable code.
10. The system of claim 7 or 8, wherein if the detection result of any executable code detection scheme is that executable code is present, the executable code detection module judges that the sample contains executable code.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610313139.0A CN107368740B (en) | 2016-05-12 | 2016-05-12 | Detection method and system for executable codes in data file |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610313139.0A CN107368740B (en) | 2016-05-12 | 2016-05-12 | Detection method and system for executable codes in data file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107368740A | 2017-11-21 |
CN107368740B | 2020-10-27 |
Family
ID=60303869
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610313139.0A Active CN107368740B (en) | 2016-05-12 | 2016-05-12 | Detection method and system for executable codes in data file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107368740B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113076540A (en) * | 2021-04-16 | 2021-07-06 | 顶象科技有限公司 | Attack detection method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102043915A (en) * | 2010-11-03 | 2011-05-04 | 厦门市美亚柏科信息股份有限公司 | Method and device for detecting malicious code contained in non-executable file |
CN102902915A (en) * | 2012-09-29 | 2013-01-30 | 北京奇虎科技有限公司 | System for detecting behavior feature of file |
CN103546448A (en) * | 2012-12-21 | 2014-01-29 | 哈尔滨安天科技股份有限公司 | Network virus detection method and system based on format parsing |
CN104506495A (en) * | 2014-12-11 | 2015-04-08 | 国家电网公司 | Intelligent network APT attack threat analysis method |
Also Published As
Publication number | Publication date |
---|---|
CN107368740B (en) | 2020-10-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9032516B2 (en) | System and method for detecting malicious script | |
Kapravelos et al. | Revolver: An automated approach to the detection of evasive web-based malware | |
US8549635B2 (en) | Malware detection using external call characteristics | |
CN111639337B (en) | Unknown malicious code detection method and system for massive Windows software | |
Carmony et al. | Extract Me If You Can: Abusing PDF Parsers in Malware Detectors. | |
Saxena et al. | FLAX: Systematic Discovery of Client-side Validation Vulnerabilities in Rich Web Applications. | |
Lu et al. | De-obfuscation and detection of malicious PDF files with high accuracy | |
US20070152854A1 (en) | Forgery detection using entropy modeling | |
KR100503387B1 (en) | Method to decrypt and analyze the encrypted malicious scripts | |
CN103279710B (en) | Method and system for detecting malicious codes of Internet information system | |
Liang et al. | A behavior-based malware variant classification technique | |
Van Overveldt et al. | FlashDetect: ActionScript 3 malware detection | |
CN105245495A (en) | Similarity match based rapid detection method for malicious shellcode | |
Aebersold et al. | Detecting obfuscated javascripts using machine learning | |
Tellenbach et al. | Detecting obfuscated JavaScripts from known and unknown obfuscators using machine learning | |
CN116366377A (en) | Malicious file detection method, device, equipment and storage medium | |
Bai et al. | Dynamic k-gram based software birthmark | |
Layton et al. | Authorship analysis of the Zeus botnet source code | |
KR101461051B1 (en) | Method for detecting malignant code through web function analysis, and recording medium thereof | |
CN107368740A (en) | A kind of detection method and system for being directed to executable code in data file | |
US8627099B2 (en) | System, method and computer program product for removing null values during scanning | |
CN111131223B (en) | Test method and device for click hijacking | |
US10515219B2 (en) | Determining terms for security test | |
Wrench et al. | Detecting derivative malware samples using deobfuscation-assisted similarity analysis | |
Seng et al. | Automating penetration testing within an ambiguous testing environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||