CN107092829A - Malicious code detection method based on image matching - Google Patents

Malicious code detection method based on image matching

Info

Publication number
CN107092829A
Authority
CN
China
Prior art keywords
sample
family
classification
malicious code
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710265324.1A
Other languages
Chinese (zh)
Other versions
CN107092829B (en)
Inventor
喻波
刘浏
杨强
解炜
唐勇
陈曙晖
方莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN201710265324.1A
Publication of CN107092829A
Application granted
Publication of CN107092829B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562: Static detection

Abstract

The present invention discloses a malicious code detection method based on image matching, comprising the steps of: S1. obtaining training samples of malicious code corresponding to different family categories, converting each training sample into a grayscale image, and extracting the corresponding image texture features; choosing a first baseline sample from the training samples of each family category, choosing second baseline samples according to the difference in image texture features between the first baseline sample and the other samples, and forming the baseline sample set of each family category from its first and second baseline samples; S2. converting the malicious code to be detected into a grayscale image and extracting the corresponding image texture features; S3. matching the image texture features extracted in step S2 against the baseline sample set of each family category, and determining the family category of the malicious code to be detected from the matching results. The invention has the advantages of a simple implementation, strong robustness, and high detection accuracy and detection efficiency.

Description

Malicious code detection method based on image matching
Technical Field
The present invention relates to the technical field of malicious code detection and analysis, and in particular to a malicious code detection method based on image matching.
Background
With the widespread use of automated malicious code generation tools, and the use of open-source code in malicious code, the number of malicious code variants and of new malicious code families is growing rapidly. According to statistics, 430 million malicious code variants are detected per year, and malicious code has become a major challenge for cyberspace security. Traditional malicious code detection methods fall into two kinds. The first is signature-based detection, which quickly detects known malicious code samples but requires substantial expert knowledge and manual analysis, and copes poorly with metamorphic and obfuscated samples. The second is detection based on anomalous behavior, which can detect zero-day malicious code samples and samples from new families, but has a high false positive rate.
Detection methods based on automated analysis can address these problems. Such methods mainly use machine learning to analyze malicious code, generally in three steps: 1. extract malicious code features; 2. select an appropriate model; 3. obtain the classification result. From the standpoint of feature selection, the prior-art methods based on automated analysis are of two types: methods based on static features and methods based on dynamic features. The first type does not need to run the malicious code; it only needs tools such as IDA, PEView and hap-depends to obtain the binary code or opcodes of the malicious code. For example, byte-code N-grams are used as detection features, the feature dimension is reduced by a term-frequency method, and the malicious code is finally classified with a model such as an SVM; alternatively, features combining opcodes and byte code are used, and samples are classified with an ensemble learning method. Compared with static learning, dynamic learning can obtain the behavioral information and purpose of the malicious code more accurately. The second type needs to run the program to obtain its dynamic behavior, and the features finally extracted are both static and dynamic; for example, behavioral knowledge of the malicious code is extracted from the strings and parameter information of API calls and converted into feature vectors.
The above malicious code detection methods based on automated analysis have the following defects:
(1) Poor robustness and low detection accuracy. These methods classify malicious code from extracted features, and different features may yield different detection accuracy. Both the precision of the feature extraction and the choice of features directly affect the accuracy of the final detection and analysis result, so in practice robustness is poor and detection accuracy is low.
(2) Low detection efficiency. These methods are generally complex to implement and usually require a long time for model training, so detection efficiency is low.
Summary of the Invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a malicious code detection method based on image matching that is simple to implement, robust, and high in detection accuracy and detection efficiency.
To solve the above technical problem, the technical solution proposed by the present invention is:
A malicious code detection method based on image matching, comprising the steps of:
S1. Baseline sample selection: obtain training samples of malicious code corresponding to different family categories, convert each training sample into a grayscale image, and extract the corresponding image texture features; choose a first baseline sample from the training samples of each family category, choose second baseline samples according to the difference in image texture features between the first baseline sample and the other samples, and form the baseline sample set of each family category from its first and second baseline samples;
S2. Image feature extraction: convert the malicious code to be detected into a grayscale image and extract the corresponding image texture features;
S3. Code classification test: match the image texture features extracted in step S2 against the baseline sample set of each family category, and determine the family category of the malicious code to be detected from the matching results.
As a further improvement of the present invention, the specific steps for choosing the second baseline samples of each family category in step S1 are:
S11. Candidate reference sample acquisition: match each chosen first baseline sample against the remaining training samples, and from the matching results find the training samples in each family category that are wrongly assigned; these become the candidate reference samples;
S12. Second baseline sample determination: for each candidate reference sample in each family category, calculate the difference value between it and the other candidate reference samples; if the calculated difference value exceeds a specified threshold, take that candidate reference sample as a second baseline sample of the corresponding family category.
As a further improvement of the present invention, step S12 specifically calculates the Gabor function value of each candidate reference sample and the distance value between each candidate reference sample and the other training samples, and from the Gabor function values and the distance values calculates the difference value between the candidate reference sample and the other candidate reference samples.
As a further improvement of the present invention, the difference value between a candidate reference sample and the other candidate reference samples is specifically calculated using:
$$p_d(es_{id}) = \sum_{j=0,1,\ldots,N} D(es_{id}, es_{hj})$$
where $es_{id}$ is the d-th candidate reference sample of class i, $es_{hj}$ is the j-th candidate baseline sample of class h, $D(es_{id}, es_{hj})$ is the distance between sample $es_{id}$ and sample $es_{hj}$, h is the family category to which $es_{id}$ was wrongly assigned, μ is the balance coefficient, N is the number of baseline samples contained in family category h, M is the number of baseline samples, and l is the vector length of the image texture feature.
As a further improvement of the present invention, the image texture features are static texture features of the signal-processing type.
As a further improvement of the present invention, the image texture features are specifically extracted using a Gabor filter.
As a further improvement of the present invention, the specific steps for determining the family category of the malicious code to be detected in step S3 are:
S31. Obtain the matching results between the malicious code to be detected and every baseline sample in the baseline sample set of each family category;
S32. From all the matching results of each family category, obtain the comprehensive matching value of that family category, and judge from the comprehensive matching value of each family category whether the malicious code to be detected belongs to that family category.
As a further improvement of the present invention, the comprehensive matching value is calculated from the matching results between the malicious code to be detected, $es_{test}$, and the baseline samples $es_{ij}$, where $es_{ij}$ is the j-th baseline sample of class i and N is the number of baseline samples contained in family category i.
As a further improvement of the present invention, when the comprehensive matching value R of a target family category satisfies R > 0, the malicious code to be detected is judged to belong to the target family category; otherwise it is judged not to belong to the target family category.
Compared with the prior art, the advantages of the present invention are:
1) The malicious code detection method of the present invention extracts image texture features automatically, performs image matching based on feature similarity analysis, and judges the family category from the image matching results. Detection can therefore be automated, and large-scale malicious code families can be detected and analyzed conveniently and efficiently.
2) During image matching, the method first chooses a first baseline sample for each family category, then chooses second baseline samples according to the difference in image texture features between the first baseline sample and the other samples. The baseline sample set formed by the first and second baseline samples is matched against the malicious code to be detected, which finally determines its family category. No lengthy model training is needed, the selected baseline samples are highly reliable, and the influence of baseline sample selection on the detection result is greatly reduced, which improves detection accuracy.
3) The method further chooses the second baseline samples on the basis of the first baseline sample: samples that fail to match the first baseline sample are taken as candidate reference samples, the difference value between each candidate reference sample and the training samples of the current family category is calculated, and the difference value decides whether the candidate becomes a new baseline sample. Samples that are wrongly assigned and also differ strongly from the other samples thus become baseline samples as well. The implementation is simple and, compared with traditional image matching that picks baseline samples directly, the reliability of baseline sample selection is effectively increased, which further improves the accuracy of malicious code detection.
Brief Description of the Drawings
Fig. 1 is a flow diagram of the malicious code detection method based on image matching of the present embodiment.
Fig. 2 is a schematic diagram of the working principle of the malicious code detection method based on image matching of the present embodiment.
Fig. 3 is a flow diagram of the grayscale image conversion in the present embodiment.
Fig. 4 shows the grayscale images obtained in a specific embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments, which do not limit the scope of protection of the invention.
As shown in Figs. 1 and 2, the malicious code detection method based on image matching of the present embodiment comprises the steps of:
S1. Baseline sample selection: obtain training samples of malicious code corresponding to different family categories, convert each training sample into a grayscale image, and extract the corresponding image texture features; choose a first baseline sample from the training samples of each family category, choose second baseline samples according to the difference in image texture features between the first baseline sample and the other samples, and form the baseline sample set of each family category from its first and second baseline samples;
S2. Image feature extraction: convert the malicious code to be detected into a grayscale image and extract the corresponding image texture features;
S3. Code classification test: match the image texture features extracted in step S2 against the baseline sample set of each family category, and determine the family category of the malicious code to be detected from the matching results.
Image texture reflects the visual characteristics of homogeneous phenomena in an image and captures the slowly varying or periodically varying arrangement of surface structure on an object. The present embodiment exploits this property of texture to detect malicious code by image matching: image texture features are extracted automatically, image matching is performed based on feature similarity analysis, and the family category is judged from the image matching results. Detection can therefore be automated. The method can be applied conveniently in back-end classification systems for PCs, mobile clients and the like, or in homology analysis of large-scale malicious code families, and can efficiently mine malicious code from a massive number of samples to be detected, either online or offline.
During image matching, the present embodiment first chooses a first baseline sample for each family category, then chooses second baseline samples according to the difference in image texture features between the first baseline sample and the other samples. The baseline sample set formed by the first and second baseline samples is matched against the malicious code to be detected, which finally determines its family category. No lengthy model training is needed, so detection efficiency is high; the selected baseline samples are highly reliable, and the influence of baseline sample selection on the detection result is greatly reduced, which improves detection accuracy.
In the present embodiment, the specific steps for choosing the second baseline samples of each family category in step S1 are:
S11. Candidate reference sample acquisition: match each chosen first baseline sample against the remaining training samples, and from the matching results find the training samples in each family category that are wrongly assigned; these become the candidate reference samples;
S12. Second baseline sample determination: for each candidate reference sample in each family category, calculate the difference value between it and the other candidate reference samples; if the calculated difference value exceeds a specified threshold, take that candidate reference sample as a second baseline sample of the corresponding family category.
Traditional image matching directly selects one sample from the sample set of a known category as the baseline sample of that category; if the image of a sample to be detected matches that of the baseline sample, the sample to be detected is judged to belong to the category of the baseline sample. Choosing different samples as the baseline image may therefore give different matching results, so the choice of baseline sample directly affects matching accuracy. When choosing baseline samples, the present embodiment chooses second baseline samples on the basis of the first baseline sample. It first takes the samples that fail to match the first baseline sample as candidate reference samples, then calculates the difference value between each candidate reference sample and the other candidate reference samples, and decides from the difference value whether the candidate becomes a new baseline sample. Samples that are wrongly assigned and also differ strongly from the other candidate reference samples thus become baseline samples as well. The implementation is simple and, compared with traditional image matching that picks baseline samples directly, the reliability of baseline sample selection is effectively increased, which further improves the accuracy of malicious code detection.
In step S1, the present embodiment first performs image conversion on the training samples, converting the binary malicious code into grayscale image form. As shown in Fig. 3, each pixel of a grayscale image is represented by an unsigned integer in [0, 255], so the binary malicious code is first converted into a matrix of unsigned integers. Because 8 bits of binary data convert to an integer from 0 to 255, the binary file is cut and converted in units of 8 consecutive bits. The image width can be adjusted as required, and the grayscale image of each training sample is obtained.
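The conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `bytes_to_gray_image`, the default width of 256 pixels, and the zero padding of the last row are all hypothetical choices that the source does not specify.

```python
import numpy as np

def bytes_to_gray_image(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte (8 bits) to one pixel in [0, 255], filling rows left to right."""
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)  # ceiling division
    # Assumption: the last row is zero-padded when the file size is not a multiple of width.
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)

# Five bytes laid out on a 4-pixel-wide image give a 2 x 4 matrix.
image = bytes_to_gray_image(bytes([0b01100111, 255, 0, 16, 32]), width=4)
print(image)
```

In practice the bytes would come from reading the sample file in binary mode, and the width could be tuned to the file size.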
Image texture features are mainly of four kinds: statistical texture features, model-based texture features, signal-processing texture features, and structural texture features. In the present embodiment, the image texture features are signal-processing texture features: feature extraction uses a signal-processing texture method, and the image texture features are specifically extracted with a Gabor filter. The present embodiment relies on static features, does not need to run the malicious code, and is simple to implement.
A Gabor filter is a linear filter used for image edge feature extraction. It can be defined as a sinusoid multiplied by a Gaussian function; for a two-dimensional Gabor filter the sinusoid is a sinusoidal plane wave. By the convolution theorem, the Fourier transform of the impulse response of a Gabor filter is the convolution of the Fourier transform of its harmonic function and the Fourier transform of the Gaussian function. The filter consists of a real part and an imaginary part, which are orthogonal to each other. The Gabor filter used in the present embodiment is as follows, where the complex form is:
$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\left(2\pi\frac{x'}{\lambda} + \psi\right)\right) \quad (1)$$
The real part is:
$$g_{re}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi\frac{x'}{\lambda} + \psi\right) \quad (2)$$
The imaginary part is:
$$g_{im}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \sin\left(2\pi\frac{x'}{\lambda} + \psi\right) \quad (3)$$
In these formulas, $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$; λ is the wavelength, in pixels; θ is the orientation, with 0° ≤ θ ≤ 360°; ψ is the phase offset, in the range [-180°, 180°]; γ determines the ellipticity of the shape of the Gabor function; σ is the standard deviation of the Gaussian factor of the Gabor function and varies with the bandwidth.
Image feature extraction yields an array of Gabor functions at different frequencies and orientations. When texture features are calculated with Gabor filters, the image texture feature of each sample can be written as $T = ([a_1, a_2], [b_1, b_2], [c_1, c_2], [d_1, d_2])$; that is, the image texture feature consists of four feature values a, b, c and d, each made up of a real part (subscript 1) and an imaginary part (subscript 2).
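A minimal sketch of Gabor-based texture extraction is given below. The kernel follows the standard complex Gabor definition; everything else is an assumption made for illustration, because the source does not state how the four feature values a, b, c and d are obtained. Here they are taken, hypothetically, as the mean absolute real and imaginary responses at four orientations, and the kernel size, wavelength, σ and γ are arbitrary example values.

```python
import numpy as np

def gabor_kernel(size, lam, theta, psi=0.0, sigma=2.0, gamma=0.5):
    """Complex 2-D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    return envelope * np.exp(1j * (2 * np.pi * xr / lam + psi))

def texture_features(image, thetas=(0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Return a 4 x 2 array: one [real, imaginary] pair per orientation (assumed layout)."""
    img = image.astype(float) / 255.0
    feats = []
    for theta in thetas:
        kernel = gabor_kernel(9, lam=4.0, theta=theta)
        # Slide the kernel over every valid position (cross-correlation, no padding).
        windows = np.lib.stride_tricks.sliding_window_view(img, kernel.shape)
        response = np.einsum("ijkl,kl->ij", windows, kernel)
        feats.append([np.abs(response.real).mean(), np.abs(response.imag).mean()])
    return np.array(feats)

demo = np.random.default_rng(0).integers(0, 256, size=(32, 32)).astype(np.uint8)
features = texture_features(demo)
print(features.shape)
```

The input image must be at least as large as the kernel (9 x 9 here); a library filter bank could replace the hand-written kernel if one is available.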
When samples are matched, the difference value of the image texture features between the samples is calculated, and the degree of match is judged from its size: the smaller the difference value of the image texture features between two samples, the better they match. The dissimilarity between two individual samples $s_i$ and $s_j$ is calculated by formula (4).
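The expression of formula (4) is not reproduced in this text, so the sketch below uses the Euclidean distance between flattened feature vectors as an assumed stand-in for the difference value. The feature values are the ones reported for training samples S1 to S4 in the worked example of this document; the helper names `difference` and `match` are hypothetical.

```python
import numpy as np

# Texture features of S1..S4 as listed in the worked example (real/imaginary pairs flattened).
T = {
    "S1": [3.64589196e-01, 1.78531921e-02, 1.11456886e-01, 3.62631582e-03,
           2.45940133e-01, 4.82167451e-03, 3.66851460e-04, 1.85390288e-04],
    "S2": [3.67820753e-01, 2.47166142e-02, 1.12444790e-01, 5.22362168e-03,
           2.48120037e-01, 7.30584538e-03, 3.70103068e-04, 3.69625304e-04],
    "S3": [3.82683113e-01, 1.65478632e-02, 1.16988294e-01, 5.31969120e-03,
           2.58145706e-01, 3.20882018e-03, 3.85057648e-04, 4.53992963e-04],
    "S4": [3.78114609e-01, 2.53183776e-02, 1.15591678e-01, 5.70669572e-03,
           2.55063941e-01, 7.49917029e-03, 3.80460797e-04, 3.41053618e-04],
}

def difference(a, b):
    """Assumed stand-in for formula (4): Euclidean distance between feature vectors."""
    return float(np.linalg.norm(np.ravel(a) - np.ravel(b)))

def match(sample, baselines):
    """Return the family whose baseline sample has the smallest difference value."""
    return min(baselines, key=lambda family: difference(sample, baselines[family]))

# S1 and S3 act as the first baseline samples of family 1 and family 2.
baselines = {1: T["S1"], 2: T["S3"]}
print(match(T["S2"], baselines), match(T["S4"], baselines))
```

A different distance (for example a weighted one) could change which samples count as wrongly assigned, so the choice here should be read as illustrative only.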
In the present embodiment, step S12 specifically calculates the Gabor function values of each candidate reference sample and training sample and the distance value between each candidate reference sample and the other candidate reference samples, and from the Gabor function values and the distance values calculates the difference value between each candidate reference sample and the other candidate reference samples.
In the present embodiment, the difference value between a candidate reference sample $es_{id}$ and the other candidate reference samples is specifically calculated using:
$$p_d(es_{id}) = \sum_{j=0,1,\ldots,N} D(es_{id}, es_{hj})$$
where $es_{id}$ is the d-th candidate reference sample of class i, $es_{hj}$ is the j-th candidate baseline sample of class h, $D(es_{id}, es_{hj})$ is the distance between sample $es_{id}$ and sample $es_{hj}$, h is the family category to which $es_{id}$ was wrongly assigned, μ is the balance coefficient, N is the number of baseline samples contained in family category h, M is the number of baseline samples, and l is the length of the image texture feature vector obtained by the Gabor filter. The size of the difference value between each candidate reference sample and the training samples of the current family category determines whether the candidate should become a new baseline sample, which improves the reliability of the baseline samples.
Consider a sample set C containing n samples from m family categories, written $C = \{C_1, C_2, \ldots, C_m\}$, in which there are $n_1$ training samples and $n_2$ unknown samples, with $n = n_1 + n_2$. The detailed steps for choosing candidate reference samples are as follows:
1. Randomly select m samples $\{b_{11}, b_{21}, \ldots, b_{m1}\}$ from the training samples as baseline samples, where $b_{ij}$ denotes the j-th baseline sample from family i; that is, one training sample is chosen at random from the training samples of each family category as the initial baseline sample (the first baseline sample).
2. Match the remaining training samples against the initial baseline samples and collect the samples that fail to match, i.e. the wrongly assigned samples. These mismatched samples also fall into m classes and are taken as candidate reference samples. Suppose the m classes of mismatched samples are written as:
$$Es = \{\{es_{11}, es_{12}, \ldots\}, \{es_{21}, es_{22}, \ldots\}, \ldots, \{es_{m1}, es_{m2}, \ldots\}\}$$
3. Perform a second round of matching inside the mismatched sample set of each family category. Specifically, with the candidate baseline sample set of family i written as $\{es_{i1}, es_{i2}, \ldots\}$, calculate the Gabor function values $\{gabor_l(es_{i1}), gabor_l(es_{i2}), gabor_l(es_{i3}), \ldots\}$ of each candidate reference sample, then calculate the dissimilarity between the different candidate reference samples, using formula (5) to obtain the difference value of sample $es_{id}$ with respect to family i. If the difference value of candidate sample $es_{id}$ satisfies $D(es_{id}) > \rho$, add $es_{id}$ as a new baseline sample (a second baseline sample) of family i.
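The three steps above can be sketched as below. Several simplifications are assumptions rather than the patent's method: the first training sample of each family is used as the first baseline sample instead of a random one (so the example is reproducible), matching uses the Euclidean distance, and the difference value of a candidate is approximated by its mean distance to the other candidates of the same family, since the full expression of formula (5) is not available here. A family with a single candidate adds it directly, as in the worked example of this document.

```python
import numpy as np

def select_benchmarks(train, rho):
    """train maps family -> list of feature vectors; returns family -> baseline sample list."""
    # Step 1 (simplified): the first training sample of each family is the first baseline.
    first = {fam: np.asarray(vecs[0], float) for fam, vecs in train.items()}
    benchmarks = {fam: [vec] for fam, vec in first.items()}
    for fam, vecs in train.items():
        rest = [np.asarray(v, float) for v in vecs[1:]]
        # Step 2: a sample whose nearest first baseline belongs to another family is a candidate.
        cands = [v for v in rest
                 if min(first, key=lambda f: np.linalg.norm(v - first[f])) != fam]
        # Step 3: keep candidates whose (assumed) difference value exceeds the threshold rho.
        for i, cand in enumerate(cands):
            others = [o for j, o in enumerate(cands) if j != i]
            if not others:
                benchmarks[fam].append(cand)  # a lone candidate is added directly
                continue
            score = np.mean([np.linalg.norm(cand - o) for o in others])
            if score > rho:
                benchmarks[fam].append(cand)
    return benchmarks

# Hypothetical 2-D features for two families, for illustration only.
train = {
    "A": [[0.0, 0.0], [0.1, 0.0], [0.9, 0.0], [0.8, 0.5]],
    "B": [[1.0, 0.0], [1.1, 0.0], [0.2, 0.0]],
}
result = select_benchmarks(train, rho=0.3)
print({fam: len(samples) for fam, samples in result.items()})
```

With these toy values, family A ends up with two extra baseline samples and family B with one, which mirrors the shape of the worked example further below.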
For the malicious code to be detected, the present embodiment likewise first performs image conversion, converting the binary malicious code to be detected into grayscale image form, and then extracts the image texture features, in the same way as for the training samples above.
In the present embodiment, the specific steps in step S3 for determining the family category of the malicious code to be detected from the extracted image texture features are:
S31. Obtain the matching results between the malicious code to be detected and every baseline sample in the baseline sample set of each family category;
S32. From all the matching results of each family category, obtain the comprehensive matching value of that family category, and judge from the comprehensive matching value of each family category whether the malicious code to be detected belongs to that family category.
In the present embodiment, the comprehensive matching value is calculated by formula (6), in which $es_{test}$ is the malicious code to be detected, $es_{ij}$ is the j-th baseline sample of class i, and N is the number of baseline samples contained in family category i.
When the present embodiment matches the malicious code to be detected $es_{test}$ against a baseline sample $es_{ij}$, the matching result is 1 if they match and -1 if they do not; the matching result values can of course also be set according to actual requirements. All the matching results obtained for each family category are then added up to give the final comprehensive matching value, from which the family category is judged.
In the present embodiment, when the comprehensive matching value R of a target family category satisfies R > 0, the malicious code to be detected is judged to belong to the target family category; otherwise it is judged not to belong to the target family category.
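The voting rule described above can be illustrated with a short sketch. It assumes only what the text states: each comparison against a baseline sample contributes +1 on a match and -1 on a mismatch, the contributions are summed, and a sum above zero means the sample belongs to that family. The function names are hypothetical, and how an individual match is decided is left outside the sketch.

```python
def comprehensive_match(match_flags):
    """Sum the per-baseline results: +1 for a match, -1 for a mismatch."""
    return sum(1 if matched else -1 for matched in match_flags)

def belongs_to_family(match_flags):
    """A sample is assigned to the family when the comprehensive matching value R > 0."""
    return comprehensive_match(match_flags) > 0

# A sample that matches two of three baseline samples of a family has R = 1.
print(comprehensive_match([True, True, False]), belongs_to_family([True, True, False]))
```

With an even number of baseline samples a tie (R = 0) counts as not belonging, which follows directly from the strict inequality.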
The invention is further illustrated below by the detection and classification of 10 test samples in two family categories.
The training samples used in the present embodiment are shown in Table 1.
Table 1: Training samples.
Step 1: Baseline sample selection
Step 1.1: Image texture feature extraction for the training samples
Take as examples two training samples of family (1), S1 (0B06744D7C5822BA585C5992B10ADFA0) and S2 (0BDAFFBA037A4880D31C93C0AADCC1FE), and two training samples of family (2), S3 (2C69C485A46B03C277B5F88DED0BABF0) and S4 (2C9F38EF39CFD73AA52E22869E8ABD90). The four malicious code training samples are first converted from binary files into grayscale images; for example, the binary code segment "01100111" converts to the unsigned integer 103, so the value of the corresponding pixel in the grayscale image is 103. The resulting grayscale images are shown in Fig. 4, where panel (a) is family (2), corresponding to samples S3 and S4, and panel (b) is family (1), corresponding to samples S1 and S2. The texture features of the malicious code are then extracted with the Gabor filter, implemented as in formulas (1) to (3) above. The texture features calculated for the four samples are:
T(sample S1) = ([3.64589196e-01, 1.78531921e-02], [1.11456886e-01, 3.62631582e-03], [2.45940133e-01, 4.82167451e-03], [3.66851460e-04, 1.85390288e-04]);
T(sample S2) = ([3.67820753e-01, 2.47166142e-02], [1.12444790e-01, 5.22362168e-03], [2.48120037e-01, 7.30584538e-03], [3.70103068e-04, 3.69625304e-04]);
T(sample S3) = ([3.82683113e-01, 1.65478632e-02], [1.16988294e-01, 5.31969120e-03], [2.58145706e-01, 3.20882018e-03], [3.85057648e-04, 4.53992963e-04]);
T(sample S4) = ([3.78114609e-01, 2.53183776e-02], [1.15591678e-01, 5.70669572e-03], [2.55063941e-01, 7.49917029e-03], [3.80460797e-04, 3.41053618e-04]).
Step 1.2: Candidate reference sample selection
Table 1 gives training sample set 1 for family (1) and training sample set 2 for family (2), each containing 10 training samples. The training samples are matched by calculating their dissimilarity with formula (4). Assuming the initial baseline samples are sample S1 and sample S3, the matching result is: in training sample set 1, samples S7 and S10 are wrongly assigned; in training sample set 2, sample S5 is wrongly assigned. These three samples are then added to the candidate reference samples $\{[c_{11}, c_{12}], [c_{21}]\}$.
Step 1.3: Second baseline sample determination
Calculate the texture features of the candidate reference samples:
([3.53133564e-01, 2.24345224e-02], [1.07954837e-01, 4.99747304e-03], [2.38212532e-01, 6.49801062e-03], [3.55324746e-04, 4.32171770e-04]),
([3.54380214e-01, 2.24449735e-02], [1.08335945e-01, 5.00705347e-03], [2.39053482e-01, 6.41765146e-03], [3.56579131e-04, 3.85045161e-04]),
([3.66485717e-01, 2.55705031e-02], [1.12036663e-01, 5.15760513e-03], [2.47219465e-01, 8.83855001e-03], [3.68759749e-04, 3.02423971e-04]).
The difference value between each candidate reference sample and the other baseline samples is calculated using formula (5) above, with the balance coefficient set to μ = 2.
This embodiment assumes a threshold ρ = 0.45. Since D(c11) > ρ and D(c12) > ρ, the candidate reference samples c11 and c12 are added as new baseline samples (second baseline samples), so the baseline sample set of family (1) is {b11, c11, c12}. Because family (2) has only one candidate reference sample, it is added directly as a new baseline sample (second baseline sample), giving the baseline sample set {b21, c21}.
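The thresholding in this step can be sketched as follows. The difference value follows one reading of the formula stated in claim 4: the mean Euclidean distance to the samples of the candidate's own family, plus the summed distance to the samples of the family it was misassigned to, divided by Nμ. That reading, the function names, and the toy vectors are assumptions; only μ = 2 and ρ = 0.45 come from the embodiment.

```python
import numpy as np

def difference_value(candidate, own_family_refs, other_family_refs, mu=2.0):
    """Assumed reading of the difference value D for one candidate sample."""
    cand = np.asarray(candidate, dtype=float)
    own = np.asarray(own_family_refs, dtype=float)
    other = np.asarray(other_family_refs, dtype=float)
    # First term: mean distance to the samples of the candidate's own family.
    intra = np.linalg.norm(own - cand, axis=1).mean()
    # Second term: summed distance to the other family, scaled by 1 / (N * mu).
    inter = np.linalg.norm(other - cand, axis=1).sum() / (len(other) * mu)
    return float(intra + inter)

def select_second_baselines(candidates, own_refs, other_refs, rho=0.45, mu=2.0):
    """Return indices of candidates promoted to second baseline samples."""
    if len(candidates) == 1:
        return [0]  # a single candidate is added directly, as in the embodiment
    return [i for i, c in enumerate(candidates)
            if difference_value(c, own_refs, other_refs, mu) > rho]

# Toy vectors: only the first candidate exceeds the threshold rho = 0.45.
selected = select_second_baselines(
    candidates=[[1.0, 0.0], [0.1, 0.0]],
    own_refs=[[0.0, 0.0]],
    other_refs=[[0.1, 0.1]],
)
print(selected)
```

Raising ρ admits fewer second baseline samples and keeps the baseline set small; lowering it admits more samples, which increases the number of comparisons made at detection time.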
Step 2: Test sample image texture feature extraction
Each malicious code test sample is converted into a grayscale image and its image texture features are extracted, using the same method as described above.
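A texture extraction step of this kind can be sketched with NumPy alone. Formulas (1) to (3) are not reproduced in this section, so the kernel below is a generic real-valued Gabor kernel; the four orientations, the kernel size, sigma, wavelength, and the choice of (mean, variance) of each filter response as the feature pair are all assumptions made to mirror the four-pair layout of the feature vectors listed earlier.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, wavelength):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + y_rot ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_rot / wavelength)
    return envelope * carrier

def texture_features(image, thetas=(0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Return one [mean, variance] pair per orientation (assumed feature layout)."""
    img = image.astype(float) / 255.0
    feats = []
    for theta in thetas:
        kernel = gabor_kernel(size=9, sigma=2.0, theta=theta, wavelength=4.0)
        # Circular convolution via FFT; the kernel is zero-padded to the image size.
        spectrum = np.fft.fft2(img) * np.fft.fft2(kernel, s=img.shape)
        response = np.real(np.fft.ifft2(spectrum))
        feats.append([float(response.mean()), float(response.var())])
    return feats

# A 16x16 gradient image stands in for a converted malware sample.
demo_image = np.arange(256, dtype=np.uint8).reshape(16, 16)
features = texture_features(demo_image)
print(features)
```

The same function would be applied to training and test samples so that their feature vectors are directly comparable.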
Step 3: Detection and classification
The baseline sample set {b11, c11, c12} of family (1) and the baseline sample set {b21, c21} of family (2) are each matched against every test sample, where the test sample set contains ten test samples {S1, S2, S3, S4, S5, S6, S7, S8, S9, S10}. The comprehensive matching results obtained for each test sample using formula (6) above are:
Table 2: Test matching results.
Test sample   S1  S2  S3  S4  S5  S6  S7  S8  S9  S10
Family (1)     3   3   3   1   3   0   0   3   1    0
Family (2)     0   0   0   2   0   2   2   0   2    2
If the comprehensive matching result for a family is greater than 0, the sample is judged to belong to that family category; otherwise it is judged not to belong. From the comprehensive matching results above, the final detection result is that {S1, S2, S3, S5, S8} belong to family (1) and {S4, S6, S7, S9, S10} belong to family (2). Samples S4 and S9 have nonzero results for both families, and the reported result places them in family (2), for which their result is higher. The detection results show that the detection method of this embodiment can accurately determine the family category of malicious code, with high detection efficiency.
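The decision step can be checked against the values of Table 2 with a short sketch. The rule used here, assigning each sample to the family with the largest comprehensive matching result provided that result is greater than 0, is an assumption that reproduces the reported outcome; the function name `assign_family` is hypothetical.

```python
samples = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10"]
family_1 = [3, 3, 3, 1, 3, 0, 0, 3, 1, 0]  # Table 2, family (1) row
family_2 = [0, 0, 0, 2, 0, 2, 2, 0, 2, 2]  # Table 2, family (2) row

def assign_family(r_values):
    """Return the family with the largest result R, or None if no R exceeds 0.

    r_values maps family id -> comprehensive matching result R.
    Picking the largest R is an assumed tie-break for samples that
    score above 0 for more than one family.
    """
    family, best = max(r_values.items(), key=lambda item: item[1])
    return family if best > 0 else None

result = {1: [], 2: []}
for name, r1, r2 in zip(samples, family_1, family_2):
    family = assign_family({1: r1, 2: r2})
    if family is not None:
        result[family].append(name)

print(result)
```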
The above are merely preferred embodiments of the present invention and do not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Therefore, any simple modification, equivalent change, or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, shall fall within the scope of protection of the technical solution of the present invention.

Claims (9)

1. A malicious code detection method based on image matching, characterized in that the steps comprise:
S1. Baseline sample selection: obtain training samples of malicious code corresponding to different family categories, convert each training sample into a grayscale image, and extract the corresponding image texture features; select a first baseline sample from the training samples of each family category, select second baseline samples according to the differences in image texture features between the first baseline sample and the samples, and form the corresponding baseline sample set from the first baseline sample and second baseline samples selected for each family category;
S2. Image feature extraction: convert the malicious code to be detected into a grayscale image and extract the corresponding image texture features;
S3. Code category testing: match the image texture features extracted in step S2 against the baseline sample set of each family category, and determine the family category of the malicious code to be detected according to the matching results.
2. The malicious code detection method based on image matching according to claim 1, characterized in that the specific steps for selecting the second baseline samples in step S1 are:
S11. Candidate reference sample acquisition: match each selected first baseline sample against the remaining training samples and, according to the matching results, find the training samples in each family category that are misassigned and take them as candidate reference samples;
S12. Second baseline sample determination: for each family category, calculate the difference value between each candidate reference sample and the other candidate reference samples; if the calculated difference value is greater than a specified threshold, take the corresponding candidate reference sample as a second baseline sample of the corresponding family category.
3. The malicious code detection method based on image matching according to claim 2, characterized in that step S12 specifically calculates the Gabor function values of each candidate reference sample and the distance values between each candidate reference sample and the other candidate reference samples, and obtains the difference value between each candidate reference sample and the other candidate reference samples from the Gabor function values and the distance values.
4. The malicious code detection method based on image matching according to claim 3, characterized in that the difference value between a candidate reference sample and the other candidate reference samples is specifically calculated as follows:
$$D(es_{id}) = \frac{1}{M} \sum_{j=0,1,\ldots,M} \sqrt{\sum_{l=0,1,\ldots} \left( \mathrm{gabor}_l(es_{id}) - \mathrm{gabor}_l(es_{ij}) \right)^2} + \frac{1}{N\mu} \sum_{d=1}^{N} p_d(es_{id})$$
$$p_d(es_{id}) = \sum_{j=0,1,\ldots,N} D(es_{id}, es_{hj})$$
where $es_{id}$ is the d-th candidate reference sample of class i, $es_{hj}$ is the j-th candidate reference sample of class h, $D(es_{id}, es_{hj})$ is the distance between sample $es_{id}$ and sample $es_{hj}$, h is the family category to which $es_{id}$ was misassigned, μ is the balance coefficient, N is the number of baseline samples contained in family category h, M is the number of baseline samples, and l ranges over the length of the image texture feature vector.
5. The malicious code detection method based on image matching according to any one of claims 1 to 4, characterized in that the image texture features are static texture features of a signal type.
6. The malicious code detection method based on image matching according to any one of claims 1 to 4, characterized in that the image texture features are specifically extracted using a Gabor filter.
7. The malicious code detection method based on image matching according to any one of claims 1 to 4, characterized in that the specific steps for determining the family category of the malicious code to be detected in step S3 are:
S31. Obtain the matching results between the malicious code to be detected and all baseline samples in the baseline sample set of each family category;
S32. Obtain the comprehensive matching value for each family category from all matching results of that family category, and judge from the comprehensive matching value of each family category whether the malicious code to be detected belongs to the corresponding family category.
8. The malicious code detection method based on image matching according to claim 7, characterized in that the comprehensive matching value is specifically calculated as follows:
$$R = \sum_{j=0}^{N} w_{ij}$$
where $es_{test}$ is the malicious code to be detected, $es_{ij}$ is the j-th baseline sample of class i, and N is the number of baseline samples contained in family category i.
9. The malicious code detection method based on image matching according to claim 8, characterized in that when the comprehensive matching value R corresponding to a target family category satisfies R > 0, the malicious code to be detected is judged to belong to the target family category; otherwise the malicious code to be detected is judged not to belong to the target family category.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710265324.1A CN107092829B (en) 2017-04-21 2017-04-21 Malicious code detection method based on image matching

Publications (2)

Publication Number Publication Date
CN107092829A (en) 2017-08-25
CN107092829B (en) 2020-03-17

Family

ID=59637854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710265324.1A Active CN107092829B (en) 2017-04-21 2017-04-21 Malicious code detection method based on image matching

Country Status (1)

Country Link
CN (1) CN107092829B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978521A (en) * 2014-04-10 2015-10-14 北京启明星辰信息安全技术有限公司 Method and system for realizing malicious code marking
CN105512555A (en) * 2014-12-12 2016-04-20 哈尔滨安天科技股份有限公司 Homologous family dividing and mutation method and system based on file string cluster

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
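Han Xiaoguang et al.: "Research on a malicious code variant detection method based on texture fingerprints" (基于纹理指纹的恶意代码变种检测方法研究), Journal on Communications (《通信学报》) *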

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant