CN107092829A

CN107092829A - Malicious code detection method based on image matching

Info

Publication number: CN107092829A
Application number: CN201710265324.1A
Authority: CN
Inventors: 喻波; 刘浏; 杨强; 解炜; 唐勇; 陈曙晖; 方莹
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2017-04-21
Filing date: 2017-04-21
Publication date: 2017-08-25
Anticipated expiration: 2037-04-21
Also published as: CN107092829B

Abstract

The present invention discloses a malicious code detection method based on image matching, comprising the following steps. S1. Obtain training samples of malicious code belonging to different family classes, convert each training sample into a grayscale image, and extract the corresponding image texture features; choose first baseline samples from the training samples of each family class, choose second baseline samples according to the differences in image texture features between the first baseline samples and the samples, and let the first and second baseline samples chosen for each family class form that class's baseline sample set. S2. Convert the malicious code to be detected into a grayscale image and extract the corresponding image texture features. S3. Match the image texture features extracted in step S2 against the baseline sample set of each family class, and determine the family class of the malicious code to be detected from the matching results. The method is simple to implement and robust, with high detection accuracy and good detection results.

Description

Malicious code detection method based on image matching

Technical field

The present invention relates to the technical field of malicious code detection and analysis, and in particular to a malicious code detection method based on image matching.

Background art

With the widespread use of automated malicious code generation tools and the reuse of open-source code in malicious code, the number of malicious code variants and of new malicious code families is growing rapidly. According to statistics, 430 million malicious code variants were detected in a single year, and malicious code has become a major challenge to cyberspace security. Traditional malicious code detection methods fall mainly into two kinds. The first is signature-based detection, which can quickly detect known malicious code samples but requires substantial expert knowledge and manual analysis, and has difficulty coping with deformed and obfuscated samples. The second is detection based on abnormal behavior, which can detect malicious code samples exploiting zero-day vulnerabilities and samples from new families, but its false alarm rate is also very high.

Malicious code detection methods based on automated analysis can address these problems. Such methods mainly analyze malicious code with machine learning, generally in three steps: (1) extract malicious code features; (2) select a suitable model; (3) obtain the classification results. From the perspective of feature selection, prior-art automated-analysis methods fall into two types: methods based on static features and methods based on dynamic features. Methods of the first type do not need to run the malicious code; they only use tools such as IDA, PEView and hap-depends to obtain its binary code, opcodes and the like. For example, byte-code N-grams are used as detection features, the feature dimension is reduced by a term-frequency method, and the malicious code is finally classified with a model such as an SVM; alternatively, features combining opcodes and byte codes are used and the samples are classified with ensemble learning. Compared with static learning, dynamic learning obtains the behavioral information and intent of malicious code more accurately. Methods of the second type, based on dynamic features, must run the program to obtain its dynamic behavior, and the features finally extracted include both static and dynamic features; for example, the behavioral knowledge of malicious code is extracted from API call strings and parameter information and converted into feature vectors.

The above malicious code detection methods based on automated analysis have the following defects:

(1) Poor robustness and low detection accuracy. Such methods classify malicious code based on the extracted features, and different features may yield different detection accuracies. Both the precision of the feature extraction itself and the choice of features directly affect the precision of the final detection and analysis results, so robustness in actual detection is poor and detection accuracy is low;

(2) Low detection efficiency. Such methods are generally complex to implement and usually require a long time for model training, so detection efficiency is low.

Summary of the invention

The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a malicious code detection method based on image matching that is simple to implement and robust, with high detection accuracy and good detection results.

To solve the above technical problems, the present invention proposes the following technical solution:

A malicious code detection method based on image matching, comprising the following steps:

S1. Baseline sample selection: obtain training samples of malicious code belonging to different family classes, convert each training sample into a grayscale image, and extract the corresponding image texture features; choose a first baseline sample from the training samples of each family class, choose second baseline samples according to the differences in image texture features between the first baseline samples and the samples, and let the first and second baseline samples chosen for each family class form that class's baseline sample set;

S2. Image feature extraction: convert the malicious code to be detected into a grayscale image, and extract the corresponding image texture features;

S3. Code classification test: match the image texture features extracted in step S2 against the baseline sample set of each family class, and determine the family class of the malicious code to be detected from the matching results.

As a further improvement of the present invention, the specific steps by which each family class chooses its second baseline samples in step S1 are:

S11. Candidate reference sample acquisition: match each chosen first baseline sample against the remaining training samples, identify from the matching results the training samples in each family class that are misassigned, and take them as candidate reference samples;

S12. Second baseline sample determination: for each family class, compute the difference value between each candidate reference sample and the other candidate reference samples; if the computed difference value exceeds a specified threshold, take the corresponding candidate reference sample as a second baseline sample of that family class.

As a further improvement of the present invention, step S12 specifically computes the Gabor function value of each candidate reference sample and the distance between each candidate reference sample and the other training samples, and computes from these Gabor function values and distance values the difference value between the candidate reference sample and the other candidate reference samples.

As a further improvement of the present invention, the difference value between a candidate reference sample and the other candidate reference samples is calculated as follows:

$$p_d(es_{id}) = \sum_{j=0}^{N} D(es_{id}, es_{hj})$$

where es_id is the d-th candidate reference sample of class i, es_hj is the j-th candidate baseline sample of class h, D(es_id, es_hj) is the distance between samples es_id and es_hj, h is the family class to which es_id was wrongly assigned, μ is a balance coefficient, N is the number of baseline samples contained in family class h, M is the number of baseline samples, and l is the vector length of the image texture features.

As a further improvement of the present invention, the image texture features are signal-processing static texture features.

As a further improvement of the present invention, the image texture features are extracted with a Gabor filter.

As a further improvement of the present invention, the specific steps for determining the family class of the malicious code to be detected in step S3 are:

S31. Obtain the matching results between the malicious code to be detected and every baseline sample in the baseline sample set of each family class;

S32. Combine all matching results of each family class into a comprehensive matching value for that class, and judge from each class's comprehensive matching value whether the malicious code to be detected belongs to that family class.

As a further improvement of the present invention, the comprehensive matching value is calculated as follows:

where es_test is the malicious code to be detected, es_ij is the j-th baseline sample of class i, and N is the number of baseline samples contained in family class i.

As a further improvement of the present invention, when the comprehensive matching value R of a target family class satisfies R > 0, the malicious code to be detected is judged to belong to that target family class; otherwise it is judged not to belong to it.

Compared with the prior art, the advantages of the present invention are:

1) The malicious code detection method based on image matching of the present invention automatically extracts image texture features, performs image matching based on feature similarity analysis, and then determines the family class from the image matching results. Detection is thus automated, and large-scale malicious code families can be detected and analyzed conveniently and efficiently;

2) During image matching, the method first chooses the first baseline samples of each family class, then chooses second baseline samples according to the differences in image texture features between the first baseline samples and the samples, matches the malicious code to be detected against the baseline sample set formed by the first and second baseline samples, and finally determines its family class. No lengthy model training is needed, the selected baseline samples are highly reliable, and the influence of baseline selection on the detection results is greatly reduced, which improves detection accuracy;

3) The method first chooses the first baseline samples and then chooses the second baseline samples based on them: samples matched incorrectly against the first baseline samples are taken as candidate reference samples, the difference values between the candidate reference samples and the training samples of the current family class are computed, and the difference values finally decide whether a candidate becomes a new baseline sample. Samples that were misassigned and that also differ substantially from the other samples thereby become baseline samples as well. The approach is simple to implement and, compared with the direct baseline selection of traditional image matching, effectively increases the reliability of baseline selection, further improving the precision of malicious code detection.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the malicious code detection method based on image matching in this embodiment.

Fig. 2 is a schematic diagram of the working principle of the malicious code detection method based on image matching in this embodiment.

Fig. 3 is a schematic flowchart of the grayscale image conversion in this embodiment.

Fig. 4 shows grayscale images obtained in a specific embodiment of the invention.

Embodiment

The invention is further described below with reference to the drawings and specific preferred embodiments, which do not limit the scope of protection of the invention.

As shown in Figs. 1 and 2, the malicious code detection method based on image matching of this embodiment comprises the following steps:

S1. Baseline sample selection: obtain training samples of malicious code belonging to different family classes, convert each training sample into a grayscale image, and extract the corresponding image texture features; choose a first baseline sample from the training samples of each family class, choose second baseline samples according to the differences in image texture features between the first baseline samples and the samples, and let the first and second baseline samples chosen for each family class form that class's baseline sample set;

S2. Image feature extraction: convert the malicious code to be detected into a grayscale image, and extract the corresponding image texture features;

S3. Code classification test: match the image texture features extracted in step S2 against the baseline sample set of each family class, and determine the family class of the malicious code to be detected from the matching results.

Image texture reflects the visual characteristics of homogeneous phenomena in an image and captures the arrangement of surface structures that vary slowly or periodically on an object's surface. This embodiment exploits this property and detects malicious code by image matching: image texture features are extracted automatically, images are matched based on feature similarity analysis, and the family class is determined from the image matching results. Detection is thereby automated. The method can readily be used for malware detection on PCs and mobile clients, or in back-end classification systems for large-scale homology analysis of malicious code families, and can efficiently mine malicious code from massive numbers of samples, online or offline.

During image matching, this embodiment first chooses the first baseline samples of each family class, then chooses second baseline samples according to the differences in image texture features between the first baseline samples and the samples, matches the malicious code to be detected against the baseline sample set formed by the first and second baseline samples, and finally determines its family class. No lengthy model training is needed, so detection efficiency is high; the selected baseline samples are highly reliable, which greatly reduces the influence of baseline selection on the detection results and improves detection accuracy.

In this embodiment, the specific steps by which each family class chooses its second baseline samples in step S1 are:

S11. Candidate reference sample acquisition: match each chosen first baseline sample against the remaining training samples, identify from the matching results the training samples in each family class that are misassigned, and take them as candidate reference samples;

S12. Second baseline sample determination: for each family class, compute the difference value between each candidate reference sample and the other candidate reference samples; if the computed difference value exceeds a specified threshold, take the corresponding candidate reference sample as a second baseline sample of that family class.

In traditional image matching, one baseline sample is chosen directly from the sample set of a known class as that class's baseline; if the image of a sample to be detected matches the image of the baseline sample, the sample is judged to belong to the baseline sample's class. Different baseline images may therefore give different matching results, and the choice of baseline sample directly affects matching precision. When choosing baseline samples, this embodiment additionally chooses second baseline samples based on the first baseline samples: samples matched incorrectly against the first baseline samples are taken as candidate reference samples, the difference values between each candidate and the other candidate reference samples are computed, and the difference values decide whether a candidate becomes a new baseline sample. Samples that were misassigned and that also differ substantially from the other candidate reference samples thereby become baseline samples as well. The approach is simple to implement and, compared with the direct baseline selection of traditional image matching, effectively increases the reliability of baseline selection, further improving the precision of malicious code detection.

In step S1, this embodiment first converts each training sample into an image, i.e., converts the binary malicious code into grayscale image form. As shown in Fig. 3, each pixel of a grayscale image is represented by an unsigned integer in [0, 255], so the binary malicious code is first converted into a matrix of unsigned integers. An 8-bit binary value converted to an integer is greater than or equal to 0 and less than 256, so the binary file is cut and converted in units of 8 consecutive bits. The image width can be adjusted as the conversion requires, yielding a grayscale image for each training sample.
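The conversion described above can be sketched as follows. This is a minimal illustration, not the embodiment's code: the function names are hypothetical, each byte (8 consecutive bits) is read directly as one pixel value, and the last row is zero-padded when the file length is not a multiple of the chosen width, which is an assumption because the text only says that the width can be adjusted.

```python
from pathlib import Path
from typing import List


def bytes_to_grayscale(data: bytes, width: int) -> List[List[int]]:
    """Cut binary data into 8-bit units and lay them out row by row.

    Each byte is an unsigned integer in [0, 255], i.e. one grayscale pixel.
    Zero-padding the final row is an assumption, not part of the source.
    """
    if width <= 0:
        raise ValueError("width must be a positive number of pixels")
    pixels = list(data)  # iterating over bytes yields ints in [0, 255]
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def file_to_grayscale(path: str, width: int) -> List[List[int]]:
    """Hypothetical helper: read a sample file and convert it to a pixel matrix."""
    return bytes_to_grayscale(Path(path).read_bytes(), width)


if __name__ == "__main__":
    # The 8-bit segment 01100111 becomes the pixel value 103.
    print(bytes_to_grayscale(bytes([0b01100111, 255, 0]), width=2))  # prints [[103, 255], [0, 0]]
```

Keeping the width as a parameter mirrors the text's note that the image width is tuned per conversion.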

Image texture features fall mainly into four kinds: statistical, model-based, signal-processing and structural texture features. In this embodiment, the image texture features are signal-processing texture features, extracted with a signal-processing texture method, specifically a Gabor filter. Because the embodiment relies on static features, the malicious code does not need to be run, which keeps the implementation simple.

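A Gabor filter is a linear filter used for extracting image edge features. It can be defined as a sinusoid multiplied by a Gaussian function; for a two-dimensional Gabor filter, the sinusoid is a sinusoidal plane wave. By the convolution property of multiplication, the Fourier transform of the impulse response of a Gabor filter is the convolution of the Fourier transform of its harmonic function with the Fourier transform of the Gaussian function. The filter consists of a real part and an imaginary part, which are mutually orthogonal. The Gabor filter used in this embodiment is given below. The complex expression is:

$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\left(2\pi\frac{x'}{\lambda} + \psi\right)\right) \qquad (1)$$

The real part is:

$$g_{\mathrm{re}}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi\frac{x'}{\lambda} + \psi\right) \qquad (2)$$

The imaginary part is:

$$g_{\mathrm{im}}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \sin\left(2\pi\frac{x'}{\lambda} + \psi\right) \qquad (3)$$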

where x' = x cos θ + y sin θ and y' = −x sin θ + y cos θ; λ is the wavelength, in pixels; θ is the orientation, with 0° ≤ θ ≤ 360°; ψ is the phase offset, in the range [−180°, 180°]; γ is the spatial aspect ratio, which determines the ellipticity of the Gabor function's shape; and σ is the standard deviation of the Gaussian factor of the Gabor function, which varies with the bandwidth.
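The Gabor function with the parameters defined above (x', y', λ, θ, ψ, γ, σ) can be sampled on a pixel grid as below, in its standard real/imaginary form. This is a sketch using only the Python standard library; the kernel size and the parameter values in the example are illustrative assumptions, since the text does not state the values used in the embodiment.

```python
import math
from typing import List, Tuple

Kernel = List[List[float]]


def gabor_kernel(size: int, lam: float, theta_deg: float, psi_deg: float,
                 sigma: float, gamma: float) -> Tuple[Kernel, Kernel]:
    """Sample the real and imaginary parts of a 2-D Gabor filter.

    size: odd side length in pixels; the kernel is centred on (0, 0).
    lam: wavelength in pixels; theta_deg: orientation; psi_deg: phase offset;
    sigma: std. dev. of the Gaussian factor; gamma: spatial aspect ratio.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    theta, psi = math.radians(theta_deg), math.radians(psi_deg)
    half = size // 2
    real: Kernel = []
    imag: Kernel = []
    for y in range(-half, half + 1):
        real_row, imag_row = [], []
        for x in range(-half, half + 1):
            # Rotated coordinates x' and y'
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope shared by both parts
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            phase = 2 * math.pi * xr / lam + psi
            real_row.append(envelope * math.cos(phase))  # real part
            imag_row.append(envelope * math.sin(phase))  # imaginary part
        real.append(real_row)
        imag.append(imag_row)
    return real, imag


if __name__ == "__main__":
    re_k, im_k = gabor_kernel(size=5, lam=4.0, theta_deg=0, psi_deg=0, sigma=2.0, gamma=0.5)
    print(re_k[2][2], im_k[2][2])  # centre pixel is cos(psi), sin(psi): prints 1.0 0.0
```

A bank of such kernels at several wavelengths and orientations corresponds to the "array of Gabor functions with different frequencies and orientations" mentioned in the text.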

During image feature extraction, an array of Gabor functions with different frequencies and orientations is obtained. When texture features are computed with the Gabor filters, the image texture feature of each sample is represented as T = ([a1, a2], [b1, b2], [c1, c2], [d1, d2]); that is, the feature consists of four feature values a, b, c and d, each made up of a real part (subscript 1) and an imaginary part (subscript 2).

When samples are matched, the difference value between their image texture features is computed and used to judge how well they match: the smaller the difference value between two samples' image texture features, the better they match. The difference between individual samples s_i and s_j is given by formula (4).
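Formula (4) is not reproduced in this text. The sketch below assumes, purely for illustration, that the difference between two samples is the Euclidean distance over the eight real and imaginary components of T, and that a sample is matched to the family whose closest baseline sample has the smallest difference; the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# T = ([a1, a2], [b1, b2], [c1, c2], [d1, d2]) as (real, imaginary) pairs
Texture = List[Tuple[float, float]]


def flatten(t: Texture) -> List[float]:
    """Turn the four (real, imaginary) pairs into one 8-component vector."""
    return [part for pair in t for part in pair]


def difference(t_i: Texture, t_j: Texture) -> float:
    """Assumed stand-in for formula (4): Euclidean distance between features."""
    return math.dist(flatten(t_i), flatten(t_j))


def best_match(t: Texture, baselines: Dict[str, List[Texture]]) -> str:
    """Return the family whose nearest baseline sample differs least from t."""
    return min(baselines, key=lambda fam: min(difference(t, b) for b in baselines[fam]))
```

As in the text, a smaller difference means a better match; swapping in the actual formula (4) only requires replacing `difference`.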

In this embodiment, step S12 specifically computes the Gabor function values of each candidate reference sample and training sample, together with the distance between each candidate reference sample and the other candidate reference samples, and computes from these Gabor function values and distance values the difference value between each candidate reference sample and the other candidate reference samples.

In this embodiment, the difference value between a candidate reference sample es_id and the other candidate reference samples is calculated as follows:

$$p_d(es_{id}) = \sum_{j=0}^{N} D(es_{id}, es_{hj}) \qquad (5)$$

where es_id is the d-th candidate reference sample of class i, es_hj is the j-th candidate baseline sample of class h, D(es_id, es_hj) is the distance between samples es_id and es_hj, h is the family class to which es_id was wrongly assigned, μ is a balance coefficient, N is the number of baseline samples contained in family class h, M is the number of baseline samples, and l is the length of the image texture feature vector obtained with the Gabor filter. The size of the difference value between each candidate reference sample and the training samples of the current family class determines whether the candidate should become a new baseline sample, which improves the reliability of the baseline samples.

Consider a sample set C containing n samples in m family classes, denoted C = {C₁, C₂, …, C_m}. The set C contains n₁ training samples and n₂ unknown samples, with n = n₁ + n₂. The detailed steps for choosing candidate reference samples are as follows:

1. Randomly select m samples {b₁₁, b₂₁, …, b_m1} from the training samples as baseline samples, where b_ij denotes the j-th baseline sample of family i; that is, one training sample is randomly chosen from the training samples of each family class as its initial baseline sample (the first baseline sample);

2. Match each remaining training sample against the initial baseline samples and record the incorrectly matched, i.e. misassigned, samples. The incorrectly matched samples likewise fall into m classes and are taken as candidate reference samples; the incorrectly matched samples of the m classes are denoted:

Es = {{es₁₁, es₁₂, ...}, {es₂₁, es₂₂, ...}, ..., {es_m1, es_m2, ...}}

3. Perform a second round of matching within each family class's set of incorrectly matched samples. Specifically, denote the candidate reference sample set of family i as {es_i1, es_i2, ...}, compute the Gabor function values {gabor_l(es_i1), gabor_l(es_i2), gabor_l(es_i3), ...} of the candidate reference samples, and then compute the differences between the candidate reference samples; formula (5) gives the difference value of sample es_id with respect to family i. If the difference value of candidate sample es_id satisfies D(es_id) > ρ, es_id is added as a new baseline sample (second baseline sample) of family i.
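Steps 1 to 3 can be sketched as follows. This is a simplified illustration under stated assumptions: features are plain numeric vectors, the sample difference is Euclidean distance (formula (4) is not reproduced in the text), and the difference value p_d of a candidate is taken as the sum of its distances to the current baseline samples of the family h it was wrongly assigned to, following the displayed part of formula (5) without the balance coefficient μ. The rule that a family with a single candidate adds it directly comes from the embodiment's worked example (Step 1.3). All names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Vector = Tuple[float, ...]


def nearest_family(x: Vector, baselines: Dict[str, List[Vector]]) -> str:
    """Family whose closest baseline is nearest to x (Euclidean distance, assumed)."""
    return min(baselines, key=lambda f: min(math.dist(x, b) for b in baselines[f]))


def select_baselines(train: Dict[str, List[Vector]], rho: float,
                     first_idx: Optional[Dict[str, int]] = None,
                     seed: int = 0) -> Dict[str, List[Vector]]:
    """Return each family's baseline set: first baseline plus accepted candidates."""
    rng = random.Random(seed)
    # Step 1: pick one training sample per family at random as its first baseline.
    if first_idx is None:
        first_idx = {f: rng.randrange(len(s)) for f, s in train.items()}
    first = {f: [train[f][i]] for f, i in first_idx.items()}

    # Step 2: match the remaining samples; misassigned ones become candidates,
    # remembered together with the family h they were wrongly assigned to.
    candidates: Dict[str, List[Tuple[Vector, str]]] = {f: [] for f in train}
    for fam, samples in train.items():
        for i, x in enumerate(samples):
            if i == first_idx[fam]:
                continue
            h = nearest_family(x, first)
            if h != fam:
                candidates[fam].append((x, h))

    # Step 3: p_d = sum of distances to the baselines of family h (assumed form);
    # a lone candidate is added directly, as in the worked example.
    result = {f: list(b) for f, b in first.items()}
    for fam, cands in candidates.items():
        for x, h in cands:
            p_d = sum(math.dist(x, b) for b in first[h])
            if len(cands) == 1 or p_d > rho:
                result[fam].append(x)
    return result
```

Passing `first_idx` explicitly reproduces a fixed choice such as "assume the initial baselines are S1 and S3"; leaving it out gives the random choice of step 1.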

For the malicious code to be detected, this embodiment likewise first converts the binary code into grayscale image form and then extracts its image texture features, using the same processing as for the training samples.

In this embodiment, the specific steps of step S3 for determining the family class of the malicious code to be detected from the extracted image texture features are:

S31. Obtain the matching result between the malicious code to be detected and each baseline sample in the baseline sample set of each family class;

In this embodiment, the comprehensive matching value is calculated as follows:

When this embodiment matches the malicious code to be detected es_test against a baseline sample es_ij, the matching result is 1 if they match and −1 if they do not; the matching result values can of course be set according to actual requirements. All matching results obtained for each family class are then summed to give the final comprehensive matching value, from which the family class is judged.

In this embodiment, when the comprehensive matching value R of a target family class satisfies R > 0, the malicious code to be detected is judged to belong to that target family class; otherwise it is judged not to belong to it.
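The scoring rule of steps S31 and S32 can be sketched as follows. The text does not say how a single match or mismatch is decided, so the distance threshold `tau`, the Euclidean distance and the function names are assumptions; the +1/−1 scores, their sum R, and the R > 0 test follow the description above.

```python
import math
from typing import List, Sequence


def comprehensive_match(test: Sequence[float], baselines: List[Sequence[float]],
                        tau: float) -> int:
    """Sum of +1 (match) / -1 (mismatch) over one family's baseline samples.

    A 'match' here means Euclidean distance <= tau, which is an assumption.
    """
    return sum(1 if math.dist(test, b) <= tau else -1 for b in baselines)


def belongs_to_family(test: Sequence[float], baselines: List[Sequence[float]],
                      tau: float) -> bool:
    """Decision rule from the text: the sample belongs to the family if R > 0."""
    return comprehensive_match(test, baselines, tau) > 0
```

With three baseline samples, for example, a test sample that matches two of them and misses one gets R = 1 and is accepted.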

The invention is further illustrated below by the detection and classification of 10 test samples across two family classes.

The training samples used in this embodiment are listed in Table 1.

Table 1: Training samples.

Step 1: Baseline sample selection

Step 1.1: Image texture feature extraction for the training samples

Take as examples the two training samples S1 (0B06744D7C5822BA585C5992B10ADFA0) and S2 (0BDAFFBA037A4880D31C93C0AADCC1FE) of family (1) and the two training samples S3 (2C69C485A46B03C277B5F88DED0BABF0) and S4 (2C9F38EF39CFD73AA52E22869E8ABD90) of family (2). The four malicious code training samples are first converted from binary files into grayscale images; for example, the binary segment "01100111" converts to the unsigned integer 103, so the corresponding pixel in the grayscale image has the value 103. The resulting grayscale images are shown in Fig. 4, where panel (a) shows family (2), samples S3 and S4, and panel (b) shows family (1), samples S1 and S2. The texture features of the malicious code are then extracted with Gabor filters implemented as in formulas (1) to (3), giving the following texture features for the four samples:

T_{Sample S1} = ([3.64589196e-01, 1.78531921e-02], [1.11456886e-01, 3.62631582e-03], [2.45940133e-01, 4.82167451e-03], [3.66851460e-04, 1.85390288e-04]);

T_{Sample S2} = ([3.67820753e-01, 2.47166142e-02], [1.12444790e-01, 5.22362168e-03], [2.48120037e-01, 7.30584538e-03], [3.70103068e-04, 3.69625304e-04]);

T_{Sample S3} = ([3.82683113e-01, 1.65478632e-02], [1.16988294e-01, 5.31969120e-03], [2.58145706e-01, 3.20882018e-03], [3.85057648e-04, 4.53992963e-04]);

T_{Sample S4} = ([3.78114609e-01, 2.53183776e-02], [1.15591678e-01, 5.70669572e-03], [2.55063941e-01, 7.49917029e-03], [3.80460797e-04, 3.41053618e-04]).
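As an illustrative check on these four vectors, the sketch below finds, for each sample, the other sample with the smallest Euclidean distance over the eight components. Euclidean distance is an assumption standing in for formula (4), which is not reproduced in the text; under that assumption each sample's nearest neighbour lies in its own family (S1 with S2, S3 with S4).

```python
import math
from typing import Dict, List, Tuple

# Texture features T = ([a1, a2], [b1, b2], [c1, c2], [d1, d2]) from Step 1.1.
FEATURES: Dict[str, List[Tuple[float, float]]] = {
    "S1": [(3.64589196e-01, 1.78531921e-02), (1.11456886e-01, 3.62631582e-03),
           (2.45940133e-01, 4.82167451e-03), (3.66851460e-04, 1.85390288e-04)],
    "S2": [(3.67820753e-01, 2.47166142e-02), (1.12444790e-01, 5.22362168e-03),
           (2.48120037e-01, 7.30584538e-03), (3.70103068e-04, 3.69625304e-04)],
    "S3": [(3.82683113e-01, 1.65478632e-02), (1.16988294e-01, 5.31969120e-03),
           (2.58145706e-01, 3.20882018e-03), (3.85057648e-04, 4.53992963e-04)],
    "S4": [(3.78114609e-01, 2.53183776e-02), (1.15591678e-01, 5.70669572e-03),
           (2.55063941e-01, 7.49917029e-03), (3.80460797e-04, 3.41053618e-04)],
}
FAMILY = {"S1": 1, "S2": 1, "S3": 2, "S4": 2}


def flat(t: List[Tuple[float, float]]) -> List[float]:
    """Concatenate the (real, imaginary) pairs into one 8-component vector."""
    return [v for pair in t for v in pair]


def nearest_neighbour(name: str) -> str:
    """Other sample with the smallest (assumed Euclidean) feature distance."""
    others = [n for n in FEATURES if n != name]
    return min(others, key=lambda n: math.dist(flat(FEATURES[name]), flat(FEATURES[n])))


if __name__ == "__main__":
    for name in FEATURES:
        nn = nearest_neighbour(name)
        print(name, "->", nn, "same family:", FAMILY[name] == FAMILY[nn])
```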

Step 1.2: Candidate reference sample selection

In this embodiment, family (1) and family (2) provide training sample sets 1 and 2 respectively, as listed in Table 1, each containing 10 training samples. Formula (4) is used to compute the differences between the training samples for matching. Assuming the initial baseline samples are sample S1 and sample S3, the matching results are: in training sample set 1, samples S7 and S10 are misassigned; in training sample set 2, sample S5 is misassigned. These three samples are added to the candidate reference samples {[c₁₁, c₁₂], [c₂₁]}.

Step 1.3: Second baseline sample determination

Compute the texture features of the candidate reference samples:

([3.53133564e-01, 2.24345224e-02], [1.07954837e-01, 4.99747304e-03], [2.38212532e-01, 6.49801062e-03], [3.55324746e-04, 4.32171770e-04]),
([3.54380214e-01, 2.24449735e-02], [1.08335945e-01, 5.00705347e-03], [2.39053482e-01, 6.41765146e-03], [3.56579131e-04, 3.85045161e-04]),
([3.66485717e-01, 2.55705031e-02], [1.12036663e-01, 5.15760513e-03], [2.47219465e-01, 8.83855001e-03], [3.68759749e-04, 3.02423971e-04]).

The difference values between the candidate reference samples and the other baseline samples are then calculated with formula (5), setting μ = 2.

This embodiment sets the threshold ρ = 0.45. Since D(c₁₁) > ρ and D(c₁₂) > ρ, candidate reference samples c₁₁ and c₁₂ are added as new baseline samples (second baseline samples), so the baseline sample set of family (1) is {b₁₁, c₁₁, c₁₂}. Because family (2) has only one candidate reference sample, it is added directly as a new baseline sample (second baseline sample), giving the baseline sample set {b₂₁, c₂₁}.

Step 2: Image texture feature extraction for the test samples

Each test malicious code sample is converted into a grayscale image and its image texture features are extracted, as described above.

Step 3: Detection and classification

Each test sample is matched against the baseline sample set {b₁₁, c₁₁, c₁₂} of family (1) and the baseline sample set {b₂₁, c₂₁} of family (2). The test sample set contains ten test samples {S₁, S₂, S₃, S₄, S₅, S₆, S₇, S₈, S₉, S₁₀}. The comprehensive matching results of the test samples, obtained with formula (6), are:

Table 2: Test matching results.

Test sample	S₁	S₂	S₃	S₄	S₅	S₆	S₇	S₈	S₉	S₁₀
Family 1	3	3	3	1	3	0	0	3	1	0
Family 2	0	0	0	2	0	2	2	0	2	2

If comprehensive matching result is more than 0, it is judged to belonging to corresponding family's classification, otherwise to be not belonging to.Then according to upper State comprehensive matching result and obtain final testing result for { S₁,S₂,S₃,S₅,S₈Belong to family (1), { S4, S6, S7, S9, S10 } category In family (2).From testing result, the above-mentioned detection method of the present embodiment can accurately divide malicious code family classification, And detection efficiency is high.
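The decision step can be replayed on the values of Table 2. S₄ and S₉ have R > 0 for both families, so the R > 0 rule alone does not single out one family for them; the sketch below assumes a tie-break that the text does not state, namely that the family with the largest positive R is chosen. Under that assumption it reproduces the reported assignment.

```python
from typing import Dict, Optional

# Comprehensive matching values R from Table 2: sample -> {family: R}
TABLE_2: Dict[str, Dict[int, int]] = {
    "S1": {1: 3, 2: 0}, "S2": {1: 3, 2: 0}, "S3": {1: 3, 2: 0},
    "S4": {1: 1, 2: 2}, "S5": {1: 3, 2: 0}, "S6": {1: 0, 2: 2},
    "S7": {1: 0, 2: 2}, "S8": {1: 3, 2: 0}, "S9": {1: 1, 2: 2},
    "S10": {1: 0, 2: 2},
}


def assign_family(scores: Dict[int, int]) -> Optional[int]:
    """R > 0 marks candidate families; the largest R breaks ties (assumption)."""
    positive = {fam: r for fam, r in scores.items() if r > 0}
    if not positive:
        return None  # no family satisfies R > 0
    return max(positive, key=positive.get)


if __name__ == "__main__":
    for sample, scores in TABLE_2.items():
        print(sample, "-> family", assign_family(scores))
```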

The above are merely preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit it. Therefore, any simple modification, equivalent change, or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, shall fall within the scope of protection of the technical solution of the present invention.

Claims

1. A malicious code detection method based on image matching, characterized in that it comprises the following steps:

S1. Baseline sample selection: obtain training samples of malicious code belonging to different family classes, convert each training sample into a grayscale image, and extract the corresponding image texture features; select a first baseline sample from the training samples of each family class, select second baseline samples according to the differences in image texture features between the first baseline sample and the other samples, and form the baseline sample set of each family class from the first baseline sample and second baseline samples selected for it;

S2. Image feature extraction: convert the malicious code to be detected into a grayscale image and extract the corresponding image texture features;

S3. Classification of the code under test: match the image texture features extracted in step S2 against the baseline sample set of each family class, and determine the family class of the malicious code to be detected according to the matching results.

2. The malicious code detection method based on image matching according to claim 1, characterized in that selecting the second baseline samples in step S1 specifically comprises:

S12. Determining the second baseline samples: for each family class, calculate the difference value between each candidate reference sample and the other candidate reference samples; if the calculated difference value is greater than a specified threshold, take the corresponding candidate reference sample as a second baseline sample of that family class.

3. The malicious code detection method based on image matching according to claim 2, characterized in that step S12 specifically calculates the Gabor function values of each candidate reference sample and the distance values between each candidate reference sample and the other candidate reference samples, and computes the difference value between each candidate reference sample and the other candidate reference samples from the Gabor function values and the distance values.

4. The malicious code detection method based on image matching according to claim 3, characterized in that the difference value between a candidate reference sample and the other candidate reference samples is calculated as follows:

$$D(es_{id}) = \frac{1}{M}\sum_{j=0,1,\dots,M}\sqrt{\sum_{l=0,1,\dots}\left(\mathrm{gabor}_l(es_{id}) - \mathrm{gabor}_l(es_{ij})\right)^2} + \frac{1}{N\mu}\sum_{d=1}^{N} p_d(es_{id})$$

$$p_d(es_{id}) = \sum_{j=0,1,\dots,N} D(es_{id}, es_{hj})$$

where es_id is the d-th candidate reference sample of class i; es_hj is the j-th candidate reference sample of class h; D(es_id, es_hj) is the distance between samples es_id and es_hj; h is the family class to which es_id is wrongly assigned; μ is a balance coefficient; N is the number of baseline samples contained in family class h; M is the number of baseline samples; and l indexes the image texture feature vector, up to its vector length.
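The difference value of claim 4 can be sketched in Python with NumPy. Several readings here are assumptions. D(·,·) is taken to be Euclidean distance on the Gabor feature vectors. The samples es_ij of the first term are passed in as `same_class`, and the class-h samples es_hj as `class_h`. The first term is taken as the mean over the supplied rows. Because the outer sum over d in the second term is ambiguous as printed, that term is read as p(es_id)/(Nμ), with p the summed distance from es_id to the N class-h samples. μ defaults to 2, the value set in the embodiment.

```python
import numpy as np


def difference_value(candidate, same_class, class_h, mu=2.0):
    """Sketch of the claim-4 difference value D(es_id) under the stated
    assumptions (Euclidean distances; second term read as p / (N * mu))."""
    candidate = np.asarray(candidate, dtype=float)
    same_class = np.asarray(same_class, dtype=float)  # rows: es_ij
    class_h = np.asarray(class_h, dtype=float)        # rows: es_hj
    # First term: mean Euclidean distance over the feature components l.
    first = np.linalg.norm(same_class - candidate, axis=1).mean()
    # p(es_id): summed distance to the N samples of the wrong class h.
    p = np.linalg.norm(class_h - candidate, axis=1).sum()
    second = p / (len(class_h) * mu)
    return first + second


# Toy 2-D vectors: first term 5.0, p = 2 + 4 = 6, second term 6 / (2 * 2).
print(difference_value([0, 0], [[3, 4]], [[0, 2], [0, 4]]))
```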

5. The malicious code detection method based on image matching according to any one of claims 1 to 4, characterized in that the image texture features are static texture features of the signal type.

6. The malicious code detection method based on image matching according to any one of claims 1 to 4, characterized in that the image texture features are extracted using a Gabor filter.

7. The malicious code detection method based on image matching according to any one of claims 1 to 4, characterized in that determining the family class of the malicious code to be detected in step S3 specifically comprises:

S31. for each family class, obtain the matching result between the malicious code to be detected and every baseline sample in that family class's baseline sample set;

S32. obtain the comprehensive matching value of each family class from all matching results of that family class, and judge from each family class's comprehensive matching value whether the malicious code to be detected belongs to that family class.

8. The malicious code detection method based on image matching according to claim 7, characterized in that the comprehensive matching value is calculated as follows:

$$R = \sum_{j=0}^{N} w_{ij}$$

where es_test is the malicious code to be detected, es_ij is the j-th baseline sample of class i, and N is the number of baseline samples contained in family class i.

9. The malicious code detection method based on image matching according to claim 8, characterized in that when the comprehensive matching value R of a target family class satisfies R > 0, the malicious code to be detected is determined to belong to that target family class; otherwise, it is determined not to belong to it.
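Claims 7 to 9 can be sketched as follows. The text here lists the symbols es_test and es_ij but not the defining formula for the weight w_ij. The sketch therefore assumes a 0/1 weight: w_ij is 1 when the test sample lies within a hypothetical Euclidean distance threshold `tau` of baseline sample es_ij, and 0 otherwise. Table 2's values are at least consistent with such counts: 0 to 3 for family (1)'s three-sample set, and 0 or 2 for family (2)'s two-sample set. The function names are illustrative.

```python
import numpy as np


def comprehensive_matching_value(test_vec, baseline_vecs, tau):
    """R = sum over j of w_ij for one family class.

    ASSUMPTION: w_ij = 1 when es_test lies within Euclidean distance tau of
    baseline sample es_ij and 0 otherwise; tau is a hypothetical parameter.
    """
    test_vec = np.asarray(test_vec, dtype=float)
    baselines = np.asarray(baseline_vecs, dtype=float)
    distances = np.linalg.norm(baselines - test_vec, axis=1)
    return int(np.sum(distances < tau))


def belongs_to_family(test_vec, baseline_vecs, tau):
    """Claim 9's decision rule: the sample belongs to the family when R > 0."""
    return comprehensive_matching_value(test_vec, baseline_vecs, tau) > 0


baseline_set = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]  # toy feature vectors
print(comprehensive_matching_value([0.2, 0.0], baseline_set, tau=1.0))
print(belongs_to_family([10.0, 10.0], baseline_set, tau=1.0))
```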