CN107092829B - Malicious code detection method based on image matching
- Publication number
- CN107092829B (application CN201710265324.1A)
- Authority
- CN
- China
- Prior art keywords
- reference sample
- family
- samples
- matching
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
Abstract
The invention discloses a malicious code detection method based on image matching, comprising the following steps: S1, obtaining training samples corresponding to malicious codes of different family categories, converting each training sample into a grayscale image, and extracting the corresponding image texture features; selecting a first reference sample from the training samples of each family category, selecting a second reference sample according to the difference in image texture features between the first reference sample and the other samples, and forming the first and second reference samples selected for each family category into a corresponding reference sample set; S2, converting the malicious code to be detected into a grayscale image and extracting the corresponding image texture features; and S3, matching the image texture features extracted in step S2 against the reference sample set of each family category, and determining the family category of the malicious code to be detected according to the matching results. The method is simple to implement and offers strong robustness, high detection accuracy, and high detection efficiency.
Description
Technical Field
The invention relates to the technical field of malicious code detection and analysis, in particular to a malicious code detection method based on image matching.
Background
With the widespread use of automatic malicious code generation tools and of open source code in malicious code, the number of malicious code variants and of new malicious code families is increasing rapidly. Statistics indicate that the number of malicious code variants detected in a year has reached 430 million, making malicious code a major challenge for cyberspace security. Traditional malicious code detection methods fall mainly into two types. The first is detection based on a signature mechanism, which can quickly detect known malicious code samples but requires a great deal of expert experience and manual analysis, and has difficulty handling morphed and obfuscated malicious code samples. The second is detection based on abnormal behavior, which can detect zero-day vulnerabilities and malicious code samples from new families but has a high false alarm rate.
Detection methods based on automated analysis mainly use machine learning to analyze malicious code, generally in three steps: ① extracting features of the malicious code; ② selecting a suitable model; ③ obtaining the classification result.
The malicious code detection method based on automatic analysis has the following defects:
(1) the robustness is poor, and the detection precision is low. In the method, classification detection is carried out based on the extracted characteristics of malicious codes, the detection precision obtained by different characteristics may be different, and the precision of characteristic extraction and the selection of the characteristics directly influence the precision of a final detection analysis result, so that the actual detection robustness is poor and the detection precision is low;
(2) the detection efficiency is low. The method is usually complex to implement, and usually needs a long time for model training, so that the detection efficiency is low.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the technical problems in the prior art, the invention provides a malicious code detection method based on image matching that is simple to implement and offers strong robustness, high detection accuracy, and high detection efficiency.
In order to solve the technical problems, the technical scheme provided by the invention is as follows:
A malicious code detection method based on image matching, comprising the following steps:
S1, selecting reference samples: acquiring training samples corresponding to malicious codes of different family categories, converting each training sample into a grayscale image, and extracting the corresponding image texture features; selecting a first reference sample from the training samples of each family category, selecting a second reference sample according to the difference in image texture features between the first reference sample and the other samples, and forming the first reference sample and the second reference sample selected for each family category into a corresponding reference sample set;
S2, image feature extraction: converting the malicious code to be detected into a grayscale image, and extracting the corresponding image texture features;
S3, test code classification: matching the image texture features extracted in step S2 against the reference sample set of each family category, and determining the family category of the malicious code to be detected according to the matching results.
As a further improvement of the present invention, the specific steps of selecting the second reference sample for each family category in step S1 are as follows:
S11, obtaining candidate reference samples: matching each selected first reference sample against the remaining training samples, and, according to the matching results, finding the training samples in each family category that are incorrectly assigned and taking them as candidate reference samples;
S12, determining the second reference sample: calculating the difference value between each candidate reference sample and the other candidate reference samples in each family category, and, if the calculated difference value is greater than a specified threshold, taking the corresponding candidate reference sample as a second reference sample of the corresponding family category.
As a further improvement of the present invention, in step S12, specifically, a Gabor function value of each candidate reference sample and a distance value between each candidate reference sample and another training sample are calculated, and a difference value between the candidate reference sample and another candidate reference sample is calculated according to the Gabor function value and the distance value.
As a further improvement of the invention, the difference value between one candidate reference sample and the other candidate reference samples is calculated according to the following formula:

$$pd(es_{id}) = \sum_{j=0,1,\ldots,N} D(es_{id}, es_{hj})$$

where $es_{id}$ is the d-th candidate reference sample of class i, $es_{hj}$ is the j-th candidate reference sample of class h, $D(es_{id}, es_{hj})$ is the difference between sample $es_{id}$ and sample $es_{hj}$, h is the family class to which $es_{id}$ was incorrectly assigned, μ is a weighting coefficient, N is the number of reference samples contained in family class h, M is the number of reference samples, and l is the vector length of the image texture feature.
As a further improvement of the invention: the image texture features are signal type static texture features.
As a further improvement of the invention: the image texture features are obtained by extracting through a Gabor filter.
As a further improvement of the present invention, the specific steps of confirming the family category of the malicious code to be detected in step S3 are:
S31, obtaining the matching results between the malicious code to be detected and every reference sample in the reference sample set of each family category;
S32, obtaining a comprehensive matching value for each family category from all the matching results of that category, and judging from the comprehensive matching value of each family category whether the malicious code to be detected belongs to that category.
As a further improvement of the invention, the comprehensive matching value is calculated according to the following formula;
where $es_{test}$ is the malicious code to be detected, $es_{ij}$ is the j-th reference sample of class i, and N is the number of reference samples contained in family class i.
As a further improvement of the invention, when the comprehensive matching value R corresponding to a target family category satisfies R > 0, the malicious code to be detected is judged to belong to the target family category; otherwise, it is judged not to belong to the target family category.
Compared with the prior art, the invention has the advantages that:
1) according to the malicious code detection method based on image matching, image texture features are automatically extracted, image matching is carried out based on feature similarity analysis, family classification judgment is realized based on an image matching result, detection automation can be realized, and large-scale malicious code family detection analysis can be conveniently and efficiently realized;
2) in the image matching process, a first reference sample of each family type is selected, a second reference sample is selected according to the difference of image texture characteristics between the first reference sample and the samples, a reference sample set is formed by the first reference sample and the second reference sample to carry out image matching on the malicious code to be detected, the family type of the malicious code to be detected is finally confirmed, long-time model training is not needed, the reliability of the selected reference sample is high, the influence of the selection of the reference sample on a detection result can be greatly reduced, and the detection precision is improved;
3) the malicious code detection method based on image matching further comprises the steps of selecting a first reference sample, selecting a second reference sample based on the first reference sample, searching out a sample with a wrong matching through a matching state of the first reference sample as a candidate reference sample, calculating a difference value between the candidate reference sample and a training sample of a current family type, and finally determining whether the candidate reference sample is used as a new reference sample or not based on the difference value, so that the sample which is distributed by errors and has a larger difference with other samples is also used as the reference sample.
Drawings
Fig. 1 is a schematic flow chart of an implementation of the malicious code detection method based on image matching according to the embodiment.
Fig. 2 is a schematic diagram illustrating an implementation principle of the malicious code detection method based on image matching according to the embodiment.
Fig. 3 is a schematic flow chart of a specific implementation of the grayscale image conversion in this embodiment.
Fig. 4 is a gray scale map obtained in an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments of the description, without thereby limiting the scope of protection of the invention.
As shown in fig. 1 and 2, the malicious code detection method based on image matching in the present embodiment includes the steps of:
S1, selecting reference samples: acquiring training samples corresponding to malicious codes of different family categories, converting each training sample into a grayscale image, and extracting the corresponding image texture features; selecting a first reference sample from the training samples of each family category, selecting a second reference sample according to the difference in image texture features between the first reference sample and the other samples, and forming the first reference sample and the second reference sample selected for each family category into a corresponding reference sample set;
S2, image feature extraction: converting the malicious code to be detected into a grayscale image, and extracting the corresponding image texture features;
S3, test code classification: matching the image texture features extracted in step S2 against the reference sample set of each family category, and determining the family category of the malicious code to be detected according to the matching results.
The texture of the image can reflect the visual characteristics of the homogeneous phenomenon in the image, and can reflect the slowly-changing or periodically-changing surface structure organization arrangement attributes of the surface of the object. The embodiment utilizes the characteristic texture characteristics, realizes malicious code detection based on an image matching mode, automatically extracts image texture characteristics, performs image matching based on characteristic similarity analysis, and then realizes family classification judgment based on an image matching result, can realize detection automation, can be conveniently applied to a back-end classification system of PC (personal computer) end, mobile client end and the like or large-scale malicious code family homology analysis to perform malicious software detection, and can efficiently mine malicious codes from massive samples to be detected in an online or offline mode and the like.
In the image matching process, the first reference sample of each family type is selected firstly, the second reference sample is selected according to the image texture characteristic difference between the first reference sample and the samples, the first reference sample and the second reference sample form a reference sample set to perform image matching on the malicious code to be detected, and finally the family type of the malicious code to be detected is confirmed.
In this embodiment, the specific step of selecting the second reference sample for each family category in step S1 is as follows:
S11, obtaining candidate reference samples: matching each selected first reference sample against the remaining training samples, and, according to the matching results, finding the training samples in each family category that are incorrectly assigned and taking them as candidate reference samples;
S12, determining the second reference sample: calculating the difference value between each candidate reference sample and the other candidate reference samples in each family category, and, if the calculated difference value is greater than a specified threshold, taking the corresponding candidate reference sample as a second reference sample of the corresponding family category.
In conventional image matching, one sample is selected directly from a set of samples of known class to serve as the reference sample of that class; if the sample to be detected matches the image of the reference sample, it is judged to belong to the class of that reference sample. Different choices of reference sample can produce different matching results, so the choice of reference sample directly affects matching accuracy. In this embodiment, the second reference sample is selected on the basis of the first reference sample: first, samples that are mismatched against the first reference samples are found and taken as candidate reference samples; then the difference value between each candidate reference sample and the other candidate reference samples is calculated, and whether the candidate becomes a new reference sample is decided from that difference value. In this way, a sample that was incorrectly assigned and differs substantially from the other candidate reference samples is also used as a reference sample.
In step S1, this embodiment first performs image conversion on each training sample, converting the binary malicious code into a grayscale image. As shown in Fig. 3, each pixel of a grayscale image is represented by an unsigned integer in the range [0, 255], so the malicious code in binary form is first converted into a matrix of unsigned integers. Because 8 bits of binary data convert to an integer from 0 to 255 inclusive, the binary file is cut and converted in units of 8 consecutive bits; the image width can be adjusted as the conversion requires, yielding the grayscale image of each training sample.
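The conversion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `bytes_to_gray_image`, the default width of 256, and the zero-padding of the last row are assumptions, since the text only states that the file is cut into 8-bit units and that the width is adjustable.

```python
from typing import List


def bytes_to_gray_image(data: bytes, width: int = 256) -> List[List[int]]:
    """Cut a binary file into 8-bit units and lay them out as pixel rows.

    Each byte is already an unsigned integer in [0, 255], so it can be used
    directly as a gray level. `width` and the zero-padding of the final
    partial row are illustrative assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])   # bytes -> integers 0..255
        row.extend([0] * (width - len(row)))    # pad the last row if short
        rows.append(row)
    return rows


# Five bytes laid out two pixels per row; the last row is padded with 0.
image = bytes_to_gray_image(bytes([0, 16, 128, 255, 7]), width=2)
```

For a real sample, `data` would be the file contents read in binary mode, for example `open(path, "rb").read()`, where `path` is a hypothetical file location.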
The image texture features mainly comprise four types, namely statistical texture features, model texture features, signal texture features and structural texture features. In this embodiment, the image texture features are specifically signal texture features, feature extraction is performed by using a signal texture processing method, the image texture features are specifically extracted by using a Gabor filter, and this embodiment is based on static features, does not need to run malicious codes, and is simple to implement.
The Gabor filter is a linear filter used for image edge feature extraction. It can be defined as a sinusoid multiplied by a Gaussian function; a two-dimensional Gabor filter is a Gaussian kernel modulated by a sinusoidal plane wave. By the convolution theorem, the Fourier transform of the impulse response of a Gabor filter is the convolution of the Fourier transform of its harmonic function and the Fourier transform of the Gaussian function. The filter consists of a real part and an imaginary part, which are orthogonal to each other. The Gabor filter used in this embodiment is as follows, where the complex expression is:

$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\left(2\pi\frac{x'}{\lambda} + \psi\right)\right) \quad (1)$$

the real part is:

$$g_{re}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi\frac{x'}{\lambda} + \psi\right) \quad (2)$$

the imaginary part is:

$$g_{im}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \sin\left(2\pi\frac{x'}{\lambda} + \psi\right) \quad (3)$$

where $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$; λ is the wavelength, in pixels; θ is the orientation, in the range 0° to 360°; ψ is the phase offset, in the range [-180°, 180°]; γ determines the ellipticity of the shape of the Gabor function; and σ is the standard deviation of the Gaussian factor of the Gabor function, which varies with the bandwidth.
When image features are extracted, a bank of Gabor functions with different frequencies and orientations is obtained. When texture features are computed with Gabor filters, the image texture feature of each sample can be expressed as T = ([a1, a2], [b1, b2], [c1, c2], [d1, d2]); that is, it consists of four feature values a, b, c, and d, each made up of a real part (subscript 1) and an imaginary part (subscript 2).
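A texture feature of this four-pair shape might be computed along the following lines. The sketch assumes the standard complex Gabor form with the parameters λ, θ, ψ, σ, and γ named in the text. The choice of four orientations, the kernel size, the default parameter values, and the use of the mean filter response as the feature value are all assumptions made for illustration; the text does not state how the four values are derived.

```python
import cmath
import math


def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Complex Gabor kernel sampled on an odd size x size grid centred at 0."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2 * sigma * sigma))
            carrier = cmath.exp(1j * (2 * math.pi * xr / lam + psi))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def texture_features(image, size=5, lam=4.0, psi=0.0, sigma=2.0, gamma=0.5,
                     thetas=(0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)):
    """Return one (real, imaginary) pair per orientation.

    Assumption: each pair is the mean of the filter response over all
    positions where the kernel fits inside the image (cross-correlation,
    pixels scaled to [0, 1]).
    """
    h, w = len(image), len(image[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the kernel")
    features = []
    for theta in thetas:
        k = gabor_kernel(size, lam, theta, psi, sigma, gamma)
        total, count = 0j, 0
        for r in range(h - size + 1):
            for c in range(w - size + 1):
                total += sum(k[i][j] * (image[r + i][c + j] / 255.0)
                             for i in range(size) for j in range(size))
                count += 1
        mean = total / count
        features.append((mean.real, mean.imag))
    return features


# Small synthetic 8x8 image, used only to show the output shape.
demo_image = [[(r * 37 + c * 11) % 256 for c in range(8)] for r in range(8)]
demo_features = texture_features(demo_image)
```

The pure-Python loops keep the sketch dependency-free; for images of realistic size a vectorized convolution would be the practical choice.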
When samples are matched, the difference value of the image texture features between the samples is calculated, and how well the samples match is judged from this difference value: the smaller the difference value of the image texture features between two samples, the better they match. The difference between two individual samples $s_i$ and $s_j$ is calculated by the formula:
In this embodiment, in step S12, the Gabor function value of each candidate reference sample and the distance value between each candidate reference sample and the other training samples are calculated, and the difference value between a candidate reference sample and the other candidate reference samples is calculated from the Gabor function value and the distance value.
In this embodiment, the difference value between one candidate reference sample $es_{id}$ and the other candidate reference samples is calculated according to the following formula:

$$pd(es_{id}) = \sum_{j=0,1,\ldots,N} D(es_{id}, es_{hj}) \quad (5)$$

where $es_{id}$ is the d-th candidate reference sample of class i, $es_{hj}$ is the j-th candidate reference sample of class h, $D(es_{id}, es_{hj})$ is the difference between sample $es_{id}$ and sample $es_{hj}$, h is the family class to which $es_{id}$ was incorrectly assigned, μ is a weighting coefficient, N is the number of reference samples contained in family class h, M is the number of reference samples, and l is the length of the image texture feature vector obtained by the Gabor filter. Whether a candidate reference sample is used as a new reference sample is determined from the difference value between each candidate reference sample and the training samples of the current family class, which improves the reliability of the reference samples.
For any sample set C containing n samples and m family classes, denoted $C = \{C_1, C_2, \ldots, C_m\}$, where C contains $n_1$ training samples and $n_2$ unknown samples with $n = n_1 + n_2$, the detailed steps of selecting the candidate reference samples are as follows:
① Randomly select m samples $b_{11}, b_{21}, \ldots, b_{m1}$ from the training samples as reference samples, where $b_{ij}$ denotes the j-th reference sample of family i; that is, one training sample is randomly selected from the training samples of each family class as the initial reference sample (the first reference sample);
② Match the remaining training samples against each initial reference sample and collect the mismatched samples, that is, the incorrectly assigned samples. The mismatched samples likewise fall into m classes and are taken as candidate reference samples. The m classes of mismatched samples are expressed as:
$es = \{\{es_{11}, es_{12}, \ldots\}, \{es_{21}, es_{22}, \ldots\}, \ldots, \{es_{m1}, es_{m2}, \ldots\}\}$
③ Perform a second round of matching within the mismatched sample set of each family class. Specifically, with the candidate reference sample set of family i expressed as $\{es_{i1}, es_{i2}, \ldots\}$, compute the Gabor function values $\{gabor_l(es_{i1}), gabor_l(es_{i2}), gabor_l(es_{i3}), \ldots\}$ of the candidate reference samples and the differences between different candidate reference samples, calculating the difference value of sample $es_{id}$ with respect to family i according to formula (5). If the difference value of candidate sample $es_{id}$ satisfies $D(es_{id}) > \rho$, then $es_{id}$ is added as a new reference sample (second reference sample) of family i.
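Steps ① to ③ can be sketched as below. Several parts are assumptions rather than the patented procedure: Euclidean distance stands in for the difference formula (4), a sample counts as "matched" to the family of its nearest first reference sample, the difference value of a candidate is summed over the other candidates of the same family, and a family's sole candidate is added directly. The names `select_references`, `initial`, and `rho` are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Feature = Tuple[float, ...]


def distance(a: Feature, b: Feature) -> float:
    """Euclidean distance; an assumed stand-in for the difference formula (4)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_family(sample: Feature, refs: Dict[str, List[Feature]]) -> str:
    """Family whose closest reference sample differs least from `sample`."""
    return min(refs, key=lambda fam: min(distance(sample, r) for r in refs[fam]))


def select_references(training: Dict[str, List[Feature]], rho: float,
                      initial: Optional[Dict[str, int]] = None,
                      seed: int = 0) -> Dict[str, List[Feature]]:
    rng = random.Random(seed)
    # Step 1: one first reference sample per family (random unless `initial`
    # gives an explicit index for each family).
    refs = {}
    for fam, samples in training.items():
        idx = initial[fam] if initial else rng.randrange(len(samples))
        refs[fam] = [samples[idx]]
    first = {fam: r[0] for fam, r in refs.items()}
    # Step 2: training samples matched to the wrong family become candidates.
    candidates = {
        fam: [s for s in samples
              if s is not first[fam] and nearest_family(s, refs) != fam]
        for fam, samples in training.items()
    }
    # Step 3: keep candidates whose summed difference to the other candidates
    # of the same family exceeds rho; a lone candidate is added directly.
    for fam, cands in candidates.items():
        for c in cands:
            pd = sum(distance(c, other) for other in cands if other is not c)
            if len(cands) == 1 or pd > rho:
                refs[fam].append(c)
    return refs


# Toy two-dimensional features, invented purely for illustration.
toy = {"A": [(0.0, 0.0), (0.2, 0.0), (4.0, 0.0), (4.0, 3.0)],
       "B": [(5.0, 0.0), (5.2, 0.0)]}
toy_refs = select_references(toy, rho=2.0, initial={"A": 0, "B": 0})
```

In the toy data, the two family A samples near (4, 0) and (4, 3) lie closer to the family B reference, so they become candidates and, being 3.0 apart, both exceed `rho`.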
In this embodiment, to-be-detected malicious codes are first subjected to image transformation, binary to-be-detected malicious codes are transformed into a gray image form, and then image texture features are extracted, which is specifically the same as the processing method of the training samples.
In this embodiment, the specific steps of determining the family type of the malicious code to be detected based on the extracted image texture features in step S3 are as follows:
S31, obtaining the matching results between the malicious code to be detected and every reference sample in the reference sample set of each family category;
S32, obtaining a comprehensive matching value for each family category from all the matching results of that category, and judging from the comprehensive matching value of each family category whether the malicious code to be detected belongs to that category.
In this embodiment, the comprehensive matching value is calculated according to the following formula;
where $es_{test}$ is the malicious code to be detected, $es_{ij}$ is the j-th reference sample of class i, and N is the number of reference samples contained in family class i.
In this embodiment, when the malicious code to be detected $es_{test}$ is matched against a reference sample $es_{ij}$, the matching result is 1 if they match and -1 if they do not; the matching result values can of course be set according to actual requirements. All matching results obtained for each family category are accumulated to give the final comprehensive matching value, from which the family category of the malicious code to be detected is judged.
In this embodiment, when the comprehensive matching value R corresponding to a target family category satisfies R > 0, the malicious code to be detected is judged to belong to the target family category; otherwise, it is judged not to belong to the target family category.
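The accumulation and the R > 0 decision might look like the following sketch. The per-reference match rule is an assumption: the text does not say how a single match is decided, so the sketch treats a test sample as matching a reference sample when their Euclidean distance is below a hypothetical `threshold`. The +1 and -1 results and the R > 0 test follow the description above.

```python
import math
from typing import Dict, List, Tuple

Feature = Tuple[float, ...]


def distance(a: Feature, b: Feature) -> float:
    """Euclidean distance; an assumed stand-in for the difference formula."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(test: Feature, ref: Feature, threshold: float) -> int:
    """+1 if the samples match, -1 otherwise (threshold rule is assumed)."""
    return 1 if distance(test, ref) < threshold else -1


def composite_value(test: Feature, refs: List[Feature], threshold: float) -> int:
    """Accumulate the match results over all reference samples of one family."""
    return sum(match(test, r, threshold) for r in refs)


def classify(test: Feature, reference_sets: Dict[str, List[Feature]],
             threshold: float) -> List[str]:
    """Families whose comprehensive matching value R satisfies R > 0."""
    return [fam for fam, refs in reference_sets.items()
            if composite_value(test, refs, threshold) > 0]


# Toy reference sets, invented purely for illustration.
reference_sets = {"A": [(0.0, 0.0), (0.2, 0.0), (4.0, 0.0)],
                  "B": [(5.0, 0.0), (5.2, 0.0)]}
near_a = classify((0.1, 0.0), reference_sets, threshold=1.0)
near_b = classify((4.6, 0.0), reference_sets, threshold=1.0)
```

Because each family is tested independently, a sample can in principle satisfy R > 0 for more than one family or for none; the function returns every family that qualifies.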
The invention is further illustrated below by taking the detection classification of 10 test samples in two family classes as an example.
The training samples used in this example are shown in table 1.
Table 1: Training samples.
Step 1: reference sample selection
Step 1.1: training sample image texture feature extraction
Take as examples two training samples in family (1), S1 (0B06744D7C5822BA585C5992B10ADFA0) and S2 (0BDAFFBA037a4880D31C93C0AADCC1FE), and two training samples in family (2), S3 (2C69C485a46B03C277B5F88DED0BABF0) and S4 (2C9F38EF39CFD73AA52E22869E8ABD90). The four malicious code training samples are first converted from binary files into grayscale images; for instance, the binary code segment "01100111" is converted into the unsigned integer 103, meaning that the corresponding pixel in the grayscale image has the value 103. The resulting grayscale images are shown in Fig. 4, where panel (a) is family (2), corresponding to samples S3 and S4, and panel (b) is family (1), corresponding to samples S1 and S2. The texture features of the malicious code are then extracted with a Gabor filter, implemented as shown in formulas (1) to (3). The texture features calculated for the four samples are as follows:
TSample S1=([3.64589196e-01,1.78531921e-02],[1.11456886e-01,3.62631582e-03],[2.45940133e-01,4.82167451e-03],[3.66851460e-04,1.85390288e-04]);
TSample S2=([3.67820753e-01,2.47166142e-02],[1.12444790e-01,5.22362168e-03],[2.48120037e-01,7.30584538e-03],[3.70103068e-04,3.69625304e-04]);
TSample S3=([3.82683113e-01,1.65478632e-02],[1.16988294e-01,5.31969120e-03],[2.58145706e-01,3.20882018e-03],[3.85057648e-04,4.53992963e-04]);
TSample S4=([3.78114609e-01,2.53183776e-02],[1.15591678e-01,5.70669572e-03],[2.55063941e-01,7.49917029e-03],[3.80460797e-04,3.41053618e-04])。
Step 1.2: candidate reference sample selection
In this embodiment, families (1) and (2) provide training sample sets 1 and 2 shown in Table 1, respectively, each containing 10 training samples. The differences between training samples are calculated using formula (4) for matching. Assuming the initial reference samples are sample S1 and sample S3, the matching result is: in training sample set 1, samples S7 and S10 are incorrectly assigned; in training sample set 2, sample S5 is incorrectly assigned. These three samples are then added as candidate reference samples $\{[c_{11}, c_{12}], [c_{21}]\}$.
Step 1.3: second reference sample determination
Calculating the texture features of the candidate reference samples:
([3.53133564e-01,2.24345224e-02],[1.07954837e-01,4.99747304e-03],[2.38212532e-01,6.49801062e-03],[3.55324746e-04,4.32171770e-04]),([3.54380214e-01,2.24449735e-02],[1.08335945e-01,5.00705347e-03],[2.39053482e-01,6.41765146e-03],[3.56579131e-04,3.85045161e-04]),([3.66485717e-01,2.55705031e-02],[1.12036663e-01,5.15760513e-03],[2.47219465e-01,8.83855001e-03],[3.68759749e-04,3.02423971e-04])。
Calculating the difference values between each candidate reference sample and the other reference samples using formula (5) gives:
where μ is set to 2.
In this embodiment, the threshold ρ is assumed to be 0.45. Since $D(c_{11}) > \rho$ and $D(c_{12}) > \rho$, the candidate reference samples $c_{11}$ and $c_{12}$ are added as new reference samples (second reference samples), giving the reference sample set $\{b_{11}, c_{11}, c_{12}\}$ for family (1). Because family (2) has only one candidate reference sample, that sample is added directly as a new reference sample (second reference sample), giving the reference sample set $\{b_{21}, c_{21}\}$.
Step 2: test sample image texture feature extraction
And converting each test malicious code into a gray level image, and extracting image texture features, wherein the specific method is as described above.
Step 3: detection classification
Using the reference sample set {b11, c11, c12} of family (1) and the reference sample set {b21, c21} of family (2), each test sample is matched again, where the test sample set comprises ten test samples {S1, S2, S3, S4, S5, S6, S7, S8, S9, S10}. The comprehensive matching result of each test sample, obtained using formula (6), is as follows:
Table 2: Test matching results.
Test sample | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 |
---|---|---|---|---|---|---|---|---|---|---|
Family 1 | 3 | 3 | 3 | 1 | 3 | 0 | 0 | 3 | 1 | 0 |
Family 2 | 0 | 0 | 0 | 2 | 0 | 2 | 2 | 0 | 2 | 2 |
If the comprehensive matching result is greater than 0, the test sample is judged to belong to the corresponding family category; otherwise, it is judged not to belong to that family category. According to the comprehensive matching results above, the final test result is that {S1, S2, S3, S5, S8} belong to family (1) and {S4, S6, S7, S9, S10} belong to family (2). The detection result shows that the detection method can accurately assign malicious code to its family category and has high detection efficiency.
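The decision step can be illustrated with the values of Table 2. The sketch below applies the rule that a comprehensive matching result greater than 0 indicates membership. Where a sample scores above 0 for both families (S4 and S9), it assumes the family with the larger value is chosen; this tie-breaking rule is not stated in the text but reproduces the final result given above. The function name `classify` is hypothetical.

```python
samples = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10"]
family1 = [3, 3, 3, 1, 3, 0, 0, 3, 1, 0]  # Table 2, row "Family 1"
family2 = [0, 0, 0, 2, 0, 2, 2, 0, 2, 2]  # Table 2, row "Family 2"

def classify(scores):
    """scores maps family -> comprehensive matching result R."""
    # Only families with R > 0 are eligible.
    eligible = {fam: r for fam, r in scores.items() if r > 0}
    if not eligible:
        return None  # the sample matches no known family
    # Assumption: among eligible families, take the largest R.
    return max(eligible, key=eligible.get)

result = {1: [], 2: []}
for name, r1, r2 in zip(samples, family1, family2):
    fam = classify({1: r1, 2: r2})
    if fam is not None:
        result[fam].append(name)

print(result)
```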
The foregoing merely describes preferred embodiments of the invention and is not to be construed as limiting the invention in any way. Although the present invention has been described with reference to the preferred embodiments, they are not intended to limit it. Therefore, any simple modification, equivalent change or refinement made to the above embodiments according to the technical spirit of the present invention, without departing from the content of the technical scheme of the present invention, shall fall within the protection scope of the technical scheme of the present invention.
Claims (8)
1. A malicious code detection method based on image matching is characterized by comprising the following steps:
S1, selecting a reference sample: acquiring training samples corresponding to malicious codes of different family categories, respectively converting the training samples into gray-level images and extracting the corresponding image texture features; selecting a first reference sample from the training samples of each family category, selecting a second reference sample according to the difference of image texture features between the first reference sample and the samples, and forming the first reference sample and the second reference sample selected for each family category into a corresponding reference sample set;
S2, image feature extraction: converting the malicious code to be detected into a gray-level image, and extracting the corresponding image texture features;
S3, test code classification: matching the image texture features extracted in step S2 with the reference sample sets corresponding to the family categories respectively, and confirming the family category of the malicious code to be detected according to the matching results;
the specific steps of selecting the second reference sample in step S1 are:
S11, obtaining candidate reference samples: matching the selected first reference samples with the remaining training samples respectively, finding the training samples that are misassigned in each family category according to the matching results, and using them as candidate reference samples;
S12, determining a second reference sample: respectively calculating the difference values between each candidate reference sample and the other candidate reference samples in each family category, and if a calculated difference value is greater than a specified threshold, taking the corresponding candidate reference sample as a second reference sample of the corresponding family category.
2. The method according to claim 1, wherein in step S12, a Gabor function value of each candidate reference sample and a distance value between each candidate reference sample and the other candidate reference samples are calculated, and the difference value between each candidate reference sample and the other candidate reference samples is calculated from the Gabor function value and the distance value.
3. The image matching-based malicious code detection method according to claim 2, wherein the difference value between one candidate reference sample and the other candidate reference samples is calculated according to the following formula:
$pd(es_{id}) = \sum_{j=0,1,\ldots,N} D(es_{id}, es_{hj})$
wherein $es_{id}$ is the i-th candidate reference sample of class d, $es_{hj}$ is the j-th candidate reference sample of class h, $D(es_{id}, es_{hj})$ is the difference between sample $es_{id}$ and sample $es_{hj}$, H is the Gabor function value of $es_{id}$, μ is a weighting coefficient, N is the number of reference samples contained in family category h, M is the number of reference samples, and l is the vector length of the image texture feature.
4. The malicious code detection method based on image matching according to any one of claims 1 to 3, wherein the image texture features are signal type static texture features.
5. The malicious code detection method based on image matching according to any one of claims 1 to 3, characterized in that: the image texture features are obtained by extracting through a Gabor filter.
6. The image matching-based malicious code detection method according to any one of claims 1 to 3, wherein the specific steps of confirming the family category of the malicious code to be detected in the step S3 are as follows:
S31, respectively obtaining the matching results between the malicious code to be detected and all reference samples in the reference sample sets of all family categories;
S32, respectively obtaining a comprehensive matching value for each family category from all matching results of that family category, and judging whether the malicious code to be detected belongs to the corresponding family category according to the comprehensive matching value of each family category.
7. The image matching-based malicious code detection method according to claim 6, wherein the comprehensive matching value is calculated according to the following formula;
8. The image matching-based malicious code detection method according to claim 7, wherein when the comprehensive matching value R corresponding to the target family category satisfies R > 0, the malicious code to be detected is judged to belong to the target family category; otherwise, it is judged not to belong to the target family category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710265324.1A CN107092829B (en) | 2017-04-21 | 2017-04-21 | Malicious code detection method based on image matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107092829A CN107092829A (en) | 2017-08-25 |
CN107092829B CN107092829B (en) | 2020-03-17 |
Family
ID=59637854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710265324.1A Active CN107092829B (en) | 2017-04-21 | 2017-04-21 | Malicious code detection method based on image matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107092829B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107688744B (en) * | 2017-08-31 | 2020-03-13 | 杭州安恒信息技术股份有限公司 | Malicious file classification method and device based on image feature matching |
CN107665307A (en) * | 2017-09-13 | 2018-02-06 | 北京金山安全软件有限公司 | Application identification method and device, electronic equipment and storage medium |
CN107657175A (en) * | 2017-09-15 | 2018-02-02 | 北京理工大学 | A kind of homologous detection method of malice sample based on image feature descriptor |
CN107767256A (en) * | 2017-09-15 | 2018-03-06 | 重庆市个人信用管理有限责任公司 | Assessing credit risks method based on image expression credit data and depth belief network |
CN108280348B (en) * | 2018-01-09 | 2021-06-22 | 上海大学 | Android malicious software identification method based on RGB image mapping |
CN108304540B (en) * | 2018-01-29 | 2022-08-02 | 腾讯科技(深圳)有限公司 | Text data identification method and device and related equipment |
CN108416213A (en) * | 2018-03-14 | 2018-08-17 | 中国人民解放军陆军炮兵防空兵学院郑州校区 | A kind of malicious code sorting technique based on image texture fingerprint |
CN108563952B (en) * | 2018-04-24 | 2023-03-21 | 腾讯科技(深圳)有限公司 | File virus detection method and device and storage medium |
CN108717512B (en) * | 2018-05-16 | 2021-06-18 | 中国人民解放军陆军炮兵防空兵学院郑州校区 | Malicious code classification method based on convolutional neural network |
CN110955891B (en) * | 2018-09-26 | 2023-05-02 | 阿里巴巴集团控股有限公司 | File detection method, device and system and data processing method |
CN109492692A (en) * | 2018-11-07 | 2019-03-19 | 北京知道创宇信息技术有限公司 | A kind of webpage back door detection method, device, electronic equipment and storage medium |
CN110392056A (en) * | 2019-07-24 | 2019-10-29 | 成都积微物联集团股份有限公司 | A kind of the Internet of Things malware detection system and method for lightweight |
CN111241550B (en) * | 2020-01-08 | 2023-04-18 | 湖南大学 | Vulnerability detection method based on binary mapping and deep learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104978521A (en) * | 2014-04-10 | 2015-10-14 | 北京启明星辰信息安全技术有限公司 | Method and system for realizing malicious code marking |
CN105512555A (en) * | 2014-12-12 | 2016-04-20 | 哈尔滨安天科技股份有限公司 | Homologous family dividing and mutation method and system based on file string cluster |
Non-Patent Citations (1)
Title |
---|
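Research on a malicious code variant detection method based on texture fingerprints; Han Xiaoguang et al.; Journal on Communications (《通信学报》); 2014-08-25; Vol. 35 (No. 8); pp. 125-136 * |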
Also Published As
Publication number | Publication date |
---|---|
CN107092829A (en) | 2017-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107092829B (en) | Malicious code detection method based on image matching | |
Ryu et al. | Rotation invariant localization of duplicated image regions based on Zernike moments | |
CN107563433B (en) | Infrared small target detection method based on convolutional neural network | |
CN109241741B (en) | Malicious code classification method based on image texture fingerprints | |
WO2016205286A1 (en) | Automatic entity resolution with rules detection and generation system | |
CN110197209A (en) | A kind of Emitter Recognition based on multi-feature fusion | |
CN109344845B (en) | Feature matching method based on triple deep neural network structure | |
CN105160303A (en) | Fingerprint identification method based on mixed matching | |
CN107491536B (en) | Test question checking method, test question checking device and electronic equipment | |
CN112036323B (en) | Signature handwriting authentication method, client and server | |
CN104794729A (en) | SAR image change detection method based on significance guidance | |
CN111126504A (en) | Multi-source incomplete information fusion image target classification method | |
CN116366313A (en) | Small sample abnormal flow detection method and system | |
Wiesner et al. | Dataset of digitized RACs and their rarity score analysis for strengthening shoeprint evidence | |
CN108992033B (en) | Grading device, equipment and storage medium for vision test | |
CN102262723A (en) | Face recognition method and device | |
CN113935034A (en) | Malicious code family classification method and device based on graph neural network and storage medium | |
Marcinowski | Top interpretable neural network for handwriting identification | |
CN111783789A (en) | Image sensitive information identification method | |
CN104462826B (en) | The detection of multisensor evidences conflict and measure based on Singular Value Decomposition Using | |
CN110990383A (en) | Similarity calculation method based on industrial big data set | |
CN114969761A (en) | Log anomaly detection method based on LDA theme characteristics | |
CN115797804A (en) | Abnormity detection method based on unbalanced time sequence aviation flight data | |
CN114743048A (en) | Method and device for detecting abnormal straw picture | |
CN111209567B (en) | Method and device for judging perceptibility of improving robustness of detection model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||