CN106650440A - Malicious program detection method integrating multiple detection results - Google Patents
- Publication number
- CN106650440A (application CN201610909053.4A)
- Authority
- CN
- China
- Prior art keywords
- program
- normal program
- string
- detected
- gene pool
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Abstract
The invention discloses a malicious program detection method that fuses multiple detection results. The method comprises the following steps:

- building a character string set;
- building a gene pool;
- constructing feature vectors;
- generating malicious program detectors;
- preliminarily judging whether a program under test is malicious;
- making the final judgment on whether it is malicious by fusing multiple detection results.

The method first builds a gene pool. From the gene pool it generates the feature space of normal programs, and from that space it generates detectors that cover the malicious program space. This avoids both extracting malicious program signatures and maintaining a huge signature database. Fusing several detection results and judging them jointly further improves detection accuracy. The method detects both unknown and known malicious programs well. It offers a useful approach both to malicious program detection and to combining existing detection methods.
Description
Technical field
The invention belongs to the field of information security. It relates in particular to a malicious program detection method that fuses multiple detection results.
Background art
With the continuing spread and development of the Internet and its applications, a large number of network services have emerged. Cyberspace has become the fifth domain after land, sea, air and space, and its security has become a topic of common global concern.

Among the many kinds of network security incidents, malicious programs have become one of the main threats to networks and information security. Computer viruses are the typical example. A malicious program generally means a piece of code written with intent to attack. The main types are computer viruses, trapdoors, logic bombs, Trojan horses and worms.

In a sense, the 21st century is an era of fierce contest between computer viruses and anti-virus technology. Computer viruses in the new century are increasingly marked by intelligence, human-like behavior, stealth and diversity.
In general, malicious code detection methods fall into two categories: static detection and dynamic detection.
Static detection methods mainly include signature-based detection and heuristic methods.
Dynamic analysis methods mainly include system monitoring and dynamic tracing.
Signature-based detection is widely used in commercial anti-virus software, and almost all existing anti-virus products provide it. It works as follows:

1. Samples of known viruses are collected, and their signatures are extracted and stored in a virus database.
2. During detection, the file under inspection is opened and searched for any signature contained in the virus database.
3. If a signature is found, the virus carried by the file can be identified, because signatures correspond one-to-one with viruses.

This method has three weaknesses:

- It cannot detect unknown viruses.
- Collecting the signatures of known viruses is costly.
- As malicious code grows in variety and number, the signature database becomes ever harder to maintain, and too many signatures degrade detection efficiency.

Heuristic scanning is in fact an improvement on signature-based detection. Besides known malicious code, it can also identify some variants, mutated forms and unknown malicious code. Nevertheless, it still depends on signature extraction.
System monitoring generally runs the malicious code in a controlled environment. It determines the function and purpose of the code by comparing the state of certain key system indicators before and after the code executes. Because it observes every operation a program performs on system resources and detects state changes in real time, it can quickly find both known and unknown malicious code.

Dynamic tracing monitors the behavior of malicious code in real time while it runs, in order to analyze its function and purpose.

These methods analyze the behavior of malicious code effectively, but they have drawbacks:

- They are relatively difficult to implement.
- Their high sensitivity may produce false alarms.
- They cannot identify virus names, which hinders removing viruses from infected files.

Most existing anti-virus software provides behavior monitoring, but it is often ineffective in practice because of improper configuration.
Malicious programs in the network era are more concealed and stealthier, spread more widely and cause more serious losses. Code obfuscation techniques are now prevalent, and automatic malware generation engines are widely distributed. Together these keep increasing the number of malicious programs and pose a serious threat to cyberspace security. Developing new malicious program detection techniques is therefore of great significance for safeguarding cyberspace.
Summary of the invention
Signature-based detection cannot detect unknown malicious programs or variants of known malicious programs. The object of the invention is to address this problem by providing a malicious program detection method that fuses multiple detection results. The method detects both unknown and known malicious programs well. It offers a useful approach both to malicious program detection and to fusing the results of existing detection methods.
To achieve this object, the technical solution of the invention is as follows: a malicious program detection method fusing multiple detection results, comprising the following steps:
Step S1, building the character string set: normal programs in the computer system are collected to form a normal program set Bp, and a number of representative malicious programs are collected to form a malicious program set Mp. Some common programs are selected from Bp to form a normal program subset Bp1, and some common programs are selected from Mp to form a malicious program subset Mp1. Non-repeating hexadecimal strings of length len are extracted from Bp1 and Mp1 and added to the character string set; len ranges from 2 to 20 bytes.
Step S2, computing the information gain of each string and selecting a given number of strings to form the gene pool: the information gain of every hexadecimal string in the string set is computed. The strings are sorted by information gain in descending order, and the N strings with the largest information gain form the gene pool, where 100 ≤ N ≤ 2000.
Step S3, extracting feature vectors of normal programs to form the normal program vector space: based on the gene pool built in step S2, a feature vector is built for each normal program in the computer system. The vector space of normal programs under the current gene pool is then constructed.
Step S4, generating malicious program detectors: based on the normal program vector space built in step S3, malicious program detectors covering the malicious program feature vector space are generated.
Step S5, preliminarily judging whether the program under test is malicious: features are extracted from each program under test to generate its feature vector. The r-contiguous bit distance between this feature vector and each detector is then computed. If the r-contiguous bit distance to some detector is greater than or equal to the preset threshold, the program is judged malicious; otherwise it is judged normal.
Step S6, judging whether the program under test is malicious according to a fusion rule:

1. A different len value is used in step S1, and a gene pool of different length is obtained through step S2.
2. Steps S3 to S5 are repeated to detect the program again and obtain a new detection result.
3. This is repeated at least once more to obtain further detection results.
4. From the multiple detection results, the fusion rule is used to compute the basic probability assignments that the program is normal or malicious.
5. The program is finally judged to be malicious or normal.
Further, building the character string set in step S1 comprises the following steps:
Step S11: for each program p in the normal program subset Bp1 and the malicious program subset Mp1, start from its first byte and slide one byte at a time, extracting every hexadecimal string of length len until the end of the program.
Step S12: for each extracted hexadecimal string, check whether it already appears in the string set of length len; if so, go to step S13, otherwise go to step S14.
Step S13: discard the string.
Step S14: add the string to the string set.
Step S15: change the value of the gene length len and repeat the above operations to obtain string sets of different lengths; len ranges from 2 to 20.
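A minimal Python sketch of steps S11 to S15 follows. It assumes that len counts bytes (as in step S1) and that each program is available as raw bytes. The function names `extract_hex_strings` and `build_string_sets` are hypothetical and not taken from the patent. A Python `set` performs the duplicate check of steps S12 to S14 implicitly.

```python
def extract_hex_strings(data: bytes, length: int) -> set:
    """Slide a window of `length` bytes over `data`, one byte at a time (S11),
    and return the distinct windows as lowercase hexadecimal strings."""
    return {data[i:i + length].hex() for i in range(len(data) - length + 1)}


def build_string_sets(programs, lengths=range(2, 21)):
    """Build one duplicate-free string set per window length (S12-S15).

    `programs` holds the raw bytes of every program in Bp1 and Mp1.
    """
    string_sets = {}
    for length in lengths:  # S15: repeat for each value of len
        collected = set()
        for program in programs:
            # Set union only adds strings not already present (S12-S14).
            collected |= extract_hex_strings(program, length)
        string_sets[length] = collected
    return string_sets


if __name__ == "__main__":
    programs = [b"\x4d\x5a\x90\x00", b"\x4d\x5a\x00"]
    print(sorted(build_string_sets(programs, lengths=[2])[2]))
    # prints ['4d5a', '5a00', '5a90', '9000']
```

The shared window `4d5a` appears only once in the result, which is the de-duplication that steps S12 and S13 describe.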
Further, building the gene pool in step S2 comprises the following steps:
Step S21: define the information gain IG (Information Gain) of a string in terms of the following quantities: $C \in \{Bp, Mp\}$; Str denotes an extracted hexadecimal string; $v_{Str} = 1$ if the string occurs in a program, otherwise $v_{Str} = 0$; $P(v_{Str}, C)$ is the proportion of programs in set C for which Str takes the value $v_{Str}$; $P(v_{Str})$ is the proportion of the whole sample set for which Str takes the value $v_{Str}$; and $P(C)$ is the probability that a program belongs to C. The information gain of each string in a string set of a given length is computed from these quantities.
Step S22: sort the strings by information gain in descending order and select the N hexadecimal strings with the largest information gain as the gene pool; the gene pool size N ranges from 100 to 2000.
Step S23: repeat the above operations for the string sets of each length; reducing the size of the string sets of different lengths yields gene pools of different lengths.
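Step S21 lists the quantities that enter the information gain but does not spell out the formula that combines them. The sketch below assumes the standard mutual-information form built from exactly those quantities:

$$IG(Str) = \sum_{v_{Str} \in \{0,1\}} \sum_{C \in \{Bp, Mp\}} P(v_{Str}, C) \log_2 \frac{P(v_{Str}, C)}{P(v_{Str})\,P(C)}$$

So that the sum is a proper mutual information, the sketch treats $P(v_{Str}, C)$ as a joint proportion over all programs. The base-2 logarithm, the function names and the toy data are assumptions for illustration only.

```python
import math


def information_gain(in_benign, n_benign, in_malicious, n_malicious):
    """IG of one string from how many benign / malicious programs contain it.

    Assumed form: sum over v in {0, 1} and C in {Bp, Mp} of
    P(v, C) * log2(P(v, C) / (P(v) * P(C))), with P(v, C) a joint proportion.
    """
    total = n_benign + n_malicious
    # counts[(v, C)]: programs of class C for which the string's value is v
    counts = {
        (1, "Bp"): in_benign, (0, "Bp"): n_benign - in_benign,
        (1, "Mp"): in_malicious, (0, "Mp"): n_malicious - in_malicious,
    }
    p_class = {"Bp": n_benign / total, "Mp": n_malicious / total}
    p_value = {v: (counts[(v, "Bp")] + counts[(v, "Mp")]) / total for v in (0, 1)}
    ig = 0.0
    for (v, c), count in counts.items():
        p_joint = count / total
        if p_joint > 0:  # terms with P(v, C) = 0 contribute 0
            ig += p_joint * math.log2(p_joint / (p_value[v] * p_class[c]))
    return ig


def build_gene_pool(benign, malicious, candidates, n):
    """Rank candidate strings by IG in descending order and keep the top n (S22).

    `benign` and `malicious` are lists of per-program string sets.
    """
    def score(s):
        return information_gain(sum(s in p for p in benign), len(benign),
                                sum(s in p for p in malicious), len(malicious))
    return sorted(candidates, key=score, reverse=True)[:n]


if __name__ == "__main__":
    benign = [{"aa", "bb"}, {"aa"}]
    malicious = [{"aa", "cc"}, {"aa", "bb", "cc"}]
    print(build_gene_pool(benign, malicious, {"aa", "bb", "cc"}, n=1))  # ['cc']
```

In the toy data, "aa" occurs everywhere and "bb" is split evenly between the classes, so both carry zero information. "cc" occurs only in malicious programs and is ranked first.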
Further, step S3 comprises the following steps:
Step S31: for each program in the normal program set, start from the first byte and extract hexadecimal strings of length len, sliding forward one byte at a time until the end of the file.
Step S32: check whether the extracted hexadecimal string exists in the gene pool of length len; if so, go to step S34, otherwise go to step S33.
Step S33: keep the default value 0 and do nothing.
Step S34: set the corresponding bit of the feature vector to 1.
Step S35: repeat the above operations for each program in the normal program set to build the normal program vector space under the current gene pool.
Step S36: extract features from all programs in the normal program set with each gene pool of different length in turn, constructing the normal program vector space under each gene pool.
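Steps S31 to S35 amount to a binary bag-of-genes encoding. The minimal sketch below assumes the gene pool is an ordered list of hexadecimal strings of len bytes each; the function names are hypothetical. The text describes looking each extracted window up in the gene pool. The sketch instead collects the windows once and tests each gene against them, which yields the same vector.

```python
def feature_vector(program: bytes, gene_pool: list, length: int) -> list:
    """Binary feature vector of one program under a gene pool of `length`-byte genes.

    Bit i is 1 if gene_pool[i] occurs anywhere in the program (S34);
    otherwise it keeps the default value 0 (S33).
    """
    # S31: every len-byte window, sliding one byte at a time, as hex
    present = {program[i:i + length].hex() for i in range(len(program) - length + 1)}
    # S32: membership test of each gene against the extracted windows
    return [1 if gene in present else 0 for gene in gene_pool]


def normal_vector_space(programs: list, gene_pool: list, length: int) -> list:
    """Feature vectors of all normal programs under one gene pool (S35)."""
    return [feature_vector(p, gene_pool, length) for p in programs]


if __name__ == "__main__":
    pool = ["9000", "ffff", "4d5a"]
    print(feature_vector(b"\x4d\x5a\x90\x00", pool, 2))  # [1, 0, 1]
```

Each vector has exactly as many components as the gene pool has genes, which is what step S41 relies on when it draws detectors of the same dimension.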
Further, generating the malicious program detectors in step S4 comprises the following steps:
Step S41: randomly generate a binary string whose dimension equals the number of genes in the gene pool, treat it as a candidate detector, and compute its r-contiguous bit distance to each vector in the normal program vector space.
Step S42: check whether the r-contiguous bit distances between the detector and all vectors in the normal program vector space are less than the preset threshold; if so, go to step S44, otherwise go to step S43.
Step S43: discard the detector.
Step S44: add the detector to the detector set.
Step S45: check whether the number of generated detectors has reached the predetermined value, which ranges from 300 to 5000; if so, go to step S46, otherwise return to step S41 to generate more detectors.
Step S46: repeat the above operations for the normal program vector space generated with each gene pool of different length, producing detectors corresponding to each gene pool.
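Steps S41 to S45 follow the negative-selection scheme known from artificial immune systems. The text does not define the r-contiguous bit distance. The sketch below assumes it is the length of the longest run of positions at which two binary vectors agree, so a value at or above the threshold counts as a match. The `max_attempts` cap is an added safeguard against looping forever and is not part of the described method. All names are hypothetical.

```python
import random
from typing import List, Optional


def r_contiguous_distance(a: List[int], b: List[int]) -> int:
    """Longest run of contiguous positions where a and b agree (assumed definition)."""
    longest = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        longest = max(longest, run)
    return longest


def generate_detectors(normal_space: List[List[int]], threshold: int, count: int,
                       seed: Optional[int] = None,
                       max_attempts: int = 1_000_000) -> List[List[int]]:
    """Negative selection: keep random candidates that match no normal vector."""
    rng = random.Random(seed)
    dim = len(normal_space[0])  # equals the number of genes in the gene pool
    detectors: List[List[int]] = []
    for _ in range(max_attempts):
        if len(detectors) >= count:  # S45: predetermined number reached
            break
        candidate = [rng.randint(0, 1) for _ in range(dim)]  # S41
        # S42: valid only if every distance to a normal vector is below threshold
        if all(r_contiguous_distance(candidate, v) < threshold for v in normal_space):
            detectors.append(candidate)  # S44
        # otherwise the candidate is simply dropped (S43)
    return detectors


if __name__ == "__main__":
    normal = [[0] * 20, [1] * 20]
    dets = generate_detectors(normal, threshold=5, count=10, seed=1)
    print(len(dets))  # 10
```

Under this definition, a detector kept by step S42 shares no run of `threshold` agreeing bits with any normal vector. It therefore lies outside the normal program space in the sense used by the description.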
Further, preliminarily judging in step S5 whether the program under test is malicious comprises the following steps:
Step S51: for each program under test, start from the first byte and extract hexadecimal strings of length len, sliding one byte at a time until the end of the file.
Step S52: check whether the extracted hexadecimal string exists in the gene pool of length len; if it does, go to step S54; if it does not, go to step S53.
Step S53: keep the default value 0 and do nothing.
Step S54: set the corresponding bit of the feature vector to 1; this yields the feature vector of the program under test.
Step S55: check whether the r-contiguous bit distances between the feature vector of the program under test and all detectors are less than the preset threshold; if so, the program is normal; otherwise it has been recognized by a malicious program detector and is preliminarily judged to be malicious.
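Step S55 reduces to checking whether any detector matches the new feature vector. The sketch below again assumes that the r-contiguous bit distance is the longest run of agreeing positions. It takes the feature vector as already built in steps S51 to S54 and the detectors as produced in step S4. The function names are hypothetical.

```python
from typing import List


def r_contiguous_distance(a: List[int], b: List[int]) -> int:
    """Longest run of contiguous positions where a and b agree (assumed definition)."""
    longest = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        longest = max(longest, run)
    return longest


def preliminary_verdict(vector: List[int], detectors: List[List[int]], threshold: int) -> str:
    """S55: 'malicious' if some detector reaches the threshold, else 'normal'."""
    if any(r_contiguous_distance(vector, d) >= threshold for d in detectors):
        return "malicious"
    return "normal"


if __name__ == "__main__":
    detectors = [[1, 1, 1, 1, 0, 0]]
    print(preliminary_verdict([1, 1, 1, 1, 1, 1], detectors, threshold=4))  # malicious
    print(preliminary_verdict([0, 0, 0, 0, 0, 0], detectors, threshold=4))  # normal
```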
Further, judging in step S6 whether the program under test is malicious by the fusion rule comprises the following steps:
Step S61: take the frame of discernment $\Theta = \{B, M\}$, where B denotes a normal program and M a malicious program. Define the basic probability assignment function $m: P(\{B, M\}) \to [0, 1]$, where $P(\cdot)$ denotes the power set, with $m(B) + m(M) = 1$. Here $m(M)$ is the basic probability assigned in support of a malicious program and $m(B)$ the basic probability assigned in support of a normal program.
Step S62: combine the basic probability assignments of the detection results of the multiple malicious program detection methods according to the fusion formula, obtaining new basic probability assignments $m_{1\ldots n}(B)$ and $m_{1\ldots n}(M)$, which denote the basic probability assigned by the n recognition results to a normal program and to a malicious program, respectively.
Step S63: compare $m_{1\ldots n}(B)$ and $m_{1\ldots n}(M)$; if $m_{1\ldots n}(B) > m_{1\ldots n}(M)$, the program under test is normal, otherwise it is malicious.
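Step S62 does not spell out the fusion formula. Step S61 uses the vocabulary of Dempster-Shafer evidence theory (frame of discernment, basic probability assignment), so a natural candidate is Dempster's rule of combination, which is assumed here. Every assignment puts all its mass on the singletons B and M, so the rule reduces to

$$m_{1\ldots n}(B) = \frac{\prod_{i=1}^{n} m_i(B)}{\prod_{i=1}^{n} m_i(B) + \prod_{i=1}^{n} m_i(M)}, \qquad m_{1\ldots n}(M) = \frac{\prod_{i=1}^{n} m_i(M)}{\prod_{i=1}^{n} m_i(B) + \prod_{i=1}^{n} m_i(M)}.$$

The text does not say how a single detection result becomes an $(m(B), m(M))$ pair, so the input values in the example are invented for illustration.

```python
import math
from typing import List, Tuple


def dempster_fuse(results: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (m(B), m(M)) pairs with Dempster's rule (assumed fusion formula).

    Assumes each assignment has m(B) + m(M) = 1, i.e. no mass on {B, M}, so the
    only non-conflicting combinations are "all B" and "all M".
    """
    support_b = math.prod(mb for mb, _ in results)
    support_m = math.prod(mm for _, mm in results)
    normaliser = support_b + support_m  # equals 1 - K, K being the total conflict
    if normaliser == 0:
        raise ValueError("completely conflicting evidence: Dempster's rule is undefined")
    return support_b / normaliser, support_m / normaliser


def fused_verdict(results: List[Tuple[float, float]]) -> str:
    """S63: normal if m(B) > m(M) after fusion, otherwise malicious."""
    m_b, m_m = dempster_fuse(results)
    return "normal" if m_b > m_m else "malicious"


if __name__ == "__main__":
    rounds = [(0.4, 0.6), (0.3, 0.7), (0.6, 0.4)]  # illustrative values only
    print(dempster_fuse(rounds))  # approximately (0.3, 0.7)
    print(fused_verdict(rounds))  # malicious
```

Multiplying the per-round masses makes agreement between rounds reinforce the verdict. In the example, two rounds leaning towards "malicious" outweigh one leaning towards "normal".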
Further, in step S6 the program under test is detected again 2 to 5 times, each time with detectors generated from a different len value. Too few repetitions make multi-result fusion ineffective, while too many increase detection time. Two to five repetitions are therefore appropriate.
The beneficial effects of the invention are:
1. The invention builds a gene pool, generates the feature space of normal programs from it, and then produces detectors that cover the malicious program space. This avoids both extracting malicious program signatures and maintaining a huge signature database.
2. The invention can detect unknown malicious programs. It builds the feature vector space of normal programs and generates detectors covering the malicious program feature space, so it never needs the signatures of unknown malicious programs. It therefore detects unknown malicious programs and variants of known malicious programs well.
3. The invention improves the accuracy of malicious program detection, because it jointly evaluates the results of multiple detections (or of several detection methods).
Description of the drawings
Fig. 1 is the system framework of the malicious program detection method fusing multiple detection results according to the invention.
Fig. 2 is the flowchart of generating string sets of different lengths in the method.
Fig. 3 is the flowchart of reducing the size of gene pools of different lengths in the method.
Fig. 4 is the flowchart of building the normal program vector space in the method.
Fig. 5 is the flowchart of generating malicious program detectors in the method.
Fig. 6 is the flowchart of the preliminary judgment of a program under test in the method.
Fig. 7 is the flowchart of the fusion judgment of a program under test in the method.
Detailed description of the embodiments
To make the object, technical solution and beneficial effects of the invention clearer, the invention is further described below with reference to the drawings. The scope of protection of the invention is not, however, limited to the following examples.
As shown in Fig. 1, the malicious program detection method fusing multiple detection results according to the invention comprises the following steps:
Step S1: normal programs in the computer system are collected to form the normal program set Bp, and a number of representative malicious programs are collected in a secure manner to form the malicious program set Mp. Some common programs are selected from Bp to form the normal program subset Bp1, and some from Mp to form the malicious program subset Mp1. Non-repeating hexadecimal strings of length len are extracted from Bp1 and Mp1 and added to the string set. String sets of different lengths can be built by using different values of len. Specifically, as shown in Fig. 2, they can be built as follows.

Step S11: for each program p in Bp1 and Mp1, start from its first byte and slide one byte at a time, extracting hexadecimal strings of length len in turn until the end of the program.

Step S12: for each extracted hexadecimal string, check whether it already appears in the string set of length len. If it does, go to step S13; if it does not, go to step S14.

Step S13: discard the string.

Step S14: add the string to the string set.

Step S15: change the value of the gene length len and repeat the above operations to obtain string sets of different lengths. The value of len generally ranges from 2 to 20 characters. For example, if len is 2 in the first extraction, it is 3 in the next.
Step S2: the information gain of each string is computed, and a given number of strings is selected to form the gene pool. The strings in the string set are sorted by information gain in descending order, and the 100 to 2000 hexadecimal strings with the largest information gain form the gene pool. Strings with small information gain are ignored for efficiency. As len increases, the number of hexadecimal strings extracted from the programs grows exponentially, which lowers the efficiency of extracting program features. The size of each string set must therefore be reduced.
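A rough bound shows why the string sets must be cut down. The bound assumes that len counts bytes. A string set of length len can hold at most $256^{len}$ distinct strings, and at most one string per byte window of the input programs. The sketch below computes this bound for a hypothetical corpus of 1,000 programs of 1 MB each; the corpus size and the function name are assumptions, not figures from the patent.

```python
def max_string_set_size(length: int, program_sizes: list) -> int:
    """Upper bound on the number of distinct len-byte hex strings in a corpus."""
    # One window per starting byte; programs shorter than `length` contribute none.
    windows = sum(max(size - length + 1, 0) for size in program_sizes)
    return min(256 ** length, windows)


if __name__ == "__main__":
    corpus = [1_000_000] * 1_000  # hypothetical: 1,000 programs of 1 MB each
    for length in (2, 3, 4, 8):
        print(length, max_string_set_size(length, corpus))
    # 2 65536
    # 3 16777216
    # 4 999997000
    # 8 999993000
```

Beyond a few bytes the limit comes from the corpus size rather than from $256^{len}$, so a string set may approach one entry per input byte. Keeping only the N = 100 to 2000 highest-ranked strings keeps the feature vectors small by comparison.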
As shown in Fig. 3, the size of the string sets of different lengths is reduced as follows.

Step S21: the information gain IG (Information Gain) of a string is defined in terms of the following quantities:

- $C \in \{Bp, Mp\}$;
- Str denotes an extracted hexadecimal string;
- $v_{Str} = 1$ if the string occurs in a program, otherwise $v_{Str} = 0$;
- $P(v_{Str}, C)$ is the proportion of programs in set C for which Str takes the value $v_{Str}$;
- $P(v_{Str})$ is the proportion of the whole sample set for which Str takes the value $v_{Str}$;
- $P(C)$ is the probability that a program belongs to C.

The information gain of each string in a string set of a given length is computed from these quantities.

Step S22: sort the strings by information gain in descending order and select the N hexadecimal strings with the largest information gain as the gene pool, where N ranges from 100 to 2000. The shorter the genes, the larger the gene pool may be; the longer the genes, the smaller it should be, so that detection stays efficient.

Step S23: repeat the above operations for the string sets of each length. Reducing the size of the string sets of different lengths yields gene pools of different lengths.
Step S3: based on the gene pool built in step S2, features are extracted from the normal programs in the computer system and feature vectors are built. The vector space of normal programs under the current gene pool is then constructed. For each program in the normal program set, a feature vector with the same dimension as the gene pool size is built; that is, features are extracted from the normal program. The feature vectors of all normal programs form the normal program state space. As shown in Fig. 4, the specific steps are as follows.

Step S31: for each program in the normal program set, start from the first byte and extract hexadecimal strings of length len, sliding forward one byte at a time until the end of the file.

Step S32: check whether the extracted hexadecimal string exists in the gene pool of length len. If it does, the string appears in the gene pool, so go to step S34. If it does not, go to step S33.

Step S33: keep the default value 0 and do nothing.

Step S34: set the corresponding bit of the feature vector to 1. A bit value of 1 means that the gene occurs in the program. In this way each component $x_i \in \{0, 1\}$ ($i = 1, \ldots, n$) of the program feature vector $(x_1, x_2, \ldots, x_n)$ is obtained. After feature extraction, the program feature vector is a binary string whose length equals the number of genes in the gene pool.

Step S35: repeat the above operations for each program in the normal program set to build the normal program vector space under the current gene pool.

Step S36: extract features from all programs in the normal program set with each gene pool of different length in turn, constructing the normal program vector space under each gene pool.
Step S4: based on the normal program vector space built in step S3, malicious program detectors covering the malicious program feature vector space are generated. Specifically, the detectors can be generated with the steps shown in Fig. 5.

Step S41: randomly generate a binary string whose dimension equals the number of genes in the gene pool and treat it as a candidate detector. Compute its r-contiguous bit distance to each vector in the normal program vector space.

Step S42: check whether the r-contiguous bit distances between the detector and all vectors in the normal program vector space are less than the preset threshold. The threshold is set according to detection needs. In general, the smaller the threshold, the stronger the detection capability and the lower the model's false negative rate, but the higher its false positive rate. To obtain satisfactory detection results, users can set the threshold as needed, for example to 10. If the distances to every normal program feature vector are all less than 10, the detector does not lie in the normal program space and can serve as a valid detector, so go to step S44. Otherwise the detector lies in the normal program space, so go to step S43.

Step S43: discard the detector.

Step S44: add the detector to the detector set.

Step S45: check whether the number of generated detectors has reached the predetermined value, which ranges from 300 to 5000. If so, go to step S46; otherwise return to step S41 to generate more detectors.

Step S46: repeat the above operations for the normal program vector spaces generated with gene pools of different lengths, producing detectors corresponding to each gene pool.
Step S5: features are extracted from each program under test to generate its feature vector, and the r-contiguous bit distance between this feature vector and every detector is computed. If some detector's r-contiguous bit distance to the program is greater than or equal to the preset threshold of 10, the program is judged malicious; otherwise it is normal. To judge more accurately whether the program under test is malicious, the detection steps shown in Fig. 6 are as follows.

Step S51: for each program under test, start from the first byte and extract hexadecimal strings of length len, sliding one byte at a time until the end of the file.

Step S52: check whether the extracted hexadecimal string exists in the gene pool of length len. If it does, go to step S54; if it does not, go to step S53.

Step S53: keep the default value 0 and do nothing.

Step S54: set the corresponding bit of the feature vector to 1. This yields the feature vector of the program under test.

Step S55: check whether the r-contiguous bit distances between the feature vector of the program under test and all detectors are less than the preset threshold of 10. If so, the program is normal. Otherwise it has been recognized by a malicious program detector and is preliminarily judged to be malicious.
Step S6: whether the program under test is malicious is judged by the fusion rule. A different len value is used in step S1, and a gene pool of different length is obtained through step S2. Steps S3 to S5 are then repeated to detect the program under test again and obtain a new detection result. This is repeated at least once, preferably 2 to 5 times, to obtain further detection results. The multiple detection results are then combined by the fusion rule in the following steps, as shown in Fig. 7.

Step S61: take the frame of discernment $\Theta = \{B, M\}$, where B denotes a normal program and M a malicious program. Define the basic probability assignment function $m: P(\{B, M\}) \to [0, 1]$, where $P(\cdot)$ denotes the power set, with $m(B) + m(M) = 1$. Here $m(M)$ is the basic probability assigned in support of a malicious program and $m(B)$ the basic probability assigned in support of a normal program.

Step S62: combine the basic probability assignments of the detection results of the multiple malicious program detection methods according to the fusion formula. This yields new basic probability assignments $m_{1\ldots n}(B)$ and $m_{1\ldots n}(M)$, which denote the basic probability assigned by the n recognition results to a normal program and to a malicious program, respectively.

Step S63: compare $m_{1\ldots n}(B)$ and $m_{1\ldots n}(M)$. If $m_{1\ldots n}(B) > m_{1\ldots n}(M)$, the program under test is normal; otherwise it is malicious.
Through the above steps, the technical solution of the invention builds a gene pool, generates the feature space of normal programs, and then produces detectors that cover the malicious program space. Detecting malicious programs this way avoids extracting malicious program signatures and maintaining a huge signature database. Fusing multiple malicious program detection results and evaluating them jointly further improves detection accuracy.
The basic principles, main features and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited by the above embodiments, which, together with the description, merely illustrate its principles. Various changes and improvements may be made without departing from the spirit and scope of the invention. All such changes and improvements fall within the claimed scope of the invention.
Claims (8)
1. A malicious program detection method fusing multiple detection results, characterized by comprising the following steps:
Step S1, building a string set: collect the normal programs in a computer system to form a normal program set Bp, and collect a number of representative malicious programs to form a malicious program set Mp. Select some common programs from Bp to form a normal program subset Bp1, and select some common programs from the malicious program set Mp to form a malicious program subset Mp1. Extract the distinct hexadecimal strings of length len from Bp1 and Mp1 and add them to the string set, where len ranges from 2 to 20 bytes;
Step S2, computing the information gain of each string and selecting a given number of strings to form the gene pool: compute the information gain of each hexadecimal string in the string set and sort the strings by information gain in descending order. Select the N hexadecimal strings with the largest information gain to form the gene pool, where 100 ≤ N ≤ 2000;
Step S3, extracting the feature vectors of normal programs to form the normal program vector space: using the gene pool built in step S2, build a feature vector for each normal program in the computer system. From these vectors, build the vector space of normal programs under the current gene pool;
Step S4, generating malicious program detectors: using the normal program vector space built in step S3, generate malicious program detectors that cover the malicious program feature vector space;
Step S5, preliminarily judging whether the program to be checked is a malicious program: extract features from each program to be detected to generate its feature vector. Compute the r-contiguous-bits distance between that feature vector and the detectors. If the r-contiguous-bits distance is greater than or equal to the set threshold, the program to be detected is judged to be a malicious program; otherwise it is a normal program;
Step S6, judging whether the program to be checked is a malicious program according to the fusion rule: use a different value of len in step S1 to obtain a gene pool of a different length through step S2. Repeat steps S3 to S5 to detect the program again and obtain a new detection result, and repeat this at least once to obtain further detection results. From the multiple detection results, use the fusion rule to compute the basic belief assignment that the program is a normal program or a malicious program. Finally, judge whether the program to be detected is a malicious program or a normal program.
2. The malicious program detection method according to claim 1, characterized in that building the string set in step 1 further comprises:
Step S11, for each program p in the normal program subset Bp1 and the malicious program subset Mp1, start from its first byte and slide one byte at a time, extracting in turn the hexadecimal strings of length len until the end of the program;
Step S12, for each extracted hexadecimal string, judge whether it already appears in the string set of length len; if so, execute step S13; otherwise execute step S14;
Step S13, discard the string;
Step S14, add the string to the string set;
Step S15: change the value of len and repeat the above operations to obtain string sets of different lengths, where len ranges from 2 to 20.
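Steps S11 to S15 amount to collecting every distinct byte n-gram for each length len. Below is a minimal Python sketch, assuming programs are available as raw `bytes` and that string sets are keyed by len. The function names are illustrative, not from the patent. A Python `set` performs the check-then-discard of steps S12 and S13 implicitly.

```python
def extract_hex_strings(programs: list[bytes], length: int) -> set[str]:
    """Steps S11-S14: collect the distinct hexadecimal strings of `length` bytes."""
    if not 2 <= length <= 20:
        raise ValueError("len must be between 2 and 20 bytes")
    strings: set[str] = set()
    for program in programs:
        # Slide a window one byte at a time until the end of the program.
        for start in range(len(program) - length + 1):
            # Adding a string that is already present is a no-op, i.e. it is discarded.
            strings.add(program[start:start + length].hex())
    return strings


def build_string_sets(programs: list[bytes], lengths=range(2, 21)) -> dict[int, set[str]]:
    """Step S15: one string set per value of len."""
    return {length: extract_hex_strings(programs, length) for length in lengths}
```

In the patent's terms, `programs` would be the programs of Bp1 together with those of Mp1.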
3. The malicious program detection method according to claim 2, characterized in that building the gene pool in step 2 further comprises:
Step S21, define the information gain IG (Information Gain) as:
$$IG(Str) = \sum_{v_{Str} \in \{0, 1\}} \sum_{C \in \{Bp, Mp\}} P(v_{Str}, C) \log \frac{P(v_{Str}, C)}{P(v_{Str})\, P(C)}$$
Here $C \in \{Bp, Mp\}$, and Str denotes an extracted hexadecimal string. $v_{Str} = 1$ means the string occurs in a program, and $v_{Str} = 0$ otherwise. $P(v_{Str}, C)$ is the proportion within set C for which the string takes the value $v_{Str}$. $P(v_{Str})$ is the proportion within the whole test set for which Str takes the value $v_{Str}$. $P(C)$ is the probability that a program belongs to C. Compute the information gain of each string in a string set of a given length using the above formula;
Step S22, sort the strings by information gain in descending order and select the N hexadecimal strings with the largest information gain as the gene pool, where the gene pool size N ranges from 100 to 2000;
Step S23: repeat the above operations on the string sets of different lengths; pruning each string set in this way yields gene pools of different lengths.
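Step S21's information gain measures how much a string's presence tells about the program class. The Python sketch below assumes the standard form IG(Str) = Σ over v_Str ∈ {0, 1} and C ∈ {Bp, Mp} of P(v_Str, C) · log(P(v_Str, C) / (P(v_Str) P(C))). It uses base-2 logarithms, since the base does not change the ranking, and treats P(v_Str, C) as a joint proportion over all programs. Each program is represented by the set of hexadecimal strings it contains. The function names are illustrative.

```python
import math


def information_gain(gene: str, benign: list[set[str]], malicious: list[set[str]]) -> float:
    """IG of one string; each program is given as the set of strings it contains."""
    classes = (benign, malicious)  # C in {Bp, Mp}
    total = len(benign) + len(malicious)
    ig = 0.0
    for v in (True, False):  # v_Str = 1 (present) or 0 (absent)
        p_v = sum((gene in prog) == v for progs in classes for prog in progs) / total
        for progs in classes:
            p_c = len(progs) / total
            p_vc = sum((gene in prog) == v for prog in progs) / total  # joint P(v_Str, C)
            if p_vc > 0:  # terms with P(v_Str, C) = 0 contribute nothing
                ig += p_vc * math.log2(p_vc / (p_v * p_c))
    return ig


def build_gene_pool(candidates: set[str], benign: list[set[str]],
                    malicious: list[set[str]], n: int) -> list[str]:
    """Step S22: keep the n strings with the largest information gain.

    The patent uses 100 <= n <= 2000; that range is not enforced here so that
    small examples stay small. Pre-sorting the candidates makes ties deterministic.
    """
    ranked = sorted(sorted(candidates),
                    key=lambda s: information_gain(s, benign, malicious),
                    reverse=True)
    return ranked[:n]
```

A string present in every malicious program and in no normal one scores 1 bit. A string spread evenly across both classes scores 0.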
4. The malicious program detection method according to claim 3, characterized in that step 3 further comprises the following steps:
Step S31, for each program in the normal program set, start from the first byte and extract hexadecimal strings of length len, sliding forward one byte at a time until the end of the file;
Step S32, judge whether the extracted hexadecimal string exists in the gene pool of length len; if so, execute step S34; otherwise execute step S33;
Step S33, keep the default value 0 and do nothing;
Step S34, set the corresponding bit of the feature vector to 1;
Step S35, repeat the above operations for each program in the normal program set to build the normal program vector space under the current gene pool;
Step S36: extract features from all programs in the normal program set with each gene pool of a different length in turn, building the normal program vector space under each gene pool.
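Steps S31 to S35 encode each program as a bit vector with one position per gene. Below is a minimal Python sketch, assuming the gene pool is an ordered list of hexadecimal strings that all have the same length len. The function names are illustrative.

```python
def feature_vector(program: bytes, gene_pool: list[str], length: int) -> list[int]:
    """Steps S31-S34: bit i is 1 if gene_pool[i] occurs anywhere in the program."""
    index = {gene: i for i, gene in enumerate(gene_pool)}
    vector = [0] * len(gene_pool)  # default value 0 (step S33)
    for start in range(len(program) - length + 1):
        i = index.get(program[start:start + length].hex())
        if i is not None:
            vector[i] = 1  # step S34
    return vector


def normal_vector_space(normal_programs: list[bytes], gene_pool: list[str],
                        length: int) -> list[list[int]]:
    """Step S35: one feature vector per normal program under the current gene pool."""
    return [feature_vector(p, gene_pool, length) for p in normal_programs]
```

Step S36 then calls `normal_vector_space` once per gene pool, each with its own len.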
5. The malicious program detection method according to claim 4, characterized in that generating the malicious program detectors in step 4 further comprises the following steps:
Step S41, randomly generate a binary string whose dimension equals the number of genes in the gene pool and treat it as a candidate detector. Compute the r-contiguous-bits distance between the detector and each vector in the normal program vector space;
Step S42, judge whether the r-contiguous-bits distances between the detector and all vectors in the normal program vector space are less than the preset threshold; if so, execute step S44; if not, execute step S43;
Step S43, discard the detector;
Step S44, add the detector to the detector set;
Step S45, judge whether the number of generated detectors has reached the predetermined value, which ranges from 300 to 5000; if so, execute step S46; if not, return to step S41 to continue generating detectors;
Step S46, repeat the above operations on the normal program vector spaces generated from the gene pools of different lengths to produce detectors corresponding to each gene pool.
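Steps S41 to S45 follow the negative selection pattern from artificial immune systems: random candidates that come too close to any normal vector are discarded. The sketch below reads the "r-contiguous-bits distance" as the length of the longest run of positions where two vectors agree. That reading is an assumption, chosen because it fits S42 keeping detectors that stay below the threshold and S55 flagging programs that reach it. The `max_tries` safeguard and all names are hypothetical.

```python
import random
from typing import Optional


def r_contiguous_distance(a: list[int], b: list[int]) -> int:
    """Assumed reading: longest run of consecutive positions where a and b agree."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def generate_detectors(normal_space: list[list[int]], threshold: int, count: int,
                       rng: Optional[random.Random] = None,
                       max_tries: int = 1_000_000) -> list[list[int]]:
    """Steps S41-S45: keep random candidates that match no normal vector.

    The patent uses 300 <= count <= 5000; max_tries is a hypothetical guard
    against thresholds that no candidate can satisfy.
    """
    rng = rng or random.Random()
    dim = len(normal_space[0])  # equals the number of genes in the gene pool
    detectors: list[list[int]] = []
    for _ in range(max_tries):
        if len(detectors) == count:  # step S45
            break
        candidate = [rng.randint(0, 1) for _ in range(dim)]  # step S41
        # Step S42: accept only if below the threshold for every normal vector.
        if all(r_contiguous_distance(candidate, v) < threshold for v in normal_space):
            detectors.append(candidate)  # step S44
        # Otherwise the candidate is simply dropped (step S43).
    if len(detectors) < count:
        raise RuntimeError("could not generate enough detectors")
    return detectors
```

Passing a seeded `random.Random` makes the detector set reproducible across runs.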
6. The malicious program detection method according to claim 5, characterized in that the malicious program judgment in step 5 further comprises the following steps:
Step S51: for each program to be detected, start from the first byte and extract hexadecimal strings of length len, sliding one byte at a time until the end of the file;
Step S52, judge whether the extracted hexadecimal string exists in the gene pool of length len. If it does, the extracted string appears in the gene pool and step S54 is executed; if not, the extracted string does not appear in the gene pool and step S53 is executed;
Step S53: keep the default value 0 and do nothing;
Step S54: set the corresponding bit of the feature vector to 1, thereby obtaining the feature vector of the program to be detected;
Step S55: judge whether the r-contiguous-bits distances between the feature vector of the program to be detected and all detectors are less than the preset threshold. If so, the program to be detected is a normal program; otherwise it has been recognized by a malicious program detector and is preliminarily judged to be a malicious program.
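Step S55 reduces to a single check against every detector. Below is a Python sketch under the same assumed reading of the r-contiguous-bits distance, namely the longest run of agreeing positions. The helper is repeated so the block stands alone, and the names are illustrative.

```python
def r_contiguous_distance(a: list[int], b: list[int]) -> int:
    """Assumed reading: longest run of consecutive positions where a and b agree."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def preliminary_judgement(vector: list[int], detectors: list[list[int]],
                          threshold: int) -> str:
    """Step S55: normal only if the vector stays below the threshold for every detector."""
    if all(r_contiguous_distance(vector, d) < threshold for d in detectors):
        return "normal"
    return "malicious"  # recognized by at least one detector
```

Each gene pool length yields its own verdict in this way, and step S6 then fuses those verdicts.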
7. The malicious program detection method according to claim 6, characterized in that judging whether the program to be checked is a malicious program according to the fusion rule in step 6 further comprises the following steps:
Step S61: take the frame of discernment $\Theta = \{B, M\}$, where B denotes a normal program and M denotes a malicious program. Define the basic belief assignment function $m: P(\{B, M\}) \to [0, 1]$ with $m(B) + m(M) = 1$, where m(M) is the basic belief assignment supporting a malicious program and m(B) is the basic belief assignment supporting a normal program;
Step S62: according to the fusion formula, fuse the basic belief assignments of the detection results of the multiple malicious program detection methods to obtain a new basic belief assignment, i.e. $m_{1 \ldots n}(B)$ and $m_{1 \ldots n}(M)$. These are the basic belief that the n recognition results assign to a normal program and to a malicious program, respectively;
Step S63: compare $m_{1 \ldots n}(B)$ and $m_{1 \ldots n}(M)$. If $m_{1 \ldots n}(B) > m_{1 \ldots n}(M)$, the program to be detected is a normal program; otherwise it is a malicious program.
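The fusion formula of step S62 is not legible here. The D-S framing of step S61 suggests it is Dempster's rule of combination, and the worked example below assumes so. It uses n = 2 detection rounds with illustrative values, not values reported in the patent: $m_1(B) = 0.6$, $m_1(M) = 0.4$, $m_2(B) = 0.3$, $m_2(M) = 0.7$.

```latex
\begin{aligned}
K &= m_1(B)\,m_2(M) + m_1(M)\,m_2(B) = 0.6 \cdot 0.7 + 0.4 \cdot 0.3 = 0.54,\\
m_{1\ldots 2}(B) &= \frac{m_1(B)\,m_2(B)}{1 - K} = \frac{0.18}{0.46} \approx 0.391,\\
m_{1\ldots 2}(M) &= \frac{m_1(M)\,m_2(M)}{1 - K} = \frac{0.28}{0.46} \approx 0.609.
\end{aligned}
```

Since $m_{1\ldots 2}(B) < m_{1\ldots 2}(M)$, step S63 would judge this program malicious, even though the first round alone leaned toward normal.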
8. The malicious program detection method according to claim 6, characterized in that, in step 6, the program to be detected is detected again 2 to 5 times.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610909053.4A CN106650440A (en) | 2016-10-18 | 2016-10-18 | Malicious program detection method integrating multiple detection results |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106650440A (en) | 2017-05-10 |
Family ID: 58856629
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610909053.4A Pending CN106650440A (en) | 2016-10-18 | 2016-10-18 | Malicious program detection method integrating multiple detection results |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106650440A (en) |
Non-Patent Citations (1)
Title |
---|
Qin Renchao et al.: "Computer virus detection method based on immunity and D-S evidence theory", Application Research of Computers * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414277A (en) * | 2018-04-27 | 2019-11-05 | Peking University | Gate leve hardware Trojan horse detection method based on more characteristic parameters |
CN110414277B (en) * | 2018-04-27 | 2021-08-03 | Peking University | Gate-level hardware Trojan horse detection method based on multi-feature parameters |
CN112789831A (en) * | 2018-11-21 | 2021-05-11 | Panasonic Intellectual Property Corporation of America | Abnormality detection method and abnormality detection device |
CN112789831B (en) * | 2018-11-21 | 2023-05-02 | Panasonic Intellectual Property Corporation of America | Abnormality detection method and abnormality detection device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103177215B (en) | Based on the computer malware new detecting method of software control stream feature | |
CN110826059A (en) | Method and device for defending black box attack facing malicious software image format detection model | |
CN105956180B (en) | A kind of filtering sensitive words method | |
CN107241352A (en) | A kind of net security accident classificaiton and Forecasting Methodology and system | |
CN107360152A (en) | A kind of Web based on semantic analysis threatens sensory perceptual system | |
EP2415229A1 (en) | Method and system for alert classification in a computer network | |
CN110933083B (en) | Vulnerability grade evaluation device and method based on word segmentation and attack matching | |
CN110765459A (en) | Malicious script detection method and device and storage medium | |
CN112492059A (en) | DGA domain name detection model training method, DGA domain name detection device and storage medium | |
CN107181726A (en) | Cyberthreat case evaluating method and device | |
CN105072214A (en) | C&C domain name identification method based on domain name feature | |
CN111488590A (en) | SQ L injection detection method based on user behavior credibility analysis | |
CN105046152A (en) | Function call graph fingerprint based malicious software detection method | |
CN112685738B (en) | Malicious confusion script static detection method based on multi-stage voting mechanism | |
CN112866292B (en) | Attack behavior prediction method and device for multi-sample combination attack | |
CN115987615A (en) | Network behavior safety early warning method and system | |
CN114039758A (en) | Network security threat identification method based on event detection mode | |
CN108491717A (en) | A kind of xss systems of defense and its implementation based on machine learning | |
CN106650440A (en) | Malicious program detection method integrating multiple detection results | |
CN106650449B (en) | Script heuristic detection method and system based on variable name confusion degree | |
Harbola et al. | Improved intrusion detection in DDoS applying feature selection using rank & score of attributes in KDD-99 data set | |
CN106874762A (en) | Android malicious code detecting method based on API dependence graphs | |
CN112257076B (en) | Vulnerability detection method based on random detection algorithm and information aggregation | |
CN103455754A (en) | Regular expression-based malicious search keyword recognition method | |
CN113886832A (en) | Intelligent contract vulnerability detection method, system, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |