CN106650440A - Malicious program detection method integrating multiple detection results - Google Patents

Info

Publication number
CN106650440A
Authority
CN
China
Prior art keywords
program
normal procedure
string
detected
gene pool
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610909053.4A
Other languages
Chinese (zh)
Inventor
覃仁超
曾金全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology
Priority to CN201610909053.4A
Publication of CN106650440A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562: Static detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00: Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03: Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033: Test or assess software

Abstract

The invention discloses a malicious program detection method integrating multiple detection results. The method includes the steps of establishing a character string set, establishing a gene pool, constructing feature vectors, generating a malicious program detector, judging whether a to-be-detected program is a malicious program or not, and comprehensively judging whether the to-be-detected program is a malicious program or not according to multiple detection results. According to the method, malicious program detection is achieved by establishing the gene pool, then generating a normal program feature space and then generating the detector covering a malicious program space, so that extraction of malicious program feature codes and maintenance of a huge feature library are effectively avoided; meanwhile, the accuracy rate of malicious program detection is further increased by integrating multiple malicious program detection results and conducting comprehensive judgment again. The method has a good detection effect on unknown malicious programs and known malicious programs, and a beneficial method is provided for detection of malicious programs and integration of existing detection methods.

Description

A malicious program detection method integrating multiple detection results
Technical field
The invention belongs to the field of information security, and in particular relates to a malicious program detection method that integrates multiple detection results.
Background art
With the continuing spread and development of Internet applications, a large number of network services have emerged, making cyberspace the fifth domain after land, sea, air, and space. Cyberspace security has likewise become a topic of worldwide concern. Among the various kinds of network security incidents, malicious programs, typified by computer viruses, have become one of the principal threats to network and information security. A malicious program generally refers to a piece of code written with intent to attack, and mainly includes computer viruses, trapdoors, logic bombs, Trojan horses, worms, and the like. In a sense, the 21st century is an era of fierce contest between computer viruses and anti-virus technology, and the viruses of the new century are trending toward being more intelligent, more human-like, more concealed, and more diverse.
In general, malicious code detection methods fall into two kinds: static detection methods and dynamic detection methods.
Static detection methods mainly include signature-based detection and heuristic methods.
Dynamic analysis methods mainly include system monitoring and dynamic tracing.
Signature-based detection is widely used in existing commercial anti-virus software; almost all anti-virus products offer it. Known virus samples are first collected, and their signatures are extracted and stored in a virus database. During detection, the file under inspection is opened and searched for any signature contained in the virus database. Because signatures correspond one-to-one with viruses, finding a signature identifies which virus the file carries. This method, however, cannot detect unknown viruses; collecting the signatures of known viruses is costly; and as the variety and number of malicious code samples keep growing, the signature database becomes harder to maintain and an excessive number of signatures degrades detection efficiency. Heuristic scanning is in effect an improvement on signature-based detection: it can detect not only known malicious code but also some variants, mutations, and unknown malicious code. It still depends, however, on signature extraction.
System monitoring generally runs the malicious code in a controlled environment and determines its function and purpose by comparing certain indicative system state information before and after the code executes. It can monitor every operation a program performs on system resources and detect state changes in real time, so it can quickly find both known and unknown malicious code. Dynamic tracing monitors the dynamic behavior of malicious code in real time while it runs, in order to analyze its function and purpose. This approach analyzes malicious code behavior effectively, but is technically difficult to implement. In addition, the high sensitivity of dynamic analysis may produce false alarms, and it cannot identify the virus by name, which hinders the removal of viruses from infected files. Most existing anti-virus software provides behavior monitoring, but it is often rendered ineffective by improper configuration.
Malicious programs in the network era are stealthier, better hidden, and more widely spread, and the losses they cause are more serious. With the prevalence of code obfuscation techniques and the wide distribution of engines that generate malicious programs automatically, the number of malicious programs keeps growing and poses a serious threat to cyberspace security. Developing new malicious program detection techniques is therefore of great significance for safeguarding cyberspace.
Summary of the invention
The purpose of the invention is to address the inability of signature-based detection to detect unknown malicious programs and variants of known malicious programs by providing a malicious program detection method that integrates multiple detection results. The method detects both unknown and known malicious programs well, and offers a useful approach to malicious program detection and to integration with existing detection methods.
To achieve this purpose, the technical scheme of the invention is as follows: a malicious program detection method integrating multiple detection results, comprising the following steps:
Step S1, establish the string set: collect normal programs in the computer system to form a normal program set Bp, and collect a number of representative malicious programs to form a malicious program set Mp; select some common programs from the normal program set Bp to form a normal program subset Bp1, and select some common programs from the malicious program set Mp to form a malicious program subset Mp1; extract the non-repeating hexadecimal strings of length len from the normal program subset Bp1 and the malicious program subset Mp1 and add them to the string set, where len ranges from 2 to 20 bytes.
Step S2, calculate the information gain of each string and select a specified number of strings to form the gene pool: calculate the information gain of each hexadecimal string in the string set, sort the strings by information gain in descending order, and select the N hexadecimal strings with the largest information gain to form the gene pool, where N is at least 100 and at most 2000.
Step S3, extract the feature vectors of normal programs to form the normal program vector space: based on the gene pool established in step S2, build a feature vector for each normal program in the computer system, and from these build the vector space of normal programs under the current gene pool.
Step S4, generate malicious program detectors: based on the normal program vector space established in step S3, generate malicious program detectors that cover the malicious program feature vector space.
Step S5, preliminarily judge whether the program under test is malicious: perform feature extraction on each program under test to generate its feature vector, and calculate the r-contiguous-bits distance between that feature vector and the detectors; if the r-contiguous-bits distance is greater than or equal to the set threshold, the program under test is judged to be malicious, otherwise normal.
Step S6, judge whether the program under test is malicious according to the fusion rule: use a different value of len in step S1, obtain a gene pool of a different length through step S2, and repeat steps S3 to S5 to detect the program under test again and obtain a new detection result; repeat this at least once more to obtain a further detection result; then apply the fusion rule to the multiple detection results to calculate the basic belief assignments for the program being normal or malicious, and finally judge whether the program under test is malicious or normal.
Further, establishing the string set in step S1 comprises the following steps:
Step S11: for each program p in the normal program subset Bp1 and the malicious program subset Mp1, slide one byte at a time from the start of the program and extract successive hexadecimal strings of length len until the end of the program.
Step S12: for each extracted hexadecimal string, check whether it already appears in the string set of length len; if so, execute step S13; otherwise execute step S14.
Step S13: discard the string.
Step S14: add the string to the string set.
Step S15: change the value of the gene length len and repeat the above operations to obtain string sets of different lengths; len ranges from 2 to 20.
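Steps S11 to S15 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `extract_strings` and `build_string_sets` are hypothetical, programs are assumed to be available as raw `bytes`, and a Python `set` stands in for the "check, then discard or add" logic of steps S12 to S14.

```python
def extract_strings(data: bytes, length: int) -> set[str]:
    """S11: slide one byte at a time, collecting each hex string of `length` bytes."""
    return {data[i:i + length].hex() for i in range(len(data) - length + 1)}


def build_string_sets(programs: list[bytes], lengths: range) -> dict[int, set[str]]:
    """S15: build one string set per gene length len."""
    string_sets: dict[int, set[str]] = {}
    for length in lengths:
        strings: set[str] = set()
        for program in programs:
            # Set union keeps only strings not seen before (S12 to S14).
            strings |= extract_strings(program, length)
        string_sets[length] = strings
    return string_sets


# Toy byte strings standing in for the programs of Bp1 and Mp1 (assumed data).
sets = build_string_sets([b"\x4d\x5a\x90\x00", b"\x4d\x5a\x00\x00"], range(2, 4))
```

In practice `lengths` would be `range(2, 21)` to cover the stated span of 2 to 20 for len.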
Further, establishing the gene pool in step S2 comprises the following steps:
Step S21: define the information gain IG (Information Gain) as:
$$IG(Str) = \sum_{v_{Str} \in \{0,1\}} \; \sum_{C \in \{Bp,\, Mp\}} P(v_{Str}, C)\, \log \frac{P(v_{Str}, C)}{P(v_{Str})\, P(C)}$$
where C ∈ {Bp, Mp}; Str denotes an extracted hexadecimal string; v_Str = 1 indicates that the string occurs in a program and v_Str = 0 that it does not; P(v_Str, C) is the proportion in set C for which the string takes the value v_Str; P(v_Str) is the proportion in the whole test set for which Str takes the value v_Str; and P(C) is the probability that a program belongs to C. For each string in a string set of a given length, the information gain is calculated according to the above formula.
Step S22: sort the strings by information gain in descending order and select the N hexadecimal strings with the largest information gain as the gene pool; the gene pool size N ranges from 100 to 2000.
Step S23: repeat the above operations for the string sets of different lengths; reducing the size of each string set in this way yields gene pools of different lengths.
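The gene pool selection of steps S21 to S23 can be sketched as below. Several points are assumptions made for illustration: the information gain is taken to be the standard mutual-information sum over v_Str and C, P(v_Str, C) is read as a joint probability over the whole test set, logarithms are base 2, and each program is represented by the set of hexadecimal strings it contains. The names `information_gain` and `build_gene_pool` are hypothetical.

```python
import math


def information_gain(string: str, normal: list[set[str]], malicious: list[set[str]]) -> float:
    """Information gain of one string over the two classes Bp and Mp."""
    everything = normal + malicious
    total = len(everything)
    gain = 0.0
    for present in (False, True):              # v_Str = 0 or 1
        p_value = sum((string in prog) == present for prog in everything) / total
        for group in (normal, malicious):      # class C: Bp or Mp
            p_joint = sum((string in prog) == present for prog in group) / total
            p_class = len(group) / total       # P(C)
            if p_joint > 0:                    # a zero term contributes nothing
                gain += p_joint * math.log2(p_joint / (p_value * p_class))
    return gain


def build_gene_pool(strings, normal, malicious, n: int) -> list[str]:
    """S22: keep the n strings with the largest information gain."""
    ranked = sorted(strings, key=lambda s: information_gain(s, normal, malicious), reverse=True)
    return ranked[:n]


# Toy data (assumed): "cccc" occurs only in malicious programs, "4d5a" in all.
normal = [{"4d5a", "0000"}, {"4d5a", "0000"}]
malicious = [{"4d5a", "cccc"}, {"4d5a", "cccc"}]
pool = build_gene_pool(["4d5a", "cccc"], normal, malicious, n=1)
```

A string that occurs in every program carries no class information and scores 0, so it is the first to be cut when the string set is reduced.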
Further, step S3 comprises the following steps:
Step S31: for each program in the normal program set, start from the first byte and extract hexadecimal strings of length len, sliding forward one byte at a time until the end of the file.
Step S32: check whether the extracted hexadecimal string exists in the gene pool of length len; if so, execute step S34; otherwise execute step S33.
Step S33: keep the default value 0; no further processing is done.
Step S34: set the corresponding bit of the feature vector to 1.
Step S35: repeat the above operations for each program in the normal program set to build the normal program vector space under the current gene pool.
Step S36: perform feature extraction on all programs in the normal program set with the gene pool of each length in turn, constructing the normal program vector spaces under the gene pools of different lengths.
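Feature extraction in steps S31 to S35 amounts to building a presence bit vector over the gene pool. The sketch below assumes programs are raw `bytes` and the gene pool is an ordered list of hexadecimal strings; `feature_vector` is a hypothetical name.

```python
def feature_vector(data: bytes, gene_pool: list[str], length: int) -> list[int]:
    """Return one bit per gene: 1 if the gene occurs in the program, else 0."""
    index = {gene: i for i, gene in enumerate(gene_pool)}
    vector = [0] * len(gene_pool)                    # S33: default value 0
    for i in range(len(data) - length + 1):          # S31: slide one byte at a time
        pos = index.get(data[i:i + length].hex())    # S32: is the string in the pool?
        if pos is not None:
            vector[pos] = 1                          # S34: set the corresponding bit
    return vector


# Toy gene pool of length-2 genes and two toy "normal programs" (assumed data).
gene_pool = ["4d5a", "cccc", "9000"]
normal_space = [feature_vector(p, gene_pool, 2)
                for p in (b"\x4d\x5a\x90\x00", b"\x00\x00\x4d\x5a")]
```

Running the same function with each gene pool length gives the separate normal program vector spaces of step S36.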
Further, generating the malicious program detectors in step S4 comprises the following steps:
Step S41: randomly generate a binary string whose dimension equals the number of genes in the gene pool, treat it as a candidate detector, and calculate the r-contiguous-bits distance between this candidate and each vector in the normal program vector space.
Step S42: check whether the r-contiguous-bits distance between the candidate and every vector in the normal program vector space is less than the preset threshold; if so, execute step S44; otherwise execute step S43.
Step S43: discard the candidate.
Step S44: add the candidate to the detector set.
Step S45: check whether the number of generated detectors has reached the predetermined value, which is 300 to 5000; if so, execute step S46; otherwise return to step S41 and continue generating detectors.
Step S46: repeat the above operations for the normal program vector spaces generated from the gene pools of different lengths, producing the detectors corresponding to each gene pool.
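Detector generation in steps S41 to S45 follows the pattern of negative selection, sketched below. The text does not define the r-contiguous-bits distance, so the sketch assumes it is the length of the longest run of consecutive positions at which two bit vectors agree. The names `r_contiguous_distance` and `generate_detectors` and the `max_tries` guard against an endless loop are illustrative additions, not part of the described method.

```python
import random


def r_contiguous_distance(a: list[int], b: list[int]) -> int:
    """Longest run of consecutive positions where a and b agree (assumed definition)."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def generate_detectors(normal_space, dimension, threshold, count, rng, max_tries=100_000):
    """Keep random candidates that stay below the threshold against every normal vector."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= count:                                 # S45: enough detectors
            break
        candidate = [rng.randint(0, 1) for _ in range(dimension)]   # S41
        if all(r_contiguous_distance(candidate, v) < threshold for v in normal_space):  # S42
            detectors.append(candidate)                             # S44
        # Otherwise the candidate is discarded (S43).
    return detectors


# Toy normal space of 8-bit vectors (assumed data) and a small threshold.
normal_space = [[0] * 8, [1] * 8]
detectors = generate_detectors(normal_space, dimension=8, threshold=3,
                               count=5, rng=random.Random(0))
```

With the stated ranges, `dimension` would be 100 to 2000, `count` 300 to 5000, and the loop would be run once per gene pool length (step S46).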
Further, the preliminary judgment in step S5 of whether the program under test is malicious comprises the following steps:
Step S51: for each program under test, start from the first byte and extract hexadecimal strings of length len, sliding one byte at a time until the end of the file.
Step S52: check whether the extracted hexadecimal string exists in the gene pool of length len; if so, the string appears in the gene pool, and step S54 is executed; if not, the string does not appear in the gene pool, and step S53 is executed.
Step S53: keep the default value 0; no further processing is done.
Step S54: set the corresponding bit of the feature vector to 1, thereby obtaining the feature vector of the program under test.
Step S55: check whether the r-contiguous-bits distance between the feature vector of the program under test and every detector is less than the preset threshold; if so, the program under test is a normal program; otherwise the program has been recognized by a malicious program detector and can be preliminarily judged to be malicious.
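The decision in step S55 can be sketched as a single check over the detector set. As an assumption, the r-contiguous-bits distance is again taken to be the longest run of agreeing positions; `is_malicious` is a hypothetical name and the vectors are toy data.

```python
def r_contiguous_distance(a: list[int], b: list[int]) -> int:
    """Longest run of consecutive positions where a and b agree (assumed definition)."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def is_malicious(vector: list[int], detectors: list[list[int]], threshold: int) -> bool:
    """S55: malicious if any detector reaches the threshold, otherwise normal."""
    return any(r_contiguous_distance(vector, d) >= threshold for d in detectors)


detectors = [[1, 1, 1, 1, 0, 0]]
flagged = is_malicious([1, 1, 1, 1, 1, 1], detectors, threshold=4)   # agrees on 4 bits in a row
cleared = is_malicious([0, 0, 0, 0, 0, 0], detectors, threshold=4)   # agrees on only 2 in a row
```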
Further, judging by the fusion rule in step S6 whether the program under test is malicious comprises the following steps:
Step S61: take the frame of discernment Θ = {B, M}, where B denotes a normal program and M a malicious program. Define the basic belief assignment function m: P({B, M}) → [0, 1], satisfying m(B) + m(M) = 1, where m(M) is the basic belief assigned to the program being malicious and m(B) the basic belief assigned to the program being normal.
Step S62: according to the fusion formula, fuse the basic belief assignments of the detection results of the multiple malicious program detection methods to obtain new basic belief assignments m_{1...n}(B) and m_{1...n}(M), which represent the basic belief that the n detection results jointly assign to the normal program and to the malicious program respectively.
Step S63: compare m_{1...n}(B) and m_{1...n}(M); if m_{1...n}(B) > m_{1...n}(M), the program under test is a normal program; otherwise it is a malicious program.
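The fusion formula itself is not reproduced in this text. The sketch below assumes it is Dempster's rule of combination from Dempster-Shafer evidence theory, which the frame of discernment and basic belief assignment terminology suggests. Because each result satisfies m(B) + m(M) = 1, all mass sits on the two singletons and the rule reduces to a normalized product. How each detection run is turned into a pair (m(B), m(M)) is also not specified here, so the input values are purely illustrative, and `fuse` and `judge` are hypothetical names.

```python
from math import prod


def fuse(assignments: list[tuple[float, float]]) -> tuple[float, float]:
    """Combine n pairs (m_i(B), m_i(M)) with Dempster's rule (assumed fusion formula)."""
    agree_b = prod(b for b, _ in assignments)   # every source supports "normal"
    agree_m = prod(m for _, m in assignments)   # every source supports "malicious"
    norm = agree_b + agree_m                    # mass left after removing conflict
    if norm == 0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return agree_b / norm, agree_m / norm


def judge(assignments: list[tuple[float, float]]) -> str:
    """S63: normal only if the fused m(B) exceeds the fused m(M)."""
    m_b, m_m = fuse(assignments)
    return "normal" if m_b > m_m else "malicious"


# Three illustrative detection results given as (m(B), m(M)).
results = [(0.4, 0.6), (0.3, 0.7), (0.6, 0.4)]
fused = fuse(results)
verdict = judge(results)
```

In this example two of the three runs lean toward "malicious", and the fused assignment does too, even though the third run on its own would have said "normal".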
Further, the number of repeated detections of the program under test in step S6 is 2 to 5. The number of repeated detections with detectors generated from different len values should not be too small, which would hinder multi-result fusion, nor too large, which would increase detection time excessively; 2 to 5 repetitions are therefore appropriate.
The beneficial effects of the invention are:
1. By building a gene pool, generating the feature space of normal programs, and then producing detectors that cover the malicious program space, the invention detects malicious programs while avoiding both the extraction of malicious program signatures and the maintenance of a huge signature database.
2. The invention can detect unknown malicious programs. Because it builds the feature vector space of normal programs and generates detectors covering the malicious program feature space, it does not need to extract the signatures of unknown malicious programs, and therefore detects unknown malicious programs and variants of known malicious programs well.
3. The invention can improve the accuracy of malicious program detection, because it performs a comprehensive judgment over multiple detection results (or the results of multiple detection methods).
Description of the drawings
Fig. 1 is the system framework diagram of the malicious program detection method integrating multiple detection results of the invention.
Fig. 2 is the flow chart for generating string sets of different lengths in the method of the invention.
Fig. 3 is the flow chart for reducing the size of the gene pools of different lengths in the method of the invention.
Fig. 4 is the flow chart for building the normal program vector space in the method of the invention.
Fig. 5 is the flow chart for generating the malicious program detectors in the method of the invention.
Fig. 6 is the flow chart for the preliminary judgment of the program under test in the method of the invention.
Fig. 7 is the flow chart for the fusion judgment of the program under test in the method of the invention.
Detailed description of the embodiments
To make the purpose, technical scheme, and beneficial effects of the invention clearer, the invention is further described below with reference to the accompanying drawings; the scope of protection of the invention is not, however, limited to the following embodiments.
As shown in Fig. 1, the malicious program detection method integrating multiple detection results of the invention comprises the following steps:
Step S1: collect the normal programs in the computer system to form a normal program set Bp, and collect, in a secure manner, a number of representative malicious programs to form a malicious program set Mp; select some common programs from Bp to form a normal program subset Bp1, and select some common programs from Mp to form a malicious program subset Mp1; extract the non-repeating hexadecimal strings of length len from Bp1 and Mp1 and add them to the string set. String sets of different lengths can be built for different values of len; as shown in Fig. 2, they can be built as follows. Step S11: for each program p in Bp1 and Mp1, slide one byte at a time from the start of the program and extract successive hexadecimal strings of length len until the end of the program. Step S12: for each extracted hexadecimal string, check whether it already appears in the string set of length len; if so, the string already exists in the gene set, and step S13 is executed; if not, the gene set does not yet contain the string, and step S14 is executed. Step S13: discard the string. Step S14: add the string to the string set. Step S15: change the value of the gene length len and repeat the above operations to obtain string sets of different lengths; len usually ranges from 2 to 20 bytes. For example, if the gene length len is 2 in the first extraction, it is 3 in the next.
Step S2: calculate the information gain of each string and select a specified number of strings to form the gene pool: calculate the information gain of each hexadecimal string in the string set, sort the strings by information gain in descending order, and take the 100 to 2000 hexadecimal strings with the largest information gain to form the gene pool. Hexadecimal strings with small information gain are ignored because, as len increases, the number of hexadecimal strings extracted from the programs grows exponentially, which reduces the efficiency of feature extraction; the size of each string set must therefore be reduced. As shown in Fig. 3, reducing the size of the string sets of different lengths comprises the following steps. Step S21: define the information gain IG (Information Gain) as:
$$IG(Str) = \sum_{v_{Str} \in \{0,1\}} \; \sum_{C \in \{Bp,\, Mp\}} P(v_{Str}, C)\, \log \frac{P(v_{Str}, C)}{P(v_{Str})\, P(C)}$$
where C ∈ {Bp, Mp}; Str denotes an extracted hexadecimal string; v_Str = 1 indicates that the string occurs in a program and v_Str = 0 that it does not; P(v_Str, C) is the proportion in set C for which the string takes the value v_Str; P(v_Str) is the proportion in the whole test set for which Str takes the value v_Str; and P(C) is the probability that a program belongs to C. For each string in a string set of a given length, the information gain is calculated according to the above formula. Step S22: sort the strings by information gain in descending order and select the N hexadecimal strings with the largest information gain as the gene pool; the gene pool size N ranges from 100 to 2000. The shorter the gene, the larger the gene pool may be; conversely, the longer the gene, the smaller the gene pool may be, so as to ensure detection efficiency. Step S23: repeat the above operations for the string sets of different lengths; reducing the size of each string set in this way yields gene pools of different lengths.
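A short worked example may make the information gain concrete. It assumes the information gain is the standard mutual-information sum of P(v_Str, C) log [P(v_Str, C) / (P(v_Str) P(C))] over v_Str ∈ {0, 1} and C ∈ {Bp, Mp}, with base-2 logarithms and zero-probability terms counted as 0. The test set is hypothetical: two normal and two malicious programs, with Str occurring in both malicious programs and in neither normal one.

```latex
\begin{aligned}
P(Bp) &= P(Mp) = \tfrac12, \qquad P(v_{Str}{=}0) = P(v_{Str}{=}1) = \tfrac12, \\
P(v_{Str}{=}0,\, Bp) &= P(v_{Str}{=}1,\, Mp) = \tfrac12, \qquad
P(v_{Str}{=}1,\, Bp) = P(v_{Str}{=}0,\, Mp) = 0, \\
IG(Str) &= \tfrac12 \log_2 \frac{1/2}{\tfrac12 \cdot \tfrac12}
         + \tfrac12 \log_2 \frac{1/2}{\tfrac12 \cdot \tfrac12}
         = \tfrac12 \cdot 1 + \tfrac12 \cdot 1 = 1 .
\end{aligned}
```

A string that separates the two classes perfectly thus scores 1, while a string that occurs in every program has P(v_Str, C) = P(v_Str) P(C) in each nonzero term and scores 0, which is why such strings are dropped when the string set is reduced.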
Step S3: based on the gene pool established in step S2, extract features from the normal programs in the computer system, build their feature vectors, and from these build the vector space of normal programs under the current gene pool. For each program in the normal program set, a feature vector with the same dimension as the gene pool size is built, that is, feature extraction is performed on the normal program; the feature vectors of all normal programs make up the normal program state space. As shown in Fig. 4, the specific steps are as follows. Step S31: for each program in the normal program set, start from the first byte and extract hexadecimal strings of length len, sliding forward one byte at a time until the end of the file. Step S32: check whether the extracted hexadecimal string exists in the gene pool of length len; if so, the string appears in the gene pool, and step S34 is executed; if not, the string does not appear in the gene pool, and step S33 is executed. Step S33: keep the default value 0; no further processing is done. Step S34: set the corresponding bit of the feature vector to 1, indicating that the gene occurs in the program. This yields the value x_i of each dimension of the program's feature vector (x_1, x_2, …, x_n), where x_i ∈ {0, 1} and i = 1, …, n; after feature extraction, the program's feature vector is a binary string whose length equals the number of genes in the gene pool. Step S35: repeat the above operations for each program in the normal program set to build the normal program vector space under the current gene pool. Step S36: perform feature extraction on all programs in the normal program set with the gene pool of each length in turn, constructing the normal program vector spaces under the gene pools of different lengths.
Step S4: based on the normal program vector space established in step S3, generate malicious program detectors that cover the malicious program feature vector space. Specifically, the detectors can be generated by the steps shown in Fig. 5. Step S41: randomly generate a binary string whose dimension equals the number of genes in the gene pool, treat it as a candidate detector, and calculate the r-contiguous-bits distance between the candidate and each vector in the normal program vector space. Step S42: check whether the r-contiguous-bits distance between the candidate and every vector in the normal program vector space is less than the preset threshold. The threshold is set according to detection needs: in general, the smaller the threshold, the stronger the detection capability and the lower the false negative rate of the model, but the higher its false positive rate; to obtain satisfactory detection results the user can set it as needed, for example to 10. If so, that is, if the r-contiguous-bits distance between the candidate and every normal program feature vector is less than 10, the candidate does not lie in the normal program space and can serve as a valid detector, and step S44 is executed; if not, the candidate lies in the normal program vector space, and step S43 is executed. Step S43: discard the candidate. Step S44: add the candidate to the detector set. Step S45: check whether the number of generated detectors has reached the predetermined value, which is 300 to 5000; if so, execute step S46; otherwise return to step S41 and continue generating detectors. Step S46: repeat the above operations for the normal program vector spaces generated from the gene pools of different lengths, producing the detectors corresponding to each gene pool.
Step S5: perform feature extraction on each program under test to generate its feature vector, and calculate the r-contiguous-bits distance between that feature vector and all detectors; if any r-contiguous-bits distance is greater than or equal to the set threshold of 10, the program under test is judged to be malicious, otherwise normal. Specifically, to judge more accurately whether the program under test is malicious, as shown in Fig. 6, detecting the program under test comprises the following steps. Step S51: for each program under test, start from the first byte and extract hexadecimal strings of length len, sliding one byte at a time until the end of the file. Step S52: check whether the extracted hexadecimal string exists in the gene pool of length len; if so, the string appears in the gene pool, and step S54 is executed; if not, the string does not appear in the gene pool, and step S53 is executed. Step S53: keep the default value 0; no further processing is done. Step S54: set the corresponding bit of the feature vector to 1, thereby obtaining the feature vector of the program under test. Step S55: check whether the r-contiguous-bits distance between the feature vector of the program under test and every detector is less than the preset threshold of 10; if so, the program under test is a normal program; otherwise the program has been recognized by a malicious program detector and can be preliminarily judged to be malicious.
Step S6: judge by the fusion rule whether the program under test is malicious: use a different value of len in step S1, obtain a gene pool of a different length through step S2, and repeat steps S3 to S5 to detect the program under test again and obtain a new detection result; repeat this at least once more, preferably 2 to 5 times, to obtain further detection results. Based on the multiple detection results, as shown in Fig. 7, the fusion judgment is carried out according to the fusion rule by the following steps. Step S61: take the frame of discernment Θ = {B, M}, where B denotes a normal program and M a malicious program. Define the basic belief assignment function m: P({B, M}) → [0, 1], satisfying m(B) + m(M) = 1, where m(M) is the basic belief assigned to the program being malicious and m(B) the basic belief assigned to the program being normal. Step S62: according to the fusion formula, fuse the basic belief assignments of the detection results of the multiple malicious program detection methods to obtain new basic belief assignments m_{1...n}(B) and m_{1...n}(M), which represent the basic belief that the n detection results jointly assign to the normal program and to the malicious program respectively. Step S63: compare m_{1...n}(B) and m_{1...n}(M); if m_{1...n}(B) > m_{1...n}(M), the program under test is a normal program; otherwise it is a malicious program.
Through the above steps, the technical solution of the present invention builds a gene pool, generates the feature space of normal programs, and then produces detectors that cover the malicious program space in order to detect malicious programs. This avoids both the extraction of malicious program signature code and the maintenance of a huge signature database. In addition, fusing multiple detection results into a comprehensive judgment further improves the accuracy of malicious program detection.
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above embodiments; the embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention.

Claims (8)

1. A malicious program detection method integrating multiple detection results, characterized by comprising the following steps:
Step S1, establishing string sets: collecting normal programs in a computer system to form a normal program set Bp, and collecting some representative malicious programs to form a malicious program set Mp; selecting some common programs from the normal program set Bp to form a normal program subset Bp1, and selecting some common programs from the malicious program set Mp to form a malicious program subset Mp1; extracting non-repeating hexadecimal strings of length len from the normal program subset Bp1 and the malicious program subset Mp1 and adding them to the string set, where len ranges from 2 to 20 bytes;
Step S2, calculating the information gain of each string and selecting a certain number of strings to form a gene pool: specifically, calculating the information gain of each hexadecimal string in the string set, sorting the strings by information gain in descending order, and selecting the N hexadecimal strings with the largest information gain to form the gene pool, where N is greater than or equal to 100 and less than or equal to 2000;
Step S3, extracting feature vectors of normal programs to form a normal program vector space: specifically, based on the gene pool established in step S2, building feature vectors for the normal programs in the computer system, and further building the vector space of normal programs under the current gene pool;
Step S4, generating malicious program detectors: based on the normal program vector space established in step S3, further generating malicious program detectors that cover the malicious program feature vector space;
Step S5, preliminarily judging whether the program to be detected is a malicious program: performing feature extraction on each program to be detected to generate its feature vector, and calculating the r-contiguous-bits distance between the feature vector of the program to be detected and the detectors; if the r-contiguous-bits distance is greater than or equal to the set threshold, the program to be detected is judged to be a malicious program, otherwise a normal program;
Step S6, judging according to the fusion rule whether the program to be detected is a malicious program: specifically, using a different len value in step S1, obtaining gene pools of different lengths through step S2, repeating steps S3 to S5 to detect the program to be detected again and obtain a new detection result, repeating this at least once to obtain further detection results, calculating from the multiple detection results, using the fusion rule, the basic belief assignments that the program is a normal program or a malicious program, and finally judging whether the program to be detected is a malicious program or a normal program.
2. The malicious program detection method according to claim 1, characterized in that the step of establishing the string sets in step S1 further comprises:
Step S11: for each program p in the normal program subset Bp1 and the malicious program subset Mp1, starting from its beginning and sliding one byte at a time, successively extracting hexadecimal strings of length len until the end of the program;
Step S12: for each extracted hexadecimal string, judging whether it already occurs in the string set of length len; if so, executing step S13; if not, executing step S14;
Step S13: discarding the string;
Step S14: adding the string to the string set;
Step S15: changing the value of the gene length len and repeating the above operations to obtain string sets of different lengths, where len ranges from 2 to 20.
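Steps S11 to S15 can be sketched as follows. This is a minimal illustration, not the patented implementation: programs are assumed to be available as raw `bytes`, a Python `set` stands in for the "check, then discard or add" logic of steps S12 to S14, and the function names are hypothetical.

```python
def extract_substrings(data: bytes, length: int) -> set:
    """Steps S11-S14: slide one byte at a time and keep each distinct hex string once."""
    return {data[i:i + length].hex() for i in range(len(data) - length + 1)}


def build_string_sets(programs, lengths=range(2, 21)):
    """Step S15: repeat the extraction for every gene length len (2 to 20 bytes)."""
    string_sets = {}
    for length in lengths:
        strings = set()
        for program in programs:
            strings |= extract_substrings(program, length)
        string_sets[length] = strings
    return string_sets


# Two tiny stand-in "programs"; real inputs would be the files in Bp1 and Mp1.
samples = [b"\x01\x02\x03", b"\x02\x03\x04"]
sets_by_length = build_string_sets(samples, lengths=[2, 3])
print(sorted(sets_by_length[2]))  # ['0102', '0203', '0304']
```

A program shorter than len simply contributes no strings, since the sliding window never fits.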
3. The malicious program detection method according to claim 2, characterized in that the step of establishing the gene pool in step S2 further comprises:
Step S21: defining the information gain IG (Information Gain) as:
$$IG(Str) = \sum_{v_{Str} \in \{0,1\}} \sum_{C \in C_i} P(v_{Str}, C)\,\log_2 \frac{P(v_{Str}, C)}{P(v_{Str})\,P(C)}$$
where $C_i$ = {Bp, Mp}; Str denotes the extracted hexadecimal string; $v_{Str} = 1$ indicates that the string occurs in a program, otherwise $v_{Str} = 0$; $P(v_{Str}, C)$ is the proportion in set C when the string takes the value $v_{Str}$; $P(v_{Str})$ is the proportion in the whole test set when Str takes the value $v_{Str}$; and $P(C)$ is the probability that a program belongs to C. For each string in a string set of the same length, its information gain is calculated according to the above formula;
Step S22: sorting by information gain in descending order and selecting the N hexadecimal strings with the largest information gain as the gene pool, where the gene pool size N ranges from 100 to 2000;
Step S23: repeating the above operations for the string sets of different lengths; reducing the size of each string set in this way yields gene pools of different lengths.
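The information gain of step S21 and the selection of step S22 can be sketched as follows. The sketch assumes that P(v_Str, C) is the joint fraction of all programs that belong to class C and have indicator value v_Str, which makes the formula the mutual information between string presence and class. Function names and sample data are hypothetical.

```python
import math


def information_gain(hex_string, benign, malicious):
    """IG(Str) over presence v in {0, 1} and class C in {Bp, Mp} (assumed joint reading)."""
    needle = bytes.fromhex(hex_string)
    classes = {"B": benign, "M": malicious}
    total = len(benign) + len(malicious)
    counts = {}
    for label, programs in classes.items():
        present = sum(1 for program in programs if needle in program)
        counts[(1, label)] = present
        counts[(0, label)] = len(programs) - present
    gain = 0.0
    for v in (0, 1):
        p_v = (counts[(v, "B")] + counts[(v, "M")]) / total
        for label, programs in classes.items():
            p_vc = counts[(v, label)] / total
            p_c = len(programs) / total
            if p_vc > 0:  # a zero joint probability contributes nothing
                gain += p_vc * math.log2(p_vc / (p_v * p_c))
    return gain


def select_gene_pool(strings, benign, malicious, n):
    """Step S22: keep the n strings with the largest information gain."""
    ranked = sorted(strings, key=lambda s: information_gain(s, benign, malicious),
                    reverse=True)
    return ranked[:n]


benign = [b"\x01\x02", b"\x01\x03"]
malicious = [b"\xff\x02", b"\xff\x03"]
print(information_gain("ff", benign, malicious))  # 1.0: appears only in malicious samples
print(information_gain("02", benign, malicious))  # 0.0: equally common in both classes
```

A string that appears in exactly one class carries the maximum gain, while a string that is equally frequent in both classes carries none, which is why ranking by this value favours discriminative genes.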
4. The malicious program detection method according to claim 3, characterized in that step S3 further comprises the following steps:
Step S31: for each program in the normal program set, starting from the first byte, extracting hexadecimal strings of length len, sliding backward one byte at a time until the end of the file;
Step S32: judging whether the extracted hexadecimal string exists in the gene pool of length len; if so, executing step S34; if not, executing step S33;
Step S33: keeping the default value 0 and doing nothing;
Step S34: setting the corresponding bit of the feature vector to 1;
Step S35: repeating the above operations for each program in the normal program set, thereby building the normal program vector space under the current gene pool;
Step S36: performing feature extraction on all programs in the normal program set with the gene pools of different lengths in turn, thereby building the normal program vector spaces under the gene pools of different lengths.
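Steps S31 to S35 can be sketched as follows, assuming the gene pool is an ordered list of hex strings so that each gene owns one fixed position in the feature vector. The function names and sample data are hypothetical.

```python
def feature_vector(program: bytes, gene_pool, length: int):
    """Steps S31-S34: one bit per gene, set to 1 if the gene occurs in the program."""
    position = {gene: i for i, gene in enumerate(gene_pool)}
    vector = [0] * len(gene_pool)  # step S33: bits default to 0
    for i in range(len(program) - length + 1):
        index = position.get(program[i:i + length].hex())
        if index is not None:
            vector[index] = 1      # step S34
    return vector


def normal_vector_space(normal_programs, gene_pool, length: int):
    """Step S35: the feature vectors of all normal programs under one gene pool."""
    return [feature_vector(p, gene_pool, length) for p in normal_programs]


gene_pool = ["0102", "ffff", "0203"]
print(feature_vector(b"\x01\x02\x03", gene_pool, 2))  # [1, 0, 1]
```

For step S36, the same two functions would be called once per gene length, each time with the gene pool built for that length.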
5. The malicious program detection method according to claim 4, characterized in that generating the malicious program detectors in step S4 further comprises the following steps:
Step S41: randomly generating a binary string whose dimension equals the number of genes in the gene pool, regarding it as a candidate detector, and calculating the r-contiguous-bits distance between this detector and each vector in the normal program vector space;
Step S42: judging whether the r-contiguous-bits distances between the detector and all vectors in the normal program vector space are less than the preset threshold; if so, executing step S44; if not, executing step S43;
Step S43: discarding the detector;
Step S44: adding the detector to the detector set;
Step S45: judging whether the number of generated detectors has reached a predetermined value, the predetermined value being in the range 300 to 5000; if so, executing step S46; if not, returning to step S41 to continue generating detectors;
Step S46: repeating the above operations for the normal program vector spaces generated from the gene pools of different lengths, producing the detectors corresponding to each gene pool.
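Steps S41 to S45 follow the pattern of negative selection: random candidates are kept only if they match no normal vector. The sketch below assumes the r-contiguous-bits distance is the longest run of agreeing positions, which the text does not define. The `max_tries` cap and the function names are additions for illustration only.

```python
import random


def r_contiguous_distance(a, b):
    """Assumed definition: longest run of consecutive positions where a and b agree."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def generate_detectors(normal_space, dimension, threshold, count,
                       rng=None, max_tries=100_000):
    """Keep random candidates whose distance to every normal vector is below threshold."""
    rng = rng or random.Random()
    detectors = []
    tries = 0
    while len(detectors) < count and tries < max_tries:  # step S45, with a safety cap
        tries += 1
        candidate = [rng.randint(0, 1) for _ in range(dimension)]  # step S41
        if all(r_contiguous_distance(candidate, v) < threshold
               for v in normal_space):                             # step S42
            detectors.append(candidate)                            # step S44
    return detectors


normal_space = [[0] * 16]
detectors = generate_detectors(normal_space, dimension=16, threshold=4,
                               count=5, rng=random.Random(0))
print(len(detectors))
```

The cap on attempts guards against the case where the normal vectors cover so much of the space that the requested number of detectors cannot be reached.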
6. The malicious program detection method according to claim 5, characterized in that the malicious program judgment in step S5 further comprises the following steps:
Step S51: for each program to be detected, starting from the first byte, extracting hexadecimal strings of length len, sliding one byte at a time until the end of the file;
Step S52: judging whether the extracted hexadecimal string exists in the gene pool of length len; if so, the extracted hexadecimal string occurs in the gene pool and step S54 is executed; if not, the extracted hexadecimal string does not occur in the gene pool and step S53 is executed;
Step S53: keeping the default value 0 and doing nothing;
Step S54: setting the corresponding bit of the feature vector to 1, thereby obtaining the feature vector of the program to be detected;
Step S55: judging whether the r-contiguous-bits distances between the feature vector of the program to be detected and all detectors are less than the preset threshold; if so, the program to be detected is a normal program; otherwise, the program is recognized by a malicious program detector and can be preliminarily judged to be a malicious program.
7. The malicious program detection method according to claim 6, characterized in that the step in step S6 of judging according to the fusion rule whether the program to be detected is a malicious program further comprises the following steps:
Step S61: taking the frame of discernment Θ = {B, M}, where B denotes a normal program and M denotes a malicious program, and defining the basic belief assignment function m: P({B, M}) → [0, 1] with m(B) + m(M) = 1, where m(M) is the basic belief assignment supporting a malicious program and m(B) is the basic belief assignment supporting a normal program;
Step S62: according to the fusion formula, fusing the basic belief assignments of the detection results of the multiple malicious program detection methods to obtain new basic belief assignments $m_{1\ldots n}(B)$ and $m_{1\ldots n}(M)$, which represent the basic belief assigned by the n detection results to normal program and malicious program respectively;
Step S63: comparing $m_{1\ldots n}(B)$ and $m_{1\ldots n}(M)$; if $m_{1\ldots n}(B) > m_{1\ldots n}(M)$, the program to be detected is a normal program; otherwise it is a malicious program.
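As a worked illustration of steps S62 and S63, assume the fusion formula is Dempster's rule of combination (the formula is not reproduced in this text) and that every detection result $m_i$ places belief only on B and M. Under those assumptions the conflicting mass is normalized away and the fused assignment reduces to:

$$m_{1\ldots n}(B) = \frac{\prod_{i=1}^{n} m_i(B)}{\prod_{i=1}^{n} m_i(B) + \prod_{i=1}^{n} m_i(M)}, \qquad m_{1\ldots n}(M) = 1 - m_{1\ldots n}(B)$$

For two illustrative results with $m_1(B) = 0.6$, $m_1(M) = 0.4$, $m_2(B) = 0.3$, $m_2(M) = 0.7$ (values chosen only for the example):

$$m_{12}(B) = \frac{0.6 \times 0.3}{0.6 \times 0.3 + 0.4 \times 0.7} = \frac{0.18}{0.46} \approx 0.391, \qquad m_{12}(M) \approx 0.609$$

Since $m_{12}(B) < m_{12}(M)$, step S63 would judge the program to be a malicious program in this example.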
8. The malicious program detection method according to claim 6, characterized in that in step S6 the program to be detected is detected again 2 to 5 times.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610909053.4A CN106650440A (en) 2016-10-18 2016-10-18 Malicious program detection method integrating multiple detection results

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610909053.4A CN106650440A (en) 2016-10-18 2016-10-18 Malicious program detection method integrating multiple detection results

Publications (1)

Publication Number Publication Date
CN106650440A true CN106650440A (en) 2017-05-10

Family

ID=58856629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610909053.4A Pending CN106650440A (en) 2016-10-18 2016-10-18 Malicious program detection method integrating multiple detection results

Country Status (1)

Country Link
CN (1) CN106650440A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Qin Renchao et al.: "Computer virus detection method based on immunity and D-S evidence theory", Application Research of Computers *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414277A (en) * 2018-04-27 2019-11-05 北京大学 Gate-level hardware Trojan horse detection method based on multi-feature parameters
CN110414277B (en) * 2018-04-27 2021-08-03 北京大学 Gate-level hardware Trojan horse detection method based on multi-feature parameters
CN112789831A (en) * 2018-11-21 2021-05-11 松下电器(美国)知识产权公司 Abnormality detection method and abnormality detection device
CN112789831B (en) * 2018-11-21 2023-05-02 松下电器(美国)知识产权公司 Abnormality detection method and abnormality detection device

Similar Documents

Publication Publication Date Title
CN103177215B (en) New computer malware detection method based on software control flow features
CN110826059A (en) Method and device for defending black box attack facing malicious software image format detection model
CN105956180B (en) Sensitive word filtering method
CN107241352A (en) Network security incident classification and prediction method and system
CN107360152A (en) Web threat perception system based on semantic analysis
EP2415229A1 (en) Method and system for alert classification in a computer network
CN110933083B (en) Vulnerability grade evaluation device and method based on word segmentation and attack matching
CN110765459A (en) Malicious script detection method and device and storage medium
CN112492059A (en) DGA domain name detection model training method, DGA domain name detection device and storage medium
CN107181726A (en) Cyberthreat case evaluating method and device
CN105072214A (en) C&C domain name identification method based on domain name feature
CN111488590A (en) SQL injection detection method based on user behavior credibility analysis
CN105046152A (en) Function call graph fingerprint based malicious software detection method
CN112685738B (en) Malicious confusion script static detection method based on multi-stage voting mechanism
CN112866292B (en) Attack behavior prediction method and device for multi-sample combination attack
CN115987615A (en) Network behavior safety early warning method and system
CN114039758A (en) Network security threat identification method based on event detection mode
CN108491717A (en) XSS defense system based on machine learning and implementation method thereof
CN106650440A (en) Malicious program detection method integrating multiple detection results
CN106650449B (en) Script heuristic detection method and system based on variable name confusion degree
Harbola et al. Improved intrusion detection in DDoS applying feature selection using rank & score of attributes in KDD-99 data set
CN106874762A (en) Android malicious code detecting method based on API dependence graphs
CN112257076B (en) Vulnerability detection method based on random detection algorithm and information aggregation
CN103455754A (en) Regular expression-based malicious search keyword recognition method
CN113886832A (en) Intelligent contract vulnerability detection method, system, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170510