CN108287996A - Method for cleaning obfuscation features of malicious code - Google Patents

Method for cleaning obfuscation features of malicious code

Info

Publication number
CN108287996A
Authority
CN
China
Prior art keywords
feature
value
sample
malicious code
characteristic
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810013584.4A
Other languages
Chinese (zh)
Inventor
王栎汉
宁振虎
薛菲
蔡永泉
梁鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Application filed by Beijing University of Technology
Priority to CN201810013584.4A
Publication of CN108287996A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for cleaning obfuscation features of malicious code, belonging to the field of machine learning and information security. The method comprises a feature selection method and an obfuscation feature cleaning method, and improves the effectiveness of traditional malicious code feature extraction. Compared with traditional malicious code feature extraction methods, the invention extends the period over which a feature extraction algorithm remains effective and improves its resistance to interference. The invention first builds a feature library with the n-gram feature extraction method. Because that extraction algorithm cannot cope with the obfuscation operations applied to malicious code, the feature library contains a large number of obfuscated feature values. The obfuscation feature cleaning algorithm eliminates the interference of this abnormal data with the model's recognition rules. On this basis, a feature selection method is proposed from the perspective of training data set size. It reduces the number of features the model ultimately uses without lowering the model's recognition accuracy.

Description

Method for cleaning obfuscation features of malicious code
Technical field
The present invention relates to a method for removing obfuscation features of malicious code, which can extend the effective lifetime of traditional malicious code feature extraction methods. It belongs to the field of machine learning and information security, and involves the combined use of machine learning classification algorithms, obfuscation feature removal, and feature selection algorithms.
Background
According to Symantec statistics, most emerging malicious code is generated from existing malicious code through a series of transformation operations. Malicious code detection is therefore usually based on feature vectors, which capture the essential characteristics of the malicious code. A good feature extraction algorithm is the core technology of malicious code variant detection. Common antivirus software usually identifies malicious code with signature-based methods. Given a group of malicious code samples, the samples are first labeled as a family. Malicious code of the same family should share the same features. These common features are extracted to build a feature library, which is then used to detect variants of that family. The security of such a feature-library-based detection system, however, depends on the effectiveness of the feature extraction method used, because new variants can interfere with an earlier feature extraction method and thereby bypass the detection system. For example, in a detection system based on key strings, malicious code can escape identification by substituting equivalent key strings or by adding meaningless strings. To counter the obfuscation operations that malicious code applies against feature extraction, many scholars have proposed a variety of feature extraction methods that aim to eliminate the influence of obfuscation on the detection system and obtain the best detection effect. On the one hand, these feature extraction methods are gradually defeated by malicious code; on the other hand, more secure feature extraction methods cause problems such as excessive computing resource overhead and poor real-time performance.
Current research on malicious code detection concentrates on the extraction of malicious code feature vectors. To improve the interference resistance of those feature vectors, researchers have extracted features from several angles, including security attributes, dependencies, and true semantics. Kirda et al. exploited the way spyware acquires sensitive user data and detected it through the behavioral characteristics of the leaked data. This method is limited to spyware and cannot detect malicious code that does not leak data. Wang Rui et al., working from the actual semantics of malicious code, built a semantics-based behavior feature graph and computed feature values from it, achieving very good detection results. However, this method is based on the behavior of the program itself and does not consider how the program calls resources, so it cannot reliably identify certain special variants. It also has poor timeliness and requires substantial computing resources, which makes it impractical.
As malicious code iterates faster, feature extraction methods also stay effective for shorter and shorter periods. Maintaining system security by replacing the feature extraction method is becoming increasingly difficult. How to effectively eliminate the obfuscated features in a feature library has therefore become a problem of great practical significance.
Summary of the invention
To solve the problem that traditional malicious code feature extraction methods remain effective only for a short time, the technical solution adopted by the invention consists of an obfuscation feature cleaning algorithm and a feature selection method. The obfuscation feature cleaning algorithm removes the obfuscated features produced by n-gram-based malicious code feature extraction. It analyzes a small number of malicious code samples and then cleans the features of the remaining malicious code in the sample library. The resulting feature library has a stable number of features and is not easily affected by obfuscation. The feature selection method automatically replaces the features selected in the feature library according to the training sample set, thereby optimizing the feature library.
A method for cleaning obfuscation features of malicious code comprises a feature selection method and an obfuscation feature cleaning method, and improves the effectiveness of traditional malicious code feature extraction. Compared with traditional methods, the invention extends the effective lifetime of the feature extraction algorithm and improves its interference resistance.
The invention first builds a feature library with the n-gram feature extraction method. Because that extraction algorithm cannot cope with the obfuscation operations applied to malicious code, the feature library contains a large number of obfuscated feature values. The obfuscation feature cleaning algorithm eliminates the interference of this abnormal data with the model's recognition rules. On this basis, a feature selection method is proposed from the perspective of training data set size. It reduces the number of features the model ultimately uses without lowering the model's recognition accuracy.
The main technical route of the invention is as follows:
1) Build the obfuscation feature cleaning method from multi-sample analysis. A small amount of sample data is analyzed in detail to identify the characteristics of obfuscated features in the samples, and a linear regression model is built.
2) Using this cleaning method, dynamically compute the obfuscation threshold of the feature values in each remaining sample, and use that threshold to remove obfuscated features from the feature vectors of the remaining samples in the sample library.
3) Build the feature selection method from the input training set. The feature vectors are first normalized, and then the feature values that contribute little to the data set are removed dynamically according to the number of input training samples.
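The linear regression in step 1) can be sketched as an ordinary least-squares fit of the two weights α and β in the threshold relation ξ = α * Feature_averages + β * Feature_median that the description defines. This is a minimal sketch rather than the patented implementation: the function name `fit_threshold_weights`, the no-intercept form, and the example triples are assumptions made for illustration. Each triple holds the feature expected value, the feature standard value, and a manually determined obfuscation threshold for one analyzed sample.

```python
def fit_threshold_weights(rows):
    """Least-squares fit of xi = alpha * average + beta * median (no intercept).

    rows: iterable of (average, median, xi) triples, one per analyzed sample.
    Returns the pair (alpha, beta).
    """
    # Accumulate the entries of the 2x2 normal equations.
    s_aa = s_am = s_mm = s_ax = s_mx = 0.0
    for average, median, xi in rows:
        s_aa += average * average
        s_am += average * median
        s_mm += median * median
        s_ax += average * xi
        s_mx += median * xi

    det = s_aa * s_mm - s_am * s_am
    if abs(det) < 1e-12:
        raise ValueError("samples do not determine alpha and beta uniquely")

    # Solve the normal equations with Cramer's rule.
    alpha = (s_ax * s_mm - s_am * s_mx) / det
    beta = (s_aa * s_mx - s_am * s_ax) / det
    return alpha, beta


# Hypothetical analyzed samples (illustrative numbers, not measured data).
analyzed = [(10, 4, 32), (20, 5, 55), (8, 6, 34)]
alpha, beta = fit_threshold_weights(analyzed)
print(alpha, beta)
```

With more analyzed samples than unknowns the fit averages out labeling noise, which is why a small but varied sample of manually inspected files is enough to predict thresholds for the rest of the library.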
The obfuscation feature cleaning method of the invention is implemented as follows:
1) Malicious code samples are complex: the obfuscation method used by each sample varies, and the distribution of feature values extracted from different samples also differs. The size of the obfuscation value therefore has to be solved dynamically for each sample. The threshold ξ of obfuscated feature values in a malicious code sample is called the obfuscation threshold; ξ is the minimum of the obfuscated feature values in the sample and varies from sample to sample. To measure and characterize this value, two indicators are defined: the feature expected value $Feature_{averages}$ and the feature standard value $Feature_{median}$. Both are solved dynamically for a single sample and describe the feature distribution in that sample. The following function relates the threshold to the expected value and the standard value: $\xi = \alpha \cdot Feature_{averages} + \beta \cdot Feature_{median}$, where α and β are the weights of the feature expected value and the feature standard value, respectively.
2) The feature expected value $Feature_{averages}$ represents the ideal feature value of the sample in its original, unobfuscated state. Summing the feature values in the sample and averaging them gives an ideal feature value for the current sample distribution. When the n-gram algorithm extracts features from most malicious code samples, the samples end up containing a large number of invalid features that occur only once. Therefore, when $Feature_{averages}$ is computed, the feature values in the sample are first deduplicated and then averaged, which eliminates the influence of the large amount of noise data on the mean. Here m is the number of features remaining after deduplication and $feature_i$ is the value of the i-th feature.
Calculation of the feature expected value:
$Feature_{averages} = \frac{1}{m} \sum_{i=1}^{m} feature_i$
3) The feature standard value $Feature_{median}$ reduces the interference of large obfuscated feature values with the final result. It is obtained as the median of all feature values in the sample, and better reflects the ideal feature value of the sample when it is not interfered with. Within a malicious code sample the overall distribution of feature values tends toward a Gaussian distribution, and obfuscated features account for only a very small proportion of it. Obfuscated feature values do affect the feature standard value, but because their proportion in the distribution is low, taking the median of the distribution yields a value very close to the ideal feature value after obfuscation is removed. Here m is the number of features remaining after deduplication, $feature_i$ is the value of the i-th feature, and mid is the function that returns the median of a sequence. Calculation of the feature standard value:
$Feature_{median} = \mathrm{mid}(feature_1, feature_2, \ldots, feature_m)$
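The three steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the median is taken over the deduplicated values (following the definition of m), and the weights `alpha` and `beta` are supplied by the caller, for example from a regression over manually analyzed samples. The sketch only separates features at the threshold; what is done with the flagged features afterwards is left open.

```python
from statistics import mean, median


def obfuscation_threshold(feature_values, alpha, beta):
    """Return xi = alpha * Feature_averages + beta * Feature_median."""
    # Deduplicate the values first so that the many features occurring
    # only once do not dominate the mean.
    distinct = sorted(set(feature_values))
    feature_averages = mean(distinct)
    feature_median = median(distinct)
    return alpha * feature_averages + beta * feature_median


def split_by_threshold(features, alpha, beta):
    """Split a {feature: value} mapping at the sample's own threshold.

    Returns (xi, kept, flagged), where flagged holds features whose value
    exceeds xi and are therefore treated as obfuscated.
    """
    xi = obfuscation_threshold(features.values(), alpha, beta)
    kept = {name: v for name, v in features.items() if v <= xi}
    flagged = {name: v for name, v in features.items() if v > xi}
    return xi, kept, flagged


# Hypothetical n-gram counts for one sample; "e" mimics padding noise.
sample = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 400}
xi, kept, flagged = split_by_threshold(sample, alpha=1.0, beta=1.0)
print(xi, sorted(flagged))
```

Because the threshold is recomputed for every sample, a file with uniformly high counts is not penalized in the way it would be under a fixed cutoff.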
When features are extracted from the malicious code sample set, the obfuscation feature value cleaning method yields a preliminarily cleaned feature library. The obfuscated feature values that interfere strongly with model training have been removed from this library, but training a model directly on it still rarely gives good results. Because the sample set contains variants from many families, the number of features in the library becomes excessively large. Among the features with small values, most are noise data, but some are important family features of the malicious code. These family features occur only a few times, so removing every feature with a small value would inevitably remove some good features and harm the accuracy of the model. The cleaned feature library is therefore cleaned further, so that most of the noise data is eliminated while important malicious code family features are retained.
This is done with a feature selection method based on the size of the input training data set. Its technical solution is as follows:
1) Because malicious code samples are diverse, the value range of the feature vector differs from sample to sample. The same numerical feature value has a different importance in different samples. To eliminate the influence of differing value ranges on the final weighing of features, the method proposes a normalization operation based on proportion. For a single sample, the importance of each feature value is measured by the ratio of that feature value to the sum of all feature values in the sample. $feature_i'$ is the new value of $feature_i$ after normalization. Feature normalization:
$feature_i' = \frac{feature_i}{\sum_{j} feature_j}$
2) In the normalized training feature library, the feature values of a single sample sum to 1. For a total of S input samples, all feature values therefore sum to S. To eliminate the noise data in single samples without destroying the important family features among them, the method proposes a feature selection method based on the number of input samples S and the number of malicious code family classes n in the training set. After the feature vector of every sample in the cleaned feature library has been normalized, every feature that occurs is accumulated over the sample set to obtain its total feature value. Because a malicious code family feature recurs in samples of the same family, accumulation raises its final value. Noise data, by contrast, occurs only in individual samples and has the value 0 in all the others, so its final accumulated value makes up a correspondingly smaller share of the features of the whole sample set. The value of a feature $Feature_i$ is the sum of the values of that feature over all sample files, where $Feature_i$ is the final value of the i-th feature, S is the number of samples in the training set, and $feature_i^{(s)}$ is the value of the feature in sample s.
Feature selection formula:
$Feature_i = \sum_{s=1}^{S} feature_i^{(s)}$
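The normalization and accumulation described in 1) and 2) can be sketched as follows. This is a hedged illustration, not the patented rule: the function names are hypothetical, and the final cutoff `keep` is an assumed stand-in, since the sketch does not attempt to derive the cutoff from S and n.

```python
def normalize_sample(features):
    """Scale one sample's {feature: value} mapping so its values sum to 1."""
    total = sum(features.values())
    if total == 0:
        return {}
    return {name: value / total for name, value in features.items()}


def accumulate_features(samples):
    """Sum each feature's normalized value over all samples."""
    totals = {}
    for features in samples:
        for name, share in normalize_sample(features).items():
            totals[name] = totals.get(name, 0.0) + share
    return totals


def select_features(totals, keep):
    """Keep the `keep` features with the largest accumulated value.

    `keep` is an illustrative parameter chosen by the caller.
    """
    ranked = sorted(totals, key=lambda name: (-totals[name], name))
    return ranked[:keep]


# Hypothetical cleaned samples: "x" recurs across samples like a family feature.
samples = [{"x": 3, "y": 1}, {"x": 1, "z": 1}]
totals = accumulate_features(samples)
print(totals, select_features(totals, keep=2))
```

A feature that recurs across samples of a family accumulates a large total, while a feature seen in one sample only keeps that sample's share, so ranking by the accumulated value separates the two.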
Description of the drawings
Fig. 1: Overall framework of the model
Fig. 2: Disassembled program fragment
Fig. 3: Random forest algorithm model
Fig. 4: Malicious code sample test set
Fig. 5: F2 experimental comparison
Fig. 6: E1 experimental comparison
Detailed description
The invention is explained below with reference to the accompanying drawings.
To make the purpose, technical solution, and features of the invention clearer, the invention is described in further detail below with reference to a specific embodiment and the drawings. The overall framework of the method is shown in Fig. 1. The steps are as follows:
(1) Extract the original malicious code features with the n-gram algorithm and build the initial feature library.
(2) Randomly draw samples and study their obfuscation thresholds. Train a linear fitting equation to predict the obfuscation threshold of unknown samples.
(3) Clean the obfuscated features in the feature library with the obfuscation feature cleaning method.
(4) Normalize the feature library.
(5) Build the training feature library with the feature selection algorithm.
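Step (1) of the pipeline above can be sketched with a sliding window over a token sequence. The choice of disassembled opcodes as tokens, the window size, and the function name `ngram_features` are assumptions for illustration; the sketch only shows how n-gram counts become the feature values that the later steps clean and select.

```python
from collections import Counter


def ngram_features(tokens, n=3):
    """Count every run of n consecutive tokens in the sequence."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Hypothetical opcode sequence taken from a disassembled sample.
opcodes = ["push", "mov", "call", "push", "mov", "call", "ret"]
features = ngram_features(opcodes, n=2)
for gram, count in features.most_common():
    print(gram, count)
```

Inserting junk instructions or repeating a code fragment changes these counts directly, which is why a feature library built this way picks up obfuscated feature values.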
The operating environment of the method is as follows:
The hardware environment is an IBM server (two quad-core Intel Xeon E5420 2.5 GHz/EM64T processors, 12 MB L2 cache) with 32 GB of PC2-5300 CL5 ECC DDR2 667 MHz memory and two 500 GB hard disks. The operating system is 64-bit CentOS 7.0 and the back-end database is MongoDB 2.2.6. Malicious code mapping, blocking, feature extraction, retrieval, and filtering are implemented in Python; the package used is Anaconda-1.8.0-Linux-x86_64, which includes the packages the project relies on. MongoDB stores information about the malicious code samples, such as MD5 value, file size, family label, and malicious code PE block information. The malicious code library used is the one provided by Microsoft. It contains 9 well-known malicious code families. Every sample has a unique ID number and a unique class value from 1 to 9 that corresponds to its family, for a total of 10868 malicious code samples. The malicious code test sample set is described in Fig. 2.
The obfuscation feature cleaning method is the core of the invention for extending the effective lifetime of a feature extraction method. To verify its effectiveness, the invention uses the random forest algorithm to test the advantages of the cleaning method. Because malicious code variants are complex, the final extracted feature library can hardly be cleared of all invalid features. The sample set is therefore sampled with replacement to build multiple training sample sets for model training, which improves the generalization and diversity of the data set and the interference resistance and robustness of the final detection system. The random forest algorithm builds multiple classifiers, and the final detection result is decided by a vote among them. Because the model draws malicious code samples by random sampling with replacement, the sampled set still contains repeated samples and misses others even when its size equals that of the original data. The differences this sampling creates between training sets make maximum use of the training samples and improve the model's generalization to future variants. The random forest model is shown in Fig. 3.
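The two mechanisms described above, sampling with replacement and voting among classifiers, can be sketched with the standard library alone. This is not a random forest implementation: the helper names are hypothetical, and the per-tree predictions are made-up values that stand in for trained classifiers.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement from range(n_samples)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
drawn = bootstrap_indices(10, rng)
missed = sorted(set(range(10)) - set(drawn))
print("drawn:", drawn)
print("never drawn:", missed)

# Hypothetical family labels predicted by five trees for one sample.
print(majority_vote([1, 3, 3, 2, 3]))
```

Each bootstrap draw has the same size as the original set yet typically repeats some indices and misses others, so every tree sees a slightly different training set.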
Because a random forest draws its samples at random, a training feature library that contains many obfuscated features makes the final detection accuracy fluctuate widely over a large number of repeated experiments. For a malicious code detection model, large fluctuations in detection accuracy indicate that the model is unstable and has weaknesses that can be exploited. To examine how different obfuscation value cleaning schemes affect the final training features, the invention tests the fluctuation of accuracy under different schemes; the experimental accuracy and counts for F2, E1, and E2 are shown in Figs. 4, 5, and 6. F2 is the cleaning scheme proposed by the invention, and the other two are obfuscation feature removal methods commonly used in security research. The cleaning schemes are listed in Table 1.
To measure how the detection accuracy of each test scheme in Figs. 4, 5, and 6 fluctuates across data sets, the invention uses the standard deviation of the final model accuracy of each scheme. The standard deviations are listed in Table 2. As Table 2 shows, the overall standard deviation of the F2 scheme is small, and it declines regularly as the input sample set grows. Compared with the other schemes, F2 is already fairly stable with 1000 input samples, and with 5000 input samples its accuracy fluctuates over the smallest range and it is the most stable. Note that the E2 scheme has the smallest standard deviation at 2000 input samples. However, first, this experiment computes a single observation, so some deviation is to be expected. Second, the standard deviation of E2 drops sharply from 1000 to 2000 input samples, but at 5000 the reduction in standard deviation is limited and smaller than that of the F2 scheme. The F2 scheme is therefore the optimal obfuscation value cleaning scheme. It solves the problem that, when a model uses an outdated feature extraction method, the high proportion of obfuscated feature values in the feature library strongly affects the model's final detection result.
Table 1
Test scheme | Test sample set | Feature selection scheme | Cleaning condition | Obfuscation value cleaning method
E1 | 1000, 2000, 5000 | C1 | feature_i > 300 |
E2 | 1000, 2000, 5000 | C1 | feature_i > 500 | feature_i = 0
F1 | 1000, 2000, 5000 | C1 | feature_i > ξ | feature_i
F2 | 1000, 2000, 5000 | C1 | feature_i > ξ |
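The stability comparison between cleaning schemes rests on the standard deviation of detection accuracy over repeated runs, which can be sketched as follows. The scheme names and accuracy values here are invented placeholders, not results from the experiments, and the use of the population standard deviation (`pstdev`) is an assumption.

```python
from statistics import pstdev


def accuracy_spread(accuracies_by_scheme):
    """Map each scheme name to the standard deviation of its accuracies."""
    return {
        scheme: pstdev(values)
        for scheme, values in accuracies_by_scheme.items()
    }


# Placeholder accuracies from repeated runs of two hypothetical schemes.
runs = {
    "scheme_a": [0.90, 0.92, 0.94],
    "scheme_b": [0.80, 0.95, 0.89],
}
spread = accuracy_spread(runs)
most_stable = min(spread, key=spread.get)
print(spread, most_stable)
```

A smaller spread means the scheme's accuracy depends less on which bootstrap samples the forest happened to draw.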
Table 2

Claims (2)

1. A method for cleaning obfuscation features of malicious code, the method comprising a feature selection method and an obfuscation feature cleaning method and improving the effectiveness of traditional malicious code feature extraction methods;
a feature library is first built with the n-gram feature extraction method; because that feature extraction algorithm cannot cope with the obfuscation operations of malicious code, the feature library contains a large number of obfuscated feature values; the obfuscation feature cleaning algorithm eliminates the interference of abnormal data with the model's recognition rules; on this basis, a feature selection method is proposed from the perspective of training data set size; the feature selection method reduces the number of features the model ultimately uses without lowering the model's recognition accuracy;
characterized in that the method is carried out as follows:
1) the obfuscation feature cleaning method is built from multi-sample analysis; a small amount of sample data is analyzed in detail to identify the characteristics of obfuscated features in the samples, and a linear regression model is built;
2) using the obfuscation feature cleaning method, the obfuscation threshold of the feature values in each remaining sample is computed dynamically, and obfuscated features are removed from the feature vectors of the remaining samples in the sample library on the basis of that threshold;
3) the feature selection method is built from the input training set; the feature vectors are first normalized, and the feature values that contribute little to the data set are then removed dynamically according to the number of input training samples;
the specific implementation steps are as follows:
1) malicious code samples are complex: the obfuscation method used by each sample varies, and the distribution of feature values extracted from different samples also differs; the size of the obfuscation value is therefore solved dynamically for each sample; the threshold ξ of obfuscated feature values in a malicious code sample is called the obfuscation threshold, ξ being the minimum of the obfuscated feature values in the sample, and this minimum varies from sample to sample; to measure and characterize this value, two indicators are defined, namely the feature expected value $Feature_{averages}$ and the feature standard value $Feature_{median}$; both are solved dynamically for a single sample and describe the feature distribution in that sample; the following function relates the threshold to the expected value and the standard value: $\xi = \alpha \cdot Feature_{averages} + \beta \cdot Feature_{median}$, where α and β are the weights of the feature expected value and the feature standard value, respectively;
2) the feature expected value $Feature_{averages}$ represents the ideal feature value of the sample in its original, unobfuscated state; summing the feature values in the sample and averaging them gives an ideal feature value for the current sample distribution; when the n-gram algorithm extracts features from most malicious code samples, the samples end up containing a large number of invalid features that occur only once; therefore, when $Feature_{averages}$ is computed, the feature values in the sample are first deduplicated and then averaged; this eliminates the influence of the large amount of noise data on the mean; m is the number of features remaining after deduplication and $feature_i$ is the value of the i-th feature;
calculation of the feature expected value:
$Feature_{averages} = \frac{1}{m} \sum_{i=1}^{m} feature_i$
3) the feature standard value $Feature_{median}$ reduces the interference of large obfuscated feature values with the final result; it is obtained as the median of all feature values in the sample and better reflects the ideal feature value of the sample when it is not interfered with; within a malicious code sample the overall distribution of feature values tends toward a Gaussian distribution, and obfuscated features account for only a very small proportion of it; obfuscated feature values do affect the feature standard value, but because their proportion in the distribution is low, taking the median of the distribution yields a value very close to the ideal feature value after obfuscation is removed; m is the number of features remaining after deduplication, $feature_i$ is the value of the i-th feature, and mid is the function that returns the median of a sequence; calculation of the feature standard value:
$Feature_{median} = \mathrm{mid}(feature_1, feature_2, \ldots, feature_m)$.
2. a kind of malicious code according to claim 1 obscures feature cleaning method, it is characterised in that:To malicious code When sample set carries out feature extraction, obtain obscuring feature database by removing for preliminary treatment using characteristic value cleaning method is obscured;It should The characteristic value of obscuring for generating larger interference in feature database to training pattern has been cleared by, but if is directly based upon this feature library Model training is carried out, it is difficult to obtain good effect;Since malicious code sample is concentrated there are the mutation malicious code of a variety of families, Number of features in feature database can be caused excessively huge;In view of in the smaller feature of these characteristic values, removing most of noise Data also partly belong to family's feature important in malicious code;Only there is less number in these family's features, therefore If the smaller feature of characteristic value all removed, it is dry to the precision generation of model inevitably to remove the good feature in part It disturbs;For that further can obscure feature database to removing and clean, retain while eliminating most noise data important Malicious code family feature;
Realize that specific technical solution is as follows using a kind of feature selection approach based on input training dataset scale:
1) due to the diversity of malicious code sample, the value range of feature vector is also different in each sample;For same The characteristic value of one numerical value, the significance level in different samples are different;In order to eliminate because of value range difference, to most It is influenced caused by when weighing feature eventually;Method proposes a kind of normalizing operations based on accounting;For single sample, pass through The ratio for calculating each characteristic value and characteristic value sum total in single sample, weighs the significance level of each characteristic value in the sample; featurei' represent feature after standardizationiNew value;Characteristic standard algorithm:
2) In the normalized training feature library, the feature values of each single sample sum to 1; for S input samples, all feature values therefore sum to S. To remove the noise data in individual samples without destroying the important family features among them, the method proposes a feature selection method based on the number of input samples S and the number of malicious code family classes n in the training set. After the feature vector of each sample in the de-obfuscated feature library has been normalized, the values of each feature are accumulated over the sample set, giving a total value per feature. Because a malicious code family feature recurs across samples of the same family, accumulation raises its final value. Noise features, by contrast, appear only in individual samples and have the value 0 in the remaining samples, so their accumulated value makes up a correspondingly smaller share of the total over all features. The value of a feature Feature_i is the sum of the values of that feature over all sample files, where Feature_i is the final value of the i-th feature, S is the number of samples in the training set, and feature_i denotes the (normalized) value of that feature in each sample.
Feature selection formula: Feature_i = Σ_{s=1..S} feature_i(s), where feature_i(s) is the value of feature i in sample s.
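The two steps of claim 2 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the feature names, the sample values, and the `threshold` parameter are hypothetical. The claim says that selection depends on the sample count S and the family count n, but the text here does not state how the cutoff is derived from them, so the cutoff is left as a caller-supplied value.

```python
from typing import Dict, List, Set

# One sample: feature name -> raw feature value (names are hypothetical).
Sample = Dict[str, float]


def normalize_sample(sample: Sample) -> Sample:
    """Step 1: divide each feature value by the sum of the sample's values."""
    total = sum(sample.values())
    if total == 0:
        # Assumption: a sample without any feature mass contributes nothing.
        return {name: 0.0 for name in sample}
    return {name: value / total for name, value in sample.items()}


def accumulate_features(samples: List[Sample]) -> Dict[str, float]:
    """Step 2: sum each feature's normalized value over all samples.

    A feature that is absent from a sample counts as 0 for that sample.
    """
    totals: Dict[str, float] = {}
    for sample in samples:
        for name, value in normalize_sample(sample).items():
            totals[name] = totals.get(name, 0.0) + value
    return totals


def select_features(totals: Dict[str, float], threshold: float) -> Set[str]:
    """Keep features whose accumulated value reaches the cutoff.

    `threshold` is a hypothetical parameter; the source does not give
    the rule that derives it from S and n.
    """
    return {name for name, value in totals.items() if value >= threshold}


# Hypothetical data: three samples of one family, one sample of another.
samples: List[Sample] = [
    {"push": 6, "mov": 3, "junk_a": 1},
    {"push": 4, "mov": 4, "junk_b": 2},
    {"push": 5, "mov": 5},
    {"call": 9, "junk_c": 1},
]

totals = accumulate_features(samples)
kept = select_features(totals, threshold=0.5)
print(sorted(kept))
```

In this toy data the recurring features accumulate across samples, while each one-off feature keeps only its single small contribution, which is the separation the claim relies on. The accumulated values also sum to the number of samples, matching the statement that all normalized feature values sum to S.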
CN201810013584.4A 2018-01-08 2018-01-08 A kind of malicious code obscures feature cleaning method Pending CN108287996A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810013584.4A CN108287996A (en) 2018-01-08 2018-01-08 A kind of malicious code obscures feature cleaning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810013584.4A CN108287996A (en) 2018-01-08 2018-01-08 A kind of malicious code obscures feature cleaning method

Publications (1)

Publication Number Publication Date
CN108287996A true CN108287996A (en) 2018-07-17

Family

ID=62835008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810013584.4A Pending CN108287996A (en) 2018-01-08 2018-01-08 A kind of malicious code obscures feature cleaning method

Country Status (1)

Country Link
CN (1) CN108287996A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150113650A1 (en) * 2010-01-27 2015-04-23 Mcafee, Inc. Method and system for proactive detection of malicious shared libraries via a remote reputation system
CN106709349A (en) * 2016-12-15 2017-05-24 中国人民解放军国防科学技术大学 Multi-dimension behavior characteristic-based malicious code classification method
CN107169358A (en) * 2017-05-24 2017-09-15 中国人民解放军信息工程大学 Code homology detection method and its device based on code fingerprint
CN107256358A (en) * 2017-07-04 2017-10-17 北京工业大学 Industrial configuration monitoring software implementation procedure dynamic protection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUEHAN WANG et al.: "A Novel Anti-Obfuscation Model for Detecting Malicious Code", INTERNATIONAL JOURNAL OF OPEN SOURCE SOFTWARE AND PROCESSES *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257354A (en) * 2018-09-25 2019-01-22 平安科技(深圳)有限公司 Abnormal flow analysis method and device, electronic equipment based on model tree algorithm
CN109257354B (en) * 2018-09-25 2021-11-12 平安科技(深圳)有限公司 Abnormal flow analysis method and device based on model tree algorithm and electronic equipment
CN109308413A (en) * 2018-11-28 2019-02-05 杭州复杂美科技有限公司 Feature extracting method, model generating method and malicious code detecting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180717