CN108287996A - Method for cleaning obfuscated features of malicious code - Google Patents
- Publication number
- CN108287996A
- Authority
- CN
- China
- Prior art keywords
- feature
- value
- sample
- malicious code
- characteristic
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for cleaning obfuscated features of malicious code, in the field of machine learning for information security. The method comprises a feature selection method and an obfuscated-feature cleaning method, and improves the effectiveness of traditional malicious code feature extraction methods. Compared with those methods, the invention effectively extends the period during which a malicious code feature extraction algorithm remains effective and improves the algorithm's resistance to interference. The invention first builds a feature library with an n-gram feature extraction method. Because this feature extraction algorithm cannot handle the obfuscation operations applied by malicious code, the feature library contains a large number of obfuscated feature values. The obfuscated-feature cleaning algorithm eliminates the interference of this abnormal data with the rules the model learns. On this basis, a feature selection method is proposed from the perspective of training data set size; it effectively reduces the number of features the model finally uses while ensuring that recognition accuracy does not decline.
Description
Technical field
The present invention relates to a method for removing obfuscated features of malicious code, which can extend the effective lifetime of traditional malicious code feature extraction methods. It belongs to the field of machine learning for information security and involves the combined use of machine learning classification algorithms, obfuscated-feature removal and feature selection algorithms.
Background art
According to Symantec statistics, most newly emerging malicious code is generated from existing malicious code through a number of transformation operations. Malicious code detection is therefore usually based on feature vectors that capture the essential characteristics of the code, and a good feature extraction algorithm is the core technology for detecting malicious code variants. Common antivirus software usually identifies malicious code with signature-based methods. Given a set of malicious code samples, the samples are first labeled as a family; malicious code of the same family should share the same features. These common features are extracted to build a feature library, which is then used to detect variants of that family. However, the security of a detection system built on such a feature library depends on the effectiveness of its feature extraction method, because new variants can be crafted to interfere with earlier feature extraction methods and thereby evade the detection system. For example, against detection systems based on key strings, malicious code evades recognition by substituting equivalent key strings or inserting meaningless strings. To counter the obfuscation that malicious code applies against feature extraction, many researchers have proposed different feature extraction methods that eliminate the impact of obfuscation on detection systems and achieve the best detection results. However, on the one hand these feature extraction methods are gradually broken by malicious code, and on the other hand more secure feature extraction methods bring problems such as excessive computational overhead and poor real-time performance.
Current research on malicious code detection focuses mainly on extracting malicious code feature vectors. To make these feature vectors more resistant to interference, researchers extract features from several perspectives, such as security attributes, dependencies and actual semantics. Kirda et al. detect spyware through the behavioral features of how it obtains and leaks the user's sensitive data. However, this method is limited to spyware and cannot detect other malicious code that does not leak data. Wang Rui et al., starting from the actual semantics of malicious code, construct a semantic behavior feature graph and use it to compute feature values for detection, achieving very good results. However, this method is based only on the program's own behavior and does not consider the program's calls to resources, so certain special variants cannot be identified well. It also has poor timeliness and requires large computing resources, which limits its practicality.
Moreover, as malicious code iterates faster, the effective lifetime of feature extraction methods keeps shrinking, and maintaining system security by replacing feature extraction methods becomes increasingly difficult. How to effectively eliminate obfuscated features from the feature library is therefore a problem of strong practical significance.
Summary of the invention
To address the short effective lifetime of traditional malicious code feature extraction methods, the technical solution of the invention is an obfuscated-feature cleaning algorithm together with a feature selection method. The obfuscated-feature cleaning algorithm removes obfuscated features of malicious code and is aimed at n-gram-based malicious code feature extraction algorithms. By analyzing a small number of malicious code samples, it cleans the features of the remaining malicious code in the sample library. The resulting feature library has a stable number of features and is less susceptible to obfuscation. The feature selection method automatically adjusts the features selected into the feature library according to the training sample set, thereby optimizing the feature library.
The method for cleaning obfuscated features of malicious code comprises a feature selection method and an obfuscated-feature cleaning method, and improves the effectiveness of traditional malicious code feature extraction methods. Compared with those methods, the invention effectively extends the effective lifetime of malicious code feature extraction algorithms and improves their resistance to interference.
The invention first builds a feature library with an n-gram feature extraction method. Because this feature extraction algorithm cannot handle the obfuscation operations of malicious code, the feature library contains a large number of obfuscated feature values. The obfuscated-feature cleaning algorithm eliminates the interference of this abnormal data with the model's recognition rules. On this basis, a feature selection method is proposed from the perspective of training data set size; it effectively reduces the number of features the model finally uses while ensuring that recognition accuracy does not decline.
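The feature library starts from n-gram counts. A minimal sketch of that first step follows, assuming each sample has already been turned into a token sequence (for example the hex bytes of the file or the opcodes of its disassembly; the tokenization is an assumption, not fixed by this text). The helper name `ngram_counts` and the window length `n` are hypothetical. Each distinct n-gram is one feature and its occurrence count is the feature value used by the later cleaning steps.

```python
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in one sample's token sequence.

    Each distinct n-gram is one feature; its count is the feature value.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Zipping n shifted views of the sequence yields each length-n window once.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


# Hypothetical byte tokens: a run of "00" padding followed by a few opcodes.
sample = ["00"] * 10 + ["8B", "FF", "55", "8B", "EC"]
features = ngram_counts(sample, n=2)
print(features["00 00"])  # 9
```

The toy sample shows the situation the cleaning step targets: a run of padding bytes yields one n-gram whose count dwarfs all the others.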
The main technical route of the invention is as follows:
1) Build the obfuscated-feature cleaning method from multi-sample analysis. Through detailed analysis of a small amount of sample data, the method identifies the characteristics of obfuscated features in the samples and builds a linear regression model.
2) Use this obfuscated-feature cleaning method to dynamically compute the threshold of obfuscated feature values for each remaining sample, and use that value to remove obfuscation from the feature vectors of the remaining samples in the sample library.
3) Construct a feature selection method according to the input training set. The method first normalizes the obtained feature vectors and then, according to the number of input training samples, dynamically removes feature values that contribute little to the data set.
The specific implementation steps of the obfuscated-feature cleaning method are as follows:
1) Malicious code samples are complex: the obfuscation methods used by each sample vary dynamically, and the feature value distributions extracted from different samples also differ. The size of the obfuscated values must therefore be solved dynamically for each sample. The threshold ξ of obfuscated feature values in a malicious code sample is called the obfuscation threshold; ξ is the minimum value among the sample's obfuscated feature values and varies from sample to sample. To better measure and characterize this value, two indices are defined: the feature expected value Feature_averages and the feature standard value Feature_median. Both are solved dynamically for each individual sample and describe the feature distribution in that sample. The threshold is related to them by ξ = α · Feature_averages + β · Feature_median, where α and β are the weights of the feature expected value and the feature standard value, respectively.
2) The feature expected value Feature_averages represents the ideal feature value of the sample in its original, unobfuscated state. It is obtained by summing the feature values in the sample and averaging them, which gives the ideal value of a feature under the current sample distribution. Because n-gram feature extraction on most malicious code samples produces a large number of invalid features that occur only once, the feature values in the sample are first deduplicated when Feature_averages is computed, and the average is then taken over the remaining values; this eliminates the influence of large amounts of noise data on the mean. Here m is the number of features remaining after deduplication, and feature_i is the value of the i-th feature.
The feature expected value is calculated as:
$$\text{Feature}_{\text{averages}} = \frac{1}{m}\sum_{i=1}^{m} \text{feature}_i$$
3) The feature standard value Feature_median reduces the interference of large obfuscated feature values with the final result. It is obtained as the median of all feature values in the sample and reflects well the ideal feature value of the sample when undisturbed. In a malicious code sample the overall distribution of feature values tends toward a Gaussian distribution, and obfuscated features account for only a very small proportion of it. Obfuscated values do affect the feature standard value, but because their share of the distribution is low, the median of the distribution lies very close to the range of ideal feature values after obfuscation is removed. Here m is the number of features remaining after deduplication, feature_i is the value of the i-th feature, and the function mid returns the median of a sequence. The feature standard value is calculated as:
$$\text{Feature}_{\text{median}} = \operatorname{mid}(\text{feature}_1, \text{feature}_2, \ldots, \text{feature}_m)$$
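The two indices and the threshold formula can be sketched in Python as below. This is an illustration under stated assumptions, not the patent's code: the function names are hypothetical, both statistics are taken over the distinct feature values of one sample (the text defines m as the count remaining after deduplication), and α and β are supplied by the caller because the method obtains them by linear regression rather than fixing them.

```python
import statistics
from typing import Iterable, List


def _distinct(values: Iterable[float]) -> List[float]:
    """Deduplicate one sample's feature values; m is the length of the result."""
    distinct = sorted(set(values))
    if not distinct:
        raise ValueError("sample has no features")
    return distinct


def feature_expected_value(values: Iterable[float]) -> float:
    """Feature_averages: mean of the distinct feature values of one sample."""
    distinct = _distinct(values)
    return sum(distinct) / len(distinct)


def feature_standard_value(values: Iterable[float]) -> float:
    """Feature_median: median of the distinct feature values of one sample."""
    return statistics.median(_distinct(values))


def obfuscation_threshold(values: Iterable[float], alpha: float, beta: float) -> float:
    """xi = alpha * Feature_averages + beta * Feature_median."""
    vals = list(values)  # materialize so both statistics see the same data
    return alpha * feature_expected_value(vals) + beta * feature_standard_value(vals)


# Counts of one sample's n-grams; the many 1s are single-occurrence noise.
counts = [1, 1, 1, 1, 2, 3, 3, 50, 400]
print(obfuscation_threshold(counts, alpha=0.5, beta=0.5))
```

For these counts the distinct values are {1, 2, 3, 50, 400}, so Feature_averages = 91.2 and Feature_median = 3; with the illustrative weights α = β = 0.5 the threshold is 47.1. Deduplication matters here: averaging all nine raw counts would give about 51.3 instead.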
When features are extracted from the malicious code sample set, the obfuscated-feature cleaning method yields a preliminarily de-obfuscated feature library. The obfuscated feature values that strongly interfere with model training have been removed from it, but training a model directly on this library still gives poor results: because the sample set contains variants from many families, the number of features in the library becomes excessively large. Among the features with small values, besides most of the noise data, there are also important family features of the malicious code. These family features occur only a small number of times, so removing all small-valued features would inevitably remove some good features and degrade the model's accuracy. The de-obfuscated feature library therefore needs further cleaning that eliminates most of the noise data while retaining the important malicious code family features.
A feature selection method based on the size of the input training data set is used for this purpose. Its technical solution is as follows:
1) Because malicious code samples are diverse, the value range of the feature vector differs from sample to sample, so the same numerical feature value has different importance in different samples. To eliminate the influence of differing value ranges when features are finally weighed, the method uses a proportion-based normalization: for a single sample, the importance of each feature value is measured by its ratio to the sum of all feature values in that sample. feature_i' denotes the new value of feature_i after normalization. The feature normalization formula is:
$$\text{feature}_i' = \frac{\text{feature}_i}{\sum_{j} \text{feature}_j}$$
2) In the normalized training feature library, the feature values of a single sample sum to 1, so for S input samples in total, all feature values sum to S. To eliminate noise data in individual samples without destroying important family features, the method proposes a feature selection approach based on the number of input samples S and the number n of malicious code family classes in the training set. After each sample's feature vector in the de-obfuscated feature library is normalized, the values of every feature that appears are accumulated, giving a total value for each feature over the sample set. Because malicious code family features recur in samples of the same family, accumulation increases their final value. Remaining noise data, by contrast, appears only in individual samples and has value 0 in all other samples, so its share of the total, and hence its final accumulated value, stays small. The value of a feature Feature_i is the sum of that feature's values over all sample files, where Feature_i is the final value of the i-th feature, S is the number of training samples, and feature_i is the value of the current feature in each sample.
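The feature selection formula is:
$$\text{Feature}_i = \sum_{s=1}^{S} \text{feature}_i^{(s)}$$
where feature_i^{(s)} is the normalized value of feature_i in sample s.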
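The normalization and accumulation steps can be sketched as follows, assuming each sample is a dict mapping an n-gram to its count. The function names are hypothetical, and the final cut-off `min_total` is a hypothetical parameter: the text says selection depends on S and on the number of families n, but the exact rule is not given here, so none is invented.

```python
from collections import defaultdict
from typing import Dict, List


def normalize(sample: Dict[str, float]) -> Dict[str, float]:
    """feature_i' = feature_i / (sum of all feature values in the sample)."""
    total = sum(sample.values())
    if total == 0:
        return {}
    return {name: value / total for name, value in sample.items()}


def accumulate(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Feature_i = sum over the S samples of feature_i' (0 where absent)."""
    totals: Dict[str, float] = defaultdict(float)
    for sample in samples:
        for name, value in normalize(sample).items():
            totals[name] += value
    return dict(totals)


def select_features(samples: List[Dict[str, float]], min_total: float) -> List[str]:
    """Keep features whose accumulated value is at least min_total.

    min_total is a HYPOTHETICAL cut-off; the patent derives its rule from the
    sample count S and the family count n, which is not reproduced here.
    """
    totals = accumulate(samples)
    return sorted(name for name, value in totals.items() if value >= min_total)


samples = [{"a": 3, "b": 1}, {"a": 1, "c": 1}, {"a": 2, "b": 2}]
print(accumulate(samples))  # {'a': 1.75, 'b': 0.75, 'c': 0.5}
```

The accumulated values sum to S = 3, matching the property stated in step 2). Feature "a", present in every sample, ends up far ahead of "c", which occurs in only one sample, which is the effect the text relies on to separate family features from noise.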
Description of the drawings
Fig. 1: Overall framework of the model
Fig. 2: Disassembled code segment
Fig. 3: Random forest algorithm model
Fig. 4: Malicious code sample test set
Fig. 5: F2 experimental comparison
Fig. 6: E1 experimental comparison
Detailed description of embodiments
The invention is explained and illustrated below with reference to the drawings.
To make the purpose, technical solution and features of the invention clearer, the invention is described in further detail below with reference to a specific embodiment and the drawings. The overall framework of the method is shown in Fig. 1. The steps are as follows:
(1) Extract the original malicious code features with the n-gram algorithm and build the initial feature library.
(2) Randomly draw samples and study their obfuscation thresholds; train a linear fitting equation to predict the obfuscation threshold of unknown samples.
(3) Clean the obfuscated features in the feature library with this obfuscated-feature cleaning method.
(4) Normalize the feature library.
(5) Build the training feature library with this feature selection algorithm.
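Step (2) fits ξ = α · Feature_averages + β · Feature_median on a small set of samples whose thresholds were determined by analysis. One way to do that fit, assumed here rather than taken from the patent, is ordinary least squares without an intercept, which for two coefficients reduces to a 2 × 2 system of normal equations. The helper name `fit_alpha_beta` and the data below are hypothetical.

```python
from typing import Sequence, Tuple


def fit_alpha_beta(avgs: Sequence[float], medians: Sequence[float],
                   xis: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of xi ~ alpha * avg + beta * median, no intercept.

    Solves the normal equations
        [sum a*a  sum a*m] [alpha]   [sum a*xi]
        [sum a*m  sum m*m] [beta ] = [sum m*xi]
    with Cramer's rule.
    """
    if not len(avgs) == len(medians) == len(xis) or len(xis) < 2:
        raise ValueError("need at least two samples with matching lengths")
    saa = sum(a * a for a in avgs)
    smm = sum(m * m for m in medians)
    sam = sum(a * m for a, m in zip(avgs, medians))
    sax = sum(a * x for a, x in zip(avgs, xis))
    smx = sum(m * x for m, x in zip(medians, xis))
    det = saa * smm - sam * sam
    if abs(det) < 1e-12:
        raise ValueError("avg and median columns are collinear")
    alpha = (sax * smm - sam * smx) / det
    beta = (saa * smx - sam * sax) / det
    return alpha, beta


# Hypothetical analyzed samples: (Feature_averages, Feature_median, observed xi).
avgs, medians = [10.0, 20.0, 30.0, 15.0], [2.0, 3.0, 10.0, 7.0]
xis = [0.3 * a + 0.7 * m for a, m in zip(avgs, medians)]
print(fit_alpha_beta(avgs, medians, xis))  # close to (0.3, 0.7)
```

The fitted α and β are then reused to compute ξ for every remaining sample, so only the small analyzed subset needs manual threshold determination.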
The running environment of the method is as follows:
The hardware is an IBM server (two quad-core Intel Xeon E5420 processors at 2.5 GHz/EM64T, 12 MB L2 cache) with 32 GB of PC2-5300 CL5 ECC DDR2-667 MHz memory and two 500 GB hard disks. The operating system is CentOS 7.0 (64-bit), and the back-end database is MongoDB 2.2.6. Malicious code mapping, partitioning, feature extraction, retrieval and filtering are implemented in Python, using the Anaconda-1.8.0-Linux-x86_64 distribution, which includes the packages the project requires. MongoDB stores information about each malicious code sample, such as its MD5 value, file size, family label and PE block information. The data set is the malicious code library provided by Microsoft. It contains 9 well-known malicious code families and 10868 samples in total; each sample has a unique ID and a single class label from 1 to 9 identifying its family. The malicious code test sample set is described in Fig. 2.
The obfuscated-feature cleaning method is the core of the invention's approach to extending the effective lifetime of feature extraction methods. To verify its effectiveness, the invention uses the random forest algorithm to demonstrate the advantages of the method. Because malicious code variants are complex, even the final feature library can hardly be rid of all invalid features. The sample set is therefore resampled with replacement to build multiple training sets for model training, which improves the generalization and diversity of the data and increases the robustness and interference resistance of the final detection system. The random forest algorithm builds multiple classifiers, and the final detection result is decided by voting among them. Because each training set is drawn at random with replacement, some samples appear repeatedly and others are missed even when the resampled set is the same size as the original data. The variation this sampling introduces makes the fullest use of the training set and improves the model's generalization to future variants. The random forest model is shown in Fig. 3.
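The point about sampling with replacement can be checked numerically. The sketch below uses only the standard library; the helper name `bootstrap` is hypothetical. It draws as many indices as there are samples (here the 10868 samples of the data set) and reports the fraction of original samples that were never drawn. It illustrates the resampling step of a random forest, not the full classifier.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap(indices: Sequence[int], rng: random.Random) -> Tuple[List[int], float]:
    """Draw len(indices) items with replacement, as one random-forest tree would.

    Returns the resampled list and the fraction of original items never drawn.
    """
    drawn = rng.choices(indices, k=len(indices))
    missed = 1 - len(set(drawn)) / len(indices)
    return drawn, missed


rng = random.Random(0)  # fixed seed so the run is repeatable
drawn, missed = bootstrap(range(10868), rng)
print(len(drawn), f"{missed:.3f}")
```

For large sets the missed fraction approaches 1 − 1/e ≈ 0.368, because each item escapes all n draws with probability (1 − 1/n)^n. That is why each tree sees a noticeably different training set even though every resampled set has the original size.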
Because random forests sample at random, when many repeated experiments are run, a training feature library containing many obfuscated features makes the final detection accuracy fluctuate strongly. For a malicious code detection model, large fluctuations in accuracy indicate that the model is unstable and has exploitable weaknesses. To measure the influence of different obfuscated-value cleaning schemes on the final training features, the invention tests how accuracy fluctuates under each scheme; the accuracies and numbers of experiments for F2, E1 and E2 are shown in Figs. 4, 5 and 6. F2 is the obfuscation cleaning scheme proposed by the invention; the other two are obfuscated-feature removal methods commonly used in security research. The cleaning schemes are listed in Table 1.
To measure how the detection accuracy of each test scheme fluctuates across the data sets in Figs. 4, 5 and 6, the invention uses the standard deviation of the final model accuracy. The standard deviation of each scheme is given in Table 2. As Table 2 shows, the overall standard deviation of the F2 scheme is small and decreases steadily as the input sample set grows. Compared with the other schemes, F2 is already fairly stable with 1000 input samples, and at 5000 input samples its accuracy fluctuation range is the smallest and most stable. Note that at 2000 input samples the E2 scheme has the lowest standard deviation. However, this is a single observation and therefore carries some bias. In addition, although the standard deviation of E2 at 2000 samples is much lower than at 1000, at 5000 samples it decreases only slightly, by less than that of the F2 scheme. F2 is therefore the best obfuscated-value cleaning scheme. It solves the problem that, because the model relies on a feature extraction method that has become ineffective, obfuscated feature values make up a large share of the feature library and strongly affect the final detection result.
Table 1
Test scheme | Test sample sets | Feature selection scheme | Cleaning condition | Obfuscated-value cleaning method |
---|---|---|---|---|
E1 | 1000, 2000, 5000 | C1 | feature_i > 300 | |
E2 | 1000, 2000, 5000 | C1 | feature_i > 500 | feature_i = 0 |
F1 | 1000, 2000, 5000 | C1 | feature_i > ξ | feature_i = ξ |
F2 | 1000, 2000, 5000 | C1 | feature_i > ξ | |
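The cleaning rules in Table 1 can be expressed as one function. This is a sketch under assumptions: each sample is a dict from feature to value, `threshold` is either a fixed value (300 or 500 for E1/E2) or the per-sample ξ (F1/F2), and because Table 1 leaves the cleaning method blank for E1 and F2, dropping the feature is assumed for those rows, following the text's description of obfuscated features being removed. The function name `clean` and its `mode` values are hypothetical.

```python
from typing import Dict


def clean(sample: Dict[str, float], threshold: float, mode: str = "drop") -> Dict[str, float]:
    """Treat feature values above `threshold` as obfuscated and clean them.

    mode="drop": remove the feature (assumed for the blank E1/F2 cells)
    mode="clip": set it to the threshold (F1: feature_i = xi)
    mode="zero": set it to 0 (E2: feature_i = 0)
    """
    if mode == "drop":
        return {k: v for k, v in sample.items() if v <= threshold}
    if mode == "clip":
        return {k: min(v, threshold) for k, v in sample.items()}
    if mode == "zero":
        return {k: (0 if v > threshold else v) for k, v in sample.items()}
    raise ValueError(f"unknown mode: {mode!r}")


sample = {"00 00": 900, "8B FF": 4, "FF 55": 60}
print(clean(sample, 47.1))          # {'8B FF': 4}
print(clean(sample, 47.1, "clip"))  # {'00 00': 47.1, '8B FF': 4, 'FF 55': 47.1}
```

The difference between the rows matters downstream: clipping (F1) keeps the feature in the library at a capped value, while dropping removes it before normalization, so the remaining features' normalized shares grow.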
Table 2
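The stability comparison rests on the standard deviation of final model accuracy across repeated runs of each scheme. A minimal sketch follows; it assumes the sample standard deviation (the text does not say which variant is used), and the helper names `accuracy_spread` and `most_stable` are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def accuracy_spread(accuracies: Sequence[float]) -> float:
    """Sample standard deviation of one scheme's accuracies across repeated runs."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs")
    return statistics.stdev(accuracies)


def most_stable(runs_by_scheme: Dict[str, Sequence[float]]) -> str:
    """Name of the scheme whose accuracy fluctuates least."""
    return min(runs_by_scheme, key=lambda name: accuracy_spread(runs_by_scheme[name]))
```

Given each scheme's list of accuracies for one input-set size, `most_stable` returns the scheme with the smallest spread; repeating this per input-set size reproduces the kind of comparison the text draws between E1, E2 and F2.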
Claims (2)
1. A method for cleaning obfuscated features of malicious code, the method comprising a feature selection method and an obfuscated-feature cleaning method and improving the effectiveness of traditional malicious code feature extraction methods;
a feature library is first built by an n-gram feature extraction method; because this feature extraction algorithm cannot handle the obfuscation operations of malicious code, the feature library contains a large number of obfuscated malicious code feature values; the obfuscated-feature cleaning algorithm eliminates the interference of this abnormal data with the model's recognition rules; on this basis, a feature selection method is proposed from the perspective of training data set size, which effectively reduces the number of features the model finally uses while ensuring that recognition accuracy does not decline;
characterized in that the implementation flow of the method is as follows:
1) the obfuscated-feature cleaning method is built from multi-sample analysis; through detailed analysis of a small amount of sample data, the characteristics of obfuscated features in the samples are identified and a linear regression model is built;
2) using this obfuscated-feature cleaning method, the threshold of obfuscated feature values in each remaining sample is computed dynamically, and obfuscation is removed from the feature vectors of the remaining samples in the sample library based on this value;
3) a feature selection method is constructed according to the input training set; the obtained feature vectors are first normalized, and feature values that contribute little to the data set are removed dynamically according to the number of input training samples;
the specific implementation steps are as follows:
1) because malicious code samples are complex, the obfuscation methods used by each sample vary dynamically, and the feature value distributions extracted from different samples also differ; the size of the obfuscated values must therefore be solved dynamically for each sample; the threshold ξ of obfuscated feature values in each malicious code sample is called the obfuscation threshold; ξ is the minimum value among the sample's obfuscated feature values and varies from sample to sample; to better measure and characterize this value, two indices are defined, the feature expected value Feature_averages and the feature standard value Feature_median; both are solved dynamically for each individual sample and describe the feature distribution in that sample; the threshold is related to them by ξ = α · Feature_averages + β · Feature_median, where α and β are the weights of the feature expected value and the feature standard value, respectively;
2) the feature expected value Feature_averages represents the ideal feature value of the sample in its original, unobfuscated state; it is obtained by summing the feature values in the sample and averaging them, which gives the ideal value of a feature under the current sample distribution; because n-gram feature extraction on most malicious code samples produces a large number of invalid features that occur only once, the feature values in the sample are first deduplicated when Feature_averages is computed, and the average is then taken; this eliminates the influence of large amounts of noise data on the mean; m is the number of features remaining after deduplication, and feature_i is the value of the i-th feature;
the feature expected value is calculated as:
$$\text{Feature}_{\text{averages}} = \frac{1}{m}\sum_{i=1}^{m} \text{feature}_i$$
3) the feature standard value Feature_median is used to reduce the interference of large obfuscated feature values with the final result; it is obtained as the median of all feature values in the sample and reflects well the ideal feature value of the sample when undisturbed; because the overall feature value distribution of a malicious code sample tends toward a Gaussian distribution, obfuscated features account for only a very small proportion of it; although obfuscated feature values also affect the feature standard value, their share of the distribution is low, so the median of the distribution lies very close to the range of ideal feature values after obfuscation is removed; m is the number of features remaining after deduplication, feature_i is the value of the i-th feature, and the function mid returns the median of a sequence; the feature standard value is calculated as:
$$\text{Feature}_{\text{median}} = \operatorname{mid}(\text{feature}_1, \text{feature}_2, \ldots, \text{feature}_m)$$
2. The method for cleaning obfuscated features of malicious code according to claim 1, characterized in that: when features are extracted from the malicious code sample set, the obfuscated-feature cleaning method yields a preliminarily de-obfuscated feature library; the obfuscated feature values that strongly interfere with model training have been removed from this library, but training a model directly on it is unlikely to give good results; because the sample set contains variants from many families, the number of features in the library becomes excessively large; among the features with small values, besides most of the noise data, there are also important family features of the malicious code; these family features occur only a small number of times, so removing all small-valued features would inevitably remove some good features and degrade the model's accuracy; to clean the de-obfuscated feature library further, important malicious code family features are retained while most of the noise data is eliminated;
a feature selection method based on the size of the input training data set is used, and its specific technical solution is as follows:
1) because malicious code samples are diverse, the value range of the feature vector differs from sample to sample, and the same numerical feature value has different importance in different samples; to eliminate the influence of differing value ranges when features are finally weighed, a proportion-based normalization is used; for a single sample, the importance of each feature value is measured by its ratio to the sum of all feature values in that sample; feature_i' denotes the new value of feature_i after normalization; the feature normalization formula is:
$$\text{feature}_i' = \frac{\text{feature}_i}{\sum_{j} \text{feature}_j}$$
2) in the normalized training feature library, the feature values of a single sample sum to 1, so for S input samples in total, all feature values sum to S; to eliminate noise data in individual samples without destroying important family features, a feature selection method based on the number of input samples S and the number n of malicious code family classes in the training set is proposed; after each sample's feature vector in the de-obfuscated feature library is normalized, the values of every feature that appears are accumulated, giving a total value for each feature over the sample set; because malicious code family features recur in samples of the same family, accumulation increases their final value; remaining noise data appears only in individual samples and has value 0 in all other samples, so its share among all sample features, and hence its final accumulated value, is correspondingly small; the value of a feature Feature_i is the sum of that feature's values over all sample files, where Feature_i is the final value of the i-th feature, S is the number of training samples, and feature_i is the value of the current feature in each sample;
Feature Selection formula:
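A minimal Python sketch of the accumulation in step 2 follows. It assumes each sample is a dict that maps a feature name to a non-negative raw value. The function names `normalize_sample` and `accumulate_features` are hypothetical. The sketch only computes each Feature_i and its share Feature_i / S. The text does not spell out how S and the family count n turn these totals into a keep-or-drop decision, so no selection threshold is implemented here.

```python
from collections import defaultdict
from typing import Dict, List


def normalize_sample(sample: Dict[str, float]) -> Dict[str, float]:
    """Step 1: divide each feature value by the sample's total."""
    total = sum(sample.values())
    if total == 0:
        # Assumption: an all-zero sample contributes nothing.
        return {name: 0.0 for name in sample}
    return {name: value / total for name, value in sample.items()}


def accumulate_features(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Step 2: Feature_i = sum over all S samples of the normalized feature_i.

    A feature absent from a sample contributes 0 for that sample.
    """
    totals: Dict[str, float] = defaultdict(float)
    for sample in samples:
        for name, value in normalize_sample(sample).items():
            totals[name] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical training set: three samples of one family share
    # "fam_x"; each also has one sample-specific noise feature.
    samples = [
        {"fam_x": 2, "noise_1": 2},
        {"fam_x": 3, "noise_2": 1},
        {"fam_x": 1, "noise_3": 1},
    ]
    totals = accumulate_features(samples)
    S = len(samples)
    # Highest accumulated value first; share = Feature_i / S.
    for name, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {value:.3f} (share {value / S:.3f})")
```

In this toy data the shared family feature accumulates 1.75. Each sample-specific noise feature stays at 0.5 or below. All totals add up to S = 3, which matches the property stated in step 2.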
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810013584.4A CN108287996A (en) | 2018-01-08 | 2018-01-08 | A kind of malicious code obscures feature cleaning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108287996A (en) | 2018-07-17 |
Family ID: 62835008
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810013584.4A Pending CN108287996A (en) | 2018-01-08 | 2018-01-08 | A kind of malicious code obscures feature cleaning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108287996A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150113650A1 (en) * | 2010-01-27 | 2015-04-23 | Mcafee, Inc. | Method and system for proactive detection of malicious shared libraries via a remote reputation system |
CN106709349A (en) * | 2016-12-15 | 2017-05-24 | 中国人民解放军国防科学技术大学 | Multi-dimension behavior characteristic-based malicious code classification method |
CN107169358A (en) * | 2017-05-24 | 2017-09-15 | 中国人民解放军信息工程大学 | Code homology detection method and its device based on code fingerprint |
CN107256358A (en) * | 2017-07-04 | 2017-10-17 | 北京工业大学 | Industrial configuration monitoring software implementation procedure dynamic protection method |
Non-Patent Citations (1)
Title |
---|
YUEHAN WANG et al.: "A Novel Anti-Obfuscation Model for Detecting Malicious Code", International Journal of Open Source Software and Processes * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109257354A (en) * | 2018-09-25 | 2019-01-22 | 平安科技(深圳)有限公司 | Abnormal flow analysis method and device, electronic equipment based on model tree algorithm |
CN109257354B (en) * | 2018-09-25 | 2021-11-12 | 平安科技(深圳)有限公司 | Abnormal flow analysis method and device based on model tree algorithm and electronic equipment |
CN109308413A (en) * | 2018-11-28 | 2019-02-05 | 杭州复杂美科技有限公司 | Feature extracting method, model generating method and malicious code detecting method |
Similar Documents
Publication | Title |
---|---|
CN105915555B (en) | Method and system for detecting network abnormal behavior |
CN111639497B (en) | Abnormal behavior discovery method based on big data machine learning |
CN106096411B (en) | A kind of Android malicious code family classification methods based on bytecode image clustering |
CN107493277B (en) | Large data platform online anomaly detection method based on maximum information coefficient |
Liu et al. | A robust malware detection system using deep learning on API calls |
CN110415107B (en) | Data processing method, data processing device, storage medium and electronic equipment |
CN107807987A (en) | A kind of string sort method, system and a kind of string sort equipment |
CN109711163B (en) | Android malicious software detection method based on API (application program interface) calling sequence |
CN109284626A (en) | Random forests algorithm towards difference secret protection |
CN110222058A (en) | Multi-source data based on FP-growth is associated with privacy leakage risk evaluating system |
CN108170467B (en) | Constraint limited clustering and information measurement software memorial feature selection method and computer |
CN111639337A (en) | Unknown malicious code detection method and system for massive Windows software |
Zhao et al. | A simple and effective outlier detection algorithm for categorical data |
CN108491228A (en) | A kind of binary vulnerability Code Clones detection method and system |
CN115620812B (en) | Resampling-based feature selection method and device, electronic equipment and storage medium |
bin Asad et al. | Analysis of malware prediction based on infection rate using machine learning techniques |
CN108287996A (en) | A kind of malicious code obscures feature cleaning method |
CN114036531A (en) | Multi-scale code measurement-based software security vulnerability detection method |
CN109240807A (en) | A kind of malicious program detection system and method based on VMI |
Hruby | Using similarity measures in benthic impact assessments |
CN106330861A (en) | Website detection method and apparatus |
CN115471258A (en) | Violation behavior detection method and device, electronic equipment and storage medium |
CN115242431A (en) | Industrial Internet of things data anomaly detection method based on random forest and long-short term memory network |
CN109739840A (en) | Data processing empty value method, apparatus and terminal device |
CN114553468A (en) | Three-level network intrusion detection method based on feature intersection and ensemble learning |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180717 |