CN111783093A - Malicious software classification and detection method based on soft dependence - Google Patents

Malicious software classification and detection method based on soft dependence

Publication number
CN111783093A
Authority
CN
China
Prior art keywords
malware
value
malicious software
family
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010595193.5A
Other languages
Chinese (zh)
Inventor
刘哲
张永超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202010595193.5A priority Critical patent/CN111783093A/en
Publication of CN111783093A publication Critical patent/CN111783093A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Virology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a soft-dependence-based method for classifying and detecting malware, intended to reduce misclassification during malware classification. The method divides the extracted malware features into several feature subsets and trains one model per feature subset; each model then classifies new malware and produces its own classification result; finally, the results of the models are combined using the s-value (a soft dependence value proposed on the basis of the hybrid distance criterion in pattern recognition), and the combined result is taken as the final classification result.

Description

Malicious software classification and detection method based on soft dependence
Technical Field
The invention relates to a malware classification and detection method based on soft dependence, and belongs to the field of information security.
Background
With the spread and development of computers, malware has grown as well. Because of the sheer volume of malware, conventional antivirus software struggles to identify newly emerging samples. For example, Worm.WhBoy.cw and ransomware appeared at the end of 2006 and in 2017, respectively. The former is a worm variant; the latter is a new kind of virus that is combined with a worm to widen its range of infection. Such viruses are difficult to detect with conventional antivirus software. Statistical analysis indicates that, under these conditions, malware will grow explosively, and users will inevitably be attacked by malware against which conventional antivirus software is ineffective. A more effective method is therefore needed. Conventional antivirus software identifies malware with signature-based methods, which require a large local database of malicious signatures. This approach is easily evaded by malware that achieves polymorphism through encryption, obfuscation, and packing, and it performs poorly against new malware variants.
For these reasons, methods based on machine learning and deep learning have attracted attention from both academia and industry. Such methods can compensate for the shortcomings of traditional malware classification and identification and can achieve good recognition accuracy on malware variants. It is only necessary to extract features from the malware, train with a machine learning/deep learning method to produce a model, and then use that model to make predictions for new malware. Some methods do not even require feature extraction and only need the malware converted into a suitable format; for example, once a malware binary has been converted into a grayscale image, a CNN can be trained on it directly, which is more convenient still.
However, machine learning/deep learning based approaches bring challenges along with their advantages. Because these methods train a model from samples, the first problem is over-fitting and under-fitting: both degrade model performance and generalization, making the model unsuitable for large-scale use. Malware feature extraction is another significant problem. Extracting features requires specialized feature engineering knowledge, which is difficult for anyone unfamiliar with feature engineering. Moreover, it is not known in advance whether the extracted features are too high-dimensional, whether they are useful for training the model, or whether more powerful features remain undiscovered. Although some malware can be classified and identified after conversion into another form with specialized deep learning methods, such cases are a minority, and feature extraction cannot be avoided in general. The choice of machine learning/deep learning method also needs consideration; for example, some deep learning methods are not suited to malware classification and detection tasks. In addition, although models trained with some methods achieve good accuracy, their training and prediction times are long, so accuracy is generally bought at the cost of time. On the feature side, features may be processed with Principal Component Analysis (PCA). Although academia has proposed many methods for classifying or detecting malware, each has its shortcomings. For example, feature-based methods rely too heavily on the features of the training samples, which may cause the model to over-fit.
Furthermore, if the model is asked to classify a new type of malware that does not belong to any family seen in training, misclassification is highly likely. These problems are likely to degrade model performance. Meanwhile, no good balance has been struck among model accuracy, training time, and prediction time, and time is often traded for accuracy. Current research generally pays little attention to these problems.
Disclosure of Invention
To address the low recognition accuracy of existing methods on some malware variants and the poor balance among recognition accuracy, training time, and prediction time, a soft-dependence-based malware classification and detection method is provided. The invention can effectively detect some malware variants and strikes a good balance among recognition accuracy, training time, and prediction time, reducing training time and prediction time while maintaining recognition accuracy.
The invention adopts the following technical scheme to solve the above technical problems:
A method for malware classification and detection based on soft dependence, comprising the following steps:
Step one: perform feature extraction on all malware samples in the malware sample training set, and divide the extracted features into n feature subsets $F_0, F_1, \dots, F_{n-1}$. The malware sample training set is $train = \{D_0, D_1, \dots, D_{k-1}\}$, where $D_i$ denotes the i-th malware family in the training set, $0 \le i \le k-1$, and k is the number of malware families in the training set. The j-th feature subset is $F_j = \{Df_0, Df_1, \dots, Df_{k-1}\}$, where $Df_i$ denotes the features, within that feature subset, of all malware samples in the i-th malware family, and $0 \le j \le n-1$;
Step two: train by machine learning on each of the n feature subsets $F_0, F_1, \dots, F_{n-1}$ from step one, generating n models $M_0, M_1, \dots, M_{n-1}$;
Step three: use the n models $M_0, M_1, \dots, M_{n-1}$ from step two to classify the malware to be detected, obtaining n classification results $P_0, P_1, \dots, P_{n-1}$;
Step four: calculate the hybrid distance criterion value s-value for each feature subset;
Step five: take the s-values of the feature subsets from step four, $s\text{-value}_0, s\text{-value}_1, \dots, s\text{-value}_{n-1}$, as the weights of the corresponding classification results from step three, and obtain the classification result of the malware to be detected: $P = \|s\text{-value}_0\| \cdot P_0 + \|s\text{-value}_1\| \cdot P_1 + \dots + \|s\text{-value}_{n-1}\| \cdot P_{n-1}$.
Further, the hybrid distance criterion value $s\text{-value}_j$ of the j-th feature subset $F_j$ in step four is:
$s\text{-value}_j = EC_j \alpha + EN_j \beta - EI_j \gamma$
where $EC_j = (ec_{pq})_{k \times k}$ is the criterion value of the distance between malware family centroids in $F_j$, and $ec_{pq}$ is the distance between the centroids of the p-th and q-th malware families in $F_j$; $EN_j = (en_{ab})_{k \times k}$ is the minimum distance criterion value between malware families in $F_j$, and $en_{ab}$ is the shortest distance between the a-th and b-th malware families in $F_j$; $EI_j = (ei_{ab})_{k \times 2}$ is the inter-class distance criterion value within malware families in $F_j$, where $ei_{a1}$ is the maximum distance between malware sample features within the a-th malware family in $F_j$ and $ei_{a2}$ is the sum of the distances between malware sample features within the a-th malware family in $F_j$; α, β and γ are the weights of EC, EN and EI, respectively.
Further, the centroid of a malware family is
Figure BDA0002557215290000031
where $\alpha_u$ is the feature of the u-th malware sample in the malware family.
Further, α, β and γ are found by minimizing a loss function during the machine learning in step two, the loss function being
Figure BDA0002557215290000032
where $y_v$ is the label of the v-th malware sample and N is the number of malware samples in the malware sample training set.
Compared with the prior art, the invention with the above technical scheme has the following technical effects:
The invention provides a method that identifies malware variants and seeks a balance among recognition accuracy, training time, and prediction time, thereby alleviating both the low recognition accuracy on malware variants and the practice of trading time for recognition accuracy. In addition, the amount of malware is growing exponentially, and most of it evolves from known malware; the method can identify some of these variants, and for variants it cannot identify, a new malware recognition model can be trained quickly while accuracy is maintained. The invention can thus mitigate the impact of the rapid growth of malware on computer security.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The technical scheme of the invention is explained in further detail below with reference to the drawings:
The invention provides a soft-dependence-based malware classification and detection method that addresses the low recognition accuracy on malware variants and the balance among recognition accuracy, training time, and prediction time. Its aims are to reduce training time and prediction time while maintaining recognition accuracy, to find a balance among the three, and to identify some malware variants.
The classification and detection of malware are divided into two stages: a model training stage and a stage in which the s-value generates the final result.
Before these two stages are described, the parameters used in the invention are introduced. First, the malware sample training set is denoted $train = \{D_0, D_1, \dots, D_{k-1}\}$, where $D_i$ denotes the i-th malware family in the training set (malware of the same type is called a malware family), $0 \le i \le k-1$, k is the number of malware families in the training set, and y denotes the labels of all malware samples in the training set. During feature extraction, features are extracted from all malware samples in the training set, yielding for each sample its ID (generally the hash value of the malware's name), the malware family it belongs to, and the file in which its features are stored. The features of all malware samples are divided into n feature subsets $F_0, F_1, \dots, F_{n-1}$. When the subsets are formed, strongly related features of a malware sample are placed in the same subset; for example, a sample's size, compressed size, compression ratio, and similar characteristics can be placed in the same feature subset. Typically, the chi-square test is computed on the values of a given feature column over all samples and j (the index of the feature subset, $0 \le j \le n-1$) to measure the similarity between features, and the first k features after sorting in ascending order are regarded as the strongly related ones. The j-th feature subset can thus be written $F_j = \{Df_0, Df_1, \dots, Df_{k-1}\}$, where $Df_i$ denotes the features, within that feature subset, of all malware samples in the i-th malware family.
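The division into feature subsets can be pictured as slicing the columns of a feature table. The following is a minimal sketch only: the column names, the grouping, and the helper `split_into_subsets` are hypothetical, and the grouping is supplied by hand rather than derived from the chi-square test mentioned above.

```python
from typing import Dict, List, Sequence


def split_into_subsets(
    samples: Sequence[Dict[str, float]],
    groups: Sequence[Sequence[str]],
) -> List[List[List[float]]]:
    """Return one feature matrix per column group; row order follows `samples`."""
    return [
        [[sample[name] for name in group] for sample in samples]
        for group in groups
    ]


# Hypothetical samples: each dict holds the extracted features of one sample.
samples = [
    {"size": 120.0, "compressed_size": 60.0, "compression_ratio": 0.5, "entropy": 7.1},
    {"size": 300.0, "compressed_size": 90.0, "compression_ratio": 0.3, "entropy": 6.2},
]

# Hypothetical grouping: related size features share subset F_0.
groups = [["size", "compressed_size", "compression_ratio"], ["entropy"]]

subsets = split_into_subsets(samples, groups)  # subsets[j] plays the role of F_j
```

Each `subsets[j]` keeps one row per sample, so the family labels can be reused unchanged for every subset.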
The feature vector of a malware sample s, belonging to some malware family, within a given feature subset is denoted $sample_s = \{f_0, f_1, \dots, f_{m-1}\}$, where $f_x$ is a feature value of sample s, m is the number of feature values, and $0 \le x \le m-1$.
The model training stage and the s-value final result generation stage are described below.
1) Model training phase
In this stage, the feature subsets are extracted from the malware samples, a machine learning/deep learning algorithm (such as XGBoost) is then used to train the corresponding sub-model on each feature subset, and finally the sub-models generate prediction results, denoted $P_0, P_1, \dots, P_{n-1}$.
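The training stage can be sketched as one model per feature subset. To keep the example self-contained, a nearest-centroid classifier is swapped in for XGBoost; this stand-in, the function name `train_nearest_centroid`, and the toy data are assumptions for illustration, not the learner named in the text.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def train_nearest_centroid(
    X: Sequence[Vector], y: Sequence[int], k: int
) -> Callable[[Vector], List[float]]:
    """Fit a nearest-centroid model (a stand-in for XGBoost).

    Assumes every family 0..k-1 has at least one training sample.
    """
    dims = len(X[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for row, label in zip(X, y):
        counts[label] += 1
        for d, value in enumerate(row):
            sums[label][d] += value
    centroids = [[s / counts[c] for s in sums[c]] for c in range(k)]

    def predict(x: Vector) -> List[float]:
        # One-hot vector over the k families, playing the role of P_j.
        distances = [math.dist(x, c) for c in centroids]
        best = distances.index(min(distances))
        return [1.0 if c == best else 0.0 for c in range(k)]

    return predict


# Two hypothetical feature subsets over the same four samples.
subset_0 = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
subset_1 = [[1.0], [1.2], [9.0], [9.1]]
labels = [0, 0, 1, 1]

models = [train_nearest_centroid(X, labels, k=2) for X in (subset_0, subset_1)]
predictions = [models[0]([0.1, 0.1]), models[1]([8.8])]
```

Any classifier that returns a per-family score vector could take the place of `train_nearest_centroid`.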
2) s-value final result generation phase
This stage uses the s-value to turn the prediction results generated in 1) into the final malware classification result. The s-value is proposed on the basis of the hybrid distance criterion in pattern recognition, and its formula is as follows:
$s\text{-value} = EC\alpha + EN\beta - EI\gamma$
where s-value is the hybrid distance criterion value, EC is the centroid distance criterion value, EN is the minimum distance criterion value between malware families, EI is the inter-class distance criterion value within a malware family, and α, β, and γ are the weights of these three criterion values, used to adjust the importance of each criterion.
Step one: solve for EC.
$EC_j$ denotes the criterion value of the distance between malware family centroids in feature subset $F_j$. The larger this value, the greater the distance between the centroids of any two malware families in $F_j$, and the better the families are separated. To obtain EC, the centroid of each malware family in $F_j$ is first calculated from the feature vectors $Df_i$ of all malware samples in that family, using the following formula:
Figure BDA0002557215290000051
where U is the number of malware samples in the malware family and $\alpha_u$ is the feature vector of the u-th malware sample in the family.
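The equation image itself is not reproduced in this text. Assuming the usual definition of a centroid as the arithmetic mean of the family's feature vectors (an assumption, since the original figure is unavailable, and the summation range $u = 1, \dots, U$ is likewise assumed), it would read:

```latex
\mathrm{centroid} = \frac{1}{U} \sum_{u=1}^{U} \alpha_u
```

Here $\alpha_u$ is a feature vector and is unrelated to the weight α used in the s-value formula.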
Finally, the centroid of each malware family in feature subset $F_j$ is calculated with the centroid formula, and the distance between the centroids of every pair of malware families is then computed. The result is a k × k matrix, denoted EC, whose element $ec_{pq}$ is the distance between the centroids of the p-th and q-th malware families in $F_j$; the main diagonal elements of the matrix are all 0.
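The EC computation can be sketched as follows. The text does not name a distance metric, so Euclidean distance is assumed here, and the centroid is taken as the per-dimension mean; the function names and toy families are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Per-dimension mean of a family's feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def centroid_distance_matrix(families: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """k x k matrix of Euclidean distances between family centroids."""
    centroids = [centroid(family) for family in families]
    return [[math.dist(p, q) for q in centroids] for p in centroids]


# Two hypothetical families inside one feature subset F_j.
families = [
    [[0.0, 0.0], [2.0, 0.0]],  # centroid (1, 0)
    [[4.0, 4.0], [4.0, 4.0]],  # centroid (4, 4)
]
EC = centroid_distance_matrix(families)
```

The matrix is symmetric with a zero diagonal, matching the description of EC above.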
Step two: solve for EN.
$EN_j$ denotes the shortest distances between pairs of malware families in feature subset $F_j$. That is, $F_j$ is traversed to find the distance between every feature vector in the a-th malware family and every feature vector in the b-th malware family, and the minimum of these distances is taken as the shortest distance between the two families. The result is a k × k matrix, denoted EN, whose element $en_{ab}$ is the shortest distance between the a-th and b-th malware families in $F_j$; the diagonal elements of the matrix are 0.
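A brute-force sketch of the EN computation is shown below. Euclidean distance is again an assumption, and the function name and toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def min_distance_matrix(families: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """k x k matrix of the shortest sample-to-sample distance between families."""
    k = len(families)
    EN = [[0.0] * k for _ in range(k)]  # diagonal stays 0
    for a in range(k):
        for b in range(a + 1, k):
            shortest = min(
                math.dist(x, y) for x in families[a] for y in families[b]
            )
            EN[a][b] = EN[b][a] = shortest
    return EN


# Two hypothetical families inside one feature subset F_j.
families = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[4.0, 0.0], [9.0, 0.0]],
]
EN = min_distance_matrix(families)
```

The cost grows with the product of the family sizes, since every cross-family pair of samples is compared.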
Step three: solve for EI.
$EI_j$ denotes the inter-class distance criterion value within malware families in feature subset $F_j$. It has two components per family: the maximum distance between the feature vectors of the malware samples in the family, and the sum of the distances between the feature vectors of all malware samples in the family.
The result is a k × 2 matrix, denoted $EI_j = (ei_{ab})_{k \times 2}$, where $ei_{a1}$ is the maximum distance between malware sample features within the a-th malware family in $F_j$ and $ei_{a2}$ is the sum of the distances between malware sample features within the a-th malware family in $F_j$.
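The two columns of EI can be sketched as below. Two assumptions are made: distances are Euclidean, and each unordered pair of samples is counted once in the sum. The function name and toy data are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def within_family_matrix(families: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """k x 2 matrix: [max pairwise distance, sum of pairwise distances] per family."""
    rows = []
    for family in families:
        distances = [math.dist(x, y) for x, y in combinations(family, 2)]
        # A single-sample family has no pairs, so both entries are 0.
        rows.append([max(distances, default=0.0), math.fsum(distances)])
    return rows


# Two hypothetical families inside one feature subset F_j.
families = [
    [[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]],  # pairwise distances 3, 5, 4
    [[1.0, 1.0]],
]
EI = within_family_matrix(families)
```

Smaller entries indicate tighter families, which is consistent with EI entering the s-value formula with a negative sign.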
Step four: solve for the s-value.
Having obtained EC, EN and EI through the three steps above, the values of α, β and γ are still needed to solve for the s-value of each feature subset. These three values are found by optimizing the loss function while the corresponding sub-model is trained with machine learning/deep learning. Once they are computed, the s-value of each feature subset is obtained, and the final malware classification result follows from the prediction results generated in 1).
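EC and EN are k × k matrices while EI is k × 2, and the text does not state how the three are combined into a single quantity. The sketch below therefore makes a loud assumption: each matrix is first reduced to its Frobenius norm, and the s-value is the weighted combination of those scalars. The weights are placeholder constants here, not values learned from the loss function.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def frobenius_norm(matrix: Matrix) -> float:
    """Square root of the sum of squared entries (assumed reduction to a scalar)."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def s_value(EC: Matrix, EN: Matrix, EI: Matrix,
            alpha: float, beta: float, gamma: float) -> float:
    """s-value = alpha*|EC| + beta*|EN| - gamma*|EI| under the scalar assumption."""
    return (alpha * frobenius_norm(EC)
            + beta * frobenius_norm(EN)
            - gamma * frobenius_norm(EI))


# Hypothetical criterion matrices for one feature subset with k = 2 families.
EC = [[0.0, 5.0], [5.0, 0.0]]
EN = [[0.0, 3.0], [3.0, 0.0]]
EI = [[5.0, 12.0], [0.0, 0.0]]

s = s_value(EC, EN, EI, alpha=1.0, beta=1.0, gamma=0.5)  # placeholder weights
```

A feature subset whose families lie far apart (large EC and EN) and are internally compact (small EI) receives a larger s-value under this reading.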
The methods for solving α, β and γ and for calculating the final malware classification result are described in detail below.
To obtain the values of α, β and γ, the formula $s\text{-value} = EC\alpha + EN\beta - EI\gamma$ can be written as follows:
Figure BDA0002557215290000061
When each feature subset is used as input to train its corresponding sub-model, the sub-model outputs a prediction for each malware sample in its test set; the predictions for a given malware sample are denoted $p_0, p_1, \dots, p_{n-1}$ (each a 1 × n vector). The EC, EN and EI obtained for each feature subset are denoted $[EC_0, EN_0, EI_0]$, $[EC_1, EN_1, EI_1]$, ..., $[EC_{n-1}, EN_{n-1}, EI_{n-1}]$, and they yield a set of s-values denoted $s\text{-value}_0, s\text{-value}_1, \dots, s\text{-value}_{n-1}$. To find the values of α, β and γ, the following equation is used as the loss function:
Figure BDA0002557215290000062
where $y_v$ is the label of the v-th malware sample, represented as a 1 × N vector, and N is the number of training samples in the training set.
For the model to converge, the loss function only needs to be minimized; minimizing it yields the values of α, β, and γ.
In addition, the sub-models trained on the respective feature subsets are now available, and each sub-model produces its own prediction on the test set, denoted $P_0, P_1, \dots, P_{n-1}$. The s-value of each feature subset, denoted $s\text{-value}_0, s\text{-value}_1, \dots, s\text{-value}_{n-1}$, is then obtained by calculation, and the final classification result is computed with the following formula:
$P = \|s\text{-value}_0\| \cdot P_0 + \|s\text{-value}_1\| \cdot P_1 + \dots + \|s\text{-value}_{n-1}\| \cdot P_{n-1}$
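The weighted combination can be sketched as follows. Two assumptions are made for illustration: each s-value is a scalar, so its norm is its absolute value, and the final family is chosen as the index of the largest entry of P (the text gives P but does not state the decision rule). All values are hypothetical.

```python
from typing import List, Sequence


def combine_predictions(
    s_values: Sequence[float], predictions: Sequence[Sequence[float]]
) -> List[float]:
    """Compute P = sum_j |s_value_j| * P_j over the n sub-model predictions."""
    k = len(predictions[0])
    combined = [0.0] * k
    for s, p in zip(s_values, predictions):
        weight = abs(s)  # norm of a scalar s-value
        for c in range(k):
            combined[c] += weight * p[c]
    return combined


# Hypothetical s-values and one-hot predictions from n = 3 sub-models, k = 2 families.
s_values = [2.0, -0.5, 1.0]
predictions = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

P = combine_predictions(s_values, predictions)
predicted_family = P.index(max(P))  # assumed decision rule: argmax of P
```

In this toy case the single sub-model with the largest weight outvotes the other two, which shows how the s-value lets a more discriminative feature subset dominate.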
As shown in Fig. 1, the specific implementation process of the invention is as follows:
Step one: perform feature extraction on all malware samples in the malware sample training set, and divide the extracted features into n feature subsets $F_0, F_1, \dots, F_{n-1}$.
Step two: train the corresponding models on the feature subsets $F_0, F_1, \dots, F_{n-1}$, denoted $M_0, M_1, \dots, M_{n-1}$. The models are trained with a machine learning/deep learning algorithm, such as the XGBoost algorithm.
Step three: use the models $M_0, M_1, \dots, M_{n-1}$ to classify the malware to be detected, obtaining the corresponding classification results $P_0, P_1, \dots, P_{n-1}$.
Step four: calculate the hybrid distance criterion value s-value for each feature subset.
Step five: take the s-values of the feature subsets from step four, $s\text{-value}_0, s\text{-value}_1, \dots, s\text{-value}_{n-1}$, as the weights of the corresponding classification results from step three, and obtain the classification result of the malware to be detected.
The method reduces model training time by partitioning the features into several feature spaces, which lowers the dimensionality of each space, and finally integrates the classification results of the sub-models into a final malware classification result through the s-value. Model training can run in parallel across the feature subsets, and prediction for new malware can also run in parallel, so both training time and prediction time are reduced.
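The parallelism described above can be sketched with the standard library. The function `train_submodel` is a hypothetical stand-in that only computes per-column means, and a thread pool is used to keep the sketch simple; CPU-bound training in practice would more likely use separate processes or the learner's own parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def train_submodel(feature_subset: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for training: returns the per-column means."""
    count = len(feature_subset)
    return [sum(column) / count for column in zip(*feature_subset)]


# Two hypothetical feature subsets, each with one row per sample.
feature_subsets = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0], [20.0]],
]

# Each subset is handled by its own worker; map() keeps the input order,
# so models[j] corresponds to feature_subsets[j].
with ThreadPoolExecutor() as pool:
    models = list(pool.map(train_submodel, feature_subsets))
```

Because the sub-models share no state, the same pattern applies to prediction: each model can score a new sample independently before the results are combined.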
In conclusion, the invention mainly addresses the low recognition accuracy of deep learning methods on malware variants and the poor balance among model accuracy, training time, and prediction time, and provides an improved method. The soft dependence value (s-value) is based on the hybrid distance criterion in pattern recognition; using the s-value, an evaluation value is obtained for each feature subset, and this value is used as the weight of the corresponding prediction to obtain the final malware prediction result. The method improves the accuracy of malware classification and finds a balance among classification accuracy, model training time, and prediction time, shortening model training time and prediction time while maintaining classification accuracy.
The above is only one embodiment of the invention, and the scope of protection of the invention is not limited to it. Any modification or substitution that a person skilled in the art could conceive within the technical scope disclosed by the invention falls within the scope of the invention; the scope of protection of the invention is therefore defined by the claims.

Claims (4)

1. A malware classification and detection method based on soft dependence, characterized by comprising the following steps:
Step one: perform feature extraction on all malware samples in the malware sample training set, and divide the extracted features into n feature subsets $F_0, F_1, \dots, F_{n-1}$. The malware sample training set is $train = \{D_0, D_1, \dots, D_{k-1}\}$, where $D_i$ denotes the i-th malware family in the training set, $0 \le i \le k-1$, and k is the number of malware families in the training set. The j-th feature subset is $F_j = \{Df_0, Df_1, \dots, Df_{k-1}\}$, where $Df_i$ denotes the features, within that feature subset, of all malware samples in the i-th malware family, and $0 \le j \le n-1$;
Step two: train by machine learning on each of the n feature subsets $F_0, F_1, \dots, F_{n-1}$ from step one, generating n models $M_0, M_1, \dots, M_{n-1}$;
Step three: use the n models $M_0, M_1, \dots, M_{n-1}$ from step two to classify the malware to be detected, obtaining n classification results $P_0, P_1, \dots, P_{n-1}$;
Step four: calculate the hybrid distance criterion value s-value for each feature subset;
Step five: take the s-values of the feature subsets from step four, $s\text{-value}_0, s\text{-value}_1, \dots, s\text{-value}_{n-1}$, as the weights of the corresponding classification results from step three, and obtain the classification result of the malware to be detected: $P = \|s\text{-value}_0\| \cdot P_0 + \|s\text{-value}_1\| \cdot P_1 + \dots + \|s\text{-value}_{n-1}\| \cdot P_{n-1}$.
2. The soft dependence-based malware classification and detection method as claimed in claim 1, wherein the hybrid distance criterion value $s\text{-value}_j$ of the j-th feature subset $F_j$ in step four is:
$s\text{-value}_j = EC_j \alpha + EN_j \beta - EI_j \gamma$
where $EC_j = (ec_{pq})_{k \times k}$ is the criterion value of the distance between malware family centroids in $F_j$, and $ec_{pq}$ is the distance between the centroids of the p-th and q-th malware families in $F_j$; $EN_j = (en_{ab})_{k \times k}$ is the minimum distance criterion value between malware families in $F_j$, and $en_{ab}$ is the shortest distance between the a-th and b-th malware families in $F_j$; $EI_j = (ei_{ab})_{k \times 2}$ is the inter-class distance criterion value within malware families in $F_j$, where $ei_{a1}$ is the maximum distance between malware sample features within the a-th malware family in $F_j$ and $ei_{a2}$ is the sum of the distances between malware sample features within the a-th malware family in $F_j$; α, β and γ are the weights of EC, EN and EI, respectively.
3. The soft dependence-based malware classification and detection method as claimed in claim 2, wherein the centroid of a malware family is
Figure FDA0002557215280000021
where $\alpha_u$ is the feature of the u-th malware sample in the malware family.
4. The soft dependence-based malware classification and detection method as claimed in claim 2, wherein α, β and γ are obtained by minimizing a loss function during the machine learning in step two, the loss function being
Figure FDA0002557215280000022
where $y_v$ is the label of the v-th malware sample and N is the number of malware samples in the malware sample training set.
CN202010595193.5A 2020-06-28 2020-06-28 Malicious software classification and detection method based on soft dependence Pending CN111783093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010595193.5A CN111783093A (en) 2020-06-28 2020-06-28 Malicious software classification and detection method based on soft dependence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010595193.5A CN111783093A (en) 2020-06-28 2020-06-28 Malicious software classification and detection method based on soft dependence

Publications (1)

Publication Number Publication Date
CN111783093A true CN111783093A (en) 2020-10-16

Family

ID=72760131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010595193.5A Pending CN111783093A (en) 2020-06-28 2020-06-28 Malicious software classification and detection method based on soft dependence

Country Status (1)

Country Link
CN (1) CN111783093A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416364A (en) * 2018-01-31 2018-08-17 重庆大学 Integrated study data classification method is merged in subpackage
CN108920953A (en) * 2018-06-16 2018-11-30 温州职业技术学院 A kind of malware detection method and system
CN110222511A (en) * 2019-06-21 2019-09-10 杭州安恒信息技术股份有限公司 The recognition methods of Malware family, device and electronic equipment
CN110378119A (en) * 2019-07-16 2019-10-25 合肥智瑞工程科技有限公司 A kind of malware detection method and system
CN110968869A (en) * 2019-11-22 2020-04-07 上海交通大学 Deep learning-based large-scale malicious software classification system and method
CN111027069A (en) * 2019-11-29 2020-04-17 暨南大学 Malicious software family detection method, storage medium and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination