CN109784046B - Malicious software detection method and device and electronic equipment - Google Patents

Info

Publication number
CN109784046B
Authority
CN
China
Prior art keywords
feature
subset
features
characteristic
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811495637.7A
Other languages
Chinese (zh)
Other versions
CN109784046A (en)
Inventor
胡一博
朱诗兵
李长青
帅海峰
吕登龙
徐华正
张记瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Original Assignee
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority to CN201811495637.7A
Publication of CN109784046A
Application granted
Publication of CN109784046B
Legal status: Active
Anticipated expiration

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a malware detection method, a malware detection apparatus and an electronic device. It relates to the field of security protection for mobile terminals, can effectively detect malware, and addresses the redundancy, irrelevance and noise problems that affect malware feature extraction in the prior art. The malware detection method comprises the following steps: feature extraction; subset generation; and detection model generation. The malware detection apparatus comprises a feature extraction module, a subset generation module and a detection model generation module. The electronic device comprises a memory, a processor and a computer program that is stored in the memory and can run on the processor; the processor implements the malware detection method when executing the program.

Description

Malicious software detection method and device and electronic equipment
Technical Field
The present invention relates to security protection of a mobile terminal, and in particular, to a method and an apparatus for detecting malicious software, and an electronic device.
Background
"Mobile intelligent terminal" is a general term for mobile terminals that have network access capability and run an operating system and application programs. The wide adoption and powerful functions of mobile intelligent terminals, including those running the Android system, have made them an irreplaceable tool in many fields of modern society. At the same time, malware targeting these terminals has become increasingly rampant. Such malware damages the user's system without being noticed, steals user data and service fees, and seriously threatens the user's privacy and property. More seriously, it threatens confidential national information in areas such as the economy, politics and military affairs, and harms national security. To counter the growing malware threat to mobile intelligent terminals and to meet the future need to detect unknown malware on them, a detection method for Android malware is needed.
Existing detection methods based on machine learning take an artificial intelligence perspective: a classification algorithm learns the features of known malware, and a continuously evolving, generalizable intelligent detection model is built to detect Android software automatically. The key to such methods is feature selection. The more effectively the selected features distinguish malware from benign software, the more efficient the detection model obtained from the machine learning classification algorithm, and the better its detection performance. However, the malware features extracted by existing methods suffer from redundancy, irrelevance and noise: redundant features reduce the computational efficiency of the classification algorithm and the effectiveness of the detection model; irrelevant features mean that more training samples are needed to obtain a suitable detection model; and noisy features can directly lead to an incorrect detection model. These problems greatly increase the time and space cost of machine learning, to the point where the classification algorithm can become unusable for analyzing and processing the features.
Disclosure of Invention
In view of this, the present invention provides a detection method, an apparatus and an electronic device, which can meet the detection requirement of a mobile terminal on unknown malware and solve the problems of redundancy, irrelevance and noise faced by feature selection.
Based on the above purpose, the present invention provides a malware detection method. The malware detection method comprises the following steps:
extracting feature information of sample set software, abstracting the feature information into a digital form, and obtaining a sample set feature set and a sample set feature matrix;
filtering invalid features in the feature set by using a feature selection algorithm to obtain an optimal feature subset;
and training the feature matrix corresponding to the optimal feature subset by adopting a machine learning classification algorithm to generate a detection model.
Optionally, the extracting feature information of the sample set software, and abstracting the feature information into a digital form to obtain a sample set feature set and a sample set feature matrix includes:
processing the installation packages of the sample set software to obtain a global configuration file containing permission information and a decompiled file containing API information;
extracting the corresponding permission and API feature information from the global configuration file and the decompiled file;
and vectorizing the extracted permission and API feature information and abstracting it into a digital form to obtain a sample set feature set and a sample set feature matrix.
Optionally, the filtering invalid features in the sample set feature set by using a feature selection algorithm to obtain an optimal feature subset includes:
step one: initializing the sample set feature set, the parameter constants related to the sample set feature matrix, and the parameters used in the subset generation process;
step two: calculating the feature frequency of each feature in the sample set feature set according to a feature frequency formula, and filtering out irrelevant features through calculation and comparison to obtain a decorrelated feature subset;
the feature frequency formula is as follows:
Figure BDA0001896816610000021
where $TF(f_j)$ denotes the feature frequency of feature $f_j$; $N_{benign}$ denotes the number of samples in the benign software set, and $N_{benign}^{f_j}$ denotes the number of those samples in which feature $f_j$ appears; $N_{malware}$ denotes the number of samples in the malicious sample set, and $N_{malware}^{f_j}$ denotes the number of those samples in which feature $f_j$ appears;
step three: calculating the information gain of each feature in the decorrelated feature subset according to an information gain formula, and obtaining a denoised feature subset through calculation, comparison and screening;
the information gain formula is as follows:
$IG(f_j) = H(Y) - H(Y \mid f_j)$
where $IG(f_j)$ denotes the information gain of feature $f_j$ for the classification system, $H(Y)$ denotes the entropy of the classification system, and $H(Y \mid f_j)$ denotes the conditional entropy of the classification system;
step four: according to a χ² statistic formula, calculating the CHI value (χ² statistic) between each feature in the denoised feature subset and the corresponding feature matrix, as well as the CHI values between features, and obtaining a de-redundant feature subset through calculation, comparison and screening;
the χ² statistic formula is as follows:
$CHI(f_i, f_j) = \xi_{11} + \xi_{12} + \xi_{21} + \xi_{22}$
where $CHI(f_i, f_j)$ denotes the χ² statistic of features $f_i$ and $f_j$; $\xi_{11}$ denotes the deviation between the theoretical and actual numbers of samples in which $f_i$ and $f_j$ both appear; $\xi_{12}$ denotes the deviation between the theoretical and actual numbers of samples in which $f_i$ appears but $f_j$ does not; $\xi_{21}$ denotes the deviation between the theoretical and actual numbers of samples in which $f_i$ does not appear but $f_j$ does; $\xi_{22}$ denotes the deviation between the theoretical and actual numbers of samples in which neither $f_i$ nor $f_j$ appears;
step five: analyzing and judging the de-redundant feature subset, and performing subset optimization according to the result to obtain an optimal feature subset.
Optionally, the step one specifically includes:
denoting the sample set feature set as $F_v$, the sample set feature matrix as $X_{train}$, and the number of selected features as $M_v$; setting the initial information gain threshold to a specific value $\theta_{ig}$, the information gain step size to $\lambda$, the initial value of the information gain loop counter $n$ to 0, and the detection rate threshold to 0.95; training the sample set feature matrix $X_{train}$ with a machine learning classification algorithm, and recording the maximum detection rate as $TP_{max}$.
Optionally, the second step includes:
step 1: calculating the feature frequency of each feature in the sample set feature set;
step 2: filtering out the features whose feature frequency is 0, the remaining features forming an intermediate feature subset;
step 3: training the feature matrix corresponding to the intermediate feature subset with a machine learning classification algorithm to obtain the corresponding detection rate $TP_{tf}$;
step 4: filtering out of the intermediate feature subset the features with the smallest feature frequency, the remaining features forming a reduced feature subset; training the feature matrix corresponding to the reduced feature subset with a machine learning classification algorithm to obtain the corresponding detection rate $TP_{tf}'$;
step 5: comparing $TP_{tf}$ and $TP_{tf}'$; if $TP_{tf} = TP_{tf}'$, taking the reduced feature subset as the new intermediate feature subset and returning to step 3; if $TP_{tf} \neq TP_{tf}'$, outputting the intermediate feature subset;
step 6: denoting the output intermediate feature subset as $F_{v1}$, with $M_{v1}$ selected features; the feature subset $F_{v1}$ is the decorrelated feature subset.
Optionally, the third step includes:
step 1: calculating the information gain of each feature in the decorrelated feature subset;
step 2: incrementing the loop counter by 1, i.e. $n = n + 1$;
step 3: selecting the features satisfying $IG > \theta_{ig} - (n-1)\lambda$ to form a feature subset (Figure BDA0001896816610000047), and recording the number of selected features (Figure BDA0001896816610000048); selecting the features satisfying $IG > \theta_{ig} - n\lambda$ to form a feature subset (Figure BDA0001896816610000049), and recording the number of selected features (Figure BDA00018968166100000410);
step 4: comparing the two feature counts (Figure BDA00018968166100000411 and Figure BDA00018968166100000412); if the condition in Figure BDA00018968166100000413 holds, returning to step 2; if the condition in Figure BDA00018968166100000414 holds, outputting the feature subset in Figure BDA00018968166100000415;
step 5: denoting the output feature subset (Figure BDA00018968166100000416) as $F_{v2}$, with $M_{v2}$ selected features; the feature subset $F_{v2}$ is the denoised feature subset.
Optionally, the fourth step includes:
step 1: calculating the CHI value between each feature in the denoised feature subset $F_{v2}$ and the corresponding feature matrix, and recording the largest CHI value as $\theta_{chi}$;
step 2: calculating the CHI values between features; for each pair of features whose CHI value is larger than $\theta_{chi}$, selecting the feature with the smaller IG value; arranging the selected features in descending order of CHI value to form a redundant feature set (Figure BDA00018968166100000417), and recording the number of redundant features selected (Figure BDA00018968166100000418);
step 3: setting the loop counter $m$ to 0;
step 4: incrementing the loop counter by 1, i.e. $m = m + 1$;
step 5: removing redundant features from $F_{v2}$ in the order in which they are arranged in the redundant feature set (Figure BDA00018968166100000419) to obtain a feature subset (Figure BDA00018968166100000420), whose number of selected features is recorded (Figure BDA00018968166100000421); training the feature matrix corresponding to this feature subset (Figure BDA00018968166100000422) with a machine learning classification algorithm to obtain the corresponding detection rate (Figure BDA00018968166100000423);
step 6: comparing $m$ with the value in Figure BDA00018968166100000424; if the condition in Figure BDA00018968166100000425 holds, returning to step 4; otherwise, executing the next step;
step 7: comparing all the detection rates (Figure BDA00018968166100000426) and recording the maximum detection rate (Figure BDA00018968166100000427); the feature subset corresponding to the maximum detection rate (Figure BDA0001896816610000051) is denoted as $F_{v3}$, with $M_{v3}$ selected features; the feature subset $F_{v3}$ is the de-redundant feature subset.
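The redundancy-removal steps above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `rank_redundant` and `remove_redundant`, the `pair_chi` and `ig` dictionaries, and the `evaluate` callback (standing in for "train a classifier and return its detection rate") are all hypothetical. It also assumes that redundant features are removed cumulatively, one more per loop pass, which the text does not state explicitly.

```python
def rank_redundant(pair_chi, ig, theta_chi):
    """Return the redundant features ordered by descending pairwise CHI value.

    pair_chi: {(f_i, f_j): CHI value between the two features}  (hypothetical input)
    ig:       {feature: information gain}
    """
    ranked = []
    by_chi = sorted(pair_chi.items(), key=lambda kv: kv[1], reverse=True)
    for (fi, fj), chi in by_chi:
        if chi > theta_chi:
            # Of a highly correlated pair, the feature with the smaller IG is redundant.
            weaker = fi if ig[fi] < ig[fj] else fj
            if weaker not in ranked:
                ranked.append(weaker)
    return ranked


def remove_redundant(features, ranked, evaluate):
    """Drop redundant features one by one (assumed cumulative) and keep the best subset."""
    best_subset, best_rate = list(features), float("-inf")
    subset = list(features)
    for redundant in ranked:
        subset = [f for f in subset if f != redundant]
        rate = evaluate(subset)  # assumed: detection rate of a model trained on subset
        if rate > best_rate:
            best_subset, best_rate = subset, rate
    return best_subset, best_rate


# Toy usage with made-up numbers.
pair_chi = {("a", "b"): 5.0, ("b", "c"): 9.0, ("a", "c"): 1.0}
ig = {"a": 0.3, "b": 0.2, "c": 0.1}
ranked = rank_redundant(pair_chi, ig, theta_chi=2.0)
subset, rate = remove_redundant(
    ["a", "b", "c"], ranked, lambda s: 0.9 if s == ["a", "b"] else 0.8
)
```

In a real pipeline `evaluate` would retrain the chosen classifier on the columns of the feature matrix that correspond to `subset`; here it is a stub so the control flow can be read in isolation.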
Optionally, the step five specifically includes:
comparing the detection rate obtained in step four (Figure BDA0001896816610000052) with the maximum detection rate $TP_{max}$, and assigning the larger value to $TP_{max}$; then comparing $TP_{max}$ with the preset detection rate threshold of 0.95: if $TP_{max} < 0.95$, returning to step three; if $TP_{max} \geq 0.95$, the feature subset corresponding to $TP_{max}$ is the optimal feature subset, which is denoted as $F_v$.
Optionally, the method for generating a detection model specifically includes:
setting a detection rate threshold, training a feature matrix corresponding to the optimal feature subset by respectively utilizing a Bayesian algorithm, a support vector machine algorithm, a decision tree algorithm and a nearest neighbor classification algorithm, and selecting the optimal detection model to output according to the set detection rate threshold.
The method for selecting the optimal detection model according to the set detection rate threshold specifically comprises the following steps:
if the detection rate of the detection model obtained through training is not less than the threshold value, outputting the corresponding detection model;
if the detection rate of the detection model obtained through training is lower than the threshold value, changing the combination mode of the characteristics, retraining to obtain a new detection model until the threshold value requirement is met, and outputting the detection model meeting the threshold value requirement;
and if all possible feature combination modes are traversed and the detection rate of the obtained detection model still fails to meet the threshold requirement, outputting the detection model with the highest detection rate in the traversal process.
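The selection rule described above can be sketched as a small function. This is only an illustration under assumptions: `select_model` is a hypothetical name, each candidate is represented simply as a `(name, detection_rate)` pair rather than a trained model, and the rates in the usage lines are made-up numbers.

```python
def select_model(candidates, threshold):
    """Return the first candidate meeting the threshold, else the best one seen.

    candidates: iterable of (name, detection_rate) pairs, one per trained
                model / feature combination (hypothetical representation).
    """
    best = None
    for name, rate in candidates:
        if rate >= threshold:
            # Detection rate is not less than the threshold: output this model.
            return name, rate
        if best is None or rate > best[1]:
            best = (name, rate)
    # All combinations traversed without meeting the threshold:
    # fall back to the model with the highest detection rate.
    return best


# Toy usage with made-up detection rates.
first_ok = select_model([("bayes", 0.90), ("svm", 0.96), ("tree", 0.99)], 0.95)
fallback = select_model([("bayes", 0.90), ("knn", 0.93)], 0.95)
```

Returning at the first candidate that meets the threshold mirrors the "retrain until the threshold requirement is met" wording; a variant that always scans every candidate and returns the overall best would also be consistent with the last rule.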
The invention also provides a malicious software detection device, which comprises:
a feature extraction module, configured to extract feature information of the sample set software and abstract the feature information into a digital form to obtain a sample set feature set and a sample set feature matrix;
a subset generation module, configured to filter invalid features out of the sample set feature set by using a feature selection algorithm to obtain an optimal feature subset;
a detection model generation module, configured to train the feature matrix corresponding to the optimal feature subset by using a machine learning classification algorithm to generate a detection model.
The invention also provides electronic equipment for detecting the malicious software, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the malicious software detection method provided by the invention.
As can be seen from the above, the malware detection method, apparatus and electronic device provided by the invention can effectively detect malware by extracting the permission and sensitive API features of malware and benign software, optimizing the extracted features with a feature selection algorithm, and training on the selected combined permission and sensitive API features with a machine learning classification algorithm. The feature selection algorithm is designed around feature frequency, information gain and the χ² statistic: the feature frequency method filters out features irrelevant to classification, the information gain method selects features that strongly influence classification, and the χ² statistic method removes highly redundant features. The method can therefore overcome the redundancy, irrelevance and noise problems of the malware features extracted by the prior art. Because the feature selection algorithm combines the three methods in a preferred, specific order, it selects features better than combining them arbitrarily or using only one or two of them. Training on the resulting feature subset with a machine learning classification algorithm yields a more efficient detection model with better detection performance on malware.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a diagram illustrating a malware detection method according to an embodiment of the present invention;
FIG. 2 is a block flow diagram of a malware detection method in an embodiment of the invention;
fig. 3 is a flow chart of a feature extraction method in the malware detection method in the embodiment of the present invention;
FIG. 4 is a diagram illustrating a subset generation method in the malware detection method according to an embodiment of the present invention;
FIG. 5 is a flow chart of a method for decorrelating feature frequencies in a malware detection method in an embodiment of the present invention;
FIG. 6 is a flowchart illustrating an information gain denoising method in a malware detection method according to an embodiment of the present invention;
FIG. 7 is a flow chart of the χ² statistical redundancy removal method in the malware detection method according to an embodiment of the present invention;
fig. 8 is a schematic diagram illustrating a method for generating a detection model in a malware detection method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
In one aspect of the invention, a malware detection method is provided.
As shown in fig. 1 and 2, in some embodiments of a malware detection method provided by the present invention, the malware detection method specifically includes:
S101: feature extraction. Decompile the software samples using reverse engineering, extract the feature information, abstract it into an easily analyzed digital form as a sample set feature set and a sample set feature matrix, and store both in a database;
S102: subset generation. Filter invalid features out of the sample set feature set by using a feature selection algorithm to obtain an optimal feature subset;
S103: detection model generation. Train the feature matrix corresponding to the optimal feature subset by using a machine learning classification algorithm to generate a dual-feature detection model.
As shown in fig. 3, in another embodiment of a malware detection method provided by the present invention, the feature extraction method specifically includes:
S301: processing the software APK package, decoding the AndroidManifest.xml file containing permission information into a global configuration file in plaintext format, and decompiling a classes.
S302: extracting the corresponding permission and API feature information from the global configuration file and the smali file;
S303: vectorizing the extracted feature semantic information and abstracting it into a digital form to obtain a sample set feature set and a sample set feature matrix. The sample set feature matrix is constructed as follows: a feature that appears in a sample is represented by 1, and a feature that does not appear in the sample is represented by 0, so that a binary sample set feature matrix describes the features of the sample set, where the rows represent sample vectors and the columns represent feature vectors.
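The binary encoding described in S303 can be illustrated with a short sketch. It assumes each sample has already been reduced to a set of feature names; the function name `build_feature_matrix` and the two example samples are hypothetical and are not taken from the patent.

```python
def build_feature_matrix(samples):
    """Build a sorted feature set and a 0/1 matrix (rows: samples, columns: features).

    samples: list of sets, each holding the feature names found in one sample.
    """
    feature_set = sorted(set().union(*samples)) if samples else []
    column_of = {name: col for col, name in enumerate(feature_set)}
    matrix = []
    for features in samples:
        row = [0] * len(feature_set)      # 0: feature absent from this sample
        for name in features:
            row[column_of[name]] = 1      # 1: feature present in this sample
        matrix.append(row)
    return feature_set, matrix


# Two illustrative samples described by Android permission names.
samples = [
    {"android.permission.SEND_SMS", "android.permission.INTERNET"},
    {"android.permission.INTERNET"},
]
feature_set, matrix = build_feature_matrix(samples)
```

Sorting the feature names fixes the column order, so the same feature always maps to the same column when the matrix is later sliced down to a feature subset.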
As shown in fig. 4, in another embodiment of a malware detection method provided by the present invention, the subset generation method specifically includes:
S401, step one: constant initialization. Initialize the sample set feature set, the parameter constants related to the sample set feature matrix, and the parameters used in the subset generation process.
S402, step two: feature frequency decorrelation. Calculate the feature frequency of each feature in the sample set feature set according to the feature frequency formula, and filter out irrelevant features through calculation and comparison to obtain a decorrelated feature subset;
the feature frequency formula is as follows:
Figure BDA0001896816610000081
where $TF(f_j)$ denotes the feature frequency of feature $f_j$; $N_{benign}$ denotes the number of samples in the benign software set, and $N_{benign}^{f_j}$ denotes the number of those samples in which feature $f_j$ appears; $N_{malware}$ denotes the number of samples in the malicious sample set, and $N_{malware}^{f_j}$ denotes the number of those samples in which feature $f_j$ appears.
S403, step three: information gain denoising. Calculate the information gain of each feature in the decorrelated feature subset according to the information gain formula, and obtain a denoised feature subset through calculation, comparison and screening;
the information gain formula is as follows:
$IG(f_j) = H(Y) - H(Y \mid f_j)$
where $IG(f_j)$ denotes the information gain of feature $f_j$ for the classification system, $H(Y)$ denotes the entropy of the classification system, and $H(Y \mid f_j)$ denotes the conditional entropy of the classification system.
The information gain formula is explained as follows:
Let the probability of a benign software sample be $P(c_0)$ and the probability of a malware sample be $P(c_1)$; the entropy of the classification system is then defined as:
$$H(Y) = -\sum_{i=0}^{1} P(c_i) \log P(c_i)$$
Given that feature $f_j$ appears, with conditional class probabilities $P(c_i \mid f_j = 1)$, the conditional entropy of the classification system is defined as:
$$H(Y \mid f_j = 1) = -\sum_{i=0}^{1} P(c_i \mid f_j = 1) \log P(c_i \mid f_j = 1)$$
Likewise, when feature $f_j$ does not appear, the entropy of the classification system is defined as:
$$H(Y \mid f_j = 0) = -\sum_{i=0}^{1} P(c_i \mid f_j = 0) \log P(c_i \mid f_j = 0)$$
where the probability $P(c_i)$ is the proportion of class $c_i$ samples in the total number of training samples; the probability $P(f_j = 1)$ is the proportion of samples in which feature $f_j$ appears, and the probability $P(f_j = 0)$ is the proportion of samples in which feature $f_j$ does not appear.
Thus, the information gain $IG(f_j)$ of feature $f_j$ for the classification system is calculated as:
$$IG(f_j) = H(Y) - P(f_j = 1)\, H(Y \mid f_j = 1) - P(f_j = 0)\, H(Y \mid f_j = 0)$$
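As a worked illustration of the information gain definition, the following sketch computes $IG(f_j)$ for one binary feature column against binary class labels. The function names are hypothetical, labels are assumed to be 0 for benign and 1 for malware, and the logarithm is taken to base 2, which is a common convention but an assumption here because the text does not state the base.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2, assumed) of a class distribution given as counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(column, labels):
    """IG(f_j) = H(Y) - P(f_j=1) H(Y|f_j=1) - P(f_j=0) H(Y|f_j=0).

    column: 0/1 values of feature f_j, one per sample
    labels: 0 (benign) or 1 (malware), one per sample
    """
    n = len(labels)
    if n == 0:
        return 0.0
    h_y = entropy([labels.count(0), labels.count(1)])
    h_conditional = 0.0
    for value in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == value]
        if subset:
            weight = len(subset) / n  # P(f_j = value)
            h_conditional += weight * entropy([subset.count(0), subset.count(1)])
    return h_y - h_conditional


labels = [0, 0, 1, 1]
ig_perfect = information_gain([0, 0, 1, 1], labels)  # feature mirrors the label
ig_useless = information_gain([0, 1, 0, 1], labels)  # feature independent of the label
```

A feature that splits the classes perfectly recovers the full entropy of the labels, while a feature that is independent of the labels yields zero gain, which is why low-gain features are treated as noise in step three.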
S404, step four: χ² statistical redundancy removal. According to the χ² statistic formula, calculate the CHI value (χ² statistic) between each feature in the denoised feature subset and the corresponding feature matrix, as well as the CHI values between features, and obtain a de-redundant feature subset through calculation, comparison and screening;
the χ² statistic of two features $f_i$ and $f_j$ is calculated as follows:
$CHI(f_i, f_j) = \xi_{11} + \xi_{12} + \xi_{21} + \xi_{22}$
where $CHI(f_i, f_j)$ denotes the χ² statistic of features $f_i$ and $f_j$; $\xi_{11}$ denotes the deviation between the theoretical and actual numbers of samples in which $f_i$ and $f_j$ both appear; $\xi_{12}$ denotes the deviation between the theoretical and actual numbers of samples in which $f_i$ appears but $f_j$ does not; $\xi_{21}$ denotes the deviation between the theoretical and actual numbers of samples in which $f_i$ does not appear but $f_j$ does; $\xi_{22}$ denotes the deviation between the theoretical and actual numbers of samples in which neither $f_i$ nor $f_j$ appears.
The χ² statistic formula is explained as follows:
The χ² statistic measures the degree of correlation between features and categories based on the deviation between actual and theoretical values. For two features $f_i$ and $f_j$, let $N_{11}$ be the number of samples in which both features appear, $N_{22}$ the number of samples in which neither appears, $N_{12}$ the number of samples in which $f_i$ appears but $f_j$ does not, and $N_{21}$ the number of samples in which $f_i$ does not appear but $f_j$ does. The relationship between them is shown in Table 1:
TABLE 1 Feature distribution table
                        f_j appears    f_j does not appear
f_i appears             N_11           N_12
f_i does not appear     N_21           N_22
where $N$ is the total number of samples, equal to the sum of the four cases, i.e.
$$N = N_{11} + N_{12} + N_{21} + N_{22}$$
Thus, the frequency with which feature $f_i$ appears is:
$$P(f_i) = \frac{N_{11} + N_{12}}{N}$$
The number of samples in which feature $f_j$ appears is $N_{11} + N_{21}$. The theoretical number of samples in which $f_j$ appears and $f_i$ also appears is:
$$E_{11} = (N_{11} + N_{21}) \cdot \frac{N_{11} + N_{12}}{N}$$
The deviation $\xi_{11}$ between the theoretical and actual values for $f_i$ and $f_j$ appearing together is then:
$$\xi_{11} = \frac{(N_{11} - E_{11})^2}{E_{11}}$$
In the same way, one obtains the theoretical number of samples $E_{12}$ in which $f_i$ appears but $f_j$ does not, the theoretical number $E_{21}$ in which $f_i$ does not appear but $f_j$ does, the theoretical number $E_{22}$ in which neither appears, and the corresponding deviations $\xi_{12}$, $\xi_{21}$, $\xi_{22}$:
$$E_{12} = (N_{12} + N_{22}) \cdot \frac{N_{11} + N_{12}}{N}, \qquad \xi_{12} = \frac{(N_{12} - E_{12})^2}{E_{12}}$$
$$E_{21} = (N_{11} + N_{21}) \cdot \frac{N_{21} + N_{22}}{N}, \qquad \xi_{21} = \frac{(N_{21} - E_{21})^2}{E_{21}}$$
$$E_{22} = (N_{12} + N_{22}) \cdot \frac{N_{21} + N_{22}}{N}, \qquad \xi_{22} = \frac{(N_{22} - E_{22})^2}{E_{22}}$$
Thus, the χ² statistic of the two features $f_i$ and $f_j$ is the sum of the deviations $\xi_{11}$, $\xi_{12}$, $\xi_{21}$, $\xi_{22}$, i.e.
$$CHI(f_i, f_j) = \xi_{11} + \xi_{12} + \xi_{21} + \xi_{22}$$
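The χ² computation for a pair of binary features can be sketched as follows. The function name `chi_square` is hypothetical, the inputs are assumed to be two 0/1 columns of the feature matrix, and cells whose theoretical count is zero are skipped to avoid division by zero, which is a choice made for this sketch rather than something the text specifies.

```python
def chi_square(col_i, col_j):
    """Sum of (actual - theoretical)^2 / theoretical over the four presence/absence cells."""
    n = len(col_i)
    if n == 0:
        return 0.0
    # Actual counts for each (f_i value, f_j value) combination.
    observed = {(a, b): 0 for a in (1, 0) for b in (1, 0)}
    for a, b in zip(col_i, col_j):
        observed[(a, b)] += 1
    # Marginal totals: how often f_i (rows) and f_j (columns) take each value.
    row_total = {a: observed[(a, 1)] + observed[(a, 0)] for a in (1, 0)}
    col_total = {b: observed[(1, b)] + observed[(0, b)] for b in (1, 0)}
    chi = 0.0
    for (a, b), actual in observed.items():
        expected = row_total[a] * col_total[b] / n  # theoretical count for this cell
        if expected > 0:  # skip empty cells (assumption of this sketch)
            chi += (actual - expected) ** 2 / expected
    return chi


chi_identical = chi_square([1, 1, 0, 0], [1, 1, 0, 0])    # fully redundant pair
chi_independent = chi_square([1, 1, 0, 0], [1, 0, 1, 0])  # unrelated pair
```

Two identical columns give the largest possible value for the sample size, while two independent columns give zero, which matches the use of a high CHI value between features as the signal for redundancy.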
S405, step five: optimal feature subset generation. Analyze and judge the de-redundant feature subset, and perform further operations according to the result to obtain the final optimal feature subset.
Step one is specifically as follows:
Denote the sample set feature matrix as $X_{train}$, the selected feature set as $F_v$, and the number of selected features as $M_v$. Set the initial information gain threshold to a specific value $\theta_{ig}$, the information gain step size to $\lambda$, the information gain loop counter $n$ to 0, and the detection rate threshold to 0.95. Train the original feature matrix $X_{train}$ with a machine learning classification algorithm, and record the maximum detection rate as $TP_{max}$.
As shown in Fig. 5, step two specifically includes:
S501: calculate the feature frequency of each feature in the sample set feature set;
S502: filter out the features whose feature frequency is 0; the remaining features form an intermediate feature subset;
S503: train the feature matrix corresponding to the intermediate feature subset with the machine learning classification algorithm to obtain the corresponding detection rate $TP_{tf}$;
S504: filter out the features with the smallest feature frequency in the intermediate feature subset; the remaining features form a reduced feature subset. Train the feature matrix corresponding to the reduced feature subset with the machine learning classification algorithm to obtain the corresponding detection rate $TP_{tf}'$;
S505: compare $TP_{tf}$ and $TP_{tf}'$. If $TP_{tf} = TP_{tf}'$, take the reduced feature subset as the new intermediate feature subset and return to step S503; if $TP_{tf} \neq TP_{tf}'$, output the intermediate feature subset;
S506: denote the output feature subset as $F_{v1}$ and the number of selected features as $M_{v1}$; the feature subset $F_{v1}$ is the decorrelated feature subset.
As shown in Fig. 6, step three specifically includes:
S601: calculate the information gain of each feature in the decorrelated feature subset;
S602: increase the loop step count by 1, i.e. $n = n + 1$;
S603: select the features satisfying $IG > (\theta_{ig} - (n-1)\lambda)$ to form a feature subset (Figure BDA0001896816610000119) and record the number of selected features (Figure BDA00018968166100001110); select the features satisfying $IG > (\theta_{ig} - n\lambda)$ to form a feature subset (Figure BDA00018968166100001111) and record the number of selected features (Figure BDA00018968166100001112);
S604: compare the values of Figure BDA00018968166100001113 and Figure BDA00018968166100001114. If the condition in Figure BDA00018968166100001115 holds, return to step S602; if the condition in Figure BDA00018968166100001116 holds, output the feature subset in Figure BDA00018968166100001117;
S605: denote the output feature subset as $F_{v2}$ and the number of selected features as $M_{v2}$; the feature subset $F_{v2}$ is the de-noised feature subset.
As shown in Fig. 7, step four specifically includes:
S701: calculate the CHI value ($\chi^2$ statistic) between each feature in the feature subset $F_{v2}$ and the corresponding feature matrix, and record the largest of these CHI values as $\theta_{chi}$;
S702: calculate the CHI values between features; for the features whose CHI value with another feature is greater than $\theta_{chi}$, select the feature with the smaller IG value, and arrange the selected features in descending order of CHI value to form the redundant feature set (Figure BDA00018968166100001118); record the number of redundant features selected (Figure BDA0001896816610000121);
S703: set the loop step count $m = 0$;
S704: increase the loop step count by 1, i.e. $m = m + 1$;
S705: following the order of the redundant feature set (Figure BDA0001896816610000122), remove redundant features from $F_{v2}$ to obtain a feature subset (Figure BDA0001896816610000123) and record the number of selected features (Figure BDA0001896816610000124). Train the feature matrix corresponding to this feature subset (Figure BDA0001896816610000125) with the machine learning classification algorithm to obtain the corresponding detection rate (Figure BDA0001896816610000126);
S706: compare $m$ with the value of Figure BDA0001896816610000127. If the condition in Figure BDA0001896816610000128 holds, return to step S704; otherwise, execute the next step;
S707: compare all the detection rates (Figure BDA0001896816610000129) and record the maximum detection rate (Figure BDA00018968166100001210). Denote the feature subset corresponding to the maximum detection rate (Figure BDA00018968166100001211) as $F_{v3}$ and the number of selected features as $M_{v3}$; the feature subset $F_{v3}$ is the de-redundant feature subset.
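The culling loop of S703 to S707 can be sketched as follows. This is a sketch under assumptions the text does not confirm: at loop step m the first m entries of the ordered redundant feature set are removed, and the loop runs once for every redundant feature. The names `remove_redundant_features` and `detection_rate` are hypothetical, and `detection_rate` stands in for training a classifier and measuring its detection rate.

```python
def remove_redundant_features(features, redundant_ordered, detection_rate):
    """Try removing 1, 2, ..., all redundant features and keep the best subset.

    features: list of feature names in the de-noised subset.
    redundant_ordered: redundant features, already sorted by descending CHI value.
    detection_rate: callable(list of features) -> float (hypothetical stand-in).
    Returns (best_subset, best_detection_rate).
    """
    if not redundant_ordered:
        # Nothing to cull: the input subset is returned unchanged.
        return list(features), detection_rate(list(features))
    best_subset, best_tp = None, float("-inf")
    for m in range(1, len(redundant_ordered) + 1):      # S704, S706
        dropped = set(redundant_ordered[:m])
        subset = [f for f in features if f not in dropped]  # S705
        tp = detection_rate(subset)
        if tp > best_tp:                                  # S707
            best_subset, best_tp = subset, tp
    return best_subset, best_tp
```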
Step five, generating the optimal feature subset, is specifically as follows:
Compare the detection rate obtained in step four (Figure BDA00018968166100001212) with the maximum detection rate $TP_{max}$, and assign the larger value to $TP_{max}$. Then compare $TP_{max}$ with the initially set detection rate threshold of 0.95: if $TP_{max} < 0.95$, return to step three; if $TP_{max} \geq 0.95$, the feature subset corresponding to $TP_{max}$ is the optimal feature subset, denoted $F_v$.
As shown in Fig. 8, in another embodiment of the malware detection method provided by the present invention, the method for generating the detection model is specifically as follows:
First, a detection rate threshold is set. The permission features and the sensitive API (application program interface) features are then each trained with the naive Bayes (NB), support vector machine (SVM), decision tree (DT) and k-nearest neighbor (KNN) algorithms, and the optimal detection model is selected for output according to the set detection rate threshold.
As shown in Fig. 8, in another embodiment of the malware detection method provided by the present invention, the method for selecting the optimal detection model for output according to the set detection rate threshold is specifically as follows:
if the detection rate of the detection model obtained through training is not lower than the threshold, the corresponding detection model is output;
if the detection rate of the detection model obtained through training is lower than the threshold, the combination of features is changed and a new detection model is obtained by retraining, until the threshold requirement is met, and the detection model meeting the threshold requirement is output;
if all possible feature combinations have been traversed and the detection rate of the resulting detection models still fails to meet the threshold requirement, the detection model with the highest detection rate found during the traversal is output.
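The selection rule above can be sketched as a simple search. This is a minimal sketch, not the patented implementation: the function `select_detection_model` and the callback `train_and_score` are hypothetical, and the callback stands in for training one classifier on one feature combination and returning its detection rate.

```python
def select_detection_model(feature_combinations, classifiers, train_and_score, threshold):
    """Return (combination, classifier name, detection rate).

    The first model whose detection rate reaches the threshold is returned.
    If none does, the model with the highest detection rate seen is returned.
    """
    best = None  # (rate, combination, classifier name)
    for combo in feature_combinations:
        for name in classifiers:
            rate = train_and_score(combo, name)  # hypothetical training callback
            if rate >= threshold:
                return combo, name, rate
            if best is None or rate > best[0]:
                best = (rate, combo, name)
    if best is None:
        raise ValueError("no feature combination or classifier was supplied")
    return best[1], best[2], best[0]
```

In practice `train_and_score` could wrap any library implementation of NB, SVM, DT or KNN; the search logic itself does not depend on which one is used.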
Table 2 shows the detection performance test results for an embodiment of the malware detection method provided by the present invention.
Table 2: Detection performance test results
Figure BDA0001896816610000131
The specific test procedure is as follows:
A sample set of 5,000 benign applications obtained from the Anzhi market and 5,000 malware samples from VirusShare is used, and testing is performed by 10-fold cross-validation: the sample data is divided into 10 mutually exclusive subsets of similar size; in each round the union of 9 subsets is used as the training set and the remaining subset as the test set; training and testing are carried out 10 times, and the mean of the 10 test results is taken as the final result.
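The 10-fold procedure described above can be sketched with the standard library alone. The names `k_fold_splits`, `cross_validate` and `evaluate` are hypothetical, and `evaluate` stands in for training on the training indices and scoring on the test indices. Shuffling with a fixed seed is an added assumption, made so that folds are not ordered by class.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n_samples, k, seed)]
    return sum(scores) / len(scores)
```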
The detection performance obtained without a feature selection algorithm is compared with that obtained with the feature selection algorithm based on feature frequency, information gain and the $\chi^2$ statistic.
The performance indicators are defined as follows:
(1) TPR (detection rate) is the proportion of actual positive samples that the classifier correctly classifies as positive. The larger the TPR, the better the classifier performs on positive samples. It is calculated as:
$$TPR = \frac{TP}{TP + FN}$$
(2) FPR (false alarm rate) is the proportion of actual negative samples that the classifier wrongly classifies as positive. The larger the FPR, the worse the classifier performs on negative samples. It is calculated as:
$$FPR = \frac{FP}{FP + TN}$$
(3) Acc (accuracy) is the proportion of all samples that the classifier classifies correctly. It reflects how accurately the classifier classifies; the larger the Acc, the better the overall classification ability of the classifier. It is calculated as:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
In these formulas, TP (true positive) is the number of actual positive samples detected as positive, i.e. correctly detected positives; FP (false positive) is the number of actual negative samples detected as positive, i.e. wrongly detected negatives; FN (false negative) is the number of actual positive samples detected as negative, i.e. wrongly classified positives; TN (true negative) is the number of actual negative samples detected as negative, i.e. correctly detected negatives.
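The three indicators can be computed directly from the four counts. The sketch below assumes labels are encoded as 1 for malware (positive) and 0 for benign (negative). The function name `detection_metrics` is hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something the text specifies.

```python
def detection_metrics(y_true, y_pred):
    """Compute TPR, FPR and Acc from true and predicted 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(num, den):
        # Convention chosen for this sketch: an empty denominator yields 0.0.
        return num / den if den else 0.0

    return {
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
        "Acc": ratio(tp + tn, tp + tn + fp + fn),
    }
```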
According to the test results, after the software features are optimized with the feature selection algorithm, the detection models trained with each of the four machine learning classification algorithms achieve a higher detection rate and accuracy, and a lower false alarm rate, than the detection models obtained by the traditional malware detection method without a feature selection algorithm.
The malware detection method provided by the invention therefore has higher detection efficiency and a better detection effect.
In another aspect of the invention, a malware detection apparatus is provided.
In some embodiments of a malware detection apparatus provided by the present invention, the apparatus comprises:
a feature extraction module, configured to extract feature information of the sample set software and abstract the feature information into a digital form to obtain a sample set feature set and a sample set feature matrix;
a subset generation module, configured to filter out invalid features in the sample set feature set with a feature selection algorithm to obtain an optimal feature subset;
a detection model generation module, configured to train the feature matrix corresponding to the optimal feature subset with a machine learning classification algorithm to generate a detection model.
In another aspect of the invention, a malware detection electronic device is provided.
In some embodiments of the present invention, an electronic device for malware detection includes:
a memory, a processor, and a computer program stored on the memory and executable on the processor.
When the processor executes the program, the malicious software detection method provided by the invention is realized.
The apparatus and the electronic device of the foregoing embodiments are used to implement the corresponding method in the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, also features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (7)

1. A malware detection method, comprising:
extracting feature information of sample set software, abstracting the feature information into a digital form, and obtaining a sample set feature set and a sample set feature matrix;
filtering invalid features in the feature set by using a feature selection algorithm to obtain an optimal feature subset;
training a feature matrix corresponding to the optimal feature subset by adopting a machine learning classification algorithm to generate a detection model;
wherein the filtering the invalid features in the sample set feature set by using the feature selection algorithm to obtain the optimal feature subset comprises:
step one: initializing the sample set feature set, the relevant parameter constants of the sample set feature matrix and the relevant parameters used in the subset generation process, including:
denoting the sample set feature set as $F_v$, the sample set feature matrix as $X_{train}$, and the number of selected features as $M_v$; setting the initial information gain threshold to a specific value $\theta_{ig}$, the information gain step size to $\lambda$, the initial value of the information gain loop step count $n$ to 0, and the detection rate threshold to 0.95; training the sample set feature matrix $X_{train}$ with a machine learning classification algorithm and recording the maximum detection rate as $TP_{max}$;
step two: calculating the feature frequency of each feature in the sample set feature set according to a feature frequency calculation formula, and filtering out irrelevant features through calculation and comparison to obtain a decorrelated feature subset;
the feature frequency calculation formula is as follows:
Figure FDA0002760719390000011
wherein $TF(f_j)$ denotes the feature frequency of feature $f_j$; $N_{benign}$ denotes the number of normal samples in the normal software set, and the quantity in Figure FDA0002760719390000012 denotes the number of samples in which feature $f_j$ is present; $N_{malware}$ denotes the number of malicious samples in the malicious sample set, and the quantity in Figure FDA0002760719390000013 denotes the number of samples in which feature $f_j$ is present;
wherein calculating the feature frequency of each feature in the sample set feature set according to the feature frequency calculation formula, and filtering out irrelevant features through calculation and comparison to obtain the decorrelated feature subset, comprises:
step 1: calculating the feature frequency of each feature in the sample set feature set;
step 2: filtering out the features whose feature frequency is 0, the remaining features forming an intermediate feature subset;
step 3: training the feature matrix corresponding to the intermediate feature subset with the machine learning classification algorithm to obtain the corresponding detection rate $TP_{tf}$;
step 4: filtering out the features with the smallest feature frequency in the intermediate feature subset, the remaining features forming a reduced feature subset; training the feature matrix corresponding to the reduced feature subset with the machine learning classification algorithm to obtain the corresponding detection rate $TP_{tf}'$;
step 5: comparing $TP_{tf}$ and $TP_{tf}'$; if $TP_{tf} = TP_{tf}'$, taking the reduced feature subset as the new intermediate feature subset and returning to step 3; if $TP_{tf} \neq TP_{tf}'$, outputting the intermediate feature subset;
step 6: denoting the output intermediate feature subset as the feature subset $F_{v1}$ and the number of selected features as $M_{v1}$, the feature subset $F_{v1}$ being the decorrelated feature subset;
step three: calculating the information gain of each feature in the decorrelated feature subset according to an information gain calculation formula, and obtaining a de-noised feature subset through calculation, comparison and screening;
the information gain calculation formula is as follows:
$$IG(f_j) = H(Y) - H(Y \mid f_j)$$
wherein $IG(f_j)$ denotes the information gain of feature $f_j$ for the classification system, $H(Y)$ denotes the entropy of the classification system, and $H(Y \mid f_j)$ denotes the conditional entropy of the classification system;
step four: calculating, according to a $\chi^2$ statistic calculation formula, the CHI value ($\chi^2$ statistic) between each feature in the de-noised feature subset and the corresponding feature matrix and the CHI values between features, and obtaining a de-redundant feature subset through calculation, comparison and screening;
the $\chi^2$ statistic calculation formula is as follows:
$$CHI(f_i, f_j) = \xi_{11} + \xi_{12} + \xi_{21} + \xi_{22}$$
wherein $CHI(f_i, f_j)$ denotes the $\chi^2$ statistic of features $f_i$ and $f_j$; $\xi_{11}$ denotes the deviation between the theoretical value and the actual value for samples in which feature $f_i$ and feature $f_j$ both appear; $\xi_{12}$ denotes the deviation between the theoretical value and the actual value for samples in which feature $f_i$ appears and feature $f_j$ does not; $\xi_{21}$ denotes the deviation between the theoretical value and the actual value for samples in which feature $f_i$ does not appear and feature $f_j$ does; $\xi_{22}$ denotes the deviation between the theoretical value and the actual value for samples in which neither feature $f_i$ nor feature $f_j$ appears;
step five: analyzing and evaluating the de-redundant feature subset, and optimizing the subset according to the result to obtain the optimal feature subset.
2. The method of claim 1, wherein the extracting feature information of the sample set software, and abstracting the feature information into a digital form to obtain a sample set feature set and a sample set feature matrix comprises:
processing the sample set software installation package to obtain a global configuration file containing authority information and a decompiled file containing API information;
extracting corresponding authority and API characteristic information from the global configuration file and the decompilated file;
and vectorizing and abstracting the extracted authority and API characteristic information into a digital form to obtain a sample set characteristic set and a sample set characteristic matrix.
3. The method of claim 1, wherein step three comprises:
step 1: calculating the information gain of each feature in the decorrelated feature subset;
step 2: increasing the loop step count by 1, i.e. $n = n + 1$;
step 3: selecting the features satisfying $IG > (\theta_{ig} - (n-1)\lambda)$ to form a feature subset (Figure FDA0002760719390000031) and recording the number of selected features (Figure FDA0002760719390000032); selecting the features satisfying $IG > (\theta_{ig} - n\lambda)$ to form a feature subset (Figure FDA0002760719390000033) and recording the number of selected features (Figure FDA0002760719390000034);
step 4: comparing the values of Figure FDA0002760719390000035 and Figure FDA0002760719390000036; if the condition in Figure FDA0002760719390000037 holds, returning to step 2; if the condition in Figure FDA0002760719390000038 and Figure FDA0002760719390000039 holds, outputting the feature subset in Figure FDA00027607193900000310;
step 5: denoting the output feature subset (Figure FDA00027607193900000311) as the feature subset $F_{v2}$ and the number of selected features as $M_{v2}$, the feature subset $F_{v2}$ being the de-noised feature subset.
4. The method of claim 3, wherein step four comprises:
step 1: calculating the CHI value between each feature in the de-noised feature subset $F_{v2}$ and the corresponding feature matrix, and recording the largest CHI value as $\theta_{chi}$;
step 2: calculating the CHI values between features; for the features whose CHI value with another feature is greater than $\theta_{chi}$, selecting the feature with the smaller IG value, and arranging the selected features in descending order of CHI value to form a redundant feature set (Figure FDA00027607193900000312); recording the number of redundant features selected (Figure FDA00027607193900000313);
step 3: setting the loop step count $m = 0$;
step 4: increasing the loop step count by 1, i.e. $m = m + 1$;
step 5: following the order of the redundant feature set (Figure FDA00027607193900000314), removing redundant features from $F_{v2}$ to obtain a feature subset (Figure FDA00027607193900000315) and recording the number of selected features (Figure FDA00027607193900000316); training the feature matrix corresponding to this feature subset (Figure FDA00027607193900000317) with the machine learning classification algorithm to obtain the corresponding detection rate (Figure FDA00027607193900000318);
step 6: comparing $m$ with the value of Figure FDA00027607193900000319; if the condition in Figure FDA00027607193900000320 holds, returning to step 4; otherwise, executing the next step;
step 7: comparing all the detection rates (Figure FDA00027607193900000321) and recording the maximum detection rate (Figure FDA00027607193900000322); denoting the feature subset corresponding to the maximum detection rate (Figure FDA0002760719390000041) as $F_{v3}$ and the number of selected features as $M_{v3}$, the feature subset $F_{v3}$ being the de-redundant feature subset.
5. The method according to claim 4, wherein step five is specifically:
comparing the detection rate obtained in step four (Figure FDA0002760719390000042) with the maximum detection rate $TP_{max}$, and assigning the larger value to $TP_{max}$; comparing $TP_{max}$ with the initially set detection rate threshold of 0.95; if $TP_{max}$ is less than 0.95, returning to step three; if $TP_{max}$ is greater than or equal to 0.95, the feature subset corresponding to $TP_{max}$ is the optimal feature subset, denoted $F_v$.
6. A malware detection apparatus, comprising:
the characteristic extraction module is used for extracting the characteristic information of the sample set software, abstracting the characteristic information into a digital form and obtaining a sample set characteristic set and a sample set characteristic matrix;
the subset generation module is used for filtering invalid features in the feature set by using a feature selection algorithm to obtain an optimal feature subset;
the detection model generation module is used for training the feature matrix corresponding to the optimal feature subset by adopting a machine learning classification algorithm to generate a detection model;
the subset generation module filters invalid features in the sample set feature set by using a feature selection algorithm to obtain an optimal feature subset, and the method comprises the following steps:
step one: initializing the sample set feature set, the relevant parameter constants of the sample set feature matrix and the relevant parameters used in the subset generation process, including:
denoting the sample set feature set as $F_v$, the sample set feature matrix as $X_{train}$, and the number of selected features as $M_v$; setting the initial information gain threshold to a specific value $\theta_{ig}$, the information gain step size to $\lambda$, the initial value of the information gain loop step count $n$ to 0, and the detection rate threshold to 0.95; training the sample set feature matrix $X_{train}$ with a machine learning classification algorithm and recording the maximum detection rate as $TP_{max}$;
step two: calculating the feature frequency of each feature in the sample set feature set according to a feature frequency calculation formula, and filtering out irrelevant features through calculation and comparison to obtain a decorrelated feature subset;
the feature frequency calculation formula is as follows:
Figure FDA0002760719390000043
wherein $TF(f_j)$ denotes the feature frequency of feature $f_j$; $N_{benign}$ denotes the number of normal samples in the normal software set, and the quantity in Figure FDA0002760719390000044 denotes the number of samples in which feature $f_j$ is present; $N_{malware}$ denotes the number of malicious samples in the malicious sample set, and the quantity in Figure FDA0002760719390000045 denotes the number of samples in which feature $f_j$ is present;
wherein the subset generation module calculating the feature frequency of each feature in the sample set feature set according to the feature frequency calculation formula, and filtering out irrelevant features through calculation and comparison to obtain the decorrelated feature subset, comprises:
step 1: calculating the feature frequency of each feature in the sample set feature set;
step 2: filtering out the features whose feature frequency is 0, the remaining features forming an intermediate feature subset;
step 3: training the feature matrix corresponding to the intermediate feature subset with the machine learning classification algorithm to obtain the corresponding detection rate $TP_{tf}$;
step 4: filtering out the features with the smallest feature frequency in the intermediate feature subset, the remaining features forming a reduced feature subset; training the feature matrix corresponding to the reduced feature subset with the machine learning classification algorithm to obtain the corresponding detection rate $TP_{tf}'$;
step 5: comparing $TP_{tf}$ and $TP_{tf}'$; if $TP_{tf} = TP_{tf}'$, taking the reduced feature subset as the new intermediate feature subset and returning to step 3; if $TP_{tf} \neq TP_{tf}'$, outputting the intermediate feature subset;
step 6: denoting the output intermediate feature subset as the feature subset $F_{v1}$ and the number of selected features as $M_{v1}$, the feature subset $F_{v1}$ being the decorrelated feature subset;
step three: calculating the information gain of each feature in the decorrelated feature subset according to an information gain calculation formula, and obtaining a de-noised feature subset through calculation, comparison and screening;
the information gain calculation formula is as follows:
$$IG(f_j) = H(Y) - H(Y \mid f_j)$$
wherein $IG(f_j)$ denotes the information gain of feature $f_j$ for the classification system, $H(Y)$ denotes the entropy of the classification system, and $H(Y \mid f_j)$ denotes the conditional entropy of the classification system;
step four: calculating, according to a $\chi^2$ statistic calculation formula, the CHI value ($\chi^2$ statistic) between each feature in the de-noised feature subset and the corresponding feature matrix and the CHI values between features, and obtaining a de-redundant feature subset through calculation, comparison and screening;
the $\chi^2$ statistic calculation formula is as follows:
$$CHI(f_i, f_j) = \xi_{11} + \xi_{12} + \xi_{21} + \xi_{22}$$
wherein $CHI(f_i, f_j)$ denotes the $\chi^2$ statistic of features $f_i$ and $f_j$; $\xi_{11}$ denotes the deviation between the theoretical value and the actual value for samples in which feature $f_i$ and feature $f_j$ both appear; $\xi_{12}$ denotes the deviation between the theoretical value and the actual value for samples in which feature $f_i$ appears and feature $f_j$ does not; $\xi_{21}$ denotes the deviation between the theoretical value and the actual value for samples in which feature $f_i$ does not appear and feature $f_j$ does; $\xi_{22}$ denotes the deviation between the theoretical value and the actual value for samples in which neither feature $f_i$ nor feature $f_j$ appears;
step five: analyzing and evaluating the de-redundant feature subset, and optimizing the subset according to the result to obtain the optimal feature subset.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 5 when executing the computer program.
CN201811495637.7A 2018-12-07 2018-12-07 Malicious software detection method and device and electronic equipment Active CN109784046B (en)

Publications (2)

Publication Number Publication Date
CN109784046A CN109784046A (en) 2019-05-21
CN109784046B true CN109784046B (en) 2021-02-02

Family

ID=66495778

Country Status (1)

Country Link
CN (1) CN109784046B (en)

Gad et al. Active learning on weighted graphs using adaptive and non-adaptive approaches
Kabin et al. Horizontal Attacks using K-Means: Comparison with Traditional Analysis Methods
CN115758337A (en) Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium
CN115643065A (en) Network attack event detection method and system
WO2023129762A2 (en) A design automation methodology based on graph neural networks to model integrated circuits and mitigate hardware security threats
Chakraborty et al. Dynamarks: Defending against deep learning model extraction using dynamic watermarking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant