CN112241532A - Method for generating and detecting malignant confrontation sample based on jacobian matrix - Google Patents

Publication number
CN112241532A
Authority
CN
China
Prior art keywords
sample
detection model
malignant
sample set
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010982111.2A
Other languages
Chinese (zh)
Other versions
CN112241532B (en
Inventor
陈红松
曹永瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN202010982111.2A priority Critical patent/CN112241532B/en
Publication of CN112241532A publication Critical patent/CN112241532A/en
Application granted granted Critical
Publication of CN112241532B publication Critical patent/CN112241532B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561 Virus type analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Virology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a method for generating and detecting malignant countermeasure samples based on a Jacobian matrix, and belongs to the technical field of computer security. The method comprises the following steps: loading a malignant software detection model trained on an original training sample set; generating countermeasure samples from the malignant software in the original test sample set, where a countermeasure sample counts as successfully generated only when the detection model's prediction confidence that it is benign exceeds a set confidence threshold, thereby producing a countermeasure sample set; and combining the original training sample set and the generated countermeasure sample set in different proportions to form corresponding countermeasure training sets, inputting each countermeasure training set into the detection model for countermeasure training, and selecting the detection model with high malignant software detection accuracy as the final detection model. The method can improve the robustness of the detection model when detecting malignant software and countermeasure samples.

Description

Method for generating and detecting malignant confrontation sample based on jacobian matrix
Technical Field
The invention relates to the technical field of computer security, in particular to a method for generating and detecting a malignant confrontation sample based on a Jacobian matrix.
Background
With the development of artificial intelligence technology, Deep Neural Networks (DNNs) are widely used in fields such as computer vision and natural language processing, as well as in computer security applications such as malignant software detection. Early malignant software detection models are easily disturbed by countermeasure samples: a carefully crafted countermeasure sample is readily misclassified, so the malignant software evades the detection model and the attacker achieves their goal. Researchers therefore proposed countermeasure training, in which countermeasure samples are added to the training set used to train the detection model, strengthening its robustness. However, the countermeasure samples produced by conventional generation methods are not sufficiently adversarial, so a detection model trained on them does not learn enough countermeasure information. In addition, there is no explicit method for constructing the training set.
In the prior art, Kathrin et al. proposed a Jacobian-matrix-based method for generating countermeasure samples for Android malignant software. That method takes the detection model classifying the Android malignant software as benign as the condition for successful generation, and the quality of the resulting countermeasure samples is not high. Moreover, simply adding the countermeasure sample set to the original training sample set for countermeasure training does not yield a highly robust detection model.
Disclosure of Invention
The embodiment of the invention provides a method for generating and detecting a malignant countermeasure sample based on a Jacobian matrix, which can improve the robustness of a detection model for detecting malignant software and the countermeasure sample. The technical scheme is as follows:
in one aspect, a method for generating and detecting a malignant confrontation sample based on a jacobian matrix is provided, and the method is applied to an electronic device and comprises the following steps:
loading a malignant software detection model trained by using an original training sample set;
generating countermeasure samples from the malignant software in the original test sample set, with the detection model's prediction confidence that a countermeasure sample is benign exceeding a set confidence threshold as the condition for successful generation, to produce a countermeasure sample set;
and forming corresponding countermeasure training sets by the original training sample set and the generated countermeasure sample set according to different proportions, respectively inputting each countermeasure training set into a detection model for countermeasure training, and selecting the detection model with high malignant software detection accuracy as a final detection model.
Further, before loading the malicious software detection model trained using the original training sample set, the method includes:
extracting characteristics for detecting the malignant software from each application program to form respective sample files;
numbering different features in all sample files to obtain a total number n of the features, and converting the features in each sample file into corresponding numbering values;
converting the characteristics in each sample file into one-dimensional characteristic vectors consisting of n elements to obtain a matrix consisting of the characteristic vectors of all samples, and marking the sample labels as benign or malignant;
selecting m characteristics with the largest influence on classification from a matrix formed by the characteristic vectors of all samples to form a sample characteristic vector;
dividing all samples with m characteristics into an original training sample set and an original testing sample set;
constructing a malignant software detection model; wherein the malignant software detection model comprises: an input layer, a first hidden layer connected with the input layer, a second hidden layer connected with the first hidden layer, and an output layer connected with the second hidden layer;
the malignant software detection model is trained using an original training sample set.
Further, the converting the features in each sample file into a one-dimensional feature vector consisting of n elements, and labeling the sample label as benign or malignant comprises:
converting the features in each sample file into one-dimensional feature vectors consisting of n elements of 0 and 1 to obtain a matrix consisting of the feature vectors of all samples, wherein the vector position corresponding to the feature of the sample file containing the number value is 1, and the vector position corresponding to the feature not containing the number value is 0;
the sample label is labeled 0 or 1, where label 0 indicates benign and label 1 indicates malignant.
Further, the dividing all samples with m features into an original training sample set and an original testing sample set includes:
renumbering the selected m features from 1 to m, and reducing the one-dimensional feature vector of each sample to a vector of m elements, each 0 or 1;
and dividing all samples with the reduced features into an original training sample set and an original testing sample set.
Further, generating the countermeasure sample set from the malignant software in the original test sample set, with the detection model's benign prediction confidence exceeding a set confidence threshold as the condition for successful generation, includes:
step A1, traversing each sample in the original test sample set, if the label of the sample is malignant, executing step A2, otherwise, continuously traversing the next sample;
step A2, setting a confidence threshold α, a modifiable feature index set Γ = {0, ..., m-1}, and the countermeasure sample x* = x_mal, wherein x_mal is an unmodified malignant software sample from the input original test sample set, and m is the total number of features of the malignant software sample;
step A3, initializing the index of the feature to be modified i_max = 0, calculating the forward derivative matrix forward_derivative of the detection model with respect to the input x_mal, and deleting the derivative values of features not in Γ;
step A4, letting μ = max(forward_derivative[0]) and ν = max(forward_derivative[1]), wherein column 0 of the forward derivative matrix, forward_derivative[0], holds the derivative values of the benign class output with respect to the input x_mal, max(forward_derivative[0]) is the maximum derivative value of the benign class, column 1 of the forward derivative matrix, forward_derivative[1], holds the derivative values of the malignant class output with respect to the input x_mal, and max(forward_derivative[1]) is the maximum derivative value of the malignant class;
step A5, comparing μ and ν: if μ > ν, setting i_max to the feature index corresponding to the maximum derivative value of the benign class and, when the feature at index i_max is not present in x*, setting the step size θ = 1; if μ < ν, setting i_max to the feature index corresponding to the maximum derivative value of the malignant class and, when the feature at index i_max is present in x*, setting the step size θ = -1;
step A6, modifying the countermeasure sample x* by
x*_{i_max} = x*_{i_max} + θ
wherein x*_{i_max} denotes the feature value of x* at index i_max;
step A7, removing the modified feature index from the modifiable feature index set Γ, i.e. deleting i_max from Γ;
step A8, inputting x* into the detection model to obtain the benign prediction confidence F0(x*); if F0(x*) < α and |Γ| > 0, returning to step A3, otherwise returning the countermeasure sample x*, wherein |Γ| is the number of modifiable features, and F0(x*) is obtained by inputting the countermeasure sample x* into the classification model F, which outputs prediction results for the benign and malignant classes, and taking the first output as the probability of being predicted benign, namely the benign prediction confidence.
Further, the formation of the confrontation training set is represented as:
x_advTrain = β·x_train + (1-β)·x_adv
wherein x_advTrain represents the countermeasure training set, x_train represents the original training sample set, x_adv represents the generated countermeasure sample set, and β is the proportion of the samples in the original training sample set that are used to form the countermeasure training set, relative to all samples in the original training sample set.
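The mixing rule can be sketched in Python as follows. This is only one possible reading of the formula: it assumes β selects a fraction of the original training samples and (1-β) selects a fraction of the countermeasure samples, and the function name `build_adversarial_training_set`, the `seed` parameter, and the use of random sampling are all illustrative assumptions rather than details stated in the text.

```python
import random


def build_adversarial_training_set(x_train, x_adv, beta, seed=0):
    """Mix original and countermeasure samples into one training set.

    Assumption: beta is the fraction of x_train that is kept and
    (1 - beta) is the fraction of x_adv that is kept. Each element of
    the input lists is one sample (features together with its label).
    """
    rng = random.Random(seed)
    n_orig = round(beta * len(x_train))
    n_adv = round((1 - beta) * len(x_adv))
    # Draw the two parts without replacement, then shuffle them together.
    mixed = rng.sample(x_train, n_orig) + rng.sample(x_adv, n_adv)
    rng.shuffle(mixed)
    return mixed
```

Calling the function once per candidate β (for example 0.1, 0.3, 0.5, ...) would give the family of countermeasure training sets that the method then trains on separately.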
Further, the selecting the detection model with high detection accuracy of the malignant software as the final detection model comprises:
testing the accuracy of each detection model on the original test sample set x_test and on the countermeasure sample set x_adv; for each detection model, multiplying its accuracy on x_test by its accuracy on x_adv; and selecting the detection model with the largest product as the final detection model.
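The selection rule amounts to an argmax over the product of two accuracies. A minimal Python sketch, in which the function name `select_final_model` and the dictionary layout are hypothetical and the accuracy values are made-up placeholders, not results from the patent:

```python
def select_final_model(results):
    """Return the key of the model with the largest accuracy product.

    results maps a model identifier to a pair
    (accuracy on x_test, accuracy on x_adv).
    """
    return max(results, key=lambda name: results[name][0] * results[name][1])


# Placeholder accuracies for three hypothetical values of beta.
candidates = {
    "beta=0.9": (0.98, 0.60),
    "beta=0.5": (0.97, 0.90),
    "beta=0.1": (0.90, 0.95),
}
best = select_final_model(candidates)
```

Using the product rather than either accuracy alone penalizes a model that does well on clean samples but poorly on countermeasure samples, or the reverse.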
In one aspect, an electronic device is provided and includes a processor and a memory, where at least one instruction is stored in the memory and loaded by and executed by the processor to implement the above-described method for generating and detecting a malignant countermeasure sample based on a jacobian matrix.
In one aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the above method for generating and detecting a malignant countermeasure sample based on a jacobian matrix.
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
In the embodiment of the invention, a confidence threshold is set, and the detection model's prediction confidence that a countermeasure sample is benign exceeding this threshold is taken as the condition for successful generation. This makes the generated countermeasure samples more adversarial, so that a detection model trained on them can learn more countermeasure information, improving its detection accuracy on malignant software and countermeasure samples. In addition, the original training sample set and the countermeasure sample set are combined in different proportions into countermeasure training sets for countermeasure training, and the detection model with high malignant software detection accuracy is selected as the final detection model, which further improves the robustness of the detection model when detecting malignant software and countermeasure samples and reduces its vulnerability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for generating and detecting a malignant confrontation sample based on a Jacobian matrix according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of a method for generating and detecting a malignant confrontation sample based on a Jacobian matrix according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a single sample file after feature extraction according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a portion of a feature reduction option provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a detection model according to an embodiment of the present invention;
FIG. 6 is a sample schematic diagram of android malignant software provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of an android malicious software countermeasure sample provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides a method for generating and detecting a malignant confrontation sample based on a jacobian matrix, where the method may be implemented by an electronic device, and the electronic device may be a terminal or a server, and the method includes:
S101, loading a malignant software detection model trained on an original training sample set;
S102, generating countermeasure samples from the malignant software in the original test sample set, with the detection model's prediction confidence that a countermeasure sample is benign exceeding a set confidence threshold as the condition for successful generation, to produce a countermeasure sample set;
S103, combining the original training sample set and the generated countermeasure sample set in different proportions to form corresponding countermeasure training sets, inputting each countermeasure training set into the detection model for countermeasure training, and selecting the detection model with high malignant software detection accuracy as the final detection model.
In the method for generating and detecting malignant countermeasure samples based on the Jacobian matrix, a confidence threshold is set, and the detection model's prediction confidence that a countermeasure sample is benign exceeding this threshold is taken as the condition for successful generation. This makes the generated countermeasure samples more adversarial, so that a detection model trained on them can learn more countermeasure information, improving its detection accuracy on malignant software and countermeasure samples. In addition, the original training sample set and the countermeasure sample set are combined in different proportions into countermeasure training sets for countermeasure training, and the detection model with high malignant software detection accuracy is selected as the final detection model, which further improves the robustness of the detection model when detecting malignant software and countermeasure samples and reduces its vulnerability.
In the foregoing embodiment of the method for generating and detecting a malignant confrontation sample based on a jacobian matrix, further, before loading a malignant software detection model trained by an original training sample set, as shown in fig. 2, the method includes:
H1, extracting features for detecting malignant software from each application program to form respective sample files;
In this embodiment, assuming the application is an Android application, static features for detecting malignant software may be extracted from each Android application by static analysis to form a sample file, which may specifically include the following steps:
H11: parsing the program manifest and extracting the corresponding features;
In this embodiment, every application developed for Android must include a program manifest file named AndroidManifest.xml, which provides data supporting installation and subsequent execution of the application. The static feature sets that can be extracted from this file include one or more of: hardware components, requested permissions, application components, and filtered intents; wherein the hardware components include cameras, touch screens, GPS, etc.; the requested permissions include sending short messages, reading contacts, etc.; and the application components include activities, services, content providers, broadcast receivers, etc.
H12: obtaining features from the disassembled code, including API calls and network communication addresses.
H13: writing the features extracted from each Android application by static analysis into a file whose name is the SHA256 hash result, forming the respective sample files.
In this embodiment, fig. 3 is a schematic diagram of a single sample file.
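Step H13 can be illustrated with a short Python sketch. The text does not say what exactly is hashed, so hashing the raw bytes of the application package is an assumption here, as are the function name `write_sample_file`, the one-feature-per-line layout, and the feature strings used in the usage example.

```python
import hashlib
import tempfile
from pathlib import Path


def write_sample_file(apk_bytes, features, out_dir):
    """Write one feature per line to a file named after a SHA-256 digest.

    Assumption: the digest is taken over the raw application bytes.
    """
    digest = hashlib.sha256(apk_bytes).hexdigest()
    out_path = Path(out_dir) / digest
    out_path.write_text("\n".join(features) + "\n", encoding="utf-8")
    return out_path


# Usage with placeholder bytes and illustrative feature strings.
demo_dir = tempfile.mkdtemp()
demo_path = write_sample_file(
    b"placeholder application bytes",
    ["permission::SEND_SMS", "api_call::getDeviceId"],
    demo_dir,
)
```

Naming the file after the digest gives every application a stable, collision-resistant identifier, so re-extracting the same application overwrites the same sample file.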
H2, numbering different features in all sample files to obtain a total number n of the features, and converting the features in each sample file into corresponding number values;
In this embodiment, each sample file is traversed, each distinct feature is assigned a number to obtain the total number n of features, and the features in the original file are converted into the corresponding numbers.
H3, converting the features in each sample file into one-dimensional feature vectors consisting of n elements to obtain a matrix consisting of the feature vectors of all samples, and marking the sample labels as benign or malignant; the method specifically comprises the following steps:
h31, converting the features in each sample file into one-dimensional feature vectors consisting of n elements 0 and 1 to obtain a matrix consisting of the feature vectors of all samples, wherein the vector position corresponding to the feature of the sample file containing the serial number value is 1, and the vector position corresponding to the feature not containing the serial number value is 0;
h32, labeling the sample label as 0 or 1, wherein label 0 indicates benign and label 1 indicates malignant.
In this embodiment, let x be the matrix formed by the feature vectors of all samples, l the number of samples, and y the sample label matrix. Then x.shape = (l, n) and y.shape = (l,), where shape denotes the shape of a matrix: x.shape is the shape of the feature matrix x, and y.shape is the shape of the label matrix y.
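Steps H2 and H3 can be sketched in plain Python as follows. The function name `build_feature_matrix`, the dictionary-based input, the alphabetical numbering order, and the 0-based numbering are illustrative assumptions; the text only requires that each distinct feature receive a number and that each sample become a 0/1 vector of length n.

```python
def build_feature_matrix(samples, malignant_names):
    """Turn per-sample feature sets into a 0/1 matrix x and a label list y.

    samples maps a sample name to the set of feature strings in its file;
    malignant_names is the set of sample names labeled malignant.
    """
    # H2: assign a number to every distinct feature (0-based here).
    all_features = sorted(set().union(*samples.values()))
    numbering = {feature: index for index, feature in enumerate(all_features)}
    n = len(numbering)

    names = sorted(samples)
    x = []
    for name in names:
        row = [0] * n                      # H31: one slot per numbered feature
        for feature in samples[name]:
            row[numbering[feature]] = 1    # 1 where the sample has the feature
        x.append(row)
    # H32: label 0 is benign, label 1 is malignant.
    y = [1 if name in malignant_names else 0 for name in names]
    return x, y, numbering


x, y, numbering = build_feature_matrix(
    {"app_a": {"p1", "p2"}, "app_b": {"p2", "p3"}},
    malignant_names={"app_b"},
)
```

With l samples and n distinct features, `x` has l rows of length n and `y` has l entries, matching the shapes described above.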
H4, selecting the m features that have the largest influence on classification from the matrix formed by the feature vectors of all samples to form the sample feature vectors, reducing the feature vector of each sample to m features;
In this embodiment, the SelectKBest function in the scikit-learn library can be used. It scores the degree of correlation between each feature and the classification result based on the chi-square test, and then selects the top m features most correlated with the classification (benign or malignant), ranked from highest to lowest score, as shown in fig. 4.
In this embodiment, SelectKBest is a feature selection method used as SelectKBest(chi2, k=m).fit_transform(x, y), which selects the k best features according to the parameters passed in; here fit_transform performs feature selection on the incoming data (x, y), chi2 is the scoring function based on the chi-square test, m is the number of features to select, x is the matrix formed by the feature vectors of all samples, and y is the sample label matrix. fit_transform returns the reduced feature matrix, and the index positions of the selected features in the original feature space can be read from the fitted selector with get_support(indices=True).
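To make the scoring step concrete, the following dependency-free Python sketch computes a chi-square score per feature and keeps the top m. It is a from-scratch stand-in written for illustration, not scikit-learn's code: the function names `chi2_scores` and `select_k_best` are hypothetical, labels are assumed to be 0 or 1, and features that never occur are given a score of 0 as a simplifying assumption.

```python
def chi2_scores(x, y):
    """Chi-square score of each non-negative feature column against 0/1 labels."""
    n_samples, n_features = len(x), len(x[0])
    class_prob = [y.count(0) / n_samples, y.count(1) / n_samples]
    scores = []
    for j in range(n_features):
        observed = [0.0, 0.0]              # feature mass seen in each class
        for row, label in zip(x, y):
            observed[label] += row[j]
        total = observed[0] + observed[1]
        score = 0.0
        for c in (0, 1):
            expected = class_prob[c] * total   # mass expected if independent
            if expected > 0:
                score += (observed[c] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(x, y, m):
    """Keep the m highest-scoring columns; return (reduced matrix, kept indices)."""
    scores = chi2_scores(x, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:m])              # preserve original column order
    return [[row[j] for j in keep] for row in x], keep


toy_x = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
toy_y = [0, 0, 1, 1]
reduced, kept = select_k_best(toy_x, toy_y, m=2)
```

In the toy data the first two columns separate the classes perfectly and the third is uninformative, so the third column is the one dropped.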
H5, dividing all samples with m characteristics into an original training sample set and an original testing sample set;
In this embodiment, the m features selected in step H4 are renumbered from 1 to m, and the one-dimensional feature vector of each sample is reduced to a vector of m elements, each 0 or 1; then all samples with reduced features are divided in a certain proportion into an original training sample set x_train (including features and labels) and an original test sample set x_test (including features and labels); both the original training sample set and the original test sample set contain malignant software and benign software in a certain proportion.
H6, constructing a malignant software detection model; wherein the malignancy software detection model comprises: an Input layer (Input), a first Hidden layer (Hidden layer) connected to the Input layer, a second Hidden layer connected to the first Hidden layer, and an Output layer (Output) connected to the second Hidden layer, as shown in fig. 5;
In this embodiment, the detection model is a deep neural network whose input layer has m neurons and whose output layer has 2 neurons. The output layer uses softmax (the normalized exponential function) to convert logit values into the corresponding class probabilities; each hidden layer uses ReLU (rectified linear unit) as its activation and dropout to prevent overfitting; and training uses a cross-entropy loss function with Adam (adaptive moment estimation) as the optimizer.
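The forward pass of such a network can be sketched without any framework. The sketch below is illustrative only: the hidden-layer width of 8, the random initialization, and all function names are assumptions (the text does not give layer sizes), dropout is omitted because it is inactive at prediction time, and training with cross-entropy and Adam is not shown.

```python
import math
import random


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    top = max(v)                           # subtract the max for stability
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(v, weights, bias):
    """Affine layer: weights[i][j] connects input i to unit j."""
    return [sum(v[i] * weights[i][j] for i in range(len(v))) + bias[j]
            for j in range(len(bias))]


def init_layer(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_out)]
               for _ in range(n_in)]
    return weights, [0.0] * n_out


def build_model(m, hidden=(8, 8), seed=0):
    """Input of m features, two hidden layers (assumed widths), 2 outputs."""
    rng = random.Random(seed)
    sizes = [m, *hidden, 2]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict(model, x):
    h = x
    for weights, bias in model[:-1]:
        h = relu(dense(h, weights, bias))  # hidden layers use ReLU
    weights, bias = model[-1]
    return softmax(dense(h, weights, bias))  # [P(benign), P(malignant)]


demo_model = build_model(m=5)
demo_probs = predict(demo_model, [1, 0, 1, 1, 0])
```

The first element of the returned list plays the role of the benign prediction confidence used later when generating countermeasure samples.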
H7, training the malignancy software detection model using the original training sample set.
In this embodiment, the number of training rounds is set and training is performed using x_train; the construction and training of the detection model are carried out with the Keras neural network framework.
In this embodiment, the trained detection model is saved as a local file, and its accuracy is tested using the original test sample set; the trained detection model achieves an accuracy above 98% on the original test sample set.
In this embodiment, the detection model saved as a local file may be loaded for generating the countermeasure sample.
In the foregoing specific embodiment of the method for generating and detecting a malignant countermeasure sample based on a Jacobian matrix, further, generating the countermeasure sample set from the malignant software in the original test sample set, with the detection model's benign prediction confidence exceeding a set confidence threshold as the condition for successful generation, includes:
step A1, traversing each sample in the original test sample set, if the label of the sample is malignant, namely the label value is 1, executing step A2, otherwise, continuously traversing the next sample;
step a2, set confidence threshold α (preferably set α ═ 0.9), modifiable feature index Γ of 0 to m-1, countermeasure sample x*=xmalWherein x ismalFor unmodified malignant software samples in the original test sample set of inputs, m is malignant softThe total number of the features of the piece sample, wherein Γ is used for recording whether the features corresponding to the indexes can be modified or not, and if the features are modified, the indexes of the modified features are deleted from Γ;
step A3, initializing the feature index i to be modifiedmaxCalculating the detection model pair input x as 0malForward derivative matrix forward _ derivative, and deleting feature derivative values not in Γ;
step a4, let μ equal to max (forward _ derivative [0 ]])、ν=max(forward_derivative[1]) Wherein the 0 th column of the forward derivative matrix forward _ derivative [0 ]]Representing benign class output versus input xmalDerivative value of max (forward _ derivative [0 ]]) Column 1 forward _ derivative [1 ] of the forward derivative matrix, representing the maximum derivative value of a benign class]Representing the malignant class of output versus input xmalDerivative value of (1), max (forward _ derivative [)]) A maximum derivative value representing a malignancy class;
step A5, comparing the sizes of mu and v, if mu>V, then the feature index i is to be modifiedmaxSet as the characteristic index corresponding to the maximum derivative value of the benign class (i.e. the characteristic index corresponding to the maximum value mu) and when imaxIndex position feature value at x*If not, setting the step size θ to 1 (i.e., adding the feature); mu.s of<V, then the feature index i is to be modifiedmaxSetting a characteristic index corresponding to the maximum derivative value of the malignant class (namely, the characteristic index corresponding to the maximum value v) and when i ismaxIndex position feature value at x*If so, setting the step size θ to-1 (i.e., deleting the feature); therefore, in each iteration, the addition or deletion of the features can be automatically selected according to the magnitude of the forward derivative value, so that the mode of generating the confrontation sample is more flexible, and the confrontation of the generated confrontation sample is further improved.
Step A6, modify the adversarial sample x* by
$$x^*_{i_{max}} = x^*_{i_{max}} + \theta$$
where $x^*_{i_{max}}$ denotes the feature value of x* at index i_max; the feature at position i_max is thus modified: if the maximum derivative value comes from the benign-class column of the forward derivative matrix, the feature is added, otherwise the feature is deleted;
Step A7, remove the modified feature index from the set of modifiable feature indices Γ, i.e., delete i_max from Γ;
Step A8, input x* into the detection model to obtain the benign prediction confidence F_0(x*); if F_0(x*) < α and |Γ| > 0 (i.e., the number of modifiable features is greater than 0), return to step A3, otherwise return the adversarial sample x*. Here |Γ| denotes the number of modifiable features and F_0(x*) = F(x*)[0]: the adversarial sample x* is input into the classification model F to obtain the benign and malicious prediction results, and the first output is taken as the probability of being predicted benign, i.e., the benign prediction confidence.
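As an illustration only, the loop of steps A2 to A8 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the names `generate_adversarial` and `toy_predict` are hypothetical, the forward derivative is estimated by finite differences rather than taken from a trained network, it is evaluated at the current x* (which equals x_mal on the first pass), no change is applied when the selected feature already has the target value, and the case μ = ν, which the steps leave open, is handled by the deletion branch.

```python
import numpy as np


def forward_derivative(predict, x, eps=1e-4):
    """Central-difference estimate of the 2 x m matrix d predict(x)[j] / d x[i]."""
    x = np.asarray(x, dtype=float)
    jac = np.zeros((2, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        jac[:, i] = (predict(x + step) - predict(x - step)) / (2 * eps)
    return jac


def generate_adversarial(predict, x_mal, alpha=0.9):
    """Sketch of steps A2-A8 for one malicious sample with 0/1 features."""
    x_adv = np.asarray(x_mal, dtype=float).copy()    # A2: x* = x_mal
    gamma = list(range(x_adv.size))                  # A2: modifiable indices
    while True:
        # A3: forward derivative, restricted to indices still in gamma.
        # Assumption: evaluated at the current x*.
        jac = forward_derivative(predict, x_adv)
        idx = np.array(gamma)
        benign, malicious = jac[0, idx], jac[1, idx]
        mu, nu = benign.max(), malicious.max()       # A4
        if mu > nu:                                  # A5: candidate to add
            i_max = int(idx[benign.argmax()])
            theta = 1.0 if x_adv[i_max] == 0 else 0.0
        else:                                        # A5: candidate to delete
            # Assumption: mu == nu also lands in this branch.
            i_max = int(idx[malicious.argmax()])
            theta = -1.0 if x_adv[i_max] == 1 else 0.0
        x_adv[i_max] += theta                        # A6
        gamma.remove(i_max)                          # A7
        if predict(x_adv)[0] >= alpha or not gamma:  # A8
            return x_adv


def toy_predict(x):
    """Hypothetical stand-in detector returning [P(benign), P(malicious)]."""
    w = np.array([2.0, -3.0, 1.5, -0.5])
    p_benign = 1.0 / (1.0 + np.exp(-(w @ x - 1.0)))
    return np.array([p_benign, 1.0 - p_benign])


x_adv = generate_adversarial(toy_predict, np.array([0, 1, 0, 1]))
print(x_adv, toy_predict(x_adv)[0])
```

In this toy run the sample is modified four times (two features deleted, two added) before the benign confidence exceeds α = 0.9; with a real detection model, `toy_predict` would be replaced by the model's own prediction function.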
In this embodiment, Fig. 6 is a schematic diagram of an Android malicious software sample, and Fig. 7 is a schematic diagram of a generated Android malicious software adversarial sample.
In this embodiment, the detection model stored as a local file is loaded and, following steps A1 to A8, the improved Jacobian-matrix-based adversarial sample generation method is applied on the detection model to generate adversarial samples from the malicious software in the original test sample set, so that the adversarial samples evade detection by the detection model. The returned adversarial samples are highly adversarial (their benign prediction confidence on the detection model reaches the threshold α, e.g. 0.9) and carry more adversarial information, so the detection model can learn more adversarial information when adversarial training is performed with them.
In this embodiment, the confidence threshold α takes values in [0.5, 1], and the larger α is, the stronger the adversarial effect obtained against the detection model. Each call of the algorithm generates one malicious software adversarial sample; calling it in a loop produces a corresponding adversarial sample for every malicious software sample in the original test set, yielding the adversarial sample set x_adv.
In this embodiment, the adversarial sample set x_adv is then used to test the detection model, giving the accuracy of the detection model on x_adv.
In this embodiment, if the proportion of the adversarial sample set is too small, the detection model cannot learn enough adversarial information and its robustness to adversarial samples improves little; if the proportion is too large, the accuracy of the detection model on the original test sample set drops significantly. An appropriate proportion must therefore be determined when composing the adversarial training set, so that adversarial training yields a malicious software detection model that is highly accurate on adversarial samples while preserving its accuracy on the original test sample set. To this end, the invention provides a new way of composing the adversarial training set.
In a specific embodiment of the foregoing method for generating and detecting a malignant confrontation sample based on a Jacobian matrix, further, the adversarial training set is composed as follows:
$$x_{advTrain} = \beta x_{train} + (1-\beta) x_{adv}$$
where x_advTrain denotes the adversarial training set, x_train denotes the original training sample set, x_adv denotes the generated adversarial sample set, and β is the proportion of samples from the original training sample set used in the adversarial training set relative to all samples in the original training sample set.
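The composition rule above can be read as taking a fraction β of the original training samples and a fraction 1-β of the adversarial samples. A minimal Python sketch of that reading follows; the function name `build_adv_train`, the random subsampling without replacement, and the toy arrays are assumptions made for illustration and are not specified in the text.

```python
import numpy as np


def build_adv_train(x_train, y_train, x_adv, beta, seed=0):
    """Mix a beta share of x_train with a (1 - beta) share of x_adv."""
    rng = np.random.default_rng(seed)
    n_train = int(round(beta * len(x_train)))
    n_adv = int(round((1 - beta) * len(x_adv)))
    # Assumption: samples are drawn at random without replacement.
    train_idx = rng.choice(len(x_train), size=n_train, replace=False)
    adv_idx = rng.choice(len(x_adv), size=n_adv, replace=False)
    x_mix = np.concatenate([x_train[train_idx], x_adv[adv_idx]])
    # Adversarial samples keep their original label, 1 (malicious).
    y_mix = np.concatenate([y_train[train_idx],
                            np.ones(n_adv, dtype=y_train.dtype)])
    return x_mix, y_mix


# Toy data: 10 original samples and 5 adversarial samples with 4 binary features.
x_train = np.arange(40).reshape(10, 4) % 2
y_train = np.array([0, 1] * 5)
x_adv_set = np.ones((5, 4), dtype=int)

x_mix, y_mix = build_adv_train(x_train, y_train, x_adv_set, beta=0.8)
print(x_mix.shape, y_mix.shape)
```

With β = 0.8 the toy call keeps 8 of the 10 original samples and 1 of the 5 adversarial samples.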
In this embodiment, β takes values in [0, 1]. Suppose β takes the values 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1 in turn, forming different adversarial training sets x_advTrain; the classification model is retrained on each of these adversarial training sets, giving the classification models F_0, F_0.1, F_0.2, F_0.3, F_0.4, F_0.5, F_0.6, F_0.7, F_0.8, F_0.9 and F_1, respectively.
In this embodiment, when the classification model is trained on the adversarial training set, the loss function loss_advTrain of the training process is:
$$loss_{advTrain}(\theta, x_{advTrain}, y) = \beta \, loss(\theta, x_{train}, y) + (1-\beta) \, loss(\theta, x_{adv}, y)$$
where loss denotes the loss function, θ denotes the parameters (including weights) of the detection model trained on the adversarial training set, and y denotes the original sample labels; the labels of the adversarial sample set x_adv are their original labels, which are all 1. Adversarial training retrains θ so that the model learns information from the adversarial samples.
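To make the weighting concrete, the combined loss can be sketched as below. The base loss is left unspecified here, so the sketch assumes a mean cross-entropy; `adv_train_loss`, `cross_entropy` and the constant predictor `predict_batch` are hypothetical names introduced only for this example.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Mean negative log-likelihood; probs has shape (n, 2), labels shape (n,)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))


def adv_train_loss(predict_batch, x_train, y_train, x_adv, beta):
    """beta * loss on original samples + (1 - beta) * loss on adversarial samples."""
    y_adv = np.ones(len(x_adv), dtype=int)  # adversarial labels are all 1
    clean = cross_entropy(predict_batch(x_train), y_train)
    adv = cross_entropy(predict_batch(x_adv), y_adv)
    return beta * clean + (1 - beta) * adv


def predict_batch(x):
    """Hypothetical model that always outputs [P(benign), P(malicious)] = [0.9, 0.1]."""
    return np.tile([0.9, 0.1], (len(x), 1))


x_clean = np.zeros((4, 3))
y_clean = np.zeros(4, dtype=int)   # all benign
x_adversarial = np.ones((2, 3))

for beta in (0.0, 0.5, 1.0):
    print(beta, adv_train_loss(predict_batch, x_clean, y_clean, x_adversarial, beta))
```

Because this toy model calls everything benign, the loss is small when β = 1 (only original benign samples count) and large when β = 0 (only adversarial samples, labelled 1, count).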
In a specific embodiment of the foregoing method for generating and detecting a malignant confrontation sample based on a Jacobian matrix, further, selecting a detection model with high malicious software detection accuracy as the final detection model includes:
testing the accuracy of each detection model on the original test sample set x_test and on the adversarial sample set x_adv; for each detection model, multiplying its accuracy on x_test by its accuracy on x_adv; and selecting the detection model with the largest product as the final detection model.
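The selection rule above amounts to an argmax over the product of two accuracies. A short Python sketch follows; the function name `select_final_model` is hypothetical, and the accuracy values are made-up placeholders chosen for illustration, not results of the embodiment.

```python
def select_final_model(accuracies):
    """Pick the model whose accuracy product (x_test * x_adv) is largest.

    accuracies maps a model name to (accuracy on x_test, accuracy on x_adv).
    """
    return max(accuracies, key=lambda name: accuracies[name][0] * accuracies[name][1])


# Illustrative numbers only (not measured values).
accuracies = {
    "F_0": (0.70, 0.99),
    "F_0.5": (0.95, 0.90),
    "F_1": (0.97, 0.10),
}
best = select_final_model(accuracies)
print(best)
```

Multiplying the two accuracies penalises a model that does well on only one of the two sets, which is why the middle model wins in this made-up example.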
Fig. 8 is a schematic structural diagram of an electronic device 600 according to an embodiment of the present invention. The electronic device 600 may vary considerably with configuration or performance and may include one or more processors (CPUs) 601 and one or more memories 602, where the memory 602 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 601 to implement the above method for generating and detecting a malignant confrontation sample based on a Jacobian matrix.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions that are executable by a processor in a terminal to perform the above method for generating and detecting a malignant confrontation sample based on a Jacobian matrix. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A method for generating and detecting malignant confrontation samples based on a jacobian matrix, comprising:
loading a malignant software detection model trained by using an original training sample set;
generating adversarial samples from the malicious software in the original test sample set, taking as the condition for successful generation that the prediction confidence with which the detection model classifies the adversarial sample as benign is greater than a set confidence threshold, to produce an adversarial sample set;
and forming corresponding adversarial training sets from the original training sample set and the generated adversarial sample set in different proportions, inputting each adversarial training set into the detection model for adversarial training, and selecting the detection model with high malicious software detection accuracy as the final detection model.
2. The method of claim 1, wherein prior to loading a malicious software detection model trained using an original training sample set, the method comprises:
extracting characteristics for detecting the malignant software from each application program to form respective sample files;
numbering different features in all sample files to obtain a total number n of the features, and converting the features in each sample file into corresponding numbering values;
converting the characteristics in each sample file into one-dimensional characteristic vectors consisting of n elements to obtain a matrix consisting of the characteristic vectors of all samples, and marking the sample labels as benign or malignant;
selecting m characteristics with the largest influence on classification from a matrix formed by the characteristic vectors of all samples to form a sample characteristic vector;
dividing all samples with m characteristics into an original training sample set and an original testing sample set;
constructing a malicious software detection model, wherein the malicious software detection model comprises: an input layer, a first hidden layer connected to the input layer, a second hidden layer connected to the first hidden layer, and an output layer connected to the second hidden layer;
the malignant software detection model is trained using an original training sample set.
3. The method of claim 2, wherein the converting the features in each sample file into a one-dimensional feature vector consisting of n elements and labeling the sample label as benign or malignant comprises:
converting the features in each sample file into one-dimensional feature vectors consisting of n elements of 0 and 1 to obtain a matrix consisting of the feature vectors of all samples, wherein the vector position corresponding to the feature of the sample file containing the number value is 1, and the vector position corresponding to the feature not containing the number value is 0;
the sample label is labeled 0 or 1, where label 0 indicates benign and label 1 indicates malignant.
4. The method of claim 2, wherein the dividing all samples with m features into an original training sample set and an original testing sample set comprises:
renumbering the selected m features from 1 to m, and reducing the one-dimensional feature vector of each sample to a vector of m elements, each of which is 0 or 1;
and dividing all samples with the reduced features into an original training sample set and an original testing sample set.
5. The method of claim 1, wherein generating the adversarial sample set, taking as the condition for successful generation that the prediction confidence with which the detection model classifies an adversarial sample generated from the malicious software in the original test sample set as benign is greater than the set confidence threshold, comprises:
step A1, traversing each sample in the original test sample set, if the label of the sample is malignant, executing step A2, otherwise, continuously traversing the next sample;
step A2, setting a confidence threshold α, a set of modifiable feature indices Γ = {0, ..., m-1}, and an adversarial sample x* = x_mal, where x_mal is an unmodified malicious software sample in the input original test sample set and m is the total number of features of the malicious software sample;
step A3, initializing the index of the feature to be modified i_max = 0, computing the forward derivative matrix forward_derivative of the detection model with respect to the input x_mal, and deleting the derivative values of features that are not in Γ;
step A4, letting μ = max(forward_derivative[0]) and ν = max(forward_derivative[1]), where column 0 of the forward derivative matrix, forward_derivative[0], holds the derivatives of the benign-class output with respect to the input x_mal, so that max(forward_derivative[0]) is the maximum derivative value of the benign class, and column 1, forward_derivative[1], holds the derivatives of the malicious-class output with respect to the input x_mal, so that max(forward_derivative[1]) is the maximum derivative value of the malicious class;
step A5, comparing μ and ν: if μ > ν, setting the index of the feature to be modified i_max to the feature index corresponding to the maximum derivative value of the benign class and, when the feature value of x* at index i_max is 0, setting the step size θ = 1; if μ < ν, setting i_max to the feature index corresponding to the maximum derivative value of the malicious class and, when the feature value of x* at index i_max is 1, setting the step size θ = -1;
step A6, modifying the adversarial sample x* by
$$x^*_{i_{max}} = x^*_{i_{max}} + \theta$$
where $x^*_{i_{max}}$ denotes the feature value of x* at index i_max;
step A7, removing the modified feature index from the set of modifiable feature indices Γ, i.e., deleting i_max from Γ;
step A8, inputting x* into the detection model to obtain the benign prediction confidence F_0(x*); if F_0(x*) < α and |Γ| > 0, returning to step A3, otherwise returning the adversarial sample x*, where |Γ| denotes the number of modifiable features and F_0(x*) denotes that the adversarial sample x* is input into the classification model F to obtain the benign and malicious prediction results, the first output being taken as the probability of being predicted benign, i.e., the benign prediction confidence.
6. The method for generating and detecting malignant confrontation samples based on the jacobian matrix as claimed in claim 1, wherein the confrontation training set is composed by:
$$x_{advTrain} = \beta x_{train} + (1-\beta) x_{adv}$$
where x_advTrain denotes the adversarial training set, x_train denotes the original training sample set, x_adv denotes the generated adversarial sample set, and β is the proportion of samples from the original training sample set used in the adversarial training set relative to all samples in the original training sample set.
7. The method of claim 1, wherein selecting the detection model with high malicious software detection accuracy as the final detection model comprises:
testing the accuracy of each detection model on the original test sample set x_test and on the adversarial sample set x_adv; for each detection model, multiplying its accuracy on x_test by its accuracy on x_adv; and selecting the detection model with the largest product as the final detection model.
CN202010982111.2A 2020-09-17 2020-09-17 Method for generating and detecting malignant countermeasure sample based on jacobian matrix Active CN112241532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010982111.2A CN112241532B (en) 2020-09-17 2020-09-17 Method for generating and detecting malignant countermeasure sample based on jacobian matrix

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010982111.2A CN112241532B (en) 2020-09-17 2020-09-17 Method for generating and detecting malignant countermeasure sample based on jacobian matrix

Publications (2)

Publication Number Publication Date
CN112241532A true CN112241532A (en) 2021-01-19
CN112241532B CN112241532B (en) 2024-02-20

Family

ID=74171003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010982111.2A Active CN112241532B (en) 2020-09-17 2020-09-17 Method for generating and detecting malignant countermeasure sample based on jacobian matrix

Country Status (1)

Country Link
CN (1) CN112241532B (en)


Also Published As

Publication number Publication date
CN112241532B (en) 2024-02-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant