CN112749391B - Detection method and device for malware countermeasure sample and electronic equipment

Detection method and device for malware countermeasure sample and electronic equipment

Info

Publication number
CN112749391B
CN112749391B
Authority
CN
China
Prior art keywords
sample
granularity
detection
encoder
function call
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011630878.5A
Other languages
Chinese (zh)
Other versions
CN112749391A (en)
Inventor
李珩
袁巍
尹路飞
佟萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202011630878.5A priority Critical patent/CN112749391B/en
Publication of CN112749391A publication Critical patent/CN112749391A/en
Application granted granted Critical
Publication of CN112749391B publication Critical patent/CN112749391B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a method and a device for detecting malware adversarial samples, and an electronic device, belonging to the field of the Android software ecosystem. The method comprises the following steps: S1: extracting a multi-granularity function call graph from each normal APK sample; S2: training a corresponding variational graph auto-encoder for the function call graph of each granularity based on the normal APK samples, the variational graph auto-encoder comprising an encoder and a decoder; S3: constructing an adversarial sample detection model for each granularity using the variational graph auto-encoder, the adversarial sample detection model being used to learn the data distribution of normal APP samples at each granularity; S4: inputting the sample to be detected into the trained adversarial sample detection model, and determining the detection result according to the hidden variables output by the encoder and the corresponding reconstruction result of the decoder. By training the adversarial sample detection model with normal samples only, the invention extends malware detection from a single granularity to multiple granularities and can improve malware detection accuracy.

Description

Detection method and device for malware countermeasure sample and electronic equipment
Technical Field
The invention belongs to the field of the Android software ecosystem, and in particular relates to a method and a device for detecting malware adversarial samples, and an electronic device.
Background
With the rapid development of the mobile internet, mobile devices such as mobile phones have gradually become the main tool people use to access the internet. Among these mobile devices, Android occupies about 87% of the market share and is currently the most mainstream operating system. However, the openness and vulnerability of the Android system and the imperfect application-market review mechanisms have also led to the massive growth and wide spread of malware.
To cope with the endless stream of Android malware, various Android malware detection methods based on machine learning have continuously emerged, and their detection performance keeps improving; the F1 score of some detection methods is even close to 0.99. However, these Android malware detection methods face a serious threat, namely the adversarial example (adversarial sample). An adversarial example is a new sample x* dynamically generated by adding an adversarial perturbation to an original sample x so that it can deceive a classifier F, and can be expressed as:
x* = x + δ_x = x + min||x* - x||, s.t. F(x*) ≠ F(x)
where δ_x is a very small perturbation applied to the original sample x. To cause misclassification, the adversarial example can be generated according to the following equation: x_adv ∈ argmax L(θ, x*, y), where L is the loss function of the attacked model and y is the label of sample x.
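Purely as an illustration of the x_adv ∈ argmax L(θ, x*, y) formulation (and not the patent's own attack model, which works on discrete APP features), a gradient-ascent sketch in PyTorch might look as follows; the classifier `model`, the step size `eps` and the number of steps are assumptions for the example:

```python
import torch
import torch.nn.functional as F

def craft_adversarial(model, x, y, eps=0.01, steps=10):
    """Gradient-ascent sketch of x_adv ∈ argmax L(θ, x*, y).

    `model`, `eps` and `steps` are illustrative assumptions; real APP
    adversarial samples are built with discrete, semantics-preserving edits.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), y)   # L(θ, x*, y)
        loss.backward()
        with torch.no_grad():
            # move a small step in the direction that increases the loss
            x_adv += eps * x_adv.grad.sign()
        x_adv.grad.zero_()
    return x_adv.detach()
```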
An adversarial sample is formed by adding a subtle perturbation to a normal sample (also known as a natural sample), which can cause a machine learning model to give a wrong classification result with high confidence. By applying a carefully crafted perturbation to samples of Android malware (malicious APP samples for short), an attacker can deceive the detection system. Adversarial samples therefore provide Android malware authors with a brand-new technical means of evading detection and pose a great threat to existing detection systems.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the present invention provides a method and a device for detecting malware adversarial samples, and an electronic device. The aim is to provide an adversarial sample detection method for the Android APP ecosystem that trains an adversarial sample detection model using only normal samples and extends malware detection from a single granularity to multiple granularities, thereby improving malware detection accuracy and solving the technical problem that existing detection methods have a low recognition rate for adversarial samples.
To achieve the above object, according to one aspect of the present invention, there is provided a method for detecting malware adversarial samples, comprising:
S1: extracting a multi-granularity function call graph from each normal APK sample, wherein the granularities corresponding to the multi-granularity function call graph at least comprise: Family, Class, Package and Function; the function call graph of each granularity comprises node information and edge information;
S2: training a corresponding variational graph auto-encoder for the function call graph of each granularity based on the normal APK samples, wherein the variational graph auto-encoder comprises an encoder and a decoder;
S3: constructing an adversarial sample detection model for each granularity using the variational graph auto-encoder; the adversarial sample detection model is used for learning the data distribution of normal APP samples at each granularity;
S4: inputting the sample to be detected into the trained adversarial sample detection model, and determining the detection result according to the hidden variables output by the encoder and the reconstruction result output by the decoder.
In one embodiment, the step S1 includes:
S101: generating smali files from the original file corresponding to each normal APK sample with a decompilation tool;
S102: extracting function call relations from the smali files to form a function call graph of each granularity.
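As an illustrative sketch of S101/S102 (the patent does not name a specific tool chain), the following assumes the APK has already been decompiled to smali files, e.g. with apktool, and harvests caller/callee pairs from invoke-* instructions; the directory layout, regular expressions and the simplified class-name handling are assumptions:

```python
import re
from pathlib import Path

# matches e.g. "invoke-virtual {v0}, Lcom/example/Foo;->bar(I)V"
INVOKE_RE = re.compile(r"invoke-\w+(?:/range)?\s+\{[^}]*\},\s+(L[^;]+;)->([^\s(]+)\(")
METHOD_RE = re.compile(r"\.method.*?\s([^\s(]+)\(")

def extract_call_edges(smali_dir):
    """Return a list of (caller, callee) edges at Function granularity."""
    edges = []
    for smali_file in Path(smali_dir).rglob("*.smali"):
        # simplification: real code should read the ".class L...;" header line instead
        cls = "L" + smali_file.stem + ";"
        caller = None
        for line in smali_file.read_text(errors="ignore").splitlines():
            m = METHOD_RE.match(line.strip())
            if m:
                caller = f"{cls}->{m.group(1)}"
                continue
            m = INVOKE_RE.search(line)
            if m and caller:
                edges.append((caller, f"{m.group(1)}->{m.group(2)}"))
    return edges
```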
In one embodiment, the call graph of each granularity includes node information and edge information, and the step S102 includes:
extracting the function call relations from the smali files along the four dimensions Family, Class, Package and Function to characterize the multi-granularity function call graphs, denoted G_function, G_class, G_package and G_family respectively;
representing each node in the function call graph of each granularity with a one-hot code; the node information reflects the semantic information of each node; the edge information, corresponding to the edges between nodes, reflects the topological relation of the function call graph and corresponds to the call relations between functions in the APK.
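The four granularities and the one-hot node features described above could be derived from the function-level edge list roughly as follows; the helper names, the family heuristic (first two package components) and the fixed node vocabulary are assumptions for illustration:

```python
import numpy as np
import networkx as nx

def coarsen(edges, level):
    """Map 'Lcom/example/Foo;->bar' edges to function/class/package/family granularity."""
    def key(name):
        cls = name.split("->")[0]            # Lcom/example/Foo;
        parts = cls.strip("L;").split("/")   # ['com', 'example', 'Foo']
        if level == "function":
            return name
        if level == "class":
            return cls
        if level == "package":
            return "/".join(parts[:-1])
        if level == "family":
            return "/".join(parts[:2])       # assumed: first two package components
    g = nx.DiGraph()
    g.add_edges_from((key(a), key(b)) for a, b in edges)
    return g

def one_hot_features(graph, vocabulary):
    """One-hot encode every node against a fixed node vocabulary."""
    index = {name: i for i, name in enumerate(vocabulary)}
    X = np.zeros((graph.number_of_nodes(), len(vocabulary)), dtype=np.float32)
    for row, node in enumerate(graph.nodes()):
        if node in index:
            X[row, index[node]] = 1.0
    return X
```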
In one embodiment, the step S2 includes:
inputting the function call graph of each granularity corresponding to the normal APK samples into the encoder of the corresponding variational graph auto-encoder, so that the encoder outputs the feature matrices (mean and variance) of a Gaussian distribution; the encoder is built on a graph convolutional network (GCN);
sampling hidden variables from the generated Gaussian distribution, and decoding the hidden variables with an inner-product operation in the decoder to reconstruct the sample.
In one embodiment, the mean μ and variance σ of the Gaussian distribution are:
μ = GCN_μ(X, A) = Ã·ReLU(Ã·X·W_0)·W_μ, log σ = GCN_σ(X, A) = Ã·ReLU(Ã·X·W_0)·W_σ, with Ã = D^(-1/2)·A·D^(-1/2),
where X is the feature matrix of the semantic feature graph of one granularity, A is the adjacency matrix, D is the degree matrix, ReLU is the activation function, and the generation processes of the mean and the variance share the parameters W_0;
the reconstructed adjacency matrix obtained by decoding with the inner-product operation to reconstruct the sample is denoted:
Â = sigmoid(Z·Z^T), with Z = μ + σ⊙ε, ε ~ N(0, 1).
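A minimal PyTorch sketch of such a variational graph auto-encoder, with a GCN encoder that shares the first-layer weights W_0 between the mean and variance branches and an inner-product decoder; the layer sizes and class names are assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN propagation step; a_hat = D^(-1/2) A D^(-1/2) is precomputed."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)

    def forward(self, a_hat, x):
        return a_hat @ x @ self.weight

class VGAE(nn.Module):
    """Variational graph auto-encoder: GCN encoder, inner-product decoder."""
    def __init__(self, in_dim, hidden_dim=32, latent_dim=16):
        super().__init__()
        self.shared = GCNLayer(in_dim, hidden_dim)       # W_0, shared by mean and variance
        self.gcn_mu = GCNLayer(hidden_dim, latent_dim)
        self.gcn_logstd = GCNLayer(hidden_dim, latent_dim)

    def encode(self, a_hat, x):
        h = torch.relu(self.shared(a_hat, x))
        return self.gcn_mu(a_hat, h), self.gcn_logstd(a_hat, h)

    def forward(self, a_hat, x):
        mu, logstd = self.encode(a_hat, x)
        z = mu + torch.randn_like(mu) * torch.exp(logstd)   # reparameterization, ε ~ N(0, 1)
        a_rec = torch.sigmoid(z @ z.t())                    # inner-product decoder
        return a_rec, mu, logstd
```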
in one embodiment, the loss function of the challenge sample detection model is defined as:
L=-E q(Z|X,A) [logp(A|Z)]+KL[q(Z|X,A)|p(Z)];
wherein q (Z|X, A) is the distribution calculated by GCN, p (Z) is the standard Gaussian distribution, E q(Z|X,A) KL [ q (z|x, a) |p (Z) for decoder to generate a expectation of a]KL divergence of the distribution generated for the decoder and the standard gaussian distribution.
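The loss can be evaluated as in the sketch below, using a binary cross-entropy reconstruction term and the closed-form KL divergence for diagonal Gaussians (both standard choices assumed here rather than quoted from the patent):

```python
import torch
import torch.nn.functional as F

def vgae_loss(a_rec, a_true, mu, logstd, norm=1.0):
    """-E_q[log p(A|Z)] + KL[q(Z|X,A) || N(0, I)] for one graph."""
    # reconstruction term: binary cross-entropy between predicted and true adjacency
    recon = norm * F.binary_cross_entropy(a_rec, a_true)
    # closed-form KL divergence between N(mu, sigma^2) and the standard normal
    kl = -0.5 * torch.mean(
        torch.sum(1 + 2 * logstd - mu.pow(2) - torch.exp(2 * logstd), dim=1)
    )
    return recon + kl
```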
In one embodiment, the step S4 includes:
S401: inputting the sample to be detected into the trained adversarial sample detection model so that the encoder outputs hidden variables, and comparing the mean and variance of the hidden variables with the standard normal distribution to obtain a detection result;
S402: inputting the sample to be detected into the trained adversarial sample detection model, and comparing the reconstruction result output by the adversarial sample detection model with the sample to be detected; if the difference between the reconstruction result and the sample is larger than a preset value, the reconstruction has failed, i.e., the sample is an adversarial sample; if the difference between the reconstruction result and the sample is smaller than or equal to the preset value, the reconstruction has succeeded, i.e., the sample is a normal sample.
In one embodiment, the step S401 includes:
for each granularity, comparing the reconstruction result corresponding to that granularity with the sample to be detected to obtain a first test result:
r_i^(1) = sign(diff(G_i, G_i') - thr1), i ∈ {function, class, package, family};
obtaining a second test result from the difference between the output distribution of the encoder and the standard normal distribution:
r_i^(2) = sign(diff(q_i, N(0,1)) - thr2), i ∈ {function, class, package, family};
performing an OR operation on the first test result and the second test result to obtain the detection result corresponding to that granularity, r_i = r_i^(1) ∨ r_i^(2),
where sign is the sign function, diff(·,·) denotes the measured difference, G_i denotes the graph data of a certain granularity, G_i' denotes the reconstructed data of that granularity, q_i denotes the encoder output distribution at that granularity, and thr1 and thr2 denote preset thresholds;
the detection results r_function, r_class, r_package and r_family corresponding to the respective granularities are combined with an OR operation to obtain the detection result r.
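A hedged sketch of the per-granularity decision and the final OR combination (the difference measures, threshold dictionaries and the `models`/`graphs` containers are assumptions standing in for the trained detectors and calibrated thresholds):

```python
import torch

GRANULARITIES = ("function", "class", "package", "family")

def kl_to_standard_normal(mu, logstd):
    """Closed-form KL[N(mu, sigma^2) || N(0, 1)], averaged over nodes."""
    return -0.5 * torch.mean(
        torch.sum(1 + 2 * logstd - mu.pow(2) - torch.exp(2 * logstd), dim=1)
    )

def detect(models, graphs, thr1, thr2):
    """Return 1 (adversarial) if any granularity flags the sample, else 0.

    `models[g]` is the trained VGAE for granularity g; `graphs[g]` holds the
    (A_hat, X, A) tensors of the sample to be detected at that granularity.
    """
    with torch.no_grad():
        for g in GRANULARITIES:
            a_hat, x, a = graphs[g]
            a_rec, mu, logstd = models[g](a_hat, x)
            r1 = int(torch.mean((a_rec - a).abs()).item() > thr1[g])      # reconstruction test
            r2 = int(kl_to_standard_normal(mu, logstd).item() > thr2[g])  # distribution test
            if r1 or r2:               # OR over the two tests
                return 1
    return 0                           # OR over granularities: none flagged
```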
According to another aspect of the present invention, there is provided an apparatus for detecting adversarial samples of Android malicious applications, comprising:
an extraction module for extracting a multi-granularity function call graph from each normal APK sample, wherein the granularities corresponding to the multi-granularity function call graph at least comprise: Family, Class, Package and Function; the function call graph of each granularity comprises node information and edge information;
a training module for training a corresponding variational graph auto-encoder for the function call graph of each granularity based on the normal APK samples, the variational graph auto-encoder comprising an encoder and a decoder;
a modeling module for constructing an adversarial sample detection model for each granularity using the variational graph auto-encoder; the adversarial sample detection model is used for learning the data distribution of normal APP samples at each granularity;
and a test module for inputting the sample to be detected into the trained adversarial sample detection model and determining the detection result according to the hidden variables output by the encoder and the reconstruction result output by the decoder.
According to another aspect of the present invention, there is provided an electronic device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the detection method described above.
In general, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
1. The invention provides an adversarial sample detection method for the Android APP domain, which trains the adversarial sample detection model with normal samples and extends malware detection from a single granularity to multiple granularities, so the malware detection accuracy can be improved.
2. The invention trains the adversarial sample detection model with one-class classification, which reduces the dependence on negative samples during training and improves the generalization ability of the model; in addition, the invention tests the adversarial sample detection model with discretely perturbed adversarial samples, so the detection accuracy can be evaluated and the detection performance of the adversarial sample detection model can be further improved.
3. The invention inputs the sample to be detected into the trained adversarial sample detection model to obtain the output distribution and the reconstruction result, compares, at each granularity, the output distribution with the standard normal distribution and the reconstruction result with the sample to be detected to obtain the difference information corresponding to each granularity, and finally combines the difference information of all granularities with an OR operation; as soon as the difference at any granularity exceeds the difference threshold, the sample is judged to be an adversarial sample, which improves the detection accuracy for adversarial samples.
Drawings
FIG. 1 is a flowchart of a method for detecting malware adversarial samples according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the process of extracting a function call graph from an APK according to an embodiment of the invention;
FIG. 3 is a diagram illustrating function call graphs of different granularities according to an embodiment of the present invention;
FIG. 4 is a flowchart of multi-granularity APP adversarial sample detection based on a variational graph auto-encoder according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, in order to make the objects, technical solutions and advantages of the present invention clearer. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
According to one aspect of the present invention, as shown in FIG. 1, there is provided a method for detecting malware adversarial samples, comprising:
S1: extracting a multi-granularity function call graph from each normal APK sample, wherein the granularities corresponding to the multi-granularity function call graph at least comprise: Family, Class, Package and Function; the function call graph of each granularity includes node information and edge information.
S2: training a corresponding variational graph auto-encoder for the function call graph of each granularity based on the normal APK samples, wherein the variational graph auto-encoder comprises an encoder and a decoder.
S3: constructing an adversarial sample detection model for each granularity using the variational graph auto-encoder; the adversarial sample detection model is used for learning the data distribution of normal APP samples at each granularity.
S4: inputting the sample to be detected into the trained adversarial sample detection model, and determining the detection result according to the hidden variables output by the encoder and the reconstruction result output by the decoder.
In one embodiment, step S1 includes: S101: generating smali files from the original file corresponding to each normal APK sample with a decompilation tool. S102: extracting function call relations of different granularities from the smali files to form a function call graph of each granularity.
Specifically, as shown in FIG. 2, the process of generating the smali files and the function call graph of each granularity includes: 1. generating the various files from the original APK with a decompilation tool; 2. extracting the function call relations from the generated smali files to form the most basic function call graph; 3. since functions can be represented at different granularities, as shown in FIG. 2, the whole function call graph is characterized along four dimensions, namely Family, Class, Package and Function; as shown in FIG. 3, the function call graph can accordingly be mapped to four new graphs, denoted G_function, G_class, G_package and G_family.
It should be noted that the call graph of each granularity consists of two important components: node information and edge information. The edge information reflects the topological relation of the graph and corresponds to the call relations between functions in the APK; the node information represents the semantic information of each node. Generating an adversarial sample often breaks the semantic information of some nodes in a normal APK, so each node in the function call graph can be represented by a one-hot code, which provides effective support for the subsequent detection of adversarial samples.
After the vector representation of an APP sample is acquired, a detection method needs to be designed to distinguish adversarial samples from normal samples. However, the ways of generating APP adversarial samples are diverse and constantly evolving, and it is difficult to propose a detection scheme for each method separately. On the other hand, it is difficult to obtain a large number of adversarial samples in practice, so the conventional adversarial training approach cannot be used. To overcome these difficulties, the invention proposes to detect APP adversarial samples from multiple granularities based on a one-class classification model. The advantage of this method is that only normal APP samples (including benign and malicious samples, collectively called normal samples) are needed to train the classification model; the method remains effective for various and even unknown types of adversarial samples, and at the same time it cleverly exploits the multi-granularity nature of Android malware features to resist adversarial attacks at the feature level.
In one embodiment, the call graph of each granularity contains node information and edge information, and step S102 includes: extracting the function call relations from the smali files along the four dimensions Family, Class, Package and Function to characterize the multi-granularity function call graphs, denoted G_function, G_class, G_package and G_family respectively; encoding each node of the function call graph of each granularity with a one-hot code. The node information reflects the semantic information of each node, while the edge information, corresponding to the edges between nodes, reflects the topological relation of the function call graph and corresponds to the call relations between the functions in the APK.
In one embodiment, as shown in FIG. 4, step S2 includes: inputting the function call graph of each granularity corresponding to the normal APK samples into the encoder of the corresponding variational graph auto-encoder, so that the encoder outputs the feature matrices (mean and variance) of a Gaussian distribution; the encoder is built on the graph convolutional network (GCN). The hidden variables are then sampled from the generated Gaussian distribution and decoded with an inner-product operation to reconstruct the sample.
The invention does not directly distinguish APP adversarial samples from normal samples; instead, samples are first mapped into a hidden (latent) space, and their vector representations are then compared and distinguished. This procedure typically relies on graph representation learning or graph embedding methods. The graph convolutional network (Graph Convolutional Network, GCN) is such a graph representation learning method: it is a natural generalization of the convolutional neural network to graph data and can learn node attribute information and topological structure information end to end at the same time. GCN is significantly superior to other methods in many tasks such as node classification and edge prediction, so GCN is used to represent the complex graph data in a vector form that can be processed by a machine learning model.
Specifically, an adversarial sample detection model is constructed for each granularity using a variational graph auto-encoder (Variational Graph Auto-Encoder, VGAE). This choice is mainly based on three considerations: 1) in the low-dimensional hidden space, normal samples are clearly distinguishable from adversarial samples; 2) compared with normal samples, adversarial samples are more difficult to reconstruct; 3) even if the model at a certain granularity is defeated by an adversarial attack, the models at the other granularities can still defend against the adversarial sample well. The basic idea is as follows: first, the variational graph auto-encoder of each granularity is trained with the function call graphs of normal samples, so that normal samples can be reconstructed well; once training is completed, an APP adversarial sample fed into the model will fail to be reconstructed and will hardly fall into the distribution preset in the training phase in the low-dimensional hidden space.
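The per-granularity training stage described above might be sketched as follows, reusing the VGAE and vgae_loss sketches given earlier; the optimizer settings, epoch count and the shape of `normal_graphs` are assumptions:

```python
import torch

def train_detectors(normal_graphs, in_dims, epochs=100, lr=0.01):
    """Train one VGAE per granularity on normal-sample call graphs only.

    `normal_graphs[g]` is assumed to be a list of (A_hat, X, A) tensors for
    granularity g; `VGAE` and `vgae_loss` are the sketches given earlier.
    """
    models = {}
    for g, graphs in normal_graphs.items():
        model = VGAE(in_dims[g])
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for a_hat, x, a in graphs:
                opt.zero_grad()
                a_rec, mu, logstd = model(a_hat, x)
                loss = vgae_loss(a_rec, a, mu, logstd)
                loss.backward()
                opt.step()
        models[g] = model
    return models
```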
In one embodiment, the mean μ and variance σ of the Gaussian distribution are:
μ = GCN_μ(X, A) = Ã·ReLU(Ã·X·W_0)·W_μ, log σ = GCN_σ(X, A) = Ã·ReLU(Ã·X·W_0)·W_σ, with Ã = D^(-1/2)·A·D^(-1/2),
where X is the feature matrix of the input semantic feature graph of one granularity, A is the adjacency matrix, D is the degree matrix, ReLU is the activation function, and the generation processes of the mean and the variance share the parameters W_0;
the reconstructed adjacency matrix obtained by decoding with the inner-product operation to reconstruct the sample is denoted:
Â = sigmoid(Z·Z^T), with Z = μ + σ⊙ε, ε ~ N(0, 1).
in one embodiment, the loss function of the challenge sample detection model is defined as:
L=-E q(Z|X,A) [logp(A|Z)]+KL[q(Z|X,A)|p(Z)];
wherein q (Z|X, A) is the distribution calculated by GCN, p (Z) is the standard Gaussian distribution, E q(Z|X,A) KL [ q (z|x, a) |p (Z) for decoder to generate a expectation of a]KL divergence of the distribution generated for the decoder and the standard gaussian distribution.
Specifically, the adversarial sample detection model learns the data distribution of normal APP samples at each granularity and thus better preserves the difference between adversarial samples and normal samples in the hidden space. The invention proposes to measure the difference between adversarial samples and normal samples with respect to the data distribution in the hidden space (called the hidden-layer distribution). Accordingly, an adversarial sample only needs to be identified from the reconstruction effect of the model and the hidden-layer distribution: inputs that lead to a poor reconstruction effect and an excessive hidden-layer distribution difference can be identified as adversarial samples. The method does not require prior knowledge about adversarial samples, has strong generalization ability, and can effectively detect adversarial samples of different and even unknown types.
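For reference, when the encoder output is a diagonal Gaussian, the hidden-layer distribution difference against the standard normal prior can be evaluated with the standard closed-form KL divergence (a well-known identity, not quoted from the patent):
KL[N(μ, σ²) || N(0, 1)] = (1/2) Σ_j (μ_j² + σ_j² - log σ_j² - 1)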
In one embodiment, step S4 includes: S401: inputting the sample to be detected into the trained adversarial sample detection model so that the encoder outputs hidden variables, and comparing the mean and variance of the hidden variables with the standard normal distribution to obtain a detection result;
S402: inputting the sample to be detected into the trained adversarial sample detection model, and comparing the reconstruction result output by the adversarial sample detection model with the sample to be detected; if the difference between the reconstruction result and the sample is larger than a preset value, the reconstruction has failed, i.e., the sample is an adversarial sample; if the difference between the reconstruction result and the sample is smaller than or equal to the preset value, the reconstruction has succeeded, i.e., the sample is a normal sample.
Specifically, two discrimination indicators are used during detection: 1) the smaller the difference between the mean μ and variance σ output by the encoder and the standard normal distribution, the lower the probability that the APP is an adversarial sample; 2) the smaller the difference between the input graph G and the output graph G', the better the graph reconstruction effect and the lower the probability that the APP is an adversarial sample; otherwise, the APP is judged to be an adversarial sample. Therefore, whether a sample is an adversarial sample is determined from the difference between the encoder output distribution and the standard normal distribution, together with the difference between the input graph to be detected and the output reconstructed graph.
In one embodiment, step S402 includes:
for each granularity, comparing the reconstruction result corresponding to that granularity with the sample to be detected to obtain a first test result:
r_i^(1) = sign(diff(G_i, G_i') - thr1), i ∈ {function, class, package, family};
obtaining a second test result from the difference between the output distribution of the encoder and the standard normal distribution:
r_i^(2) = sign(diff(q_i, N(0,1)) - thr2), i ∈ {function, class, package, family};
performing an OR operation on the first test result and the second test result to obtain the detection result corresponding to that granularity, r_i = r_i^(1) ∨ r_i^(2),
where sign is the sign function, diff(·,·) denotes the measured difference, G_i denotes the graph data of a certain granularity, G_i' denotes the reconstructed data of that granularity, q_i denotes the encoder output distribution at that granularity, thr1 and thr2 denote preset thresholds, and i denotes the granularity type. The thresholds thr1 and thr2 may be determined empirically; alternatively, the differences of the normal samples can be calculated and the maximum sample difference set as the threshold. r_i = 1 means that the APK is judged to be an adversarial sample at this granularity.
The detection results r_function, r_class, r_package and r_family corresponding to the respective granularities are combined with an OR operation to obtain the final detection result r. Specifically, all discrimination results are integrated in an OR manner: as soon as an adversarial sample is identified at any granularity, the sample is identified as an adversarial sample.
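The threshold choice suggested above (taking the maximum difference observed on normal samples) can be sketched as follows, reusing the earlier kl_to_standard_normal helper; the validation split and difference measures are assumptions:

```python
import torch

def calibrate_thresholds(models, normal_val_graphs):
    """Set thr1/thr2 per granularity to the maximum difference seen on held-out normal samples."""
    thr1, thr2 = {}, {}
    with torch.no_grad():
        for g, graphs in normal_val_graphs.items():
            rec_diffs, kl_diffs = [], []
            for a_hat, x, a in graphs:
                a_rec, mu, logstd = models[g](a_hat, x)
                rec_diffs.append(torch.mean((a_rec - a).abs()).item())
                kl_diffs.append(kl_to_standard_normal(mu, logstd).item())
            thr1[g] = max(rec_diffs)
            thr2[g] = max(kl_diffs)
    return thr1, thr2
```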
According to another aspect of the present invention, there is provided an apparatus for detecting adversarial samples of Android malicious applications, comprising:
an extraction module for extracting a multi-granularity function call graph from each normal APK sample, wherein the granularities corresponding to the multi-granularity function call graph at least comprise: Family, Class, Package and Function; the function call graph of each granularity comprises node information and edge information;
a training module for training a corresponding variational graph auto-encoder for the function call graph of each granularity based on the normal APK samples, the variational graph auto-encoder comprising an encoder and a decoder;
a modeling module for constructing an adversarial sample detection model for each granularity using the variational graph auto-encoder; the adversarial sample detection model is used for learning the data distribution of normal APP samples at each granularity;
and a test module for inputting the sample to be detected into the trained adversarial sample detection model and determining the detection result according to the hidden variables output by the encoder and the reconstruction result output by the decoder.
According to another aspect of the present invention, there is provided an electronic device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the detection method described above.
It will be readily appreciated by those skilled in the art that the foregoing description covers merely preferred embodiments of the invention and is not intended to limit the invention; any modifications, equivalents, improvements or alternatives made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (3)

1. A method for detecting a malware adversarial sample, comprising:
S1: extracting a multi-granularity function call graph from each normal APK sample, wherein the granularities corresponding to the multi-granularity function call graph comprise: Family, Class, Package and Function; the function call graph of each granularity comprises node information and edge information;
S2: training a corresponding variational graph auto-encoder for the function call graph of each granularity based on the normal APK samples, wherein the variational graph auto-encoder comprises an encoder and a decoder;
S3: constructing an adversarial sample detection model for each granularity using the variational graph auto-encoder; the adversarial sample detection model is used for learning the data distribution of normal APP samples at each granularity;
S4: inputting the sample to be detected into the trained adversarial sample detection model, and determining the detection result according to the hidden variables output by the encoder and the reconstruction result output by the decoder;
wherein S1 comprises: S101: generating smali files from the original file corresponding to each normal APK sample with a decompilation tool; S102: extracting function call relations from the smali files to form a function call graph of each granularity; wherein S102 comprises: extracting the function call relations from the smali files along the four dimensions Family, Class, Package and Function to characterize the multi-granularity function call graphs, denoted G_function, G_class, G_package and G_family respectively; representing each node in the function call graph of each granularity with a one-hot code; the node information reflects the semantic information of each node; the edge information, corresponding to the edges between nodes, reflects the topological relation of the function call graph and corresponds to the call relations between functions in the APK;
wherein S2 comprises: inputting the function call graph of each granularity corresponding to the normal APK samples into the encoder of the corresponding variational graph auto-encoder, so that the encoder outputs the feature matrices of a Gaussian distribution; the encoder is built on a graph convolutional network (GCN); inputting the feature matrices into the decoder, sampling hidden variables from the generated Gaussian distribution, and decoding with an inner-product operation to reconstruct the sample; the mean μ and variance σ of the Gaussian distribution are:
μ = GCN_μ(X, A) = Ã·ReLU(Ã·X·W_0)·W_μ, log σ = GCN_σ(X, A) = Ã·ReLU(Ã·X·W_0)·W_σ, with Ã = D^(-1/2)·A·D^(-1/2),
where X is the feature matrix of the semantic feature graph of one granularity, A is the adjacency matrix, D is the degree matrix, ReLU is the activation function, and the generation processes of the mean and the variance share the parameters W_0;
the reconstructed adjacency matrix obtained by decoding with the inner-product operation to reconstruct the sample is denoted:
Â = sigmoid(Z·Z^T), Z = μ + σ⊙ε, ε ~ N(0, 1);
the loss function of the adversarial sample detection model is defined as:
L = -E_q(Z|X,A)[log p(A|Z)] + KL[q(Z|X,A) || p(Z)];
where q(Z|X,A) is the distribution computed by the GCN, p(Z) is the standard Gaussian distribution, E_q(Z|X,A)[log p(A|Z)] is the expected reconstruction log-likelihood of A under the encoder's distribution, and KL[q(Z|X,A) || p(Z)] is the KL divergence between the encoder's distribution and the standard Gaussian distribution;
wherein S4 comprises: S401: inputting the sample to be detected into the trained adversarial sample detection model so that the encoder outputs hidden variables, and comparing the mean and variance of the hidden variables with the standard normal distribution to obtain a detection result; S402: inputting the sample to be detected into the trained adversarial sample detection model, and comparing the reconstruction result output by the adversarial sample detection model with the sample to be detected; if the difference between the reconstruction result and the sample is larger than a preset value, the reconstruction has failed, i.e., the sample is an adversarial sample; if the difference between the reconstruction result and the sample is smaller than or equal to the preset value, the reconstruction has succeeded, i.e., the sample is a normal sample;
wherein S401 comprises: for each granularity, comparing the reconstruction result corresponding to that granularity with the sample to be detected to obtain a first test result:
r_i^(1) = sign(diff(G_i, G_i') - thr1), i ∈ {function, class, package, family};
obtaining a second test result from the difference between the output distribution of the encoder and the standard normal distribution:
r_i^(2) = sign(diff(q_i, N(0,1)) - thr2), i ∈ {function, class, package, family};
performing an OR operation on the first test result and the second test result to obtain the detection result corresponding to that granularity, r_i = r_i^(1) ∨ r_i^(2), i ∈ {function, class, package, family};
where sign is the sign function, diff(·,·) denotes the measured difference, G_i denotes the graph data of a certain granularity, G_i' denotes the reconstructed data of that granularity, q_i denotes the encoder output distribution at that granularity, and thr1 and thr2 denote preset thresholds; the detection results r_function, r_class, r_package and r_family corresponding to the respective granularities are combined with an OR operation to obtain the detection result r.
2. An apparatus for detecting adversarial samples of Android-side malicious applications, configured to perform the method for detecting a malware adversarial sample of claim 1, comprising:
an extraction module for extracting a multi-granularity function call graph from each normal APK sample, wherein the granularities corresponding to the multi-granularity function call graph at least comprise: Family, Class, Package and Function; the function call graph of each granularity comprises node information and edge information;
a training module for training a corresponding variational graph auto-encoder for the function call graph of each granularity based on the normal APK samples, the variational graph auto-encoder comprising an encoder and a decoder;
a modeling module for constructing an adversarial sample detection model for each granularity using the variational graph auto-encoder; the adversarial sample detection model is used for learning the data distribution of normal APP samples at each granularity;
and a test module for inputting the sample to be detected into the trained adversarial sample detection model and determining the detection result according to the hidden variables output by the encoder and the reconstruction result output by the decoder.
3. An electronic device comprising a memory and a processor, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the detection method of claim 1.
CN202011630878.5A 2020-12-31 2020-12-31 Detection method and device for malware countermeasure sample and electronic equipment Active CN112749391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011630878.5A CN112749391B (en) 2020-12-31 2020-12-31 Detection method and device for malware countermeasure sample and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011630878.5A CN112749391B (en) 2020-12-31 2020-12-31 Detection method and device for malware countermeasure sample and electronic equipment

Publications (2)

Publication Number Publication Date
CN112749391A CN112749391A (en) 2021-05-04
CN112749391B true CN112749391B (en) 2024-04-09

Family

ID=75650752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011630878.5A Active CN112749391B (en) 2020-12-31 2020-12-31 Detection method and device for malware countermeasure sample and electronic equipment

Country Status (1)

Country Link
CN (1) CN112749391B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826059A (en) * 2019-09-19 2020-02-21 浙江工业大学 Method and device for defending black box attack facing malicious software image format detection model
CN111062036A (en) * 2019-11-29 2020-04-24 暨南大学 Malicious software identification model construction method, malicious software identification medium and malicious software identification equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503558B (en) * 2016-11-18 2019-02-19 四川大学 A kind of Android malicious code detecting method based on community structure analysis

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826059A (en) * 2019-09-19 2020-02-21 浙江工业大学 Method and device for defending black box attack facing malicious software image format detection model
CN111062036A (en) * 2019-11-29 2020-04-24 暨南大学 Malicious software identification model construction method, malicious software identification medium and malicious software identification equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Android malicious code detection method based on function call graph (基于函数调用图的Android恶意代码检测方法研究); 李自清; 计算机测量与控制 (Computer Measurement & Control); 2017-10-25 (Issue 10); full text *

Also Published As

Publication number Publication date
CN112749391A (en) 2021-05-04

Similar Documents

Publication Publication Date Title
Pei et al. AMalNet: A deep learning framework based on graph convolutional networks for malware detection
Vinayakumar et al. Evaluating deep learning approaches to characterize and classify the DGAs at scale
CN106713324B (en) Flow detection method and device
EP3422262A1 (en) Method of monitoring the performance of a machine learning algorithm
Li et al. Opcode sequence analysis of Android malware by a convolutional neural network
CN113283476A (en) Internet of things network intrusion detection method
CN111931179B (en) Cloud malicious program detection system and method based on deep learning
US20200159925A1 (en) Automated malware analysis that automatically clusters sandbox reports of similar malware samples
CN111159697B (en) Key detection method and device and electronic equipment
CN113408558B (en) Method, apparatus, device and medium for model verification
CN111753290A (en) Software type detection method and related equipment
CN110602120A (en) Network-oriented intrusion data detection method
Assefa et al. Intelligent phishing website detection using deep learning
CN114024761B (en) Network threat data detection method and device, storage medium and electronic equipment
Zhao et al. Natural backdoor attacks on deep neural networks via raindrops
Wang et al. An evolutionary computation-based machine learning for network attack detection in big data traffic
CN113542252A (en) Detection method, detection model and detection device for Web attack
CN112749391B (en) Detection method and device for malware countermeasure sample and electronic equipment
CN112580044A (en) System and method for detecting malicious files
CN116962009A (en) Network attack detection method and device
Wang et al. Malware detection using cnn via word embedding in cloud computing infrastructure
CN111488574A (en) Malicious software classification method, system, computer equipment and storage medium
CN114510720A (en) Android malicious software classification method based on feature fusion and NLP technology
CN112860573A (en) Smartphone malicious software detection method
CN117688565B (en) Malicious application detection method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant