CN116910753A - Malicious software detection and model construction method, device, equipment and medium - Google Patents

Malicious software detection and model construction method, device, equipment and medium

Info

Publication number
CN116910753A
CN116910753A (application CN202310925369.2A)
Authority
CN
China
Prior art keywords
learning
model
sample
loss
software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310925369.2A
Other languages
Chinese (zh)
Inventor
陈达
安通鉴
王欣
潘澳涔
许浪骋
范渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd filed Critical DBAPPSecurity Co Ltd
Priority to CN202310925369.2A priority Critical patent/CN116910753A/en
Publication of CN116910753A publication Critical patent/CN116910753A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
    • G06F21/53Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Virology (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a malware detection and model construction method, apparatus, device, and medium, applied to the field of information security. The model construction method comprises the following steps: constructing a supervised learning model and a self-supervised learning model; processing labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss; processing unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss; and fusing the two losses for joint training to obtain the malware detection model. Building on supervised malware detection, the method introduces a contrastive learning framework from self-supervised learning to learn from the large volume of unlabeled data in a production environment, so that the model can adapt to rapidly changing malware attack patterns in real network scenarios, mitigate model aging to some extent, and detect new malicious attacks.

Description

Malicious software detection and model construction method, device, equipment and medium
Technical Field
The present application relates to the field of information security, and in particular, to a method, an apparatus, a device, and a medium for malware detection and model construction.
Background
Models for software detection are usually built with supervised machine learning or deep learning methods, which require a labeled dataset containing labeled malware and benign software. However, labeled datasets are usually small and cannot be obtained in large quantities; labeling requires manual expertise and labeling strategies, which is time-consuming and labor-intensive; the quality of the labels has a large influence on the detection results; and newly emerging malware attack behaviors cannot be detected.
Moreover, malware comes in many varieties with diverse attack forms, and manual analysis and labeling take a long time. Existing machine-learning-based malware detection techniques therefore often suffer from problems such as being unable to exploit large amounts of unlabeled data, weak model generalization, and being unable to learn quickly from incremental samples.
Disclosure of Invention
In view of the above, the present application aims to provide a malware detection and model construction method, apparatus, device, and medium that solve the prior-art problems of dependence on known data samples and inability to detect new attacks.
In order to solve the above technical problems, the application provides a method for constructing a malware detection model, comprising the following steps:
constructing a supervised learning model and a self-supervised learning model;
processing labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss;
processing unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss;
and fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
Optionally, processing the unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss comprises:
acquiring an unlabeled API sequence of the unlabeled software samples through sandbox dynamic analysis;
before the unlabeled API sequence enters the embedding layer, performing data augmentation by masking part of the unlabeled APIs with the masking method of contrastive learning, to obtain augmented positive and negative sample pairs;
and computing the contrastive learning loss over the augmented positive and negative sample pairs.
Optionally, processing the unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss comprises:
acquiring an unlabeled API sequence of the unlabeled software samples through sandbox dynamic analysis;
extracting feature vectors of the unlabeled API sequence, and obtaining augmented positive and negative sample pairs by dropping part of the feature components with the dropout method of contrastive learning;
and computing the contrastive learning loss over the augmented positive and negative sample pairs.
Optionally, processing the unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss comprises:
acquiring an unlabeled API sequence of the unlabeled software samples through sandbox dynamic analysis;
before the unlabeled API sequence enters the embedding layer, performing first-stage data augmentation with the masking method of contrastive learning;
extracting feature vectors of the unlabeled API sequence and performing second-stage data augmentation with the dropout method of contrastive learning, to obtain augmented positive and negative sample pairs;
and computing the contrastive learning loss over the augmented positive and negative sample pairs.
Optionally, processing the unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss comprises:
dividing the unlabeled software samples according to a preset ratio to obtain first samples and second samples;
acquiring a first API sequence corresponding to the first samples and a second API sequence corresponding to the second samples through sandbox dynamic analysis;
extracting feature vectors of the first API sequence and dropping part of them with the dropout method of contrastive learning, to obtain augmented first positive and negative sample pairs;
before the second API sequence enters the embedding layer, performing data augmentation by masking part of the second APIs with the masking method of contrastive learning, then extracting feature vectors of the second API sequence and dropping part of them with the dropout method, to obtain augmented second positive and negative sample pairs;
and computing the contrastive learning loss over the first and second positive and negative sample pairs.
Optionally, the method further comprises:
taking the supervised learning model as the main task, the supervised learning model being used for sample classification;
taking the self-supervised learning model as the auxiliary task, the self-supervised learning model being used for sample-similarity judgment;
the supervised learning model and the self-supervised learning model use the same deep learning architecture and share parameters at the embedding layer.
The application also provides a malware detection method, comprising:
acquiring the API sequence generated by the software under test in a sandbox;
inputting the API sequence into the embedding layer, feature extraction layer, and pooling layer of the constructed malware detection model to obtain a feature vector;
and obtaining the classification result by passing the feature vector through the softmax layer of the malware detection model.
The application also provides a malware detection model construction apparatus, comprising:
a construction module for constructing a supervised learning model and a self-supervised learning model;
a supervised computation module for processing labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss;
a self-supervised computation module for processing unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss;
and a joint training module for fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
The application also provides a malware detection model construction device, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the above malware detection model construction method when executing the computer program.
The application also provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the above malware detection model construction method.
In summary, the application constructs a supervised learning model and a self-supervised learning model; processes labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss; processes unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss; and fuses the two losses for joint training to obtain the malware detection model. Building on supervised malware detection, the method introduces a contrastive learning framework from self-supervised learning to learn from the large volume of unlabeled data in a production environment, so that the model can adapt to rapidly changing malware attack patterns in real network scenarios, mitigate model aging to some extent, and detect new malicious attacks.
In addition, the application also provides a malware detection method, apparatus, device, and medium, which have the same beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for constructing a malware detection model according to an embodiment of the present application;
FIG. 2 is a flowchart of another method for constructing a malware detection model according to an embodiment of the present application;
FIG. 3 is a flowchart of another method for constructing a malware detection model according to an embodiment of the present application;
FIG. 4 is an exemplary diagram of the dropout data augmentation method according to an embodiment of the present application;
FIG. 5 is a flowchart of another method for constructing a malware detection model according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for constructing a malware detection model according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating a method for detecting malware according to an embodiment of the present application;
FIG. 8 is an overall architecture diagram of a malware detection model provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a device for constructing a malware detection model according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a malware detection model building device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
With the development and rise of the Internet in China, more and more individuals and enterprises use Internet services, and the security threat that malware poses to the Internet keeps increasing. Malware refers to software that, without authorization, maliciously disrupts or damages a network or a user's device. Malware comes in a wide variety of forms, including but not limited to trojans, worms, ransomware, and spyware, as well as combinations and variants thereof. Because Windows is the dominant operating system on personal hosts, malicious-behavior analysis of PE software is a key focus of malware identification.
To help software security analysts analyze malware and its hazards and intentions, researchers have studied malware analysis methods, which broadly fall into two types: static analysis and dynamic analysis. Static analysis examines malware without running it, typically through signature comparison, string matching, disassembly, and similar methods. Dynamic analysis runs suspicious software in a controlled, isolated environment (e.g., a sandbox or virtual machine), monitors and collects its system API (Application Programming Interface) calls, and then processes and analyzes the API call sequence, for example by building behavioral features of the software and identifying malware with statistical analysis or machine learning methods. An API is a function or method exposed by a computer system or library for other applications or libraries to call, enabling interoperability between different pieces of software. Static analysis can accurately capture the static characteristics of malware, but because the feature types are limited, malware can evade detection through obfuscation, packing, and similar techniques, reducing detection effectiveness. Dynamic analysis captures the behavior of software during execution; no matter how a malware author obfuscates or disguises the code, its runtime behavior cannot be hidden. Traditional dynamic analysis methods are mostly based on supervised machine learning or deep learning and require labeled malware and benign software, yet large numbers of labels usually cannot be obtained.
Specific prior-art malware detection methods mainly include: 1. Building a relation graph from node mutual information and API embedding-vector similarity, which works well but requires considerable time and computing resources and depends heavily on the dataset; if the dataset is incomplete, inaccurate, or unrepresentative, the results may deviate. 2. Defining positive sample sets in a graph contrastive network with a mixed positive-sample selection strategy, where positive samples are defined by TF-IDF (term frequency-inverse document frequency, a common weighting technique in information retrieval and data mining) vector similarity; the extraction of positive samples depends too heavily on strategies and manual experience, and an incorrect positive-sample set degrades the effect of contrastive learning.
Therefore, the application provides a malware detection model with low data dependency, high accuracy, and strong generalization. By introducing a contrastive learning framework from self-supervised learning, the model learns from the large volume of unlabeled data in a production environment, so that it can adapt to rapidly changing malware attack patterns in real network scenarios, mitigate model aging to some extent, and detect new malicious attacks. The method is as follows:
Example 1:
Referring to fig. 1, fig. 1 is a flowchart of a method for constructing a malware detection model according to an embodiment of the present application. The method may include:
S101: constructing a supervised learning model and a self-supervised learning model.
This embodiment is executed by a terminal. The embodiment does not limit the type of terminal, as long as it can complete the construction of the malware detection model; for example, it may be a dedicated terminal or a general-purpose terminal. The embodiment likewise does not limit the specific supervised and self-supervised learning models.
S102: processing the labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss.
This embodiment does not limit the specific software; any software whose behavior can be judged from an API sequence is applicable. For example, it may be PE software: PE (Portable Executable) is a binary executable file format on the Windows operating system that contains executable programs, dynamic-link libraries, drivers, and other code.
S103: processing the unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss.
Contrastive learning is an unsupervised/self-supervised learning method that learns feature representations by maximizing the similarity within positive sample pairs and minimizing it across negative pairs. In contrastive learning, each data sample is compared with other data samples to derive similarities between them. For a sample pair (x1, x2), a contrastive learning algorithm computes their distance d(x1, x2) in feature space and then uses a function g defined on that feature space to produce a scalar s = g(d(x1, x2)) representing the similarity of the pair. During training, the contrastive objective optimizes the model parameters by maximizing the similarity of matching sample pairs and minimizing the similarity of non-matching pairs, yielding feature representations that accurately distinguish different samples.
S104: fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
In this embodiment, fusing the cross-entropy loss and the contrastive learning loss for joint training may specifically include: acquiring a preset weight α for the contrastive learning loss; jointly computing the total loss L from the cross-entropy loss L_main and the contrastive learning loss L_auxiliary according to the weight; back-propagating the total loss and continuously updating the parameters; and ending training when the total loss converges, yielding the malware detection model.
The specific calculation is:
L = L_main + α·L_auxiliary
The cross-entropy loss is computed with the softmax loss (the standard softmax cross-entropy), formulated as:
L_main = −log( e^{z_y} / Σ_{c=1..C} e^{z_c} )
where {z_1, z_2, …, z_C} are the model's predicted scores for each software class and z_y is the score of the true class.
The contrastive learning loss is computed with the InfoNCE loss: a sample q is compared against a set of samples {k_1, k_2, k_3, …}, among which k_+ denotes the sample matching q. The loss for sample q is expressed as:
L_q = −log( exp(q·k_+/τ) / Σ_i exp(q·k_i/τ) )
Its value is low when q is similar to its matching sample k_+ and dissimilar from all other samples (regarded as negative examples of q); τ is the temperature coefficient, a tunable hyper-parameter.
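For illustration only, a minimal PyTorch-style sketch of this joint objective might look as follows (the patent provides no code; the function names, tensor shapes, the default values of α and τ, and the cosine-style normalization are assumptions):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, keys, pos_idx, tau=0.07):
    # q: (B, D) query features; keys: (B, N, D) candidate features;
    # pos_idx: (B,) index of the matching sample k+ among the candidates.
    q = F.normalize(q, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = torch.einsum("bd,bnd->bn", q, keys) / tau  # similarities q·k_i / τ
    return F.cross_entropy(logits, pos_idx)             # = -log softmax at k+

def total_loss(class_logits, labels, q, keys, pos_idx, alpha=0.5):
    # L = L_main + α·L_auxiliary, the joint objective described above.
    l_main = F.cross_entropy(class_logits, labels)      # supervised softmax loss
    l_auxiliary = info_nce_loss(q, keys, pos_idx)       # contrastive InfoNCE loss
    return l_main + alpha * l_auxiliary
```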
By applying the malware detection model construction method provided by this embodiment of the application, a contrastive learning framework from self-supervised learning is introduced on top of supervised malware detection, and the large volume of unlabeled data in a production environment is learned from, so that the model can adapt to rapidly changing malware attack patterns in real network scenarios, mitigate model aging to some extent, and detect new malicious attacks.
Example 2:
Referring to fig. 2, fig. 2 is a flowchart of another method for constructing a malware detection model according to an embodiment of the present application. The method may include:
S201: constructing a supervised learning model and a self-supervised learning model.
S202: processing the labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss.
S203: acquiring an unlabeled API sequence of the unlabeled software samples through sandbox dynamic analysis.
The principle of sandbox technology is to encapsulate an application so that it runs in a virtual "box" with its own operating system, file system, network interfaces, and other resources, isolated from the host system. Even if an application is infected or attacked, it can only affect the environment inside the "box" and cannot harm the host system or other applications.
During the sandbox run phase, the sample is executed in the virtual environment, and the sandbox uses API hooking (a technique that intercepts and records the API calls a program makes while running) to capture the sequence of system APIs the sample calls. A sample API sequence fragment collected through sandbox dynamic analysis is shown in Table 1, an example of an API execution sequence:
TABLE 1

API number | API call name      | API meaning
-----------|--------------------|-------------------------------------
0          | CreateMutex        | Create a mutex
1          | GlobalMemoryStatus | Get information about system memory
2          | ResumeThread       | Resume a suspended thread
…          | …                  | …
998        | GlobalMemoryStatus | Get information about system memory
999        | NtCreateEvent      | Open an event object
1000       | ShellExecute       | Run an external program
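Before entering the embedding layer, the collected API call names must be mapped to integer ids. A minimal sketch of such an encoding (the vocabulary, padding/unknown ids, and sequence length are assumptions; the patent does not specify the encoding):

```python
from typing import Dict, List

def encode_api_sequence(calls: List[str], vocab: Dict[str, int],
                        unk_id: int = 1, max_len: int = 1000) -> List[int]:
    # Map API call names (e.g. "CreateMutex") to integer ids, then
    # truncate or pad the sequence to a fixed length (0 = padding id).
    ids = [vocab.get(name, unk_id) for name in calls[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Usage with the fragment from Table 1:
vocab = {"CreateMutex": 2, "GlobalMemoryStatus": 3, "ResumeThread": 4,
         "NtCreateEvent": 5, "ShellExecute": 6}
seq = encode_api_sequence(["CreateMutex", "GlobalMemoryStatus", "ResumeThread"], vocab)
```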
S204: before the unlabeled API sequence enters the embedding layer, the masking method in contrast learning is utilized to carry out data enhancement by shielding part of unlabeled APIs, and the enhanced positive and negative sample pairs are obtained.
S205: and calculating the positive and negative sample pairs after enhancement to obtain contrast learning loss.
S206: and carrying out fusion and combination training on the cross entropy loss and the contrast learning loss to obtain a malicious software detection model.
This embodiment augments the API sequence with the masking method: for the API sequence of a given input sample, a preset proportion of APIs is randomly masked, producing augmented views of that sample that form positive pairs; augmented views produced the same way from different samples form negative pairs.
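A minimal sketch of this masking augmentation (the mask id and the 15% ratio are assumptions; the patent only states that a preset proportion of APIs is masked):

```python
import torch

def mask_augment(seq, mask_ratio=0.15, mask_id=1):
    # Randomly replace a preset proportion of API ids with a mask id,
    # producing one augmented view of the sequence.
    view = seq.clone()
    masked = torch.rand(seq.shape) < mask_ratio   # Bernoulli mask per position
    view[masked] = mask_id
    return view

sample = torch.randint(2, 100, (1000,))           # one unlabeled API id sequence
v1, v2 = mask_augment(sample), mask_augment(sample)  # positive pair (same sample)
```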
By applying the malware detection model construction method provided by this embodiment of the application, a contrastive learning framework from self-supervised learning is introduced on top of supervised malware detection, and the large volume of unlabeled data in a production environment is learned from, so that the model can adapt to rapidly changing malware attack patterns in real network scenarios, mitigate model aging to some extent, and detect new malicious attacks. In addition, augmenting the API sequence with the masking method improves the robustness of the model.
Example 3:
Referring to fig. 3, fig. 3 is a flowchart of another method for constructing a malware detection model according to an embodiment of the present application. The method may include:
S301: constructing a supervised learning model and a self-supervised learning model.
S302: processing the labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss.
S303: acquiring an unlabeled API sequence of the unlabeled software samples through sandbox dynamic analysis.
S304: extracting feature vectors of the unlabeled API sequence and obtaining augmented positive and negative sample pairs by dropping part of the feature components with the dropout method of contrastive learning.
S305: computing the contrastive learning loss over the augmented positive and negative sample pairs.
S306: fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
This embodiment augments the API sequence with the dropout method: the same sample is passed through the model twice to obtain two different feature vectors that form a positive pair, while feature vectors obtained from different samples are treated as negative pairs. Because the model contains dropout layers, neurons randomly drop some parameters, so the representations of the same sample from two passes through the model are not exactly identical. The dropout data augmentation method is shown in fig. 4, an exemplary diagram of the dropout data augmentation method according to an embodiment of the present application: the model builds positive and negative pairs through the dropout layer for contrastive learning. This embodiment does not limit the feature-vector extraction method. For example, TextCNN (an algorithm that classifies text using a convolutional neural network) may be used to extract feature vectors of the API sequence; alternatively, a model based on the Transformer architecture may be used.
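As a sketch, the dropout augmentation amounts to two stochastic forward passes of the same input; the encoder below is a stand-in for the embedding-plus-TextCNN feature extractor, and all names and sizes are assumptions:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(                    # stand-in for embedding + TextCNN
    nn.Linear(128, 256), nn.ReLU(),
    nn.Dropout(p=0.1), nn.Linear(256, 128))

def dropout_views(x):
    # With dropout active, two forward passes of the same input yield two
    # slightly different representations, which form a positive pair.
    encoder.train()                         # keep dropout layers active
    return encoder(x), encoder(x)

z1, z2 = dropout_views(torch.randn(32, 128))    # two views of one batch
```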
By applying the malware detection model construction method provided by this embodiment of the application, a contrastive learning framework from self-supervised learning is introduced on top of supervised malware detection, and the large volume of unlabeled data in a production environment is learned from, so that the model can adapt to rapidly changing malware attack patterns in real network scenarios, mitigate model aging to some extent, and detect new malicious attacks. In addition, augmenting the API sequence with the dropout method improves the robustness of the model.
Example 4:
Referring to fig. 5, fig. 5 is a flowchart of another method for constructing a malware detection model according to an embodiment of the present application. The method may include:
S401: constructing a supervised learning model and a self-supervised learning model.
S402: processing the labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss.
S403: acquiring an unlabeled API sequence of the unlabeled software samples through sandbox dynamic analysis.
S404: before the unlabeled API sequence enters the embedding layer, performing first-stage data augmentation with the masking method of contrastive learning.
S405: extracting feature vectors of the unlabeled API sequence and performing second-stage data augmentation with the dropout method of contrastive learning, to obtain augmented positive and negative sample pairs.
S406: computing the contrastive learning loss over the augmented positive and negative sample pairs.
S407: fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
This embodiment augments the API sequence with both the masking method and the dropout method. The masking method is introduced because the dropout method alone can leave the two augmented views highly correlated, making the model's learning task too easy.
By applying the malware detection model construction method provided by this embodiment of the application, a contrastive learning framework from self-supervised learning is introduced on top of supervised malware detection, and the large volume of unlabeled data in a production environment is learned from, so that the model can adapt to rapidly changing malware attack patterns in real network scenarios, mitigate model aging to some extent, and detect new malicious attacks. In addition, integrating data augmentation methods such as masking and dropout into malware identification improves the robustness of the model.
Example 5:
Referring to fig. 6, fig. 6 is a flowchart of a method for constructing a malware detection model according to an embodiment of the present application. The method may include:
S501: constructing a supervised learning model and a self-supervised learning model.
S502: processing the labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss.
S503: dividing the unlabeled software samples according to a preset ratio to obtain first samples and second samples.
S504: acquiring a first API sequence corresponding to the first samples and a second API sequence corresponding to the second samples through sandbox dynamic analysis.
S505: extracting feature vectors of the first API sequence and dropping part of them with the dropout method of contrastive learning, to obtain augmented first positive and negative sample pairs.
S506: before the second API sequence enters the embedding layer, performing data augmentation by masking part of the second APIs with the masking method of contrastive learning, then extracting feature vectors of the second API sequence and dropping part of them with the dropout method, to obtain augmented second positive and negative sample pairs.
S507: computing the contrastive learning loss over the first and second positive and negative sample pairs.
S508: fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
This embodiment augments the API sequence with both the masking method and the dropout method and divides the contrastive learning task into two parts: the first part uses only dropout augmentation, while the second part uses both dropout and masking augmentation. A concrete split may be: 70% of the contrastive learning tasks are completed with the dropout method alone, and 30% with two-stage masking-plus-dropout augmentation, as sketched below.
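A sketch of the 70/30 split described above, applied per batch of unlabeled samples (the per-batch granularity is an assumption; the patent does not specify where the split is applied):

```python
import torch

def split_contrastive_batch(batch, dropout_share=0.7):
    # Route ~70% of unlabeled samples to dropout-only augmentation and
    # ~30% to two-stage masking + dropout augmentation.
    n = batch.size(0)
    perm = torch.randperm(n)
    k = int(n * dropout_share)
    return batch[perm[:k]], batch[perm[k:]]  # (dropout-only, masking+dropout)
```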
By applying the malware detection model construction method provided by this embodiment of the application, a contrastive learning framework from self-supervised learning is introduced on top of supervised malware detection, and the large volume of unlabeled data in a production environment is learned from, so that the model can adapt to rapidly changing malware attack patterns in real network scenarios, mitigate model aging to some extent, and detect new malicious attacks. In addition, dividing the contrastive learning task into two parts, where the first part of the unlabeled software samples uses dropout augmentation only and the second part uses two-stage masking-plus-dropout augmentation, is even more conducive to improving the robustness of the model.
In the prior art, fusing API relationship information from multiple views improves model performance but does not bring better universality and robustness, so this application introduces multi-task learning to let the model handle diverse data and tasks and thereby gain better universality and robustness. Moreover, prior frameworks are too tightly coupled, so their modules cannot be adjusted or swapped according to the real deployment environment. Thus, based on any of the embodiments described above, the following steps may also be included:
Step 61: taking the supervised learning model as the main task, the supervised learning model being used for sample classification;
Step 62: taking the self-supervised learning model as the auxiliary task, the self-supervised learning model being used for sample-similarity judgment. The supervised learning model and the self-supervised learning model use the same deep learning architecture and share parameters at the embedding layer.
Multi-task learning is a method of learning multiple related tasks while sharing model parameters, so as to improve the generalization ability and efficiency of a model. In multi-task learning, the different tasks share the same underlying neural network, with each task having its own output layer and loss function. The tasks may be closely related, related but not identical, or independent. By sharing the underlying feature extractor, features are shared across tasks and the same parameters are optimized toward minimizing the overall objective. During training, the network considers the training data of all tasks simultaneously and updates its parameters according to the weights of the different loss functions. The multi-task learning mode of this model is multi-task semi-supervised learning, i.e., joint optimization of a supervised classification main task and an unsupervised contrastive-learning auxiliary task.
Based on any of the above embodiments, a multi-task learning mechanism is introduced to fuse the supervised learning model with the contrastive learning model, so that richer features are learned, the overfitting risk of the malware detection model is reduced, and the generalization ability of the model is further enhanced. Hard parameter sharing is applied to the malware identification task: with multi-task learning, several tasks share the hidden-layer parameters while keeping separate output layers, which greatly reduces the risk of overfitting. Meanwhile, malicious-sample classification serves as the main task and self-supervised sample-similarity judgment as the auxiliary task, alleviating the shortage of labeled samples and letting the model learn from the large numbers of unlabeled samples produced online.
The following describes the malware detection method provided by the embodiment of the present application; the malware detection method described below and the malware detection model construction method described above may be referred to correspondingly.
Referring specifically to fig. 7, fig. 7 is a flowchart illustrating a malware detection method according to an embodiment of the present application.
S601: acquiring the API sequence generated by the software under test in a sandbox;
S602: inputting the API sequence into the embedding layer, feature extraction layer, and pooling layer of the constructed malware detection model to obtain a feature vector;
S603: obtaining the classification result by passing the feature vector through the softmax layer of the malware detection model.
By applying the malware detection method provided by this embodiment of the application, the constructed malware detection model can determine the classification result of the software under test from its corresponding API sequence.
To facilitate understanding of the present application, refer to fig. 8, an overall architecture diagram of the malware detection model provided in an embodiment of the present application. The architecture comprises two modules: a supervised learning module (main task) and a self-supervised learning module (auxiliary task). The two modules share parameters at the embedding layer and use TextCNN to extract features of the API sequence. The supervised learning module computes its loss from software samples labeled malicious/benign, while the self-supervised learning module builds augmented positive and negative sample pairs using masking, dropout, and similar contrastive-learning techniques and computes the contrastive loss. Finally, the losses of the supervised and self-supervised modules are fused through the multi-task learning mechanism and the model is trained jointly, so that it learns from both labeled and unlabeled samples.
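A condensed PyTorch sketch of this architecture (vocabulary size, kernel widths, filter counts, and class count are assumptions; the patent specifies only the module layout):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MalwareDetector(nn.Module):
    # Shared embedding + TextCNN feature extractor with a supervised softmax
    # head; the contrastive auxiliary task reuses the pooled features.
    def __init__(self, vocab_size=1024, emb_dim=128, n_filters=256, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)  # shared
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in (3, 4, 5)])         # TextCNN
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(3 * n_filters, n_classes)

    def features(self, seq):                        # seq: (B, L) API ids
        x = self.embedding(seq).transpose(1, 2)     # (B, emb_dim, L)
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]  # max-pool
        return self.dropout(torch.cat(pooled, dim=1))

    def forward(self, seq):
        return self.classifier(self.features(seq))  # class logits for softmax
```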
Taking malicious-behavior detection of PE files as an example, the goal is to determine whether a PE file exhibits malicious behavior. The steps of the model training stage comprise:
step one: putting a PE file (PE malicious file and normal file) with a label into a sandbox for running test, collecting behavior data generated by the PE file in the sandbox, and extracting an API function call sequence, namely an API sequence, as a data set;
step two: placing the PE file without the tag into a sandbox for running test, and collecting an API function call sequence, namely an API sequence, in the PE file as a training data set for comparison learning;
step three: putting the labeled API sequence data into a supervised learning side for learning, and putting the unlabeled API sequence data into a self-supervising side (contrast learning) for learning;
step four: and respectively entering the API sequences with the supervision and the non-supervision data into an embedding layer and a feature extraction layer (such as TextCNN) to obtain feature vectors.
Step five: for the self-supervision comparison learning module, a learning sample is randomly extracted from the label-free training data set; and the positive example pair obtained after the same sample passes through the dropout layer twice is different from the negative example pair obtained after the same sample passes through the dropout layer.
Step six: and constructing a diversity of contrast learning data enhancement, wherein 70% of contrast learning tasks are completed by a dropout method, and 30% of contrast learning tasks are completed by data enhancement contrast learning in masking and dropout stages.
Step six: and (3) calculating the cross entropy loss of the supervised measurement after the feature vector in the step four passes through the softmax layer, and simultaneously calculating the contrast learning loss of the self-supervision learning side.
Step seven: the cross entropy loss, the contrast learning loss, the summation and the back propagation are carried out, and the model parameter gradient is updated.
The specific steps for testing with the malware detection model are as follows:
step one: and placing the PE file to be tested into a sandbox for running test, collecting behavior data generated by the PE file in the sandbox, and extracting an API sequence.
Step two: the API sequence respectively enters an embedding layer, a feature extraction layer (textCNN) and a pooling layer to obtain feature vectors of the sample.
Step three: finally, the classification result of the sample to be detected is obtained through the softmax layer.
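Assuming the MalwareDetector sketch above, the test stage reduces to the following (the id sequence would come from encoding the sandbox trace; the values here are placeholders):

```python
import torch
import torch.nn.functional as F

model = MalwareDetector()                      # weights would come from joint training
model.eval()                                   # disable dropout for testing
with torch.no_grad():
    seq = torch.randint(2, 1024, (1, 1000))    # encoded API sequence of one PE file
    probs = F.softmax(model(seq), dim=-1)      # softmax layer
    verdict = probs.argmax(dim=-1)             # assumed: 0 = benign, 1 = malicious
```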
The following describes a device for constructing a malware detection model provided by the embodiment of the present application, where the device for constructing a malware detection model described below and the method for constructing a malware detection model described above can be referred to correspondingly.
Referring to fig. 9 specifically, fig. 9 is a schematic structural diagram of a device for constructing a malware detection model according to an embodiment of the present application, which may include:
a construction module 100 for constructing a supervised learning model and a self-supervised learning model;
a supervised computation module 200 for processing labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss;
a self-supervised computation module 300 for processing unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss;
and a joint training module 400 for fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
Based on the above embodiment, the self-supervised computation module 300 may include:
the API sequence first acquisition unit is used for acquiring the unlabeled API sequence of the unlabeled software sample through a sandbox dynamic analysis technology;
the first data enhancement unit is used for enhancing data by shielding part of the unlabeled APIs by using a masking method in contrast learning before the unlabeled API sequence enters the embedding layer, so as to obtain enhanced positive and negative sample pairs;
and the first calculation unit is used for calculating the positive and negative sample pairs after the enhancement to obtain the contrast learning loss.
Based on the above embodiment, the self-supervised computation module 300 may include:
The API sequence second acquisition unit is used for acquiring the unlabeled API sequence of the unlabeled software sample through a sandbox dynamic analysis technology;
the second data enhancement unit is used for extracting the feature vector of the label-free API sequence, and obtaining enhanced positive and negative sample pairs by carrying out losing processing on part of the feature vector by utilizing a dropout method in contrast learning;
and the second calculation unit is used for calculating the positive and negative sample pairs after the enhancement to obtain the contrast learning loss.
Based on the above embodiment, the self-supervised computation module 300 may include:
the third acquisition unit of API sequence is used for acquiring the label-free API sequence of the label-free software sample through sandbox dynamic analysis technology;
the third data enhancement unit is used for performing first-stage data enhancement by using a masking method in contrast learning before the unlabeled API sequence enters an embedding layer;
a fourth data enhancement unit, configured to extract a feature vector of the unlabeled API sequence, perform second-stage data enhancement by using a dropout method in contrast learning, and obtain an enhanced positive and negative sample pair;
and the third calculation unit is used for calculating the positive and negative sample pairs after the enhancement to obtain the contrast learning loss.
Based on the above embodiment, the self-supervised computation module 300 may include:
the sample dividing unit is used for dividing the unlabeled software samples according to a preset proportion to obtain a first sample and a second sample;
an API sequence fourth obtaining unit, configured to obtain, by using a sandbox dynamic analysis technique, a first API sequence corresponding to the first sample and a second API sequence corresponding to the second sample;
a fifth data enhancement unit, configured to extract a feature vector of the first API sequence, and perform loss processing on a part of the feature vector of the first API sequence by using a dropout method in contrast learning, so as to obtain an enhanced first positive and negative sample pair;
a sixth data enhancement unit, configured to perform data augmentation by masking a part of the second APIs with the masking method of contrastive learning before the second API sequence enters the embedding layer, extract feature vectors of the second API sequence, and drop part of the feature vectors of the second API sequence with the dropout method of contrastive learning, so as to obtain augmented second positive and negative sample pairs;
and the fourth calculation unit is used for calculating the first positive and negative sample pair and the second positive and negative sample pair to obtain the contrast learning loss.
Based on any of the foregoing embodiments, the malware detection model construction apparatus may further include:
a main task module for taking the supervised learning model as the main task, the supervised learning model being used for sample classification;
an auxiliary task module for taking the self-supervised learning model as the auxiliary task, the self-supervised learning model being used for sample-similarity judgment. The supervised learning model and the self-supervised learning model use the same deep learning architecture and share parameters at the embedding layer.
It should be noted that the order of the modules and units in the malware detection model construction apparatus may be changed, as long as the logic is not affected.
With the malware detection model construction apparatus provided by this embodiment of the application, the construction module 100 constructs a supervised learning model and a self-supervised learning model; the supervised computation module 200 processes labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss; the self-supervised computation module 300 processes unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss; and the joint training module 400 fuses the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model. On top of supervised malware detection, a contrastive learning framework from self-supervised learning is introduced to learn from the large volume of unlabeled data in a production environment, so that the model can adapt to rapidly changing malware attack patterns in real network scenarios, mitigate model aging to some extent, and detect new malicious attacks. Data augmentation of the API sequence with the masking and dropout methods improves the robustness of the model. And by introducing a multi-task learning mechanism to fuse the supervised learning model with the contrastive learning model, richer features are learned, the overfitting risk of the malware detection model is reduced, and the generalization ability of the model is further enhanced.
The following describes a malware detection model construction device provided by an embodiment of the present application, where the malware detection model construction device described below and the malware detection model construction method described above may be referred to correspondingly.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a malware detection model construction device according to an embodiment of the present application, which may include:
a memory 10 for storing a computer program;
a processor 20 for executing a computer program to implement the malware detection model construction method described above.
The memory 10, the processor 20, and the communication interface 31 all communicate with each other via a communication bus 32.
In the embodiment of the present application, the memory 10 is used for storing one or more programs, the programs may include program codes, the program codes include computer operation instructions, and in the embodiment of the present application, the memory 10 may store programs for implementing the following functions:
constructing a supervised learning model and a self-supervised learning model;
processing labeled software samples with the supervised learning model to obtain the supervised-side cross-entropy loss;
processing unlabeled software samples with the self-supervised learning model using contrastive learning to obtain the contrastive learning loss;
and fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
In one possible implementation, the memory 10 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, and at least one application program required for functions, etc.; the storage data area may store data created during use.
In addition, memory 10 may include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include NVRAM. The memory stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for performing various operations. The operating system may include various system programs for implementing various basic tasks as well as handling hardware-based tasks.
The processor 20 may be a central processing unit (CPU), an ASIC, a DSP, an FPGA, or another programmable logic device; the processor 20 may also be a microprocessor or any conventional processor. The processor 20 may call a program stored in the memory 10.
The communication interface 31 may be an interface of a communication module for connecting with other devices or systems.
Of course, it should be noted that the structure shown in fig. 10 does not limit the malware detection model building apparatus in the embodiment of the present application, and the malware detection model building apparatus may include more or fewer components than those shown in fig. 10, or may combine some components in practical applications.
The following describes a readable storage medium provided by an embodiment of the present application, where the medium described below and the method for constructing a malware detection model described above may be referred to correspondingly.
The application also provides a medium storing a computer program which, when executed by a processor, implements the steps of the above malware detection model construction method.
The medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the illustrative units and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Finally, it is further noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The method, device, equipment and medium for malware detection and model construction provided by the application have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the application, and the description of the above embodiments is only intended to help understand the method of the application and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope in accordance with the ideas of the application. In view of the above, the contents of this description should not be construed as limiting the application.

Claims (10)

1. A malware detection model construction method, characterized by comprising the following steps:
constructing a supervised learning model and a self-supervised learning model;
calculating a labeled software sample through the supervised learning model to obtain the cross-entropy loss of the supervised side;
calculating an unlabeled software sample through the self-supervised learning model using contrastive learning to obtain a contrastive learning loss;
and fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
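In standard notation (not recited in the claim itself), the fused objective can be written as below, with a hypothetical fusion weight λ and the widely used InfoNCE form of the contrastive term:

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda \, \mathcal{L}_{\mathrm{CL}},
\qquad
\mathcal{L}_{\mathrm{CL}} = -\frac{1}{N} \sum_{i=1}^{N}
\log \frac{\exp\left(\operatorname{sim}(z_i, z_i^{+}) / \tau\right)}
          {\sum_{j \neq i} \exp\left(\operatorname{sim}(z_i, z_j) / \tau\right)}
```

Here $z_i$ and $z_i^{+}$ are the embeddings of two enhanced views of the same unlabeled sample, and $\tau$ is a temperature hyperparameter.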
2. The malware detection model construction method according to claim 1, wherein calculating the unlabeled software sample through the self-supervised learning model using contrastive learning to obtain the contrastive learning loss comprises:
acquiring an unlabeled API sequence of the unlabeled software sample through sandbox dynamic analysis;
before the unlabeled API sequence enters the embedding layer, performing data enhancement by shielding part of the unlabeled API calls using the masking method in contrastive learning, to obtain enhanced positive and negative sample pairs;
and calculating the contrastive learning loss from the enhanced positive and negative sample pairs.
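By way of illustration only, a minimal sketch of this masking-style enhancement follows; the reserved mask id and the mask ratio are assumptions:

```python
import torch

MASK_ID = 0  # hypothetical id reserved for a [MASK] token in the API vocabulary

def mask_augment(api_ids: torch.Tensor, mask_ratio: float = 0.15) -> torch.Tensor:
    # Randomly shield part of an unlabeled API-id sequence before it
    # enters the embedding layer.
    mask = torch.rand(api_ids.shape) < mask_ratio
    return api_ids.masked_fill(mask, MASK_ID)

# Two independent maskings of the same sequence form an enhanced positive
# pair; maskings of different sequences in the batch act as negatives.
seq = torch.randint(1, 1000, (1, 64))   # toy batch of API-call ids
view1, view2 = mask_augment(seq), mask_augment(seq)
```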
3. The malware detection model construction method according to claim 1, wherein calculating the unlabeled software sample through the self-supervised learning model using contrastive learning to obtain the contrastive learning loss comprises:
acquiring an unlabeled API sequence of the unlabeled software sample through sandbox dynamic analysis;
extracting feature vectors of the unlabeled API sequence, and obtaining enhanced positive and negative sample pairs by randomly dropping part of the feature vectors using the dropout method in contrastive learning;
and calculating the contrastive learning loss from the enhanced positive and negative sample pairs.
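This variant works in the spirit of SimCSE: the same feature vector is passed through dropout twice, and the two noisy copies form a positive pair. A sketch under assumed layer types and sizes:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 128), nn.ReLU())  # stand-in feature extractor
dropout = nn.Dropout(p=0.1)  # randomly drops part of each feature vector

features = encoder(torch.randn(32, 128))  # features of 32 unlabeled API sequences
z1 = dropout(features)  # first stochastic view
z2 = dropout(features)  # second view: (z1, z2) is an enhanced positive pair,
                        # while other rows in the batch serve as negatives
```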
4. The malware detection model construction method according to claim 1, wherein calculating the unlabeled software sample through the self-supervised learning model using contrastive learning to obtain the contrastive learning loss comprises:
acquiring an unlabeled API sequence of the unlabeled software sample through sandbox dynamic analysis;
before the unlabeled API sequence enters the embedding layer, performing first-stage data enhancement using the masking method in contrastive learning;
extracting feature vectors of the unlabeled API sequence, and performing second-stage data enhancement using the dropout method in contrastive learning to obtain enhanced positive and negative sample pairs;
and calculating the contrastive learning loss from the enhanced positive and negative sample pairs.
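A sketch combining both stages, with illustrative layer types and sizes (the embodiment does not prescribe a particular feature extractor):

```python
import torch
import torch.nn as nn

embed = nn.Embedding(1000, 64)                # embedding layer for API-call ids
extractor = nn.GRU(64, 64, batch_first=True)  # stand-in feature extraction layer
dropout = nn.Dropout(p=0.1)

def two_stage_view(api_ids: torch.Tensor) -> torch.Tensor:
    # Stage 1: mask part of the ids before the embedding layer;
    # stage 2: apply dropout to the extracted feature vectors.
    masked = api_ids.masked_fill(torch.rand(api_ids.shape) < 0.15, 0)
    feats, _ = extractor(embed(masked))
    return dropout(feats.mean(dim=1))         # pooled, doubly perturbed view

seq = torch.randint(1, 1000, (8, 64))
z1, z2 = two_stage_view(seq), two_stage_view(seq)  # enhanced positive pair
```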
5. The malware detection model construction method according to claim 1, wherein calculating the unlabeled software sample through the self-supervised learning model using contrastive learning to obtain the contrastive learning loss comprises:
dividing the unlabeled software sample according to a preset proportion to obtain a first sample and a second sample;
acquiring a first API sequence corresponding to the first sample and a second API sequence corresponding to the second sample through sandbox dynamic analysis;
extracting feature vectors of the first API sequence, and dropping part of the feature vectors of the first API sequence using the dropout method in contrastive learning to obtain enhanced first positive and negative sample pairs;
before the second API sequence enters the embedding layer, performing data enhancement by shielding part of the second API calls using the masking method in contrastive learning, then extracting feature vectors of the second API sequence and dropping part of these feature vectors using the dropout method in contrastive learning, to obtain enhanced second positive and negative sample pairs;
and calculating the contrastive learning loss from the first positive and negative sample pairs and the second positive and negative sample pairs.
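A sketch of the sample division step only; the 0.5 ratio is a placeholder for the preset proportion:

```python
import torch

def split_by_ratio(batch: torch.Tensor, ratio: float = 0.5):
    # Divide unlabeled sequences into two groups: the first receives only
    # the dropout-style enhancement, the second masking followed by dropout.
    k = int(batch.size(0) * ratio)
    return batch[:k], batch[k:]

seqs = torch.randint(1, 1000, (16, 64))
first_sample, second_sample = split_by_ratio(seqs)
```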
6. The malware detection model construction method according to any one of claims 1 to 5, further comprising:
taking the supervised learning model as the main task, the supervised learning model being used for sample classification;
taking the self-supervised learning model as an auxiliary task, the self-supervised learning model being used for sample similarity judgment;
wherein the supervised learning model and the self-supervised learning model adopt the same deep learning algorithm and share parameters at the embedding layer.
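By way of illustration, one way to realize this parameter sharing is a single module exposing both branches; the layer types and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    # The supervised (main) and self-supervised (auxiliary) tasks run the
    # same deep learning algorithm and share the embedding layer.
    def __init__(self, vocab=1000, dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)      # shared embedding layer
        self.extractor = nn.GRU(dim, dim, batch_first=True)
        self.cls_head = nn.Linear(dim, n_classes)  # main task: classification
        self.proj_head = nn.Linear(dim, dim)       # auxiliary task: similarity

    def _pool(self, ids):
        out, _ = self.extractor(self.embed(ids))
        return out.mean(dim=1)

    def classify(self, ids):   # supervised branch
        return self.cls_head(self._pool(ids))

    def encode(self, ids):     # self-supervised branch
        return self.proj_head(self._pool(ids))
```

An instance of this module also fits the hypothetical classify/encode interface used in the joint training sketch above.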
7. A method of malware detection, comprising:
acquiring an API sequence generated by software to be detected in a sandbox;
inputting the API sequence into the embedding layer, feature extraction layer and pooling layer of a malware detection model constructed according to any one of claims 1 to 6 to obtain a feature vector;
and obtaining a classification result by passing the feature vector through the softmax layer of the malware detection model.
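A sketch of this detection pipeline, assuming a trained model with the classify interface from the sketches above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def detect(model, api_ids: torch.Tensor) -> torch.Tensor:
    # Embedding, feature extraction and pooling happen inside
    # model.classify; softmax yields the final class probabilities
    # (e.g., benign vs. malicious).
    model.eval()
    return F.softmax(model.classify(api_ids), dim=-1)
```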
8. A malware detection model construction apparatus, characterized by comprising:
the construction module is used for constructing a supervised learning model and a self-supervised learning model;
the supervised calculation module is used for calculating a labeled software sample through the supervised learning model to obtain the cross-entropy loss of the supervised side;
the self-supervised calculation module is used for calculating an unlabeled software sample through the self-supervised learning model using contrastive learning to obtain a contrastive learning loss;
and the joint training module is used for fusing the cross-entropy loss and the contrastive learning loss for joint training to obtain the malware detection model.
9. A malware detection model construction apparatus, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the malware detection model construction method according to any one of claims 1 to 7 when executing the computer program.
10. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the malware detection model construction method of any of claims 1 to 7.
CN202310925369.2A 2023-07-26 2023-07-26 Malicious software detection and model construction method, device, equipment and medium Pending CN116910753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310925369.2A CN116910753A (en) 2023-07-26 2023-07-26 Malicious software detection and model construction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310925369.2A CN116910753A (en) 2023-07-26 2023-07-26 Malicious software detection and model construction method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116910753A true CN116910753A (en) 2023-10-20

Family

ID=88353006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310925369.2A Pending CN116910753A (en) 2023-07-26 2023-07-26 Malicious software detection and model construction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116910753A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118094550A (en) * 2024-04-23 2024-05-28 山东省计算中心(国家超级计算济南中心) Dynamic malicious software detection method based on Bert and supervised contrast learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination