CN115879109A - Malware identification method based on visual Transformer
Publication number: CN115879109A; Application number: CN202310063452.3A (China). Legal status: Granted.
Abstract
A malware identification method based on a visual Transformer belongs to the technical field of software security protection. It comprises: visualizing the executable files of benign/malicious software as RGB images and constructing a malware image dataset; pre-training a visual Transformer with the ImageNet-21K image dataset and fine-tuning it with the malware image dataset; constructing a lightweight visual Transformer for actual deployment on lightweight devices; migrating the knowledge of the well-trained visual Transformer to the lightweight visual Transformer through knowledge distillation to reduce the performance gap between the two models; and performing malware detection and family classification with the lightweight visual Transformer. The method guarantees the detection efficiency of the model, low hardware resource occupation, and high detection and family classification accuracy.
Description
Technical Field
The invention relates to the technical field of software security protection, in particular to a malicious software identification method based on a visual Transformer.
Background
With the rise of the internet of things, the types and number of internet of things devices have grown exponentially. The embedded systems carried by internet of things devices often lack consideration of security factors and present a wider attack surface than mature Windows and Linux systems, so the attention of malware authors is gradually shifting to internet of things devices. Therefore, a faster and more effective malware detection method is needed to protect internet of things devices from malware. Currently, most antivirus vendors employ signature-based or rule-based techniques to detect malware, relying on constant updates to malicious signature and rule libraries to detect more malware; but their low generalization performance leaves them unable to cope with the growing number of new network threats. Machine-learning-based malware identification has become one of the research hotspots of recent years: features are extracted from software, and machine learning algorithms automatically perform malware detection or classification. At present, visualizing software as a gray-scale image and then automatically extracting features end-to-end with a Convolutional Neural Network (CNN) has proven to be one of the most effective methods. However, the inherent inductive biases of CNNs, such as locality and translation invariance, are natural advantages only for processing natural images; when software is visualized as a gray-scale image, a 1D byte sequence is forcibly converted into a 2D gray-scale image whose vertically adjacent pixel points have no correlation. Processing software gray-scale images with a CNN is therefore somewhat unreasonable, and its results may be suboptimal.
Given sufficient training data, a high-complexity model has stronger pattern recognition capability than a low-complexity model. However, the hardware resources and time cost it demands, such as the large amount of memory and computing power consumed by a high-complexity model, make it ill-suited for deployment on lightweight devices. Most internet of things devices are lightweight devices with extremely limited hardware resources, and low resource occupation is one of the necessary conditions for a security protection model deployed in internet of things devices. Therefore, how to quickly, accurately, and effectively detect malware and judge the family to which it belongs, so that different countermeasures can be adopted, while keeping the model's resource occupation low enough, is one of the problems to be solved urgently.
Disclosure of Invention
In order to overcome the defects of the technologies, the invention provides a method which can accurately detect malicious software and judge the family to which the malicious software belongs while ensuring that the hardware occupation of the model is low.
The technical scheme adopted by the invention for overcoming the technical problems is as follows:
A malware identification method based on a visual Transformer comprises the following steps:
(a) Acquiring an ImageNet-21K image dataset and an executable file dataset of application software, wherein the executable file dataset comprises executable files of benign software and malware executable files with family labels, and visualizing all samples in the executable file dataset as RGB images to construct a malware image dataset;
(b) Building a visual Transformer model containing an X-layer encoder, performing classification pre-training on the visual Transformer model with the ImageNet-21K image dataset, changing the fully connected layer in the visual Transformer model after classification pre-training into an ordered dual-task classifier for malware detection and family classification, and fine-tuning the visual Transformer model with the malware image dataset;
(c) Constructing a lightweight visual Transformer model for actual deployment;
(d) Taking the fine-tuned visual Transformer model as a teacher model and the lightweight visual Transformer model as a student model, and performing distillation training on the student model by taking the self-attention matrices and hidden-layer states of the teacher model and the predicted logits of the dual-task classifier as the supervision information of the student model;
(e) Using the distillation-trained lightweight visual Transformer model to discriminate whether unknown software is benign or malicious and to judge the family label of the malware.
Further, the step of visualizing all samples in the executable file data set as RGB images in step (a) comprises:
(a-1) reading the executable file of the application software in hexadecimal, and converting the hexadecimal numbers into decimal numbers to represent the executable file of the application software as a decimal number sequence with the value range [0, 255];
(a-2) denoting the length of the decimal value sequence by $n$, the visualized image is given width $W=\lfloor\sqrt{n/3}\rfloor$ and height $H=\lfloor n/(3W)\rfloor$, where $\lfloor\cdot\rfloor$ denotes rounding down;
(a-3) three adjacent decimal numbers in the decimal value sequence are taken, in order, as the R channel value, G channel value, and B channel value of a single pixel, obtaining the visualized RGB image $I\in\mathbb{R}^{H\times W\times 3}$ of the executable file, where $\mathbb{R}$ is the real number space, $H$ is the height of the image, $W$ is the width, and 3 is the number of channels of the image; the visualized RGB images of all executable files constitute the malware image dataset.
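For illustration, a minimal Python sketch of steps (a-1) to (a-3) follows; the square-width rule $W=\lfloor\sqrt{n/3}\rfloor$, the file name, and the 224x224 target size are assumptions, not values fixed by the patent text.

```python
# Minimal sketch of steps (a-1)-(a-3); width rule, file name, and target size
# are assumptions.
import numpy as np
from PIL import Image

def executable_to_rgb(path: str) -> Image.Image:
    data = np.fromfile(path, dtype=np.uint8)       # (a-1) bytes read as decimals in [0, 255]
    n = len(data)
    w = int(np.sqrt(n / 3))                        # (a-2) assumed width rule
    h = n // (3 * w)                               # (a-2) height, rounded down
    pixels = data[: h * w * 3].reshape(h, w, 3)    # (a-3) 3 adjacent values -> one RGB pixel
    return Image.fromarray(pixels, mode="RGB")

img = executable_to_rgb("sample.exe").resize((224, 224))   # hypothetical sample, scaled input
```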
Further, the step (b) comprises the steps of:
(b-1) the visual Transformer model sequentially comprises 12 encoder layers and a multi-layer perceptron MLP, and each encoder sequentially comprises a first normalization layer LayerNorm, a Multi-Head self-Attention mechanism, a first residual connection layer, a second normalization layer LayerNorm, a multi-layer perceptron MLP, and a second residual connection layer;
(b-2) the visualized RGB image $I$ is scaled to obtain the scaled visualized RGB image $I'\in\mathbb{R}^{H'\times W'\times 3}$, where $H'$ is the height and $W'$ the width of the scaled visualized RGB image; based on the Flatten function in the torch library, the $i$-th row of pixel values in $I'$ is flattened into $x_i\in\mathbb{R}^{1\times 3W'}$, converting the 3D visualized RGB image $I'$ into a 2D row sequence $X=[x_1;x_2;\dots;x_{H'}]\in\mathbb{R}^{H'\times 3W'}$;
(b-3) each element in the 2D row sequence $X$ is mapped to dimension $D$ via a linear layer, obtaining the row embedding $E\in\mathbb{R}^{H'\times D}$; using the cat function in the torch library, the learnable classification token tensor $x_{cls}\in\mathbb{R}^{1\times D}$ is spliced with the row embedding $E$ to obtain the spliced tensor, and the spliced tensor is added to the learnable absolute position embedding $E_{pos}\in\mathbb{R}^{(H'+1)\times D}$ to obtain the tensor $Z_0\in\mathbb{R}^{(H'+1)\times D}$;
(b-4) the tensor $Z_0$ is input into the first normalization layer LayerNorm of the layer-1 encoder of the visual Transformer model for normalization, obtaining the tensor $\hat Z_0$; the Multi-Head self-Attention mechanism of the layer-1 encoder comprises $h$ attention heads; the tensor $\hat Z_0$ is input into the Multi-Head self-Attention mechanism, and the $i$-th attention head performs linear mappings on the tensor $\hat Z_0$ to obtain the query matrix $Q_i=\hat Z_0W_i^Q+b_i^Q$, the key matrix $K_i=\hat Z_0W_i^K+b_i^K$, and the value matrix $V_i=\hat Z_0W_i^V+b_i^V$, $Q_i,K_i,V_i\in\mathbb{R}^{(H'+1)\times d_k}$, $d_k=D/h$, where $W_i^Q$, $W_i^K$, $W_i^V$ are the weight matrices of the linear transformations and $b_i^Q$, $b_i^K$, $b_i^V$ are all bias vectors; the embedding fused with global attention is calculated by the formula $head_i=A_iV_i$, where the attention score $A_i=\mathrm{Softmax}(Q_iK_i^{\top}/\sqrt{d_k})$, $\top$ denotes transposition, and Softmax is the Softmax activation function; the embeddings $head_1,\dots,head_h$ fused with global attention output by the $h$ attention heads are spliced by the cat function in the torch library; the splicing result and the tensor $Z_0$ are sequentially input into the first residual connection layer and the second normalization layer LayerNorm of the layer-1 encoder, outputting the tensor $\hat Z_0'\in\mathbb{R}^{(H'+1)\times D}$; the tensor $\hat Z_0'$ is input into the multi-layer perceptron MLP of the layer-1 encoder, and the tensor $M_1=\mathrm{GELU}(\hat Z_0'W_1+b_1)W_2+b_2$ is calculated, where GELU is the GELU activation function, $W_1\in\mathbb{R}^{D\times D_m}$ is the weight matrix of the first-layer neurons in the MLP, $W_2\in\mathbb{R}^{D_m\times D}$ is the weight matrix of the second-layer neurons, $b_1$ and $b_2$ are the bias vectors of the first-layer and second-layer neurons, and $D_m$ is the embedding dimension of the first-layer neurons; the tensor $M_1$ is input into the second residual connection layer of the layer-1 encoder, outputting the output tensor $Z_1\in\mathbb{R}^{(H'+1)\times D}$ of the layer-1 encoder;
(b-5) replacing the tensor $Z_0$ in step (b-4) with the tensor $Z_1$ and repeating step (b-4) gives the output tensor $Z_2$ of the layer-2 encoder;
(b-6) replacing the tensor $Z_1$ in step (b-5) with the tensor $Z_2$ and repeating step (b-5) gives the output tensor $Z_3$ of the layer-3 encoder;
(b-7) the output of the $(l-1)$-th encoder is taken as the input of the $l$-th encoder, $l=4,5,\dots,12$; repeating step (b-6) gives the tensor $Z_{12}\in\mathbb{R}^{(H'+1)\times D}$ output by the layer-12 encoder;
(b-8) the vector at position 0 of the tensor $Z_{12}$, i.e. the embedding vector $z_{cls}\in\mathbb{R}^{1\times D}$ corresponding to the learnable classification token tensor $x_{cls}$, is input into the multi-layer perceptron MLP of the visual Transformer model, outputting the tensor $t\in\mathbb{R}^{1\times D}$; the tensor $t$ is input into the fully connected layer FC to obtain the classification result output by the visual Transformer model;
(b-9) carrying out classification pre-training on the visual Transformer model by adopting the ImageNet-21K image dataset.
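A minimal PyTorch sketch of the row-sequence visual Transformer of steps (b-1) to (b-8) follows; all sizes (D=768, 12 heads, MLP dimension 3072, 224 rows) and the class names RowViT and EncoderLayer are illustrative assumptions, not values fixed by the patent text.

```python
# Sketch of steps (b-1)-(b-8): rows of the visualized image are the sequence
# elements. Sizes and names are assumptions.
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d=768, heads=12, d_mlp=3072):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)                       # first LayerNorm
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d)                       # second LayerNorm
        self.mlp = nn.Sequential(nn.Linear(d, d_mlp), nn.GELU(), nn.Linear(d_mlp, d))

    def forward(self, z):
        zn = self.ln1(z)
        a, _ = self.attn(zn, zn, zn)                     # multi-head self-attention
        z = self.ln2(z + a)                              # first residual, then LayerNorm
        return z + self.mlp(z)                           # MLP, then second residual

class RowViT(nn.Module):
    def __init__(self, rows=224, row_dim=3 * 224, d=768, depth=12, heads=12, d_mlp=3072):
        super().__init__()
        self.embed = nn.Linear(row_dim, d)                    # (b-3) row embedding
        self.cls = nn.Parameter(torch.zeros(1, 1, d))         # learnable class token
        self.pos = nn.Parameter(torch.zeros(1, rows + 1, d))  # learnable position embedding
        self.layers = nn.ModuleList(EncoderLayer(d, heads, d_mlp) for _ in range(depth))
        self.mlp_head = nn.Linear(d, d)                       # final MLP before the classifier

    def forward(self, img):                              # img: (B, H', W', 3), H' == rows
        x = img.flatten(2)                               # (b-2) each row -> length-3W' vector
        z = torch.cat([self.cls.expand(len(x), -1, -1), self.embed(x)], dim=1) + self.pos
        for layer in self.layers:                        # (b-4)-(b-7): 12 encoder layers
            z = layer(z)
        return self.mlp_head(z[:, 0])                    # (b-8) class-token embedding t
```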
Further, in step (b-3), the 2D row sequence $X$ is input into a linear layer, and the row embedding is calculated by the formula $E=XW_E+b_E$, where $W_E\in\mathbb{R}^{3W'\times D}$ is the weight matrix of the linear mapping layer and $b_E$ is a bias vector.
Further, in step (b), the step of changing the fully connected layer in the visual Transformer model after classification pre-training into an ordered dual-task classifier for malware detection and family classification comprises the following steps:
(b-10) the fully connected layer FC in step (b-8) is changed into an ordered dual-task classifier, which comprises a detection task for detecting malware and a family classification task for judging the family to which the malware belongs; the detection task and the family classification task are each composed of two fully connected layers FC;
(b-11) the tensor $t$ is input into the detection task, and the predicted logits of the detection task are calculated by the formulas $h_d=tW_1^d+b_1^d$ and $\hat y_d=h_dW_2^d+b_2^d$, $\hat y_d\in\mathbb{R}^{1\times 1}$, where $W_1^d\in\mathbb{R}^{D\times D_c}$ is the weight matrix of the first fully connected layer FC of the detection task, $W_2^d\in\mathbb{R}^{D_c\times 1}$ is the weight matrix of the second fully connected layer FC of the detection task, $b_1^d$ and $b_2^d$ are the bias vectors of the first and second fully connected layers FC of the detection task, and $D_c$ is the hidden dimension of the classifier;
(b-12) the tensor $t$ is input into the family classification task, and the predicted logits of the family classification task are calculated by the formula $\hat y_f=[\,tW_1^f+b_1^f\,\|\,h_d\,]W_2^f+b_2^f$, $\hat y_f\in\mathbb{R}^{1\times F}$, where $\|$ denotes splicing, $h_d$ is the output of the first fully connected layer FC of the detection task, $W_1^f\in\mathbb{R}^{D\times D_c}$ is the weight matrix of the first fully connected layer FC of the family classification task, $W_2^f\in\mathbb{R}^{2D_c\times F}$ is the weight matrix of the second fully connected layer FC of the family classification task, $b_1^f$ and $b_2^f$ are the bias vectors of the first and second fully connected layers FC of the family classification task, and $F$ is the number of families.
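The ordered dual-task classifier of steps (b-10) to (b-12) can be sketched as follows; the hidden width 256, the family count 20, and coupling by concatenating the detection branch's first-FC output into the family branch are assumptions consistent with the embodiment's description, and OrderedDualTaskHead is a name introduced here.

```python
# Sketch of the ordered dual-task classifier; widths and coupling are assumptions.
import torch
import torch.nn as nn

class OrderedDualTaskHead(nn.Module):
    def __init__(self, d=768, hidden=256, num_families=20):
        super().__init__()
        self.det_fc1 = nn.Linear(d, hidden)                 # detection task, first FC
        self.det_fc2 = nn.Linear(hidden, 1)                 # detection task, second FC
        self.fam_fc1 = nn.Linear(d, hidden)                 # family task, first FC
        self.fam_fc2 = nn.Linear(2 * hidden, num_families)  # family task, second FC

    def forward(self, t):                                   # t: (B, D) class-token feature
        h_d = self.det_fc1(t)                               # detection hidden state
        det_logit = self.det_fc2(h_d)                       # detection logit y_d
        fam_logits = self.fam_fc2(torch.cat([self.fam_fc1(t), h_d], dim=-1))
        return det_logit, fam_logits                        # family logits y_f
```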
Further, the step of fine-tuning the visual Transformer model by using the malware image dataset in the step (b) comprises the following steps:
(b-13) the loss is calculated by the formula $L=L_{BCE}(\mathrm{Sigmoid}(\hat y_d),\,y_d)+y_d\,L_{CE}(\hat y_f,\,y_f)$, where Sigmoid is the Sigmoid activation function, $L_{BCE}$ is the binary cross-entropy loss, $L_{CE}$ is the cross-entropy loss, $y_d$ is the detection task label, with 0 meaning benign and 1 meaning malicious, and $y_f$ is the one-hot family label of the malicious sample.
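A sketch of this fine-tuning loss follows, assuming the family cross-entropy term is masked out for benign samples (the reading implied by the ordered design); finetune_loss is a name introduced here.

```python
# Sketch of the fine-tuning loss of (b-13); masking for benign samples is assumed.
import torch
import torch.nn.functional as F

def finetune_loss(det_logit, fam_logits, y_det, y_fam):
    # det_logit: (B, 1); fam_logits: (B, F); y_det: (B,) float in {0., 1.};
    # y_fam: (B,) long family index (benign rows may carry any placeholder index).
    l_bce = F.binary_cross_entropy_with_logits(det_logit.squeeze(-1), y_det)
    l_ce = F.cross_entropy(fam_logits, y_fam, reduction="none")
    return l_bce + (y_det * l_ce).mean()                 # family term only for malware
```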
Further, the step (c) comprises the steps of:
(c-1) the lightweight visual Transformer model sequentially comprises 3 encoder layers and a multi-layer perceptron MLP; each encoder sequentially comprises a first normalization layer LayerNorm, a Multi-Head self-Attention mechanism, a first residual connection layer, a second normalization layer LayerNorm, a multi-layer perceptron MLP, and a second residual connection layer; the number of attention heads of the Multi-Head self-Attention mechanism is $h_S$, $h_S<h$, and the internal embedding dimension of the lightweight visual Transformer model is $D_S$, $D_S<D$;
(c-2) each element in the 2D row sequence $X$ is mapped to dimension $D_S$ via a linear layer, obtaining the row embedding $E^S\in\mathbb{R}^{H'\times D_S}$; using the cat function in the torch library, the learnable classification token tensor $x_{cls}^S\in\mathbb{R}^{1\times D_S}$ is spliced with the row embedding $E^S$ to obtain the spliced tensor, and the spliced tensor is added to the learnable absolute position embedding $E_{pos}^S\in\mathbb{R}^{(H'+1)\times D_S}$ to obtain the tensor $Z_0^S\in\mathbb{R}^{(H'+1)\times D_S}$;
(c-3) the tensor $Z_0^S$ is input into the first normalization layer LayerNorm of the layer-1 encoder of the lightweight visual Transformer model for normalization, obtaining the tensor $\hat Z_0^S$; the Multi-Head self-Attention mechanism of the layer-1 encoder comprises $h_S$ attention heads; the tensor $\hat Z_0^S$ is input into the Multi-Head self-Attention mechanism, and the $i$-th attention head performs linear mappings on the tensor $\hat Z_0^S$ to obtain the query matrix $Q_i^S=\hat Z_0^SW_i^{Q,S}+b_i^{Q,S}$, the key matrix $K_i^S=\hat Z_0^SW_i^{K,S}+b_i^{K,S}$, and the value matrix $V_i^S=\hat Z_0^SW_i^{V,S}+b_i^{V,S}$, $Q_i^S,K_i^S,V_i^S\in\mathbb{R}^{(H'+1)\times d_S}$, $d_S=D_S/h_S$, where $W_i^{Q,S}$, $W_i^{K,S}$, $W_i^{V,S}$ are the weight matrices of the linear transformations and $b_i^{Q,S}$, $b_i^{K,S}$, $b_i^{V,S}$ are all bias vectors; the embedding fused with global attention is calculated by the formula $head_i^S=A_i^SV_i^S$, where the attention score $A_i^S=\mathrm{Softmax}(Q_i^S(K_i^S)^{\top}/\sqrt{d_S})$; the embeddings fused with global attention output by the $h_S$ attention heads are spliced by the cat function in the torch library; the splicing result and the tensor $Z_0^S$ are sequentially input into the first residual connection layer and the second normalization layer LayerNorm of the layer-1 encoder, outputting the tensor $\hat Z_0^{S\prime}\in\mathbb{R}^{(H'+1)\times D_S}$; the tensor $\hat Z_0^{S\prime}$ is input into the multi-layer perceptron MLP of the layer-1 encoder, and the tensor $M_1^S=\mathrm{GELU}(\hat Z_0^{S\prime}W_1^S+b_1^S)W_2^S+b_2^S$ is calculated, where $W_1^S\in\mathbb{R}^{D_S\times D_m^S}$ is the weight matrix of the first-layer neurons in the MLP, $W_2^S\in\mathbb{R}^{D_m^S\times D_S}$ is the weight matrix of the second-layer neurons, $b_1^S$ and $b_2^S$ are the bias vectors of the first-layer and second-layer neurons, and $D_m^S$ is the embedding dimension of the first-layer neurons; the tensor $M_1^S$ is input into the second residual connection layer of the layer-1 encoder, outputting the output tensor $Z_1^S\in\mathbb{R}^{(H'+1)\times D_S}$ of the layer-1 encoder;
(c-4) replacing the tensor $Z_0^S$ in step (c-3) with the tensor $Z_1^S$ and repeating step (c-3) gives the output tensor $Z_2^S$ of the layer-2 encoder;
(c-5) replacing the tensor $Z_1^S$ in step (c-4) with the tensor $Z_2^S$ and repeating step (c-4) gives the output tensor $Z_3^S\in\mathbb{R}^{(H'+1)\times D_S}$ of the layer-3 encoder;
(c-6) the vector at position 0 of the tensor $Z_3^S$, i.e. the embedding vector $z_{cls}^S\in\mathbb{R}^{1\times D_S}$ corresponding to the learnable classification token tensor $x_{cls}^S$, is input into the multi-layer perceptron MLP of the lightweight visual Transformer model, outputting the tensor $t_S\in\mathbb{R}^{1\times D_S}$;
(c-7) the tensor $t_S$ is input into the detection task, and the predicted logits of the detection task are calculated by the formulas $h_d^S=t_SW_1^{d,S}+b_1^{d,S}$ and $\hat y_d^S=h_d^SW_2^{d,S}+b_2^{d,S}$, $\hat y_d^S\in\mathbb{R}^{1\times 1}$, where $W_1^{d,S}$ and $W_2^{d,S}$ are the weight matrices and $b_1^{d,S}$ and $b_2^{d,S}$ the bias vectors of the first and second fully connected layers FC of the detection task;
(c-8) the tensor $t_S$ is input into the family classification task, and the predicted logits of the family classification task are calculated by the formula $\hat y_f^S=[\,t_SW_1^{f,S}+b_1^{f,S}\,\|\,h_d^S\,]W_2^{f,S}+b_2^{f,S}$, $\hat y_f^S\in\mathbb{R}^{1\times F}$, where $W_1^{f,S}$ and $W_2^{f,S}$ are the weight matrices and $b_1^{f,S}$ and $b_2^{f,S}$ the bias vectors of the first and second fully connected layers FC of the family classification task.
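Under the assumption that the student shares the teacher's architecture at reduced scale, the two models of steps (b) and (c) can be instantiated from the RowViT and OrderedDualTaskHead sketches above; all sizes here are illustrative assumptions.

```python
# Sketch of step (c): the student reuses the teacher architecture at smaller scale.
teacher = RowViT(d=768, depth=12, heads=12, d_mlp=3072)   # 12-layer teacher of step (b)
student = RowViT(d=192, depth=3, heads=3, d_mlp=768)      # 3-layer lightweight student
teacher_head = OrderedDualTaskHead(d=768)
student_head = OrderedDualTaskHead(d=192)
```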
Further, the step (d) comprises the steps of:
(d-1) the predicted-logits distillation loss is calculated by the formula $L_{pred}=\beta\,L_2(\hat y_d^S/t_d,\,\hat y_d/t_d)+(1-\beta)\,L_2(\hat y_f^S/t_f,\,\hat y_f/t_f)$, where $\beta$ is the influence factor of the classification loss ratio, $L_2$ is the L2 loss, $t_d$ is the temperature hyperparameter for distilling the detection-task classifier, and $t_f$ is the temperature hyperparameter for distilling the family-classification-task classifier;
(d-2) the self-attention distillation loss is calculated by the formula $L_{att}=\frac{1}{H'+1}\sum_{i=1}^{H'+1}D_{KL}(r_i^T\,\|\,r_i^S)$, where $r_i^S$ is the $i$-th row of the correlation matrix $R^S$ between the self-attention matrices in the student model and $r_i^T$ is the $i$-th row of the correlation matrix $R^T$ between the self-attention matrices in the teacher model; here $Q^T$, $K^T$, and $V^T$ are the query, key, and value matrices spliced from the $h$ self-attention heads of the teacher model's Multi-Head self-Attention, $R^T=\mathrm{Softmax}(Q^T(K^T)^{\top}/\sqrt{D})$, and $Q^S$, $K^S$, and $V^S$ are the query, key, and value matrices spliced from the $h_S$ self-attention heads of the student model's Multi-Head self-Attention, $R^S=\mathrm{Softmax}(Q^S(K^S)^{\top}/\sqrt{D_S})$, with $\top$ denoting transposition;
(d-3) the hidden-layer-state distillation loss is calculated by the formula $L_{hidn}=\frac{1}{H'+1}\sum_{i=1}^{H'+1}D_{KL}(g_i^T\,\|\,g_i^S)$, where $g_i^S$ is the $i$-th row of the student model's hidden-layer-state correlation matrix $G^S=\mathrm{Softmax}(Z^S(Z^S)^{\top}/\sqrt{D_S})$ and $g_i^T$ is the $i$-th row of the teacher model's hidden-layer-state correlation matrix $G^T=\mathrm{Softmax}(Z^T(Z^T)^{\top}/\sqrt{D})$, with $Z^S$ and $Z^T$ the hidden-layer states (encoder outputs) of the student and teacher models;
(d-4) supervising the layer 1 encoder of the student model with the layer 4 encoder of the teacher model, supervising the layer 2 encoder of the student model with the layer 8 encoder of the teacher model, supervising the layer 3 encoder of the student model with the layer 12 encoder of the teacher model;
(d-5) the total loss of student model training is calculated by the formula $L_{total}=L_{pred}+\lambda_1L_{att}+\lambda_2L_{hidn}$, where $\lambda_1$ is the weight of the self-attention distillation loss and $\lambda_2$ is the weight of the hidden-layer-state distillation loss;
(d-6) the lightweight visual Transformer model is iteratively trained with the total loss $L_{total}$, obtaining the distillation-trained lightweight visual Transformer model.
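A sketch of the three distillation losses and the layer mapping of steps (d-1) to (d-5) follows. The relation matrices (row-softmax of $QK^{\top}$ and $ZZ^{\top}$, compared row-wise with KL divergence) are a MiniLM-style reading of (d-2)/(d-3), and every name, shape, and hyperparameter here is an assumption.

```python
# Sketch of the distillations of steps (d-1)-(d-5); formulation is an assumption.
import math
import torch
import torch.nn.functional as F

def relation(x, y):
    # Row-normalized relation matrix softmax(x y^T / sqrt(d)), returned as log-probs.
    return F.log_softmax(x @ y.transpose(-1, -2) / math.sqrt(x.size(-1)), dim=-1)

def kl_rows(teacher_log_rel, student_log_rel):
    return F.kl_div(student_log_rel, teacher_log_rel, reduction="batchmean", log_target=True)

def distill_loss(out_t, out_s, lam1=1.0, lam2=1.0, t_det=2.0, t_fam=2.0, beta=0.5):
    # out_*: dicts holding per-layer head-spliced "q"/"k", hidden states "z",
    # and classifier logits "y_det"/"y_fam". Teacher layers 4/8/12 supervise
    # student layers 1/2/3 per (d-4) (0-indexed pairs below).
    pairs = [(3, 0), (7, 1), (11, 2)]
    l_att = sum(kl_rows(relation(out_t["q"][i], out_t["k"][i]),
                        relation(out_s["q"][j], out_s["k"][j])) for i, j in pairs)
    l_hid = sum(kl_rows(relation(out_t["z"][i], out_t["z"][i]),
                        relation(out_s["z"][j], out_s["z"][j])) for i, j in pairs)
    l_pred = beta * F.mse_loss(out_s["y_det"] / t_det, out_t["y_det"] / t_det) \
        + (1 - beta) * F.mse_loss(out_s["y_fam"] / t_fam, out_t["y_fam"] / t_fam)
    return l_pred + lam1 * l_att + lam2 * l_hid          # total loss of (d-5)
```

Comparing relation matrices rather than raw attention maps sidesteps the mismatch in head count and embedding dimension between teacher and student noted in the embodiment.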
Further, the step (e) comprises the steps of:
(e-2) the visualized RGB image $I_u$ of the unknown software is scaled into the scaled visualized RGB image $I_u'\in\mathbb{R}^{H'\times W'\times 3}$; using the Flatten function in the torch library, the $i$-th row of pixel values in $I_u'$ is flattened into $x_i\in\mathbb{R}^{1\times 3W'}$, converting the 3D visualized RGB image into a 2D row sequence $X_u\in\mathbb{R}^{H'\times 3W'}$;
(e-3) the 2D row sequence $X_u$ is input into the distillation-trained lightweight visual Transformer model to obtain the predicted detection logits $\hat y_d^S$ and the predicted family-classification logits $\hat y_f^S$; if $\mathrm{Sigmoid}(\hat y_d^S)\geq 0.5$ the unknown software is judged to be malware, and if $\mathrm{Sigmoid}(\hat y_d^S)<0.5$ the unknown software is judged to be benign; when the distillation-trained lightweight visual Transformer model judges the input unknown software to be malware, the family to which the malware belongs is judged to be the family corresponding to the highest value in $\hat y_f^S$.
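Inference per step (e) can then be sketched as follows, assuming a 0.5 threshold on the Sigmoid-activated detection logit and argmax over the family logits; identify() is a name introduced here.

```python
# Sketch of step (e) inference; threshold and names are assumptions.
import torch

@torch.no_grad()
def identify(img, model, head, family_names):
    # img: one visualized, scaled sample with batch dimension (1, H', W', 3).
    t = model(img)                                   # distilled lightweight model
    det_logit, fam_logits = head(t)
    if torch.sigmoid(det_logit).item() < 0.5:
        return "benign", None
    return "malware", family_names[fam_logits.argmax(dim=-1).item()]
```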
The beneficial effects of the invention are: the method processes only the executable file of the software based on static analysis, avoiding the time cost introduced by disassembly, dynamic execution, or manual feature extraction, and is suitable for detection tasks with high timeliness requirements. A visual Transformer automatically extracts features from the RGB image of the visualized software, taking the pixel values of each row of the image as one sequence element of the model input, which effectively avoids the suboptimal results of CNN-based recognition caused by the lack of correlation between the vertical pixel points of the visualized image. Malware detection and classification are performed as an ordered combination of tasks, so malware can be classified while being detected, generating warnings of different levels for malware of different families so that corresponding measures can be taken. Furthermore, relative to a single task (treating benign software as one harmless malware family and jointly performing malware detection and classification), the ordered multitasking alleviates the negative impact of the relatively difficult malware family classification task on the cost-sensitive malware detection task. Three kinds of knowledge distillation transfer the knowledge of a large-scale teacher model to a small-scale student model, maximizing the performance gain of the student model.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is an exemplary illustration of an executable file of the present invention being visualized as an RGB image;
FIG. 3 is a schematic structural diagram of a visual Transformer according to the present invention;
FIG. 4 is a schematic structural diagram of the knowledge distillation of the present invention.
Detailed Description
The invention will be further described with reference to fig. 1 to 4.
As shown in fig. 1, a malware identification method based on visual Transformer includes the following steps:
(a) Acquiring an ImageNet-21K image dataset and an executable file dataset of application software, wherein the executable file dataset comprises executable files of benign software and malware executable files with family labels, and visualizing all samples in the executable file dataset as RGB images to construct a malware image dataset.
(b) Building a visual Transformer model containing an X-layer encoder, performing classification pre-training on the visual Transformer model with the ImageNet-21K image dataset, changing the fully connected layer in the visual Transformer model after classification pre-training into an ordered dual-task classifier for malware detection and family classification, and fine-tuning the visual Transformer model with the malware image dataset.
(c) Building a lightweight visual Transformer model for actual deployment.
(d) The fine-tuned visual Transformer model is taken as the teacher model and the lightweight visual Transformer model as the student model. In order to make the performance of the lightweight model comparable to that of the large-scale model and increase the feasibility of deploying the model on lightweight devices, knowledge distillation is introduced to substantially improve the performance of the lightweight model. Specifically, the self-attention matrices and hidden-layer states of the teacher model and the predicted logits of the dual-task classifier are used as the supervision information for distillation training of the student model.
(e) The distillation-trained lightweight visual Transformer model is used to discriminate whether unknown software is benign or malicious and to judge the family label of the malware.
The invention processes the executable file of the software based only on static analysis and adopts a lightweight model to perform inference, ensuring the detection efficiency of the model and low hardware resource occupation. A visual Transformer automatically extracts features from the visualized image of the executable file, avoiding the problem that the vertical pixel points of the visualized image have no correlation; knowledge distillation further improves the performance of the model, ensuring the detection and family classification accuracy of the model.
In one embodiment of the present invention, as shown in fig. 2, the step (a) of visualizing all samples in the executable file data set as RGB images comprises:
(a-1) The executable file of the application software, i.e. a binary file, is read in hexadecimal, and the hexadecimal numbers are converted into decimal numbers so that the executable file of the application software is represented as a decimal number sequence with the value range [0, 255].
(a-2) Denoting the length of the decimal value sequence by $n$, the visualized image is given width $W=\lfloor\sqrt{n/3}\rfloor$ and height $H=\lfloor n/(3W)\rfloor$, where $\lfloor\cdot\rfloor$ denotes rounding down.
(a-3) Three consecutive decimal numbers in the decimal value sequence are taken, in order, as the R channel value, G channel value, and B channel value of a single pixel, obtaining the visualized RGB image $I\in\mathbb{R}^{H\times W\times 3}$ of the executable file, where $\mathbb{R}$ is the real number space, $H$ is the height of the image, $W$ is the width, and 3 is the number of channels of the image; the visualized RGB images of all executable files constitute the malware image dataset.
In one embodiment of the present invention, as shown in fig. 3, the step (b) comprises the steps of:
(b-1) The visual Transformer model sequentially comprises 12 encoder layers and a multi-layer perceptron MLP; the multi-layer perceptron MLP of the visual Transformer model is used for classification, and each encoder sequentially comprises a first normalization layer LayerNorm, a Multi-Head self-Attention mechanism, a first residual connection layer, a second normalization layer LayerNorm, a multi-layer perceptron MLP, and a second residual connection layer.
(b-2) The visualized RGB image $I$ is scaled to obtain the scaled visualized RGB image $I'\in\mathbb{R}^{H'\times W'\times 3}$, where $H'$ is the height and $W'$ the width of the scaled visualized RGB image; based on the Flatten function in the torch library, the $i$-th row of pixel values in $I'$ is flattened into $x_i\in\mathbb{R}^{1\times 3W'}$, converting the 3D visualized RGB image $I'$ into a 2D row sequence $X=[x_1;x_2;\dots;x_{H'}]\in\mathbb{R}^{H'\times 3W'}$.
(b-3) Each element in the 2D row sequence $X$ is mapped to dimension $D$ via a linear layer, obtaining the row embedding $E\in\mathbb{R}^{H'\times D}$; using the cat function in the torch library, the learnable classification token tensor $x_{cls}\in\mathbb{R}^{1\times D}$ is spliced with the row embedding $E$ to obtain the spliced tensor, and the spliced tensor is added to the learnable absolute position embedding $E_{pos}\in\mathbb{R}^{(H'+1)\times D}$ to obtain the tensor $Z_0\in\mathbb{R}^{(H'+1)\times D}$. The learnable classification token tensor $x_{cls}$ and the learnable absolute position embedding $E_{pos}$ are prior art and are essentially learnable parameters.
(b-4) The tensor $Z_0$ is input into the first normalization layer LayerNorm of the layer-1 encoder of the visual Transformer model for normalization, obtaining the tensor $\hat Z_0$. The Multi-Head self-Attention mechanism of the layer-1 encoder comprises $h$ attention heads; each attention head operates on the tensor $\hat Z_0$ to extract features from a different perspective, and the results are spliced and fused after the operations. The tensor $\hat Z_0$ is input into the Multi-Head self-Attention mechanism, and the $i$-th attention head performs linear mappings on the tensor $\hat Z_0$ to obtain the query matrix $Q_i=\hat Z_0W_i^Q+b_i^Q$, the key matrix $K_i=\hat Z_0W_i^K+b_i^K$, and the value matrix $V_i=\hat Z_0W_i^V+b_i^V$, $Q_i,K_i,V_i\in\mathbb{R}^{(H'+1)\times d_k}$, $d_k=D/h$, where $W_i^Q$, $W_i^K$, $W_i^V$ are the weight matrices of the linear transformations and $b_i^Q$, $b_i^K$, $b_i^V$ are all bias vectors. The embedding fused with global attention, $head_i=A_iV_i$, is a weighted sum of the value matrix, the weights being the attention scores, where the attention score $A_i=\mathrm{Softmax}(Q_iK_i^{\top}/\sqrt{d_k})$ and $\top$ denotes transposition; the Softmax activation function maps the attention scores in each row of the matrix to the range [0, 1] so that each row sums to 1. The embeddings $head_1,\dots,head_h$ output by the $h$ attention heads are spliced by the cat function in the torch library; the splicing result and the tensor $Z_0$ are sequentially input into the first residual connection layer and the second normalization layer LayerNorm of the layer-1 encoder, outputting the tensor $\hat Z_0'\in\mathbb{R}^{(H'+1)\times D}$. The tensor $\hat Z_0'$ is input into the multi-layer perceptron MLP of the layer-1 encoder, and the tensor $M_1=\mathrm{GELU}(\hat Z_0'W_1+b_1)W_2+b_2$ is calculated, where GELU is the GELU activation function, $W_1\in\mathbb{R}^{D\times D_m}$ is the weight matrix of the first-layer neurons in the MLP, $W_2\in\mathbb{R}^{D_m\times D}$ is the weight matrix of the second-layer neurons, $b_1$ and $b_2$ are the bias vectors of the first-layer and second-layer neurons, and $D_m$ is the embedding dimension of the first-layer neurons. The tensor $M_1$ is input into the second residual connection layer of the layer-1 encoder, outputting the output tensor $Z_1\in\mathbb{R}^{(H'+1)\times D}$ of the layer-1 encoder.
(b-5) Replacing the tensor $Z_0$ in step (b-4) with the tensor $Z_1$ and repeating step (b-4) gives the output tensor $Z_2$ of the layer-2 encoder.
(b-6) Replacing the tensor $Z_1$ in step (b-5) with the tensor $Z_2$ and repeating step (b-5) gives the output tensor $Z_3$ of the layer-3 encoder.
(b-7) The output of the $(l-1)$-th encoder is taken as the input of the $l$-th encoder, $l=4,5,\dots,12$; repeating step (b-6) gives the tensor $Z_{12}\in\mathbb{R}^{(H'+1)\times D}$ output by the layer-12 encoder.
(b-8) The vector at position 0 of the tensor $Z_{12}$, i.e. the embedding vector $z_{cls}\in\mathbb{R}^{1\times D}$ corresponding to the learnable classification token tensor $x_{cls}$, is input into the multi-layer perceptron MLP of the visual Transformer model, outputting the tensor $t\in\mathbb{R}^{1\times D}$; the tensor $t$ is input into the fully connected layer FC to obtain the classification result output by the visual Transformer model.
(b-9) The visual Transformer model is pre-trained for classification on the ImageNet-21K image dataset. This compensates to a certain extent for the Transformer's lack of inductive biases such as locality and translation invariance.
In one embodiment of the present invention, in step (b-3), the 2D row sequence $X$ is input into a linear layer, and the row embedding is calculated by the formula $E=XW_E+b_E$, where $W_E\in\mathbb{R}^{3W'\times D}$ is the weight matrix of the linear mapping layer and $b_E$ is a bias vector.
In an embodiment of the present invention, the step in (b) of changing the fully connected layer in the visual Transformer model after classification pre-training into an ordered dual-task classifier for malware detection and family classification comprises:
(b-10) The fully connected layer FC in step (b-8) is changed into an ordered dual-task classifier, which comprises a detection task for detecting malware and a family classification task for judging the family to which the malware belongs; the detection task and the family classification task are each composed of two fully connected layers FC. In addition, because the malware family classification task is performed on the premise that the input is malicious, the output state of the first fully connected layer of the detection task is used as one of the inputs of the second fully connected layer of the family classification task.
(b-11) The tensor $t$ is input into the detection task, and the predicted logits of the detection task are calculated by the formulas $h_d=tW_1^d+b_1^d$ and $\hat y_d=h_dW_2^d+b_2^d$, $\hat y_d\in\mathbb{R}^{1\times 1}$, where $W_1^d\in\mathbb{R}^{D\times D_c}$ is the weight matrix of the first fully connected layer FC of the detection task, $W_2^d\in\mathbb{R}^{D_c\times 1}$ is the weight matrix of the second fully connected layer FC of the detection task, $b_1^d$ and $b_2^d$ are the bias vectors of the first and second fully connected layers FC of the detection task, and $D_c$ is the hidden dimension of the classifier.
(b-12) The tensor $t$ is input into the family classification task, and the predicted logits of the family classification task are calculated by the formula $\hat y_f=[\,tW_1^f+b_1^f\,\|\,h_d\,]W_2^f+b_2^f$, $\hat y_f\in\mathbb{R}^{1\times F}$, where $\|$ denotes splicing, $h_d$ is the output of the first fully connected layer FC of the detection task, $W_1^f\in\mathbb{R}^{D\times D_c}$ is the weight matrix of the first fully connected layer FC of the family classification task, $W_2^f\in\mathbb{R}^{2D_c\times F}$ is the weight matrix of the second fully connected layer FC of the family classification task, $b_1^f$ and $b_2^f$ are the bias vectors of the first and second fully connected layers FC of the family classification task, and $F$ is the number of families.
In an embodiment of the present invention, the step of fine-tuning the visual Transformer model by using the malware image dataset in the step (b) includes:
(b-13) The loss is calculated by the formula $L=L_{BCE}(\mathrm{Sigmoid}(\hat y_d),\,y_d)+y_d\,L_{CE}(\hat y_f,\,y_f)$, where Sigmoid is the Sigmoid activation function, $L_{BCE}$ is the binary cross-entropy loss, $L_{CE}$ is the cross-entropy loss, $y_d$ is the detection task label, with 0 meaning benign and 1 meaning malicious, and $y_f$ is the one-hot family label of the malicious sample.
In one embodiment of the present invention, step (c) comprises the steps of:
(c-1) The lightweight visual Transformer model sequentially comprises 3 encoder layers and a multi-layer perceptron MLP; each encoder sequentially comprises a first normalization layer LayerNorm, a Multi-Head self-Attention mechanism, a first residual connection layer, a second normalization layer LayerNorm, a multi-layer perceptron MLP, and a second residual connection layer; the number of attention heads of the Multi-Head self-Attention mechanism is $h_S$, $h_S<h$, and the internal embedding dimension of the lightweight visual Transformer model is $D_S$, $D_S<D$.
(c-2) Each element in the 2D row sequence $X$ is mapped to dimension $D_S$ via a linear layer, obtaining the row embedding $E^S\in\mathbb{R}^{H'\times D_S}$; using the cat function in the torch library, the learnable classification token tensor $x_{cls}^S\in\mathbb{R}^{1\times D_S}$ is spliced with the row embedding $E^S$, and the spliced tensor is added to the learnable absolute position embedding $E_{pos}^S\in\mathbb{R}^{(H'+1)\times D_S}$ to obtain the tensor $Z_0^S\in\mathbb{R}^{(H'+1)\times D_S}$.
(c-3) The tensor $Z_0^S$ is input into the first normalization layer LayerNorm of the layer-1 encoder of the lightweight visual Transformer model for normalization, obtaining the tensor $\hat Z_0^S$. The Multi-Head self-Attention mechanism of the layer-1 encoder comprises $h_S$ attention heads; the tensor $\hat Z_0^S$ is input into the Multi-Head self-Attention mechanism, and the $i$-th attention head performs linear mappings on the tensor $\hat Z_0^S$ to obtain the query matrix $Q_i^S=\hat Z_0^SW_i^{Q,S}+b_i^{Q,S}$, the key matrix $K_i^S=\hat Z_0^SW_i^{K,S}+b_i^{K,S}$, and the value matrix $V_i^S=\hat Z_0^SW_i^{V,S}+b_i^{V,S}$, $Q_i^S,K_i^S,V_i^S\in\mathbb{R}^{(H'+1)\times d_S}$, $d_S=D_S/h_S$, where $W_i^{Q,S}$, $W_i^{K,S}$, $W_i^{V,S}$ are the weight matrices of the linear transformations and $b_i^{Q,S}$, $b_i^{K,S}$, $b_i^{V,S}$ are all bias vectors. The embedding fused with global attention is calculated by the formula $head_i^S=A_i^SV_i^S$, where the attention score $A_i^S=\mathrm{Softmax}(Q_i^S(K_i^S)^{\top}/\sqrt{d_S})$. The embeddings output by the $h_S$ attention heads are spliced by the cat function in the torch library; the splicing result and the tensor $Z_0^S$ are sequentially input into the first residual connection layer and the second normalization layer LayerNorm of the layer-1 encoder, outputting the tensor $\hat Z_0^{S\prime}\in\mathbb{R}^{(H'+1)\times D_S}$. The tensor $\hat Z_0^{S\prime}$ is input into the multi-layer perceptron MLP of the layer-1 encoder, and the tensor $M_1^S=\mathrm{GELU}(\hat Z_0^{S\prime}W_1^S+b_1^S)W_2^S+b_2^S$ is calculated, where $W_1^S\in\mathbb{R}^{D_S\times D_m^S}$ is the weight matrix of the first-layer neurons in the MLP, $W_2^S\in\mathbb{R}^{D_m^S\times D_S}$ is the weight matrix of the second-layer neurons, $b_1^S$ and $b_2^S$ are the bias vectors of the first-layer and second-layer neurons, and $D_m^S$ is the embedding dimension of the first-layer neurons. The tensor $M_1^S$ is input into the second residual connection layer of the layer-1 encoder, outputting the output tensor $Z_1^S\in\mathbb{R}^{(H'+1)\times D_S}$ of the layer-1 encoder.
(c-4) Replacing the tensor $Z_0^S$ in step (c-3) with the tensor $Z_1^S$ and repeating step (c-3) gives the output tensor $Z_2^S$ of the layer-2 encoder.
(c-5) Replacing the tensor $Z_1^S$ in step (c-4) with the tensor $Z_2^S$ and repeating step (c-4) gives the output tensor $Z_3^S\in\mathbb{R}^{(H'+1)\times D_S}$ of the layer-3 encoder.
(c-6) The vector at position 0 of the tensor $Z_3^S$, i.e. the embedding vector $z_{cls}^S\in\mathbb{R}^{1\times D_S}$ corresponding to the learnable classification token tensor $x_{cls}^S$, is input into the multi-layer perceptron MLP of the lightweight visual Transformer model, outputting the tensor $t_S\in\mathbb{R}^{1\times D_S}$.
(c-7) The tensor $t_S$ is input into the detection task, and the predicted logits of the detection task are calculated by the formulas $h_d^S=t_SW_1^{d,S}+b_1^{d,S}$ and $\hat y_d^S=h_d^SW_2^{d,S}+b_2^{d,S}$, $\hat y_d^S\in\mathbb{R}^{1\times 1}$, where $W_1^{d,S}$ and $W_2^{d,S}$ are the weight matrices and $b_1^{d,S}$ and $b_2^{d,S}$ the bias vectors of the first and second fully connected layers FC of the detection task.
(c-8) The tensor $t_S$ is input into the family classification task, and the predicted logits of the family classification task are calculated by the formula $\hat y_f^S=[\,t_SW_1^{f,S}+b_1^{f,S}\,\|\,h_d^S\,]W_2^{f,S}+b_2^{f,S}$, $\hat y_f^S\in\mathbb{R}^{1\times F}$, where $W_1^{f,S}$ and $W_2^{f,S}$ are the weight matrices and $b_1^{f,S}$ and $b_2^{f,S}$ the bias vectors of the first and second fully connected layers FC of the family classification task.
The teacher model is used to supervise the training of the student model so that the student model mimics the teacher model's representations and achieves performance comparable to the teacher model. To make the representation capability of the student model approach that of the teacher model as closely as possible, three distillation methods are adopted: predicted-logits distillation, self-attention distillation, and hidden-layer-state distillation. Predicted-logits distillation supervises the student model with the predicted logits of the teacher model's two classification layers. Thus, in one embodiment of the present invention, as shown in FIG. 4, step (d) comprises the following steps:
(d-1) The predicted-logits distillation loss is calculated by the formula $L_{pred}=\beta\,L_2(\hat y_d^S/t_d,\,\hat y_d/t_d)+(1-\beta)\,L_2(\hat y_f^S/t_f,\,\hat y_f/t_f)$, where $\beta$ is the influence factor of the classification loss ratio, $L_2$ is the L2 loss, $t_d$ is the temperature hyperparameter for distilling the detection-task classifier, and $t_f$ is the temperature hyperparameter for distilling the family-classification-task classifier.
(d-2) Because the number of heads $h$ of the Multi-Head self-Attention in the teacher model's encoders and the embedding dimension $D$ are inconsistent with those of the student model's encoders, distillation is performed on the correlations between the attention heads. Specifically, the self-attention distillation loss is calculated by the formula $L_{att}=\frac{1}{H'+1}\sum_{i=1}^{H'+1}D_{KL}(r_i^T\,\|\,r_i^S)$, where $r_i^S$ is the $i$-th row of the correlation matrix $R^S$ between the self-attention matrices in the student model and $r_i^T$ is the $i$-th row of the correlation matrix $R^T$ between the self-attention matrices in the teacher model. Here $Q^T$, $K^T$, and $V^T$ are the query, key, and value matrices spliced from the $h$ self-attention heads of the teacher model's Multi-Head self-Attention, $R^T=\mathrm{Softmax}(Q^T(K^T)^{\top}/\sqrt{D})$, and $Q^S$, $K^S$, and $V^S$ are the query, key, and value matrices spliced from the $h_S$ self-attention heads of the student model's Multi-Head self-Attention, $R^S=\mathrm{Softmax}(Q^S(K^S)^{\top}/\sqrt{D_S})$, with $\top$ denoting transposition.
(d-3) The hidden-layer-state distillation loss is calculated by the formula $L_{hidn}=\frac{1}{H'+1}\sum_{i=1}^{H'+1}D_{KL}(g_i^T\,\|\,g_i^S)$, where $g_i^S$ is the $i$-th row of the student model's hidden-layer-state correlation matrix $G^S=\mathrm{Softmax}(Z^S(Z^S)^{\top}/\sqrt{D_S})$ and $g_i^T$ is the $i$-th row of the teacher model's hidden-layer-state correlation matrix $G^T=\mathrm{Softmax}(Z^T(Z^T)^{\top}/\sqrt{D})$, with $Z^S$ and $Z^T$ the hidden-layer states (encoder outputs) of the student and teacher models.
(d-4) because the teacher model and the student models have different numbers of encoders, the self-attention distillation and the hidden-layer state distillation cannot correspond to each other one by one, so that the 4 th layer of encoder of the teacher model supervises the 1 st layer of encoder of the student models, the 8 th layer of encoder of the teacher model supervises the 2 nd layer of encoder of the student models, and the 12 th layer of encoder of the teacher model supervises the 3 rd layer of encoder of the student models;
(d-5) The total loss of student model training is calculated by the formula $L_{total}=L_{pred}+\lambda_1L_{att}+\lambda_2L_{hidn}$, where $\lambda_1$ is the weight of the self-attention distillation loss and $\lambda_2$ is the weight of the hidden-layer-state distillation loss;
(d-6) The lightweight visual Transformer model is iteratively trained with the total loss $L_{total}$, obtaining the distillation-trained lightweight visual Transformer model.
In one embodiment of the present invention, step (e) comprises the steps of:
(e-2) The visualized RGB image $I_u$ of the unknown software is scaled into the scaled visualized RGB image $I_u'\in\mathbb{R}^{H'\times W'\times 3}$; using the Flatten function in the torch library, the $i$-th row of pixel values in $I_u'$ is flattened into $x_i\in\mathbb{R}^{1\times 3W'}$, converting the 3D visualized RGB image into a 2D row sequence $X_u\in\mathbb{R}^{H'\times 3W'}$.
(e-3) The 2D row sequence $X_u$ is input into the distillation-trained lightweight visual Transformer model to obtain the predicted detection logits $\hat y_d^S$ and the predicted family-classification logits $\hat y_f^S$; if $\mathrm{Sigmoid}(\hat y_d^S)\geq 0.5$ the unknown software is judged to be malware, and if $\mathrm{Sigmoid}(\hat y_d^S)<0.5$ the unknown software is judged to be benign; when the distillation-trained lightweight visual Transformer model judges the input unknown software to be malware, the family to which the malware belongs is judged to be the family corresponding to the highest value in $\hat y_f^S$.
The improvements of this patent over the prior art are illustrated by the following table:
data set: babuk, blackMatter, cerber, chaos, conti, darkSide, gandCrab, globeimpser, lockBit, locky, magniber, makop, medusa Locker, nemty, phobos, sodinkibe, teslaCrypt, thanos, 18 malicious families of Lesoxol software and BlackMoon, gafgyt botnet were crawled from the malware sharing platform, 20 malicious families and 11841 malicious samples were counted. Furthermore, 9833 benign executables were collected as benign categories in the Windows10 system. The experiment divided 80% of each class of samples into training sets and the remaining 20% of samples into test sets to evaluate model performance. Due to the fact that the malicious family samples are different in size and have certain data imbalance factors, the Macro-F1 value is added to the evaluation index besides the accuracy.
Table 1: performance comparison between the lightweight visual Transformer and classic CNN networks.
Table 2: performance comparison of the distillation methods.
Table 3: performance comparison between the ordered dual tasks and the single task.
As can be seen from Table 1, the performance of the lightweight visual Transformer is superior to that of classic CNN networks, with the Macro-F1 value and accuracy improved by at least 1.71% and 1.42%. As can be seen from Table 2, compared with no distillation, each of the three distillation methods brings a certain performance gain to the student model, but joint distillation with all three brings the largest improvement; the performance of the student model after joint distillation is extremely close to that of the teacher model, with gaps of only 0.30% in Macro-F1 value and 0.38% in accuracy. As can be seen from Table 3, the ordered dual tasks have certain advantages over the single task that treats benign software as a harmless malware family to jointly detect and classify malware, exceeding the single task by 0.71% in Macro-F1 value and 0.38% in accuracy.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described above, or equivalents may be substituted for elements thereof. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A malware identification method based on a visual Transformer, characterized by comprising the following steps:
(a) Acquiring an ImageNet-21K image dataset and an executable file dataset of application software, wherein the executable file dataset comprises executable files of benign software and malware executable files with family labels, and visualizing all samples in the executable file dataset as RGB images to construct a malware image dataset;
(b) Building a visual Transformer model containing an X-layer encoder, performing classification pre-training on the visual Transformer model with the ImageNet-21K image dataset, changing the fully connected layer in the visual Transformer model after classification pre-training into an ordered dual-task classifier for malware detection and family classification, and fine-tuning the visual Transformer model with the malware image dataset;
(c) Constructing a lightweight visual Transformer model for actual deployment;
(d) Taking the fine-tuned visual Transformer model as a teacher model and the lightweight visual Transformer model as a student model, and performing distillation training on the student model by taking the self-attention matrices and hidden-layer states of the teacher model and the predicted logits of the dual-task classifier as the supervision information of the student model;
(e) Using the distillation-trained lightweight visual Transformer model to discriminate whether unknown software is benign or malicious and to judge the family label of the malware.
2. The visual Transformer-based malware identification method of claim 1, wherein the step of visualizing all samples in the executable file dataset as RGB images in step (a) comprises:
(a-1) reading the executable file of the application software in hexadecimal, and converting the hexadecimal numbers into decimal numbers to represent the executable file of the application software as a decimal number sequence with the value range [0, 255];
(a-2) denoting the length of the decimal value sequence by $n$, the visualized image is given width $W=\lfloor\sqrt{n/3}\rfloor$ and height $H=\lfloor n/(3W)\rfloor$, where $\lfloor\cdot\rfloor$ denotes rounding down;
(a-3) sequentially using three adjacent decimal numbers in the decimal value sequence as the R channel value, G channel value, and B channel value of a single pixel to obtain the visualized RGB image $I\in\mathbb{R}^{H\times W\times 3}$ of the executable file, where $\mathbb{R}$ is the real number space, $H$ is the height of the image, $W$ is the width, and 3 is the number of channels of the image; the visualized RGB images of all executable files constitute the malware image dataset.
3. The visual Transformer-based malware identification method of claim 2, wherein step (b) comprises the steps of:
(b-1) the visual Transformer model sequentially comprises 12 encoder layers and a multi-layer perceptron MLP, and each encoder sequentially comprises a first normalization layer LayerNorm, a Multi-Head self-Attention mechanism, a first residual connection layer, a second normalization layer LayerNorm, a multi-layer perceptron MLP, and a second residual connection layer;
(b-2) the visualized RGB image $I$ is scaled to obtain the scaled visualized RGB image $I'\in\mathbb{R}^{H'\times W'\times 3}$, where $H'$ is the height and $W'$ the width of the scaled visualized RGB image; based on the Flatten function in the torch library, the $i$-th row of pixel values in $I'$ is flattened into $x_i\in\mathbb{R}^{1\times 3W'}$, converting the 3D visualized RGB image $I'$ into a 2D row sequence $X=[x_1;x_2;\dots;x_{H'}]\in\mathbb{R}^{H'\times 3W'}$;
(b-3) each element in the 2D row sequence $X$ is mapped to dimension $D$ via a linear layer, obtaining the row embedding $E\in\mathbb{R}^{H'\times D}$; using the cat function in the torch library, the learnable classification token tensor $x_{cls}\in\mathbb{R}^{1\times D}$ is spliced with the row embedding $E$ to obtain the spliced tensor, and the spliced tensor is added to the learnable absolute position embedding $E_{pos}\in\mathbb{R}^{(H'+1)\times D}$ to obtain the tensor $Z_0\in\mathbb{R}^{(H'+1)\times D}$;
(b-4) the tensor $Z_0$ is input into the first normalization layer LayerNorm of the layer-1 encoder of the visual Transformer model for normalization, obtaining the tensor $\hat Z_0$; the Multi-Head self-Attention mechanism of the layer-1 encoder comprises $h$ attention heads; the tensor $\hat Z_0$ is input into the Multi-Head self-Attention mechanism, and the $i$-th attention head performs linear mappings on the tensor $\hat Z_0$ to obtain the query matrix $Q_i=\hat Z_0W_i^Q+b_i^Q$, the key matrix $K_i=\hat Z_0W_i^K+b_i^K$, and the value matrix $V_i=\hat Z_0W_i^V+b_i^V$, $Q_i,K_i,V_i\in\mathbb{R}^{(H'+1)\times d_k}$, $d_k=D/h$, where $W_i^Q$, $W_i^K$, $W_i^V$ are the weight matrices of the linear transformations and $b_i^Q$, $b_i^K$, $b_i^V$ are all bias vectors; the embedding fused with global attention is calculated by the formula $head_i=A_iV_i$, where the attention score $A_i=\mathrm{Softmax}(Q_iK_i^{\top}/\sqrt{d_k})$, $\top$ denotes transposition, and Softmax is the Softmax activation function; the embeddings $head_1,\dots,head_h$ output by the $h$ attention heads are spliced by the cat function in the torch library; the splicing result and the tensor $Z_0$ are sequentially input into the first residual connection layer and the second normalization layer LayerNorm of the layer-1 encoder, outputting the tensor $\hat Z_0'\in\mathbb{R}^{(H'+1)\times D}$; the tensor $\hat Z_0'$ is input into the multi-layer perceptron MLP of the layer-1 encoder, and the tensor $M_1=\mathrm{GELU}(\hat Z_0'W_1+b_1)W_2+b_2$ is calculated, where GELU is the GELU activation function, $W_1\in\mathbb{R}^{D\times D_m}$ is the weight matrix of the first-layer neurons in the MLP, $W_2\in\mathbb{R}^{D_m\times D}$ is the weight matrix of the second-layer neurons, $b_1$ and $b_2$ are the bias vectors of the first-layer and second-layer neurons, and $D_m$ is the embedding dimension of the first-layer neurons; the tensor $M_1$ is input into the second residual connection layer of the layer-1 encoder, outputting the output tensor $Z_1\in\mathbb{R}^{(H'+1)\times D}$ of the layer-1 encoder;
(b-5) replacing the tensor $Z_0$ in step (b-4) with the tensor $Z_1$ and repeating step (b-4) gives the output tensor $Z_2$ of the layer-2 encoder;
(b-6) replacing the tensor $Z_1$ in step (b-5) with the tensor $Z_2$ and repeating step (b-5) gives the output tensor $Z_3$ of the layer-3 encoder;
(b-7) the output of the $(l-1)$-th encoder is taken as the input of the $l$-th encoder, $l=4,5,\dots,12$; repeating step (b-6) gives the tensor $Z_{12}\in\mathbb{R}^{(H'+1)\times D}$ output by the layer-12 encoder;
(b-8) the vector at position 0 of the tensor $Z_{12}$, i.e. the embedding vector $z_{cls}\in\mathbb{R}^{1\times D}$ corresponding to the learnable classification token tensor $x_{cls}$, is input into the multi-layer perceptron MLP of the visual Transformer model, outputting the tensor $t\in\mathbb{R}^{1\times D}$; the tensor $t$ is input into the fully connected layer FC to obtain the classification result output by the visual Transformer model;
(b-9) carrying out classification pre-training on the visual Transformer model by adopting the ImageNet-21K image dataset.
5. The visual Transformer-based malware identification method of claim 4, wherein the step in (b) of changing the fully connected layer in the visual Transformer model after classification pre-training into an ordered dual-task classifier for malware detection and family classification comprises the following steps:
(b-10) the fully connected layer FC in step (b-8) is changed into an ordered dual-task classifier, which comprises a detection task for detecting malware and a family classification task for judging the family to which the malware belongs; the detection task and the family classification task are each composed of two fully connected layers FC;
(b-11) the tensor $t$ is input into the detection task, and the predicted logits of the detection task are calculated by the formulas $h_d=tW_1^d+b_1^d$ and $\hat y_d=h_dW_2^d+b_2^d$, $\hat y_d\in\mathbb{R}^{1\times 1}$, where $W_1^d\in\mathbb{R}^{D\times D_c}$ is the weight matrix of the first fully connected layer FC of the detection task, $W_2^d\in\mathbb{R}^{D_c\times 1}$ is the weight matrix of the second fully connected layer FC of the detection task, $b_1^d$ and $b_2^d$ are the bias vectors of the first and second fully connected layers FC of the detection task, and $D_c$ is the hidden dimension of the classifier;
(b-12) the tensor $t$ is input into the family classification task, and the predicted logits of the family classification task are calculated by the formula $\hat y_f=[\,tW_1^f+b_1^f\,\|\,h_d\,]W_2^f+b_2^f$, $\hat y_f\in\mathbb{R}^{1\times F}$, where $\|$ denotes splicing, $h_d$ is the output of the first fully connected layer FC of the detection task, $W_1^f\in\mathbb{R}^{D\times D_c}$ is the weight matrix of the first fully connected layer FC of the family classification task, $W_2^f\in\mathbb{R}^{2D_c\times F}$ is the weight matrix of the second fully connected layer FC of the family classification task, $b_1^f$ and $b_2^f$ are the bias vectors of the first and second fully connected layers FC of the family classification task, and $F$ is the number of families.
6. The visual Transformer-based malware identification method of claim 5, wherein the step of fine-tuning the visual Transformer model by using the malware image dataset in the step (b) comprises the steps of:
(b-13) the loss is calculated by the formula $L=L_{BCE}(\mathrm{Sigmoid}(\hat y_d),\,y_d)+y_d\,L_{CE}(\hat y_f,\,y_f)$, where Sigmoid is the Sigmoid activation function, $L_{BCE}$ is the binary cross-entropy loss, $L_{CE}$ is the cross-entropy loss, $y_d$ is the detection task label, with 0 meaning benign and 1 meaning malicious, and $y_f$ is the one-hot family label of the malicious sample.
7. The visual Transformer-based malware identification method of claim 5, wherein step (c) comprises the steps of:
(c-1) the lightweight visual Transformer model sequentially comprises 3 encoder layers and a multi-layer perceptron MLP; each encoder sequentially comprises a first normalization layer LayerNorm, a Multi-Head self-Attention mechanism, a first residual connection layer, a second normalization layer LayerNorm, a multi-layer perceptron MLP, and a second residual connection layer; the number of attention heads of the Multi-Head self-Attention mechanism is $h_S$, $h_S<h$, and the internal embedding dimension of the lightweight visual Transformer model is $D_S$, $D_S<D$;
(c-2) each element of the 2D line sequence $X$ is mapped through a linear layer to obtain the $m$-dimensional row embedding $E$; the learnable classification mark tensor $x_{class}^{s}$ and the row embedding $E$ are spliced using the cat function in the torch library, and the spliced tensor is added to the learnable absolute position embedding $E_{pos}$ to obtain the tensor $Z_0^{s}$;
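Step (c-2) might look like the following sketch; the row width `w=672` (a 224x224 RGB image flattened row-wise) and the row count are assumed for illustration:

```python
import torch
import torch.nn as nn

class RowEmbedding(nn.Module):
    """Sketch of (c-2): per-row linear projection, classification mark
    prepended with torch.cat, learnable absolute position embedding added."""
    def __init__(self, w=672, m=192, n_rows=224):
        super().__init__()
        self.proj = nn.Linear(w, m)                             # per-row linear layer
        self.cls = nn.Parameter(torch.zeros(1, 1, m))           # classification mark tensor
        self.pos = nn.Parameter(torch.zeros(1, n_rows + 1, m))  # absolute position embedding

    def forward(self, x):                        # x: (B, n_rows, w) 2D line sequence
        e = self.proj(x)                         # row embedding, (B, n_rows, m)
        cls = self.cls.expand(x.size(0), -1, -1)
        return torch.cat([cls, e], dim=1) + self.pos  # splice with cat, add positions
```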
(c-3) the tensor $Z_0^{s}$ is input into the first normalization layer LayerNorm of the layer-1 encoder of the lightweight visual Transformer model, and normalization yields the tensor $\hat{Z}_0^{s}$; the multi-head self-attention mechanism Multi-Head Attention of the layer-1 encoder comprises $h_s$ attention heads; the tensor $\hat{Z}_0^{s}$ is input into the multi-head self-attention mechanism, and the $i$-th attention head linearly maps $\hat{Z}_0^{s}$ to obtain the query matrix $Q_i$, the key matrix $K_i$ and the value matrix $V_i$: $Q_i = \hat{Z}_0^{s}W_i^{Q} + b_i^{Q}$, $K_i = \hat{Z}_0^{s}W_i^{K} + b_i^{K}$, $V_i = \hat{Z}_0^{s}W_i^{V} + b_i^{V}$, where $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ are the weight matrices of the linear transformations and $b_i^{Q}$, $b_i^{K}$ and $b_i^{V}$ are bias vectors; the embedding fused with global attention is calculated by the formula $head_i = A_i V_i$, where the attention score is $A_i = \mathrm{softmax}\big(Q_i K_i^{\mathsf T}/\sqrt{d}\big)$ and $d$ is the dimension of a single attention head; the global-attention-fused embeddings $head_1, \dots, head_{h_s}$ output by the attention heads are spliced with the cat function in the torch library; the splicing result together with the tensor $Z_0^{s}$ is input sequentially into the first residual connection layer and the second normalization layer LayerNorm of the layer-1 encoder, and the output is the tensor $\tilde{Z}_1^{s}$; the tensor $\tilde{Z}_1^{s}$ is input into the multi-layer perceptron MLP of the layer-1 encoder, and the tensor $M_1$ is calculated by the formula $M_1 = W_2\,\mathrm{GELU}\big(W_1\tilde{Z}_1^{s} + b_1\big) + b_2$, where $W_1$ is the weight matrix of the neurons in the first layer of the MLP, $W_2$ is the weight matrix of the neurons in the second layer of the MLP, $b_1$ is the bias vector of the neurons in the first layer of the MLP, $b_2$ is the bias vector of the neurons in the second layer of the MLP, and $m_1$ is the embedding dimension of the first layer of neurons in the MLP; the tensor $M_1$ together with $\tilde{Z}_1^{s}$ is input into the second residual connection layer of the layer-1 encoder, and the output is the output tensor $Z_1^{s}$ of the layer-1 encoder;
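A compact sketch of one lightweight encoder layer as described in (c-3); the three per-head linear maps are fused into a single `qkv` projection for brevity (mathematically equivalent to per-head maps followed by cat), and the second LayerNorm is placed in the standard pre-norm position, which may differ slightly from the claim's exact ordering:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderLayer(nn.Module):
    """Sketch of (c-3): multi-head self-attention plus a two-layer MLP,
    each wrapped in a residual connection. m, heads, m1 are illustrative."""
    def __init__(self, m=192, heads=3, m1=768):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(m), nn.LayerNorm(m)
        self.qkv = nn.Linear(m, 3 * m)   # fused Q/K/V projection for all heads
        self.out = nn.Linear(m, m)
        self.heads, self.dk = heads, m // heads
        self.mlp = nn.Sequential(nn.Linear(m, m1), nn.GELU(), nn.Linear(m1, m))

    def forward(self, z):                # z: (B, N+1, m)
        b, n, m = z.shape
        q, k, v = self.qkv(self.ln1(z)).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, self.dk).transpose(1, 2) for t in (q, k, v))
        a = F.softmax(q @ k.transpose(-2, -1) / self.dk ** 0.5, dim=-1)  # attention scores
        h = (a @ v).transpose(1, 2).reshape(b, n, m)  # heads spliced back together
        z = z + self.out(h)              # first residual connection
        z = z + self.mlp(self.ln2(z))    # second residual connection
        return z
```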
(c-4) the tensor $Z_1^{s}$ replaces the tensor $Z_0^{s}$ in step (c-3), and step (c-3) is repeated to obtain the output tensor $Z_2^{s}$ of the layer-2 encoder;
(c-5) the tensor $Z_2^{s}$ replaces the tensor $Z_1^{s}$ in step (c-4), and step (c-4) is repeated to obtain the output tensor $Z_3^{s}$ of the layer-3 encoder;
(c-6) the vector at the 0th position of the tensor $Z_3^{s}$ is the embedded vector $z_3^{0}$ of the learnable classification mark tensor $x_{class}^{s}$; the embedded vector $z_3^{0}$ is input into the multi-layer perceptron MLP of the lightweight visual Transformer model, and the output is the tensor $c^{s}$;
(c-7) the tensor $c^{s}$ is input into the detection task, and the prediction logit of the detection task is calculated by the formula $p_{det}^{s} = W_2^{det}\big(W_1^{det}c^{s} + b_1^{det}\big) + b_2^{det}$, where $W_1^{det}$ and $W_2^{det}$ are the weight matrices of the first and second fully connected layers FC of the detection task and $b_1^{det}$, $b_2^{det}$ are the corresponding bias vectors;
(c-8) the tensor $c^{s}$ is input into the family classification task, and the prediction logits of the family classification task are calculated by the formula $p_{fam}^{s} = W_2^{fam}\big(W_1^{fam}c^{s} + b_1^{fam}\big) + b_2^{fam}$, $p_{fam}^{s}\in\mathbb{R}^{n_f}$, where $W_1^{fam}$ and $W_2^{fam}$ are the weight matrices of the first and second fully connected layers FC of the family classification task and $b_1^{fam}$, $b_2^{fam}$ are the corresponding bias vectors.
8. The visual Transformer-based malware identification method of claim 7, wherein step (d) comprises the steps of:
(d-1) the predicted-logits distillation loss $L_{logits}$ is calculated by the formula $L_{logits} = \alpha\, L_2\big(p_{det}^{s}/\tau_{det},\, p_{det}^{t}/\tau_{det}\big) + (1-\alpha)\, L_2\big(p_{fam}^{s}/\tau_{fam},\, p_{fam}^{t}/\tau_{fam}\big)$, where $\alpha$ is the influence factor of the classification loss ratio, $L_2$ is the L2 loss, $\tau_{det}$ is the temperature hyper-parameter for distilling the detection task classifier, and $\tau_{fam}$ is the temperature hyper-parameter for distilling the family classification task classifier;
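One plausible reading of the (d-1) loss, with L2 distance on temperature-scaled logits of both heads; the exact way the claim combines the two terms is not recoverable from the lost formula image, so the `alpha` mixing is an assumption:

```python
import torch
import torch.nn.functional as F

def logits_distill_loss(s_det, t_det, s_fam, t_fam, tau_d=1.0, tau_f=4.0, alpha=0.5):
    """Sketch of (d-1): L2 distance between temperature-scaled student and
    teacher logits for both tasks, balanced by the loss-ratio factor alpha."""
    l_det = F.mse_loss(s_det / tau_d, t_det / tau_d)   # detection-head distillation
    l_fam = F.mse_loss(s_fam / tau_f, t_fam / tau_f)   # family-head distillation
    return alpha * l_det + (1.0 - alpha) * l_fam
```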
(d-2) the self-attention distillation loss $L_{att}$ is calculated by the formula $L_{att} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{KL}\big(A_i^{T}\,\big\|\,A_i^{S}\big)$, where $A^{S}$ is the relation matrix between the self-attention matrices in the student model and $A_i^{S}$ is its $i$-th row, and $A^{T}$ is the relation matrix between the self-attention matrices in the teacher model and $A_i^{T}$ is its $i$-th row; $Q^{T}$, $K^{T}$ and $V^{T}$ are the query, key and value matrices spliced from the individual self-attention heads of the teacher model's multi-head self-attention, $Q^{S}$, $K^{S}$ and $V^{S}$ are the query, key and value matrices spliced from the individual self-attention heads of the student model's multi-head self-attention, $(\cdot)^{\mathsf T}$ denotes transposition, and the relation matrices are computed from the spliced matrices as $A^{T} = \mathrm{softmax}\big(Q^{T}(K^{T})^{\mathsf T}/\sqrt{d_t}\big)$ and $A^{S} = \mathrm{softmax}\big(Q^{S}(K^{S})^{\mathsf T}/\sqrt{d_s}\big)$;
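A sketch of (d-2) in the spirit of MiniLM-style relation distillation; the KL form and the scaling are assumptions, since the claim's formula image is lost:

```python
import torch
import torch.nn.functional as F

def attention_distill_loss(q_s, k_s, q_t, k_t):
    """Sketch of (d-2): each model's self-attention relation matrix is
    softmax(Q K^T / sqrt(d)) over the head-spliced Q and K; rows of the
    student relation are pushed toward the teacher's with KL divergence."""
    d_s, d_t = q_s.size(-1), k_t.size(-1)
    r_s = F.log_softmax(q_s @ k_s.transpose(-2, -1) / d_s ** 0.5, dim=-1)
    r_t = F.softmax(q_t @ k_t.transpose(-2, -1) / d_t ** 0.5, dim=-1)
    return F.kl_div(r_s, r_t, reduction="batchmean")   # mean KL over relation rows
```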
(d-3) the hidden-state distillation loss $L_{hid}$ is calculated by the formula $L_{hid} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{KL}\big(H_i^{T}\,\big\|\,H_i^{S}\big)$, where $H^{S}$ is the hidden-layer state relation matrix of the student model and $H_i^{S}$ is its $i$-th row, and $H^{T}$ is the hidden-layer state relation matrix of the teacher model and $H_i^{T}$ is its $i$-th row;
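The hidden-state term of (d-3) could be sketched the same way; turning each layer's hidden states into a relation matrix lets a 192-wide student be compared with a 768-wide teacher, which is presumably why the claim uses relation matrices rather than raw states (the KL form is again an assumed reading):

```python
import torch
import torch.nn.functional as F

def hidden_distill_loss(h_s, h_t):
    """Sketch of (d-3): hidden states of a paired student/teacher encoder are
    mapped to relation matrices softmax(H H^T / sqrt(d)), matched row-wise."""
    r_s = F.log_softmax(h_s @ h_s.transpose(-2, -1) / h_s.size(-1) ** 0.5, dim=-1)
    r_t = F.softmax(h_t @ h_t.transpose(-2, -1) / h_t.size(-1) ** 0.5, dim=-1)
    return F.kl_div(r_s, r_t, reduction="batchmean")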
(d-4) supervising the layer 1 encoder of the student model with the layer 4 encoder of the teacher model, supervising the layer 2 encoder of the student model with the layer 8 encoder of the teacher model, supervising the layer 3 encoder of the student model with the layer 12 encoder of the teacher model;
(d-5) the total loss of student model training $L_{total}$ is calculated by the formula $L_{total} = L_{logits} + \beta L_{att} + \gamma L_{hid}$, where $\beta$ is the weight of the self-attention distillation loss and $\gamma$ is the weight of the hidden-layer state distillation loss.
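Putting the distillation terms together per (d-4)-(d-5); the weight names `beta`/`gamma` and their defaults are placeholders:

```python
# Layer pairing of step (d-4): teacher layers 4, 8 and 12 supervise
# student layers 1, 2 and 3 respectively.
TEACHER_LAYER_FOR_STUDENT = {1: 4, 2: 8, 3: 12}

def total_distill_loss(l_logits, attn_losses, hidden_losses, beta=1.0, gamma=1.0):
    """Sketch of (d-5): logits loss plus the attention and hidden-state
    distillation losses summed over the supervised layer pairs."""
    return l_logits + beta * sum(attn_losses) + gamma * sum(hidden_losses)
```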
9. The visual Transformer-based malware identification method of claim 3, wherein step (e) comprises the steps of:
(e-2) the visualized RGB image $I$ is scaled to obtain the scaled visualized RGB image $I'$; using the Flatten function in the torch library, the pixel values of each row of $I'$ are flattened, so that the 3D visualized RGB image $I' \in \mathbb{R}^{H\times W\times 3}$ is converted into a 2D line sequence $X \in \mathbb{R}^{H\times 3W}$;
(e-3) the 2D line sequence $X$ is input into the distillation-trained lightweight visual Transformer model to obtain the predicted detection logit $p_{det}^{s}$ and the predicted family classification logits $p_{fam}^{s}$; if $\mathrm{Sigmoid}(p_{det}^{s}) > 0.5$ the unknown software is judged to be malware, otherwise the unknown software is judged to be benign software; when the distillation-trained lightweight visual Transformer model judges the input unknown software to be malware, the family to which the malware belongs is the family corresponding to the highest value in $p_{fam}^{s}$.
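End-to-end inference over an unknown sample, per (e-2)-(e-3); the 224x224 input size, the (3, H, W) float input layout, and the 0-threshold on the raw detection logit (equivalent to Sigmoid > 0.5) are assumptions:

```python
import torch

@torch.no_grad()
def classify_unknown(model, rgb_image, families):
    """Sketch of (e-2)-(e-3): scale the visualized RGB image, flatten each row
    into a 2D line sequence, and read the two heads of the distilled student."""
    img = torch.nn.functional.interpolate(        # scale to the model's input size
        rgb_image.unsqueeze(0), size=(224, 224), mode="bilinear")
    rows = torch.flatten(img.squeeze(0).permute(1, 0, 2), start_dim=1)  # (224, 3*224)
    p_det, p_fam = model(rows.unsqueeze(0))
    if p_det.item() > 0:                          # Sigmoid(p_det) > 0.5
        return "malware", families[p_fam.argmax(-1).item()]
    return "benign", None
```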
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310063452.3A CN115879109B (en) | 2023-02-06 | 2023-02-06 | Malicious software identification method based on visual Transformer
Publications (2)

Publication Number | Publication Date
---|---
CN115879109A | 2023-03-31
CN115879109B | 2023-05-12

Family

ID=85758746

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202310063452.3A (Active) CN115879109B (en) | Malicious software identification method based on visual Transformer | 2023-02-06 | 2023-02-06

Country Status (1)

Country | Link
---|---
CN | CN115879109B (en)
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180007074A1 (en) * | 2015-01-14 | 2018-01-04 | Virta Laboratories, Inc. | Anomaly and malware detection using side channel analysis |
US20160358312A1 (en) * | 2015-06-05 | 2016-12-08 | Mindaptiv LLC | Digital quaternion logarithm signal processing system and method for images and other data types |
CN110633570A (en) * | 2019-07-24 | 2019-12-31 | 浙江工业大学 | Black box attack defense method for malicious software assembly format detection model |
CN114065199A (en) * | 2021-11-18 | 2022-02-18 | 山东省计算中心(国家超级计算济南中心) | Cross-platform malicious code detection method and system |
CN114462039A (en) * | 2022-01-27 | 2022-05-10 | 北京工业大学 | Android malicious software detection method based on Transformer structure |
CN114676769A (en) * | 2022-03-22 | 2022-06-28 | 南通大学 | Visual transform-based small sample insect image identification method |
CN114694220A (en) * | 2022-03-25 | 2022-07-01 | 上海大学 | Double-flow face counterfeiting detection method based on Swin transform |
CN114818826A (en) * | 2022-05-19 | 2022-07-29 | 石家庄铁道大学 | Fault diagnosis method based on lightweight Vision Transformer module |
CN114913162A (en) * | 2022-05-25 | 2022-08-16 | 广西大学 | Bridge concrete crack detection method and device based on lightweight transform |
CN114937016A (en) * | 2022-05-25 | 2022-08-23 | 广西大学 | Bridge concrete crack real-time detection method and device based on edge calculation and Transformer |
CN115563327A (en) * | 2022-08-30 | 2023-01-03 | 电子科技大学 | Zero sample cross-modal retrieval method based on Transformer network selective distillation |
Non-Patent Citations (3)

Title |
---|
SHI CHEN et al.: "Malicious Code Family Classification Method Based on Vision Transformer", 2022 IEEE 10th International Conference on Information, Communication and Networks (ICICN) * |
XU Zhifeng: "Research on Deep-Learning-Based Malware Detection for Windows Systems", Wanfang dissertation database (《万方学位论文》) * |
WANG Zhiwen et al.: "A Survey of Machine-Learning-Based Malware Identification", Journal of Chinese Computer Systems (《小型微型计算机系统》) * |
Also Published As

Publication number | Publication date
---|---
CN115879109B (en) | 2023-05-12
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |