CN117332419A - Malicious code classification method and device based on pre-training - Google Patents
- Publication number
- CN117332419A CN117332419A CN202311610887.1A CN202311610887A CN117332419A CN 117332419 A CN117332419 A CN 117332419A CN 202311610887 A CN202311610887 A CN 202311610887A CN 117332419 A CN117332419 A CN 117332419A
- Authority
- CN
- China
- Prior art keywords
- training
- layer
- vector
- model
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
- G06F18/15—Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a pre-training-based malicious code classification method and device. The method first extracts the shallow features in malicious code and the operation codes in each subroutine to construct a feature set; it then builds an improved pre-training model, carries out a pre-training task, and obtains a final model through training; finally, the code to be tested is input into the final model to obtain a category probability distribution, and the category with the highest probability is selected as the final prediction result. The invention first extracts the operation-code sequence of each subroutine in the malicious code, and then extracts the shallow TF-IDF and Asm2Vec features. Using subroutines as the input samples for pre-training improves the generalization capability of the model as well as its training speed and effect. Using the shallow features as the prefix reduces the parameter scale that must be trained during model training, improves the universality of the pre-training model, and achieves performance comparable to the pre-training/fine-tuning paradigm.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for classifying malicious codes based on pre-training.
Background
Malicious code refers to programs or scripts specifically designed to compromise computer systems, networks, and data security. To combat the threat of malicious code effectively, researchers have developed a variety of classification techniques to identify and classify its different types.
Among them, static analysis and dynamic analysis are the two basic classification techniques. Static analysis detects and identifies malicious behavior by statically scanning and analyzing the executable files of malicious code; its main techniques include disassembly, decompilation, and code analysis. Disassembly translates the machine code in an executable file into human-readable assembly code, making it easier to analyze the execution logic and determine whether malicious activity is present. Unlike static analysis, dynamic analysis runs the code in a virtual environment to obtain its runtime behavior and characteristics, and judges from these whether it is malicious.
In addition to static and dynamic analysis, there are other classification techniques, such as machine learning based classification techniques. This technique automatically identifies and classifies new malicious code by collecting and analyzing a large number of malicious code samples and features, training a machine learning model. Common machine learning algorithms include support vector machines, decision trees, random forests, neural networks, and the like. These algorithms can automatically learn and extract features of malicious code to classify and identify new malicious code. In addition, there are some hybrid classification techniques, such as combining static and dynamic analysis, or combining machine learning and static analysis, etc.
However, in existing machine-learning-based classification techniques, models have difficulty handling large-scale malicious samples, and their performance suffers.
Disclosure of Invention
The invention provides a pre-training-based malicious code classification method and device, which are used for solving or at least partially solving the technical problems that a model is difficult to process large-scale malicious samples and the performance of the model is poor in the prior art.
In order to solve the technical problem, a first aspect of the present invention provides a malicious code classification method based on pre-training, including:
extracting shallow features and the operation codes in subroutines from the malicious code contained in a preset data set, and constructing a feature set from the shallow features and the operation codes, wherein the shallow features comprise TF-IDF features and Asm2Vec features, the TF-IDF features comprise readable character-string sequence features, and the Asm2Vec features are semantic information features related to the code execution logic in the assembly file;
pre-training an improved pre-training model, with the malicious code contained in the preset data set and the constructed feature set as the training data set, to obtain a final model, wherein the improved pre-training model comprises a prefix tuning structure, an embedding layer, a one-dimensional convolutional neural network, a position encoder and a Transformer encoding layer; the prefix tuning structure is used for segmenting its output vector into a plurality of pieces input to the Transformer encoding layer; the embedding layer is used for embedding the three-dimensional tensor of operation-code features in each subroutine to obtain a four-dimensional tensor; the one-dimensional convolutional neural network is used for obtaining a new feature sequence vector from the four-dimensional tensor produced by the embedding layer; the position encoder is used for encoding the position information of each word or token in the input sequence into the new feature sequence vector; the Transformer encoding layer consists of a plurality of stacked encoders, each encoder comprising a multi-head attention layer and a feed-forward network layer, wherein the multi-head attention layer is used for performing attention calculation on the input vector to obtain an output vector, and the feed-forward network layer is used for obtaining the encoded vector from the input vector and the output vector;
and inputting the code to be tested into a final model to obtain category probability distribution, and selecting the category with the highest probability according to the category probability distribution as a final prediction result.
In one embodiment, the prefix tuning structure is a prefix_tuning model structure formed by a multi-layer fully-connected neural network. The input-layer nodes are set to the sequence length of the prefix; length conversion is performed through a hidden layer; and the output-layer node length is set to a length suitable for deep-prefix processing, in which the vector output by the output layer is divided into a plurality of pieces that are input to the multi-head self-attention layers in the Transformer encoding layers of the pre-training model and spliced with the key vectors K and value vectors V used in those multi-head attention layers.
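As an illustration of the prefix tuning structure described above, the sketch below re-parameterizes a short prefix through a multi-layer fully-connected network and splits the result into one key/value pair per encoder layer. All sizes (an 8-token prefix, a 64-dimensional model, 2 layers) and the NumPy formulation are illustrative assumptions, not details taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the patent): an 8-token prefix, 64-d model,
# 2 encoder layers; the MLP re-parameterizes the prefix into per-layer K/V.
PREFIX_LEN, D_MODEL, HIDDEN, N_LAYERS = 8, 64, 128, 2

W1 = rng.standard_normal((D_MODEL, HIDDEN)) * 0.02
W2 = rng.standard_normal((HIDDEN, N_LAYERS * 2 * D_MODEL)) * 0.02
prefix_embed = rng.standard_normal((PREFIX_LEN, D_MODEL))

def prefix_kv(prefix_embed):
    """Map prefix embeddings through a 2-layer MLP, then split the output
    into one (K, V) pair per Transformer encoder layer."""
    h = np.tanh(prefix_embed @ W1)            # hidden-layer length conversion
    out = h @ W2                              # (PREFIX_LEN, N_LAYERS*2*D_MODEL)
    out = out.reshape(PREFIX_LEN, N_LAYERS, 2, D_MODEL)
    return out.transpose(1, 2, 0, 3)          # (N_LAYERS, 2, PREFIX_LEN, D_MODEL)

kv = prefix_kv(prefix_embed)
print(kv.shape)  # (2, 2, 8, 64)
```

Each `kv[layer, 0]` / `kv[layer, 1]` slice would be concatenated in front of that layer's K and V matrices before attention is computed.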
In one embodiment, pre-training the improved pre-training model with malicious code contained in a preset data set and the constructed feature set as training data sets to obtain a final model, including:
inputting the operation codes in the training data set into the improved pre-training model for training;
freezing parameters of the pre-training model, and inputting the extracted shallow features as prefixes into a prefix tuning structure for training.
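The two-stage scheme above (pre-train, then freeze the backbone and train only the prefix) can be illustrated with a deliberately tiny NumPy stand-in. The linear "model", the learning rate, and the offset target are all hypothetical; the point is only that the frozen weights never receive updates while the prefix parameter does:

```python
import numpy as np

rng = np.random.default_rng(1)

# A "pre-trained" weight matrix that stays frozen, and a trainable prefix p.
W_frozen = rng.standard_normal((4, 1))
p = np.zeros(1)                      # trainable prefix parameter
X = rng.standard_normal((32, 4))
y = X @ W_frozen + 3.0               # target offset the prefix must learn

for _ in range(200):
    pred = X @ W_frozen + p
    grad_p = 2 * np.mean(pred - y)   # dL/dp for mean-squared error
    p -= 0.1 * grad_p                # only p is updated; W_frozen never is

print(round(float(p[0]), 2))  # 3.0
```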
In one embodiment, the embedding layer is specifically configured to look up an embedding vector for each element of a given input from an embedding matrix of size V×D, where V represents the number of rows of the embedding matrix and D represents the dimension of each embedding vector. The input of the embedding layer is the three-dimensional tensor of operation-code features in the subroutines, a tensor of shape (B, S, L), where B is the number of samples processed in the batch, S is the number of subroutines contained in each sample, and L is the number of opcodes contained in a subroutine; the tensor contains, for each word, the index to be retrieved from the embedding matrix. The output of the embedding layer is a tensor of shape (B, S, L, D), containing an embedding vector for each input word.
In one embodiment, the calculation formula of the position encoder is:

PE(pos, 2m) = sin(pos / 10000^(2m/d)), PE(pos, 2m+1) = cos(pos / 10000^(2m/d)),

where pos denotes the position of the word or token in the sequence, m indexes the dimension of the output vector of the position encoder, and d denotes the dimension of the input vector. The formula states that the position encoder encodes each position as a vector, each element of which is a sine or cosine function whose coefficients differ with position and dimension.
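One plausible reading of the sinusoidal encoder described above is the standard Transformer scheme, sketched here in NumPy (the base 10000 follows the usual convention and is an assumption; an even dimension d is assumed for simplicity):

```python
import numpy as np

def positional_encoding(seq_len, d):
    """Sinusoidal encoding: even output dims use sine, odd use cosine,
    with frequency 1 / 10000^(2m/d). Assumes d is even."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1) positions
    m = np.arange(0, d, 2)[None, :]            # (1, d/2) even dimension indices
    angle = pos / np.power(10000.0, m / d)
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(4, 8)
print(pe.shape)            # (4, 8)
print(pe[0, 0], pe[0, 1])  # 0.0 1.0  (sin(0), cos(0))
```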
In one embodiment, the formulas of the Transformer encoding layer are:

h = LN(x + MultiHead(x)), y = LN(h + FFN(h)),

where x denotes the input vector, LN denotes the normalization layer, and FFN is the feed-forward network layer. The formulas state that each encoder first processes the input vector x through the multi-head self-attention layer to obtain an output vector; the input vector and the output vector are then added and normalized; and finally the normalized vector is taken as the input of the feed-forward network layer, with the calculation repeated through the stacked encoders until the final output of the Transformer encoding layer is obtained.
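The encoder computation described above (attention, add & norm, feed-forward, add & norm) can be sketched in NumPy. For brevity this uses a single attention head and random illustrative weights; the multi-head generalization only splits the projections into several parallel heads:

```python
import numpy as np

rng = np.random.default_rng(3)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

D = 8  # model dimension (illustrative)
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
W1 = rng.standard_normal((D, 4 * D)) * 0.1   # FFN expansion
W2 = rng.standard_normal((4 * D, D)) * 0.1   # FFN projection

def encoder_layer(x):
    """One encoder: self-attention, add & norm, then FFN, add & norm."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(D)) @ v           # attention output
    h = layer_norm(x + attn)                           # residual + normalize
    return layer_norm(h + np.maximum(0, h @ W1) @ W2)  # FFN with residual

x = rng.standard_normal((5, D))   # a 5-token input sequence
y = encoder_layer(x)
print(y.shape)  # (5, 8)
```

Stacking several such layers, each feeding the next, yields the full Transformer encoding layer.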
In one embodiment, the pre-training task in the pre-training process adopts an MLM task, wherein the MLM task refers to randomly masking some words or tokens in the input sequence and then having the model predict the masked words or tokens;
the training target of the model is that the relation between the operation code representations is learned through the MLM task, and the calculation formula of the pre-training task is as follows:
where n represents the number of training samples, l represents the number of covered words or tokens in each sample,indicate->Sample No.)>Actual value of individual covered words or marks, < >>Representing the probability that the model predicts the word or token, < +.>Is the loss function of the MLM task.
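The MLM objective amounts to the mean negative log-likelihood of the true tokens at the masked positions only; a small sketch with hypothetical probabilities (two samples, two positions, a vocabulary of three):

```python
import numpy as np

def mlm_loss(probs, targets, mask):
    """Mean negative log-likelihood over masked positions only.

    probs:   (n, l, V) predicted distributions
    targets: (n, l)    true token ids
    mask:    (n, l)    1 where the token was masked, else 0
    """
    p_true = np.take_along_axis(probs, targets[..., None], axis=-1)[..., 0]
    nll = -np.log(p_true) * mask          # zero out unmasked positions
    return nll.sum() / mask.sum()

probs = np.array([[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
                  [[0.5, 0.25, 0.25], [0.2, 0.2, 0.6]]])
targets = np.array([[0, 1], [0, 2]])
mask = np.array([[1, 0], [0, 1]])         # one masked token per sample
print(round(float(mlm_loss(probs, targets, mask)), 4))  # 0.4338
```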
Based on the same inventive concept, a second aspect of the present invention provides a malicious code classification device based on pre-training, comprising:
the feature extraction module is used for extracting shallow features and the operation codes in subroutines from the malicious code contained in a preset data set, and constructing a feature set from the shallow features and the operation codes, wherein the shallow features comprise TF-IDF features and Asm2Vec features, the TF-IDF features comprise readable character-string sequence features, and the Asm2Vec features are semantic information features related to the code execution logic in the assembly file;
the device comprises a pre-training module, a pre-training module and a position encoder, wherein the pre-training module is used for pre-training an improved pre-training model by taking malicious codes contained in a preset data set and a constructed feature set as training data sets to obtain a final model, the improved pre-training model comprises a prefix tuning structure, an embedding layer, a one-dimensional convolutional neural network, the position encoder and a transducer encoding layer, the prefix tuning structure is used for segmenting an output vector into a plurality of sections of input transducer encoding layers, the embedding layer is used for embedding three-dimensional tensors of operation features in a subprogram to obtain four-dimensional tensors, the one-dimensional convolutional neural network is used for obtaining a new feature sequence vector according to the four-dimensional tensors obtained by the embedding layer, the position encoder is used for encoding position information of each word or mark in an input sequence into the new feature sequence vector, the transducer encoding layer comprises a plurality of stacked encoders, each encoder comprises a multi-head attention layer and a feedforward network layer, and the feedforward network layer is used for performing attention calculation according to the input vector to obtain an output vector;
and the classification module is used for inputting the codes to be detected into the final model to obtain category probability distribution, and selecting the category with the highest probability according to the category probability distribution as a final prediction result.
Based on the same inventive concept, a third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed, implements the method of the first aspect.
Based on the same inventive concept, a fourth aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, said processor implementing the method according to the first aspect when executing said program.
Compared with the prior art, the invention has the following advantages and beneficial technical effects:
the invention provides a malicious code classification method and device based on pre-training, which comprises the steps of firstly extracting an operation code and shallow features of a subroutine (sub-program) in a malicious code: TF-IDF features and Asm2Vec features. The subtutine is used for pre-training the input samples of the pre-training model, so that the generalization capability of the model can be improved, and the training speed and effect of the model can be improved. The three-dimensional input (batch, seq, emudding) form of the pre-training model is changed into four-dimensional input (batch, sub, seq, emudding), so that the input scale of the model can be enlarged without enlarging the parameter number of the model, and the dilemma that the input sequence length is insufficient, and thus large-scale malicious codes are difficult to input the pre-training model is solved. Meanwhile, the improved pre-training model comprises a prefix optimizing structure, the output vector is segmented into a plurality of segments of input transform coding layers, a pre-training method is adopted, shallow layer characteristics can be used as pre-fix input, and pre-fix is trained based on the pre-training model, so that the parameter scale required to be trained in the model training process can be reduced, the universality of the pre-training model can be improved, and the performance equivalent to that of a pre-training-fine tuning paradigm can be realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a pre-training based malicious code classification method in an embodiment of the invention;
FIG. 2 is a schematic diagram of the process of extracting opcodes in shallow features and subroutines according to an embodiment of the present invention.
Detailed Description
The invention provides a pre-training-based malicious code classification method and device, which first extract the shallow features in malicious code and the operation codes in each subroutine to construct a feature set; an improved pre-training model is then built, a pre-training task is carried out, and a final model is obtained through training; finally, the code to be tested is input into the final model to obtain a category probability distribution, and the category with the highest probability is selected as the final prediction result. The invention first extracts the operation-code sequence of each subroutine in the malicious code, and then extracts the shallow TF-IDF and Asm2Vec features. Using subroutines as the input samples for pre-training improves the generalization capability of the model as well as its training speed and effect. Using the shallow features as the prefix reduces the parameter scale that must be trained during model training, improves the universality of the pre-training model, and achieves performance comparable to the pre-training/fine-tuning paradigm.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
The embodiment of the invention provides a malicious code classification method based on pre-training, referring to fig. 1, the method comprises the following steps:
s1: extracting shallow features and operation codes in subroutines from malicious codes contained in a preset data set, and constructing a feature set according to the shallow features and the operation codes, wherein the shallow features comprise TF-IDF features and Asm2Vec features, the TF-IDF features comprise readable character string sequence features, and the Asm2Vec features are semantic information features logically related to code execution in an assembly file;
s2: pre-training an improved pre-training model by taking a malicious code contained in a preset data set and a constructed feature set as a training data set to obtain a final model, wherein the improved pre-training model comprises a prefix tuning structure, an embedding layer, a one-dimensional convolutional neural network, a position encoder and a transducer encoding layer, the prefix tuning structure is used for segmenting an output vector into a plurality of sections of input transducer encoding layers, the embedding layer is used for embedding three-dimensional tensors of operation features in a subprogram to obtain four-dimensional tensors, the one-dimensional convolutional neural network is used for obtaining a new feature sequence vector according to the four-dimensional tensors obtained by the embedding layer, the position encoder is used for encoding position information of each word or mark in an input sequence into the new feature sequence vector, the transducer encoding layer consists of a plurality of stacked encoders, each encoder comprises a multi-head attention layer and a feedforward network layer, the multi-head attention layer is used for performing attention calculation according to the input vector to obtain an output vector, and the feedforward network layer is used for obtaining the encoded vector according to the input vector and the output vector;
s3: and inputting the code to be tested into a final model to obtain category probability distribution, and selecting the category with the highest probability according to the category probability distribution as a final prediction result.
Specifically, regarding feature extraction in S1, it can be achieved by:
for the operation code sequence, the sub-program in the code segment is selected as a mode of dividing the code segment, and because the program is generally divided into a plurality of sub-programs, each sub-program is responsible for completing a specific task, the program structure is clear, the readability is high, and the organization and the maintenance of the code are convenient. And extracting the operation code of each sub-grouping section, arranging the operation codes into an operation code sequence according to the sequence of the operation codes, and then juxtaposing the operation code sequences of all sub-grouping sections of the sample as a two-dimensional sequence.
For the TF-IDF features, the operation code, the register name of the first operand, and the annotation content are extracted from each row containing an operation code in the "Segment type: Pure code" segment of the assembly file; a semicolon is added before each function or main program as a marker, and word segmentation is then performed;
for Asm2Vec features, extracting the operation code semantics of 'reduced' operation code in the row with operation code in the section of the assembly file 'Segment type: pure code', including the operation code, the register of the first operand and the annotation content of the assembly file, abstracting each sub-function into a sentence as a corpus, and the extraction process is shown in FIG. 2.
In FIG. 2, the malicious code is shown on the left, with the subroutine as the unit for dividing the code segments; the extracted operation-code sequences are shown in the middle; and the TF-IDF features and Asm2Vec features are shown on the right.
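The per-subroutine opcode extraction illustrated in FIG. 2 can be sketched as follows. The toy listing and its IDA-style `proc`/`endp` markers are assumptions for illustration; real samples would be parsed from the disassembled files of the data set:

```python
# Toy disassembly listing (hypothetical, loosely IDA-style).
asm = """
sub_401000 proc near
    push    ebp
    mov     ebp, esp
    call    sub_401020
    retn
sub_401000 endp
sub_401020 proc near
    xor     eax, eax
    retn
sub_401020 endp
"""

def opcode_sequences(asm_text):
    """Split on proc/endp boundaries and keep one opcode list per subroutine."""
    subs, current = [], None
    for line in asm_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) >= 2 and tokens[1] == "proc":
            current = []                 # enter a new subroutine
        elif len(tokens) >= 2 and tokens[1] == "endp":
            subs.append(current)         # close it out
            current = None
        elif current is not None:
            current.append(tokens[0])    # first token on the line is the opcode
    return subs

seqs = opcode_sequences(asm)
print(seqs)  # [['push', 'mov', 'call', 'retn'], ['xor', 'retn']]
```

Juxtaposing the per-subroutine lists, as the text describes, yields the two-dimensional opcode sequence for one sample.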
In one embodiment, the prefix tuning structure is a prefix_tuning model structure formed by a multi-layer fully-connected neural network. The input-layer nodes are set to the sequence length of the prefix; length conversion is performed through a hidden layer; and the output-layer node length is set to a length suitable for deep-prefix processing, in which the vector output by the output layer is divided into a plurality of pieces that are input to the multi-head self-attention layers in the Transformer encoding layers of the pre-training model and spliced with the key vectors K and value vectors V used in those multi-head attention layers.
In one embodiment, pre-training the improved pre-training model with malicious code contained in a preset data set and the constructed feature set as training data sets to obtain a final model, including:
inputting the operation codes in the training data set into the improved pre-training model for training;
freezing parameters of the pre-training model, and inputting the extracted shallow features as prefixes into a prefix tuning structure for training.
Specifically, in the training process, the extracted operation-code features in each subroutine are first embedded using an Embedding layer; the resulting embedded four-dimensional tensor is then fed into a one-dimensional convolutional neural network, whose convolution layer performs a convolution operation on the input sequence by sliding a convolution kernel to obtain a new feature sequence. The output of the convolution layer can be regarded as the similarity between different positions in the input sequence and the convolution kernel, and is expressed by the following formula:
wherein,is an input sequence,/->Is the output sequence of the convolutional layer,/>Is a weight parameter of the convolution kernel, +.>Is a bias item->Is the length of the convolution kernel, +.>Is an activation function. The formula represents the convolution kernel in the input sequence +.>Upper position->Start, and->Is>To->The values of the individual positions are weighted and summed and added with the bias term +.>Then by activating the function->Nonlinear transformation is performed to obtain an output->;
The tensor obtained from the one-dimensional convolution layer is augmented with its position information by a position encoder, which encodes the position information of each word or token in the input sequence into a vector representation; the position-augmented tensor is finally input to the Transformer encoding layer, which consists of a plurality of stacked encoders, each comprising two sub-layers: a multi-head self-attention layer and a feed-forward network layer.
In one embodiment, the embedding layer is specifically configured to look up an embedding vector for each element of a given input from an embedding matrix of size V × D, where V represents the number of rows of the embedding matrix and D represents the dimension of each embedding vector. The input of the embedding layer is a three-dimensional tensor of the operation features in the subroutines, of shape (B, S, L), where B is the number of samples processed in the batch, S is the number of subroutines contained in each sample, and L is the number of opcodes contained in each subroutine; the tensor contains, for each word, an index to be retrieved from the embedding matrix. The output of the embedding layer is a four-dimensional tensor of shape (B, S, L, D), containing an embedding vector for each input word.
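The lookup can be illustrated with a minimal NumPy sketch; the matrix is random and all sizes are illustrative, not the patent's actual dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 100, 16            # vocabulary size, embedding dimension (illustrative)
B, S, L = 2, 3, 5         # batch, subroutines per sample, opcodes per subroutine

E = rng.normal(size=(V, D))               # embedding matrix, V x D
idx = rng.integers(0, V, size=(B, S, L))  # three-dimensional index tensor
embedded = E[idx]                         # fancy indexing performs the lookup
```

Indexing a (V, D) matrix with a (B, S, L) integer tensor yields the four-dimensional (B, S, L, D) output described above.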
In one embodiment, the calculation formula of the position encoder is:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)$$

where pos denotes the position of the word or token in the sequence, 2i and 2i + 1 index the even and odd components of the output vector of the position encoder, and d denotes the dimension of the vector. The formula means that the position encoder encodes each position as a vector whose elements are sine or cosine functions, with coefficients that differ according to position and dimension.
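A sinusoidal encoding consistent with this description can be sketched as follows; the 10000 base is the standard Transformer choice, assumed here since the patent's exact constants are not given:

```python
import numpy as np

def positional_encoding(seq_len, d):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d)); PE[pos, 2i+1] = cos(same angle)
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d // 2)[None, :]             # (1, d/2)
    angle = pos / np.power(10000.0, 2 * i / d)
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angle)                # even components
    pe[:, 1::2] = np.cos(angle)                # odd components
    return pe

pe = positional_encoding(10, 8)
```

The encoding is added elementwise to the token embeddings, so positions are distinguishable without any learned parameters.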
In one embodiment, the calculation formula of the Transformer encoding layer is:

$$Z = \mathrm{MultiHeadAttention}(X),\qquad X' = \mathrm{LayerNorm}(X + Z),\qquad Y = \mathrm{LayerNorm}\!\left(X' + \mathrm{FFN}(X')\right)$$

where X represents the input vector, LayerNorm represents the normalization layer, and FFN is the feed-forward network layer. Each encoder first processes the input vector X through the multi-head self-attention layer to obtain an output vector Z; the input vector X and the output vector Z are then added and normalized; finally, the normalized vector is taken as the input of the feed-forward network layer, and the calculation is repeated through the stacked encoders until the final output of the Transformer encoding layer is obtained.
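The encoder computation can be illustrated with a single-head, NumPy-only sketch. The learned gain/bias of layer norm and the multi-head split are omitted for brevity, and all weight shapes are illustrative assumptions:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each row to zero mean, unit variance (no learned gain/bias)
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # scaled dot-product
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                    # softmax over keys
    return w @ v

def encoder_layer(x, Wq, Wk, Wv, W1, W2):
    z = self_attention(x, Wq, Wk, Wv)                # Z = Attention(X)
    x = layer_norm(x + z)                            # X' = LayerNorm(X + Z)
    ffn = np.maximum(x @ W1, 0) @ W2                 # FFN with ReLU (hidden dim = d here)
    return layer_norm(x + ffn)                       # LayerNorm(X' + FFN(X'))

rng = np.random.default_rng(1)
d = 8
x = rng.normal(size=(5, d))                          # 5 tokens, dimension d
Ws = [rng.normal(size=(d, d)) * 0.1 for _ in range(5)]
out = encoder_layer(x, *Ws)
```

Stacking several such layers, each feeding the next, reproduces the "plurality of stacked encoders" structure.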
In one embodiment, the pre-training task in the pre-training process adopts an MLM task, wherein the MLM task refers to randomly masking some words or marks in an input sequence, and then enabling a model to predict the masked words or marks;
the training target of the model is that the relation between the operation code representations is learned through the MLM task, and the calculation formula of the pre-training task is as follows:
where n represents the number of training samples, l represents the number of covered words or tokens in each sample,indicate->Sample No.)>Actual value of individual covered words or marks, < >>Representing the probability that the model predicts the word or token, < +.>Is the loss function of the MLM task.
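A minimal sketch of this loss, assuming the sample-averaged formulation described above; `probs` holds, for each sample, the model's predicted probability of each true masked token:

```python
import numpy as np

def mlm_loss(probs):
    # probs[i][j]: model probability of the true token at the j-th
    # masked position of sample i.  L = -(1/n) * sum_i sum_j log p_ij
    n = len(probs)
    return -sum(np.log(p).sum() for p in probs) / n

# two samples: one with two masked tokens, one with a single masked token
loss = mlm_loss([np.array([0.9, 0.8]), np.array([0.7])])
```

The loss falls toward zero as the model assigns probability near 1 to every masked token.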
The method according to the invention is described below by way of specific examples.
A malicious code classification method comprising the steps of:
1. Extract the shallow features of the malicious code and the operation codes in its subroutines, and construct a data set for training.
In 2015, Microsoft published a data set named the Malware Classification Dataset for research and evaluation of malware (malicious code) classification and detection tasks. The embodiment of the invention adopts this data set, which is widely used in malware classification and detection tasks and provides researchers with a standard benchmark for algorithm research and performance evaluation. It can help researchers develop effective malware detection algorithms and improve the security of networks and computer systems.
Shallow features of malicious code include word frequency-inverse file frequency (TF-IDF) and Asm2Vec.
The readable character strings and the operation code sequences are used to compute word frequency-inverse document frequency (TF-IDF) features: the more often a word or piece of assembly code appears within a sample, and the fewer samples it appears in overall, the more representative it is of that sample.
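A from-scratch TF-IDF sketch over opcode "documents" illustrates this weighting; real pipelines differ in tokenization and smoothing, and this version assumes raw term frequency with a plain logarithmic inverse document frequency:

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists.  Returns one {token: tf-idf} map per document.
    # A token frequent in one sample but rare across samples scores high.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))   # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return scores

docs = [["mov", "mov", "push"], ["push", "call"], ["call", "ret"]]
s = tf_idf(docs)
```

Here "mov" occurs only in the first document and twice there, so it scores higher for that document than "push", which also appears elsewhere.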
For the operation code sequence feature, the operation codes of each subroutine are extracted and arranged into one operation code sequence in order of appearance, and the operation code sequences of all subroutines of the sample are then juxtaposed as a two-dimensional sequence.
For the readable string TF-IDF feature, the operation code, the register name of the first operand, and the comment content are extracted from the lines containing operation codes in the 'Segment type: Pure code' section of the assembly file; a marker is added before each function or main program, and these symbols are used for word segmentation.
For the Asm2Vec feature, the operation code semantics (operation code, registers, comment content) of the reduced operation codes are extracted from the lines in the 'Segment type: Pure code' section of the assembly file, and each subroutine is abstracted into a sentence to serve as the corpus.
2. And constructing an improved pre-training model, performing a pre-training task, and obtaining a final model through training.
The improved pre-training model adds a prefix_tuning model structure to the original pre-training model. The prefix_tuning structure consists of several layers of fully connected neural networks: the input layer nodes are set to the sequence length of the prefix, length conversion is performed through a hidden layer, and the output layer node count is set to a length suitable for deep-prefix processing, in which the vector output by the output layer is split into multiple segments for input to the multi-head self-attention layers of the pre-training model.
The training specifically comprises the following steps:
(1) Inputting the operation code in the dataset subtutine into the improved pre-training model for training
The operation code features extracted from the subroutines are first embedded by an Embedding layer.
The Embedding layer looks up an embedding vector for each element of a given input from an embedding matrix of size V × D, where V is the number of rows of the embedding matrix (i.e., the vocabulary size) and D is the dimension of each embedding vector.
The embedded four-dimensional tensor obtained in the previous step (Embedding layer) is then fed into a one-dimensional convolutional neural network.
The tensor obtained from the one-dimensional convolutional layer is then added with its position information by a position encoder. The position encoder is used to encode each word or tag position information in the input sequence into a vector representation.
Finally, the tensor with added position information is input to the Transformer encoding layer, which consists of a plurality of stacked encoders. Each encoder includes two sub-layers: a multi-head self-attention layer and a feed-forward network layer.
In this embodiment, the three-dimensional input of the conventional Transformer is changed to a four-dimensional input, which increases the number of tokens the model can process (a token is generally a basic unit of source code; in the present invention, a "mark") without increasing the number of model parameters.
The pre-training task uses a Masked Language Model (MLM): some words or tokens in the input sequence are randomly masked, and the model is made to predict the masked words or tokens. The training goal of the BERT model is to learn the relationships between opcode representations through this task.
(2) Freeze the parameters of the pre-training model, and input the extracted shallow features as a prefix into the prefix_tuning model structure for training.
Freezing the parameters of the pre-training model reduces the scale of trainable parameters and allows the pre-training model to be applied to different prefix_tuning tasks. In the specific implementation, the TF-IDF and Asm2Vec features are converted into fixed-length vectors by a TF-IDF vectorizer and Word2Vec, and these vectors are input into the prefix_tuning model structure of the improved pre-training model for training. The vector output by the prefix_tuning structure is split by deep-prefix processing and input into the multi-head self-attention layers of the pre-training model, where the split vectors are spliced with the key vector K and the value vector V used in the self-attention sub-layers.
The split vector is spliced with a key vector K and a value vector V used in a multi-head self-attention sub-layer in the pre-training model. The key vector K is a vector used by the self-attention layer to calculate the attention weight, and represents the representation of the currently input representation in the key space. Similar to the query vector, the key vector K is obtained by multiplying the input vector by a key matrix (key matrix), which is also a trainable parameter of the model. The value vector V is a vector used by the self-attention layer to calculate the output vector, which represents the representation of the current input in the value space. Similar to the query vector and the key vector, the value vector is also obtained by multiplying the input vector by a value matrix (value matrix).
In the self-attention layer, the computation of the query vector, key vector and value vector are all independent and they are all obtained by multiplying the input vector by different matrices. Then, the attention weight calculated by the query vector and the key vector is multiplied by the value vector, and the result is weighted and summed to obtain an output vector. The dimension of the output vector is the same as the dimension of the value vector.
The deep prefix is the vector obtained by splicing the prefix vector with the key vector K and the value vector V in all attention sub-layers of the pre-training model, which improves the performance of the model.
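The splicing of prefix vectors onto K and V can be sketched as follows, for a single attention head with random weights; `pk` and `pv` are hypothetical stand-ins for the output of the prefix_tuning network:

```python
import numpy as np

rng = np.random.default_rng(2)
d, seq, plen = 8, 6, 3             # model dim, sequence length, prefix length
x = rng.normal(size=(seq, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv

# Prefix vectors are prepended to K and V only; the frozen model's
# weight matrices Wq/Wk/Wv are untouched during prefix tuning.
pk = rng.normal(size=(plen, d))
pv = rng.normal(size=(plen, d))
k = np.concatenate([pk, k])        # (plen + seq, d)
v = np.concatenate([pv, v])

scores = q @ k.T / np.sqrt(d)
w = np.exp(scores - scores.max(-1, keepdims=True))
w /= w.sum(-1, keepdims=True)      # softmax over prefix + sequence keys
out = w @ v                        # each query now also attends to the prefix
```

The output keeps the original sequence length, but every position's attention distribution now includes the trainable prefix entries, which is how the frozen model is steered.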
3. And inputting the code to be tested into a final model to obtain category probability distribution, and selecting the category with the highest probability as a final prediction result.
Model training
The trained model outputs a vector for each sample, and malicious codes of the same category should have similar feature vectors. Training minimizes the cross entropy between the model's predictions and the true labels through a cross entropy loss function, so that the model fits the data better. The formula of the cross entropy loss function is as follows:
$$L_{CE} = -\sum_{c=1}^{C} y_c \log(p_c)$$

where C represents the number of categories, $y_c$ is the one-hot encoding of the true label, and $p_c$ is the probability of the c-th class in the probability distribution output by the model.
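With a one-hot label, this sum reduces to the negative log-probability of the true class, as a short sketch shows (the probabilities here are illustrative):

```python
import numpy as np

def cross_entropy(p, y_onehot):
    # L = -sum_c y_c * log(p_c); with one-hot y this is -log of the
    # probability the model assigned to the true class
    return -np.sum(y_onehot * np.log(p))

p = np.array([0.1, 0.7, 0.2])   # model output distribution over 3 classes
y = np.array([0, 1, 0])         # one-hot true label: class 1
loss = cross_entropy(p, y)
```

Driving the true-class probability toward 1 drives this loss toward 0, which is the fitting behavior described above.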
Model prediction for code search
After the model is trained, inputting a subroutine's operation codes and the shallow features of the malicious code yields a vector, which is converted into a probability distribution by a softmax function, so that each category's probability lies between 0 and 1 and they sum to 1. Based on the resulting class probability distribution, the class with the highest probability may be selected as the final prediction result. A threshold may also be set to determine the category prediction.
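The softmax conversion and prediction step can be sketched minimally; the logits and the 0.5 threshold are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())     # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])   # illustrative model output vector
probs = softmax(logits)               # probabilities in (0, 1), summing to 1
pred = int(np.argmax(probs))          # class with the highest probability
confident = probs[pred] >= 0.5        # optional decision threshold
```

When no class clears the threshold, the sample can be flagged for further analysis instead of being assigned a label.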
The invention first extracts the operation code sequences of the subroutines in the malicious code, and then extracts the shallow TF-IDF and Asm2Vec features. Using the subroutines as input samples for pre-training improves the generalization capability of the model as well as its training speed and effect. Using the shallow features as the prefix reduces the parameter scale that must be trained, improves the universality of the pre-trained model, and achieves performance comparable to the pre-training/fine-tuning paradigm.
Example two
Based on the same inventive concept, the embodiment discloses a malicious code classification device based on pre-training, which comprises:
the feature extraction module is used for extracting shallow features and operation codes in subroutines from malicious codes contained in a preset data set, and constructing a feature set according to the shallow features and the operation codes, wherein the shallow features comprise TF-IDF features and Asm2Vec features, the TF-IDF features comprise readable character string sequence features, and the Asm2Vec features are semantic information features related to code execution logic in an assembly file;
the device comprises a pre-training module, a pre-training module and a position encoder, wherein the pre-training module is used for pre-training an improved pre-training model by taking malicious codes contained in a preset data set and a constructed feature set as training data sets to obtain a final model, the improved pre-training model comprises a prefix tuning structure, an embedding layer, a one-dimensional convolutional neural network, the position encoder and a transducer encoding layer, the prefix tuning structure is used for segmenting an output vector into a plurality of sections of input transducer encoding layers, the embedding layer is used for embedding three-dimensional tensors of operation features in a subprogram to obtain four-dimensional tensors, the one-dimensional convolutional neural network is used for obtaining a new feature sequence vector according to the four-dimensional tensors obtained by the embedding layer, the position encoder is used for encoding position information of each word or mark in an input sequence into the new feature sequence vector, the transducer encoding layer comprises a plurality of stacked encoders, each encoder comprises a multi-head attention layer and a feedforward network layer, and the feedforward network layer is used for performing attention calculation according to the input vector to obtain an output vector;
and the classification module is used for inputting the codes to be detected into the final model to obtain category probability distribution, and selecting the category with the highest probability according to the category probability distribution as a final prediction result.
Since the device described in the second embodiment of the present invention is a device for implementing the pretrained malicious code classification method in the first embodiment of the present invention, based on the method described in the first embodiment of the present invention, a person skilled in the art can understand the specific structure and the deformation of the device, and therefore, the description thereof is omitted herein. All devices used in the method of the first embodiment of the present invention are within the scope of the present invention.
Example III
Based on the same inventive concept, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed, implements the method as described in embodiment one.
Because the computer readable storage medium described in the third embodiment of the present invention is a computer readable storage medium used for implementing the pretrained malicious code classification method in the first embodiment of the present invention, based on the method described in the first embodiment of the present invention, a person skilled in the art can understand the specific structure and the modification of the computer readable storage medium, and therefore, the description thereof is omitted here. All computer readable storage media used in the method according to the first embodiment of the present invention are included in the scope of protection.
Example IV
Based on the same inventive concept, the present application also provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method in the first embodiment when executing the program.
Because the computer device described in the fourth embodiment of the present invention is a computer device used for implementing the pretrained malicious code classification method in the first embodiment of the present invention, based on the method described in the first embodiment of the present invention, a person skilled in the art can understand the specific structure and the deformation of the computer device, and therefore, the description thereof is omitted herein. All computer devices used in the method of the first embodiment of the present invention are within the scope of the present invention.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims and the equivalents thereof, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A pre-training based malicious code classification method, comprising:
extracting shallow features and operation codes in subroutines from malicious codes contained in a preset data set, and constructing a feature set according to the shallow features and the operation codes, wherein the shallow features comprise TF-IDF features and Asm2Vec features, the TF-IDF features comprise readable character string sequence features, and the Asm2Vec features are semantic information features logically related to code execution in an assembly file;
pre-training an improved pre-training model by taking the malicious code contained in the preset data set and the constructed feature set as a training data set to obtain a final model, wherein the improved pre-training model comprises a prefix tuning structure, an embedding layer, a one-dimensional convolutional neural network, a position encoder and a Transformer encoding layer, the prefix tuning structure is used for splitting an output vector into a plurality of segments input to the Transformer encoding layer, the embedding layer is used for embedding a three-dimensional tensor of operation features in the subroutines to obtain a four-dimensional tensor, the one-dimensional convolutional neural network is used for obtaining a new feature sequence vector from the four-dimensional tensor obtained by the embedding layer, the position encoder is used for encoding position information of each word or token in an input sequence into the new feature sequence vector, the Transformer encoding layer consists of a plurality of stacked encoders, each encoder comprises a multi-head attention layer and a feed-forward network layer, the multi-head attention layer is used for performing attention calculation on an input vector to obtain an output vector, and the feed-forward network layer is used for obtaining an encoded vector from the input vector and the output vector;
and inputting the code to be tested into a final model to obtain category probability distribution, and selecting the category with the highest probability according to the category probability distribution as a final prediction result.
2. The pre-training based malicious code classification method according to claim 1, wherein the prefix tuning structure is a prefix_tuning model structure composed of a plurality of layers of fully-connected neural networks; input layer nodes are set to the sequence length of the prefix, length conversion is performed through a hidden layer, and the output layer node length is set to a node length suitable for deep-prefix processing, in which the vector output from the output layer is divided into a plurality of segments for input to the multi-head self-attention layers in the Transformer encoding layer of the pre-training model, where they are spliced with the key vectors K and value vectors V used in the multi-head attention layers.
3. The pretrained malicious code classification method according to claim 1, wherein pretraining the improved pretrained model with the malicious code contained in the preset data set and the constructed feature set as the training data set to obtain a final model comprises:
inputting the operation codes in the training data set into the improved pre-training model for training;
freezing parameters of the pre-training model, and inputting the extracted shallow features as prefixes into a prefix tuning structure for training.
4. The pre-training based malicious code classification method according to claim 1, wherein the embedding layer is specifically configured to look up an embedding vector for each element of a given input from an embedding matrix of size V × D, where V represents the number of rows of the embedding matrix and D represents the dimension of each embedding vector; the input of the embedding layer is a three-dimensional tensor of the operation features in the subroutines, of shape (B, S, L), where B is the number of samples processed in the batch, S is the number of subroutines contained in each sample, and L is the number of opcodes contained in each subroutine, the three-dimensional tensor containing an index for each word to be retrieved from the embedding matrix; the output of the embedding layer is a four-dimensional tensor of shape (B, S, L, D), containing an embedding vector for each input word.
5. The pre-training based malicious code classification method according to claim 1, wherein the calculation formula of the position encoder is:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right),\qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)$$

where pos denotes the position of the word or token in the sequence, 2i and 2i + 1 index the even and odd components of the output vector of the position encoder, and d denotes the dimension of the vector; the formula means that the position encoder encodes each position as a vector whose elements are sine or cosine functions, with coefficients that differ according to position and dimension.
6. The pre-training based malicious code classification method according to claim 1, wherein the calculation formula of the Transformer encoding layer is:

$$Z = \mathrm{MultiHeadAttention}(X),\qquad X' = \mathrm{LayerNorm}(X + Z),\qquad Y = \mathrm{LayerNorm}\!\left(X' + \mathrm{FFN}(X')\right)$$

where X represents the input vector, LayerNorm represents the normalization layer, and FFN is the feed-forward network layer; each encoder first processes the input vector X through the multi-head self-attention layer to obtain an output vector Z, then adds and normalizes the input vector X and the output vector Z, and finally takes the normalized vector as the input of the feed-forward network layer, the calculation being repeated through the stacked encoders until the final output of the Transformer encoding layer is obtained.
7. The pre-training based malicious code classification method according to claim 1, wherein the pre-training task adopts an MLM task, the MLM task being to randomly mask some words or tokens in the input sequence and then have the model predict the masked words or tokens;
the training target of the model is to learn the relationships between operation code representations through the MLM task, and the calculation formula of the pre-training task is:

$$L_{MLM} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{l}\log p\!\left(y_{ij}\right)$$

where n represents the number of training samples, l represents the number of masked words or tokens in each sample, $y_{ij}$ denotes the actual value of the j-th masked word or token in the i-th sample, $p(y_{ij})$ represents the probability the model assigns to that word or token, and $L_{MLM}$ is the loss function of the MLM task.
8. A pretrained malicious code classification apparatus, comprising:
the feature extraction module is used for extracting shallow features and operation codes in subroutines from malicious codes contained in a preset data set, and constructing a feature set according to the shallow features and the operation codes, wherein the shallow features comprise TF-IDF features and Asm2Vec features, the TF-IDF features comprise readable character string sequence features, and the Asm2Vec features are semantic information features related to code execution logic in an assembly file;
the pre-training module is used for pre-training the improved pre-training model with the malicious code contained in the preset data set and the constructed feature set as the training data set to obtain a final model, wherein the improved pre-training model comprises a prefix tuning structure, an embedding layer, a one-dimensional convolutional neural network, a position encoder and a Transformer encoding layer, the prefix tuning structure is used for splitting an output vector into a plurality of segments input to the Transformer encoding layer, the embedding layer is used for embedding a three-dimensional tensor of operation features in the subroutines to obtain a four-dimensional tensor, the one-dimensional convolutional neural network is used for obtaining a new feature sequence vector from the four-dimensional tensor obtained by the embedding layer, the position encoder is used for encoding position information of each word or token in an input sequence into the new feature sequence vector, the Transformer encoding layer comprises a plurality of stacked encoders, each encoder comprises a multi-head attention layer and a feed-forward network layer, the multi-head attention layer is used for performing attention calculation on an input vector to obtain an output vector, and the feed-forward network layer is used for obtaining an encoded vector from the input vector and the output vector;
and the classification module is used for inputting the codes to be detected into the final model to obtain category probability distribution, and selecting the category with the highest probability according to the category probability distribution as a final prediction result.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed, implements the method of any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when the program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311610887.1A CN117332419B (en) | 2023-11-29 | 2023-11-29 | Malicious code classification method and device based on pre-training |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311610887.1A CN117332419B (en) | 2023-11-29 | 2023-11-29 | Malicious code classification method and device based on pre-training |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117332419A true CN117332419A (en) | 2024-01-02 |
CN117332419B CN117332419B (en) | 2024-02-20 |
Family
ID=89293778
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311610887.1A Active CN117332419B (en) | 2023-11-29 | 2023-11-29 | Malicious code classification method and device based on pre-training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117332419B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113515742A (en) * | 2020-04-12 | 2021-10-19 | 南京理工大学 | Internet of things malicious code detection method based on behavior semantic fusion extraction |
CN113987209A (en) * | 2021-11-04 | 2022-01-28 | 浙江大学 | Natural language processing method and device based on knowledge-guided prefix fine tuning, computing equipment and storage medium |
CN114065199A (en) * | 2021-11-18 | 2022-02-18 | 山东省计算中心(国家超级计算济南中心) | Cross-platform malicious code detection method and system |
CN114386511A (en) * | 2022-01-11 | 2022-04-22 | 广州大学 | Malicious software family classification method based on multi-dimensional feature fusion and model integration |
CN114647723A (en) * | 2022-04-18 | 2022-06-21 | 北京理工大学 | Few-sample abstract generation method based on pre-training soft prompt |
US20220398462A1 (en) * | 2021-06-14 | 2022-12-15 | Microsoft Technology Licensing, Llc. | Automated fine-tuning and deployment of pre-trained deep learning models |
US20230161567A1 (en) * | 2021-11-24 | 2023-05-25 | Microsoft Technology Licensing, Llc. | Custom models for source code generation via prefix-tuning |
CN116720184A (en) * | 2023-04-27 | 2023-09-08 | 厦门农芯数字科技有限公司 | Malicious code analysis method and system based on generation type AI |
CN117113349A (en) * | 2023-08-25 | 2023-11-24 | 杭州电子科技大学 | Malicious software detection method based on malicious behavior enhancement pre-training model |
-
2023
- 2023-11-29 CN CN202311610887.1A patent/CN117332419B/en active Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113515742A (en) * | 2020-04-12 | 2021-10-19 | 南京理工大学 | Internet of things malicious code detection method based on behavior semantic fusion extraction |
US20220398462A1 (en) * | 2021-06-14 | 2022-12-15 | Microsoft Technology Licensing, Llc. | Automated fine-tuning and deployment of pre-trained deep learning models |
CN113987209A (en) * | 2021-11-04 | 2022-01-28 | 浙江大学 | Natural language processing method and device based on knowledge-guided prefix fine tuning, computing equipment and storage medium |
CN114065199A (en) * | 2021-11-18 | 2022-02-18 | 山东省计算中心(国家超级计算济南中心) | Cross-platform malicious code detection method and system |
US20230161567A1 (en) * | 2021-11-24 | 2023-05-25 | Microsoft Technology Licensing, Llc. | Custom models for source code generation via prefix-tuning |
CN114386511A (en) * | 2022-01-11 | 2022-04-22 | 广州大学 | Malicious software family classification method based on multi-dimensional feature fusion and model integration |
CN114647723A (en) * | 2022-04-18 | 2022-06-21 | 北京理工大学 | Few-sample abstract generation method based on pre-training soft prompt |
CN116720184A (en) * | 2023-04-27 | 2023-09-08 | 厦门农芯数字科技有限公司 | Malicious code analysis method and system based on generative AI |
CN117113349A (en) * | 2023-08-25 | 2023-11-24 | 杭州电子科技大学 | Malicious software detection method based on malicious behavior enhancement pre-training model |
Non-Patent Citations (3)
Title |
---|
WENHAO MA ET AL.: "Pre-trained Model Based Feature Envy Detection", 2023 IEEE/ACM 20TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR) * |
XIAOMING RUAN ET AL.: "Prompt Learning for Developing Software Exploits", INTERNETWARE '23: PROCEEDINGS OF THE 14TH ASIA-PACIFIC SYMPOSIUM ON INTERNETWARE *
LIU HENGXUN; AI ZHONGLIANG: "A Malicious Code Classification Model Based on Word Vectors", ELECTRONIC DESIGN ENGINEERING, no. 06 *
Also Published As
Publication number | Publication date |
---|---|
CN117332419B (en) | 2024-02-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Learning to extract attribute value from product via question answering: A multi-task approach | |
Meng et al. | Research on denoising sparse autoencoder | |
CN113596007B (en) | Vulnerability attack detection method and device based on deep learning | |
Yu et al. | LSTM-based end-to-end framework for biomedical event extraction | |
CN115982403B (en) | Multi-mode hash retrieval method and device | |
CN111931935A (en) | Network security knowledge extraction method and device based on One-shot learning | |
CN113282714A (en) | Event detection method based on differential word vector representation | |
Yin et al. | Intrusion detection for capsule networks based on dual routing mechanism | |
EP4004827A1 (en) | A computer-implemented method, a system and a computer program for identifying a malicious file | |
Chen et al. | Survey on ai sustainability: Emerging trends on learning algorithms and research challenges | |
CN110969015A (en) | Automatic label identification method and equipment based on operation and maintenance script | |
Şahin | Malware detection using transformers-based model GPT-2 | |
Pei et al. | Combining multi-features with a neural joint model for Android malware detection | |
CN117332419B (en) | Malicious code classification method and device based on pre-training | |
CN116361788A (en) | Binary software vulnerability prediction method based on machine learning | |
CN113326371B (en) | Event extraction method integrating pre-training language model and anti-noise interference remote supervision information | |
Al-Jamal et al. | Image captioning techniques: A review | |
Meng et al. | A survey on machine learning-based detection and classification technology of malware | |
Otsubo et al. | Compiler provenance recovery for multi-cpu architectures using a centrifuge mechanism | |
CN111199170B (en) | Formula file identification method and device, electronic equipment and storage medium | |
Sharma et al. | Optical Character Recognition Using Hybrid CRNN Based Lexicon-Free Approach with Grey Wolf Hyperparameter Optimization | |
Li et al. | Prior knowledge integrated with self-attention for event detection | |
Rastogi et al. | Dimensionality Reduction Approach for High Dimensional Data using HGA based Bio Inspired Algorithm | |
Vadavalli et al. | Deep Learning based truth discovery algorithm for research the genuineness of given text corpus | |
Jiang et al. | Multi-label Detection Method for Smart Contract Vulnerabilities Based on Expert Knowledge and Pre-training Technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||