CN115859290A - Malicious code detection method based on static characteristics and storage medium

Publication number
CN115859290A
Authority
CN
China
Prior art keywords
layer
convolution
array
separable
static
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310049009.0A
Other languages
Chinese (zh)
Other versions
CN115859290B (en)
Inventor
王平
荣星
严锦立
吴流丽
毛建辉
汪文晓
严亚伟
王耀
贾雄
刘筱明
谷广宇
王秋实
尹韧达
杜丽
宋健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
UNIT 61660 OF PLA
Original Assignee
UNIT 61660 OF PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by UNIT 61660 OF PLA
Priority to CN202310049009.0A
Publication of CN115859290A
Application granted
Publication of CN115859290B
Legal status: Active
Anticipated expiration

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Error Detection And Correction (AREA)

Abstract

The invention discloses a static-feature-based malicious code detection method and a storage medium. The method comprises: obtaining code to be detected and extracting a static feature array from it; alternately applying a first separable convolution and a second separable convolution to the static feature array to obtain a first size array, wherein both separable convolutions process the spatial information of their input array channel by channel with a depthwise convolution layer, the stride of the depthwise convolution layer in the first separable convolution is smaller than that of the depthwise convolution layer in the second separable convolution, and the first and second separable convolutions alternate for a number of periods greater than or equal to 2; and determining the code class using a fully connected layer and a Softmax layer. The static features support comprehensive identification and raw feature extraction. Separating the spatial information and the channel information of the static feature array allows the spatial correlation among features to be captured better during training, which benefits the detection of malicious code.

Description

Malicious code detection method based on static characteristics and storage medium
Technical Field
The invention relates to the technical field of computer program security, and more particularly to a static-feature-based malicious code detection method and a storage medium.
Background
With the rapid development of computer network technology, network security has become a complex, practical, and serious non-traditional security problem.
Malicious code attacks are one of the main factors affecting network security: computer viruses or Trojan programs are injected into the attack target by various deceptive means in order to destroy the target system's resources or obtain information about them. In practice, most of the major network security incidents of recent years have used malicious code as the core attack component and have caused substantial damage. The adverse effects of malicious code attacks keep growing and pose a serious security risk to the nation and to society. Research on countermeasures against malicious code attacks has therefore become an urgent need for maintaining network security.
In the prior art, malicious code detection methods based on machine learning can detect both known and unknown malicious code, but most of them involve too many algorithm parameters and extract code information incompletely, which reduces detection accuracy.
Disclosure of Invention
The present invention is made in view of the above needs of the prior art. The technical problem to be solved is how to identify malicious code quickly and accurately.
To solve this problem, the invention adopts the following technical solution:
A static-feature-based malicious code detection method comprises the following steps:
acquiring code to be detected, and extracting a static feature array of the code to be detected;
alternately processing the static feature array with a first separable convolution and a second separable convolution to obtain a first size array, wherein the first separable convolution and the second separable convolution each comprise processing the spatial information of the input array channel by channel with a depthwise convolution layer, and processing the channel information of the depthwise convolution layer's output with a pointwise convolution layer; the stride of the depthwise convolution layer in the first separable convolution is smaller than the stride of the depthwise convolution layer in the second separable convolution; and the number of periods over which the first separable convolution and the second separable convolution alternate is greater than or equal to 2;
and processing the first size array with a fully connected layer and a Softmax layer to determine the code class.
Optionally, after extracting the static feature array of the code to be detected, the method further comprises normalizing the feature values in the static feature array.
Optionally, before the first size array is processed with the fully connected layer and the Softmax layer, a pooling layer performs max pooling on the first size array, extracting the maximum value in each pooling region.
Optionally, the depthwise convolution layer comprises a depthwise convolution kernel followed in sequence by a BN layer and a ReLU layer; the output of the ReLU layer is the output of the depthwise convolution layer;
the pointwise convolution layer comprises a pointwise convolution kernel followed in sequence by a BN layer and a ReLU layer; the output of the ReLU layer is the output of the pointwise convolution layer.
Optionally, acquiring the code to be detected and extracting its static feature array comprises:
disassembling the code to be detected into assembly language to obtain a disassembled file; extracting the static feature array of the code to be detected from the disassembled file;
the values in the static feature array comprise punctuation frequency, register character frequency, opcode frequency, function call information, and keyword frequency.
Optionally, the method further comprises:
constructing a separable convolution model from the first separable convolution layer, the second separable convolution layer, the fully connected layer, and the Softmax layer, wherein the first separable convolution layer performs the first separable convolution and the second separable convolution layer performs the second separable convolution; and training the separable convolution model with static feature arrays carrying code class labels.
Optionally, the method further comprises optimizing the parameters of the separable convolution model with a stochastic gradient descent algorithm.
Optionally, before the static feature array is alternately processed by the first separable convolution and the second separable convolution to obtain the first size array, the static feature array is preprocessed with convolution kernels of size 5 × 5 and stride 2.
Optionally, the method further comprises testing the accuracy of the separable convolution model with test set data.
A computer-readable storage medium stores a static-feature-based malicious code detection program which, when executed by a processor, implements the steps of any of the static-feature-based malicious code detection methods described above.
Compared with the prior art, the invention provides a method for extracting a static feature array from computer-readable code. By counting how often various characters occur in the disassembled file, it supports comprehensive identification of a file's static features and extraction of its raw features. The separable convolution network designed in the embodiment of the invention has few parameters and trains quickly. Applying separable convolution to the static feature array separates its spatial information from its channel information, so the spatial correlation among features is captured better during training, which benefits the detection of malicious code.
Drawings
To illustrate the embodiments of this specification or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some of the embodiments of this specification, and those skilled in the art can derive other drawings from them.
FIG. 1 is a flowchart of a malicious code detection method based on static features according to an embodiment of the present invention;
FIG. 2 is a block diagram of a method for detecting malicious code based on static features according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a separable convolutional network of a static feature-based malicious code detection method according to an embodiment of the present invention;
FIG. 4 is a network diagram of a first separable convolutional layer of a static feature-based malicious code detection method according to an embodiment of the present invention;
FIG. 5 is a schematic network diagram of a second separable convolutional layer of a static feature-based malicious code detection method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To aid understanding of the embodiments of the present invention, specific embodiments are further explained below with reference to the drawings. These embodiments are not intended to limit the scope of the present invention.
Example 1
This embodiment provides a static-feature-based malicious code detection method which, as shown in FIGS. 1 and 2, includes:
S1: Acquire the code to be detected, and extract a static feature array of the code to be detected.
In the embodiment of the present invention, this step specifically includes: acquiring the code to be detected and disassembling it into assembly language to obtain a disassembled file; and extracting the static feature array of the code to be detected from the disassembled file. The values in the static feature array comprise punctuation frequency, register character frequency, opcode frequency, function call information, and keyword frequency.
Specifically:
The punctuation frequency includes the number of times punctuation marks such as '+', '[', ']', '@', etc. occur in the disassembled file.
The register character frequency includes the number of times register symbols such as 'edx', 'esi', 'es', 'fs', 'ds', 'ss', 'gs', 'cs', 'ah', 'al', 'ax', 'bh', 'bl', 'bx', 'ch', 'cl', 'cx', 'dh', 'dl', 'dx', 'eax', 'ebp', 'ebx', 'ecx', 'edi', 'esp', etc. occur in the disassembled file.
The opcode frequency includes the number of times opcodes such as 'add', 'al', 'bt', 'call', 'cdq', 'cld', 'cli', 'cmc', 'cmp', 'const', 'cwd', 'daa', 'db', 'dd', 'dec', 'dw', 'endp', 'ends', 'faddp', 'fchs', 'fdiv', 'fdivp', 'rdivp', 'fdivr', 'fill', 'fistp', 'fld', 'fstcw', 'fstcwimul', 'fstp', 'fword', 'fxch', 'imu', 'in', 'inc', 'ins', 'int', 'jb', 'je', 'jg', 'jge', 'jl', 'jmp', 'jnb', 'jno', 'jnz', 'jo', 'jz', 'lea', 'lope', 'mov', 'movzx', 'mul', 'near', 'neg', 'or', 'out', 'outlets', 'pop', 'proc', 'push', 'rcl', 'rcr', 'rdtsc', 'rep', 'ret', 'retn', 'rol', 'ror', 'sal', 'sar', 'sbb', 'scas', 'setb', 'setle', 'setnz', 'setz', 'shl', 'shld', 'shr', 'sidt', 'stc', 'std', 'stis', 'stos', 'sub', 'test', 'wait', 'xchg', 'xor', etc. occur in the disassembled file.
The function call information is obtained by first generating the function call graph of the file and then counting the number of times each character appears in that function call graph.
The keyword frequency includes the number of times the following keywords occur in the disassembled file: 'Virtual', 'Offset', 'loc', 'Import', 'Imports', 'var', 'Forwarder', 'UINT', 'LONG', 'BOOL', 'WORD', 'BYTES', 'large', 'short', 'dd', 'db', 'dw', 'XREF', 'ptr', 'DATA', 'FUNCTION', 'extrn', 'byte', 'word', 'dword', 'char', 'DWORD', 'stdcall', 'arg', 'locret', 'asc', 'align', 'WinMain', 'unk', 'cookie', 'off', 'nullsub', 'DllEntryPoint', 'System32', 'dll', 'CHUNK', 'BASS', 'HMENU', 'DLL', 'LPWSTR', 'void', 'HRESULT', 'HDC', 'LRESULT', 'HANDLE', 'HWND', 'LPSTR', 'int', 'HLOCAL', 'FARPROC', 'ATOM', 'HMODULE', 'WPARAM', 'HGLOBAL', 'entry', 'rva', 'COLLAPSED', 'config', 'exe', 'Software', 'CurrentVersion', '__imp_', 'INT_PTR', 'UINT_PTR', '---Seperator', 'PCCTL_CONTEXT', '__IMPORT_', 'INTERNET_STATUS_CALLBACK', '.rdata:', '.data:', '.text:', 'case', 'installdir', 'market', 'microsoft', 'policies', 'proc', 'scrollwindow', 'search', 'trap', 'visualc', '___security_cookie', 'assume', 'callvirtualalloc', 'exportedentry', 'hardware', 'hkey_current_user', 'hkey_local_machine', 'sp-analysisfailed', 'unableto'.
Disassembling the binary file makes code features easier to extract: compared with the raw binary, the disassembled file reflects the code's functions better, which helps in extracting the static features of the code.
Extracting the static features of the code to be detected yields a discriminative malicious-code feature array that a convolutional network can readily recognize.
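The frequency counting in step S1 can be sketched as follows. This is a minimal illustration, not the patented extractor: the shortened vocabularies, the function name `count_static_features`, and the tokenizing regular expression are assumptions made for the example, and the function-call-graph and keyword features are omitted.

```python
import re
from collections import Counter

# Hypothetical, shortened vocabularies; the description lists many more tokens.
PUNCTUATION = ["+", "[", "]", "@"]
REGISTERS = ["eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"]
OPCODES = ["mov", "push", "pop", "call", "jmp", "xor", "add", "sub"]


def count_static_features(disassembly: str) -> list[int]:
    """Return occurrence counts for punctuation, registers and opcodes."""
    # Word-like tokens (identifiers, mnemonics, register names), lower-cased.
    words = Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", disassembly.lower()))
    # Single-character counts for the punctuation marks.
    chars = Counter(disassembly)

    features = [chars[p] for p in PUNCTUATION]
    features += [words[r] for r in REGISTERS]
    features += [words[o] for o in OPCODES]
    return features


sample = "mov eax, [ebx+4]\npush eax\ncall sub_401000\n"
vector = count_static_features(sample)
```

Counting whole tokens rather than substrings means that a label such as `sub_401000` does not inflate the count of the `sub` opcode; whether the original method makes that distinction is not stated in the description.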
Optionally, after extracting the static feature array of the code to be detected, the method further includes normalizing the feature values in the static feature array.
Normalization confines the data in the static feature array to a fixed range, eliminating the influence of differing scales among the features.
In the embodiment of the invention, the five types of features are combined into a 16384-dimensional feature vector. Because the value ranges of the features differ, all feature values are normalized so that differences in data span or magnitude do not bias the classification result. This preprocessing prepares the data for input to the convolutional network.
In this embodiment, the normalized feature values of each file are arranged into a static feature array of size 128 × 128 (128 × 128 = 16384), which completes the extraction of the static feature array of the code to be detected.
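The normalization and reshaping described above might look like the sketch below. The description does not say which normalization is used, so min-max scaling to [0, 1] is an assumption here, as are the zero padding of short vectors and the function name `to_feature_array`.

```python
def to_feature_array(values, side=128):
    """Min-max normalise a feature vector and reshape it to side x side.

    Assumptions (not stated in the source): min-max scaling to [0, 1],
    and zero padding when fewer than side * side values are supplied.
    """
    size = side * side
    if not values or len(values) > size:
        raise ValueError("expected between 1 and side*side feature values")

    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant vector has no spread; map it to zeros instead of dividing by 0.
    scaled = [(v - lo) / span if span else 0.0 for v in values]
    scaled += [0.0] * (size - len(scaled))

    # Row-major reshape into a list of `side` rows with `side` columns each.
    return [scaled[r * side:(r + 1) * side] for r in range(side)]


array = to_feature_array([0, 5, 10])
```

In practice the scaling statistics would more likely be computed per feature over the whole training library rather than per file; the per-file version is used here only to keep the example self-contained.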
Furthermore, the static feature arrays of code whose class is known are collected into a feature library, yielding a training feature library used for training the Softmax layer.
S2: Alternately process the static feature array with a first separable convolution and a second separable convolution to obtain a first size array, wherein the first separable convolution and the second separable convolution each comprise processing the spatial information of the input array channel by channel with a depthwise convolution layer, and processing the channel information of the depthwise convolution layer's output with a pointwise convolution layer; the stride of the depthwise convolution layer in the first separable convolution is smaller than the stride of the depthwise convolution layer in the second separable convolution; and the number of periods over which the first separable convolution and the second separable convolution alternate is greater than or equal to 2.
The first separable convolution denotes feeding an input array to the first separable convolution layer for convolution, and the second separable convolution denotes feeding an input array to the second separable convolution layer for convolution.
In the embodiment of the present invention, alternating means that the first separable convolution and the second separable convolution process the input array in turn: the first separable convolution processes the static feature array, its result is processed by the second separable convolution, that result is processed by the first separable convolution again, and so on. One first separable convolution layer and the adjacent second separable convolution layer connected to it form one period, and the number of periods is the number of times this pair occurs.
In the embodiment of the present invention, the first separable convolution layer and the second separable convolution layer each include a depthwise convolution layer and a pointwise convolution layer. The depthwise convolution layer has several input channels, each paired with one output channel, and each output channel corresponds to one h × w convolution kernel. The output channels of the depthwise convolution layer also serve as the input channels of the pointwise convolution layer, which has several output channels, each corresponding to a 1 × 1 convolution kernel. During convolution, each h × w kernel in the depthwise convolution layer is convolved with the data features of one input channel and the result is emitted on the corresponding output channel; the 1 × 1 kernels of the pointwise convolution layer then fuse the data features output by all the depthwise output channels.
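The two-stage computation just described can be sketched in plain Python. This is an illustrative sketch only: it uses nested lists in [channel][row][column] layout, applies no padding, and leaves out the BN and ReLU layers; the function names are hypothetical.

```python
def depthwise_conv(x, kernels, stride=1):
    """Convolve each channel with its own kernel (no padding).

    x: [C][H][W] input, kernels: [C][kh][kw], one kernel per channel.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        kh, kw = len(kernel), len(kernel[0])
        h, w = len(channel), len(channel[0])
        rows = []
        for i in range(0, h - kh + 1, stride):
            row = []
            for j in range(0, w - kw + 1, stride):
                row.append(sum(channel[i + a][j + b] * kernel[a][b]
                               for a in range(kh) for b in range(kw)))
            rows.append(row)
        out.append(rows)
    return out


def pointwise_conv(x, weights):
    """1 x 1 convolution: mix channels at every spatial position.

    x: [C][H][W], weights: [C_out][C].
    """
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wt[c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)] for wt in weights]


def separable_conv(x, kernels, weights, stride=1):
    """Depthwise (spatial) step followed by pointwise (channel) step."""
    return pointwise_conv(depthwise_conv(x, kernels, stride), weights)


# Two 4 x 4 channels, 3 x 3 all-ones depthwise kernels, two output channels.
x = [[[1] * 4 for _ in range(4)], [[2] * 4 for _ in range(4)]]
kernels = [[[1] * 3 for _ in range(3)] for _ in range(2)]
weights = [[1, 1], [1, -1]]
y_stride1 = separable_conv(x, kernels, weights, stride=1)
y_stride2 = separable_conv(x, kernels, weights, stride=2)
```

The depthwise step never mixes channels and the pointwise step never looks at neighbouring positions, which is the separation of spatial and channel information the description refers to.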
The embodiment of the present invention provides a process in which the number of alternating periods of the first separable convolution and the second separable convolution is greater than 2, specifically as follows. Each channel of the depthwise convolution layer in the first separable convolution convolves the static feature array to obtain first data features, which are output from the corresponding output channels; this performs a first separation of the spatial information of the code to be detected. The input channels of the pointwise convolution layer in the first separable convolution receive the result of the depthwise convolution layer, and 1 × 1 convolution kernels perform a first fusion of the first data features to obtain a first data packet.
Each channel of the depthwise convolution layer in the second separable convolution convolves the first data packet to obtain second data features, which are output from the corresponding output channels; this performs a second separation of the spatial information of the code to be detected. The input channels of the pointwise convolution layer in the second separable convolution receive the result of the depthwise convolution layer, and 1 × 1 convolution kernels perform a second fusion of the second data features to obtain a second data packet.
Each channel of the depthwise convolution layer in the first separable convolution convolves the second data packet to obtain third data features, which are output from the corresponding output channels; this performs a third separation of the spatial information of the code to be detected. The input channels of the pointwise convolution layer in the first separable convolution receive the result of the depthwise convolution layer, and 1 × 1 convolution kernels perform a third fusion of the third data features to obtain the first size array.
In an embodiment of the present invention, each output channel of the depthwise convolution layer in the first separable convolution corresponds to a 3 × 3 convolution kernel with stride 1, and each output channel of the depthwise convolution layer in the second separable convolution corresponds to a 3 × 3 convolution kernel with stride 2.
By separating the channel information and the spatial information of the feature array, the embodiment of the invention reduces the number of parameters, avoids processing invalid information, and greatly reduces redundant information in the convolution result.
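The parameter reduction can be made concrete with a small calculation. A standard convolution with a k × k kernel has k·k·C_in·C_out weights, whereas a depthwise separable convolution has k·k·C_in depthwise weights plus C_in·C_out pointwise weights. The channel counts used below (32 and 64) are hypothetical, since the description does not give them, and biases and BN parameters are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of an ordinary k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights of a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k kernel per input channel
    pointwise = c_in * c_out      # one 1 x 1 kernel per output channel
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernels, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
ratio = standard / separable
```

For this hypothetical layer the separable form needs 2336 weights instead of 18432, roughly an eightfold reduction.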
The first and second separable convolution layers are arranged alternately, and the stride of the depthwise convolution layer in the first separable convolution is smaller than that of the depthwise convolution layer in the second separable convolution. This reduces the spatial size of the output array while still achieving deep extraction, captures code feature information more comprehensively, and avoids the situation in which, after two consecutive convolution extractions, features that have not been sufficiently fused are discarded outright.
Preferably, four first separable convolutions and three second separable convolutions are alternately arranged with a period number greater than 3.
This setting avoids overfitting caused by too many parameters, and also avoids the inaccurate classification that results from too few parameters. It retains as much useful static-feature information as possible and gives a better code detection result.
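To see how the alternation of stride 1 and stride 2 shrinks the array, the spatial size can be traced through the preferred arrangement. The sketch assumes a padding of 1 for every 3 × 3 depthwise kernel and an ordering of first, second, first, second, first, second, first; neither is stated in the description, and the optional 5 × 5 preprocessing layer is left out.

```python
def conv_out(n, k, s, p):
    """Output length of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


def trace_sizes(n=128, periods=3):
    """Spatial size after each layer of an alternating stack.

    Assumes 3 x 3 kernels with padding 1; stride 1 in the first separable
    convolution and stride 2 in the second, ending with one more first
    separable convolution (four first, three second for periods=3).
    """
    sizes = [n]
    for _ in range(periods):
        n = conv_out(n, 3, 1, 1)   # first separable convolution keeps the size
        sizes.append(n)
        n = conv_out(n, 3, 2, 1)   # second separable convolution halves it
        sizes.append(n)
    n = conv_out(n, 3, 1, 1)       # trailing first separable convolution
    sizes.append(n)
    return sizes


sizes = trace_sizes(128)
```

Under these assumptions a 128 × 128 input leaves the stack as a 16 × 16 array, with each stride-2 layer preceded by a stride-1 layer that fuses features at the current resolution.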
S3: Process the first size array with a fully connected layer and a Softmax layer to determine the code class.
The Softmax layer performs classification or regression on the data features after feature fusion and outputs the classification or regression result.
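A fully connected layer followed by Softmax can be sketched as below. The weights, the two-element input, and the label names are hypothetical placeholders; in the described method the input would be the flattened (and optionally pooled) first size array.

```python
import math


def fully_connected(x, weights, bias):
    """Compute one logit per class: weights[i] . x + bias[i]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn logits into probabilities; subtract the max for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x, weights, bias, labels):
    """Return the most probable label together with all probabilities."""
    probs = softmax(fully_connected(x, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


label, probs = classify(
    [1.0, 2.0],
    [[1.0, 0.0], [0.0, 1.0]],   # hypothetical weights
    [0.0, 0.0],
    ["benign", "malicious"],    # hypothetical class names
)
```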
Preferably, before the first size array is processed with the fully connected layer and the Softmax layer, a pooling layer performs max pooling on the first size array, extracting the maximum value in each pooling region.
In the embodiment of the invention, one-dimensional max pooling extracts the maximum value of each pooling region in the first size array, and all the maxima are assembled into the pooled data packet.
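One-dimensional max pooling as described can be illustrated with a few lines of Python. The pool size and stride are assumptions, since the description does not give them; the function name is hypothetical.

```python
def max_pool_1d(values, pool_size, stride=None):
    """Keep the maximum of each pooling window.

    The stride defaults to the pool size, giving non-overlapping windows.
    """
    stride = stride or pool_size
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, stride)]


pooled = max_pool_1d([1, 3, 2, 5, 4, 0], pool_size=2)
overlapping = max_pool_1d([1, 3, 2, 5, 4, 0], pool_size=3, stride=1)
```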
Preferably, the depthwise convolution layer includes a depthwise convolution kernel followed in sequence by a BN layer and a ReLU layer; the output of the ReLU layer is the output of the depthwise convolution layer;
the pointwise convolution layer includes a pointwise convolution kernel followed in sequence by a BN layer and a ReLU layer; the output of the ReLU layer is the output of the pointwise convolution layer.
The BN layer performs batch normalization, which addresses the shifting distribution of intermediate-layer data during training of the separable convolution model, thereby preventing vanishing or exploding gradients and speeding up training.
The ReLU layer is an activation function that sets negative values in the convolution result to zero and leaves positive values unchanged.
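The BN and ReLU steps that follow each convolution kernel can be sketched as follows. The sketch normalizes a flat list of activations with batch statistics, as in training; the scale `gamma`, shift `beta`, and `eps` values are conventional defaults rather than values from the description.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def relu(values):
    """Set negative values to zero and keep positive values unchanged."""
    return [max(0.0, v) for v in values]


activated = relu(batch_norm([1.0, 2.0, 3.0]))
```

In a real network the statistics are computed per channel over the batch, and running averages are used at inference time.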
Preferably, the method further comprises:
constructing a separable convolution model from the first separable convolution layer, the second separable convolution layer, the fully connected layer, and the Softmax layer, wherein the first separable convolution layer performs the first separable convolution and the second separable convolution layer performs the second separable convolution; and training the separable convolution model with static feature arrays carrying code class labels. The network structure of the separable convolution model is shown in FIG. 3, that of the first separable convolution layer in FIG. 4, and that of the second separable convolution layer in FIG. 5.
Preferably, the method further comprises optimizing the parameters of the separable convolution model with a stochastic gradient descent algorithm.
Preferably, before the static feature array is alternately processed by the first separable convolution and the second separable convolution to obtain the first size array, the static feature array is preprocessed with convolution kernels of size 5 × 5 and stride 2.
In this case, the separable convolution model further includes a preprocessing layer that applies a plurality of convolution kernels of size 5 × 5 and stride 2 to the input array.
Preferably, the method further comprises testing the accuracy of the separable convolution model with test set data.
During training of the separable convolution model, a training feature array set with labels is divided into a training set, a verification set and a test set according to the proportion of 7.
The network parameters are optimized with a stochastic gradient descent algorithm, with the initial learning rate set to 0.001, the momentum to 0.9, and the learning rate decay to 0.000001. The model takes its input in batches, and each batch can be set to contain 32 static feature arrays. After every static feature array has been used for training once, the order of the static feature arrays is shuffled.
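The optimizer settings above can be illustrated with a small sketch of momentum SGD, batching, and per-epoch shuffling. The decay schedule lr = lr0 / (1 + decay × step) is an assumption (it is one common interpretation of a decay value of this magnitude), and all function names are hypothetical.

```python
import random


def decayed_lr(lr0, decay, step):
    """Assumed time-based schedule: lr0 / (1 + decay * step)."""
    return lr0 / (1.0 + decay * step)


def sgd_momentum_step(params, grads, velocity, lr, momentum=0.9):
    """Update params in place using classical momentum."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] - lr * g
        params[i] += velocity[i]


def iterate_batches(samples, batch_size=32, seed=None):
    """Yield one epoch of batches in a freshly shuffled order."""
    order = list(samples)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


# Two updates on a single parameter with a constant gradient of 2.0.
params, velocity = [1.0], [0.0]
for step in range(2):
    lr = decayed_lr(0.001, 0.0, step)   # decay disabled to keep numbers simple
    sgd_momentum_step(params, [2.0], velocity, lr)

batches = list(iterate_batches(range(70), batch_size=32, seed=0))
```

Calling `iterate_batches` once per epoch reshuffles the data each time, which matches the statement that the order is shuffled after every full pass.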
Further preferably, a malicious code detection method based on feature-code matching may be combined with the static-feature-based malicious code detection method provided in the embodiments of the present invention to improve detection accuracy. A static feature code consists of a data fragment and the location information of that fragment, where a data fragment is a distinctive piece of code or string information extracted from a malicious program through format analysis and code analysis; this information is abstracted into static feature codes that characterize the malware. The code to be detected is matched against the known static feature codes stored in a virus feature library to determine whether it contains malicious code. Used alone, feature-code matching can accurately find and remove a piece of malicious code once its static feature code has been determined, but it cannot effectively handle unknown malicious code for which no feature code has been predefined; that is, it lags behind new threats and cannot identify unknown malware. In addition, the features of malware are easily altered by encryption or obfuscation, so the extracted features must be updated frequently. When feature-code matching is combined with the separable convolution model designed in the embodiment of the invention, the model performs secondary detection on code that feature-code matching misses, thereby achieving rapid and accurate malicious code detection.
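The two-stage combination can be sketched as follows. This is a simplified illustration under several assumptions: signatures are plain byte fragments matched anywhere in the file (the location information mentioned above is ignored), and `model_predict` stands in for the trained separable convolution model; all names are hypothetical.

```python
def detect(code_bytes, signatures, model_predict):
    """Signature matching first, model-based secondary detection second.

    signatures: dict mapping a signature name to a byte fragment.
    model_predict: callable returning a class label for the code.
    """
    for name, fragment in signatures.items():
        if fragment in code_bytes:
            return "malicious", "signature:" + name
    # No known signature matched, so fall back to the learned model.
    return model_predict(code_bytes), "model"


def stub_model(code_bytes):
    """Placeholder classifier used only for this example."""
    return "malicious" if b"\xcc\xcc" in code_bytes else "benign"


signatures = {"demo-sig": b"\xde\xad\xbe\xef"}   # hypothetical signature
known = detect(b"\x00\xde\xad\xbe\xef\x00", signatures, stub_model)
unknown = detect(b"\x90\xcc\xcc\x90", signatures, stub_model)
clean = detect(b"\x90\x90", signatures, stub_model)
```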
The computer-readable code may take the form of software, data, images, or other computer-readable data.
Compared with the prior art, the embodiment of the invention provides a method for extracting a static feature array from computer-readable code. By counting how often various characters occur in the disassembled file, it supports comprehensive identification of a file's static features and extraction of its raw features. The separable convolution network designed in the embodiment of the invention has few parameters and trains quickly. Applying separable convolution to the static feature array separates its spatial information from its channel information, so the spatial correlation among features is captured better during training, which benefits the detection of malicious code. The method provided by the embodiment of the invention can supplement traditional malicious code detection: combined with a detection method based on feature-code matching, it performs secondary detection on code that the latter misses, thereby achieving rapid and accurate malicious code detection.
A computer-readable storage medium stores a static-feature-based malicious code detection program which, when executed by a processor, implements the steps of the static-feature-based malicious code detection method.
The above embodiments further illustrate the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments and do not limit the scope of the present invention; any modification, equivalent substitution, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (10)

1. A malicious code detection method based on static characteristics is characterized by comprising the following steps:
acquiring a code to be detected, and extracting a static characteristic array of the code to be detected;
alternately applying a first separable convolution and a second separable convolution to the static feature array to obtain a first size array, wherein the first separable convolution and the second separable convolution each comprise: processing the spatial information of the input array channel by channel using a depth-wise convolutional layer; and processing the channel information of the output of the depth-wise convolutional layer using a point-wise convolutional layer; wherein the convolution stride of the depth-wise convolutional layer in the first separable convolution is smaller than that of the depth-wise convolutional layer in the second separable convolution; and wherein the number of alternating periods of the first separable convolution and the second separable convolution is greater than or equal to 2;
and processing the first size array using a fully connected layer and a Softmax layer to determine the code class.
2. The method for detecting malicious codes based on static characteristics according to claim 1, further comprising, after extracting the static feature array of the code to be detected: normalizing the feature values in the static feature array.
3. The static-feature-based malicious code detection method according to claim 1, wherein before the first size array is processed by the fully connected layer and the Softmax layer, a pooling layer performs max pooling on the first size array to extract the maximum value in each pooling region.
4. The malicious code detection method based on the static characteristics of claim 1, wherein the depth-wise convolutional layer comprises a depth-wise convolutional kernel, and a BN layer and a ReLU layer which are sequentially connected after the depth-wise convolutional kernel; the output of the ReLU layer is used as the output result of the depth-wise convolutional layer;
the point-wise convolutional layer comprises a point-wise convolution kernel, and a BN layer and a ReLU layer which are sequentially connected after the point-wise convolution kernel; the output of the ReLU layer is used as the output result of the point-wise convolutional layer.
5. The method as claimed in claim 1, wherein acquiring the code to be detected and extracting the static feature array of the code to be detected comprises:
disassembling the code to be detected into assembly language to obtain a disassembled file; extracting the static feature array of the code to be detected from the disassembled file;
the values in the static feature array comprise punctuation frequency, register character frequency, opcode frequency, function call information and keyword frequency.
6. The static feature-based malicious code detection method according to claim 1, further comprising:
constructing a separable convolution model from a first separable convolution layer, a second separable convolution layer, the fully connected layer and the Softmax layer, wherein the first separable convolution layer performs the first separable convolution and the second separable convolution layer performs the second separable convolution; and training the separable convolution model using static feature arrays labeled with code classes.
7. The static feature-based malicious code detection method according to claim 6, further comprising: optimizing parameters of the separable convolution model using a stochastic gradient descent algorithm.
8. The method according to claim 1, wherein before the static feature array is subjected to the alternating processing of the first separable convolution and the second separable convolution to obtain the first size array, the static feature array is preprocessed by a convolution with a 5 x 5 kernel and a stride of 2.
9. The static feature-based malicious code detection method according to claim 6, further comprising: testing the accuracy of the separable convolution model using test set data.
10. A computer-readable storage medium storing a static-feature-based malicious code detection program which, when executed by a processor, implements the steps of the static-feature-based malicious code detection method according to any one of claims 1 to 9.
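The processing chain recited in claim 1 can be illustrated with a small pure-Python sketch: a depth-wise convolution filters each channel separately, a point-wise convolution mixes channels, strides alternate between a smaller and a larger value for two periods, and a fully connected layer with Softmax produces class probabilities. Everything concrete here is an assumption made for illustration: the 16 x 16 input, the 3 x 3 kernels, the absence of padding, the channel counts, the strides 1 and 2, the placeholder weights, and the two-class output. BN layers, pooling and training are omitted.

```python
import math


def depthwise_conv(x, kernels, stride):
    """Filter each channel with its own kernel (no padding), then apply ReLU.

    x is indexed [channel][row][col]; kernels is indexed [channel][a][b].
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        height, width = len(channel), len(channel[0])
        out.append([
            [max(0.0, sum(channel[i + a][j + b] * kernel[a][b]
                          for a in range(k) for b in range(k)))
             for j in range(0, width - k + 1, stride)]
            for i in range(0, height - k + 1, stride)
        ])
    return out


def pointwise_conv(x, weights):
    """1x1 convolution: mix channels at every position, then apply ReLU.

    weights is indexed [out_channel][in_channel].
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [[max(0.0, sum(w * x[c][i][j] for c, w in enumerate(row)))
          for j in range(width)]
         for i in range(height)]
        for row in weights
    ]


def separable_conv(x, out_channels, stride, k=3):
    """Depth-wise then point-wise convolution with fixed placeholder weights."""
    in_channels = len(x)
    kernels = [[[1.0 / (k * k)] * k for _ in range(k)] for _ in range(in_channels)]
    weights = [[1.0 / in_channels] * in_channels for _ in range(out_channels)]
    return pointwise_conv(depthwise_conv(x, kernels, stride), weights)


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 1-channel 16x16 static feature array with values in [0, 1].
features = [[[((i + j) % 5) / 4.0 for j in range(16)] for i in range(16)]]

# Two alternation periods: stride 1 (first) then stride 2 (second), twice.
for out_channels, stride in [(4, 1), (8, 2), (8, 1), (16, 2)]:
    features = separable_conv(features, out_channels, stride)

# Fully connected layer with placeholder weights, then Softmax over 2 classes.
flat = [v for channel in features for row in channel for v in row]
fc_weights = [[0.1] * len(flat), [-0.1] * len(flat)]
probs = softmax([sum(w * v for w, v in zip(row, flat)) for row in fc_weights])
print(len(features), len(features[0]), len(features[0][0]), probs)
```

With these assumed sizes the spatial extent shrinks from 16 to 14, 6, 4 and finally 1 per side while the channel count grows, which is the usual trade made when the larger-stride layer handles downsampling.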
CN202310049009.0A 2023-02-01 2023-02-01 Malicious code detection method based on static characteristics and storage medium Active CN115859290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310049009.0A CN115859290B (en) 2023-02-01 2023-02-01 Malicious code detection method based on static characteristics and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310049009.0A CN115859290B (en) 2023-02-01 2023-02-01 Malicious code detection method based on static characteristics and storage medium

Publications (2)

Publication Number Publication Date
CN115859290A true CN115859290A (en) 2023-03-28
CN115859290B CN115859290B (en) 2023-05-16

Family

ID=85657416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310049009.0A Active CN115859290B (en) 2023-02-01 2023-02-01 Malicious code detection method based on static characteristics and storage medium

Country Status (1)

Country Link
CN (1) CN115859290B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090059337A (en) * 2007-12-06 2009-06-11 전북대학교산학협력단 A new register allocation method for performance enhancement of embedded softwar
US20180285740A1 (en) * 2017-04-03 2018-10-04 Royal Bank Of Canada Systems and methods for malicious code detection
CN111651762A (en) * 2020-04-21 2020-09-11 浙江大学 Convolutional neural network-based PE (Portable Executable) malicious software detection method
CN112765606A (en) * 2021-01-19 2021-05-07 南京东巽信息技术有限公司 Malicious code homology analysis method, device and equipment
CN114611102A (en) * 2022-02-23 2022-06-10 西安电子科技大学 Visual malicious software detection and classification method and system, storage medium and terminal
CN114692156A (en) * 2022-05-31 2022-07-01 山东省计算中心(国家超级计算济南中心) Memory segment malicious code intrusion detection method, system, storage medium and equipment
CN115630364A (en) * 2022-09-30 2023-01-20 兴业银行股份有限公司 Android malicious software detection method and system based on multi-dimensional visual analysis

Also Published As

Publication number Publication date
CN115859290B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN109784056B (en) Malicious software detection method based on deep learning
US10114946B2 (en) Method and device for detecting malicious code in an intelligent terminal
Wong et al. Hunting for metamorphic engines
US9454658B2 (en) Malware detection using feature analysis
CN103761475B (en) Method and device for detecting malicious code in intelligent terminal
WO2015101097A1 (en) Method and device for feature extraction
CN109829306B (en) Malicious software classification method for optimizing feature extraction
Lu et al. De-obfuscation and detection of malicious PDF files with high accuracy
CN111639337B (en) Unknown malicious code detection method and system for massive Windows software
Sun et al. Malware family classification method based on static feature extraction
CN104933364B (en) A kind of malicious code based on the behavior of calling automates homologous determination method and system
CN110362995B (en) Malicious software detection and analysis system based on reverse direction and machine learning
CN108932430A (en) A kind of malware detection method based on software gene technology
US11056213B2 (en) Identifying signature snippets for nucleic acid sequence types
Tian et al. Fine-grained compiler identification with sequence-oriented neural modeling
CN114386511B (en) Malicious software family classification method based on multidimensional feature fusion and model integration
CN108229168B (en) Heuristic detection method, system and storage medium for nested files
CN115859290A (en) Malicious code detection method based on static characteristics and storage medium
CN110263540B (en) Code identification method and device
Chen et al. A learning-based static malware detection system with integrated feature
Chen et al. IHB: A scalable and efficient scheme to identify homologous binaries in IoT firmwares
CN113987486A (en) Malicious program detection method and device and electronic equipment
Wen et al. CNN based zero-day malware detection using small binary segments
Patel Similarity tests for metamorphic virus detection
Madani et al. Towards sequencing malicious system calls

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant