CN114692156A - Memory segment malicious code intrusion detection method, system, storage medium and equipment - Google Patents


Info

Publication number: CN114692156A
Authority: CN (China)
Prior art keywords: memory, segment, neural network, network model, file
Legal status: Granted
Application number: CN202210603899.0A
Other languages: Chinese (zh)
Other versions: CN114692156B (en)
Inventors: 张淑慧, 胡长栋, 王连海, 王金鹏, 匡瑞雪
Current assignee: Shandong Computer Science Center National Super Computing Center in Jinan
Original assignee: Shandong Computer Science Center National Super Computing Center in Jinan

Events:
  • Application filed by Shandong Computer Science Center National Super Computing Center in Jinan
  • Priority to CN202210603899.0A
  • Publication of CN114692156A
  • Application granted
  • Publication of CN114692156B
  • Legal status: Active
  • Anticipated expiration

Classifications

    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G06F21/562 Static detection
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Virology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention belongs to the technical field of computer malware detection and provides a method, system, storage medium and device for detecting intrusion of malicious code into memory segments. The method comprises: acquiring a memory file to be detected; sequentially performing binary conversion and word-segmentation preprocessing on the memory file, then intercepting a fragment according to the optimal combination of fragment position and length to obtain a prediction fragment; and inputting the prediction fragment into the optimal neural network model, which determines whether malicious code has been implanted in the memory file. The neural network model uses an embedding layer to raise the dimensionality of the input prediction fragment, applies convolutional layers with different kernel sizes followed by pooling, and finally passes the result through a flattening layer and a fully connected layer into the classifier. By learning the latent patterns and characteristics of malicious code, the method can detect both known malware and malware that has not yet been discovered.

Description

Memory segment malicious code intrusion detection method, system, storage medium and equipment
Technical Field
The invention belongs to the technical field of computer malware detection, and particularly relates to a method, system, storage medium and device for detecting intrusion of malicious code into memory segments.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the rapid development of computer and Internet technologies, the amount of malware has grown exponentially. Malicious programs appear in ever more variants, their anti-detection techniques are updated ever faster, and fileless malware attacks break through the detection of security protection systems, posing serious threats and challenges to enterprise security defenders. A fileless malware attack tricks a victim organization into executing code from memory without placing malicious files or file fragments on the computer's disk, thereby hiding the malware itself and the traces of its attack. However, such malware cannot completely erase its traces in memory. Memory analysis is therefore one of the best ways to systematically analyze programs that have unknown malicious characteristics and no available source code.
In addition, because of the memory paging and replacement mechanisms, most of the information in memory is incomplete: a program does not load all of its content into memory during execution, only part of it at first. A complete file therefore cannot be obtained, and it is difficult to determine with professional analysis methods whether the obtained file is a malicious program or file.
Existing antivirus software essentially compares a file against features stored in a virus signature library to judge whether it is malicious. This approach has clear advantages, namely high accuracy, convenience and a low false-alarm rate, but it has difficulty detecting the incomplete files found in memory, and it cannot detect newly created viruses.
Disclosure of Invention
To solve the technical problems described in the background, the invention provides a method, system, storage medium and device for detecting intrusion of malicious code into memory segments, which use a neural network model to learn the latent patterns and characteristics of malicious code and thereby detect both known viruses and viruses not yet discovered.
In order to achieve the purpose, the invention adopts the following technical scheme:
A first aspect of the present invention provides a method for detecting intrusion of malicious code into memory segments, comprising:
acquiring a memory file to be detected;
sequentially performing binary conversion and word-segmentation preprocessing on the memory file to be detected, and then intercepting a fragment based on the optimal combination of fragment position and length to obtain a prediction fragment;
inputting the prediction fragment into the optimal neural network model and detecting it to determine whether malicious code has been implanted in the memory file to be detected;
wherein the neural network model uses an embedding layer to raise the dimensionality of the input prediction fragment, applies convolutional layers with different kernel sizes followed by pooling, and finally feeds the result, transformed by a flattening layer and a fully connected layer, into a classifier.
Further, the word-segmentation preprocessing specifically comprises:
converting the binary file obtained by binary conversion into decimal to obtain a decimal file;
judging whether the decimal file reaches a preset length, and if not, adding 1 to every value in the decimal file and then padding it with 0.
Further, the combination of fragment position and length is one of:
taking data whose length is an integer multiple of 1024 from the head of the memory file as the prediction fragment;
taking data whose length is an integer multiple of 1024 from the tail of the memory file as the prediction fragment; or
selecting several non-contiguous sub-segments from the memory file and combining them to obtain the prediction fragment.
Further, the optimal neural network model and the optimal combination of fragment position and length are obtained as follows:
acquiring a malicious sample set and a benign sample set, and performing binary conversion and word-segmentation preprocessing on each memory file in both sets to obtain an initial training-test set;
intercepting fragments from each memory file in the initial training-test set according to each of several combinations of fragment position and length, to obtain several training-test sets;
training and testing the neural network model with each training-test set, taking the model with the highest accuracy as the optimal neural network model, and taking the combination of fragment position and length used to build its training-test set as the optimal combination.
Further, training and testing the neural network model with a training-test set comprises:
(a) dividing the training-test set into a training set and a test set;
(b) updating the weights of the neural network model on the training set and, after multiple iterations, outputting the model once the loss function reaches its minimum;
(c) classifying the samples in the test set with the output model; if the classification accuracy is below a threshold, returning to step (b) to continue training the model's parameters, until the output model classifies the test samples with an accuracy greater than or equal to the threshold.
A second aspect of the present invention provides a memory segment malicious code intrusion detection system, comprising:
a file acquisition module configured to acquire a memory file to be detected;
a fragment interception module configured to sequentially perform binary conversion and word-segmentation preprocessing on the memory file to be detected, and then intercept a fragment based on the optimal combination of fragment position and length to obtain a prediction fragment;
a prediction module configured to input the prediction fragment into the optimal neural network model and detect it to determine whether malicious code has been implanted in the memory file to be detected;
wherein the neural network model uses an embedding layer to raise the dimensionality of the input prediction fragment, applies convolutional layers with different kernel sizes followed by pooling, and finally feeds the result, transformed by a flattening layer and a fully connected layer, into the classifier.
Further, the combination of fragment position and length is one of:
taking data whose length is an integer multiple of 1024 from the head of the memory file as the prediction fragment;
taking data whose length is an integer multiple of 1024 from the tail of the memory file as the prediction fragment; or
selecting several non-contiguous sub-segments from the memory file and combining them to obtain the prediction fragment.
Further, the system includes a training module configured to:
acquire a malicious sample set and a benign sample set, and perform binary conversion and word-segmentation preprocessing on each memory file in both sets to obtain an initial training-test set;
intercept fragments from each memory file in the initial training-test set according to each of several combinations of fragment position and length, to obtain several training-test sets;
train and test the neural network model with each training-test set, take the model with the highest accuracy as the optimal neural network model, and take the combination of fragment position and length used to build its training-test set as the optimal combination.
A third aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the memory segment malicious code intrusion detection method described above.
A fourth aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the steps in the memory segment malicious code intrusion detection method described above.
Compared with the prior art, the invention has the following beneficial effects:
The invention uses a neural network model to learn the latent patterns and characteristics of malicious code, so that it detects not only existing viruses but also viruses not yet discovered.
The invention analyzes dynamic files recovered from memory. On the one hand, it can detect malicious code that cannot be detected in the static file and only manifests at run time; on the other hand, it helps memory forensics personnel preserve memory evidence more efficiently.
The invention is effective at detecting running memory files dumped to disk, and offers strong reference value and practicality for memory forensics personnel.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1 is a flowchart of obtaining the optimal neural network model according to the first embodiment of the present invention;
FIG. 2 is a schematic diagram of the binary-to-decimal conversion according to the first embodiment of the present invention;
FIG. 3 is a structural diagram of the neural network model according to the first embodiment of the present invention;
FIG. 4 is a structural diagram of the hidden layer of the neural network model according to the first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Interpretation of terms:
Convolutional neural networks (CNNs) are a class of feed-forward neural networks that involve convolution computations and have a deep structure.
Example one
This embodiment provides a method for detecting intrusion of malicious code into memory segments, which specifically comprises the following steps:
Step 1: acquire the memory file to be detected. The acquisition process can be divided into three stages: sandbox execution, dumping, and extraction.
In the sandbox stage, the untrusted program is run in a virtual machine so that real equipment cannot be damaged. In the dumping stage, the memory of the virtual machine is dumped in the form of a snapshot. In the extraction stage, a forensics tool is used to extract the memory file to be detected from the dumped memory; the memory file to be detected is an executable file (.exe), a dynamic-link library file (.dll) or a system file (.sys).
Step 2: sequentially perform binary conversion and word-segmentation preprocessing on the memory file to be detected, then intercept a fragment based on the optimal combination of fragment position and length (size) to obtain the prediction fragment.
Step 3: input the prediction fragment into the optimal neural network model and detect it to determine whether malicious code has been implanted in the memory file to be detected. If the model classifies the file as intruded by malicious code, it is concluded that malicious code has been implanted in it; otherwise, the file is judged to be free of implanted malicious code. In this way, malicious code that escapes detection in the static file but is revealed in the dynamic (in-memory) file can be found.
Specifically, as shown in FIG. 1, the optimal neural network model and the optimal combination of fragment position and length are obtained as follows:
(1) Acquire a malicious sample set and a benign sample set, perform binary conversion on each memory file in both sets to obtain a binary file data set, and perform word-segmentation preprocessing on each binary file in that data set to obtain the initial training-test set.
In this embodiment, each sample (i.e., each memory file) in the malicious sample set is labeled as intruded by malicious code, and each sample in the benign sample set is labeled as free of implanted malicious code. The two sample sets are obtained as follows. For the malicious sample set, 600 static malicious samples are downloaded from the VirusShare website (https://virusshare.com/) in this embodiment; the untrusted programs are run in a virtual machine, a memory image is acquired, and the malicious sample set is extracted with a memory forensics tool. For the benign sample set, a memory image is acquired from a Windows system and the benign samples, 300 in this embodiment, are extracted with a memory forensics tool.
The word-segmentation preprocessing specifically comprises: converting the binary file obtained by binary conversion into decimal to obtain a decimal file whose values lie in the range 0-255. This makes the memory file resemble an image, whose pixel values also lie in 0-255, so the binary file can be treated as a grayscale image, which suits the neural network model better. To keep all memory files fed into the model the same length, the method checks whether the decimal file reaches the preset length; if not, 1 is added to every value in the decimal file and the file is then padded with 0. Adding 1 to every value keeps the zeros in the original data meaningful, i.e., distinguishable from the padding zeros, as shown in FIG. 2.
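A minimal sketch of the word-segmentation preprocessing in Python, assuming the memory file has already been read as raw bytes (the "binary conversion" step). It follows the text literally: the +1 shift and zero padding are applied only when the file is shorter than the preset length. The function name `preprocess` and the parameter `preset_len` are illustrative, not from the source.

```python
def preprocess(raw: bytes, preset_len: int) -> list[int]:
    """Word-segmentation preprocessing sketch: bytes -> decimal values -> shift/pad."""
    values = list(raw)  # iterating over bytes yields decimal ints in 0-255
    if len(values) < preset_len:
        # Shift every value by +1 so original 0x00 bytes stay distinct from padding,
        # then right-pad with 0 up to the preset length.
        values = [v + 1 for v in values]
        values += [0] * (preset_len - len(values))
    return values
```

For example, `preprocess(b"\x00\x01\xff", 5)` gives `[1, 2, 256, 0, 0]`. After the shift the values span 1-256 (plus 0 for padding), so an embedding layer fed with them would need 257 distinct indices.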
(2) Intercept fragments from each memory file in the initial training-test set according to each of several combinations of fragment position and length, to obtain several training-test sets. Each training-test set is the result of intercepting the initial training-test set with one combination of fragment position and length.
The combinations of fragment position and length are: taking data whose length is an integer multiple of 1024 (e.g., 1024 or 2048 bytes) from the head of the memory file as the prediction fragment; taking data whose length is an integer multiple of 1024 from the tail of the memory file as the prediction fragment; or selecting several non-contiguous sub-segments from the memory file and combining them into the prediction fragment. When combining, the sub-segments may be concatenated in their original order in the memory file or after their order has been shuffled.
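The three fragment-selection strategies reduce to list slicing and concatenation. This is an illustrative Python sketch; the helper names and the optional `seed` argument are assumptions, and how the sub-segments themselves are chosen is left to the caller.

```python
import random
from typing import List, Optional, Sequence


def head_fragment(values: Sequence[int], length: int) -> List[int]:
    """Prediction fragment from the head of the file (length e.g. 1024 or 2048)."""
    return list(values[:length])


def tail_fragment(values: Sequence[int], length: int) -> List[int]:
    """Prediction fragment from the tail of the file."""
    return list(values[-length:]) if length > 0 else []


def combine_subsegments(subsegments: Sequence[Sequence[int]], shuffle: bool = False,
                        seed: Optional[int] = None) -> List[int]:
    """Concatenate sub-segments in file order, or after shuffling their order."""
    parts = list(subsegments)
    if shuffle:
        random.Random(seed).shuffle(parts)
    return [v for part in parts for v in part]
```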
The length and number of the sub-segments are calculated as follows:
data_len = file_len / k;
if train_len < data_len: data_len = 256;
NN = train_len / data_len;
where k is a parameter set according to the average sample length (e.g., 60); file_len is the sample length; data_len is the length of the data taken at each position, i.e., the length of one sub-segment; train_len is the preset length of the prediction fragment; and NN is the number of sub-segments.
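A sketch of the sub-segment formulas, assuming integer division (the text does not say how fractional results are rounded) and, hypothetically, that the NN sub-segments start at evenly spaced offsets, since the text does not specify where they are taken. The `max(..., 1)` guard against very short files is also an addition of this sketch.

```python
from typing import List, Sequence, Tuple


def subsegment_plan(file_len: int, train_len: int, k: int = 60) -> Tuple[int, int]:
    """Return (data_len, NN) following the formulas above, using integer division."""
    data_len = max(file_len // k, 1)   # guard for files shorter than k (assumption)
    if train_len < data_len:
        data_len = 256
    nn = train_len // data_len         # number of sub-segments
    return data_len, nn


def pick_subsegments(values: Sequence[int], data_len: int, nn: int) -> List[List[int]]:
    """Hypothetical placement: NN sub-segments starting at evenly spaced offsets.

    The pieces are non-contiguous whenever the spacing exceeds data_len.
    """
    if nn <= 0:
        return []
    stride = len(values) // nn
    return [list(values[i * stride:i * stride + data_len]) for i in range(nn)]
```

For a sample of 120,000 values with train_len = 1024 and k = 60, data_len = 2000 exceeds train_len, so data_len becomes 256 and NN = 4; the four 256-value sub-segments start at offsets 0, 30,000, 60,000 and 90,000 and together form a 1024-value prediction fragment.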
(3) Train and test the neural network model with each training-test set, take the model with the highest accuracy as the optimal neural network model, and take the combination of fragment position and length used to build that training-test set as the optimal combination.
Training and testing the neural network model with a training-test set comprises:
(a) dividing the training-test set into a training set and a test set at a ratio of 2:1;
(b) updating the weights of the neural network model on the training set and, after multiple iterations, outputting the model once the loss function reaches its minimum and the training result is stable;
(c) testing the output model on the test set, i.e., classifying the test samples with it; if the classification accuracy is below the threshold of 80%, returning to step (b) to continue tuning and training the model, until the output model classifies the test samples with an accuracy of at least 80%.
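The training protocol (2:1 split, retraining until test accuracy reaches 80%, then keeping the best-scoring fragment combination) can be sketched with the model abstracted behind two callbacks. This is a hedged Python sketch: `train_step` and `evaluate` are hypothetical hooks standing in for the actual CNN training and test-set classification, and the shuffle before splitting and the `max_rounds` cap are assumptions not stated in the text.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def split_2_to_1(samples: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Shuffle (assumption), then split a training-test set 2:1 into train/test."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = len(items) * 2 // 3
    return items[:cut], items[cut:]


def train_until_threshold(train_step: Callable[[], None], evaluate: Callable[[], float],
                          threshold: float = 0.8, max_rounds: int = 100) -> Tuple[float, int]:
    """Repeat step (b) then step (c) until test accuracy >= threshold.

    max_rounds is a safety cap added for this sketch; the text specifies none.
    """
    acc, rounds = 0.0, 0
    while rounds < max_rounds:
        train_step()      # step (b): update weights on the training set
        acc = evaluate()  # step (c): classify the test set, return accuracy
        rounds += 1
        if acc >= threshold:
            break
    return acc, rounds


def select_best(accuracy_by_combination: Dict[str, float]) -> str:
    """Pick the fragment position/length combination whose model scored highest."""
    return max(accuracy_by_combination, key=accuracy_by_combination.get)
```

With the 600 malicious and 300 benign samples of this embodiment, `split_2_to_1` yields 600 training and 300 test samples; the combination keys such as `"head-1024"` would be arbitrary labels chosen by the caller.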
Specifically, the neural network model is a CNN model. As shown in FIG. 3, it includes an input layer, a hidden layer and an output layer. The number of neurons in the input layer equals the number of values in the input prediction fragment; in this embodiment it is 256, each neuron x_i represents one value of the prediction fragment, and the input layer has 1 channel. The output layer has 2 neurons, representing the two categories: intruded by malicious code, or free of implanted malicious code. The model uses an embedding layer to raise the dimensionality of the input prediction fragment, applies convolutional layers with different kernel sizes followed by pooling, and finally feeds the result, transformed by a flattening layer and a fully connected layer, into the classifier. Specifically, as shown in FIG. 4, the hidden layer has 12 layers connected in sequence: an embedding layer, a first one-dimensional convolutional layer, a first pooling layer, a second one-dimensional convolutional layer, a first Dropout layer, a third one-dimensional convolutional layer, a second pooling layer, a second Dropout layer, a third pooling layer, a flattening (flatten) layer, a fully connected layer and a softmax layer. Unlike typical natural language processing classification tasks, which use large convolution kernels (generally larger than 10), the convolutional part of this model uses smaller kernels and more convolutional layers, in order to capture more data features.
Although this increases the number of convolution computations, the training time does not increase much, because memory data are sparse: at least one quarter of the values are 0, which simplifies the convolution computation.
The embedding layer raises the dimensionality of each input-layer value, converting it into a 100-dimensional vector so that differences between values can be learned better. The 100-dimensional embedded data then undergo a second concatenation step and serve as the input of the first one-dimensional convolutional layer, which has 200 input channels.
For a given receptive field, stacked small convolution kernels are preferable to a single large kernel, because the multiple non-linear layers increase the depth of the network. Training therefore uses several one-dimensional convolutional layers with small kernels: the kernel sizes of the first, second and third one-dimensional convolutional layers are 3, 4 and 5 respectively, and all three have a stride of 1. The first layer has 200 input channels and 100 output channels, the second 100 input and 50 output channels, and the third 50 input and 25 output channels.
The output characteristics of the first, second, and third one-dimensional convolutional layers may all be expressed as:
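$$\mathrm{out}(N_i, C_{\mathrm{out}_j}) = \mathrm{bias}(C_{\mathrm{out}_j}) + \sum_{k=0}^{C_{\mathrm{in}}-1} \mathrm{weight}(C_{\mathrm{out}_j}, k) \star \mathrm{input}(N_i, k)$$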
where N is the batch size, C is the number of channels, L is the sequence length, and bias is the bias value of the neural network (the bias defaults to 1). A batch is a group of samples processed together: all samples of a data set (a training set, a test set, or, in use, a single memory file to be detected) are divided into several groups, and the number of samples in each group is the batch size. N_i denotes the i-th sample of a group, C_{out_j} denotes the j-th output channel, k is the index of the input channel, C_in is the total number of input channels, weight denotes the weights of the first, second or third one-dimensional convolutional layer, input denotes the input features of that layer, and ⋆ denotes the cross-correlation (convolution) operator.
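To make the convolution formula concrete, the following sketch evaluates it directly for a single sample (the batch index N_i is dropped), assuming no padding, no dilation, and that ⋆ denotes cross-correlation as in common deep-learning frameworks. It illustrates the formula only and is not the model's actual implementation.

```python
from typing import List


def conv1d(x: List[List[float]], weight: List[List[List[float]]],
           bias: List[float], stride: int = 1) -> List[List[float]]:
    """Direct evaluation of out(C_out_j) = bias(C_out_j) + sum_k weight(C_out_j, k) * input(k)
    for one sample, where * is a valid (no padding, no dilation) cross-correlation.

    x: [C_in][L_in], weight: [C_out][C_in][kernel_size], bias: [C_out].
    """
    c_in, l_in = len(x), len(x[0])
    kernel_size = len(weight[0][0])
    l_out = (l_in - kernel_size) // stride + 1
    out = []
    for j, w_j in enumerate(weight):              # j-th output channel
        row = []
        for t in range(l_out):                    # output position
            s = bias[j]
            for k in range(c_in):                 # sum over input channels k
                for m in range(kernel_size):      # slide the kernel (no flip)
                    s += w_j[k][m] * x[k][t * stride + m]
            row.append(s)
        out.append(row)
    return out
```

For instance, one input channel `[1, 2, 3, 4]`, one kernel `[1, 0, -1]` and bias 1 give the single output channel `[-1, -1]`.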
The lengths of the output sequences of the first, second and third one-dimensional convolutional layers are all calculated by:
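$$L_{\mathrm{out}} = \left\lfloor \frac{L_{\mathrm{in}} + 2 \times \mathrm{padding} - \mathrm{dilation} \times (\mathrm{kernel\_size} - 1) - 1}{\mathrm{stride}} + 1 \right\rfloor$$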
where L_out is the length of the output sequence, L_in is the length of the input sequence, padding is the padding length, dilation is the dilation (hole) size of the convolution, chosen here so that no holes are inserted (i.e., dilation = 1 in the formula above), kernel_size is the size of the convolution kernel, and stride is the step size.
The first and second Dropout layers help prevent overfitting to the training data without reducing training accuracy. The dropout rate (the proportion of randomly hidden neuron nodes) of both layers is set to 0.5. Dropout is a regularization technique based on random hiding: a Dropout layer randomly hides neuron nodes, by default by setting them to zero at random, so that the network effectively sees new data, learns again and updates its weights, which suppresses overfitting during training.
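A sketch of a Dropout layer with rate 0.5. The text only says that hidden nodes are zeroed at random; this sketch additionally rescales the surviving values by 1/(1-p) (inverted dropout, the convention used by e.g. PyTorch's `nn.Dropout`), which is an assumption about the implementation.

```python
import random
from typing import List, Optional, Sequence


def dropout(values: Sequence[float], p: float = 0.5, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)               # at inference time dropout is the identity
    if p >= 1.0:
        return [0.0] * len(values)        # everything dropped
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]
```

With p = 0.5, each value either becomes 0.0 or is doubled, so the expected value of each output equals its input.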
Because memory data are sparse, average pooling would weaken the extracted features; the first, second and third pooling layers therefore use max pooling (maxpool), with which the model performs better than with average pooling. The size of each of the three pooling layers is set to 4, a choice determined by the sparsity of the binary data in the dynamic memory files.
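The effect of the pooling choice on sparse data can be illustrated with a toy sequence (values chosen arbitrarily): with a window of 4, max pooling keeps each isolated non-zero value, whereas average pooling dilutes it by the surrounding zeros. This is an illustrative sketch, not the patent's implementation.

```python
from typing import List


def pool1d(x: List[float], size: int = 4, mode: str = "max") -> List[float]:
    """Non-overlapping 1-D pooling (stride == size); a trailing partial window is dropped."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    out = []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        out.append(max(window) if mode == "max" else sum(window) / size)
    return out


sparse = [0, 0, 0, 8, 0, 0, 0, 0, 0, 5, 0, 0]
print(pool1d(sparse, mode="max"))   # [8, 0, 5]
print(pool1d(sparse, mode="avg"))   # [2.0, 0.0, 1.25]
```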
The features output by the third pooling layer are flattened by the flattening layer, whose input size is obtained from the output sequence length of the third one-dimensional convolutional layer.
After flattening, a fully connected layer maps the sequence to two neuron nodes, and a softmax layer performs the final classification.
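To see how the flattening layer's input size follows from the sequence lengths, the stages can be traced with a small helper. The segment length 1024, the kernel sizes 3, 5 and 7, and the 64 output channels per convolution are hypothetical values chosen for illustration (the source states only that the kernel sizes differ and that the pooling size is 4); the sketch also assumes stride 1, no padding, and non-overlapping pooling. The fully connected layer would then map the flattened features to the two output nodes.

```python
from typing import List, Sequence, Tuple


def out_len(l_in: int, kernel_size: int, stride: int = 1) -> int:
    # No padding, no dilation (PyTorch convention: dilation=1)
    return (l_in - (kernel_size - 1) - 1) // stride + 1


def trace_lengths(seq_len: int, kernel_sizes: Sequence[int], channels: int,
                  pool_size: int = 4) -> Tuple[List[int], int]:
    """Sequence length after each conv/pool stage, plus the flattened feature count."""
    lengths = []
    length = seq_len
    for k in kernel_sizes:
        length = out_len(length, k)                            # Conv1d, stride 1
        lengths.append(length)
        length = out_len(length, pool_size, stride=pool_size)  # MaxPool1d, stride = size
        lengths.append(length)
    return lengths, channels * length


lengths, flat = trace_lengths(1024, kernel_sizes=(3, 5, 7), channels=64)
print(lengths)  # [1022, 255, 251, 62, 56, 14]
print(flat)     # 896
```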
The final softmax layer, i.e., the classifier, applies the normalized exponential function:
$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)}$$
where exp(x) denotes the exponential function e^x (e is Napier's constant, approximately 2.718), n is the number of neurons in the output layer, y_k is the output of the k-th neuron of the output layer, and a_k is the input of the k-th neuron of the output layer. The numerator is the exponential of the k-th neuron's input signal a_k, and the denominator is the sum of the exponentials of all input signals.
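A sketch of the softmax in plain Python. Subtracting the largest input before exponentiating is a standard numerical-stability step that the source does not mention; it multiplies numerator and denominator by the same factor and so does not change the result.

```python
import math
from typing import List


def softmax(a: List[float]) -> List[float]:
    """y_k = exp(a_k) / sum_i exp(a_i), computed stably."""
    m = max(a)
    # Subtracting max(a) multiplies numerator and denominator by exp(-m),
    # so the result is unchanged but exp() cannot overflow.
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([0.0, 0.0]))        # [0.5, 0.5]
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5], no overflow
```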
The optimizer of the neural network model is the one found, through validation on a large amount of data, to be best suited to the model.
The loss function of the neural network model is the mini-batch cross-entropy loss function:
$$E = -\frac{1}{M}\sum_{m=1}^{M}\sum_{k} t_{mk}\log y_{mk}$$
where M is the number of prediction segments in the training set, t_mk is the value of the k-th element (data) of the m-th prediction segment, i.e., the supervision data, and y_mk is the output of the neural network for the k-th element (data) of the m-th prediction segment. Extending the loss function of a single sample to M samples and then dividing by M yields the average loss per prediction segment. This averaging gives a uniform measure that is independent of the amount of training data: whether there are 1000 or 10000 training samples (prediction segments in the training set), the average loss of a single prediction segment can be obtained.
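The averaging described above can be sketched as follows, assuming one-hot supervision data and softmax outputs for the two classes. The function name and the small `eps` constant (a guard against log(0)) are assumptions for illustration.

```python
import math
from typing import Sequence


def mini_batch_cross_entropy(y: Sequence[Sequence[float]],
                             t: Sequence[Sequence[float]],
                             eps: float = 1e-12) -> float:
    """E = -(1/M) * sum_m sum_k t_mk * log(y_mk) over M prediction segments.

    y: network outputs (e.g. softmax probabilities), one row per segment
    t: supervision data (one-hot labels), same shape as y
    eps guards against log(0); it is an assumption, not part of the source formula.
    """
    M = len(y)
    total = 0.0
    for y_m, t_m in zip(y, t):
        total += sum(t_mk * math.log(y_mk + eps) for y_mk, t_mk in zip(y_m, t_m))
    return -total / M
```

Because the sum is divided by M, a batch of 1000 segments and a batch of 10000 segments yield losses on the same per-segment scale.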
Based on the above analysis, the invention uses machine learning to perform binary classification of files extracted from memory; detecting such files with this method makes it possible to effectively determine whether malicious code has been implanted in memory.
Example two
This embodiment provides a memory segment malicious code intrusion detection system, which comprises the following modules:
a file acquisition module configured to acquire a memory file to be detected;
a segment interception module configured to perform binary conversion and word segmentation preprocessing on the memory file to be detected in sequence, and then perform segment interception based on the optimal segment position and length combination to obtain a prediction segment;
a prediction module configured to input the prediction segment into an optimal neural network model and detect it to obtain a result indicating whether malicious code has been implanted in the memory file to be detected;
wherein the neural network model uses an embedding layer to increase the dimensionality of the input prediction segment, performs pooling after convolution through convolutional layers with different convolution kernel sizes, and finally, after transformation by a flattening layer and a fully connected layer, feeds the result into the classifier.
The segment position and length combination is one of the following:
taking data whose length is an integer multiple of 1024 from the head of the memory file as the prediction segment;
or taking data whose length is an integer multiple of 1024 from the tail of the memory file as the prediction segment;
or selecting several non-contiguous sub-segments from the memory file and combining them to obtain the prediction segment.
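The three segment position and length combinations can be sketched as slicing operations on the preprocessed sequence. The parameter `n_blocks` (the multiple of 1024) and the `(start, length)` spans of the non-contiguous sub-segments are hypothetical inputs; the source does not specify how the sub-segments are chosen.

```python
from typing import List, Sequence, Tuple

BLOCK = 1024  # segment lengths are integer multiples of this value


def head_segment(data: Sequence[int], n_blocks: int) -> List[int]:
    """First n_blocks * 1024 values of the preprocessed memory file."""
    return list(data[:n_blocks * BLOCK])


def tail_segment(data: Sequence[int], n_blocks: int) -> List[int]:
    """Last n_blocks * 1024 values of the preprocessed memory file."""
    length = n_blocks * BLOCK
    return list(data[-length:]) if length > 0 else []  # data[-0:] would return everything


def combined_segment(data: Sequence[int], spans: Sequence[Tuple[int, int]]) -> List[int]:
    """Concatenate non-contiguous (start, length) sub-segments in the given order."""
    out: List[int] = []
    for start, length in spans:
        out.extend(data[start:start + length])
    return out
```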
A training module configured to:
acquire a malicious sample set and a benign sample set, and perform binary conversion and word segmentation preprocessing on each memory file in both sets to obtain an initial training-test set;
perform segment interception on each memory file in the initial training-test set based on a plurality of segment position and length combinations to obtain a plurality of training-test sets;
train and test the neural network model with each training-test set respectively, take the neural network model with the highest accuracy as the optimal neural network model, and take the segment position and length combination used to intercept the segments of that training-test set as the optimal segment position and length combination.
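The selection rule amounts to a search over candidate combinations that keeps the one whose model reaches the highest test accuracy. In this sketch the `train_and_test` callable and the combination labels are hypothetical placeholders.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def select_optimal(combos: Iterable[Any],
                   train_and_test: Callable[[Any], Tuple[Any, float]]
                   ) -> Tuple[Optional[Any], Optional[Any], float]:
    """Return (best combination, its model, its test accuracy).

    train_and_test(combo) is a hypothetical callable that builds the training-test
    set for one segment position/length combination, trains and tests the model,
    and returns (model, accuracy). Ties keep the first combination seen.
    """
    best_combo, best_model, best_acc = None, None, float("-inf")
    for combo in combos:
        model, acc = train_and_test(combo)
        if acc > best_acc:
            best_combo, best_model, best_acc = combo, model, acc
    return best_combo, best_model, best_acc
```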
It should be noted that the modules in this embodiment correspond one-to-one to the steps of the first embodiment and are implemented in the same way, so the details are not repeated here.
Example three
This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the memory segment malicious code intrusion detection method of the first embodiment.
Example four
This embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the memory segment malicious code intrusion detection method of the first embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A memory segment malicious code intrusion detection method, characterized by comprising the following steps:
acquiring a memory file to be detected;
performing binary conversion and word segmentation preprocessing on the memory file to be detected in sequence, and then performing segment interception based on the optimal segment position and length combination to obtain a prediction segment;
inputting the prediction segment into an optimal neural network model and detecting it to obtain a result indicating whether malicious code has been implanted in the memory file to be detected;
wherein the neural network model uses an embedding layer to increase the dimensionality of the input prediction segment, performs pooling after convolution through convolutional layers with different convolution kernel sizes, and finally, after transformation by a flattening layer and a fully connected layer, feeds the result into a classifier.
2. The memory segment malicious code intrusion detection method according to claim 1, wherein the word segmentation preprocessing specifically comprises:
converting the binary file obtained by binary conversion into decimal to obtain a decimal file;
judging whether the decimal file reaches a preset length, and if not, adding 1 to all data in the decimal file and then padding it with 0s.
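A minimal sketch of the preprocessing in claim 2, assuming each byte of the binary file is one decimal value (0 to 255); the byte granularity and the function name are assumptions. Following the claim's wording, the +1 shift is applied when padding is needed, which leaves 0 free to mark padding; applying the shift to every file would be a variant that keeps the value range uniform.

```python
from typing import List


def preprocess(raw: bytes, preset_len: int) -> List[int]:
    """Binary file -> decimal values -> shifted and zero-padded to preset_len.

    Assumes byte granularity: each byte becomes one decimal value in 0..255.
    """
    values = list(raw)                        # bytes -> decimal integers 0..255
    if len(values) < preset_len:              # file does not reach the preset length
        values = [v + 1 for v in values]      # shift whole file to 1..256, freeing 0
        values += [0] * (preset_len - len(values))  # pad with 0 up to preset_len
    return values
```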
3. The memory segment malicious code intrusion detection method according to claim 1, wherein the segment position and length combination is:
taking data whose length is an integer multiple of 1024 from the head of the memory file as the prediction segment;
or taking data whose length is an integer multiple of 1024 from the tail of the memory file as the prediction segment;
or selecting several non-contiguous sub-segments from the memory file and combining them to obtain the prediction segment.
4. The memory segment malicious code intrusion detection method according to claim 1, wherein the optimal neural network model and the optimal segment position and length combination are obtained by the following steps:
acquiring a malicious sample set and a benign sample set, and performing binary conversion and word segmentation preprocessing on each memory file in both sets to obtain an initial training-test set;
performing segment interception on each memory file in the initial training-test set based on a plurality of segment position and length combinations to obtain a plurality of training-test sets;
training and testing the neural network model with each training-test set respectively, taking the neural network model with the highest accuracy as the optimal neural network model, and taking the segment position and length combination used to intercept the segments of that training-test set as the optimal segment position and length combination.
5. The memory segment malicious code intrusion detection method according to claim 4, wherein training and testing the neural network model with a training-test set comprises the following steps:
(a) dividing the training-test set into a training set and a test set;
(b) updating the weights of the neural network model based on the training set, and outputting the neural network model after a number of iterations once the loss function reaches its minimum;
(c) classifying the samples in the test set with the output neural network model and, when the classification accuracy is below a threshold, returning to step (b) to continue training the parameters of the neural network model, until the classification accuracy of the output neural network model on the test set is greater than or equal to the threshold.
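Steps (b) and (c) form a loop that ends only when the test accuracy reaches the threshold. The sketch below assumes hypothetical `train_step` and `evaluate` callables and adds a `max_rounds` cap, which the claim does not mention, so that the loop cannot run forever.

```python
from typing import Any, Callable, Tuple


def train_until_threshold(train_step: Callable[[], Any],
                          evaluate: Callable[[Any], float],
                          threshold: float,
                          max_rounds: int = 100) -> Tuple[Any, float, int]:
    """Repeat steps (b) and (c) until test accuracy >= threshold.

    train_step: hypothetical callable that continues training on the training set
                until the loss stops decreasing and returns the model (step b)
    evaluate:   hypothetical callable returning accuracy on the test set (step c)
    max_rounds: safety cap, an assumption not present in the claim
    """
    for round_idx in range(1, max_rounds + 1):
        model = train_step()
        acc = evaluate(model)
        if acc >= threshold:
            return model, acc, round_idx
    raise RuntimeError(f"accuracy threshold {threshold} not reached in {max_rounds} rounds")
```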
6. A memory segment malicious code intrusion detection system, characterized by comprising:
a file acquisition module configured to acquire a memory file to be detected;
a segment interception module configured to perform binary conversion and word segmentation preprocessing on the memory file to be detected in sequence, and then perform segment interception based on the optimal segment position and length combination to obtain a prediction segment;
a prediction module configured to input the prediction segment into an optimal neural network model and detect it to obtain a result indicating whether malicious code has been implanted in the memory file to be detected;
wherein the neural network model uses an embedding layer to increase the dimensionality of the input prediction segment, performs pooling after convolution through convolutional layers with different convolution kernel sizes, and finally, after transformation by a flattening layer and a fully connected layer, feeds the result into a classifier.
7. The memory segment malicious code intrusion detection system according to claim 6, wherein the segment position and length combination is:
taking data whose length is an integer multiple of 1024 from the head of the memory file as the prediction segment;
or taking data whose length is an integer multiple of 1024 from the tail of the memory file as the prediction segment;
or selecting several non-contiguous sub-segments from the memory file and combining them to obtain the prediction segment.
8. The memory segment malicious code intrusion detection system according to claim 6, further comprising a training module configured to:
acquire a malicious sample set and a benign sample set, and perform binary conversion and word segmentation preprocessing on each memory file in both sets to obtain an initial training-test set;
perform segment interception on each memory file in the initial training-test set based on a plurality of segment position and length combinations to obtain a plurality of training-test sets;
train and test the neural network model with each training-test set respectively, take the neural network model with the highest accuracy as the optimal neural network model, and take the segment position and length combination used to intercept the segments of that training-test set as the optimal segment position and length combination.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the memory segment malicious code intrusion detection method according to any one of claims 1 to 5.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps in the memory segment malicious code intrusion detection method according to any one of claims 1 to 5 when executing the program.
CN202210603899.0A 2022-05-31 2022-05-31 Memory segment malicious code intrusion detection method, system, storage medium and equipment Active CN114692156B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210603899.0A CN114692156B (en) 2022-05-31 2022-05-31 Memory segment malicious code intrusion detection method, system, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN114692156A true CN114692156A (en) 2022-07-01
CN114692156B CN114692156B (en) 2022-08-30

Family

ID=82131254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210603899.0A Active CN114692156B (en) 2022-05-31 2022-05-31 Memory segment malicious code intrusion detection method, system, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN114692156B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115859290A (en) * 2023-02-01 2023-03-28 Unit 61660 of the Chinese People's Liberation Army Malicious code detection method based on static characteristics and storage medium
CN116861420A (en) * 2023-05-26 2023-10-10 Guangzhou Tianmao Information System Co., Ltd. Malicious software detection system and method based on memory characteristics
WO2024051196A1 (en) * 2022-09-09 2024-03-14 Shanghai Paila Software Co., Ltd. Malicious code detection method and apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3227820A1 (en) * 2014-12-05 2017-10-11 Permissionbit Methods and systems for encoding computer processes for malware deteection
CN110704840A (en) * 2019-09-10 2020-01-17 People's Public Security University of China Convolutional neural network CNN-based malicious software detection method
CN111382438A (en) * 2020-03-27 2020-07-07 Yuxi Normal University Malicious software detection method based on multi-scale convolutional neural network
CN111881447A (en) * 2020-06-28 2020-11-03 PLA Strategic Support Force Information Engineering University Intelligent evidence obtaining method and system for malicious code fragments
CN113420294A (en) * 2021-06-25 2021-09-21 Hangzhou Dianzi University Malicious code detection method based on multi-scale convolutional neural network

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024051196A1 (en) * 2022-09-09 2024-03-14 Shanghai Paila Software Co., Ltd. Malicious code detection method and apparatus, electronic device, and storage medium
CN115859290A (en) * 2023-02-01 2023-03-28 Unit 61660 of the Chinese People's Liberation Army Malicious code detection method based on static characteristics and storage medium
CN115859290B (en) * 2023-02-01 2023-05-16 Unit 61660 of the Chinese People's Liberation Army Malicious code detection method based on static characteristics and storage medium
CN116861420A (en) * 2023-05-26 2023-10-10 Guangzhou Tianmao Information System Co., Ltd. Malicious software detection system and method based on memory characteristics
CN116861420B (en) * 2023-05-26 2024-05-28 Guangzhou Tianmao Information System Co., Ltd. Malicious software detection system and method based on memory characteristics

Also Published As

Publication number Publication date
CN114692156B (en) 2022-08-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant