CN114692156B - Memory segment malicious code intrusion detection method, system, storage medium and equipment - Google Patents

Info

Publication number
CN114692156B
Authority
CN
China
Prior art keywords
layer
memory
file
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210603899.0A
Other languages
Chinese (zh)
Other versions
CN114692156A (en)
Inventor
张淑慧
胡长栋
王连海
王金鹏
匡瑞雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Shandong Computer Science Center National Super Computing Center in Jinan
Priority to CN202210603899.0A priority Critical patent/CN114692156B/en
Publication of CN114692156A publication Critical patent/CN114692156A/en
Application granted granted Critical
Publication of CN114692156B publication Critical patent/CN114692156B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/554Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Virology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention belongs to the technical field of computer malware detection and provides a method, system, storage medium and device for detecting malicious code intrusion in memory segments. The method comprises: acquiring a memory file to be detected; sequentially performing binary conversion and word segmentation preprocessing on the memory file to be detected, then intercepting a fragment based on the optimal combination of fragment position and length to obtain a prediction fragment; and inputting the prediction fragment into the optimal neural network model to determine whether malicious code has been implanted in the memory file to be detected. The neural network model uses an embedding layer to raise the dimension of the input prediction fragment, applies convolution layers with different convolution kernel sizes followed by pooling, and finally passes the result through a flattening layer and a fully connected layer into the classifier. By learning the latent patterns and characteristics of malicious code, the method detects both previously undiscovered viruses and known viruses.

Description

Method, system, storage medium and equipment for detecting intrusion of malicious codes of memory segments
Technical Field
The invention belongs to the technical field of computer malicious software detection, and particularly relates to a method, a system, a storage medium and equipment for detecting malicious code intrusion of a memory segment.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the rapid development of computer and Internet technologies, the volume of malware is growing exponentially. Malicious programs appear in many variants and their anti-detection techniques are updated ever faster; fileless malware attacks in particular break through the detection of security protection systems and pose serious threats and challenges to enterprise security defenders. A fileless malware attack infiltrates a victim organization and executes code from memory, without using malicious files or file fragments on the computer disk, thereby hiding itself and its attack traces. However, such attacks cannot completely erase their traces in memory. Memory analysis is therefore one of the best methods for systematically analyzing programs whose malicious characteristics are unknown and whose source code is unavailable.
In addition, the paging and replacement mechanism of memory leaves most of the information in memory incomplete: a program does not load all of its information into memory during execution but loads only part of it first. A complete file therefore cannot be obtained, and it is difficult to determine by professional analysis methods whether the obtained file is a malicious program or file.
Existing antivirus software essentially compares a file against signatures in a virus library to judge whether it is malicious. The clear advantages of this approach are high accuracy, convenience and a low false-alarm rate, but it has difficulty with the incomplete files found in memory and cannot detect newly generated viruses.
Disclosure of Invention
To solve the technical problems described in the background, the invention provides a method, system, storage medium and device for detecting malicious code intrusion in memory segments, which use a neural network model to learn the latent patterns and characteristics of malicious code and thereby detect both undiscovered and known viruses.
In order to achieve the purpose, the invention adopts the following technical scheme:
A first aspect of the present invention provides a method for detecting malicious code intrusion in memory segments, comprising:
acquiring a memory file to be detected;
sequentially performing binary conversion and word segmentation preprocessing on the memory file to be detected, and then intercepting a fragment based on the optimal combination of fragment position and length to obtain a prediction fragment;
inputting the prediction fragment into the optimal neural network model and classifying it to determine whether malicious code has been implanted in the memory file to be detected;
wherein the neural network model uses an embedding layer to raise the dimension of the input prediction fragment, applies convolution layers with different convolution kernel sizes followed by pooling, and finally passes the result through a flattening layer and a fully connected layer into the classifier.
Further, the word segmentation preprocessing comprises the following steps:
converting the binary file obtained from the binary conversion into decimal to obtain a decimal file;
and judging whether the decimal file reaches the preset length and, if not, adding 1 to every value in the decimal file and then padding the file with 0.
Further, the fragment position and length combination is:
taking data with the length of integral multiple of 1024 from the head of the memory file as a prediction fragment;
or taking the data with the length of integral multiple of 1024 from the tail of the memory file as a prediction fragment;
or selecting a plurality of discontinuous sub-segments from the memory file, and combining the plurality of discontinuous sub-segments to obtain the prediction segment.
Further, the optimal neural network model and the optimal combination of fragment position and length are obtained as follows:
acquiring a malicious sample set and a benign sample set, and performing binary conversion and word segmentation preprocessing on each memory file in both sets to obtain an initial training test set;
intercepting fragments from each memory file in the initial training test set based on several combinations of fragment position and length to obtain several training test sets;
and training and testing the neural network model with each training test set, taking the neural network model with the highest accuracy as the optimal neural network model, and taking the combination of fragment position and length used to build that model's training test set as the optimal combination of fragment position and length.
Further, the step of training and testing the neural network model by adopting the training and testing set comprises:
(a) dividing a training test set into a training set and a test set;
(b) updating the weights of the neural network model based on the training set, and outputting the neural network model after a number of iterations once the loss function reaches its minimum;
(c) classifying the samples in the test set with the output neural network model and, when the classification accuracy is below the threshold, returning to step (b) to continue training the parameters of the neural network model until the classification accuracy on the test set is greater than or equal to the threshold.
A second aspect of the present invention provides a memory segment malicious code intrusion detection system, including:
a file acquisition module configured to: acquiring a memory file to be detected;
a fragment interception module configured to: sequentially perform binary conversion and word segmentation preprocessing on the memory file to be detected, and then intercept a fragment based on the optimal combination of fragment position and length to obtain a prediction fragment;
a prediction module configured to: input the prediction fragment into the optimal neural network model and classify it to determine whether malicious code has been implanted in the memory file to be detected;
wherein the neural network model uses an embedding layer to raise the dimension of the input prediction fragment, applies convolution layers with different convolution kernel sizes followed by pooling, and finally passes the result through a flattening layer and a fully connected layer into the classifier.
Further, the fragment position and length combination is:
taking data with the length of integral multiple of 1024 from the head of the memory file as a prediction fragment;
or taking the data with the length of the integral multiple of 1024 from the tail of the memory file as a prediction fragment;
or selecting a plurality of discontinuous sub-segments from the memory file, and combining the plurality of discontinuous sub-segments to be used as the prediction segment.
Further, a training module is included that is configured to:
acquiring a malicious sample set and a benign sample set, and performing binary conversion and word segmentation preprocessing on each memory file in both sets to obtain an initial training test set;
intercepting fragments from each memory file in the initial training test set based on several combinations of fragment position and length to obtain several training test sets;
and training and testing the neural network model with each training test set, taking the neural network model with the highest accuracy as the optimal neural network model, and taking the combination of fragment position and length used to build that model's training test set as the optimal combination of fragment position and length.
A third aspect of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps in the memory segment malicious code intrusion detection method as described above.
A fourth aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the steps in the memory segment malicious code intrusion detection method described above.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a method for detecting malicious code intrusion in memory segments that uses a neural network model to learn the latent patterns and characteristics of malicious code, thereby detecting both undiscovered and known viruses.
The method detects dynamic files obtained by memory analysis. On the one hand, it can detect malicious code that cannot be detected in a static file and is only detectable at run time; on the other hand, it allows memory forensics personnel to secure memory evidence more efficiently.
The method is effective on running memory files dumped to disk and is of strong reference value and practical use to memory forensics personnel.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
FIG. 1 is a flowchart of an optimal neural network model obtaining method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the binary-to-decimal conversion according to a first embodiment of the present invention;
FIG. 3 is a diagram of a neural network model according to a first embodiment of the present invention;
FIG. 4 is a diagram of the hidden layer structure of the neural network model according to the first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Interpretation of terms:
Convolutional Neural Networks (CNNs) are a class of feed-forward neural networks that contain convolution computations and have a deep structure.
Embodiment One
The embodiment provides a method for detecting intrusion of malicious codes of memory segments, which specifically comprises the following steps:
Step 1: acquire the memory file to be detected. The acquisition process can be divided into three stages: sandbox operation, dumping and extraction.
The sandbox operation runs the untrusted program in a virtual machine so that the real device is not damaged. The dump takes a snapshot to dump the memory of the virtual machine. The extraction uses a forensics tool to extract the memory file to be detected from the dumped memory; the memory file to be detected is an executable file (.exe), a dynamic link library file (.dll) or a system file (.sys).
Step 2: sequentially perform binary conversion and word segmentation preprocessing on the memory file to be detected, and then intercept a fragment based on the optimal combination of fragment position and length (size) to obtain the prediction fragment.
Step 3: input the prediction fragment into the optimal neural network model and classify it to determine whether malicious code has been implanted in the memory file to be detected. If the file is classified as invaded by malicious code, it is judged that malicious code has been implanted in it; otherwise it is judged to be free of implanted malicious code. In this way, files whose malicious code goes undetected in static form but is detected in dynamic (in-memory) form are identified.
Specifically, as shown in fig. 1, the steps of obtaining the optimal neural network model and the optimal combination of the position and the length of the segment are as follows:
(1) Acquire a malicious sample set and a benign sample set, perform binary conversion on each memory file in both sets to obtain a binary file data set, and perform word segmentation preprocessing on each binary file in that data set to obtain the initial training test set.
In this embodiment, each sample (i.e., each memory file) in the malicious sample set is labeled as invaded by malicious code, and each sample in the benign sample set is labeled as not implanted with malicious code. The malicious and benign sample sets are acquired as follows: static malicious samples are downloaded (in this embodiment, 600 static malicious samples from the VirusShare website, https://virusshare.com/); the untrusted programs are run in a virtual machine, a memory image is acquired, and the malicious sample set is extracted with a memory forensics tool; a memory image is then acquired from a Windows system, and the benign sample set is extracted with a memory forensics tool. In this embodiment, the benign sample set contains 300 samples.
The word segmentation preprocessing proceeds as follows. The binary file obtained from the binary conversion is converted into decimal to obtain a decimal file whose values range from 0 to 255. This brings the memory file closer to an image, whose pixel values also lie in the interval 0 to 255, so the binary file can be treated as a grayscale map, which better suits the neural network model. To keep the length of the memory files input into the neural network model consistent, it is judged whether the decimal file reaches the preset length; if not, 1 is added to every value in the decimal file and the file is then padded with 0. Adding 1 keeps the 0 values in the original data meaningful by distinguishing them from the padding, as shown in FIG. 2.
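The preprocessing step above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `preprocess` and the parameter `target_len` are hypothetical, and the sketch follows the text literally by shifting values only when the file is shorter than the preset length (shifting every file would be an alternative reading).

```python
def preprocess(raw: bytes, target_len: int) -> list[int]:
    """Turn raw bytes into decimal tokens; shift by 1 and zero-pad short files."""
    values = list(raw)  # each byte becomes an integer in 0..255
    if len(values) < target_len:
        # Shift real data to 1..256 so an original 0 byte differs from padding.
        values = [v + 1 for v in values]
        # Pad with 0 up to the preset length.
        values += [0] * (target_len - len(values))
    return values


short = preprocess(b"\x00\xff", 4)   # shorter than the preset length
exact = preprocess(b"\x01\x02", 2)   # already at the preset length
```

Files longer than the preset length are returned unchanged here, since truncation is handled by the fragment interception step.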
(2) Intercept fragments from each memory file in the initial training test set based on several combinations of fragment position and length to obtain several training test sets. Each training test set is the result of applying one combination of fragment position and length to the initial training test set.
The combinations of fragment position and length are: taking data whose length is an integer multiple of 1024 (1024 bytes, 2048 bytes, and so on) from the head of the memory file as the prediction fragment; or taking data whose length is an integer multiple of 1024 from the tail of the memory file as the prediction fragment; or selecting several non-contiguous sub-segments from the memory file and combining them into the prediction fragment. The sub-segments may be combined in the order in which they appear in the memory file or after their order has been shuffled.
The number and the length of the sub-segments are calculated by the following method:
data_len = file_len/k;
if train_len < data_len, data_len = 256;
NN = train_len/data_len;
where k is a parameter set according to the average length of the samples and may be 60; file_len is the sample length; data_len is the length of the data taken at each position (i.e., the length of one sub-segment); train_len is the configured length of the prediction fragment; and NN is the number of sub-segments.
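The three interception strategies and the sub-segment calculation can be sketched as below. Several details are assumptions made only for illustration: integer division is used for both quotients, the sub-segments start at evenly spaced offsets (the text does not say where they are taken from), and the function names are hypothetical.

```python
import random


def head_fragment(data, train_len):
    return data[:train_len]          # first train_len values


def tail_fragment(data, train_len):
    return data[-train_len:]         # last train_len values


def sub_segment_plan(file_len, train_len, k=60):
    """Return (data_len, NN) following the three formulas in the text."""
    data_len = file_len // k
    if train_len < data_len:
        data_len = 256
    if data_len == 0 or train_len < data_len:
        raise ValueError("file or fragment too short for this plan")
    return data_len, train_len // data_len


def scattered_fragment(data, train_len, k=60, shuffle=False, seed=0):
    data_len, nn = sub_segment_plan(len(data), train_len, k)
    step = len(data) // nn           # assumption: evenly spaced start offsets
    pieces = [data[i * step:i * step + data_len] for i in range(nn)]
    if shuffle:                      # optional reordering of the sub-segments
        random.Random(seed).shuffle(pieces)
    return [v for piece in pieces for v in piece]
```

When NN times data_len is smaller than train_len, the combined fragment comes out shorter than the configured length; padding it back up is left to the preprocessing step.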
(3) Train and test the neural network model with each training test set, take the neural network model with the highest accuracy as the optimal neural network model, and take the combination of fragment position and length used to build that model's training test set as the optimal combination of fragment position and length.
The method for training and testing the neural network model by adopting a training and testing set comprises the following steps:
(a) dividing the training test set into a training set and a test set at a ratio of 2:1;
(b) updating the weights of the neural network model based on the training set, and outputting the neural network model after a number of iterations once the loss function reaches its minimum and the training result is stable;
(c) testing the output neural network model on the test set, i.e., classifying the samples in the test set with the output neural network model, and, when the classification accuracy is below the threshold of 80%, returning to step (b) to continue tuning and training the neural network model until the classification accuracy on the test set is greater than or equal to the threshold of 80%.
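Steps (a) to (c), together with the selection of the best combination in step (3), can be outlined as follows. This is only a sketch under stated assumptions: `train_and_score` is a hypothetical callback standing in for the actual CNN training and evaluation, and `max_rounds` is an added safety cap that the text does not mention.

```python
import random


def split_2_to_1(samples, seed=0):
    """Step (a): shuffle and split into training and test sets at 2:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 2 // 3
    return shuffled[:cut], shuffled[cut:]


def select_best(datasets, train_and_score, threshold=0.80, max_rounds=10):
    """datasets maps a position/length combination to its labeled fragments."""
    best = None
    for combo, samples in datasets.items():
        train_set, test_set = split_2_to_1(samples)
        model, acc = None, 0.0
        for _ in range(max_rounds):
            # Steps (b) and (c): train further, then measure test accuracy.
            model, acc = train_and_score(train_set, test_set, model)
            if acc >= threshold:
                break
        if best is None or acc > best[2]:
            best = (combo, model, acc)
    return best  # (optimal combination, optimal model, its accuracy)
```

The callback receives the previous model so that each round continues training rather than starting over, matching the "return to step (b)" wording.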
Specifically, the neural network model is a CNN model. As shown in FIG. 3, the neural network model includes an input layer, a hidden layer, and an output layer. The number of neurons in the input layer equals the number of data values in the input prediction fragment; in this embodiment it is 256, each neuron x_i represents one data value in the prediction fragment, and the input layer has 1 channel. The output layer has 2 neurons, representing the two categories: invaded by malicious code, or not implanted with malicious code. The neural network model uses an embedding layer to raise the dimension of the input prediction fragment, applies convolution layers with different convolution kernel sizes followed by pooling, and finally passes the result through a flattening layer and a fully connected layer into the classifier. Specifically, as shown in FIG. 4, the hidden layer has 12 layers in total, connected in sequence: an embedding layer (embedding), a first one-dimensional convolutional layer, a first pooling layer, a second one-dimensional convolutional layer, a first Dropout layer, a third one-dimensional convolutional layer, a second pooling layer, a second Dropout layer, a third pooling layer, a flattening layer (flatten), a fully connected layer, and a softmax layer. To capture more data features, the convolutional layers of the neural network model of the invention use smaller convolution kernels and more convolutional layers, rather than the larger convolution kernels (generally set to more than 10) typical of general natural language processing classification tasks.
Although this increases the number of convolution computations, the training time does not increase much, because memory data is sparse: at least one quarter of the data consists of zeros distributed throughout it, which simplifies the convolution computation.
The embedding layer raises the value of each neuron of the input layer to a multidimensional (100-dimensional) vector, so that the differences between different values can be learned better; the dimension-raised (100-dimensional) data is then concatenated a second time and used as the input of the first one-dimensional convolutional layer. The number of input channels of the first one-dimensional convolutional layer is 200.
For a given receptive field, stacked small convolution kernels are preferable to a single large convolution kernel, because the multiple non-linear layers increase the depth of the network. Training therefore uses several one-dimensional convolutional layers with smaller convolution kernels: the convolution kernel sizes of the first, second and third one-dimensional convolutional layers are set to 3, 4 and 5 respectively, and their strides are all 1. The first one-dimensional convolutional layer has 200 input channels and 100 output channels; the second has 100 input channels and 50 output channels; the third has 50 input channels and 25 output channels.
The output characteristics of the first, second, and third one-dimensional convolutional layers may all be expressed as:
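$$\mathrm{out}(N_i, C_{\mathrm{out}_j}) = \mathrm{bias}(C_{\mathrm{out}_j}) + \sum_{k=0}^{C_{\mathrm{in}}-1} \mathrm{weight}(C_{\mathrm{out}_j}, k) \star \mathrm{input}(N_i, k)$$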
where N is the batch size, C is the number of channels, L is the sequence length, and bias is the bias value of the neural network (default 1). The batch size is the number of samples processed together: all samples in a data set (which may be the training set, the test set or, in use, a single memory file to be detected) are divided into groups, and the number of samples in each group is the batch size. i denotes a group, j denotes the jth sample in the ith group, and k is the index of the input channel, i.e., the jth sample in the ith group is input to the kth neuron node; out_j denotes the output channel of the jth sample; C_in denotes the total number of input channels; weight denotes the weight vector of the first, second or third one-dimensional convolutional layer; and input denotes the input features of that layer.
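To make the summation over input channels concrete, the following pure-Python sketch computes the output of a one-dimensional convolutional layer for a single sample (the batch index is omitted). It assumes no padding and no dilation, and it implements the sliding operation as cross-correlation, which is the usual convention in deep learning libraries; the function name `conv1d` and the nested-list layout are illustrative choices, not part of the original text.

```python
def conv1d(inp, weight, bias, stride=1):
    """inp: [C_in][L_in], weight: [C_out][C_in][K], bias: [C_out]."""
    c_in, l_in = len(inp), len(inp[0])
    k_size = len(weight[0][0])
    l_out = (l_in - k_size) // stride + 1
    out = []
    for j, w_j in enumerate(weight):          # one row per output channel
        row = []
        for t in range(l_out):                # slide along the sequence
            acc = bias[j]
            for k in range(c_in):             # sum over input channels
                for m in range(k_size):
                    acc += w_j[k][m] * inp[k][t * stride + m]
            row.append(acc)
        out.append(row)
    return out


single = conv1d([[1, 2, 3, 4]], [[[1, 0, -1]]], [0.5])
```

Each output channel adds its bias once and then accumulates the contribution of every input channel, which is the role of the sum over k in the formula.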
The lengths of the output sequences of the first, second and third one-dimensional convolutional layers are all calculated by:
$$L_{out} = \left\lfloor \frac{L_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1 \right\rfloor$$
where L_out is the length of the output sequence, L_in is the length of the input sequence, padding is the padding length, dilation is the size of the dilated (atrous) convolution, set here to 0, kernel_size is the size of the convolution kernel, and stride is the step size.
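As a worked example of the length formula, the sketch below traces a 256-value prediction fragment through the convolution and pooling layers in the order listed for the hidden layer. The trace rests on assumptions that the text does not state: padding is 0, each pooling layer uses a stride equal to its size of 4, the same formula is applied to pooling, and dilation is 1, the value that denotes an ordinary, non-dilated convolution in the PyTorch convention (the text states 0). The names `out_len` and `trace_lengths` are hypothetical.

```python
def out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output sequence length of a 1D convolution or pooling layer."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def trace_lengths(l_in=256, pool=4):
    # (layer name, kernel size, stride); Dropout layers do not change length.
    steps = [("conv1", 3, 1), ("pool1", pool, pool), ("conv2", 4, 1),
             ("conv3", 5, 1), ("pool2", pool, pool), ("pool3", pool, pool)]
    lengths = {}
    for name, k, s in steps:
        l_in = out_len(l_in, k, stride=s)
        lengths[name] = l_in
    return lengths


lengths = trace_lengths()
flat_size = 25 * lengths["pool3"]  # 25 output channels from the third conv layer
```

Under these assumptions the sequence shrinks from 256 to 3 by the third pooling layer, so the flattening layer would emit 75 values; different padding or pooling strides would change that figure.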
The first and second Dropout layers prevent overfitting to the training data without reducing training accuracy. The drop rate (the proportion of neuron nodes that are randomly hidden) of the first and second Dropout layers is set to 0.5. Dropout is a regularization technique based on random hiding: a Dropout layer randomly hides real neuron nodes, by default by setting randomly chosen nodes to zero, so that the convolutional neural network treats the result as new data, learns again and updates its weights, which suppresses overfitting during training.
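The random hiding performed by a Dropout layer can be illustrated with a few lines of Python. The sketch uses "inverted" dropout, in which surviving values are scaled by 1/(1-p) during training so that the expected activation is unchanged; this scaling is the behavior of common library implementations and is an assumption here, since the text only describes the zero filling.

```python
import random


def dropout(values, p=0.5, training=True, rng=None):
    """Zero each value with probability p; scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)              # inference: pass values through
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


masked = dropout([1.0] * 1000, p=0.5, rng=random.Random(0))
```

At inference time the layer is switched off (`training=False`), so every neuron node contributes to the prediction.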
Because memory data is sparse, average pooling weakens the extracted features. The first pooling layer, the second pooling layer and the third pooling layer therefore use maximum pooling (maxpool), with which the model performs better than with average pooling. The sizes of the first, second and third pooling layers are all set to 4, a value determined by the sparsity of the binary files of dynamic data.
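A small example illustrates why maximum pooling suits sparse data. The sketch below uses a pooling size of 4, as in the text, on a hypothetical sparse sequence; the function name and the choice of non-overlapping windows (stride equal to the pooling size) are assumptions.

```python
def pool1d(seq, size=4, mode="max"):
    """Non-overlapping 1-D pooling; trailing elements that do not fill a window are dropped."""
    windows = [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]
    if mode == "max":
        return [max(w) for w in windows]
    return [sum(w) / size for w in windows]


# Hypothetical sparse feature sequence: mostly zeros with a few strong responses.
sparse = [0, 0, 9, 0, 0, 0, 0, 0, 0, 7, 0, 0]
print(pool1d(sparse, mode="max"))  # [9, 0, 7]
print(pool1d(sparse, mode="avg"))  # [2.25, 0.0, 1.75]
```

Maximum pooling keeps the strong responses at full strength, whereas averaging dilutes each of them by the surrounding zeros.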
The flattening layer flattens the data features output by the third pooling layer; its input parameters are obtained from the length of the output sequence of the third one-dimensional convolutional layer.
The sequence is flattened by the flattening layer, converted into two neuron nodes by the fully connected layer, and finally classified by the softmax layer.
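How the flattened size follows from the sequence lengths can be sketched by propagating a length through the layer order embedding, convolution 1, pooling 1, convolution 2, Dropout 1, convolution 3, pooling 2, Dropout 2, pooling 3. All numbers below (input length 4096, kernel sizes 3, 5 and 7, 64 output channels, stride 1 for convolutions, pooling stride equal to the pooling size) are hypothetical values chosen for illustration; only the pooling size of 4 comes from the text. Embedding and Dropout layers do not change the sequence length.

```python
def out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Assumed output-length relation for 1-D convolution and pooling."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def flattened_size(l_in, conv_kernels=(3, 5, 7), pool_size=4, out_channels=64):
    """Number of values handed to the fully connected layer (hypothetical settings)."""
    k1, k2, k3 = conv_kernels
    length = out_len(l_in, k1)                              # first convolution
    length = out_len(length, pool_size, stride=pool_size)   # first pooling
    length = out_len(length, k2)                            # second convolution
    length = out_len(length, k3)                            # third convolution
    length = out_len(length, pool_size, stride=pool_size)   # second pooling
    length = out_len(length, pool_size, stride=pool_size)   # third pooling
    return out_channels * length


print(flattened_size(4096))  # 64 channels * 63 positions = 4032
```

The fully connected layer would then map these values to the two output neuron nodes.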
The softmax layer, the last of the hidden layers, serves as the classifier and uses the normalized exponential function:
$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)}$$
where exp(x) denotes the exponential function e^x (e is Napier's constant, 2.7182...), n is the number of neurons in the output layer, y_k is the output of the k-th neuron of the output layer, and a_k is the input to the k-th neuron of the output layer. The numerator is the exponential function of the input signal a_k of the k-th neuron, and the denominator is the sum of the exponential functions of all input signals.
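The normalized exponential function can be sketched directly from this description. The function name is hypothetical, and subtracting the largest input before exponentiating is an added numerical safeguard (it leaves the result unchanged but avoids overflow), not something stated in the text.

```python
import math


def softmax(a):
    """y_k = exp(a_k) / sum_i exp(a_i), computed in a numerically stable way."""
    shift = max(a)  # assumption: stability trick, does not change the result
    exps = [math.exp(x - shift) for x in a]
    total = sum(exps)
    return [e / total for e in exps]


# Two output neurons, as in the binary (benign / malicious) classifier.
print(softmax([0.0, 0.0]))  # [0.5, 0.5]
probs = softmax([2.0, -1.0])
print(probs, sum(probs))
```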
The optimizer of the neural network model is the one found to be most suitable for the model after validation on a large amount of data.
The loss function of the neural network model is the mini-batch cross-entropy loss function:
$$E = -\frac{1}{M} \sum_{m} \sum_{k} t_{mk} \log y_{mk}$$
where M is the number of prediction segments in the training set, t_mk is the value of the k-th element (data) of the m-th prediction segment and serves as the supervisory data, and y_mk is the output of the neural network for the k-th element (data) of the m-th prediction segment. The loss function for a single piece of data is extended to M pieces of data and then divided by M, which gives the average loss of a single prediction segment. This averaging yields a uniform index that does not depend on the amount of training data: whether the training set contains 1000 or 10000 prediction segments, the average loss of a single prediction segment can be obtained.
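The averaging can be sketched as follows, assuming the loss has the form E = -(1/M) * sum over m and k of t_mk * log(y_mk). The function name is hypothetical, and the small constant `eps` added inside the logarithm is an assumption introduced only to avoid log(0).

```python
import math


def mini_batch_cross_entropy(y, t, eps=1e-7):
    """Average cross-entropy over M prediction segments.

    y[m][k]: network output for element k of segment m.
    t[m][k]: supervisory data (for example a one-hot label).
    """
    m = len(y)
    total = 0.0
    for y_m, t_m in zip(y, t):
        for y_mk, t_mk in zip(y_m, t_m):
            total += t_mk * math.log(y_mk + eps)  # eps is an assumed safeguard
    return -total / m


one = mini_batch_cross_entropy([[0.5, 0.5]], [[1, 0]])
many = mini_batch_cross_entropy([[0.5, 0.5]] * 1000, [[1, 0]] * 1000)
print(round(one, 3), round(many, 3))  # 0.693 0.693
```

Because of the division by M, the value for 1000 identical segments matches the value for a single segment.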
Based on the above analysis, the invention uses machine learning to perform binary classification of the files extracted from memory. Detecting the extracted files with this method makes it possible to determine effectively whether malicious code has been implanted into memory.
Example two
The embodiment provides a system for detecting intrusion of malicious codes into memory fragments, which specifically comprises the following modules:
a file acquisition module configured to: acquiring a memory file to be detected;
a fragment intercept module configured to: after binary conversion and word segmentation preprocessing are performed in sequence on the memory file to be detected, performing fragment interception based on the optimal fragment position and length combination to obtain a prediction fragment;
a prediction module configured to: inputting the predicted fragments into an optimal neural network model, and detecting the predicted fragments to obtain a result of whether the memory file to be detected is implanted with malicious codes or not;
the neural network model uses an embedding layer to raise the dimension of the input prediction segment, then performs convolution through convolutional layers with different convolution kernel sizes followed by pooling, and finally feeds the result into the classifier after conversion by a flattening layer and a fully connected layer.
Wherein the combination of fragment position and length is:
taking data with the length of integral multiple of 1024 from the head of the memory file as a prediction fragment;
or taking the data with the length of the integral multiple of 1024 from the tail of the memory file as a prediction fragment;
or selecting a plurality of discontinuous sub-segments from the memory file, and combining the plurality of discontinuous sub-segments to obtain the prediction segment.
a training module configured to:
acquiring a malicious sample set and a benign sample set, and performing binary conversion and word segmentation preprocessing on each memory file in the malicious sample set and the benign sample set to obtain an initial training test set;
performing fragment interception on each memory file in the initial training test set based on a plurality of fragment position and length combinations to obtain a plurality of training test sets;
and training and testing the neural network model with each training test set respectively, taking the neural network model with the highest accuracy as the optimal neural network model, and taking the fragment position and length combination that was used to intercept the corresponding training test set as the optimal fragment position and length combination.
It should be noted that, each module in the present embodiment corresponds to each step in the first embodiment one to one, and the specific implementation process is the same, which is not described herein again.
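The preprocessing and fragment interception performed by these modules can be sketched as follows. This is a minimal illustration under stated assumptions: all function names and parameters are hypothetical; the padding rule follows the wording of claim 2 literally (values are shifted by 1 and the file is padded with 0 only when it is shorter than the preset length); and the way discontinuous sub-segments are specified, as (start, length) pairs, is an assumption.

```python
def to_decimal_sequence(raw, preset_length):
    """Convert raw bytes to decimal values 0-255, padding short files."""
    values = list(raw)  # each byte becomes an integer in 0..255
    if len(values) < preset_length:
        values = [v + 1 for v in values]                # add 1 to every value
        values += [0] * (preset_length - len(values))   # then pad with 0
    return values


def head_segment(values, multiple, unit=1024):
    """First multiple * 1024 values of the memory file."""
    return values[: multiple * unit]


def tail_segment(values, multiple, unit=1024):
    """Last multiple * 1024 values of the memory file (multiple >= 1)."""
    return values[-multiple * unit:]


def combined_segment(values, spans):
    """Concatenate discontinuous sub-segments given as (start, length) pairs."""
    out = []
    for start, length in spans:
        out.extend(values[start:start + length])
    return out


print(to_decimal_sequence(b"\x00\xff", 4))  # [1, 256, 0, 0]
data = to_decimal_sequence(bytes(range(256)) * 16, 4096)
print(len(head_segment(data, 2)), len(tail_segment(data, 1)))  # 2048 1024
```

Each candidate combination of position and length would produce one training test set, and the combination belonging to the most accurate model would then be reused at detection time.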
Example three
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the program implements the steps in the intrusion detection method for memory segment malicious codes according to the first embodiment.
Example four
The embodiment provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the program to implement the steps in the intrusion detection method for malicious codes in memory segments according to the first embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium and executed by a computer to implement the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. The intrusion detection method for the malicious codes of the memory segments is characterized by comprising the following steps:
acquiring a memory file to be detected;
after binary conversion and word segmentation preprocessing are performed in sequence on the memory file to be detected, performing fragment interception based on the optimal fragment position and length combination to obtain a prediction fragment;
inputting the predicted fragments into an optimal neural network model, and detecting the predicted fragments to obtain a result of whether the memory file to be detected is implanted with malicious codes or not;
the neural network model comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises an embedding layer, a first one-dimensional convolutional layer, a first pooling layer, a second one-dimensional convolutional layer, a first Dropout layer, a third one-dimensional convolutional layer, a second pooling layer, a second Dropout layer, a third pooling layer, a flattening layer, a fully connected layer and a softmax layer which are sequentially connected; the neural network model uses the embedding layer to raise the dimension of the input prediction segment, then performs convolution through convolutional layers with different convolution kernel sizes followed by pooling, and finally feeds the result into a classifier after conversion by the flattening layer and the fully connected layer;
the word segmentation preprocessing specifically comprises: converting the binary file obtained by the binary conversion into a decimal file whose values range from 0 to 255.
2. The intrusion detection method for malicious codes in memory segments according to claim 1, wherein the specific steps of word segmentation preprocessing further comprise:
judging whether the decimal file reaches the preset length and, if not, adding 1 to every value in the decimal file and then padding the file with 0.
3. The memory fragment malicious code intrusion detection method according to claim 1, wherein the fragment position and length combination is:
taking data with the length of integral multiple of 1024 from the head of the memory file as a prediction fragment;
or taking the data with the length of the integral multiple of 1024 from the tail of the memory file as a prediction fragment;
or selecting a plurality of discontinuous sub-segments from the memory file, and combining the plurality of discontinuous sub-segments to obtain the prediction segment.
4. The intrusion detection method for the malicious codes in the memory segment according to claim 1, wherein the steps of obtaining the optimal neural network model and the optimal combination of the position and the length of the segment are as follows:
acquiring a malicious sample set and a benign sample set, and performing binary conversion and word segmentation preprocessing on each memory file in the malicious sample set and the benign sample set to obtain an initial training test set;
performing fragment interception on each memory file in the initial training test set based on a plurality of fragment position and length combinations to obtain a plurality of training test sets;
and respectively training and testing the neural network model by adopting each training test set, taking the neural network model with the highest accuracy as an optimal neural network model, and taking the combination of the position and the length of the segment adopted by the training test set when the segment is intercepted as the optimal combination of the position and the length of the segment.
5. The memory segment malicious code intrusion detection method according to claim 4, wherein the step of training and testing the neural network model by adopting the training test set comprises the following steps:
(a) dividing a training test set into a training set and a test set;
(b) updating the weight of the neural network model based on the training set, and outputting the neural network model after a plurality of iterations until the loss function reaches the minimum;
(c) classifying the samples in the test set by using the output neural network model, and, when the classification accuracy is less than the threshold, returning to step (b) to continue training the parameters of the neural network model until the classification accuracy of the output neural network model on the samples in the test set is greater than or equal to the threshold.
6. The intrusion detection system for the malicious codes of the memory fragments is characterized by comprising:
a file acquisition module configured to: acquiring a memory file to be detected;
a fragment intercept module configured to: after binary conversion and word segmentation preprocessing are performed in sequence on the memory file to be detected, performing fragment interception based on the optimal fragment position and length combination to obtain a prediction fragment;
a prediction module configured to: inputting the predicted fragments into an optimal neural network model, and detecting the predicted fragments to obtain a result of whether the memory file to be detected is implanted with malicious codes or not;
the neural network model comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises an embedding layer, a first one-dimensional convolutional layer, a first pooling layer, a second one-dimensional convolutional layer, a first Dropout layer, a third one-dimensional convolutional layer, a second pooling layer, a second Dropout layer, a third pooling layer, a flattening layer, a fully connected layer and a softmax layer which are sequentially connected; the neural network model uses the embedding layer to raise the dimension of the input prediction segment, then performs convolution through convolutional layers with different convolution kernel sizes followed by pooling, and finally feeds the result into a classifier after conversion by the flattening layer and the fully connected layer;
the word segmentation preprocessing specifically comprises: converting the binary file obtained by the binary conversion into a decimal file whose values range from 0 to 255.
7. The memory segment malicious code intrusion detection system according to claim 6, wherein the segment location and length combination is:
taking data with the length of integral multiple of 1024 from the head of the memory file as a prediction fragment;
or taking the data with the length of the integral multiple of 1024 from the tail of the memory file as a prediction fragment;
or selecting a plurality of discontinuous sub-segments from the memory file, and combining the plurality of discontinuous sub-segments to obtain the prediction segment.
8. The memory segment malicious code intrusion detection system of claim 6, further comprising a training module configured to:
acquiring a malicious sample set and a benign sample set, and performing binary conversion and word segmentation preprocessing on each memory file in the malicious sample set and the benign sample set to obtain an initial training test set;
performing fragment interception on each memory file in the initial training test set based on a plurality of fragment position and length combinations to obtain a plurality of training test sets;
and respectively training and testing the neural network model by adopting each training test set, taking the neural network model with the highest accuracy as an optimal neural network model, and taking the combination of the position and the length of the segment adopted by the training test set when the segment is intercepted as the optimal combination of the position and the length of the segment.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the memory-fragment malicious code intrusion detection method according to one of claims 1 to 5.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps in the memory segment malicious code intrusion detection method according to any one of claims 1 to 5 when executing the program.
CN202210603899.0A 2022-05-31 2022-05-31 Memory segment malicious code intrusion detection method, system, storage medium and equipment Active CN114692156B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210603899.0A CN114692156B (en) 2022-05-31 2022-05-31 Memory segment malicious code intrusion detection method, system, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210603899.0A CN114692156B (en) 2022-05-31 2022-05-31 Memory segment malicious code intrusion detection method, system, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN114692156A CN114692156A (en) 2022-07-01
CN114692156B true CN114692156B (en) 2022-08-30

Family

ID=82131254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210603899.0A Active CN114692156B (en) 2022-05-31 2022-05-31 Memory segment malicious code intrusion detection method, system, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN114692156B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115455416A (en) * 2022-09-09 2022-12-09 上海派拉软件股份有限公司 Malicious code detection method and device, electronic equipment and storage medium
CN115859290B (en) * 2023-02-01 2023-05-16 中国人民解放军61660部队 Malicious code detection method based on static characteristics and storage medium
CN116861420B (en) * 2023-05-26 2024-05-28 广州天懋信息系统股份有限公司 Malicious software detection system and method based on memory characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3227820A1 (en) * 2014-12-05 2017-10-11 Permissionbit Methods and systems for encoding computer processes for malware deteection
CN110704840A (en) * 2019-09-10 2020-01-17 中国人民公安大学 Convolutional neural network CNN-based malicious software detection method
CN111881447A (en) * 2020-06-28 2020-11-03 中国人民解放军战略支援部队信息工程大学 Intelligent evidence obtaining method and system for malicious code fragments

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382438B (en) * 2020-03-27 2024-04-23 玉溪师范学院 Malware detection method based on multi-scale convolutional neural network
CN113420294A (en) * 2021-06-25 2021-09-21 杭州电子科技大学 Malicious code detection method based on multi-scale convolutional neural network

Also Published As

Publication number Publication date
CN114692156A (en) 2022-07-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant