CN110618854B

CN110618854B - Virtual machine behavior analysis system based on deep learning and memory mirror image analysis

Info

Publication number: CN110618854B
Application number: CN201910772362.5A
Authority: CN
Inventors: 吴春明; 陈双喜; 王婉飞; 姜鑫悦; 吴安邦
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-08-21
Filing date: 2019-08-21
Publication date: 2022-04-26
Anticipated expiration: 2039-08-21
Also published as: CN110618854A

Abstract

The invention discloses a virtual machine behavior analysis system based on deep learning and memory mirror image analysis. The system acquires memory mirror image data and delta-encodes it, extracts map feature point information from the encoded memory map, trains a neural network on the resulting feature information to obtain a classifier, and finally runs the neural network with that classifier to analyze unknown virtual machine behavior. The invention is simple to operate, easy to implement and convenient to modularize. It has a wide application range: it can detect multiple kinds of attack, including known and unknown attacks, and its detection performance is not affected even if an attacker withdraws after lying latent for a period of time. In addition, the invention offers good robustness, reliability and usability on different system platforms.

Description

Virtual machine behavior analysis system based on deep learning and memory mirror image analysis

Technical Field

The invention belongs to the field of wireless network security, particularly relates to the field of mimicry active defense, and relates to a virtual machine behavior analysis system based on deep learning and memory mirror image analysis.

Background

Virtualized cloud platforms are an important part of cloud computing. A virtualized cloud platform runs multiple operating systems simultaneously on the same cloud platform, each with its own independent running space. Running multiple virtual servers on one physical server improves machine utilization and thus reduces hardware purchasing costs, which makes virtualization an important way to build green data centers. A virtual-machine-based cloud platform lets users build their own business environments independently, runs stably, offers good scalability and mobility, and is widely used in finance, retail, digital marketing, education, government, enterprises and other fields.

The virtualized cloud platform structure has an open characteristic, and therefore a series of security problems related to virtual machines are derived. Resource data and applications running in the virtual machine are vulnerable to intruders. Therefore, the virtual machine needs more security mechanisms to accelerate the deployment of the large-scale cloud service. The first problem is how to accurately judge the behavior of the virtual machine in real time and judge whether the virtual machine is attacked maliciously.

At present, methods for addressing the operational security of virtual machines judge the running state of a virtual machine based on network traffic data, logs, or prior knowledge. The method based on network traffic data detects whether a virtual machine is under malicious attack by checking whether the data received by its network card contains malicious packets. This method requires the communication protocol to be parsable and cannot cope with unknown protocols. In addition, processing a large number of packets gives the method a high computational overhead. The method based on log analysis judges whether a virtual machine is under malicious attack by analyzing its system logs. However, logs inherently lag behind events, and the system must identify a series of activities and actions before it can determine that an intrusion has occurred, which is very unfavorable for stopping intrusions promptly. The method based on prior knowledge requires known attack behaviors and cannot cope with unknown vulnerabilities, unknown backdoors and unknown attacks.

To guarantee the security of virtual machines in real time, a fast and effective virtual machine behavior analysis method that does not depend on vulnerability libraries or attack libraries is urgently needed, in order to improve the accuracy and efficiency of threat discovery and to ensure the reliability, availability and security of virtual machines.

Disclosure of Invention

The invention aims to provide a virtual machine behavior analysis system based on deep learning and memory mirror image analysis, aiming at the defects of the prior art. The method and the system aim at internal attack and external attack, known attack and unknown attack in the network, ensure the safety of the virtual machine platform, give early warning to unknown threats in time, correctly judge the behavior of the virtual machine in real time, and improve the safety, reliability and usability of the cloud virtual machine.

The purpose of the invention is realized by the following technical scheme: a virtual machine behavior analysis system based on deep learning and memory mirror image analysis comprises the following steps:

(1) acquiring memory mirror image data, comprising the following substeps:

(1.1) at an initial time t₀, acquiring initial memory mirror image data by using a memory forensics tool to obtain an initial memory;

(1.2) at an arbitrary time t₀ + Δt, on a VirtualBox or VMware virtualization platform, according to the memory management mechanisms of the different operating systems, automatically sampling the memory mirror image data of each heterogeneous executor at the current moment under non-attacked conditions and under attacked conditions respectively, to obtain the current memory, namely a normal sample and a malicious sample.

(2) Performing delta encoding, comprising the sub-steps of:

(2.1) running a memory forensics tool on the initial memory obtained in the step (1.1), and using its pslist and dlllist commands to determine, respectively, the list of EXE type executable files and the list of DLL type dynamic link libraries in the initial memory;

(2.2) running the pslist and dlllist commands of the memory forensics tool on the current memory obtained in the step (1.2) to determine, respectively, the list of EXE type executable files and the list of DLL type dynamic link libraries in the current memory;

(2.3) analyzing the EXE type executable file and the DLL type dynamic link library list obtained in the steps (2.1) and (2.2), and determining an executable file which is in the current memory but not in the initial memory, namely a new executable file;

(2.4) generating a prediction memory for each new executable file according to the initial memory, comprising the following substeps:

(2.4.1) determining the process ID of each new executable file, and simultaneously determining the base address of the process in the virtual memory address space;

(2.4.2) for the process of each new executable file, running the memmap command of the memory forensics tool on the current memory, according to the process base address determined in the step (2.4.1), to extract the mapping relation between the virtual memory of the process and the physical memory;

(2.4.3) copying the new executable file from the virtual disk to the initial memory, and executing the following two steps for each virtual memory page of the new executable file: firstly, copying a new executable file to an initial memory by using the mapping relation between the virtual memory and the physical memory extracted in the step (2.4.2) when the virtual memory page is in the current memory; then, recording page copy information, including the source page position of the virtual memory page, the target page position in the physical memory and the page length; finally generating a prediction memory;

(2.5) outputting header information, including path information of the new executable file to be loaded and page copy information of all the new executable files extracted in the step (2.4.3);

(2.6) using the prediction memory generated in the step (2.4) as the source and the current memory as the comparison object, and encoding with xdelta3 to obtain the memory map of the encoded current memory mirror image data; M and N respectively represent the number of rows and the number of columns of the memory map, and I(i, j) ＝ (a, b, c) represents the element in the ith row and jth column of the memory map; wherein 0 ≤ i < M, 0 ≤ j < N, a, b and c are 32-bit floating point numbers, and I(i, j) is a three-dimensional vector;

(3) extracting the memory map feature point information obtained in the step (2.6), wherein the memory map feature point information comprises feature point positions, feature point sizes and feature strength of the feature points, and the method comprises the following substeps:

(3.1) constructing a Hessian matrix, specifically comprising the following steps: calculating a determinant of a Hessian matrix H (i, j) corresponding to each element in the memory map as a characteristic value of the element, wherein the calculation formula is as follows:

det(H(i,j))＝D_ii·D_jj-0.9D_ij·D_ij

wherein D_ii＝I(i+1,j)+I(i-1,j)-2I(i,j)，D_jj＝I(i,j+1)+I(i,j-1)-2I(i,j)，D_ij＝I(i+1,j)+I(i,j-1)-2I(i,j)；

(3.2) constructing a scale space by adopting a SURF mode: firstly, filtering an original image of a memory map by adopting a 9 x 9 box filter to be used as a bottom image; then gradually increasing the size of the box filter, and continuously filtering the original image of the memory map; finally, obtaining filter response graphs of different scales and constructing a scale space; the scale space has 4 layers, and the scaling ratio between the layers is 2;

(3.3) accurately positioning the characteristic points, specifically: in each 3 × 3 × 3 local region, performing non-maximum suppression on the scale space constructed in the step (3.2); comparing each element in the scale space with the characteristic values of 26 elements of the three-dimensional neighborhood of the element, wherein the elements with the characteristic values larger or smaller than the surrounding 26 elements are taken as characteristic points, and recording the positions (i, j) of the characteristic points and the scale s;

(3.4) determining map feature points and feature vectors according to the threshold, specifically: comparing the characteristic value of each characteristic point obtained in the step (3.3) under the corresponding scale with a preset threshold value, and if the corresponding characteristic value is smaller than the preset threshold value, the characteristic point is not taken as a final characteristic point; if the corresponding characteristic value is larger than or equal to a preset threshold value, taking the characteristic point as a final characteristic point, and expressing the characteristic vector as [ i, j, s, det (H (i, j, s)) ]; wherein, i and j are the line number and the column number of the final characteristic point in the memory map, s is the filter scale corresponding to the final characteristic point, and det (H (i, j, s)) is the characteristic value of the final characteristic point under the scale s;

(3.5) counting the feature vectors, specifically: judging the source of each feature vector obtained in the step (3.4), the source being either the memory mirror image data under non-attacked conditions or the memory mirror image data under attacked conditions from the step (1.2); determining a label z for each feature vector, wherein z = 0 indicates that the feature vector is derived from memory mirror image data under non-attacked conditions, and z = 1 indicates that the feature vector is derived from memory mirror image data under attacked conditions; finally, obtaining a feature vector sequence [ i, j, s, det (H (i, j, s)), z ];

(4) training a neural network, specifically: taking the characteristic vector sequence obtained in the step (3.5) as an input sample of the deep neural network, and training the deep neural network to obtain a virtual machine behavior classifier by taking whether the behavior of the virtual machine is normal as output;

(5) operating a neural network, and analyzing unknown virtual machine behaviors, specifically: and (4) analyzing the virtual machine with unknown running state by using the virtual machine behavior classifier obtained in the step (4), and judging whether the unknown virtual machine behavior is normal or not.

Further, the initial time t₀ in the step (1.1) is a positive real number.

Further, Δ t in the step (1.2) is a positive real number.

Further, in the step (1.2), a normal sample can be obtained by adopting a common memory mirror image means; and for the malicious samples, creating shared spaces for all the heterogeneous executors to store different types of malicious tool samples, and configuring simulated intrusion environments for all the heterogeneous executors so as to obtain memory mirror image data when the virtualization platform is attacked by different types.

Further, the 26 elements of the three-dimensional neighborhood of one element in the step (3.3) refer to the 8 elements on the same scale as the element and the 9 elements on each of the two scale layers immediately above and below the element.

Further, the threshold value preset in the step (3.4) depends on the number of features to be recognized, and the higher the threshold value is set, the fewer features can be recognized.

Further, the deep neural network in the step (4) is any one existing deep neural network structure.

The invention has the following beneficial effects: the method uses memory mirror image data analysis together with a deep learning mechanism to analyze the behavior attributes of a virtual machine through the coding features of its memory mirror image data. Compared with existing virtual platform state analysis methods, the method is simple to operate, easy to implement and convenient to modularize. It has a wide application range: it can detect multiple kinds of attack, including known and unknown attacks, and its detection performance is not affected even if an attacker withdraws after lying latent for a period of time. In addition, the invention offers good robustness, reliability and usability on different system platforms.

Drawings

FIG. 1 is a schematic diagram of a system model in an embodiment of the invention;

FIG. 2 is a flow chart of the method of the present invention.

Detailed Description

The technical scheme of the invention is described in detail by referring to the accompanying drawings and embodiments.

In consideration of the fact that the memory mirror image data can completely represent the running state of one virtual machine, the invention provides a virtual machine behavior analysis system based on deep learning and memory mirror image analysis by utilizing the memory mirror image data and combining a deep neural network.

As shown in fig. 1, the system model of the present embodiment is: a plurality of operating systems are run on a virtual platform, wherein the operating systems comprise WinServer, Ubuntu, CentOS and RedHat. By introducing a backdoor and virus malicious tool database into each operating system through a manual method, the memory mirror image data when each system is not attacked and the memory mirror image data after being attacked by different types can be obtained at any time. The method extracts the memory data characteristics by using the data through the memory map coding, and further judges whether the virtual machine behavior state is attacked or not by using the memory characteristics, wherein the process is shown as the attached figure 2, and the method specifically comprises the following steps:

step one, acquiring memory mirror image data; the specific process is as follows:

(1) at the initial time t₀ = 0, acquiring initial memory mirror image data by using a memory forensics tool;

(2) after a time Δt = 1, on a VirtualBox or VMware virtualization platform, according to the memory management mechanisms of the different operating systems, automatically sampling the memory mirror image data of each heterogeneous executor at the current moment under normal (non-attacked) conditions and under attacked conditions respectively, namely the normal samples and the malicious samples. Normal samples are obtained with a common memory mirror image method; for malicious samples, shared spaces are created for all the heterogeneous executors to store malicious tool samples of different types, and simulated intrusion environments are configured for all the heterogeneous executors, so as to obtain memory mirror image data when the virtual platform is subjected to different types of attack;

step two, delta coding is carried out; the specific process is as follows:

(1) running a memory forensics tool on the initial memory, and using its pslist and dlllist commands to determine, respectively, the list of EXE type executable files and the list of DLL type dynamic link libraries in the initial memory;

(2) running the pslist and dlllist commands of the memory forensics tool on the current memory to determine, respectively, the list of EXE type executable files and the list of DLL type dynamic link libraries in the current memory;

(3) analyzing the EXE/DLL list obtained in the last two steps, and determining executable (PE) files in the current memory but not in the initial memory;

(4) generating a prediction memory for each new PE according to the initial memory;

a) determining the process ID of each new PE, and simultaneously determining the base address of the process in the virtual memory address space;

b) for the process of each new PE, running the memmap command of the memory forensics tool on the current memory to extract the mapping relation between the virtual memory of the process and the physical memory;

c) copying the new PE from the corresponding file on the virtual disk to the initial memory; for each virtual memory page of the PE file, the following two steps are performed: firstly, if the page is in the current memory, copying the PE file to the initial memory by using the mapping relation between the virtual memory and the physical memory obtained in the step (4) b) in the step two; secondly, recording page copy information, including a source page position in the PE file, a target page position in the physical memory and a page length;

(5) outputting header information which comprises path information of the new PE needing to be loaded and all copy pages of each PE;

(6) using the predicted memory as the source and the current memory as the comparison object, and encoding with xdelta3 to obtain the memory map of the encoded current memory mirror image data; M and N respectively represent the number of rows and the number of columns of the map, I(i, j) ＝ (a, b, c) represents the element in the ith row and jth column of the map, 0 ≤ i < M, 0 ≤ j < N, a, b and c are 32-bit floating point numbers, and I(i, j) is a three-dimensional vector;
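Sub-steps (3) and (4) of step two can be illustrated with a minimal Python sketch. It assumes the executable lists have already been parsed from the pslist and dlllist output into plain path strings, and that the resident pages of a new PE are given as (file offset, physical address) pairs taken from the memmap output. The names `PageCopy`, `new_executables` and `build_predicted_memory`, and the 4096-byte default page size, are hypothetical and chosen only for illustration; they are not specified by the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageCopy:
    """One page copy record: where the page came from and where it was placed."""
    source_offset: int    # page position inside the PE file
    target_address: int   # page position in physical memory
    length: int           # number of bytes copied


def new_executables(initial_paths, current_paths):
    """Return the paths present in the current memory but not in the initial memory."""
    return sorted(set(current_paths) - set(initial_paths))


def build_predicted_memory(initial_memory, file_bytes, resident_pages, page_size=4096):
    """Copy the resident pages of one new PE into a copy of the initial memory.

    resident_pages is an iterable of (file_offset, physical_address) pairs
    (assumed to come from the virtual-to-physical mapping of the process).
    Returns the predicted memory and the list of page copy records.
    """
    predicted = bytearray(initial_memory)
    records = []
    for source_offset, target_address in resident_pages:
        chunk = file_bytes[source_offset:source_offset + page_size]
        if target_address + len(chunk) > len(predicted):
            raise ValueError("target page lies outside the memory image")
        predicted[target_address:target_address + len(chunk)] = chunk
        records.append(PageCopy(source_offset, target_address, len(chunk)))
    return bytes(predicted), records
```

The page copy records returned here correspond to the information that sub-step (5) places in the header next to the path of each new PE.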

step three, extracting memory map feature point information, including feature point positions, feature point sizes and feature strength of the feature points; the specific process is as follows:

(1) constructing a Hessian matrix;

the Hessian matrix is the core operator of the feature extraction algorithm. The Hessian matrix H of any function of two variables f(x, y) is expressed as:

H(f(x, y)) ＝ [ ∂²f/∂x²  ∂²f/∂x∂y ; ∂²f/∂x∂y  ∂²f/∂y² ]

the characteristic value of f(x, y) is given by the determinant of the matrix H:

det(H) ＝ (∂²f/∂x²)·(∂²f/∂y²) - (∂²f/∂x∂y)²

for the characteristic extraction process, in order to accelerate the calculation speed in practical application, the hessian matrix is solved in an approximate mode, and the determinant of the hessian matrix H (i, j) corresponding to the element of the ith row and the jth column in the memory mirror image map is calculated as follows:

det(H(i,j))＝D_ii·D_jj-0.9D_ij·D_ij；

where · denotes the vector dot product, i.e. the sum of the element-wise products, D_ii＝I(i+1,j)+I(i-1,j)-2I(i,j)，D_jj＝I(i,j+1)+I(i,j-1)-2I(i,j)，D_ij＝I(i+1,j)+I(i,j-1)-2I(i,j)；

Performing the calculation on each element in the memory mirror image map to obtain a determinant of a Hessian matrix corresponding to each pixel point in the map, namely a characteristic value of the pixel point;
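As an illustration of the approximate determinant above, the following NumPy sketch evaluates det(H(i, j)) for every interior element of an M × N × 3 memory map. The differences D_ii, D_jj and D_ij are coded exactly as written in the text. Setting the response of border elements to zero is an assumption made here, since the text does not say how elements without a full set of neighbours are handled, and the function name `hessian_det` is hypothetical.

```python
import numpy as np


def hessian_det(memory_map):
    """Approximate Hessian determinant for each element of an (M, N, 3) map.

    Border elements have no complete neighbourhood; their response is left
    at zero (an assumption, not stated in the source text).
    """
    I = np.asarray(memory_map, dtype=np.float32)
    M, N, _ = I.shape
    det = np.zeros((M, N), dtype=np.float32)

    centre = I[1:-1, 1:-1]
    # Differences as given in the text, each a 3-component vector.
    d_ii = I[2:, 1:-1] + I[:-2, 1:-1] - 2 * centre   # I(i+1,j) + I(i-1,j) - 2I(i,j)
    d_jj = I[1:-1, 2:] + I[1:-1, :-2] - 2 * centre   # I(i,j+1) + I(i,j-1) - 2I(i,j)
    d_ij = I[2:, 1:-1] + I[1:-1, :-2] - 2 * centre   # I(i+1,j) + I(i,j-1) - 2I(i,j)

    # "·" is the vector dot product: sum of element-wise products over (a, b, c).
    det[1:-1, 1:-1] = (d_ii * d_jj).sum(axis=-1) - 0.9 * (d_ij * d_ij).sum(axis=-1)
    return det
```

For a map that is zero everywhere except I(1, 1) = (1, 0, 0), all three differences at the centre equal (-2, 0, 0), so the response there is 4 - 0.9 × 4 = 0.4.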

(2) constructing a scale space;

the scale space is the representation of a map at different resolutions; in order to simulate the multi-scale features of image data, extreme points are searched for in both the spatial domain and the scale domain to determine preliminary feature points; to do this, a scale space needs to be constructed for the map, in which the feature values of the map at different scales are obtained by repeatedly convolving the two-variable function with Gaussian kernels;

the method adopts an SURF mode to construct a scale space; for any memory mirror image map, the size of an original image is kept unchanged, and the original image is filtered by changing the size of a template box to construct a scale space; meanwhile, SURF can adopt parallel operation to process each layer of image in the scale space simultaneously; convolving the filtering template with the integral image in the gradually increased box size, obtaining a response image by a Hessian matrix determinant corresponding to each pixel point, and constructing a pyramid;

firstly, adopting a response image obtained by a 9 x 9 box filter as an image of the bottom layer, then gradually increasing the size of the box and continuously carrying out filtering processing on the original image; dividing a scale space into 4 layers, wherein the scaling ratio between the layers is 2, and each layer comprises a filter response graph with different scales; each layer is processed by adopting gradually increasing filter size, so that a series of maps with different scales containing multiple layers are obtained;
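The box filtering described above can be sketched with an integral image (summed-area table), which gives each box sum from four table lookups regardless of the box size. This is a minimal NumPy sketch under stated assumptions: the box sizes (9, 19, 37, 73) roughly double from layer to layer while staying odd, which is one possible reading of "4 layers with a scaling ratio of 2"; borders are handled by replicating edge values; and the names `box_filter` and `build_scale_space` are hypothetical. The determinant response of substep (1) would then be computed on each filtered layer.

```python
import numpy as np

# Assumed box sizes: start at 9 x 9 and roughly double per layer, kept odd.
ASSUMED_BOX_SIZES = (9, 19, 37, 73)


def box_filter(image, size):
    """Mean-filter an (M, N, C) image with a size x size box via an integral image."""
    if size % 2 == 0:
        raise ValueError("box size must be odd")
    r = size // 2
    # Replicate edge values so the output keeps the original M x N size.
    padded = np.pad(np.asarray(image, dtype=np.float64),
                    ((r, r), (r, r), (0, 0)), mode="edge")
    integral = padded.cumsum(axis=0).cumsum(axis=1)
    # Prepend a zero row and column so that box sums need no special cases.
    integral = np.pad(integral, ((1, 0), (1, 0), (0, 0)))
    box_sum = (integral[size:, size:] - integral[:-size, size:]
               - integral[size:, :-size] + integral[:-size, :-size])
    return (box_sum / (size * size)).astype(np.float32)


def build_scale_space(image, sizes=ASSUMED_BOX_SIZES):
    """Filter the unchanged original image once per box size, bottom layer first."""
    return [box_filter(image, size) for size in sizes]
```

Because every layer is computed from the same original image, the layers are independent of one another and could be processed in parallel, as the text notes for SURF.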

(3) accurately positioning the characteristic points;

in each 3 × 3 × 3 local region, non-maximum suppression is performed; each pixel point is compared with its 8 neighbours on the same scale and with the 9 points on each of the two scale layers immediately above and below it; only extreme points that are larger or smaller than all 26 surrounding neighbourhood values are taken as feature points, and the feature point position (i, j) and the scale s are recorded;
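The 26-neighbour comparison can be written as a direct loop over the scale space. The sketch below assumes the determinant responses of all layers are stacked into one array indexed as [s, i, j]; the function name `local_extrema` is hypothetical. Only layers that have a layer both above and below them can contain feature points, so with 4 layers the two inner layers are searched.

```python
import numpy as np


def local_extrema(responses):
    """Find points that are strictly larger or strictly smaller than all 26 neighbours.

    responses: array of shape (S, M, N) holding det(H) per scale layer.
    Returns a list of (i, j, s) tuples.
    """
    responses = np.asarray(responses)
    S, M, N = responses.shape
    points = []
    for s in range(1, S - 1):
        for i in range(1, M - 1):
            for j in range(1, N - 1):
                cube = responses[s - 1:s + 2, i - 1:i + 2, j - 1:j + 2].ravel()
                centre = cube[13]                 # middle of the 3 x 3 x 3 cube
                neighbours = np.delete(cube, 13)  # the remaining 26 values
                if centre > neighbours.max() or centre < neighbours.min():
                    points.append((i, j, s))
    return points
```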

(4) determining map feature points and feature vectors according to a threshold;

for each feature point obtained in the previous step, the characteristic value of the point at the corresponding scale is compared with a preset threshold value. If the characteristic value is smaller than the preset threshold value, the point is not taken as a feature point; if the characteristic value is larger than or equal to the preset threshold value, the point is taken as a final feature point, and its feature vector is represented as [ i, j, s, det (H (i, j, s)) ], wherein i and j are the row number and the column number of the feature point in the map, s is the filter scale at which the point is taken as a feature point, and det (H (i, j, s)) is the characteristic value of the point at the scale s;
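The threshold test reduces to a simple filter over the candidate points. In this sketch the candidates are (i, j, s) tuples and the responses are indexed as [s][i][j]; both conventions and the name `select_features` are assumptions made for illustration.

```python
def select_features(candidates, responses, threshold):
    """Keep candidates whose response is at least the threshold.

    candidates: iterable of (i, j, s) tuples.
    responses:  nested sequence or array indexed as responses[s][i][j].
    Returns feature vectors of the form [i, j, s, det(H(i, j, s))].
    """
    features = []
    for i, j, s in candidates:
        value = float(responses[s][i][j])
        if value >= threshold:
            features.append([i, j, s, value])
    return features
```

Raising the threshold keeps fewer feature points, so its value trades the number of features against their strength.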

(5) counting the feature vectors;

determining a corresponding label z for each feature vector according to the source of the feature vector, namely the feature vector is from the memory mirror image data which is not attacked or the memory mirror image data which is attacked, wherein z is 0 to represent that the feature vector is from the memory mirror image data which is not attacked, and z is 1 to represent that the feature vector is from the memory mirror image data which is attacked;

at this moment, the memory map after Delta coding is abstracted into a specific coded tagged feature vector sequence through feature extraction;
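Labelling only appends z to each feature vector according to the kind of memory image it came from. A minimal sketch, with the hypothetical helper names `label_features` and `build_dataset`:

```python
def label_features(features, attacked):
    """Append the label z to each [i, j, s, det] vector: 1 if attacked, else 0."""
    z = 1 if attacked else 0
    return [list(vector) + [z] for vector in features]


def build_dataset(normal_features, malicious_features):
    """Combine both sources into one labelled feature vector sequence."""
    return (label_features(normal_features, attacked=False)
            + label_features(malicious_features, attacked=True))
```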

step four, training a neural network;

one input sample of the deep neural network is denoted as [ i, j, s, det (H (i, j, s)), z ]; selecting an existing deep neural network structure to train to obtain a classifier, and analyzing the behavior of an unknown virtual machine in actual operation;

step five, running the neural network and analyzing unknown virtual machine behavior;

the virtual machine with unknown running state is analyzed by using the neural network trained in step four, and it is judged whether the behavior of the unknown virtual machine is normal or not.
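Steps four and five leave the network structure open, so the following is only one possible sketch: a small fully connected network with one hidden layer, written in plain NumPy and trained by full-batch gradient descent on the cross-entropy loss. The layer width, learning rate, epoch count, input standardization and the name `train_classifier` are all assumptions made for illustration. The sketch classifies individual feature vectors; how per-vector outputs are combined into a verdict for a whole virtual machine is not specified in the text and is not modelled here.

```python
import numpy as np


def train_classifier(X, z, hidden=8, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer network on rows [i, j, s, det] with labels z.

    Returns a predict(X_new) function that outputs 0 (normal) or 1 (attacked).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(z, dtype=np.float64).reshape(-1, 1)

    # Standardize inputs so positions, scales and responses are comparable.
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8
    Xn = (X - mean) / std

    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)

    for _ in range(epochs):
        H = np.tanh(Xn @ W1 + b1)                      # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))       # predicted P(z = 1)
        g_out = (p - y) / len(y)                       # dLoss/dlogit for mean cross-entropy
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)        # backpropagate through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (Xn.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def predict(X_new):
        Xs = (np.asarray(X_new, dtype=np.float64) - mean) / std
        H = np.tanh(Xs @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        return (p >= 0.5).astype(int).ravel()

    return predict
```

In use, the first four columns of the labelled sequence from step three would form `X` and the last column `z`; feature vectors extracted from a virtual machine in an unknown state would then be passed to the returned `predict` function.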

The above is an embodiment of the present invention, and the present invention is not limited by the above embodiment, and the specific implementation method may be determined by combining the technical scheme of the present invention with an actual application scenario.

Claims

1. A virtual machine behavior analysis system based on deep learning and memory mirror image analysis is characterized by comprising the following steps:

(1) acquiring memory mirror image data, comprising the following substeps:

(1.1) at an initial time t₀, acquiring initial memory mirror image data by using a memory forensics tool to obtain an initial memory;

(1.2) at an arbitrary time t₀ + Δt, on a VirtualBox or VMware virtualization platform, according to the memory management mechanisms of the different operating systems, automatically sampling the memory mirror image data of each heterogeneous executor under non-attacked and attacked conditions at the current moment respectively, to obtain the current memory, namely a normal sample and a malicious sample;

(2) performing delta encoding, comprising the sub-steps of:

(2.1) operating a memory forensics tool, and respectively determining an EXE type executable file and a DLL type dynamic link library list in the initial memory by using a pslist command and a dlllist command for the initial memory obtained in the step (1.1);

(2.4.3) copying the new executable file from the virtual disk to the initial memory, and executing the following two steps for each virtual memory page of the new executable file: firstly, copying a new executable file to an initial memory by using the mapping relation between the virtual memory and the physical memory extracted in the step (2.4.2); then, recording page copy information, including the source page position of the virtual memory page, the target page position in the physical memory and the page length; finally generating a prediction memory;

det(H(i,j))＝D_ii·D_jj-0.9D_ij·D_ij

2. The system according to claim 1, wherein the initial time t₀ in the step (1.1) is a positive real number.

3. The system according to claim 1, wherein Δt in the step (1.2) is a positive real number.

4. The virtual machine behavior analysis system based on deep learning and memory mirror image analysis as claimed in claim 1, wherein in the step (1.2), the normal sample can be obtained by using a common memory mirror image method; and for the malicious samples, creating shared spaces for all the heterogeneous executors to store different types of malicious tool samples, and configuring simulated intrusion environments for all the heterogeneous executors so as to obtain memory mirror image data when the virtualization platform is attacked by different types.

5. The virtual machine behavior analysis system based on deep learning and memory mirror image analysis as claimed in claim 1, wherein the 26 elements in the three-dimensional neighborhood of one element in the step (3.3) refer to the 8 elements on the same scale as the element and the 9 elements on each of the two scale layers immediately above and below the element.

6. The system according to claim 1, wherein the threshold preset in the step (3.4) depends on the number of features to be identified, and the higher the threshold is set, the fewer features can be identified.

7. The virtual machine behavior analysis system based on deep learning and memory mirror analysis of claim 1, wherein the deep neural network in step (4) is any one of existing deep neural network structures.