CN114925363A - Cloud online malicious software detection method based on recurrent neural network - Google Patents

Cloud online malicious software detection method based on recurrent neural network

Info

Publication number
CN114925363A
Authority
CN
China
Prior art keywords
virtual machine
qualified
cloud
tenant administrator
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210520540.7A
Other languages
Chinese (zh)
Other versions
CN114925363B (en)
Inventor
徐琛
赵哲锋
梁雄伟
张保玉
张鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Silk Road Information Port Cloud Computing Technology Co ltd
Original Assignee
Silk Road Information Port Cloud Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Silk Road Information Port Cloud Computing Technology Co ltd
Priority to CN202210520540.7A
Publication of CN114925363A
Application granted
Publication of CN114925363B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Biophysics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Virology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a cloud online malicious software detection method based on a recurrent neural network. The method comprises: verifying a virtual machine and obtaining a qualified virtual machine once verification passes; tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track; extracting features from the tracking track to obtain a calling feature matrix database; generating a decision model from the calling feature matrix database and using it to test the operation behavior of the cloud tenant administrator, obtaining a test result; and, when the test result is abnormal, performing exception handling on the qualified virtual machine. Because the decision model is deployed outside the qualified virtual machine that it inspects, the method addresses the poor detection performance of existing malicious software detection methods.

Description

Cloud online malicious software detection method based on recurrent neural network
Technical Field
The invention relates to the technical field of data processing, in particular to a cloud online malicious software detection method based on a recurrent neural network.
Background
Malware is one of the most common threats in the cloud, and some malware is designed specifically to attack virtual machines running in a cloud environment.
Existing malicious software detection methods place security software inside the virtual machine. Because that security software runs with the same privilege as the virtual machine's own system, it is easily attacked and deceived, which reduces its effectiveness at detecting malicious software.
Disclosure of Invention
The invention provides a cloud online malicious software detection method based on a recurrent neural network, with the aim of solving the poor detection performance of existing malicious software detection methods.
To achieve this purpose, the cloud online malicious software detection method based on a recurrent neural network comprises the following steps:
verifying the virtual machine, and obtaining a qualified virtual machine after the virtual machine is qualified;
tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track;
extracting the features of the tracking track to obtain a calling feature matrix database;
generating a decision model by using the calling feature matrix database to test the operation behavior of the cloud tenant administrator to obtain a test result;
and when the test result is abnormal, performing exception handling on the qualified virtual machine.
Verifying the virtual machine and obtaining a qualified virtual machine once verification passes is specifically carried out as follows:
calling a process log of the cloud tenant administrator;
generating a process directory of the virtual machine;
and verifying the process log against the process directory, and obtaining a qualified virtual machine once verification passes.
Tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track is specifically carried out as follows:
capturing the running of the cloud tenant administrator's processes from the monitoring program of the qualified virtual machine, through an introspection mechanism based on a software breakpoint execution technique, to obtain a capture position;
and performing continuous memory read operations on the qualified virtual machine from the capture position to generate the tracking track of the process execution.
Extracting features from the tracking track to obtain the calling feature matrix database is specifically carried out as follows:
extracting continuous system call sequences from the tracking track through a sliding window;
setting a feature vector based on the continuous system call sequences;
generating a matrix based on the feature vectors;
and performing feature extraction on the matrix using a binary particle swarm algorithm to obtain the calling feature matrix database.
Generating a decision model from the calling feature matrix database and testing the operation behavior of the cloud tenant administrator is specifically carried out as follows:
using a random forest classifier as a learning model;
training the learning model with the calling feature matrix database to obtain a decision model;
and testing the operation behavior of the cloud tenant administrator with the decision model to obtain a test result.
Performing exception handling on the qualified virtual machine when the test result is abnormal is specifically carried out as follows:
marking the test result to obtain a marking result;
judging whether the marking result is abnormal, and generating an alarm signal when it is;
and performing exception handling on the qualified virtual machine based on the alarm signal.
Performing exception handling on the qualified virtual machine based on the alarm signal is specifically carried out as follows:
terminating the monitored program, isolating the qualified virtual machine from the cloud environment, or terminating the qualified virtual machine under the cloud tenant administrator.
In the cloud online malicious software detection method based on a recurrent neural network, the virtual machine is verified and a qualified virtual machine is obtained once verification passes; the process of the cloud tenant administrator is tracked through the qualified virtual machine to obtain a tracking track; features are extracted from the tracking track to obtain a calling feature matrix database; a decision model is generated from the calling feature matrix database and used to test the operation behavior of the cloud tenant administrator, giving a test result; and when the test result is abnormal, exception handling is performed on the qualified virtual machine. Because the decision model is deployed outside the qualified virtual machine that it inspects, the method solves the problem that existing malicious software detection methods detect malicious software poorly.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a cloud online malware detection method based on a recurrent neural network provided by the present invention.
Fig. 2 is a flowchart of verifying the virtual machine and obtaining a qualified virtual machine once verification passes.
Fig. 3 is a flowchart of tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track.
Fig. 4 is a flowchart of extracting features from the tracking track to obtain the calling feature matrix database.
Fig. 5 is a flowchart of generating a decision model from the calling feature matrix database and testing the operation behavior of the cloud tenant administrator to obtain a test result.
Fig. 6 is a flowchart of performing exception handling on the qualified virtual machine when the test result is abnormal.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention, and should not be construed as limiting the present invention.
Referring to Figs. 1 to 6, the present invention provides a cloud online malware detection method based on a recurrent neural network, including the following steps:
S1, verifying the virtual machine, and obtaining a qualified virtual machine once verification passes;
The specific method is as follows:
S11, calling a process log of the cloud tenant administrator;
Specifically, VMShield invokes the Virtual Machine Process Verification (VMPV) component to generate a process log for the cloud tenant administrator. Programs such as strace (Linux) or tasklist (Windows) are installed in the guest mode of the cloud tenant administrator's operating system. The log is updated at fixed time intervals and can be accessed by the cloud service provider. The generated log is stored in a vlog file and then sent to the process directory verification module of the VMPV.
S12, generating a process directory of the virtual machine;
Specifically, a program's communication with the operating system is maintained in the form of a directory. This directory contains a list of processes and process-related information, such as process IDs, addresses, and allocated memory, as well as the messages exchanged between the program and the operating system. Each time a process is executed, an entry is created in the operating system directory. The cloud tenant administrator can likewise record every process it executes as an entry.
S13, verifying the process log against the process directory, and obtaining a qualified virtual machine once verification passes.
Specifically, to map the actual memory usage of the cloud tenant administrator, the virtual machine process directory generation component is called from the virtual machine monitor. The component uses LibVMI, a virtual machine introspection API derived from the XenAccess library, which can access the registers of the vCPU and the monitored virtual memory.
The process directory is then verified. The main objective of process directory verification is to check the security of the cloud tenant administrator and its virtual machine by comparing the vlog file, generated in the process log generation stage, with the hlog file, generated in the virtual machine process directory generation stage, so as to detect any malicious activity in the monitored cloud tenant administrator.
The communication of programs with the operating system (OS) is maintained in the form of a directory. This directory contains a list of processes and process-related information, such as process IDs, addresses, and allocated memory, as well as the messages exchanged between the programs and the operating system. Each time a process is executed, an entry is created in the OS directory. Similarly, the cloud tenant administrator can record every process it executes as an entry. Initially, some malicious rootkits were extracted using non-malicious applications. VMShield invokes the virtual machine process verification component to generate a process log for the cloud tenant administrator. Programs such as strace (Linux) or tasklist (Windows) are installed in the guest mode of the cloud tenant administrator's operating system. The log is updated at fixed time intervals and can be accessed by the cloud service provider. The generated log is stored in a vlog file and then sent to the virtual machine process directory verification module of the virtual machine process verification component. This log cannot be fully trusted, however: if a virtual machine or cloud tenant administrator is infected with a rootkit or other malicious program, the malicious processes running inside the virtual machine are likely to be hidden from the log.
To map the actual memory usage of the cloud tenant administrator, the virtual machine process directory generation module is called from the virtual machine monitor. It uses LibVMI, a virtual machine introspection API derived from the XenAccess library, which can access the vCPU registers and the virtual memory of the monitored cloud tenant administrator. The virtual machine process directory generation module saves and accesses the memory of the cloud tenant administrator and also helps to monitor and control the protected addresses of the operating system. It is therefore useful for developing virtual machine monitors or security applications based on virtual machine introspection. VMShield calls the virtual machine process directory generation module and uses LibVMI to map all processes running on the cloud tenant administrator. For the Windows operating system, the LibVMI win-guid tool is first used to obtain kernel debugging information. The tool takes the domain name of the cloud tenant administrator as input and produces the debugging information for the kernel program database (PDB) file, called ntoskrnl.pdb, along with the Globally Unique Identifier (GUID) information of that PDB file. This information is used to provide the kernel symbols mapped to their addresses. The Rekall 1.7.1 memory forensics framework then creates a Rekall configuration file (win.…). The Rekall configuration file is machine readable and easier to parse than the PDB file. The Rekall configuration file and the debug information are obtained using the fetch-pdb and parse-pdb plug-ins. VMShield then configures LibVMI to extract information about the Windows kernel functions and their addresses. This is a standard way of acquiring the details of kernel data structures and works across different versions of Windows. For a Linux operating system, the detailed kernel symbol information and the kernel symbol table of the system are used.
The symbol table contains the symbol names and their addresses in memory, which are mapped to the correct page table, which in turn maps to the unique physical address of each process. For both Windows and Linux operating systems, a directory of these processes is then created, stored in the hlog file, and sent to the process directory verification module. The process directory verification module parses the hlog file to obtain the information required to verify the processes running on the cloud tenant administrator.
When a cloud tenant administrator becomes infected with malware or rootkits, many processes in the virtual environment are killed, including automatic update and security check processes. Once such intrusions or malware bypass security checks and penetrate virtual machines, they threaten the data security and infrastructure of the cloud enterprise and its customers. In this module, all processes are validated, even those that pose no threat to the VM. The main objective of the process directory verification module is to check the security of the cloud tenant administrator and its virtual machine by comparing the vlog file and the hlog file, generated in the process log generation stage and the virtual machine process directory generation stage respectively, so as to detect any malicious activity in the monitored cloud tenant administrator. Some malware processes disable security tools in the cloud tenant administrator and hide their presence. Such malware is readily tracked by the process directory verification module. If any hidden process is found, the virtual machine of the cloud enterprise is concluded to be compromised, and an alert is sent to the cloud administrator to take the relevant action. If the VM passes this security check, the next module (TPSCI) is invoked.
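The comparison at the heart of process directory verification can be pictured as a set difference between what the guest reports and what introspection sees. The Python sketch below is illustrative only: the function names and the representation of the vlog and hlog contents as PID-to-name dictionaries are assumptions, since the text does not specify the file formats.

```python
from typing import Dict, Set


def find_hidden_processes(vlog: Dict[int, str], hlog: Dict[int, str]) -> Set[int]:
    """Return PIDs present in the hypervisor-side log (hlog) but missing
    from the guest-side log (vlog).

    Both arguments map PID -> process name; this layout is hypothetical.
    """
    return set(hlog) - set(vlog)


def vm_is_qualified(vlog: Dict[int, str], hlog: Dict[int, str]) -> bool:
    """A VM passes verification only if no process is hidden from the guest log."""
    return not find_hidden_processes(vlog, hlog)


# Processes reported from inside the guest (made-up sample data).
vlog = {1: "init", 420: "sshd", 977: "nginx"}
# Processes reconstructed from memory by introspection; PID 1337 is hidden in the guest.
hlog = {1: "init", 420: "sshd", 977: "nginx", 1337: "rootkit_helper"}

hidden = find_hidden_processes(vlog, hlog)
print(sorted(hidden))               # PIDs to report to the cloud administrator
print(vm_is_qualified(vlog, hlog))  # whether the next module would be invoked
```

A real comparison would likely also check names, addresses, and memory allocations per entry; only the PID check is shown here.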
S2, tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track;
The specific method is as follows:
S21, capturing the running of the cloud tenant administrator's processes from the monitoring program of the qualified virtual machine, through an introspection mechanism based on a software breakpoint execution technique, to obtain a capture position;
Specifically, DRAKVUF, an introspection mechanism based on a kernel debugging method, is used to extract the system call sequence from the virtual machine monitor, outside the cloud tenant administrator. Using a software breakpoint execution technique, DRAKVUF captures the running of processes on the cloud tenant administrator from the virtual machine monitor.
After a breakpoint is executed, a software debug exception is raised on the vCPU. The vCPU invokes the VMEXIT operation in the exception handler and transfers control to the virtual machine monitor.
S22, performing continuous memory read operations on the qualified virtual machine from the capture position to generate the tracking track of the process execution.
Specifically, a complete trace of the process execution is generated by performing successive memory read operations from the captured position.
At this stage of VMShield, the runtime behavior of the monitored program is extracted in the form of a sequence of system calls. This phase mainly helps to generate a data set of traces, or system call logs (series of system calls), of benign programs and malware, produced by advanced memory introspection of the monitored cloud tenant administrator. After traces have been generated for the various monitored programs, the traces or system call logs are parsed and cleaned, because they contain a large amount of irrelevant information and noise. The parsed file contains the system calls made by each process ID (PID), such as write(), read(), open(), and close(), which are relevant to understanding the behavior of the program. These system calls are extracted and the trace information is stored by PID. DRAKVUF is an introspection mechanism based on a kernel debugging method and is used to extract the system call sequence from the virtual machine monitor, outside the cloud tenant administrator. Using a software breakpoint execution technique, it captures the running of processes on the cloud tenant administrator from the virtual machine monitor.
After a breakpoint is executed, a software debug exception is raised on the vCPU. The vCPU invokes a VMEXIT operation in the exception handler and transfers control to the virtual machine monitor. The event is then forwarded to Dom0, which runs the TPSCI. The TPSCI then generates a complete trace of the process execution by performing successive memory read operations from the captured position. VMShield uses the Rekall 1.7.1 forensics framework to generate a Rekall profile for the cloud tenant administrator. This Rekall profile provides the locations of the kernel functions to be captured. Rekall supports plug-ins for various operating systems and parses debug data. The Rekall profile created for the Windows operating system (win.…) is generated from the kernel PDB file ntoskrnl.pdb; for Linux operating systems, it is generated from system.… VMShield further analyzes these traces to determine intrusion behavior. The extracted system call sequences and PIDs are used as inputs to the third stage of VMShield.
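The parsing and cleaning step, in which system calls are kept per PID and irrelevant entries are dropped, might look like the following Python sketch. The record layout `(pid, syscall_name)`, the allow-list `RELEVANT`, and the function name `group_by_pid` are all assumptions made for illustration; they are not the output format of any particular introspection tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical allow-list of system calls considered relevant to behavior.
RELEVANT = {"read", "write", "open", "close"}


def group_by_pid(records: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Keep only relevant system calls and store them, in order, per PID."""
    traces: Dict[int, List[str]] = defaultdict(list)
    for pid, name in records:
        if name in RELEVANT:          # drop noise such as unrelated calls
            traces[pid].append(name)
    return dict(traces)


# Interleaved records from two processes (made-up sample data).
records = [
    (10, "open"), (11, "read"), (10, "read"),
    (10, "mmap"), (11, "write"), (10, "close"),
]
traces = group_by_pid(records)
for pid, calls in sorted(traces.items()):
    print(pid, calls)
```

Each per-PID list is then the input trace for the feature extraction stage.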
S3, extracting the features of the tracking track to obtain a calling feature matrix database;
The specific method is as follows:
S31, extracting continuous system call sequences from the tracking track through a sliding window;
Specifically, continuous system call sequences of n items each (n-grams) are obtained through a sliding window.
S32, setting a feature vector based on the continuous system call sequences;
Specifically, let X_t be the feature vector of the tracking track t, denoted X_t = (n-gram_1, n-gram_2, …, n-gram_m), where each n-gram is a feature extracted from the trace and m is the number of n-grams generated for tracking track t. Each n-gram_i of the feature vector can be expressed as <W_1 W_2 W_3 W_4 W_5 W_6>, where 1 ≤ i ≤ m. After the n-grams have been extracted from the tracking track as features, a feature matrix X'_t is created in which each feature is converted into the number of occurrences c of that feature in the trace. The feature matrix can be expressed as X'_t = <C_1 C_2 … C_z>, where z is the number of features in the tracking track. The generated feature matrix is written to the processed system call feature matrix (PSCFM), which also records the label of each trace; in the present invention the labels are intrusion and normal. All traces are preprocessed in the same way, and previous entries are cleared from the buffer before new values are stored.
S33, generating a matrix based on the feature vectors;
Specifically, the PSCFM is ultimately an n × z matrix, where n is the number of rows, equal to the number of traces, and each row can be expressed as <X'_t, L ∈ {0, 1}>. L is the label of the row's feature matrix X'_t: 1 denotes a malicious program and 0 denotes a normal program. The PSCFM represents the behavior of a virtual machine in different scenarios.
S34, performing feature extraction on the matrix using a binary particle swarm algorithm to obtain the calling feature matrix database.
Specifically, the invention uses binary particle swarm optimization (BPSO) for feature extraction. BPSO helps to reduce dimensionality and provides distinctive features for the learning model. The advantage of BPSO over other algorithms such as Principal Component Analysis (PCA) is its robustness and its computational efficiency in parameter control. To find the optimal solution, each particle moves in the direction of its previous best (pbest) position in equation (1) and the global best (gbest) position in equation (2). As a result, the swarm converges to an optimal solution. On this basis, an optimal PSCFM database (the calling feature matrix database) is created.
[Equation (1), defining the previous best position pbest: rendered as an image in the original publication (Figure BDA0003641376590000081).]
gbest(c_p) = argmin[f(X_j(m))], m = 1, 2, …, c_p and j = 1, …, N    (2)
where N is the total number of particles in the swarm, c_p is the current iteration number, f() is the fitness function, and X is a position. The position X of a particle is updated by the following equation:
[Position-update equation: rendered as an image in the original publication (Figure BDA0003641376590000082).]
where V denotes the velocity and S(x) denotes the sigmoid function, i.e.
[Sigmoid equation: rendered as an image in the original publication (Figure BDA0003641376590000083).]
In this module, the execution trace of the monitored program is extracted using TPSCI. The execution trace is used to extract features and generate the Processed System Call Feature Matrix (PSCFM). An n-gram is a contiguous sequence of n items taken from a given system call sequence. The n-grams are obtained with a small sliding window (of size k, say) that shifts by 1 each time. Each movement of the window generates one n-gram pattern for the parsed system calls.
As an example, consider a trace containing the system call sequence read(), read(), write(), write(), write(), read(), write(), open(), read(), write(), close(). For a sliding window of k = 6 and a window shift of 1, this sequence generates the following n-grams:
n-gram1: read(), read(), write(), write(), write(), read()
n-gram2: read(), write(), write(), write(), read(), write()
n-gram3: write(), write(), write(), read(), write(), open()
n-gram4: write(), write(), read(), write(), open(), read()
n-gram5: write(), read(), write(), open(), read(), write()
n-gram6: read(), write(), open(), read(), write(), close()
The n-grams obtained from the traces are also used to evaluate the frequency of each pattern for a given process.
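The sliding-window extraction can be sketched in a few lines of Python. The function name `sliding_ngrams` is an assumption introduced for illustration, and the input sequence is chosen so that it reproduces the six n-grams listed above for a window of size 6 and a shift of 1.

```python
from typing import List, Sequence, Tuple


def sliding_ngrams(calls: Sequence[str], k: int) -> List[Tuple[str, ...]]:
    """Slide a window of size k over the call sequence, shifting by 1 each time."""
    if k <= 0:
        raise ValueError("window size k must be positive")
    # A trace shorter than k yields no n-grams (the range is empty).
    return [tuple(calls[i:i + k]) for i in range(len(calls) - k + 1)]


trace = ["read", "read", "write", "write", "write", "read",
         "write", "open", "read", "write", "close"]

ngrams = sliding_ngrams(trace, k=6)
for i, gram in enumerate(ngrams, start=1):
    print(f"n-gram{i}: " + ", ".join(f"{call}()" for call in gram))
```

A trace of length T produces T - k + 1 n-grams, so the 11 calls here give 6 windows.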
Let X_t be the feature vector of the tracking track t, denoted X_t = (n-gram_1, n-gram_2, …, n-gram_m), where each n-gram is a feature extracted from the trace and m is the number of n-grams generated for tracking track t. Each n-gram_i of the feature vector can be expressed as <W_1 W_2 W_3 W_4 W_5 W_6>, where 1 ≤ i ≤ m. After the n-grams have been extracted from the tracking track as features, a feature matrix X'_t is created in which each feature is converted into the number of occurrences c of that feature in the trace. The feature matrix can be expressed as X'_t = <C_1 C_2 … C_z>, where z is the number of features in the tracking track. The generated feature matrix is written to the processed system call feature matrix (PSCFM), which also records the label of each trace; in the present invention the labels are intrusion and normal. All traces are preprocessed in the same way, and previous entries are cleared from the buffer before new values are stored. The PSCFM is ultimately an n × z matrix, where n is the number of rows, equal to the number of traces, and each row can be expressed as <X'_t, L ∈ {0, 1}>. L is the label of the row's feature matrix X'_t: 1 denotes a malicious program and 0 denotes a normal program. The PSCFM represents the behavior of a virtual machine in different scenarios.
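A minimal Python sketch of building such a count matrix follows. It assumes the feature columns are the distinct n-grams observed across all traces, in sorted order, and that the label L is appended as the last column; the text does not fix either choice, and the names `build_pscfm` and `ngrams` are illustrative. A window of 2 is used only to keep the example small.

```python
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(calls: Sequence[str], k: int) -> List[NGram]:
    """All windows of size k, shifted by 1."""
    return [tuple(calls[i:i + k]) for i in range(len(calls) - k + 1)]


def build_pscfm(traces: Sequence[Sequence[str]], labels: Sequence[int],
                k: int) -> Tuple[List[NGram], List[List[int]]]:
    """Return (columns, rows); each row is <C_1 ... C_z, L> for one trace."""
    counts = [Counter(ngrams(trace, k)) for trace in traces]
    columns = sorted(set().union(*counts))        # the z distinct features
    rows = [[count[g] for g in columns] + [label]  # occurrences, then label
            for count, label in zip(counts, labels)]
    return columns, rows


traces = [
    ["open", "read", "read", "close"],             # labelled normal (0)
    ["open", "write", "write", "write", "close"],  # labelled malicious (1)
]
columns, rows = build_pscfm(traces, labels=[0, 1], k=2)
for row in rows:
    print(row)
```

Here the matrix has two rows (one per trace) and six feature columns plus the label.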
The invention uses Binary Particle Swarm Optimization (BPSO) for feature extraction. BPSO helps to reduce dimensionality and provides distinctive features for the learning model. The advantage of BPSO over other algorithms such as Principal Component Analysis (PCA) is its robustness and its computational efficiency in parameter control. To find the optimal solution, each particle moves in the direction of its previous best (pbest) position in equation (1) and the global best (gbest) position in equation (2). As a result, the swarm converges to an optimal solution.
[Equation (1), defining the previous best position pbest: rendered as an image in the original publication (Figure RE-GDA0003754113440000091).]
gbest(c_p) = argmin[f(X_j(m))], m = 1, 2, …, c_p and j = 1, …, N    (2)
where N is the total number of particles in the swarm, c_p is the current iteration number, f() is the fitness function, and X is a position. The position X of a particle is updated by the following equation:
[Position-update equation: rendered as an image in the original publication (Figure BDA0003641376590000092).]
where V denotes the velocity and S(x) denotes the sigmoid function, i.e.
[Sigmoid equation: rendered as an image in the original publication (Figure BDA0003641376590000093).]
BPSO is used for feature selection in discrete feature classification problems. One advantage of BPSO is that it does not rely on gradient descent, so it can handle non-linear problems and does not require the objective to be differentiable. After feature selection, an optimal PSCFM database is created.
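For orientation, the following Python sketch shows the commonly used textbook form of binary PSO, in which each bit of a particle's position is set to 1 with probability S(V), where S(x) = 1 / (1 + e^(-x)). This is an assumption about the general technique rather than a transcription of the patent's equations, which are not legible in this text. The inertia weight `w`, the coefficients `c1` and `c2`, the velocity clamp `v_max`, and the toy fitness function are all hypothetical choices; the fitness is minimized, in line with the argmin in equation (2).

```python
import math
import random
from typing import Callable, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bpso(fitness: Callable[[List[int]], float], n_features: int,
         n_particles: int = 20, n_iter: int = 50, w: float = 0.7,
         c1: float = 1.5, c2: float = 1.5, v_max: float = 4.0,
         seed: int = 0) -> Tuple[List[int], float]:
    """Minimize `fitness` over binary feature masks; return (best mask, best value)."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    V = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [x[:] for x in X]                      # best position per particle
    pbest_f = [fitness(x) for x in X]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]       # best position of the swarm
    for _ in range(n_iter):
        for j in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * V[j][d]
                     + c1 * r1 * (pbest[j][d] - X[j][d])
                     + c2 * r2 * (gbest[d] - X[j][d]))
                V[j][d] = max(-v_max, min(v_max, v))   # clamp the velocity
                # Binary position update: bit is 1 with probability S(V).
                X[j][d] = 1 if rng.random() < sigmoid(V[j][d]) else 0
            f = fitness(X[j])
            if f < pbest_f[j]:
                pbest[j], pbest_f[j] = X[j][:], f
                if f < gbest_f:
                    gbest, gbest_f = X[j][:], f
    return gbest, gbest_f


# Toy fitness (hypothetical): distance from a mask of "useful" features.
TARGET = [1, 0, 1, 0, 0, 1, 0, 0]


def toy_fitness(mask: List[int]) -> float:
    return float(sum(a != b for a, b in zip(mask, TARGET)))


mask, value = bpso(toy_fitness, n_features=len(TARGET))
print(mask, value)
```

In a real feature-selection setting the fitness would typically be a classifier's validation error on the columns of the PSCFM selected by the mask.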
S4, generating a decision model by using the calling feature matrix database to test the operation behavior of the cloud tenant administrator to obtain a test result;
The specific method is as follows:
S41, using a random forest classifier as a learning model;
Specifically, a Random Forest (RF) classifier is used as the learning model to learn the behavior of programs running within the monitored cloud tenant administrator.
S42, training the learning model with the calling feature matrix database to obtain a decision model;
Specifically, the RF model is trained using the optimal PSCFM (calling feature matrix) database.
S43, testing the operation behavior of the cloud tenant administrator with the decision model to obtain a test result.
Specifically, once the model has been trained, a decision model is generated and used to test the operation behavior of the cloud tenant administrator. It classifies each trace as benign or malicious according to its behavior. The results of this phase are sent to the next phase, which informs the cloud administrator of the final analysis of the trace.
At this stage, the learning model is trained using the optimal PSCFM database. Cross-validation is used to train and test the learning model. A Random Forest (RF) classifier is used as the learning model to learn the behavior of programs running within the monitored cloud tenant administrator. The trained data set serves as a profile of the cloud tenant administrator. The benefit of RF is that it helps avoid overfitting the data set: it is a collection of Decision Trees (DTs) whose results are aggregated into one final result, which reduces errors due to variance and bias. Once the model has been trained, a decision model is generated for testing the runtime behavior of the cloud tenant administrator. It classifies each trace as benign or malicious according to its behavior. The results of this phase are sent to the next phase, which informs the cloud administrator of the final analysis of the trace.
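To make the aggregation idea concrete, the sketch below implements a deliberately reduced random forest in pure Python: each tree is a depth-1 decision tree (a stump) trained on a bootstrap sample with a random subset of features, and predictions are combined by majority vote. This is a simplification for illustration, not the patent's model; full-depth trees and cross-validation are omitted, and all names and the sample data are assumptions. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (feature index, threshold, label if value <= threshold, label otherwise)
Stump = Tuple[int, float, int, int]


def majority(labels: Sequence[int]) -> int:
    """Most common label; ties resolve to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
              features: Sequence[int]) -> Stump:
    """Pick the (feature, threshold) split with the fewest training errors."""
    fallback = majority(y)  # used when no split separates the sample
    best, best_err = (features[0], float("inf"), fallback, fallback), len(y) + 1
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            lo_lab, hi_lab = majority(left), majority(right)
            err = (sum(lab != lo_lab for lab in left)
                   + sum(lab != hi_lab for lab in right))
            if err < best_err:
                best, best_err = (f, thr, lo_lab, hi_lab), err
    return best


def fit_forest(X: Sequence[Sequence[float]], y: Sequence[int],
               n_trees: int = 25, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(d ** 0.5))                       # features tried per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(d), m)             # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest: Sequence[Stump], row: Sequence[float]) -> int:
    """Aggregate the trees' votes into one final label (1 = malicious)."""
    votes = [lo if row[f] <= thr else hi for f, thr, lo, hi in forest]
    return majority(votes)


# Made-up n-gram count rows: four benign (0) and four malicious (1) traces.
X = [[0, 1, 0, 2]] * 4 + [[3, 4, 5, 6]] * 4
y = [0] * 4 + [1] * 4
forest = fit_forest(X, y)
print(predict(forest, [0, 1, 0, 2]), predict(forest, [3, 4, 5, 6]))
```

Averaging many trees trained on different samples and feature subsets is what dampens the variance of any single tree.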
S5, when the test result is abnormal, performing exception handling on the qualified virtual machine.
The specific method is as follows:
S51, marking the test result to obtain a marking result;
Specifically, the IRA receives the result of the IPGD (the test result).
S52, judging whether the marking result is abnormal, and generating an alarm signal when the marking result is abnormal;
Specifically, if the decision model flags the trace as abnormal, an alarm signal is generated. The signal is reported to the cloud service provider, which takes further action to maintain the security of the cloud tenant administrator data and the cloud tenant data in the cloud.
S53, performing exception handling on the qualified virtual machine based on the alarm signal.
Specifically, this may be done by terminating the monitored program, by isolating the qualified virtual machine from the cloud environment, or by terminating the qualified virtual machine of the cloud tenant administrator flagged with malicious behavior. It is also assumed that the cloud administrator is knowledgeable in the field and can distinguish benign traces from evolving benign traces by accessing the activity information of tenant users.
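The reporting and response step (S5) can be illustrated with a minimal sketch. The names `Action`, `Alarm`, and `handle_result`, the label strings, and the idea of a configurable default policy are all assumptions introduced for illustration; the text only states that an alarm is raised for an abnormal trace and lists the three possible responses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Action(Enum):
    """The three responses listed in the text for a flagged trace."""
    TERMINATE_PROGRAM = "terminate the monitored program"
    ISOLATE_VM = "isolate the virtual machine from the cloud environment"
    TERMINATE_VM = "terminate the virtual machine"

@dataclass(frozen=True)
class Alarm:
    """Alarm signal reported to the cloud service provider (hypothetical fields)."""
    vm_id: str
    pid: int
    action: Action

def handle_result(vm_id: str, pid: int, label: str,
                  policy: Action = Action.TERMINATE_PROGRAM) -> Optional[Alarm]:
    """Return an Alarm when the decision model flagged the trace, otherwise None."""
    if label != "malicious":
        return None  # benign trace: nothing to report
    return Alarm(vm_id=vm_id, pid=pid, action=policy)

# Hypothetical identifiers, used only to show the call shape.
alarm = handle_result("vm-01", 4242, "malicious", policy=Action.ISOLATE_VM)
if alarm is not None:
    print(f"ALARM vm={alarm.vm_id} pid={alarm.pid}: {alarm.action.value}")
```

Keeping the response as data (an `Action` value) rather than executing it inside `handle_result` leaves the final decision with the cloud administrator, who according to the text reviews tenant activity before acting.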
The meanings of the English abbreviations used in the present invention are given below.
OS denotes operating system.
rootkit denotes a type of malware that conceals its presence on a compromised system.
PID denotes process ID.
PDB denotes a program database file, here the debug-symbol file of the kernel.
VM denotes virtual machine.
vCPU denotes virtual CPU.
VMPV denotes virtual machine process verification.
TPSCI denotes tracing and parsing of system calls using introspection.
BPSO denotes binary particle swarm optimization.
IPGD denotes intrusion profile generation and detection.
IRA denotes intrusion reporting and alarm.
PSCFM denotes the system call feature matrix.
RF denotes random forest.
The present invention uses two popular data sets, the University of New Mexico (UNM) data set and the BareCloud data set, to validate the proposed model. Both data sets are widely used to validate detection models. From UNM, 8 malware databases were used to validate the model of the present invention. Each database contains execution logs of privileged-process traces, both intrusive and benign. The BareCloud data set consists of executable binaries of evasive malware. Three kinds of evasive malware were used: time-based evasive malware, exception-based evasive malware, and processor function-based evasive malware. The data comprise 88 time-based evasion samples, 162 exception-based evasion samples, 89 processor function-based evasion samples, and 141 benign samples.
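The sample counts above determine the class balance of the BareCloud evaluation, which is useful context when reading accuracy figures. The short calculation below is a sketch that uses only the four counts stated in the text; the dictionary keys and the helper name `class_balance` are illustrative assumptions.

```python
# Sample counts as stated in the text for the BareCloud-based evaluation.
BARECLOUD_COUNTS = {
    "time-based evasion": 88,
    "exception-based evasion": 162,
    "processor function-based evasion": 89,
    "benign": 141,
}

def class_balance(counts, benign_key="benign"):
    """Return (total samples, malicious samples, majority-class baseline accuracy)."""
    total = sum(counts.values())
    benign = counts[benign_key]
    malicious = total - benign
    # Accuracy of a trivial classifier that always predicts the larger class.
    baseline = max(benign, malicious) / total
    return total, malicious, baseline

total, malicious, baseline = class_balance(BARECLOUD_COUNTS)
print(f"total={total} malicious={malicious} majority-class baseline={baseline:.3f}")
```

With these counts there are 480 samples in total, 339 of them evasive malware, so a classifier that labels every sample malicious would already reach about 0.706 accuracy. Reported accuracies are best judged against that baseline.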
Accuracy of classification
To check the accuracy of VMShield, four machine learning algorithms, Random Forest (RF), Decision Tree (DT), Logistic Regression (LR) and Naive Bayes (NB), were selected for comparison, with feature selection performed by Principal Component Analysis (PCA) or Binary Particle Swarm Optimization (BPSO). For example, PCA + RF means that PCA is used as the feature selection algorithm and RF is used to classify the data set.
The proposed VMShield algorithm is compared with the other algorithms. Tables 1 and 2 show that BPSO + RF is the most effective combination. Thus, in VMShield, BPSO is used as the feature selection algorithm and the RF classifier is used to classify malicious processes.
Table 1 compares the performance of the different algorithms on the UNM data set.
Table 1. Comparison of the accuracy of different algorithms on the UNM data
Figure BDA0003641376590000121
Next, the model proposed by the present invention was validated using the BareCloud data set. The data set was first analyzed using the machine learning algorithms described above without feature selection. The machine learning algorithms were then evaluated again with feature selection (PCA and BPSO). As can be seen from Table 2, BPSO + RF gave the best results for all evasive attacks.
Table 2. Comparison of the accuracy of different algorithms on the BareCloud data
Figure BDA0003641376590000122
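The comparison described above pairs each of the four classifiers with no feature selection, with PCA, or with BPSO, and scores every pairing by accuracy. The sketch below only enumerates those pairings and defines the accuracy metric; it contains no results from Tables 1 and 2, and the names `pipeline_name`, `accuracy`, and `best_pipeline` are assumptions made for illustration.

```python
from itertools import product

FEATURE_SELECTORS = [None, "PCA", "BPSO"]   # None means no feature selection
CLASSIFIERS = ["RF", "DT", "LR", "NB"]

def pipeline_name(selector, classifier):
    """Name a pairing the way the text does, e.g. 'PCA + RF'."""
    return classifier if selector is None else f"{selector} + {classifier}"

PIPELINES = [pipeline_name(s, c) for s, c in product(FEATURE_SELECTORS, CLASSIFIERS)]

def accuracy(y_true, y_pred):
    """Fraction of traces whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def best_pipeline(scores):
    """Return the pipeline name with the highest accuracy in a {name: score} mapping."""
    return max(scores, key=scores.get)

print(PIPELINES)
```

In a real evaluation, `best_pipeline` would receive one measured accuracy per entry of `PIPELINES`; the text reports that BPSO + RF comes out on top for both data sets.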
The above discloses only a preferred embodiment of the cloud online malware detection method based on a recurrent neural network, and the scope of the present invention is not limited thereto. Those skilled in the art will understand that implementations of all or part of the procedures of the above embodiment, and equivalent variations made according to the claims of the present invention, still fall within the scope of the present invention.

Claims (7)

1. A cloud online malicious software detection method based on a recurrent neural network is characterized by comprising the following steps:
verifying the virtual machine, and obtaining a qualified virtual machine after the virtual machine is qualified;
tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track;
extracting the features of the tracking track to obtain a calling feature matrix database;
generating a decision model by using the calling feature matrix database to test the operation behavior of the cloud tenant administrator to obtain a test result;
and when the test result is abnormal, performing exception handling on the qualified virtual machine.
2. The recurrent neural network-based cloud online malware detection method of claim 1,
the specific method for verifying the virtual machine and obtaining a qualified virtual machine after the verification is passed is as follows:
calling a process log of a cloud tenant administrator;
generating a process directory of the virtual machine;
and verifying the process log by using the process directory, and obtaining a qualified virtual machine after verification is qualified.
3. The recurrent neural network-based cloud online malware detection method of claim 1,
the specific method for tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain the tracking track is as follows:
capturing the process operation of a cloud tenant administrator from a monitoring program of the qualified virtual machine through an introspection mechanism based on software breakpoint execution count to obtain a capture position;
and executing continuous memory reading operation on the qualified virtual machine based on the capture position, and generating a tracking track executed by a process.
4. The recurrent neural network-based cloud online malware detection method of claim 1,
the specific method for extracting the features of the tracking track to obtain the calling feature matrix database is as follows:
extracting a continuous system call sequence from the tracking track through a sliding window;
setting a feature vector based on the continuous system call sequence;
generating a matrix based on the feature vectors;
and performing feature extraction on the matrix by using a binary particle swarm algorithm to obtain a calling feature matrix database.
5. The recurrent neural network-based cloud online malware detection method of claim 1,
the specific method for generating a decision model by using the calling feature matrix database to test the operation behavior of the cloud tenant administrator is as follows:
using a random forest classifier as a learning model;
training the learning model by using a calling feature matrix database to obtain a decision model;
and testing the operation behavior of the cloud tenant administrator by using the decision model to obtain a test result.
6. The recurrent neural network-based cloud online malware detection method of claim 1,
when the test result is abnormal, the specific way of performing exception handling on the qualified virtual machine is as follows:
marking the test result to obtain a marking result;
judging the abnormity of the marking result, and generating an alarm signal when the marking result is abnormal;
and performing exception handling on the qualified virtual machine based on the alarm signal.
7. The recurrent neural network-based cloud online malware detection method of claim 6,
the specific mode of carrying out exception handling on the qualified virtual machine based on the alarm signal is as follows:
terminating the monitored program, isolating the qualified virtual machine from the cloud environment, or terminating the qualified virtual machine of the cloud tenant administrator.
CN202210520540.7A 2022-05-12 2022-05-12 Cloud online malicious software detection method based on recurrent neural network Active CN114925363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210520540.7A CN114925363B (en) 2022-05-12 2022-05-12 Cloud online malicious software detection method based on recurrent neural network

Publications (2)

Publication Number Publication Date
CN114925363A true CN114925363A (en) 2022-08-19
CN114925363B CN114925363B (en) 2023-05-19

Family

ID=82809561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210520540.7A Active CN114925363B (en) 2022-05-12 2022-05-12 Cloud online malicious software detection method based on recurrent neural network

Country Status (1)

Country Link
CN (1) CN114925363B (en)

Also Published As

Publication number Publication date
CN114925363B (en) 2023-05-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant