CN114925363A

CN114925363A - Cloud online malicious software detection method based on recurrent neural network

Info

Publication number: CN114925363A
Application number: CN202210520540.7A
Authority: CN
Inventors: 徐琛; 赵哲锋; 梁雄伟; 张保玉; 张鑫
Original assignee: Silk Road Information Port Cloud Computing Technology Co ltd
Current assignee: Silk Road Information Port Cloud Computing Technology Co ltd
Priority date: 2022-05-12
Filing date: 2022-05-12
Publication date: 2022-08-19
Anticipated expiration: 2042-05-12
Also published as: CN114925363B

Abstract

The invention relates to the technical field of data processing, and in particular to a cloud online malicious software detection method based on a recurrent neural network. The method comprises: verifying a virtual machine, and obtaining a qualified virtual machine once verification passes; tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track; extracting the features of the tracking track to obtain a calling feature matrix database; using the calling feature matrix database to generate a decision model that tests the operation behavior of the cloud tenant administrator, obtaining a test result; and, when the test result is abnormal, performing exception handling on the qualified virtual machine. Because the decision model is deployed outside the qualified virtual machine that it inspects, the invention solves the problem that existing malicious software detection methods detect malicious software poorly.

Description

Cloud online malicious software detection method based on recurrent neural network

Technical Field

The invention relates to the technical field of data processing, in particular to a cloud online malicious software detection method based on a recurrent neural network.

Background

Cloud malware, that is, malware specially designed to attack virtual machines running in a cloud environment, is one of the most common threats.

In existing malicious software detection methods, security software is installed inside the virtual machine to detect malicious software. However, this security software has the same privileges as the virtual machine system and is therefore easy to attack and deceive, which reduces its effectiveness in detecting malicious software.

Disclosure of Invention

The invention aims to provide a cloud online malicious software detection method based on a recurrent neural network, and aims to solve the problem that the existing malicious software detection method is poor in malicious software detection effect.

In order to achieve the purpose, the invention provides a cloud online malicious software detection method based on a recurrent neural network, which comprises the following steps:

verifying the virtual machine, and obtaining a qualified virtual machine after the virtual machine is qualified;

tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track;

extracting the features of the tracking track to obtain a calling feature matrix database;

generating a decision model by using the calling feature matrix database to test the operation behavior of the cloud tenant administrator to obtain a test result;

and when the test result is abnormal, performing exception handling on the qualified virtual machine.

The virtual machine is verified to be qualified, and the specific mode for obtaining the qualified virtual machine is as follows:

calling a process log of a cloud tenant administrator;

generating a process directory of the virtual machine;

and verifying the process log by using the process directory, and obtaining a qualified virtual machine after verification is qualified.

The process of the cloud tenant administrator is tracked through the qualified virtual machine, and a specific mode of obtaining a tracking track is as follows:

capturing the process running of a cloud tenant administrator from the monitoring program of the qualified virtual machine through an introspection mechanism based on software breakpoint execution counting to obtain a capture position;

and executing continuous memory reading operation on the qualified virtual machine based on the capture position, and generating a tracking track executed by the process.

The specific method for extracting the features of the tracking track to obtain the calling feature matrix database is as follows:

extracting a continuous system call sequence from the tracking track through a sliding window;

setting a feature vector based on the continuous system call sequence;

generating a matrix based on the feature vectors;

and performing feature extraction on the matrix by using a binary particle swarm algorithm to obtain a calling feature matrix database.

The specific mode for testing the operation behavior of the cloud tenant administrator by using the calling feature matrix database to generate the decision model is as follows:

using a random forest classifier as a learning model;

training the learning model by using a calling feature matrix database to obtain a decision model;

and testing the operation behavior of the cloud tenant administrator by using the decision model to obtain a test result.

When the test result is abnormal, the specific way of performing exception handling on the qualified virtual machine is as follows:

marking the test result to obtain a marking result;

judging the abnormity of the marking result, and generating an alarm signal when the marking result is abnormal;

and performing exception handling on the qualified virtual machine based on the alarm signal.

The specific way of performing exception handling on the qualified virtual machine based on the alarm signal is as follows:

terminating the monitored program, isolating the qualified virtual machine from the cloud environment, or terminating the qualified virtual machine under the cloud tenant administrator.

According to the cloud online malicious software detection method based on the recurrent neural network, the virtual machine is verified to be qualified, and a qualified virtual machine is obtained; tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track; extracting the features of the tracking track to obtain a calling feature matrix database; generating a decision model by using the calling feature matrix database to test the operation behavior of the cloud tenant administrator to obtain a test result; and when the test result is abnormal, performing exception handling on the qualified virtual machine, and detecting the qualified virtual machine by deploying the decision model outside the qualified virtual machine, thereby solving the problem that the existing malicious software detection method has poor detection effect on malicious software.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without any creative effort.

Fig. 1 is a flowchart of a cloud online malware detection method based on a recurrent neural network provided by the present invention.

Fig. 2 is a flowchart for verifying a virtual machine, and obtaining a qualified virtual machine after verification.

Fig. 3 is a flowchart of tracking a process of a cloud tenant administrator by the qualified virtual machine to obtain a tracking trajectory.

Fig. 4 is a flowchart of performing feature extraction on the tracking trajectory to obtain a calling feature matrix database.

Fig. 5 is a flowchart of generating a decision model by using the calling feature matrix database to test the operation behavior of the cloud tenant administrator and obtain a test result.

Fig. 6 is a flowchart of exception handling for the qualified virtual machine when the test result is an exception.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention, and should not be construed as limiting the present invention.

Referring to fig. 1 to 6, the present invention provides a cloud online malware detection method based on a recurrent neural network, including the following steps:

S1, verifying the virtual machine, and obtaining a qualified virtual machine after the virtual machine is qualified;

The specific method is as follows:

S11, calling a process log of a cloud tenant administrator;

Specifically, VMShield invokes a Virtual Machine Process Verification (VMPV) component to generate a process log for the cloud tenant administrator. Some programs, such as strace (Linux) or tasklist, are installed in the guest mode of the operating system of the cloud tenant administrator. The log is updated at a fixed time interval and can be accessed by the cloud service provider. The generated log is stored in a vlog file and then sent to the process directory verification module of the VMPV.

S12 generating a process directory of the virtual machine;

Specifically, the program's communication with the operating system is maintained in the form of a directory. This directory contains a list of processes and information related to them, such as process IDs, addresses and allocated memory, as well as the messages communicated between the program and the operating system. Each time a process is executed, an entry is created in the operating system directory. The cloud tenant administrator may record all processes executing on its entries.

S13, verifying the process log by using the process directory, and obtaining a qualified virtual machine after verification.

Specifically, in order to map the actual memory usage of the cloud tenant administrator, a virtual machine process directory generation component is called from the virtual machine monitor. The component uses LibVMI, an API derived from the XenAccess library that has access to the registers of the vCPU and to the virtual memory of the monitored virtual machine.

The process directory is then verified. The main objective of process directory verification is to check the security of the cloud tenant administrator and its virtual machine by comparing the vlog and hlog files, generated respectively in the process log generation stage and the virtual machine process directory generation stage, in order to detect any malicious activity in the monitored cloud tenant administrator.

The communication of programs with the Operating System (OS) is maintained in the form of a directory. This directory contains a list of processes and information related to them, such as process IDs, addresses and allocated memory, as well as the messages communicated between the programs and the operating system. Each time a process is executed, an entry is created in the OS directory. Similarly, the cloud tenant administrator may record all processes executing on its entries. Initially, some malicious rootkits were extracted using non-malicious applications. VMShield invokes the virtual machine process verification component to generate a process log for the cloud tenant administrator. Some programs, such as strace (Linux) or tasklist, are installed in the guest mode of the cloud tenant administrator's operating system. The log is updated at a fixed time interval and can be accessed by the cloud service provider. The generated log is stored in a vlog file and then sent to the virtual machine process directory verification module of the virtual machine process verification component. However, the log cannot be fully trusted: if a virtual machine or cloud tenant administrator is infected with a rootkit or other malicious program, the malicious processes running within the virtual machine are likely to be absent from the log.

In order to map the actual memory usage of the cloud tenant administrator, the virtual machine process directory generation module is called from the virtual machine monitor. It uses LibVMI, an API derived from the XenAccess library that has access to the registers of the vCPUs and to the virtual memory of the monitored cloud tenant administrator. The virtual machine process directory generation module saves and accesses the memory of the cloud tenant administrator and also helps to monitor and control the protected addresses of the operating system. It is therefore useful for developing a virtual machine monitor or a security application based on virtual machine introspection. VMShield calls the virtual machine process directory generation module and uses LibVMI to map all processes running on the cloud tenant administrator. For the Windows operating system, the LibVMI win-guid tool is first used to obtain kernel debugging information. The tool takes the domain name of the cloud tenant administrator as input and generates the kernel's debugging file, a program database (PDB) file called ntoskrnl.pdb, together with the Globally Unique Identifier (GUID) information of that PDB file. This information is used to provide the kernel symbols mapped to their addresses. The Rekall 1.7.1 memory forensics framework then creates a Rekall configuration file (win. …). The Rekall configuration file is easier to parse than the PDB file and is machine readable. The Rekall and debug information is obtained using the fetch-pdb and parse-pdb plug-ins. VMShield then configures LibVMI to extract information about the Windows kernel functions and addresses. This is a standard method for acquiring the details of the kernel data structures and is suitable for different versions of Windows. For a Linux operating system, the detailed kernel symbol information and a kernel symbol table (system. …) are used. The symbol table contains the symbol names and their addresses in memory, which are mapped to the correct page table, which in turn is mapped to the unique physical address of the unique process. A directory of these processes is then created, stored in the hlog file, and sent to the process directory verification module for both Windows and Linux operating systems. The process directory verification module analyzes the hlog file to acquire the information required for verifying the processes running on the cloud tenant administrator.

When a cloud tenant administrator becomes infected with malware or rootkits, many processes in the virtual environment are killed, including automatic update and security check processes. Once such intrusions or malware bypass the security checks and penetrate the virtual machines, they can threaten the data security and infrastructure of the cloud enterprise and its customers. In this module, all processes are validated even if they do not pose a threat to the VM. The main objective of the process directory verification module is to check the security of the cloud tenant administrator and its virtual machine by comparing the vlog file and the hlog file, generated respectively in the process log generation stage and the virtual machine process directory generation stage, so as to detect any malicious activity in the monitored cloud tenant administrator. Some malware processes may disable security tools in the cloud tenant administrator and hide their presence. Such malware is easily tracked by the process directory verification module. If any hidden process is found, it can be concluded that the virtual machine of the cloud enterprise is compromised, and an alert is sent to the cloud administrator so that appropriate action can be taken. If the VM passes this security check, the next module (TPSCI) is invoked.
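The vlog/hlog comparison described above can be illustrated with a minimal sketch. The log line format (`<pid> <name>`), the sample process names and all function names here are hypothetical; the source does not specify how the two files are encoded. The only idea taken from the text is that a process visible from the hypervisor side (hlog) but missing from the guest's own log (vlog) is treated as hidden.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessEntry:
    pid: int
    name: str


def parse_log(text: str) -> set[ProcessEntry]:
    """Parse lines of the assumed form '<pid> <name>' into process entries."""
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        pid, name = line.split(maxsplit=1)
        entries.add(ProcessEntry(int(pid), name))
    return entries


def find_hidden_processes(vlog: set[ProcessEntry],
                          hlog: set[ProcessEntry]) -> set[ProcessEntry]:
    """Processes seen from outside the VM (hlog) but absent from the guest log (vlog)."""
    return hlog - vlog


def vm_is_qualified(vlog_text: str, hlog_text: str) -> bool:
    """A VM passes the check only when no hidden process is found."""
    return not find_hidden_processes(parse_log(vlog_text), parse_log(hlog_text))


# Hypothetical sample logs: the guest does not report PID 1337.
VLOG = "1 systemd\n412 sshd\n977 nginx\n"
HLOG = "1 systemd\n412 sshd\n977 nginx\n1337 kworker_hidden\n"

hidden = find_hidden_processes(parse_log(VLOG), parse_log(HLOG))
print(sorted(p.pid for p in hidden))
print(vm_is_qualified(VLOG, HLOG))
```

A set difference is enough for the sketch because the check only asks whether anything is hidden; a real implementation would also have to tolerate processes that start or exit between the two snapshots.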

S2, tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain a tracking track;

the concrete method is as follows:

S21, capturing the process running of a cloud tenant administrator from the monitoring program of the qualified virtual machine through an introspection mechanism based on software breakpoint execution counting to obtain a capture position;

Specifically, the system call sequence is extracted from outside the cloud tenant administrator, at the virtual machine monitor, by using DRAKVUF (an introspection mechanism based on a kernel debugging method). Based on a software breakpoint execution technique, DRAKVUF captures the running of the process on the cloud tenant administrator from the virtual machine monitor.

After the breakpoint is executed, a software debug exception is raised on the vCPU. The vCPU invokes the VMEXIT operation in the exception handler and transfers control to the virtual machine monitor.

S22 executing continuous memory reading operation on the qualified virtual machine based on the capture position, and generating a tracking track executed by a process.

In particular, a complete trace of the execution of a process is generated by performing successive read operations of memory from a captured location.

At this stage of VMShield, the runtime behavior of the monitored program is extracted in the form of a sequence of system calls. This phase mainly helps to generate a dataset of traces, or system call logs (series of system calls), of benign programs and malware, which are produced by advanced memory introspection of the monitored cloud tenant administrator. After traces have been generated for the various monitored programs, these traces or system call logs are parsed and cleaned up, because they contain a large amount of irrelevant information and noise. The parsed file contains the system calls of each process ID (PID), such as write(), read(), open(), close(), etc., which are relevant to understanding the behavior of the program. These system calls are extracted and the trace information is stored according to PID. DRAKVUF is an introspection mechanism based on a kernel debugging method and is used to extract the system call sequence from outside the cloud tenant administrator, at the virtual machine monitor. Based on a software breakpoint execution technique, it captures the running of the process on the cloud tenant administrator from the virtual machine monitor.
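The parsing and cleaning step can be sketched as follows. The raw line shape (`<pid> <syscall>(<args>)`) is an assumption made for illustration only and is not the actual output format of the introspection tool; lines that do not match are discarded as noise, and the surviving system call names are grouped by PID as the text describes.

```python
import re
from collections import defaultdict

# Assumed line shape: "<pid> <syscall>(<args>)"; anything else is treated as noise.
LINE_RE = re.compile(r"^\s*(\d+)\s+([A-Za-z_]\w*)\(")


def syscalls_by_pid(raw_trace: str) -> dict[int, list[str]]:
    """Group system call names by PID, preserving their order of appearance."""
    sequences = defaultdict(list)
    for line in raw_trace.splitlines():
        match = LINE_RE.match(line)
        if match:
            sequences[int(match.group(1))].append(match.group(2))
    return dict(sequences)


# Hypothetical raw trace containing two noise lines.
RAW = """\
--- trace started ---
101 open("/etc/passwd", O_RDONLY)
101 read(3, 4096)
202 write(1, "hello")
garbage line without a call
101 close(3)
"""

traces = syscalls_by_pid(RAW)
print(traces)
```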

After the breakpoint is executed, a software debug exception is raised on the vCPU. The vCPU invokes a VMEXIT operation in the exception handler and transfers control to the virtual machine monitor. The event is then forwarded to Dom0, which runs the TPSCI. The TPSCI then generates a complete trace of the execution of the process by performing successive memory read operations from the captured location. VMShield uses the Rekall 1.7.1 forensics framework to generate a Rekall profile for the cloud tenant administrator. This Rekall profile provides information about the location of the kernel functions to be captured. Rekall supports plug-ins for various operating systems and parses debug data. The Rekall profile created for the Windows operating system is called win. … and is generated from the kernel PDB file ntoskrnl.pdb. For Linux operating systems, system. … is used. VMShield further analyzes these traces to determine intrusion behavior. The extracted system call sequences and PIDs are used as inputs to the third stage of VMShield.

S3, extracting the features of the tracking track to obtain a calling feature matrix database;

the concrete mode is as follows:

S31, extracting a continuous system call sequence from the tracking track through a sliding window;

Specifically, a continuous system call sequence composed of n items, called an n-gram, is obtained through a sliding window.

S32, setting a feature vector based on the continuous system call sequence;

Specifically, let $X_t$ be the feature vector of the tracking trajectory $t$, denoted $X_t = \text{n-gram}_1, \text{n-gram}_2, \ldots, \text{n-gram}_m$, where each n-gram is a feature extracted from the trace and $m$ is the number of n-grams generated for the tracking trajectory $t$. Each $\text{n-gram}_i$ of the feature vector can be expressed as $\langle W_1 W_2 W_3 W_4 W_5 W_6 \rangle$, where $1 \le i \le m$. After the n-grams are extracted from the tracking track as features, a feature matrix $X'_t$ is created in which each feature is converted into the number of occurrences $c$ of that feature in the trace. The feature matrix may be represented as $X'_t = \langle C_1 C_2 \ldots C_z \rangle$, where $z$ is the number of features in the tracked trajectory. The generated feature matrix is written to a processed system call feature matrix (PSCFM), which also records the label of each trace; in the present invention the labels are intrusion and normal. Similarly, all traces in the feature matrix are preprocessed, and previous entries are cleared from the buffer to store new values.

S33 generating a matrix based on the feature vectors;

Specifically, the PSCFM will eventually be a matrix of size $n \times z$, where $n$ is the number of rows, representing the number of traces present. Each row can be represented as $\langle X'_t, L \in \{0, 1\} \rangle$, where $L \in \{0, 1\}$ is the label of the per-row feature matrix $X'_t$: 1 denotes a malicious program and 0 denotes a normal program. The PSCFM represents the behavior of a virtual machine in different scenarios.

S34, extracting the characteristics of the matrix by using a binary particle swarm algorithm to obtain a calling characteristic matrix database.

Specifically, the invention uses a binary particle swarm algorithm (BPSO) for feature extraction. BPSO helps to reduce dimensionality and provides distinctive features for the learning model. Its advantage over other algorithms such as Principal Component Analysis (PCA) is its robustness and its computational efficiency in parameter control. To find the optimal solution, each particle moves in the direction of its previous best (pbest) position in equation (1) and the global best (gbest) position in equation (2). The end result is that the population converges to an optimal solution. On this basis, an optimal PSCFM database (the calling feature matrix database) is created.

$$\mathrm{gbest}(c_p) = \arg\min\left[f(X_j(m))\right], \quad m = 1, 2, \ldots, c_p \ \text{and} \ j = 1, \ldots, N \tag{2}$$

where $N$ represents the total number of particles in the swarm, $c_p$ represents the current number of iterations, $f(\cdot)$ represents the fitness function, and $X$ represents a position. The position $X$ of the particle is updated by the following equation:
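The update equation itself is not reproduced in this text. As a hedged reconstruction, the standard binary PSO rule that matches the symbols named here (a velocity $V$ and a sigmoid $S$) is sketched below, applied component-wise to each bit of the position. The inertia weight $w$, the coefficients $\varphi_1, \varphi_2$ and the uniform random numbers $r_1, r_2, r \in [0, 1)$ are assumptions that do not appear in the source, and the form given for equation (1) is likewise assumed by analogy with equation (2).

```latex
\begin{align}
\mathrm{pbest}_j(c_p) &= \arg\min_{m = 1, \ldots, c_p} f\big(X_j(m)\big), \quad j = 1, \ldots, N \tag{1} \\
V_j(c_p + 1) &= w\,V_j(c_p)
  + \varphi_1 r_1 \big(\mathrm{pbest}_j(c_p) - X_j(c_p)\big)
  + \varphi_2 r_2 \big(\mathrm{gbest}(c_p) - X_j(c_p)\big) \notag \\
X_j(c_p + 1) &=
  \begin{cases}
    1, & r < S\big(V_j(c_p + 1)\big) \\
    0, & \text{otherwise}
  \end{cases} \notag \\
S(x) &= \frac{1}{1 + e^{-x}} \notag
\end{align}
```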

In the formula, $V$ represents the velocity and $S(x)$ represents the sigmoid function.

In this module, the execution trace of the monitoring program is extracted using TPSCI. The execution trajectory of the monitor is used to extract features and generate a Processed System Call Feature Matrix (PSCFM). An n-gram is a contiguous sequence of n entries that specifies a given system call. The sequence is obtained by a small sliding window (e.g., k) with a window shift of 1 increment each time. Each movement of the window generates a unique n-gram pattern for the parsed system call.

Consider an example system call trace that consists of the sequence read(), read(), write(), write(), write(), read(), write(), open(), read(), write(), and close(). For a sliding window of k = 6 that moves by 1 each time, this sequence of system calls generates the following n-grams:

n-gram 1: read(), read(), write(), write(), write(), read()

n-gram 2: read(), write(), write(), write(), read(), write()

n-gram 3: write(), write(), write(), read(), write(), open()

n-gram 4: write(), write(), read(), write(), open(), read()

n-gram 5: write(), read(), write(), open(), read(), write()

n-gram 6: read(), write(), open(), read(), write(), close()

The n-grams obtained from the traces are also used to evaluate the frequency for a certain process.
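The sliding-window extraction can be reproduced in a few lines. The sketch below uses the 11-call trace implied by the six n-grams listed above, with k = 6 and a shift of 1; the function name `sliding_ngrams` is illustrative and not taken from the source.

```python
def sliding_ngrams(calls: list[str], k: int) -> list[tuple[str, ...]]:
    """Return every window of k consecutive calls, moving the window by 1 each time."""
    return [tuple(calls[i:i + k]) for i in range(len(calls) - k + 1)]


TRACE = ["read", "read", "write", "write", "write", "read",
         "write", "open", "read", "write", "close"]

grams = sliding_ngrams(TRACE, 6)
for i, gram in enumerate(grams, start=1):
    print(f"n-gram {i}:", ", ".join(f"{call}()" for call in gram))
```

A trace of length 11 with k = 6 yields 11 - 6 + 1 = 6 windows, which matches the six n-grams in the example; a trace shorter than k yields none.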

Let $X_t$ be the feature vector of the tracking trajectory $t$, denoted $X_t = \text{n-gram}_1, \text{n-gram}_2, \ldots, \text{n-gram}_m$, where each n-gram is a feature extracted from the trace and $m$ is the number of n-grams generated for the tracking trajectory $t$. Each $\text{n-gram}_i$ of the feature vector can be expressed as $\langle W_1 W_2 W_3 W_4 W_5 W_6 \rangle$, where $1 \le i \le m$. After the n-grams are extracted from the tracking track as features, a feature matrix $X'_t$ is created in which each feature is converted into the number of occurrences $c$ of that feature in the trace. The feature matrix may be represented as $X'_t = \langle C_1 C_2 \ldots C_z \rangle$, where $z$ is the number of features in the tracked trajectory. We write the generated feature matrix into a processed system call feature matrix (PSCFM), which also records the label of each trace; in the present invention the labels are intrusion and normal. All traces in the feature matrix are likewise preprocessed, and previous entries are cleared from the buffer to store new values. The PSCFM will eventually be a matrix of size $n \times z$, where $n$ is the number of rows, representing the number of all traces. Each row can be represented as $\langle X'_t, L \in \{0, 1\} \rangle$, where $L \in \{0, 1\}$ is the label of the per-row feature matrix $X'_t$: 1 represents a malicious program and 0 represents a normal program. The PSCFM represents the behavior of a virtual machine in different scenarios.
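As a minimal sketch of how such a count matrix could be assembled, the code below turns labeled traces into rows of n-gram occurrence counts plus a label (1 for malicious, 0 for normal). The two toy traces, the window size k = 2 and the function names are assumptions chosen to keep the output small; the source uses windows of six calls.

```python
from collections import Counter


def ngrams(calls, k):
    """All windows of k consecutive calls, shifted by 1."""
    return [tuple(calls[i:i + k]) for i in range(len(calls) - k + 1)]


def build_pscfm(labeled_traces, k):
    """Return (features, rows); each row is (occurrence counts per feature, label)."""
    counts = [Counter(ngrams(calls, k)) for calls, _ in labeled_traces]
    features = sorted(set().union(*counts))  # one column per distinct n-gram
    rows = [([count[f] for f in features], label)
            for count, (_, label) in zip(counts, labeled_traces)]
    return features, rows


# Hypothetical traces: (system call sequence, label).
TRACES = [
    (["open", "read", "read", "close"], 0),
    (["open", "write", "write", "write"], 1),
]

features, rows = build_pscfm(TRACES, k=2)
for counts_row, label in rows:
    print(counts_row, label)
```

Each row has z entries, one per distinct n-gram seen across all traces, so stacking the rows gives the n × z shape described above.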

The invention uses Binary Particle Swarm Optimization (BPSO) for feature extraction. BPSO helps to reduce dimensionality and provides distinctive features for the learning model. Its advantage over other algorithms such as Principal Component Analysis (PCA) is its robustness and its computational efficiency in parameter control. To find the optimal solution, each particle moves in the direction of its previous best (pbest) position in equation (1) and the global best (gbest) position in equation (2). The end result is that the population converges to an optimal solution.

$$\mathrm{gbest}(c_p) = \arg\min\left[f(X_j(m))\right], \quad m = 1, 2, \ldots, c_p \ \text{and} \ j = 1, \ldots, N \tag{2}$$

where $N$ represents the total number of particles in the swarm, $c_p$ represents the current number of iterations, $f(\cdot)$ represents the fitness function, and $X$ represents a position. The position $X$ of the particle is updated by the following equation:

BPSO is used for feature selection in discrete feature classification problems. One of the advantages of using BPSO is that it does not use gradient descent, so it can solve non-linear problems, and the problem does not have to be differentiable. After feature selection, an optimal PSCFM database is created.
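To make the feature selection step concrete, here is a small standard-library sketch of binary PSO over bit masks, where bit d = 1 means that column d of the PSCFM is kept. It follows the textbook BPSO rule (sigmoid of the velocity used as the probability of setting a bit) and is not claimed to be the patent's exact procedure. The hyperparameters `w`, `phi1`, `phi2`, `v_max` and the toy fitness function are assumptions; in practice the fitness would typically be a classifier's error on the selected columns.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bpso(fitness, n_bits, n_particles=12, n_iters=40,
         w=0.7, phi1=1.5, phi2=1.5, v_max=4.0, seed=0):
    """Minimize fitness(mask) over 0/1 masks of length n_bits."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for j in range(n_particles):
            for d in range(n_bits):
                v = (w * vel[j][d]
                     + phi1 * rng.random() * (pbest[j][d] - pos[j][d])
                     + phi2 * rng.random() * (gbest[d] - pos[j][d]))
                vel[j][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability that the bit is 1.
                pos[j][d] = 1 if rng.random() < sigmoid(vel[j][d]) else 0
            fit = fitness(pos[j])
            if fit < pbest_fit[j]:
                pbest[j], pbest_fit[j] = pos[j][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[j][:], fit
    return gbest, gbest_fit


# Toy fitness (assumption): columns 0 and 3 are informative; every kept column costs 0.1.
INFORMATIVE = {0, 3}


def toy_fitness(mask):
    missed = sum(1 for i in INFORMATIVE if mask[i] == 0)
    return missed + 0.1 * sum(mask)


best_mask, best_fit = bpso(toy_fitness, n_bits=8)
print(best_mask, best_fit)
```

No gradient of the fitness is ever computed, which is the property the text highlights: the fitness only has to be evaluable, not differentiable.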

S4, generating a decision model by using the calling feature matrix database to test the operation behavior of the cloud tenant administrator to obtain a test result;

the concrete method is as follows:

S41, using a random forest classifier as a learning model;

Specifically, a Random Forest (RF) classifier is used as the learning model to learn the behavior of programs running within the monitored cloud tenant administrator.

S42, training the learning model by using a calling feature matrix database to obtain a decision model;

Specifically, the RF model is trained using the optimal PSCFM (feature matrix) database.

S43, testing the operation behavior of the cloud tenant administrator by using the decision model to obtain a test result.

Specifically, after the model is trained, a decision model is generated and used for testing the operation behavior of the cloud tenant administrator. It classifies the trace as benign or malicious according to behavior. The results of this phase will be sent to the next phase to inform the cloud administrator of the final analysis tracked.

At this stage, the learning model is trained using the optimal PSCFM database. Cross-validation is used to train and test the learning model. A Random Forest (RF) classifier is used as the learning model to learn the behavior of programs running within the monitored cloud tenant administrator. The trained data set serves as the profile of the cloud tenant administrator. The benefit of using RF is that it helps avoid overfitting the data set, since it is an ensemble of Decision Trees (DTs) whose results are aggregated into one final result, thereby minimizing errors caused by variance and bias. After the model is trained, a decision model is generated for testing the runtime behavior of the cloud tenant administrator. It classifies each trace as benign or malicious according to its behavior. The results of this phase are sent to the next phase to inform the cloud administrator of the final analysis of the trace.
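The training and cross-validation loop can be sketched without external libraries by using a deliberately reduced random forest: each tree is a depth-one decision stump fitted on a bootstrap sample with a random feature subset, and predictions are aggregated by majority vote. This is a simplified stand-in chosen for self-containment, not the classifier configuration used in the patent; the toy rows, labels and all function names are assumptions.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    n_sub = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        feats = rng.sample(range(n_features), n_sub)     # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(rows, labels, k=4, seed=0):
    """k-fold cross-validation: train on k-1 folds, test on the held-out fold."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    correct = 0
    for fold in (order[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        forest = fit_forest([rows[i] for i in train],
                            [labels[i] for i in train], seed=seed)
        correct += sum(predict(forest, rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


# Toy PSCFM rows (n-gram counts) with labels: 1 = malicious, 0 = normal.
ROWS = [[5, 0, 1], [6, 1, 0], [7, 0, 2], [5, 1, 1],
        [0, 4, 1], [1, 5, 0], [0, 6, 2], [1, 4, 1]]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(ROWS, LABELS)
acc = cross_val_accuracy(ROWS, LABELS, k=4)
print(len(forest), acc)
```

Aggregating many weak trees trained on different samples and feature subsets is what reduces the variance of any single tree; a production system would more likely rely on an established random forest implementation with deeper trees.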

And S5, when the test result is abnormal, performing exception handling on the qualified virtual machine.

The concrete mode is as follows:

S51, marking the test result to obtain a marking result;

Specifically, the IRA receives the result of the IPGD (the test result).

S52, carrying out abnormity judgment on the marking result, and generating an alarm signal when the marking result is abnormal;

Specifically, if the decision model flags the trace as abnormal, an alarm signal is generated. The signal is reported to the cloud service provider, which takes further action to maintain the security of the cloud tenant administrator data and the cloud tenant data in the cloud.

S53 exception handling is carried out on the qualified virtual machine based on the alarm signal.

Specifically, the exception is handled by terminating the monitored program, by isolating the qualified virtual machine from the cloud environment, or by terminating the qualified virtual machine under the cloud tenant administrator. The cloud administrator is assumed to be knowledgeable in the field and able to distinguish malicious traces from evolving benign traces by accessing the activity information of tenant users.

The IRA receives the result of the IPGD. If the decision model flags the trace as abnormal, an alarm signal is generated. The signal is reported to the cloud service provider, which takes further action to maintain the security of the cloud tenant administrator data and the cloud tenant data in the cloud. This may be done by terminating the monitored program, by isolating the virtual machine from the cloud environment, or by terminating the virtual machine under a cloud tenant administrator flagged with malicious behavior. It is also assumed that the cloud administrator is knowledgeable in the field and can distinguish malicious traces from evolving benign traces by accessing the tenant users' activity information.
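The reporting step amounts to a small dispatch from the classifier's label to one of the three responses named above. The sketch below is illustrative only: the `Action` enum, the alert dictionary layout and the default choice of isolation are assumptions, and no real hypervisor API is called.

```python
from enum import Enum


class Action(Enum):
    TERMINATE_PROGRAM = "terminate monitored program"
    ISOLATE_VM = "isolate VM from cloud environment"
    TERMINATE_VM = "terminate VM"


def handle_result(vm_id: str, label: int, action: Action = Action.ISOLATE_VM):
    """Return an alert for an abnormal trace (label 1); return None for a normal one."""
    if label != 1:
        return None
    return {"vm": vm_id, "alarm": True, "action": action.value}


print(handle_result("vm-42", 0))
print(handle_result("vm-42", 1))
print(handle_result("vm-42", 1, Action.TERMINATE_PROGRAM))
```

Keeping the choice of action as a parameter reflects the text's point that the cloud administrator decides how to respond once the alarm has been raised.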

The meaning of English abbreviations in the present invention will be described below.

The OS represents an operating system.

rootkit stands for trojan, a malware.

The PID represents a process ID.

The PDB represents a program database file.

The VM represents a virtual machine.

vCPU represents a virtual CPU.

VMPV represents virtual machine process verification.

TPSCI represents the tracing and parsing of system calls using introspection.

BPSO stands for binary particle swarm algorithm.

IPGD represents intrusion profile generation and detection.

IRA denotes intrusion reports and alarms.

The PSCFM represents a processed system call feature matrix.

RF stands for random forest.

The present invention uses two popular datasets, the University of New Mexico (UNM) dataset and the BareCloud dataset, to validate the proposed model. Both datasets are widely used to validate detection models. From UNM, 8 malware databases were used to validate the model of the present invention. Each database contains execution logs of privileged process traces, both intrusive and benign. The BareCloud dataset consists of binary executables of evasive malware. Three kinds of evasive malware were used: time-based evasion malware, exception-based evasion malware, and processor-function-based evasion malware. The dataset contains 88 time-based evasion samples, 162 exception-based evasion samples, 89 processor-function-based evasion samples, and 141 benign samples.

Accuracy of classification

To check the accuracy of VMShield, four machine learning algorithms, Random Forest (RF), Decision Tree (DT), Logistic Regression (LR) and Naive Bayes (NB), were selected for comparison with the algorithm of the present invention, with feature selection performed using Principal Component Analysis (PCA) and Binary Particle Swarm Optimization (BPSO). For example, PCA + RF refers to using PCA as the feature selection algorithm and RF to classify the dataset.

The proposed VMShield algorithm is compared with the other algorithms. Tables 1 and 2 show that BPSO + RF is the most effective. Thus, in VMShield, BPSO is used as the feature selection algorithm and the RF classifier is used to classify malicious processes.
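The feature selection step can be illustrated with a minimal sketch of binary particle swarm optimization. It assumes the common sigmoid-transfer formulation of BPSO; the present invention does not state the update rule or its parameters, so the function name `bpso_select`, the hyperparameters, and the toy fitness function below are all assumptions. In the described method, the fitness would presumably be the accuracy of the RF classifier trained on the selected columns of the feature matrix; a toy score is substituted here so that the example stays self-contained.

```python
import math
import random


def bpso_select(fitness, n_features, n_particles=20, n_iters=50,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: return (best_mask, best_score), maximising `fitness`.

    `fitness` maps a list of 0/1 flags (one per feature) to a score.
    """
    rng = random.Random(seed)
    # Random initial bit masks and velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vel[i][d] = v
                # The sigmoid turns the velocity into P(bit = 1).
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            score = fitness(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score


# Toy fitness (hypothetical): features 0, 3 and 5 are "informative";
# every other selected feature costs a small penalty.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for d in INFORMATIVE if mask[d])
    extras = sum(mask) - hits
    return hits - 0.1 * extras


mask, score = bpso_select(toy_fitness, n_features=8)
print(mask, score)
```

The returned mask marks which columns of the feature matrix would be kept before the classifier is trained. The search is stochastic, so the fixed seed only makes a run repeatable; it does not guarantee that the best possible mask is found.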

Table 1 clearly describes the performance comparison of the different algorithms on the UNM data set.

Table 1 Comparison of the accuracy of different algorithms on the UNM data

Next, the model proposed by the present invention was validated using the BareCloud dataset. The dataset was first analyzed using the machine learning algorithms described above without feature selection. The machine learning algorithms were then evaluated again with feature selection algorithms (PCA and BPSO). As can be seen from Table 2, BPSO + RF gave the best results for all evasive attacks.

Table 2 Comparison of the accuracy of different algorithms on the BareCloud data

The above discloses only a preferred embodiment of the cloud online malware detection method based on a recurrent neural network, and it should be understood that the scope of the present invention is not limited thereto. Those skilled in the art will understand that implementations of all or part of the procedures of the above embodiment, and equivalent variations made in accordance with the claims of the present invention, still fall within the scope of the present invention.

Claims

1. A cloud online malicious software detection method based on a recurrent neural network is characterized by comprising the following steps:

2. The recurrent neural network-based cloud online malware detection method of claim 1,

calling a process log of a cloud tenant administrator;

generating a process directory of the virtual machine;

3. The recurrent neural network-based cloud online malware detection method of claim 1,

the specific method for tracking the process of the cloud tenant administrator through the qualified virtual machine to obtain the tracking track is as follows:

capturing the process operation of a cloud tenant administrator from a monitoring program of the qualified virtual machine through an introspection mechanism based on software breakpoint execution count to obtain a capture position;

and executing continuous memory reading operation on the qualified virtual machine based on the capture position, and generating a tracking track executed by a process.

4. The recurrent neural network-based cloud online malware detection method of claim 1,

setting a feature vector based on the continuous system call sequence;

generating a matrix based on the feature vectors;

and performing feature extraction on the matrix by using a binary particle swarm optimization algorithm to obtain a calling feature matrix database.

5. The recurrent neural network-based cloud online malware detection method of claim 1,

using a random forest classifier as a learning model;

6. The recurrent neural network-based cloud online malware detection method of claim 1,

marking the test result to obtain a marking result;

7. The recurrent neural network-based cloud online malware detection method of claim 6,

the specific mode of carrying out exception handling on the qualified virtual machine based on the alarm signal is as follows:

terminating the monitored program, isolating the qualified virtual machine from the cloud environment, or terminating the qualified virtual machine under the cloud tenant administrator.