CN111931179A - Cloud malicious program detection system and method based on deep learning - Google Patents

Cloud malicious program detection system and method based on deep learning

Info

Publication number
CN111931179A
Authority
CN
China
Prior art keywords
program
information
dynamic link
matrix
link library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010814447.8A
Other languages
Chinese (zh)
Other versions
CN111931179B (en)
Inventor
田东海
马锐
赵润泽
郁裕磊
魏行
胡昌振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN202010814447.8A
Publication of CN111931179A
Application granted
Publication of CN111931179B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45587 Isolation or security of virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Virology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a cloud malicious program detection system and method based on deep learning, which belong to the technical field of software security and offer higher efficiency and accuracy. The system comprises an information acquisition module, a data preprocessing module and a training model module. In the information acquisition module, the program sample set comprises the program samples used in malicious program detection, and the program automatic execution script automatically executes the program samples in the virtual machine. One program sample is run in the virtual machine at a time; system real-time parameter information and dynamic link library information are extracted while it runs; after the program sample has finished executing, a virtual machine snapshot is saved and analyzed to obtain memory forensics information; and all of this information is sent to the data preprocessing module. The data preprocessing module preprocesses the data to obtain a dynamic link library feature vector, a system real-time parameter matrix and a memory forensics matrix, and sends them to the training model module. The training model module constructs and trains the neural network model in advance.

Description

Cloud malicious program detection system and method based on deep learning
Technical Field
The invention relates to the technical field of software security, in particular to a cloud malicious program detection system and method based on deep learning.
Background
Malware detection refers to methods for identifying malicious programs. Cloud computing, one of the most popular and important IT trends today, provides shared computing resources (software or data) as services to computers and other devices over the internet. How to detect malicious programs in the cloud is currently an important direction for malicious program detection, which makes cloud malicious program detection very important.
Deep learning is an important method in the field of malicious program detection. Most popular malicious program detection methods use deep learning, so it has received wide attention in both practice and research.
Because deep learning, and convolutional neural networks in particular, achieves excellent results in image processing, malicious program detection usually borrows from that work. A malicious program is typically converted into an image, or into a numeric matrix resembling an image, which is then used to train a deep learning model to obtain the final result.
There are many ways to convert a malicious program into an image; the most common is to convert it into a grayscale image. Each byte of a binary file takes a value between 00 and FF, which corresponds to the gray levels 0 to 255. The binary file is arranged into a matrix in which each byte represents one pixel, which yields a grayscale image. In these grayscale images, the difference between malicious and benign programs can usually be seen in the image texture.
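The byte-to-pixel mapping described above can be sketched in a few lines. This is only an illustration of the prior-art idea, not part of the claimed method; the function name `bytes_to_grayscale`, the row width and the zero padding of the last row are assumptions made for the example.

```python
def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Arrange raw bytes into rows of `width` pixels; each byte (0-255) is one gray level."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Assumption: pad the last row with zeros (black) so all rows have equal length.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Five bytes laid out four pixels per row; the second row is zero-padded.
image = bytes_to_grayscale(bytes([0x00, 0x7F, 0xFF, 0x10, 0x20]), width=4)
```

In practice the input would be the bytes of an executable file, and the resulting matrix would be saved or fed to a network as a single-channel image.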
Converting a malicious program into a grayscale image is usually a static analysis technique. As malicious programs have developed, however, detection can sometimes be accomplished by running the program, which can then be analyzed with tools such as a sandbox. A sandbox can record the sequence of API calls that a malicious program makes while running. The API sequence is generally treated as a piece of text and converted into word vectors using natural language processing methods, so that the API sequence obtained from each sample run becomes a matrix that can also serve as input to a neural network.
At present, most static analysis methods struggle to keep up with the development of malicious programs. The internet environment is increasingly complex and the software environment increasingly diverse, and authors of malicious programs commonly use polymorphic or metamorphic techniques. As a result, identification based on traditional feature signatures has difficulty detecting new malicious programs, and many disguised malicious programs are also hard to detect. Malicious programs have evolved in recent years, and some now have anti-detection and anti-analysis capabilities.
Dynamic analysis can detect malicious programs more effectively, but it usually requires intercepting and analyzing the program's API calls in a sandbox. Configuring a sandbox environment is very complex, and its analysis reports require careful processing.
Many dynamically acquired features can be interfered with by the malicious program itself. For example, when a sandbox collects the API calls of a malicious program, the program may deliberately make certain calls as a disguise, and analyzing that API information can then distort the detection result.
Many studies have begun to use deep learning for malicious program detection, but they usually optimize the neural network model by deepening the network, tuning it, or similar methods, so the model becomes increasingly complex and training takes longer.
Converting a malicious program into data suitable for deep learning training is itself complicated. There are many ways to do so, but none is simple, and they may require complex techniques (such as sandboxing) or strict environment configuration.
The structure of the deep learning model affects the accuracy of malicious program classification, but deep learning has been applied to malicious program detection for a long time, and the models are becoming harder and harder to optimize.
Most detection schemes have high performance overhead, are difficult to deploy in a real environment, and are not suitable for detecting malicious code in the cloud.
Therefore, how to build on existing malicious program detection technology to make detection more efficient, more accurate and better suited to the cloud is a problem to be solved urgently.
Disclosure of Invention
In view of this, the invention provides a cloud malicious program detection system and method based on deep learning: a malicious code detection scheme suited to the cloud, with higher efficiency and higher accuracy.
To achieve this purpose, the technical solution of the invention is as follows: the cloud malicious program detection system based on deep learning comprises an information acquisition module, a data preprocessing module and a training model module.
The information acquisition module comprises a virtual machine, a program automatic execution script and a program sample set. The program sample set comprises the program samples used in malicious program detection. The program automatic execution script automatically executes the program samples in the virtual machine. One program sample is run in the virtual machine at a time; system real-time state parameter information and dynamic link library information are extracted while it runs; after the program sample has finished executing, a virtual machine memory snapshot is saved and analyzed to obtain memory forensics information. The system real-time state parameter information, dynamic link library information and memory forensics information obtained for each program sample are sent to the data preprocessing module.
The data preprocessing module performs the following preprocessing: converting the dynamic link library information into a dynamic link library feature vector; converting the system real-time state parameter information into a system real-time parameter matrix; and extracting the numeric feature information from the memory forensics information and converting it into a memory forensics matrix. The dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix are sent to the training model module.
The training model module constructs and trains a neural network model in advance. The neural network model consists of a first feature extraction part, a second feature extraction part, a feature fusion part and a fully connected layer. The first and second feature extraction parts each consist of a convolutional layer and a pooling layer. The input of the first feature extraction part is the system real-time parameter matrix, and its output is the feature information of that matrix; the input of the second feature extraction part is the memory forensics matrix, and its output is the feature information of that matrix. The feature fusion part fuses the outputs of the first and second feature extraction parts with the dynamic link library feature vector to obtain the fusion features. The fusion features pass through the fully connected layer to produce the classification output of the neural network model, that is, a judgment of whether a malicious program exists in the target virtual machine.
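To make the data flow through this model concrete, here is a minimal, dependency-free sketch of one forward pass. It is an illustration under stated assumptions, not the patented implementation: the text does not give kernel sizes, channel counts, activation functions or the output layer, so the single 3x3 kernel per branch, ReLU, 2x2 max pooling, sigmoid output, 6x6 toy inputs and random untrained parameters are all assumptions, as are the function names.

```python
import math
import random

Matrix = list[list[float]]


def conv2d_relu(x: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2-D convolution (cross-correlation) with one kernel, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [max(0.0, sum(x[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw)))
         for j in range(len(x[0]) - kw + 1)]
        for i in range(len(x) - kh + 1)
    ]


def max_pool(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [
        [max(x[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(x[0]) - size + 1, size)]
        for i in range(0, len(x) - size + 1, size)
    ]


def extract(x: Matrix, kernel: Matrix) -> list[float]:
    """One feature extraction part: convolutional layer, pooling layer, flatten."""
    return [v for row in max_pool(conv2d_relu(x, kernel)) for v in row]


def classify(params_m, forensics_m, dll_vec, k1, k2, weights, bias) -> float:
    """Forward pass; returns a score between 0 and 1 for the 'malicious' class."""
    # Feature fusion by concatenation of both branch outputs and the DLL vector.
    fused = extract(params_m, k1) + extract(forensics_m, k2) + list(dll_vec)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * f for w, f in zip(weights, fused)) + bias  # fully connected layer
    return 1.0 / (1.0 + math.exp(-z))                      # sigmoid output (assumed)


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)                      # untrained, random parameters
params_m, forensics_m = rand_matrix(rng, 6, 6), rand_matrix(rng, 6, 6)
dll_vec = [0.0, 1.0, 1.0, 0.0, 2.0, 1.0]    # one cluster label per process row
k1, k2 = rand_matrix(rng, 3, 3), rand_matrix(rng, 3, 3)
# 6x6 input -> 3x3 conv -> 4x4 -> 2x2 pool -> 4 features per branch; 4 + 4 + 6 = 14.
weights = [rng.uniform(-1.0, 1.0) for _ in range(4 + 4 + 6)]
score = classify(params_m, forensics_m, dll_vec, k1, k2, weights, bias=0.0)
```

A real implementation would normally use a deep learning framework and learn the kernels and weights from the labeled training samples; the point here is only the two-branch structure followed by fusion and a fully connected layer.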
Further, the malicious program detection system has a model training mode and an actual measurement mode.
In the model training mode, the program sample set consists of program training samples.
The program training samples comprise programs of known class and their class labels; the class labels are normal program and malicious program.
The dynamic link library feature vector, system real-time parameter matrix and memory forensics matrix that the information acquisition module and the data preprocessing module produce for each program of known class are combined with its class label to train the neural network model in the training model module, yielding the trained neural network model.
In the actual measurement mode, the program sample set consists of program test samples, which are programs of unknown class. The dynamic link library feature vector, system real-time parameter matrix and memory forensics matrix that the information acquisition module and the data preprocessing module produce for each program of unknown class are fed to the trained neural network model to obtain a judgment of whether a malicious program exists in the target virtual machine.
Further, the information acquisition module extracts the system real-time state parameter information with the Python psutil module, and analyzes the virtual machine memory snapshot with the Volatility tool to obtain the memory forensics information.
Further, in the data preprocessing module, each row of the system real-time parameter matrix and of the memory forensics matrix corresponds to one process present while the program sample executes, and the data in the row are the numeric features of the system real-time parameters or memory forensics information generated for that process.
The dynamic link library information is converted into the dynamic link library feature vector as follows. The dynamic link library information comprises the number of occurrences of each dynamic link library in each process during execution of the program sample. The TF-IDF algorithm is used to calculate how much each dynamic link library contributes to discriminating the current process, and the dynamic link libraries whose contribution exceeds a set threshold are retained. The occurrence counts of the retained dynamic link libraries in the current process form an initial vector; the initial vectors of the different processes are then clustered with the k-means algorithm to obtain a preliminary category label for each process. The preliminary category labels of all processes form a one-dimensional vector, which is the dynamic link library feature vector.
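The conversion described above can be sketched as follows. This is a hedged illustration rather than the patented implementation: the text does not say how per-process TF-IDF scores are aggregated into one contribution value per dynamic link library, so the sketch assumes a library is kept when its highest score over all processes exceeds the threshold. The threshold value, the number of clusters, the seeding of k-means from the first k vectors, the function names and the sample counts are likewise assumptions.

```python
import math

# process name -> {DLL name: occurrence count}; values are made up for illustration
Counts = dict[str, dict[str, int]]


def tfidf_select(counts: Counts, threshold: float) -> list[str]:
    """Keep DLLs whose best TF-IDF score over all processes exceeds `threshold`."""
    n_proc = len(counts)
    doc_freq: dict[str, int] = {}
    for dlls in counts.values():
        for dll, c in dlls.items():
            if c > 0:
                doc_freq[dll] = doc_freq.get(dll, 0) + 1
    best: dict[str, float] = {}
    for dlls in counts.values():
        total = sum(dlls.values())
        for dll, c in dlls.items():
            if c > 0:
                score = (c / total) * math.log(n_proc / doc_freq[dll])
                best[dll] = max(best.get(dll, 0.0), score)
    return sorted(d for d, s in best.items() if s > threshold)


def kmeans(points: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dll_feature_vector(counts: Counts, threshold: float = 0.05, k: int = 2) -> list[int]:
    """One preliminary category label per process, in the order of `counts`."""
    selected = tfidf_select(counts, threshold)
    vectors = [[float(dlls.get(d, 0)) for d in selected] for dlls in counts.values()]
    return kmeans(vectors, k)


counts = {
    "explorer.exe": {"kernel32.dll": 3, "user32.dll": 2, "shell32.dll": 4},
    "svchost.exe": {"kernel32.dll": 2, "user32.dll": 1, "shell32.dll": 3},
    "sample.exe": {"kernel32.dll": 1, "ws2_32.dll": 5, "crypt32.dll": 4},
    "helper.exe": {"kernel32.dll": 1, "ws2_32.dll": 4, "crypt32.dll": 5},
}
labels = dll_feature_vector(counts)
```

In this toy data `kernel32.dll` is loaded by every process, so its inverse document frequency is zero and it is filtered out; the remaining libraries separate the processes into two groups with similar library usage.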
Further, the feature fusion part first fuses the feature information of the system real-time parameter matrix with the feature information of the memory forensics matrix, by either concatenation (concat) or element-wise addition (add), to obtain an intermediate fusion result, and then fuses the intermediate fusion result with the dynamic link library feature vector by concatenation (concat) to obtain the fusion features.
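The two-stage fusion can be illustrated on flattened feature vectors. This is a minimal sketch; the function name `fuse` and the use of plain Python lists in place of framework tensors are assumptions for the example.

```python
def fuse(params_feat: list[float], forensics_feat: list[float],
         dll_vec: list[float], mode: str = "concat") -> list[float]:
    """Stage 1: concat or add the two branch outputs. Stage 2: concat the DLL vector."""
    if mode == "concat":
        intermediate = list(params_feat) + list(forensics_feat)
    elif mode == "add":
        if len(params_feat) != len(forensics_feat):
            raise ValueError("add fusion needs equally sized feature vectors")
        intermediate = [p + f for p, f in zip(params_feat, forensics_feat)]
    else:
        raise ValueError(f"unknown fusion mode: {mode}")
    return intermediate + list(dll_vec)


fused_concat = fuse([1.0, 2.0], [3.0, 4.0], [9.0])               # length 2 + 2 + 1
fused_add = fuse([1.0, 2.0], [3.0, 4.0], [9.0], mode="add")      # length 2 + 1
```

Concatenation keeps both feature sets side by side and grows the input of the fully connected layer, while addition keeps the size fixed but requires both branches to produce outputs of the same shape.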
Another embodiment of the present invention provides a cloud-side malicious program detection method based on deep learning, including the following steps:
S1, establishing a virtual machine environment, and deploying in the virtual machine a program automatic execution script for automatically executing the samples in the program sample set.
The samples in the program sample set are initially program training samples; the program training samples comprise programs of known class and their class labels, and the class labels are normal program and malicious program.
S2, the virtual machine automatically executes the samples in the program sample set, running one program sample at a time. While each program training sample executes, system real-time state parameter information and dynamic link library information are extracted from the virtual machine; after the program sample has finished executing, a virtual machine memory snapshot is saved and analyzed to obtain memory forensics information.
S3, converting the dynamic link library information into a dynamic link library feature vector, converting the system real-time state parameter information into a system real-time parameter matrix, and extracting the numeric feature information from the memory forensics information and converting it into a memory forensics matrix.
S4, inputting the dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix corresponding to each program training sample into a pre-constructed neural network model, and training the neural network model to obtain the trained neural network model.
The neural network model consists of a first feature extraction part, a second feature extraction part, a feature fusion module and a fully connected layer. The first and second feature extraction parts each consist of a convolutional layer and a pooling layer. The input of the first feature extraction part is the system real-time parameter matrix, and its output is the feature information of that matrix; the input of the second feature extraction part is the memory forensics matrix, and its output is the feature information of that matrix. The feature fusion module fuses the outputs of the first and second feature extraction parts with the dynamic link library feature vector. The fusion features output by the feature fusion module pass through the fully connected layer to produce the classification output of the neural network model, that is, a judgment of whether a malicious program exists in the target virtual machine.
S5, setting the samples in the program sample set to program test samples, which are programs of unknown class.
S6, executing S2 and S3 to obtain the dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix of each program test sample, and using the trained neural network model to obtain a judgment of whether a malicious program exists in the target virtual machine.
Further, in S2, extracting the system real-time state parameter information and dynamic link library information specifically comprises: while the virtual machine runs a program sample, extracting the system real-time state parameter information and dynamic link library information more than two times at a set time interval. The real-time state parameter information comprises the execution parameters of each process present while the program sample executes; the dynamic link library information comprises the number of occurrences of the dynamic link libraries for each such process.
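Repeated extraction at a set interval amounts to a simple polling loop. The sketch below is an assumption-laden illustration: `sample_at_interval`, its parameters and the injected `collect` and `sleep` callables are hypothetical names, and the collector here is a stand-in that returns a counter rather than real system data.

```python
import time
from typing import Callable


def sample_at_interval(collect: Callable[[], dict], interval_s: float, count: int,
                       sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    """Call `collect` `count` times, waiting `interval_s` seconds between calls."""
    if count < 1:
        raise ValueError("count must be at least 1")
    samples = []
    for i in range(count):
        samples.append(collect())
        if i < count - 1:  # no wait is needed after the last sample
            sleep(interval_s)
    return samples


# Dry run with a stand-in collector and a no-op sleep so it finishes instantly.
counter = iter(range(100))
samples = sample_at_interval(lambda: {"tick": next(counter)},
                             interval_s=0.0, count=3, sleep=lambda _s: None)
```

In a real run, `collect` would gather the per-process state parameters and dynamic link library counts inside the virtual machine, and `interval_s` would be the set time interval.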
Further, in S3, converting the system real-time state parameter information into a system real-time parameter matrix, and extracting the numeric feature information from the memory forensics information and converting it into a memory forensics matrix, specifically comprises the following steps:
S301, counting the processes contained in the program samples, that is, counting the number of common processes and single processes. A common process is a process that appears more than once across all program samples; a single process is a process that appears only once across all program samples.
S302, determining the number of rows of the matrix as the number of common processes plus the maximum number of single processes in any program sample.
S303, filling the data corresponding to each process into the corresponding row according to the system real-time state parameter information to obtain the system real-time parameter matrix; and extracting the numeric feature information from the memory forensics information and filling the numeric feature information corresponding to each process into the corresponding row to obtain the memory forensics matrix.
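Steps S301 to S303 can be sketched as below. This is one possible reading, not the patented implementation: the sketch assumes that "appears more than once" means the process name is present in more than one sample, that common processes occupy fixed rows sorted by name, that single processes fill the remaining rows in order, and that unused rows are zero. The function name `build_matrices` and the toy feature values are also assumptions.

```python
from collections import Counter

Sample = dict[str, list[float]]  # process name -> numeric features of that process


def build_matrices(samples: list[Sample], n_features: int) -> list[list[list[float]]]:
    """Build one fixed-size matrix per sample (expects a non-empty sample list)."""
    # S301: a process is "common" if it occurs in more than one sample, else "single".
    occurrences = Counter(name for sample in samples for name in sample)
    common = sorted(name for name, n in occurrences.items() if n > 1)
    # S302: rows = number of common processes + largest per-sample count of singles.
    max_single = max(sum(1 for name in s if occurrences[name] == 1) for s in samples)
    n_rows = len(common) + max_single
    row_of = {name: i for i, name in enumerate(common)}
    # S303: fill each process's features into its row; unused rows stay zero.
    matrices = []
    for sample in samples:
        matrix = [[0.0] * n_features for _ in range(n_rows)]
        next_single = len(common)
        for name, feats in sample.items():
            if name in row_of:
                row = row_of[name]
            else:
                row = next_single
                next_single += 1
            matrix[row] = list(feats)
        matrices.append(matrix)
    return matrices


samples = [
    {"explorer.exe": [1.0, 2.0], "svchost.exe": [3.0, 4.0], "mal.exe": [9.0, 9.0]},
    {"explorer.exe": [1.5, 2.5], "svchost.exe": [3.5, 4.5]},
]
matrices = build_matrices(samples, n_features=2)
```

Giving every common process the same row in every sample keeps the matrices aligned, so a convolutional layer sees the same process at the same position across samples.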
Further, in S3, converting the dynamic link library information into a dynamic link library feature vector specifically comprises the following. The dynamic link library information comprises the number of occurrences of each dynamic link library in each process during execution of the program sample. The TF-IDF algorithm is used to calculate how much each dynamic link library contributes to discriminating the current process, and the dynamic link libraries whose contribution exceeds a set threshold are retained. The occurrence counts of the retained dynamic link libraries in the current process form an initial vector; the initial vectors of the different processes are then clustered with the k-means algorithm to obtain a preliminary category label for each process. The preliminary category labels of all processes form a one-dimensional vector, which is the dynamic link library feature vector.
Advantageous effects:
The invention provides a cloud virtual machine malicious program detection system and method based on deep learning. An ordinary virtual machine is used to simulate the cloud environment, and samples are executed in the virtual machine without complex environment configuration. Numeric features are acquired directly and converted directly into matrices for training the deep learning model, which greatly reduces computational complexity and improves computational efficiency. The invention obtains several kinds of features and improves the effectiveness of the neural network through feature fusion rather than by optimizing the model. Deep learning is thus optimized from the feature side, an overly complex neural network architecture is avoided, and cloud malicious program detection is achieved with only a simple neural network architecture, while accuracy and efficiency are improved.
Drawings
FIG. 1 is a schematic diagram of a cloud malicious program detection system based on deep learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the concat fusion mode used in the embodiment of the present invention;
FIG. 3 is a schematic diagram of the add fusion mode used in the embodiment of the present invention.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The invention provides a cloud malicious program detection system based on deep learning which, as shown in FIG. 1, comprises an information acquisition module, a data preprocessing module and a training model module.
The information acquisition module comprises a virtual machine, a program automatic execution script and a program sample set. The program sample set comprises the program samples used in malicious program detection. The program automatic execution script automatically executes the program samples in the virtual machine. One program sample is run in the virtual machine at a time; system real-time state parameter information and dynamic link library information are extracted while it runs; after the program sample has finished executing, a virtual machine memory snapshot is saved and analyzed to obtain memory forensics information. The system real-time state parameter information, dynamic link library information and memory forensics information obtained for each program sample are sent to the data preprocessing module.
The data preprocessing module performs the following preprocessing: converting the dynamic link library information into a dynamic link library feature vector; converting the system real-time state parameter information into a system real-time parameter matrix; and extracting the numeric feature information from the memory forensics information and converting it into a memory forensics matrix. The dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix are sent to the training model module.
The training model module constructs and trains a neural network model in advance. The neural network model consists of a first feature extraction part, a second feature extraction part, a feature fusion part and a fully connected layer. The first and second feature extraction parts each consist of a convolutional layer and a pooling layer. The input of the first feature extraction part is the system real-time parameter matrix, and its output is the feature information of that matrix; the input of the second feature extraction part is the memory forensics matrix, and its output is the feature information of that matrix. The feature fusion part fuses the outputs of the first and second feature extraction parts with the dynamic link library feature vector to obtain the fusion features. The fusion features pass through the fully connected layer to produce the classification output of the neural network model, that is, the class of the program sample.
In the embodiment of the invention, the system has a model training mode and an actual measurement mode.
In the model training mode, the program sample set consists of program training samples.
The program training samples comprise programs of known class and their class labels; the class labels are normal program and malicious program.
The dynamic link library feature vector, system real-time parameter matrix and memory forensics matrix that the information acquisition module and the data preprocessing module produce for each program of known class are combined with its class label to train the neural network model in the training model module, yielding the trained neural network model.
In the actual measurement mode, the program sample set consists of program test samples, which are programs of unknown class. The dynamic link library feature vector, system real-time parameter matrix and memory forensics matrix that the information acquisition module and the data preprocessing module produce for each program of unknown class are fed to the trained neural network model to obtain a judgment of whether a malicious program exists in the target virtual machine.
In the embodiment of the invention, each module is specifically designed as follows:
Information acquisition module:
The first component is a script that executes programs automatically, so that the large number of data set samples need not be run manually. All samples are run in the virtual machine, only one sample at a time, and the parameters describing the running state of the operating system and the dynamic link library information are extracted. After execution finishes, the virtual machine memory snapshot is saved and analyzed to obtain the memory forensics information. The virtual machine is then shut down and the next target program is run.
When a program runs, whether benign or malicious, it performs operations in the system. Whether these operations are initiated by the program itself or by a user, they change some state parameters of the system, and these changes are reflected in the system parameters of each process. System parameters are usually extracted with Python. Although some of Python's built-in modules can be used, that extraction is cumbersome and the data must be parsed before it can be used. psutil is short for "process and system utilities"; as the name implies, it can monitor both the system and its processes. By calling this tool from Python, state-related parameters of the system, such as CPU, memory, disk and process information, can be obtained.
Traditional antivirus software runs inside the system. However, when a malicious program runs on the system, it may discover the antivirus software and then either conceal its malicious behavior or attack the antivirus software itself in order to compromise the system. Some advanced malicious programs, while active inside the system, may attack the in-system detection tools and interfere with their detection state, so that the detection result may appear normal although the system has in fact been infected. Thus neither detection tools installed in the existing operating system nor detection schemes based on such tools are necessarily trustworthy. Because psutil runs inside the system, the data it obtains may look normal even when the program is actually malicious; the malicious state may go undetected, and the results of subsequent data processing may be affected. Therefore memory feature information is added so that more features are fused, which improves the detection result to a certain extent. A memory image is a file that stores the memory of the operating system. It can be produced with the snapshot function of the virtual machine while the virtual machine is running, and the image can be stored on the host.
This method is trustworthy. The whole process is completed on the host without the participation of the operating system inside the virtual machine, so a malicious program running in the virtual machine cannot monitor it, and a detection method based on the virtual machine memory image cannot be interfered with or damaged. The malicious program shows its malicious features just as it would in a normal system, and because it cannot detect the snapshot being taken while it runs, those features are preserved in the virtual machine memory snapshot; the memory image therefore contains state information that reveals the behavior of the malicious program. Volatility is a tool dedicated to analyzing memory images. It is developed in Python, can perform memory forensics analysis for most operating systems, and can extract operating system semantic features from a memory image file. Here it is used to analyze the memory dump of the virtual machine; different plug-ins can be applied to the dump file to extract the corresponding program semantic information. A malicious program causes changes to system memory (such as memory write operations) while it executes; the tool is used to analyze the semantic features of the program in memory, and a deep learning model can then model and analyze these features.
A Dynamic Link Library (DLL) is a program module whose functions are available for use by programs or by other DLLs. A large number of files and modules in the system are built as DLLs. An API is an application programming interface that can be implemented by one DLL or a group of DLLs, so any process in the system uses DLLs and uses the API functions exported by these DLLs to interact with the file system, processes, the registry, the network and the graphical user interface (GUI). Therefore a process running in the system, whether malicious or benign, calls the system's dynamic link libraries at run time and uses the executable code in them to complete the functions the process requires. A benign process usually performs normal functions, so the DLLs it calls are the more common dynamic link libraries; a malicious process needs to perform malicious operations during execution, so it may call some special dynamic link libraries.
The processing flow of the information acquisition module is as follows:
① Configuring the environment: the virtual machine environment is established. An environment for running the code needs to be configured inside the virtual machine, and some application programs need to be installed so that a user can execute programs to complete common tasks (such as web browsing, text editing and video playback). The virtual machine is subsequently started from the saved image.
② Executing the code that runs samples automatically: this code executes each sample and collects the data without manual intervention. It automatically starts the virtual machine system, executes the sample in the virtual machine once the system has started, acquires the operating system running state and the dynamic link library information at the same time, and stores the extracted information. On the host, after the information has been extracted, the memory image of the virtual machine is saved, the virtual machine is shut down, and the memory image is analyzed to obtain the memory forensics information. The virtual machine is then restarted to run the next sample.
③ Acquiring system real-time parameters: in the virtual machine system, the operating system running state parameters can be acquired once the sample starts to run. Abnormal situations may occur while extracting system parameters, for example a process no longer exists or some parameters of a process cannot be read; parameters that cannot be acquired are set to -1. Some programs or processes show no activity at the beginning of execution and only change the system after running for some time, or the processes themselves change, shut down or exhibit malicious states after a period of execution. A single extraction of process system state parameters in the virtual machine therefore may not capture all the states of a process. In the invention, the process system state parameters are acquired 6 times at intervals of 30 seconds, so that the process runs long enough to exhibit, as far as possible, the operations it performs on the operating system. Some malicious programs crash the system in the virtual machine after being executed, in which case the process system parameters are acquired fewer than 6 times; all parameters of all processes are then set to -1, indicating that the virtual machine system was damaged and parameter extraction failed.
④ Acquiring memory forensics parameters: after the image has been saved, the virtual machine image is analyzed on the host. Because the analysis takes a long time, it should run in parallel with starting the virtual machine in order to save time. The acquired feature information is stored as an analysis report, and the specific numeric information is extracted in the next module.
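The sampling rule in step ③ (6 rounds, 30 seconds apart, -1 for unreadable values, all -1 if the guest crashes) can be sketched as follows. This is only an illustration under stated assumptions: `read_params` is a hypothetical callable supplied by the caller (for example a thin wrapper around psutil), and signalling a missing process with `LookupError` and a crashed guest with `ConnectionError` is a convention chosen for this sketch, not something the text specifies.

```python
import time

MISSING = -1  # value the text prescribes for parameters that cannot be read


def sample_process_params(read_params, process_keys, param_names,
                          rounds=6, interval=30, sleep=time.sleep):
    """Sample every tracked process `rounds` times, `interval` seconds apart.

    read_params(key) is a hypothetical callable returning a dict of
    parameter values for one process. In this sketch it raises LookupError
    when the process does not exist and ConnectionError when the guest
    system has crashed.
    """
    table = {key: [] for key in process_keys}
    completed = 0
    for i in range(rounds):
        try:
            for key in process_keys:
                try:
                    values = read_params(key)
                except LookupError:
                    values = {}  # process absent in this round
                row = []
                for name in param_names:
                    value = values.get(name)
                    row.append(MISSING if value is None else value)
                table[key].append(row)
        except ConnectionError:
            break  # guest crashed: stop sampling
        completed += 1
        if i < rounds - 1:
            sleep(interval)
    if completed < rounds:
        # Fewer than `rounds` samples: mark the whole extraction as failed.
        blank = [MISSING] * len(param_names)
        table = {key: [list(blank) for _ in range(rounds)]
                 for key in process_keys}
    return table
```

The `sleep` parameter is injectable only so that the 30-second wait can be skipped when trying the function out.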
Data processing module:
Each row of the system real-time parameter matrix and of the memory forensics matrix corresponds to one process of the system while the program sample is executed, and the data in the row are the system real-time parameters generated by that process or the numeric features in its memory forensics information. The dynamic link library feature vector corresponds to the processes of the system in which the program sample is executed, and a position in the dynamic link library feature vector indicates the number of occurrences of a dynamic link library in the corresponding process.
The data processing module needs to process the above feature information and convert it into a feature matrix or a feature vector.
In the data processing module, the data that mainly needs processing is the memory forensics information. The system real-time state parameter information is stored directly in numeric form and needs no further processing, whereas the memory forensics information is an analysis report produced by the Volatility tool and must be processed further. The processing consists of extracting the relevant numeric feature information according to the format of the analysis report; for this purpose the report strings are parsed with regular expressions.
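As a rough illustration of this regular-expression step, the sketch below pulls integer fields out of a text report. The line format used here (`name key=value key=value`) is a hypothetical simplification invented for the example and is not the real output format of any Volatility plug-in; the function name `parse_report` is likewise an assumption.

```python
import re

# Hypothetical one-line-per-process report format, NOT real Volatility output:
#   <process name> <key>=<int> <key>=<int> ...
LINE_RE = re.compile(r"^(?P<name>\S+)\s+(?P<fields>(?:\w+=\d+\s*)+)$")
FIELD_RE = re.compile(r"(\w+)=(\d+)")


def parse_report(text):
    """Return {process_name: {field: int_value}} for every parseable line."""
    features = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip headers, blank lines and anything unparseable
        features[match.group("name")] = {
            key: int(value)
            for key, value in FIELD_RE.findall(match.group("fields"))
        }
    return features
```

A real report would need one pattern per plug-in output format; the structure (match a line, extract the numeric fields, key them by process) would stay the same.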
The main work of the data processing module is to convert the data into feature matrices or feature vectors. The feature matrices fed to the neural network must all have the same size, so the size of the feature matrix has to be determined first. Each row of every feature matrix should correspond to a unique process; the processes cannot be arranged randomly. During training the neural network compares how the corresponding process changes from sample to sample in order to judge whether a sample is malicious. If the rows of each matrix were arranged randomly, the neural network could not make an accurate judgment and accuracy would suffer, so the process represented by each row must be unique and the row order must be the same in every feature matrix. The PID is a unique identifier of a process and could serve as its identifier; however, if each row of the matrix corresponded to a PID, the matrix would be very large and extremely sparse, which hinders training of the neural network, so using the PID as the matrix row number is not feasible. In the invention, PID, process name and process path information are used to identify a process and to determine the rows of the feature matrix. Because the PID of a process is not fixed while its process name and path are, all combinations of process name and process path are first extracted by traversing all the feature information acquired with psutil. These combinations determine the number of rows of the neural network input matrix and which process each row represents. The extracted data are then written into the corresponding positions of the matrix to obtain the input of the neural network.
Although the number of matrix rows can be determined in this way, the number of samples is large, and guaranteeing one row per process would still give a very large number of rows. A distinction is therefore made between "common processes" and "single processes" across all samples. A "common process" is a process that occurs more than once across all samples, that is, it appears not only while one particular sample is running but also while different samples are running. Such a process is not a process or sub-process specific to one sample; it generally exists in the system, is called while the sample runs, and may communicate with the processes or sub-processes of the sample. The common processes are arranged one per row at the top of the feature matrix, and if a common process exists in a sample, its data are filled into the corresponding row. A "single process" is a process that occurs only once and may be specific to a particular sample. If every single process were also given the same row in every feature matrix, the number of rows would be too large, so the single processes of each sample are arranged below the common processes in that sample's feature matrix, in the order of the process list. The number of rows of the matrix is fixed, so all samples are traversed to obtain the maximum number of single processes in any sample, and the number of rows of the feature matrix is: the number of common processes + the maximum number of single processes.
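The row layout just described can be sketched in a few lines. This is a minimal illustration, assuming each process is keyed by a `(process name, process path)` tuple; the function names, the alphabetical ordering of common processes and the fill value 0 for rows with no data are assumptions of the sketch, since the text does not fix them.

```python
from collections import Counter


def build_row_layout(samples):
    """samples: {sample_id: [(process_name, process_path), ...]}.

    Returns (row index of each common process, total number of rows).
    A process counts as common when it appears in more than one sample.
    """
    counts = Counter(p for procs in samples.values() for p in set(procs))
    common = sorted(p for p, n in counts.items() if n > 1)
    common_row = {p: i for i, p in enumerate(common)}
    max_single = max(
        (sum(1 for p in dict.fromkeys(procs) if counts[p] == 1)
         for procs in samples.values()),
        default=0,
    )
    return common_row, len(common) + max_single


def fill_matrix(procs_with_data, common_row, n_rows, n_cols, fill=0):
    """Place common processes in their fixed rows, single processes below."""
    matrix = [[fill] * n_cols for _ in range(n_rows)]
    next_single = len(common_row)
    for proc, values in procs_with_data:
        if proc in common_row:
            row = common_row[proc]
        else:
            row = next_single  # single processes keep their listing order
            next_single += 1
        matrix[row] = list(values)
    return matrix
```

Because the row count is derived from the samples seen so far, a later sample with more single processes than the recorded maximum would not fit; how such a case is handled is not covered by the text.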
These data may differ greatly in magnitude because of their different units, so they need to be normalized. They also need to be weighted, because they contribute to the result to different degrees. The weighting uses the random forest algorithm from machine learning: taking the columns of the matrix in turn, a random forest is trained on each column to obtain a result accuracy, the accuracies are converted proportionally into weights, the data of each column are multiplied by the corresponding weight, and the columns of the matrix are arranged in descending order of weight. The weighting facilitates the subsequent feature fusion.
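A minimal sketch of the normalize, weight and reorder steps is given below. Two points are assumptions of the sketch: min-max scaling is used because the text does not name a normalization method, and the per-column accuracies are passed in as plain numbers (they would come from training a random forest on each column, which is not shown here).

```python
def normalize_columns(matrix):
    """Min-max scale each column to [0, 1]; constant columns become 0.0."""
    scaled_cols = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def weight_and_sort(matrix, accuracies):
    """Scale each column by its proportional weight, then order the columns
    from largest to smallest weight.

    accuracies[j] is the accuracy obtained for column j (assumed to be
    computed elsewhere). Returns the weighted matrix and the column order.
    """
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    weighted = [[row[j] * weights[j] for j in order] for row in matrix]
    return weighted, order
```

Returning the column order alongside the matrix lets the same permutation be applied to test samples later.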
The dynamic link library information also needs to be converted into a feature vector. Because it is not known in advance whether each process is malicious, the processes are clustered on their dynamic link libraries with an unsupervised machine learning method. First, the TF-IDF algorithm is used to calculate the contribution degree of each dynamic link library to the current process, and the dynamic link libraries whose contribution degree exceeds a set threshold are retained; the threshold can be set empirically. The numbers of occurrences of the retained dynamic link libraries in the current process form an initial vector, with 0 recorded for a retained dynamic link library that does not occur in the process. The initial vectors of the different processes are then clustered with the k-means++ algorithm to obtain a process clustering result, which indicates whether each process is likely malicious or likely benign and serves as the preliminary class label of the process. The preliminary class labels of all processes form a one-dimensional vector, namely the dynamic link library feature vector.
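The screening step can be illustrated with a small TF-IDF computation in which each process is treated as a document and each DLL name as a term. This is a sketch under assumptions: the particular TF-IDF variant (count share times the natural log of inverse process frequency) and the choice to keep a DLL if it passes the threshold in any process are not stated in the text, and the clustering step is left out.

```python
import math


def tfidf_scores(dll_counts):
    """dll_counts: {process: {dll_name: occurrence_count}}.

    Returns {process: {dll_name: tf-idf score}} using
    tf = count / total count in the process, idf = ln(N / process frequency).
    """
    n_procs = len(dll_counts)
    proc_freq = {}
    for counts in dll_counts.values():
        for dll in counts:
            proc_freq[dll] = proc_freq.get(dll, 0) + 1
    scores = {}
    for proc, counts in dll_counts.items():
        total = sum(counts.values())
        scores[proc] = {
            dll: (c / total) * math.log(n_procs / proc_freq[dll])
            for dll, c in counts.items()
        }
    return scores


def initial_vectors(dll_counts, threshold):
    """Keep DLLs scoring above `threshold` and build per-process count vectors."""
    scores = tfidf_scores(dll_counts)
    kept = sorted({dll for s in scores.values()
                   for dll, v in s.items() if v > threshold})
    vectors = {proc: [counts.get(dll, 0) for dll in kept]  # 0 when absent
               for proc, counts in dll_counts.items()}
    return kept, vectors
```

A DLL loaded by every process gets an inverse-frequency factor of ln(1) = 0 and is screened out, which matches the intuition that ubiquitous libraries do not help tell processes apart. The resulting vectors are what would be handed to the clustering algorithm.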
The processing flow of the data processing module is as follows:
① Converting feature data into numeric features: the goal is to convert the memory forensics information obtained with the Volatility tool into numeric features for use in later steps.
② Determining the size of the feature matrix: first the process information is counted and the method of identifying each process is determined; then the common processes and single processes are counted, the number of rows of the matrix is determined, and the feature data are written into the matrix.
③ Weighting the features: first, all data are normalized to avoid large numerical differences. Then the contribution degree of each feature to the classification result is calculated with a random forest, the features are weighted according to their contribution values, and the columns are arranged in descending order of weight.
④ Converting the dynamic link library information into a feature vector: the dynamic link libraries with larger contribution degrees are selected with the TF-IDF algorithm and clustered with the k-means algorithm. The resulting classification approximately indicates whether a process is malicious or benign, that is, it is the preliminary class label of the current process. The preliminary class labels of all processes form a one-dimensional vector, namely the dynamic link library feature vector.
Training model module:
The training model module uses a convolutional neural network to train on the feature information, obtain a trained model and predict results. Convolutional neural networks were originally used for image processing; compared with general algorithms they need less preprocessing and can process raw data directly, which makes them convenient for extracting and analyzing features. The invention likewise uses a convolutional neural network, following methods from the field of image processing. The invention analyzes the various kinds of data generated while a malicious program runs, fuses their features and then learns from them, with the aim of giving the network more feature information and obtaining a better classification result. The convolutional neural network has two inputs: one is the state parameter information of the running system and the other is the memory forensics information, obtained with the open-source tools psutil and Volatility respectively. The two kinds of information are converted into their own feature matrices, each of which passes through a convolutional layer and a pooling layer. Feature fusion is then performed, after which the model continues with ordinary fully connected layers. The model contains several convolutional and pooling layers, and the result is obtained through the fully connected layer.
So that the neural network can make full use, during training, of the data generated while the sample program runs in the virtual machine, a neural network fusion approach is adopted. The purpose of neural network fusion is to let one network receive different kinds of features: each kind is extracted without being affected by the others, and only at classification time does a single classifier receive all of them and complete the classification. There are two common fusion methods, concat (connection) and add (addition), and both perform fusion at the feature level. Concat fusion is shown in fig. 2: the two sets of features are joined along the channels, so the number of channels increases while the information in each basic block is unchanged. Add fusion is shown in fig. 3: the features are added together, so unlike concat the number of channels is unchanged while the information in each basic block increases.
The feature fusion part first fuses the feature information of the system real-time parameter matrix with the feature information of the memory forensics matrix, in either concat or add mode, to obtain an intermediate fusion result. It then fuses the intermediate fusion result with the dynamic link library feature vector in concat mode to obtain the fusion features.
Feature fusion in neural networks is mostly used for complex image processing. Concat fusion is performed along a dimension of the feature maps, whereas add fusion changes the information itself, so add is better than concat in terms of computation. Because add fuses features element by element (pixel by pixel), some information may be lost during fusion, whereas concat simply joins the features, so such loss is not a concern. However, add fusion creates associations between the fused pieces of information, and the degree of association is higher than with concat.
For the present invention the two fusion modes have equivalent effects, and which one is adopted can depend on the application scenario. The add fusion mode requires less computation than concat, while the concat fusion mode retains more feature information.
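The difference between the two fusion modes can be seen on toy feature maps. The sketch below represents a feature map as a plain list of channels, each channel being a 2-D list; this representation and the function names are assumptions made for illustration, and a real model would use a deep learning framework's tensor operations instead.

```python
def concat_fusion(a, b):
    """Join two feature maps along the channel axis.

    The channel count becomes len(a) + len(b); the values inside each
    channel are left unchanged.
    """
    return a + b


def add_fusion(a, b):
    """Add two feature maps element by element.

    The channel count stays the same; each value now mixes both inputs.
    """
    if len(a) != len(b):
        raise ValueError("add fusion needs the same number of channels")
    return [
        [[x + y for x, y in zip(row_a, row_b)]
         for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]
```

With one-channel inputs, concat yields two channels that can still be told apart, while add yields one channel in which the original values can no longer be recovered individually, which is the trade-off described above.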
The processing flow of the training model module is as follows:
① Building the neural network: the network as a whole is built as a convolutional neural network; the two branches are fused before the fully connected layer, and either concat or add can be chosen as the fusion method. Once the neural network has been built, it can be trained.
② Training the neural network: the three parts of feature information extracted in the invention are fed into the neural network in the set manner, and training is started to obtain the trained model.
③ Evaluating the result: the trained model is applied to the test data, that is, the verification data set, to obtain the evaluation result.
Another embodiment of the present invention further provides a cloud malicious program detection method based on deep learning, including the following steps:
S1, establishing a virtual machine environment, and deploying a program automatic execution script in the virtual machine for automatically executing the samples in the program sample set.
The samples in the program sample set are initially program training samples; the program training samples include programs of known class and their class labels, the class labels being normal program and malicious program.
S2, the virtual machine automatically executes the samples in the program sample set, running one program sample at a time; during the execution of each program sample, the system real-time state parameter information and the dynamic link library information are extracted in the virtual machine, and after the program sample has been executed, the virtual machine memory snapshot is saved and analyzed to obtain the memory forensics information.
Extracting the system real-time state parameter information and the dynamic link library information specifically comprises: while the virtual machine runs a program sample, extracting the system real-time state parameter information and the dynamic link library information two or more times at a set time interval; the real-time state parameter information comprises the state parameters corresponding to each process while the program sample is executed; the dynamic link library information comprises the number of occurrences of the dynamic link libraries corresponding to each process while the program sample is executed.
S3, converting the dynamic link library information into a dynamic link library feature vector, converting the system real-time state parameter information into a system real-time parameter matrix, and extracting the numeric feature information from the memory forensics information and converting it into a memory forensics matrix.
Converting the system real-time state parameter information into a system real-time parameter matrix, and extracting the numeric feature information from the memory forensics information and converting it into a memory forensics matrix, specifically comprises the following steps:
S301, counting the processes contained in the corresponding program samples, namely counting the number of common processes and single processes, wherein a common process is a process that appears more than once across all program samples, and a single process is a process that appears only once across all program samples.
S302, determining the number of rows of the matrix as the number of the common processes plus the maximum number of single processes in all the program samples.
S303, filling the data corresponding to each process into the corresponding row according to the system real-time state parameter information to obtain the system real-time parameter matrix; and extracting the numeric feature information from the memory forensics information and filling the numeric feature information corresponding to each process into the corresponding row to obtain the memory forensics matrix.
Converting the dynamic link library information into a dynamic link library characteristic vector, which specifically comprises the following steps:
The dynamic link library information includes the number of occurrences of the different dynamic link libraries in each process during execution of the program sample. The contribution degree of each dynamic link library to discriminating the current process is calculated with the TF-IDF algorithm, and the dynamic link libraries whose contribution degree exceeds a set threshold are retained. The numbers of occurrences of the retained dynamic link libraries in the current process form an initial vector, and the initial vectors of the different processes are clustered with the k-means algorithm to obtain the preliminary class labels of the different processes. The preliminary class labels of all processes form a one-dimensional vector, namely the dynamic link library feature vector.
S4, inputting the dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix corresponding to the program training samples into a pre-constructed neural network model, and training the neural network model to obtain the trained neural network model.
The neural network model consists of a first feature extraction part, a second feature extraction part, a feature fusion module and a fully connected layer. The first and second feature extraction parts are each composed of a convolutional layer and a pooling layer. The input of the first feature extraction part is the system real-time parameter matrix, and its output is the feature information of the system real-time parameter matrix; the input of the second feature extraction part is the memory forensics matrix, and its output is the feature information of the memory forensics matrix. The feature fusion module fuses the output of the first feature extraction part, the output of the second feature extraction part and the dynamic link library feature vector. The fusion features output by the feature fusion module pass through the fully connected layer to give the classification output of the neural network model, namely the class of the program sample.
S5, setting the samples in the program sample set to program test samples, which are programs of unknown class.
S6, executing S2 and S3 to obtain the dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix of the program test sample, and using the trained neural network model to determine whether a malicious program exists in the target virtual machine.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. The cloud malicious program detection system based on deep learning is characterized by comprising an information acquisition module, a data preprocessing module and a training model module;
the information acquisition module comprises a virtual machine, a program automatic execution script and a program sample set; the program sample set comprises the program samples used in malicious program detection; the program automatic execution script is used for automatically executing the program samples in the virtual machine; one program sample is run in the virtual machine at a time, system real-time state parameter information and dynamic link library information are extracted during the run, a virtual machine memory snapshot is saved after the program sample has been executed, and the virtual machine snapshot is analyzed to obtain memory forensics information; the system real-time state parameter information, the dynamic link library information and the memory forensics information obtained when each program sample is executed are sent to the data preprocessing module;
the data preprocessing module performs the following data preprocessing: converting the dynamic link library information into a dynamic link library feature vector, converting the system real-time state parameter information into a system real-time parameter matrix, and extracting the numeric feature information from the memory forensics information and converting it into a memory forensics matrix; the dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix are sent to the training model module;
the training model module is used for constructing and training a neural network model in advance; the neural network model consists of a first feature extraction part, a second feature extraction part, a feature fusion part and a fully connected layer; the first and second feature extraction parts are each composed of a convolutional layer and a pooling layer; the input of the first feature extraction part is the system real-time parameter matrix, and the output is the feature information of the system real-time parameter matrix; the input of the second feature extraction part is the memory forensics matrix, and the output is the feature information of the memory forensics matrix; the feature fusion part is used for fusing the output of the first feature extraction part, the output of the second feature extraction part and the dynamic link library feature vector to obtain fusion features; and the fusion features pass through the fully connected layer to obtain the classification output of the neural network model, namely the judgment result of whether a malicious program exists in the target virtual machine.
2. The system of claim 1, wherein the malicious program detection system has a model training mode and an actual measurement mode;
in the model training mode, the program sample set consists of program training samples;
the program training samples comprise programs of known class and their class labels, wherein the class labels comprise normal program and malicious program;
for the programs of known class, the dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix obtained through the information acquisition module and the data preprocessing module are combined with the class labels to train the neural network model in the training model module, yielding the trained neural network model;
in the actual measurement mode, the program sample set consists of program test samples, which are unknown programs; for each unknown program, the dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix obtained through the information acquisition module and the data preprocessing module are input to the trained neural network model to obtain the judgment result of whether a malicious program exists in the target virtual machine.
3. The system according to claim 1 or 2, wherein the information acquisition module uses a relevant Python module to extract the system real-time state parameter information;
the information acquisition module analyzes the virtual machine snapshot using the Volatility tool to obtain the memory forensics information.
4. The system according to claim 1 or 2, wherein each row of the system real-time parameter matrix and of the memory forensics matrix in the data preprocessing module corresponds to one process during execution of the program sample, and the data in that row are the numerical features of the system real-time parameters or of the memory forensics information generated for the corresponding process;
the conversion of the dynamic link library information into the dynamic link library feature vector specifically comprises:
the dynamic link library information includes the number of occurrences of each dynamic link library in each process during execution of the program sample; the contribution of each dynamic link library to discriminating the current process is calculated using the TF-IDF algorithm, and the dynamic link libraries whose contribution is greater than a set threshold are retained; the occurrence counts of the retained dynamic link libraries in the current process form an initial vector, and the initial vectors of the different processes are clustered using the k-means algorithm to obtain a preliminary category label for each process; the preliminary category labels of all processes form a one-dimensional vector, namely the dynamic link library feature vector.
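The TF-IDF screening and k-means clustering described above can be sketched as follows. The claim leaves several details open, so the sketch makes labeled assumptions: a dynamic link library is retained if its TF-IDF score exceeds the threshold in at least one process, all processes share the same retained-library ordering so that their initial vectors have equal length, and k-means is initialized deterministically from the first k vectors. The function names and the sample process and library names are hypothetical.

```python
import math

def tfidf_scores(counts):
    """counts: {process: {dll: occurrences}} -> {process: {dll: TF-IDF score}}."""
    n = len(counts)
    df = {}                                   # number of processes in which each DLL occurs
    for dlls in counts.values():
        for name in dlls:
            df[name] = df.get(name, 0) + 1
    scores = {}
    for proc, dlls in counts.items():
        total = sum(dlls.values()) or 1
        scores[proc] = {name: (c / total) * math.log(n / df[name]) for name, c in dlls.items()}
    return scores

def kmeans(points, k, iters=20):
    """Plain k-means with squared Euclidean distance; assumed init: the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                       # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def dll_feature_vector(counts, threshold, k):
    """Screen DLLs by TF-IDF, build per-process count vectors, return one cluster label per process."""
    scores = tfidf_scores(counts)
    kept = sorted({name for s in scores.values() for name, v in s.items() if v > threshold})
    procs = sorted(counts)                    # fixed process order (assumption)
    vectors = [[counts[p].get(name, 0) for name in kept] for p in procs]
    return kmeans(vectors, k)

counts = {
    "explorer.exe": {"kernel32.dll": 3, "user32.dll": 2, "shell32.dll": 4},
    "svchost.exe": {"kernel32.dll": 2, "user32.dll": 1, "shell32.dll": 5},
    "sample.exe": {"kernel32.dll": 1, "ws2_32.dll": 6, "crypt32.dll": 4},
    "dropper.exe": {"kernel32.dll": 2, "ws2_32.dll": 5, "crypt32.dll": 3},
}
labels = dll_feature_vector(counts, threshold=0.1, k=2)
```

A library loaded by every process (here `kernel32.dll`) receives an inverse document frequency of zero and is therefore screened out, which matches the intent of keeping only libraries that help tell processes apart.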
5. The system of claim 1 or 2, wherein the feature fusion part first fuses the feature information of the system real-time parameter matrix with the feature information of the memory forensics matrix by concatenation (concat) or by addition (add) to obtain an intermediate fusion result, and then fuses the intermediate fusion result with the dynamic link library feature vector by concatenation (concat) to obtain the fused features.
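The two-stage fusion can be written in a few lines. This is a sketch, assuming the branch outputs have already been flattened to one-dimensional lists and that the add mode means element-wise addition of equal-length features; the function name `fuse` is hypothetical.

```python
def fuse(sys_feat, mem_feat, dll_vec, mode="concat"):
    """Stage 1: fuse the two branch outputs (concat or add). Stage 2: concat with the DLL vector."""
    if mode == "concat":
        mid = list(sys_feat) + list(mem_feat)
    elif mode == "add":
        if len(sys_feat) != len(mem_feat):
            raise ValueError("add fusion requires features of equal length")
        mid = [a + b for a, b in zip(sys_feat, mem_feat)]
    else:
        raise ValueError("mode must be 'concat' or 'add'")
    return mid + list(dll_vec)

fused_concat = fuse([1, 2], [3, 4], [0, 1])             # length 2 + 2 + 2
fused_add = fuse([1, 2], [3, 4], [0, 1], mode="add")    # length 2 + 2
```

The choice affects the input width of the fully connected layer: concatenation keeps both feature sets side by side, while addition halves the width but requires both branches to produce features of the same shape.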
6. A cloud malicious program detection method based on deep learning, characterized by comprising the following steps:
S1, establishing a virtual machine environment; deploying a program automatic execution script in the virtual machine for automatically executing the samples in a program sample set;
the samples in the program sample set are initially program training samples; the program training samples comprise programs of known class and their class labels, wherein the class labels comprise normal program and malicious program;
S2, the virtual machine automatically executes the samples in the program sample set, running one program sample at a time; during execution of each program training sample, the virtual machine extracts system real-time state parameter information and dynamic link library information, saves a memory snapshot of the virtual machine after the program sample has finished executing, and analyzes the snapshot to obtain memory forensics information;
S3, converting the dynamic link library information into a dynamic link library feature vector, converting the system real-time state parameter information into a system real-time parameter matrix, and extracting the numerical feature information from the memory forensics information and converting it into a memory forensics matrix;
S4, inputting the dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix corresponding to the program training samples into a pre-constructed neural network model, and training the neural network model to obtain a trained neural network model;
the neural network model consists of a first feature extraction part, a second feature extraction part, a feature fusion module and a fully connected layer; the first and second feature extraction parts each consist of a convolutional layer and a pooling layer; the input of the first feature extraction part is the system real-time parameter matrix, and its output is the feature information of the system real-time parameter matrix; the input of the second feature extraction part is the memory forensics matrix, and its output is the feature information of the memory forensics matrix; the feature fusion module fuses the output of the first feature extraction part, the output of the second feature extraction part and the dynamic link library feature vector; the fused features output by the feature fusion module pass through the fully connected layer to obtain the classification output of the neural network model, namely the class of the program sample;
S5, setting the samples in the program sample set to program test samples, wherein the program test samples are programs of unknown class;
S6, executing S2 and S3 to obtain the dynamic link library feature vector, the system real-time parameter matrix and the memory forensics matrix of the program test samples, and obtaining, by means of the trained neural network model, the judgment result of whether a malicious program exists in the target virtual machine.
7. The method according to claim 6, wherein in S2, extracting the system real-time state parameter information and the dynamic link library information specifically comprises:
extracting the system real-time state parameter information and the dynamic link library information more than two times, at a set time interval, while the virtual machine runs the program sample; the system real-time state parameter information comprises the state parameter information corresponding to each process while the program sample is executed; the dynamic link library information comprises the number of occurrences of each dynamic link library for each process while the program sample is executed.
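The interval-based extraction can be pictured as a simple sampling loop. The sketch below is an assumption-laden illustration: `collect` is a hypothetical callable standing in for whatever routine reads the per-process state parameters and dynamic link library counts inside the virtual machine, and the interval and repetition count are free parameters.

```python
import time

def sample_periodically(collect, interval_s, times):
    """Call collect() `times` times, waiting interval_s seconds between consecutive calls."""
    snapshots = []
    for i in range(times):
        snapshots.append(collect())
        if i < times - 1:              # no wait needed after the last extraction
            time.sleep(interval_s)
    return snapshots

# Hypothetical collector returning one record per process.
def fake_collect():
    return {"sample.exe": {"cpu_percent": 12.5, "dll_counts": {"ws2_32.dll": 1}}}

snapshots = sample_periodically(fake_collect, interval_s=0.0, times=3)
```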
8. The method according to claim 6 or 7, wherein in S3, converting the system real-time state parameter information into the system real-time parameter matrix, and extracting the numerical feature information from the memory forensics information and converting it into the memory forensics matrix, comprises the following steps:
S301, counting the processes contained in the corresponding program samples, namely counting the number of common processes and the number of single processes, wherein a common process is a process that appears more than once across all program samples, and a single process is a process that appears only once across all program samples;
S302, determining the number of rows of the matrix as the number of common processes plus the maximum of the numbers of single processes across all program samples;
S303, filling the state data corresponding to each process into the corresponding row according to the system real-time state parameter information to obtain the system real-time parameter matrix; and extracting the numerical feature information from the memory forensics information and filling the numerical feature information corresponding to each process into the corresponding row to obtain the memory forensics matrix.
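Steps S301 to S303 can be sketched as a row-planning step followed by a fill step. The interpretation used here is an assumption: a process counts as common if its name appears in more than one program sample, each common process is given a fixed row shared by all samples, a sample's single processes occupy the rows after the common ones, and unused rows are zero-padded. The function names and the example data are hypothetical.

```python
from collections import Counter

def plan_rows(samples):
    """samples: list of {process name: numeric feature list}. Returns (common processes, row count)."""
    seen = Counter(name for s in samples for name in s)      # in how many samples each process occurs
    common = sorted(n for n, c in seen.items() if c > 1)
    max_single = max(sum(1 for n in s if seen[n] == 1) for s in samples)
    return common, len(common) + max_single

def build_matrix(sample, common, n_rows, n_cols):
    """Fill one sample's per-process features into a fixed-size matrix, zero-padding unused rows."""
    matrix = [[0.0] * n_cols for _ in range(n_rows)]
    for i, name in enumerate(common):                        # common processes keep the same row everywhere
        if name in sample:
            matrix[i] = list(sample[name])
    singles = sorted(n for n in sample if n not in common)
    for offset, name in enumerate(singles):                  # single processes follow the common rows
        matrix[len(common) + offset] = list(sample[name])
    return matrix

samples = [
    {"svchost.exe": [1, 2], "a.exe": [3, 4]},
    {"svchost.exe": [5, 6], "b.exe": [7, 8], "c.exe": [9, 10]},
]
common, n_rows = plan_rows(samples)
matrices = [build_matrix(s, common, n_rows, n_cols=2) for s in samples]
```

Fixing the row count this way gives every program sample a matrix of identical shape, which is what a convolutional layer with a fixed input size needs.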
9. The method according to claim 6 or 7, wherein in S3, converting the dynamic link library information into the dynamic link library feature vector comprises:
the dynamic link library information includes the number of occurrences of each dynamic link library in each process during execution of the program sample; the contribution of each dynamic link library to discriminating the current process is calculated using the TF-IDF algorithm, and the dynamic link libraries whose contribution is greater than a set threshold are retained; the occurrence counts of the retained dynamic link libraries in the current process form an initial vector, and the initial vectors of the different processes are clustered using the k-means algorithm to obtain a preliminary category label for each process; the preliminary category labels of all processes form a one-dimensional vector, namely the dynamic link library feature vector.
CN202010814447.8A 2020-08-13 2020-08-13 Cloud malicious program detection system and method based on deep learning Active CN111931179B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010814447.8A CN111931179B (en) 2020-08-13 2020-08-13 Cloud malicious program detection system and method based on deep learning

Publications (2)

Publication Number Publication Date
CN111931179A (en) 2020-11-13
CN111931179B (en) 2023-01-06

Family

ID=73311302

Country Status (1)

Country Link
CN (1) CN111931179B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107180192A (en) * 2017-05-09 2017-09-19 北京理工大学 Android malicious application detection method and system based on multi-feature fusion
US20180285740A1 (en) * 2017-04-03 2018-10-04 Royal Bank Of Canada Systems and methods for malicious code detection
CN110618854A (en) * 2019-08-21 2019-12-27 浙江大学 Virtual machine behavior analysis system based on deep learning and memory mirror image analysis
US20200104498A1 (en) * 2018-09-28 2020-04-02 Ut-Battelle, Llc Independent malware detection architecture
US20200137088A1 (en) * 2018-10-29 2020-04-30 Acronis International Gmbh Methods and cloud-based systems for correlating malware detections by endpoint devices and servers

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989344A (en) * 2021-03-16 2021-06-18 北京理工大学 Malicious program intelligent detection method, device and system based on hardware tracking technology
CN112989344B (en) * 2021-03-16 2022-07-05 北京理工大学 Malicious program intelligent detection method, device and system based on hardware tracking technology
CN113010268A (en) * 2021-03-22 2021-06-22 腾讯科技(深圳)有限公司 Malicious program identification method and device, storage medium and electronic equipment
CN113221110A (en) * 2021-04-08 2021-08-06 浙江工业大学 Remote access Trojan intelligent analysis method based on meta-learning
CN113221110B (en) * 2021-04-08 2022-06-28 浙江工业大学 Remote access Trojan intelligent analysis method based on meta-learning
CN113139185A (en) * 2021-04-13 2021-07-20 北京建筑大学 Malicious code detection method and system based on heterogeneous information network
CN113139185B (en) * 2021-04-13 2023-09-05 北京建筑大学 Malicious code detection method and system based on heterogeneous information network
CN114692148A (en) * 2022-03-31 2022-07-01 中国舰船研究设计中心 Malicious code detection method based on machine learning
CN114692148B (en) * 2022-03-31 2024-04-26 中国舰船研究设计中心 Malicious code detection method based on machine learning
CN114925363A (en) * 2022-05-12 2022-08-19 丝路信息港云计算科技有限公司 Cloud online malicious software detection method based on recurrent neural network
CN114925363B (en) * 2022-05-12 2023-05-19 丝路信息港云计算科技有限公司 Cloud online malicious software detection method based on recurrent neural network
CN117971385A (en) * 2023-12-28 2024-05-03 国网冀北电力有限公司信息通信分公司 System resource virtual environment use control method and device
CN118194285A (en) * 2024-05-14 2024-06-14 中汽智联技术有限公司 Automatic verification method and system for automobile information security test cases
CN118194285B (en) * 2024-05-14 2024-09-24 中汽智联技术有限公司 Automatic verification method and system for automobile information security test cases

Also Published As

Publication number Publication date
CN111931179B (en) 2023-01-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant