CN114077479A - Method for detecting malicious codes of client virtual machine in cloud platform - Google Patents


Info

Publication number
CN114077479A
Authority
CN
China
Prior art keywords
virtual machine
running state
model
client
client virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111300132.2A
Other languages
Chinese (zh)
Inventor
陈博翰
丁紫薇
马桂才
杨诏钧
魏立峰
韩光
姬一文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kirin Software Co Ltd
Original Assignee
Kirin Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kirin Software Co Ltd
Priority to CN202111300132.2A
Publication of CN114077479A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F21/53 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45591 Monitoring or debugging support

Abstract

The application discloses a method for detecting malicious code in a client virtual machine on a cloud platform, comprising the following steps: step S1, obtaining a memory dump file; step S2, extracting information through virtual machine introspection; step S3, training a model; and step S4, detecting the malicious code. Because the method does not rely on an agent inside the client, it avoids the situation in which malicious code attacks the agent, disables it, or even bypasses the detection software. It also improves detection efficiency and detection precision, and needs no re-adaptation for different types of operating systems.

Description

Method for detecting malicious codes of client virtual machine in cloud platform
Technical Field
The application relates to the field of cloud security, in particular to a method for detecting malicious codes of a client virtual machine in a cloud platform.
Background
Cloud computing is a form of distributed computing in which a large data-processing job is split over the network "cloud" into many small programs; a system composed of multiple servers processes and analyzes these small programs, and the results are returned to the user.
In recent years, with the rapid development of cloud computing, cloud security problems have become more severe. Traditional malicious code detection on a cloud platform mostly obtains the client's runtime state information from an agent placed inside the client. Such an agent cannot be protected from attack by the malicious code, which may disable it or even bypass the detection software. The traditional approach also generally detects malicious code through expert analysis or feature-based machine learning classification, which spends a lot of time on analysis or data preprocessing. In addition, existing detection methods must be adapted for each type of operating system and cannot effectively detect operating systems that have no graphical interface.
Disclosure of Invention
The main object of the invention is to provide a method for detecting malicious code in a client virtual machine on a cloud platform. The method avoids the situation in which malicious code attacks an agent inside the client, disables it, or even bypasses the detection software; it improves detection efficiency and detection precision; and it needs no re-adaptation for different types of operating systems.
In order to achieve the above object, the present invention provides a method for detecting malicious codes of a client virtual machine in a cloud platform, which comprises the following steps:
step S1, obtaining a memory dump file: creating and starting a client virtual machine in the cloud platform, and obtaining the memory dump file of the client virtual machine by using the memory dump function of the virtualization platform;
step S2, extracting information through virtual machine introspection: analyzing the memory dump file, and obtaining multiple running state features of the client virtual machine through the virtual machine introspection technology;
step S3, model training: training a BERT model on each of the multiple running state features in turn, to obtain the trained BERT model and the model classification accuracy corresponding to each running state feature;
and step S4, malicious code detection: inputting the running state features of the client virtual machine under test into the trained BERT model in turn to obtain a detection result for each running state feature, assigning a weight to the detection result of each running state feature according to the corresponding model classification accuracy, multiplying each detection result by its weight, and summing the products to obtain the final detection result.
Optionally, step S1 includes:
step S101, a client virtual machine is created and started in a cloud platform;
step S102, saving a snapshot of the client virtual machine as a recovery point, running normal software and malicious software in the client virtual machine, and simulating scenarios of normal use by a user and intrusion by malicious software;
step S103, acquiring the memory dump file of the client virtual machine by using the memory dump function of the virtualization platform;
and step S104, restoring the client virtual machine to the recovery point, running the remaining software, and repeating step S103.
Optionally, step S2 includes:
step S201, constructing a symbol table of a client operating system;
step S202, analyzing the memory dump file, acquiring multiple running state features of the client virtual machine at run time, and storing the acquired data in different documents by category.
Optionally, the client virtual machine is either a Linux virtual machine or a Windows virtual machine, and its type is determined first. If the client virtual machine is a Windows virtual machine, the symbol table in step S201 is obtained with Volatility; if the client virtual machine is a Linux virtual machine, the symbol table in step S201 is obtained with dwarf2json.
Optionally, the Linux virtual machine runs the Ubuntu 16.04 operating system, and the Windows virtual machine runs the Windows 7 operating system.
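The choice of symbol-table source by guest type can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `symbol_table_step` and the default file names are hypothetical, and the dwarf2json invocation (`linux --elf ... --system-map ...`, with JSON written to standard output) follows that tool's documented usage.

```python
def symbol_table_step(os_type, vmlinux="vmlinux", system_map="System.map"):
    """Describe how the symbol table would be obtained for a guest OS type.

    Hypothetical helper: it only builds the plan and runs nothing.
    """
    os_type = os_type.lower()
    if os_type == "windows":
        # Windows guests: rely on the symbol tables provided with Volatility.
        return {"source": "volatility", "command": None}
    if os_type == "linux":
        # Linux guests: dwarf2json reads DWARF data from the kernel ELF file
        # and symbols from System.map, and prints the JSON symbol table.
        command = ["dwarf2json", "linux", "--elf", vmlinux,
                   "--system-map", system_map]
        return {"source": "dwarf2json", "command": command}
    raise ValueError(f"unsupported guest type: {os_type}")


plan = symbol_table_step("linux")
print(plan["source"], " ".join(plan["command"]))
```

Keeping the decision in one function means the rest of the pipeline does not need to know which operating system the guest runs.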
Optionally, step S3 includes:
step S301, sorting and labeling the running state features acquired in step S2 to form the input data set of the BERT model, and dividing the input data set into a training set and a validation set;
step S302, adjusting the hyper-parameter structure of the BERT model;
and step S303, inputting the input data set into the BERT model for pre-training to complete the two pre-training tasks, Masked LM and NSP, and then training the pre-trained model again on the same input data set, finally obtaining the trained BERT classification model and the model classification accuracy corresponding to each running state feature.
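The division into a training set and a validation set in step S301 can be sketched without any framework. The detailed description uses the `text_dataset_from_directory` function with an 8:2 ratio; the dependency-free stand-in below is only an illustration, and the function name `split_dataset`, the seed, and the sample labels are assumptions.

```python
import random


def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle labeled samples and split them into training and validation sets."""
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled samples: (feature text, label), label 1 = malicious.
samples = [(f"pslist output {i}", i % 2) for i in range(10)]
train, val = split_dataset(samples)
print(len(train), len(val))
```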
Optionally, step S302 adjusts the hyper-parameter structure of the BERT model to:
the number of hidden layers L = 2, the hidden layer size H = 512, and the number of attention heads A = 8.
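The detailed description states that the candidate values were L in {2, 4, 8, 12}, H in {128, 256, 512, 768}, and A in {2, 4, 8, 12}. A minimal sketch of enumerating such a grid follows. Skipping combinations where H is not divisible by A is an assumption based on how multi-head attention is usually implemented (each head gets H/A dimensions); the patent does not say how such combinations were handled. `evaluate` is a hypothetical callback that would train a model and return its validation accuracy.

```python
from itertools import product

LAYERS = (2, 4, 8, 12)          # candidate numbers of hidden layers L
HIDDEN = (128, 256, 512, 768)   # candidate hidden sizes H
HEADS = (2, 4, 8, 12)           # candidate numbers of attention heads A


def candidate_configs():
    """List (L, H, A) combinations whose hidden size splits evenly across heads."""
    return [(l, h, a) for l, h, a in product(LAYERS, HIDDEN, HEADS) if h % a == 0]


def pick_best(configs, evaluate):
    """Return the configuration with the highest score from `evaluate`."""
    return max(configs, key=evaluate)


configs = candidate_configs()
print(len(configs), (2, 512, 8) in configs)
```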
Optionally, the step S4 includes:
screening n running state features with the highest model classification accuracy;
calculating the weight of the detection result of each running state feature according to the model classification accuracy of the selected running state features, where the weight calculation formula is:
$$w_i = \frac{acc_i}{\sum_{j=1}^{n} acc_j}$$
where $w_i$ is the weight of the i-th running state feature, $acc_i$ is the model classification accuracy of the i-th running state feature, and $\sum_{j=1}^{n} acc_j$ is the sum of the model classification accuracies of the n running state features, n being an integer greater than zero;
when malicious code detection is performed on a client virtual machine under test, the n selected running state features of that virtual machine are first obtained; the trained BERT model then classifies each selected running state feature to obtain its detection result; and the final detection result is calculated as:
$$R_0 = \sum_{i=1}^{n} w_i \, R_i \, r_i$$
where $R_0$ is the probability that malicious code is present in the client virtual machine, $w_i$ is the weight of the i-th feature, $R_i$ is the classification accuracy of the i-th feature of the client virtual machine under test, and $r_i$ is the detection result of the i-th feature, with $r_i = 1$ indicating that the detection result of the i-th feature is malicious.
Optionally, the screened operating state features include filescan, netscan, malfind, privs, modules, psxview, pslist, svcscan, thrdscan, and mutantscan.
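The screening and weighting in step S4 can be sketched as follows: keep the n features with the highest model classification accuracy, then divide each accuracy by the sum over the kept features. The function name `top_n_weights` and the accuracy values are hypothetical; the patent's Table 1 values are not reproduced here.

```python
def top_n_weights(accuracies, n):
    """Select the n most accurate features and normalize their accuracies to weights."""
    if not 0 < n <= len(accuracies):
        raise ValueError("n must be between 1 and the number of features")
    top = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)[:n]
    total = sum(acc for _, acc in top)
    # w_i = acc_i / sum of the selected accuracies, so the weights sum to 1.
    return {name: acc / total for name, acc in top}


# Made-up accuracies for illustration only.
example = {"filescan": 0.9, "netscan": 0.6, "pslist": 0.5, "handles": 0.3}
print(top_n_weights(example, 2))
```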
According to the technical scheme, the embodiment of the application has the following advantages:
1. the method can detect malicious software in client virtual machines that use different operating systems without change or adjustment, and requires no additional configuration of the client operating system;
2. the method obtains the memory dump file of the client virtual machine and extracts multiple running state features of the client virtual machine with the virtual machine introspection technology, so the client's information is obtained from the hypervisor layer, which effectively prevents malicious software from damaging or bypassing the detection system;
3. the running information is classified with the BERT model, which avoids the cumbersome feature extraction steps of the prior art (such as sliding windows), requires no additional analysis or feature extraction, and reduces the time overhead. In addition, the input representation of BERT comprises a word vector, a text vector, and a position vector, which strengthens the contextual relationships in the running state information and improves detection accuracy. The method reaches a classification accuracy of 99.9%, effectively safeguarding the cloud platform.
Drawings
In order to express the technical scheme of the embodiment of the invention more clearly, the drawings used for describing the embodiment will be briefly introduced below, and obviously, the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a design architecture diagram of a method for malicious code detection of a guest virtual machine in a cloud platform according to the present invention;
FIG. 2 is a schematic diagram of the operation of the BERT model in an embodiment of the present invention;
FIG. 3 is a flowchart of a method for detecting malicious code of a guest virtual machine in a cloud platform according to the present invention.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the invention provides a method for detecting malicious code in a client virtual machine on a cloud platform. The main function of the data acquisition part is to collect the running data of instances on the cloud platform and supply data for the subsequent model classification. The environment used for this part is the OpenStack Train release installed on Ubuntu 18.04. Multiple instances, namely client virtual machines, are created on the OpenStack cloud platform, and the contents of their dump files are obtained while they run. The main function of the virtual machine introspection part is to use the virtual machine introspection technology to obtain specific runtime information about the client from the dump file contents extracted on the cloud platform. This part can extract client running information from different systems rather than being limited to one system, so the classification model still shows a good detection rate when facing data from different systems.
The main function of the model training part is to train the BERT model and then use it for classification to obtain the likelihood that the client has been attacked by malicious code. Training of the BERT model consists of a pre-training part and a training part. Pre-training makes the vector representation that the model outputs for each character/word depict the overall information of the input text as completely and accurately as possible, which provides better initial values of the model parameters for the subsequent training. After pre-training, the data set is used again for training, yielding a BERT classification model for malicious code detection. During detection, the multiple running state features of the client virtual machine and the trained BERT classification model are used to obtain the classification accuracy of each feature, and the likelihood that malicious code exists in the client virtual machine is calculated with the weight formula.
Referring to fig. 1, fig. 2 and fig. 3, a method for detecting malicious codes of a client virtual machine in a cloud platform according to an embodiment of the present invention includes the following specific steps.
Step one, obtaining a memory dump file
(1) Create multiple client virtual machines in the cloud platform, install different versions of Linux and Windows systems on them, and start the client virtual machines. Specifically, multiple client virtual machines are created using OpenStack, the Ubuntu 16.04 (a Linux system) and Windows 7 operating systems are installed respectively, and the client virtual machines are started. With two types of virtual machine, client running information can be extracted from different systems, so the running state features of multiple operating systems can be detected.
(2) Save a snapshot of the client virtual machine as a recovery point, run normal software and malicious software in the client virtual machine, and simulate scenarios of normal use by a user and intrusion by malicious software. Specifically, the dump file function of OpenStack is used to create dump files of the client virtual machines while normal software and malicious software run in them.
(3) Obtain the memory dump file of the client virtual machine by using the memory dump function of the virtualization platform, specifically the virsh dump command of libvirt.
(4) Restore the client virtual machine to the recovery point, run the remaining software, and repeat step (3).
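The restore, run, and dump cycle of steps (2) to (4) can be sketched as a loop over software samples. This is an illustration under stated assumptions: the embodiment names only `virsh dump`; using `virsh snapshot-revert` for the recovery point (rather than an OpenStack snapshot), the `--memory-only` flag, the output file naming, and the `run_sample` callback that starts a sample inside the guest are all hypothetical choices.

```python
import subprocess


def revert_cmd(domain, snapshot):
    # Assumption: the recovery point is a libvirt snapshot.
    return ["virsh", "snapshot-revert", domain, snapshot]


def dump_cmd(domain, out_file):
    # --memory-only asks virsh to dump guest memory only.
    return ["virsh", "dump", "--memory-only", domain, out_file]


def collect_dumps(domain, snapshot, samples, run_sample, run=subprocess.run):
    """For each sample: revert to the recovery point, run the sample, dump memory."""
    dumps = []
    for sample in samples:
        run(revert_cmd(domain, snapshot), check=True)
        run_sample(domain, sample)  # hypothetical: launches the sample in the guest
        out_file = f"{domain}-{sample}.dump"
        run(dump_cmd(domain, out_file), check=True)
        dumps.append(out_file)
    return dumps
```

Passing `run` as a parameter allows a dry run that records the commands instead of executing them, which is useful on a machine without libvirt.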
Step two, extracting information through virtual machine introspection
After the memory dump file of the client virtual machine is obtained, the virtual machine introspection technology is used to obtain the state information of the client virtual machine at run time. Virtual machine introspection can obtain the internal state information of a running virtual machine from outside the client virtual machine. The steps for performing virtual machine introspection with Volatility are as follows.
(1) Construct a symbol table for the client operating system. For the Windows operating system, the symbol table provided by Volatility can be used; for the Linux operating system, it is obtained with dwarf2json, which processes the DWARF and symbol table information from an ELF file and the symbols from a System.map input file to generate a JSON file that Volatility uses to analyze Linux.
(2) Analyze the memory dump file obtained in the previous step. Volatility plugins such as callbacks, dlllist, filescan, and handles are used to obtain the running state features of the running client virtual machine, such as API (application programming interface) call sequences, memory operations, network activity, registry operations, and file operations, and the obtained data are stored in different documents by category.
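Running several plugins over one dump and storing each output by category can be sketched as below. The patent does not state which Volatility version or command line it uses; the Volatility 2 style invocation (`vol.py -f <dump> --profile=<profile> <plugin>`), the profile name, and the one-directory-per-plugin layout are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def plugin_cmd(dump_file, profile, plugin):
    # Assumed Volatility 2 style command line.
    return ["vol.py", "-f", str(dump_file), f"--profile={profile}", plugin]


def run_and_capture(cmd):
    """Execute a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def extract_features(dump_file, profile, plugins, out_dir, run=run_and_capture):
    """Run each plugin on the dump and write its output under out_dir/<plugin>/."""
    written = {}
    for plugin in plugins:
        target_dir = Path(out_dir) / plugin
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / (Path(dump_file).stem + ".txt")
        target.write_text(run(plugin_cmd(dump_file, profile, plugin)))
        written[plugin] = target
    return written
```

A call such as `extract_features("guest1.dump", "Win7SP1x64", ["filescan", "handles"], "features")` would then leave one text file per plugin, ready to be labeled for training.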
The malicious code detection method of this embodiment differs from conventional methods in that classification detection is performed offline. In the steps above, the running state features of the client are acquired from the hypervisor layer, so malicious software cannot damage or bypass the detection system. Moreover, the virtual machine introspection technology works with multiple systems, so the required running state information can be extracted from different systems for the subsequent classification and prediction work without re-adaptation.
Step three, model training
The method selects the BERT model, whose training process is divided into a pre-training stage and a training stage. The pre-training stage helps the BERT model understand word meanings and relationships between sentences, and the training stage lets the BERT model learn the dependencies between words in a sentence and capture the sentence's internal structure. For the detection principle of the BERT model, refer to FIG. 2: context information is learned with Transformer encoders to enhance the semantic representation of the target word. The flow on the left side of FIG. 2 is the complete process of one Transformer encoder in the right-side diagram. After data is input into the BERT model, a multi-head self-attention module enhances the semantic vector representation of each word in the text; a residual connection adds the module's input to its output, and the sum is normalized (layer normalization) to form the new input; two linear transformations are then applied to each word to increase the expressive capacity of the whole model. The data set of the method consists of the running state features on the cloud platform extracted through virtual machine introspection, whose contextual relationships are close. Because the BERT model takes the context of the text into account during classification, which helps improve detection accuracy, the method selects BERT as the classification model.
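The encoder flow described above can be written compactly in the standard Transformer notation. This is the usual textbook formulation rather than something stated in the patent: $x$ denotes the input to one encoder layer, $\phi$ is the nonlinearity between the two linear transformations (GELU in the original BERT), and the weight names $W_1, W_2, b_1, b_2$ are introduced here only for illustration.

$$
\begin{aligned}
h &= \mathrm{LayerNorm}\big(x + \mathrm{MultiHead}(x)\big) \\
\mathrm{FFN}(h) &= W_2\,\phi(W_1 h + b_1) + b_2
\end{aligned}
$$

In the standard architecture the feed-forward output is again added to $h$ and normalized before being passed to the next encoder layer.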
(1) Collect the multiple running state features obtained with the virtual machine introspection technology and label them as input data. The running state features must be sorted and labeled before they can serve as input data for model training. The text_dataset_from_directory function reads the data set, which is divided into a training set and a validation set at a ratio of 8:2.
(2) Pre-train the BERT model. The pre-training task adjusts the parameters of the BERT model so that the model's output expresses semantics accurately. The BERT model used in the method has the following adjustable hyper-parameters: the number of hidden layers L (L = 2, 4, 8, 12), the size of each hidden layer H (H = 128, 256, 512, 768), and the number of attention heads A (A = 2, 4, 8, 12). The model structure with the best classification effect is selected by adjusting these hyper-parameters; in the experiment, the best classification result was obtained with L = 2, H = 512, and A = 8.
(3) Pre-train the BERT model with the data set collected in step two to complete the two pre-training tasks, Masked LM and Next Sentence Prediction (NSP), so that the BERT model better understands the relationships between words and between sentences. The pre-trained model is then trained again on the same data set, finally yielding the trained BERT classification model.
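The masking step behind the Masked LM task can be sketched as follows. The patent gives no masking rates; the values below (15% of tokens selected, of which 80% become `[MASK]`, 10% a random token, and 10% left unchanged) come from the original BERT recipe and are an assumption here, as are the function name `mask_tokens` and the example tokens.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (masked tokens, labels); labels are None where nothing is predicted."""
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                masked.append(MASK)
            elif roll < 0.9:
                masked.append(rng.choice(vocab))
            else:
                masked.append(token)
        else:
            labels.append(None)
            masked.append(token)
    return masked, labels


tokens = "svchost.exe pid 812 ppid 492 threads 23".split()
masked, labels = mask_tokens(tokens, vocab=tokens)
print(masked)
```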
Step four, malicious code detection
Malicious code detection is divided into two steps. First, the model classification accuracies of the multiple features are used to calculate the weight of each feature. Then the weights and the trained BERT model are used to perform malicious code detection and obtain the final detection result.
(1) Select the n features (n is a positive integer) with the highest classification accuracy from all the features used in model training. In this embodiment, the ten features with the highest classification accuracy are selected as the features finally used for malicious code detection; the model classification accuracies for all features are shown in Table 1.
TABLE 1 (model classification accuracy of each feature; the table is an image in the original publication and its values are not available in this text)
Since a feature with higher classification accuracy should have a greater influence on the final detection result, a weight is set for each feature. The weight calculation formula is as follows:
$$w_i = \frac{acc_i}{\sum_{j=1}^{n} acc_j}$$
where $w_i$ is the weight of the i-th feature, $acc_i$ is the model classification accuracy of the i-th feature, and $\sum_{j=1}^{n} acc_j$ is the sum of the model classification accuracies of the n features. As the formula shows, the higher the classification accuracy of a feature, the greater its weight and the greater its influence on the final detection result.
(2) When malicious code detection is performed on a client virtual machine under test, the ten features with the highest classification accuracy (filescan, netscan, malfind, privs, modules, psxview, pslist, svcscan, thrdscan, and mutantscan) are obtained from the client virtual machine under test (whose system is Ubuntu 16.04). Each feature is classified with the BERT model trained in step three to obtain its classification accuracy, that is, the classification accuracy $R_i$ of malicious code detection using the i-th feature. The classification accuracy of each feature is multiplied by the corresponding weight and detection result, and the products are summed to obtain the final detection result:
$$R_0 = \sum_{i=1}^{n} w_i \, R_i \, r_i$$
where $R_0$ is the probability that malicious code is present in the client virtual machine, $w_i$ is the weight of the i-th feature, $R_i$ is the classification accuracy of the i-th feature of the client virtual machine under test, and $r_i$ is the detection result of the i-th feature, with $r_i = 1$ indicating that the detection result of the i-th feature is malicious.
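The weighted combination in step (2) can be sketched as a single sum over the selected features. The function name `final_score` and all numbers are hypothetical. The text defines only that a detection result of 1 means malicious; treating a non-malicious result as 0 is an assumption of this sketch.

```python
def final_score(weights, accuracies, results):
    """Combine per-feature outputs into R_0 = sum of w_i * R_i * r_i."""
    return sum(weights[name] * accuracies[name] * results[name] for name in weights)


# Made-up values for two features.
weights = {"filescan": 0.6, "netscan": 0.4}      # w_i, summing to 1
accuracies = {"filescan": 0.9, "netscan": 0.8}   # R_i from the trained model
results = {"filescan": 1, "netscan": 0}          # r_i: 1 = malicious (0 assumed otherwise)
print(final_score(weights, accuracies, results))
```

Because the weights sum to 1 and each $R_i$ is at most 1, the score stays between 0 and 1 under these assumptions.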
In this embodiment, two quantities relate to classification accuracy: $acc_i$ and $R_i$. $acc_i$ is the output obtained when the BERT model is trained with the data set as input, that is, the classification accuracy of the model. $R_i$ is the output obtained when the i-th feature data of the virtual machine under test is classified with the trained BERT model, that is, the classification accuracy of malicious code detection using the i-th feature. $acc_i$ and $R_i$ are therefore the output of model training and the output of classification with the model, respectively. The data behind $acc_i$ include running state feature data from both types of client, whereas $R_i$ takes only the i-th running state feature of the virtual machine under test as input; $acc_i$ covers both systems, whereas $R_i$ applies to the system under test. Neither requires additional calculation steps to obtain.
In this embodiment, the memory dump function of the virtualization platform is first used to obtain memory dump files from outside the client virtual machine, and virtual machine introspection software analyzes these memory dump files to obtain the running state information of the client virtual machine. This effectively prevents malicious code from damaging or bypassing the detection system and requires no adjustment for different types of client operating system, whereas existing methods must be adapted for each type of operating system and cannot effectively detect operating systems that have no graphical interface. The method classifies the obtained running state information with the BERT framework, so no additional analysis or feature extraction is needed and the time overhead is reduced. The method reaches a classification accuracy of 99.9% and effectively safeguards the cloud platform.
The terms referred to in the present embodiment are explained as follows.
Hypervisor: also known as a virtual machine monitor; software, firmware, or hardware used to create and run virtual machines.
BERT: short for Bidirectional Encoder Representations from Transformers. The goal of the BERT model is to train on a large-scale unlabeled corpus to obtain a representation of text that contains rich semantic information, that is, a semantic representation of the text, which is then fine-tuned for a specific NLP task and finally applied to that task.
Volatility: a memory forensics tool.
dwarf2json: a tool that obtains the symbol table and saves it as a JSON file.
API: application programming interface.
Masked LM: short for Masked Language Model, a pre-training task for language models based on a masking mechanism.
Next Sentence Prediction (NSP): a training task that learns the relationships between sentences.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims (9)

1. A method for detecting malicious codes of a client virtual machine in a cloud platform is characterized by comprising the following steps:
step S1, obtaining a memory dump file: creating and starting a client virtual machine in the cloud platform, and obtaining the memory dump file of the client virtual machine by using the memory dump function of the virtualization platform;
step S2, extracting information by virtual machine introspection: analyzing the memory dump file, and obtaining various running state features of the client virtual machine through virtual machine introspection technology;
step S3, model training: training on the various running state features in sequence by using a BERT model to obtain the trained BERT model and the model classification accuracy corresponding to each running state feature;
and step S4, malicious code detection: sequentially inputting the running state features to be detected into the trained BERT model to obtain a detection result for each running state feature, assigning a weight to the detection result of each running state feature according to the model classification accuracy corresponding to that feature, multiplying the detection result of each running state feature by its corresponding weight, and adding the products to obtain the final detection result.
2. The method for detecting malicious code of a client virtual machine in a cloud platform according to claim 1, wherein step S1 includes:
step S101, a client virtual machine is created and started in a cloud platform;
step S102, saving a snapshot of a client virtual machine as a recovery point, running normal software and malicious software in the client virtual machine, and simulating a scene of normal use of a user and invasion of the malicious software;
step S103, acquiring a memory dump file of the client virtual machine by using the memory dump function of the virtualization platform;
and step S104, restoring the client virtual machine to the recovery point, running the remaining software, and repeating step S103.
3. The method for detecting malicious code of a client virtual machine in a cloud platform according to claim 1, wherein step S2 includes:
step S201, constructing a symbol table of a client operating system;
step S202, analyzing the memory dump file, acquiring various running state features of the client virtual machine during running, and storing the acquired data in different documents according to category.
4. The method according to claim 3, wherein the client virtual machines include a Linux system virtual machine and a Windows system virtual machine; the type of the client virtual machine is determined, and if the client virtual machine is a Windows system virtual machine, the symbol table is obtained by using Volatility in step S201; if the client virtual machine is a Linux system virtual machine, the symbol table is obtained by using dwarf2json in step S201.
5. The method for detecting malicious code of a client virtual machine in a cloud platform according to claim 4, wherein the Linux system virtual machine uses the Ubuntu 16.04 operating system, and the Windows system virtual machine uses the Windows 7 operating system.
6. The method for detecting malicious code of a client virtual machine in a cloud platform according to claim 1, wherein step S3 includes:
step S301, sorting and labeling the running state features acquired in step S2 to serve as the input data set of the BERT model, and dividing the input data set into a training set and a verification set;
step S302, adjusting the hyper-parameter structure of the BERT model;
and step S303, inputting the input data set into the BERT model for pre-training to complete the two pre-training tasks, Mask LM and NSP, and then using the same input data set to train the pre-trained model again, finally obtaining the trained BERT classification model and the model classification accuracy corresponding to each running state feature.
7. The method for detecting malicious codes of a client virtual machine in a cloud platform according to claim 6, wherein the step S302 adjusts the hyper-parameter structure of the BERT model to:
the number of hidden layers L is 2, the hidden layer size H is 512, and the number of attention heads A is 8.
8. The method for detecting malicious code of a client virtual machine in a cloud platform according to claim 1, wherein the step S4 includes:
screening n running state features with the highest model classification accuracy;
calculating the weight of the detection result of each running state feature according to the model classification accuracy of the screened running state features, wherein the weight calculation formula is as follows:
$$w_i = \frac{acc_i}{\sum_{j=1}^{n} acc_j}$$
wherein $w_i$ is the weight of the ith running state feature, $acc_i$ is the model classification accuracy of the ith running state feature, and $\sum_{j=1}^{n} acc_j$ is the sum of the model classification accuracies of the n running state features, n being an integer greater than zero;
when detecting malicious code in a client virtual machine to be detected, first obtaining the n screened running state features of the client virtual machine to be detected, then classifying each screened running state feature by using the trained BERT model to obtain a detection result for each running state feature, and then calculating the final detection result by the following formula:
$$R_0 = \sum_{i=1}^{n} w_i r_i$$
wherein $R_0$ is the probability that malicious code is present in the client virtual machine, $w_i$ is the weight of the ith feature, and $r_i$ is the classification result of the ith feature of the client virtual machine to be detected, with $r_i = 1$ indicating that the detection result of the ith feature is malicious.
9. The method for detecting malicious codes of a client virtual machine in a cloud platform according to claim 8, wherein the screened running state features comprise filescan, netscan, malfind, priv, modules, psxview, pslist, svcscan, thrdscan and mutantscan.
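The accuracy-weighted combination recited in claims 1 and 8 can be sketched in Python as follows. This is a minimal illustration that assumes each weight is the feature's model classification accuracy divided by the sum of the accuracies of the n screened features, and that each per-feature result is 1 for malicious and 0 otherwise. The function names and the example accuracies are hypothetical and do not come from the claims.

```python
from typing import Dict


def accuracy_weights(accuracies: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-feature accuracies so that the weights sum to 1."""
    total = sum(accuracies.values())
    if total <= 0:
        raise ValueError("at least one feature must have a positive accuracy")
    return {name: acc / total for name, acc in accuracies.items()}


def malicious_score(weights: Dict[str, float], results: Dict[str, int]) -> float:
    """Weighted sum of per-feature results (1 = malicious, 0 = benign)."""
    return sum(weights[name] * results[name] for name in weights)
```

For instance, with hypothetical accuracies of 0.9, 0.6 and 0.5 for three features, the weights are 0.45, 0.30 and 0.25; if only the first and third features are classified as malicious, the combined score is about 0.7. The claims do not state a decision threshold for this score, so none is assumed here.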
CN202111300132.2A 2021-11-04 2021-11-04 Method for detecting malicious codes of client virtual machine in cloud platform Pending CN114077479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111300132.2A CN114077479A (en) 2021-11-04 2021-11-04 Method for detecting malicious codes of client virtual machine in cloud platform

Publications (1)

Publication Number Publication Date
CN114077479A true CN114077479A (en) 2022-02-22

Family

ID=80283616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111300132.2A Pending CN114077479A (en) 2021-11-04 2021-11-04 Method for detecting malicious codes of client virtual machine in cloud platform

Country Status (1)

Country Link
CN (1) CN114077479A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination