CN113010268A - Malicious program identification method and device, storage medium and electronic equipment

Info

Publication number: CN113010268A
Application number: CN202110302168.8A
Authority: CN
Inventors: 范宇河; 甘祥; 郑兴; 彭婧; 郭晶; 刘羽; 唐文韬; 申军利
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-03-22
Filing date: 2021-03-22
Publication date: 2021-06-22
Anticipated expiration: 2041-03-22
Also published as: CN113010268B

Abstract

The present disclosure provides a malicious program identification method and apparatus, an electronic device, and a computer-readable medium, and relates to the technical field of security. The malicious program identification method comprises: determining virtual memory information of a virtual machine, and determining memory data to be detected according to the virtual memory information; performing feature extraction on the memory data to be detected to obtain memory features to be detected; performing classification and identification on the memory features to be detected to obtain a memory classification result of the virtual machine; and determining, according to the memory classification result, whether a malicious program exists in the virtual machine. During malicious program detection, the detection process is not easily discovered and evaded by kernel malicious programs, and both known and unknown kernel malicious programs in the virtual machine can be effectively detected, so that attacks by malicious programs on the virtual machine can be handled effectively and the safe operation of the virtual machine is protected.

Description

Malicious program identification method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of security technologies, and in particular, to a malicious program identification method, a malicious program identification apparatus, an electronic device, and a computer-readable storage medium.

Background

With the rapid development and popularization of cloud computing, Virtual Machine (VM) technology is increasingly used, and network attacks increasingly target VMs. A rootkit is a collection of malicious computer software designed to give an unauthorized user access to a computer, or to an area of its software that would not otherwise be accessible, while often masking its own presence or the presence of other software.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The present disclosure aims to provide a malicious program identification method, a malicious program identification apparatus, an electronic device, and a computer-readable storage medium, so as to overcome, at least to some extent, the problems that existing anti-virus solutions cannot find unknown malicious programs and cannot detect kernel-level malicious programs.

According to an aspect of the present disclosure, there is provided a malware identification method including: determining virtual memory information of the virtual machine, and determining memory data to be detected according to the virtual memory information; performing feature extraction processing on the memory data to be detected to obtain the memory features to be detected; classifying, identifying and processing the memory characteristics to be detected to obtain a memory classification result of the virtual machine; and determining whether the malicious program exists in the virtual machine according to the memory classification result.
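As a minimal sketch of how the four steps of the method chain together, the following may be considered. The function name `identify_malware`, the three callables passed to it, and the label `"anomalous"` are hypothetical names introduced only for illustration; the disclosure does not prescribe any particular interface.

```python
from typing import Callable, Dict, Sequence


def identify_malware(
    virtual_memory_info: bytes,
    select_data: Callable[[bytes], Dict[str, object]],
    extract_features: Callable[[Dict[str, object]], Sequence[float]],
    classify: Callable[[Sequence[float]], str],
) -> bool:
    """Run the four steps in order and report whether malware is suspected."""
    # Step 1: determine the memory data to be detected from the memory information.
    memory_data = select_data(virtual_memory_info)
    # Step 2: feature extraction on the memory data to be detected.
    features = extract_features(memory_data)
    # Step 3: classification of the memory features (the label is an assumption).
    result = classify(features)
    # Step 4: decide from the classification result.
    return result == "anomalous"
```

Each step is passed in as a callable so that the memory acquisition, feature extraction, and classification stages can be replaced independently.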

According to an aspect of the present disclosure, there is provided a malware identification apparatus including: the memory data determining module is used for determining virtual memory information of the virtual machine and determining memory data to be detected according to the virtual memory information; the memory characteristic determining module is used for extracting the characteristics of the memory data to be detected to obtain the memory characteristics to be detected; the first classification processing module is used for performing classification recognition processing on the memory features to be detected to obtain a memory classification result of the virtual machine; and the result determining module is used for determining whether the malicious program exists in the virtual machine according to the memory classification result.

In an exemplary embodiment of the present disclosure, the memory data determination module includes a memory information determination unit configured to: acquiring snapshot data of a virtual machine, and determining a volatile memory dump file from the snapshot data; and taking the volatile memory dump file as virtual memory information.

In an exemplary embodiment of the present disclosure, the memory data determination module further includes a memory data determination unit, and the memory data determination unit is configured to: acquiring a memory information extraction plug-in; and extracting the information of the virtual memory information through the memory information extraction plug-in to obtain the memory data to be detected.

In an exemplary embodiment of the present disclosure, the memory characteristic determination module includes an object score determination unit configured to: determining a plurality of object state conditions for the driver object from the driver object data; determining object sub-scores respectively corresponding to the driver objects under the condition of each object state; and carrying out weighted calculation processing on the plurality of object sub-scores to obtain the object score.
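The weighted calculation of the object score may be sketched as follows. The state condition names and the weights are hypothetical placeholders chosen for illustration; the disclosure does not specify which object state conditions are used or how they are weighted.

```python
from typing import Mapping


def object_score(sub_scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of the per-condition sub-scores of one driver object."""
    missing = set(sub_scores) - set(weights)
    if missing:
        raise KeyError(f"no weight for conditions: {sorted(missing)}")
    return sum(weights[name] * value for name, value in sub_scores.items())


# Hypothetical state conditions and weights, for illustration only.
sub_scores = {"hidden_from_module_list": 1.0, "unsigned": 0.0, "hooks_service_table": 1.0}
weights = {"hidden_from_module_list": 0.5, "unsigned": 0.25, "hooks_service_table": 0.25}
score = object_score(sub_scores, weights)
```

With these placeholder values the score is 0.5 × 1.0 + 0.25 × 0.0 + 0.25 × 1.0 = 0.75.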

In an exemplary embodiment of the present disclosure, the malicious program identification apparatus further includes a second classification processing module, where the second classification processing module includes: the model acquisition unit is used for acquiring a pre-constructed memory information classification model; and the classification processing unit is used for performing classification identification processing on the memory characteristics to be detected through the memory information classification model to obtain the memory classification result of the virtual machine.

In an exemplary embodiment of the disclosure, the second classification processing module further includes: a model training unit configured to: acquiring a memory information data set; the memory information data set comprises training memory data of a plurality of virtual machines; determining corresponding training memory characteristics according to the training memory data to generate a memory characteristic training set; and acquiring an initial clustering model, and training the initial clustering model based on a memory feature training set to obtain a memory information classification model.

In an exemplary embodiment of the present disclosure, the classification processing unit includes: the to-be-detected object determining subunit is used for performing feature aggregation processing on the to-be-detected memory features to generate corresponding to-be-detected objects; the association characteristic determining subunit is used for determining a plurality of reference virtual machines and respectively determining the reference memory characteristics of each reference virtual machine; the adjacent object determining subunit is used for performing feature aggregation processing on the reference memory features of each reference virtual machine so as to generate corresponding adjacent objects respectively; the density determining subunit is used for determining local density deviation values between the object to be detected and each adjacent object; and the result determining subunit is used for determining the memory classification result according to the local density deviation value.

In an exemplary embodiment of the present disclosure, the object to be detected determining subunit is configured to: acquiring the memory characteristics to be detected; the memory characteristics to be detected comprise a kernel module value, a driving object address value, a device tree value, a callback value, a system service descriptor table value and an object score; and performing characteristic aggregation processing on the kernel module value, the drive object address value, the equipment tree value, the callback value, the system service descriptor table value and the object score to generate the object to be detected.
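One plausible reading of the feature aggregation processing is that the six named values are combined into a single fixed-order feature vector representing the virtual machine. The sketch below assumes that reading; the class name `MemoryFeatures`, its field names, and the function `aggregate` are hypothetical.

```python
from dataclasses import astuple, dataclass
from typing import Tuple


@dataclass(frozen=True)
class MemoryFeatures:
    """The six memory features to be detected, in a fixed order."""
    kernel_module_value: float
    driver_object_address_value: float
    device_tree_value: float
    callback_value: float
    ssdt_value: float  # system service descriptor table value
    object_score: float


def aggregate(features: MemoryFeatures) -> Tuple[float, ...]:
    """Aggregate the features into one vector, i.e. one object to be detected."""
    return tuple(float(v) for v in astuple(features))


to_be_detected = aggregate(MemoryFeatures(3, 1, 2, 0, 4, 0.75))
```

Keeping the field order fixed means that vectors from different virtual machines are directly comparable by distance.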

In an exemplary embodiment of the present disclosure, the density determining subunit includes: an object distance determining subunit, used for determining the distance between the object to be detected and each adjacent object as an object distance; a neighborhood object determining subunit, used for determining a plurality of neighborhood objects from among the plurality of adjacent objects; a reachable distance determining subunit, used for determining reachable distances between the object to be detected and the plurality of neighborhood objects according to the plurality of object distances; and a local density determining subunit, used for determining the local density deviation value according to each reachable distance.

In an exemplary embodiment of the present disclosure, the reachable distance determining subunit is configured to: determine target adjacent objects corresponding to the object to be detected according to the plurality of object distances, the target adjacent objects being the first specified number of adjacent objects nearest to the object to be detected; take the distance between the object to be detected and the target adjacent object as the target relative distance; and determine the reachable distance according to the target relative distance.
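One way to formalize this step, assumed here for illustration and not stated in the text above, is the reachability distance used in density-based outlier detection of the local-outlier-factor kind. Let $d(p, o)$ be the object distance between objects $p$ and $o$, and let $d_k(o)$ be the distance from $o$ to its $k$-th nearest adjacent object, with $k$ playing the role of the first specified number. The reachable distance of $p$ with respect to $o$ may then be written as:

```latex
\operatorname{reach\text{-}dist}_k(p, o) = \max\bigl\{\, d_k(o),\; d(p, o) \,\bigr\}
```

Under this assumed definition, every object lying within the $k$-distance neighborhood of $o$ has the same reachable distance $d_k(o)$ with respect to $o$, while objects farther away keep their actual distance.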

In an exemplary embodiment of the present disclosure, the local density determination subunit is configured to: determine the reachable distance corresponding to the object to be detected as a first reachable distance, and determine a first average reachable distance according to the first reachable distance; determine the reachable distance corresponding to a neighborhood object as a second reachable distance, and determine a second average reachable distance according to the second reachable distance; determine the local reachability density of the object to be detected according to the first average reachable distance; determine the local reachability density of the neighborhood object according to the second average reachable distance; and determine the local density deviation value according to the local reachability density of the object to be detected and the local reachability density of the neighborhood object.
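The chain from object distances to a local density deviation value closely resembles the widely used local outlier factor computation. The sketch below assumes that formulation: local reachability density is taken as the inverse of the average reachable distance, and the deviation value as the ratio of the neighbors' average density to the object's own density. The function names and the two-dimensional sample points are hypothetical, and whether the embodiment uses exactly this ratio is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_neighbourhood(i: int, points: Sequence[Point], k: int) -> Tuple[float, List[int]]:
    """Return the k-distance of points[i] and the indices inside that radius."""
    dists = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    k_dist = dists[k - 1][0]
    # Ties at the k-distance are kept, so the neighbourhood may exceed k objects.
    return k_dist, [j for d, j in dists if d <= k_dist]


def local_density_deviation(points: Sequence[Point], k: int) -> List[float]:
    """Assumed LOF-style deviation value for every object (requires k < len(points))."""
    hoods = [k_neighbourhood(i, points, k) for i in range(len(points))]

    def local_reachability_density(i: int) -> float:
        neighbours = hoods[i][1]
        # Reachable distance: the larger of the neighbour's k-distance and the true distance.
        reach = [max(hoods[o][0], math.dist(points[i], points[o])) for o in neighbours]
        return len(reach) / sum(reach)  # assumes no duplicate points, so the sum is positive

    density = [local_reachability_density(i) for i in range(len(points))]
    return [
        sum(density[o] for o in hoods[i][1]) / (len(hoods[i][1]) * density[i])
        for i in range(len(points))
    ]


# Hypothetical aggregated features: four reference virtual machines and one suspect.
reference_vms = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
suspect_vm = (5.0, 5.0)
scores = local_density_deviation(reference_vms + [suspect_vm], k=2)
```

A value close to 1 indicates that an object is about as densely surrounded as its neighbors, whereas a value well above 1 marks an object in a much sparser region. Here the four reference objects score 1 and the suspect scores far higher.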

According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.

According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.

According to an aspect of the present disclosure, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the malicious program identification method provided in the above embodiments.

Exemplary embodiments of the present disclosure may have some or all of the following benefits:

In the malicious program identification method provided by the example embodiment of the present disclosure, memory data to be detected is determined from virtual memory information of a virtual machine; feature extraction is performed on the memory data to be detected to obtain memory features to be detected; and the memory features to be detected are classified and identified to obtain a memory classification result, from which it is determined whether a malicious program exists in the virtual machine. On the one hand, the virtual memory information can be acquired from outside the monitored virtual machine, and a malicious program running inside the virtual machine cannot escape or interfere with the memory acquisition process, so the virtual memory information is acquired in a trusted manner and the detection is not discovered or evaded by kernel malicious programs. On the other hand, the memory features to be detected, extracted from the memory data to be detected, are classified, and whether a malicious program exists in the virtual machine is judged from the resulting classification; combined with kernel-level behavior matching, this enables kernel-level malicious programs to be identified and discovered.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.

Fig. 1 is a schematic diagram illustrating an exemplary system architecture to which a malware identification method and apparatus according to an embodiment of the present disclosure may be applied.

FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.

Fig. 3 shows a schematic block diagram of core functional modules corresponding to an anti-virus solution.

Fig. 4 schematically shows a flow chart of a malware identification method according to one embodiment of the present disclosure.

Fig. 5 schematically shows a data flow diagram of a malware identification method according to one embodiment of the present disclosure.

Fig. 6 schematically shows an architectural diagram for running a virtual machine based on an OpenStack platform according to an embodiment of the present disclosure.

FIG. 7 is a diagram that schematically illustrates a process execution dependency structure created by an operating system after a process has been executed, in accordance with an embodiment of the present disclosure.

Fig. 8 schematically illustrates a training flow diagram of a memory information classification model according to one embodiment of the present disclosure.

Fig. 9 schematically illustrates a density of an object X versus its neighbors according to one embodiment of the disclosure.

Fig. 10 shows a schematic diagram of the k-th distance (k = 5) of an object p according to one embodiment of the present disclosure.

Fig. 11 shows a schematic diagram of object y and object z having the same reachable distance with respect to object X when k is 3, according to one embodiment of the present disclosure.

Fig. 12 schematically shows a block diagram of a malicious program identification apparatus according to an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks in a wide area network or a local area network to realize the calculation, storage, processing, and sharing of data.

Cloud technology is also a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like that are applied under the cloud computing business model; it can form a resource pool that is used on demand and is flexible and convenient. Cloud computing technology will become an important support. Background services of technical network systems, such as video websites, picture websites, and web portals, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data at different levels are processed separately, and all kinds of industry data require strong back-end system support, which can only be realized through cloud computing.

Cloud security is a general term for the security software, hardware, users, organizations, and secure cloud platforms that are based on the application of the cloud computing business model. Cloud security integrates emerging technologies and concepts such as parallel processing, grid computing, and the judgment of unknown virus behavior. It monitors abnormal software behavior in the network through a large number of networked clients, obtains the latest information on trojans and malicious programs on the internet, sends that information to the server for automatic analysis and processing, and then distributes the virus and trojan solutions to each client.

The main research directions of cloud security include: (1) cloud computing security, which mainly studies how to guarantee the security of the cloud itself and of the various applications on the cloud, including the security of cloud computer systems, the secure storage and isolation of user data, user access authentication, information transmission security, network attack protection, compliance audit and the like; (2) cloudification of security infrastructure, which mainly studies how to use cloud computing to build and integrate security infrastructure resources and to optimize security protection mechanisms, including constructing ultra-large-scale platforms for collecting and processing security events and information through cloud computing technology, so as to realize the collection and correlation analysis of massive information and to improve the ability to handle security events and control risk across the whole network; and (3) cloud security services, which mainly studies the various security services, such as anti-virus services, that are provided to users on the basis of a cloud computing platform.

In this context, a Virtual Machine (VM) may be a complete computer system that is emulated by software, has complete hardware system functionality, and runs in a completely isolated environment. An Application Programming Interface (API) may be a set of predefined functions, or a convention for linking different components of a software system. An API provides a set of routines that applications and developers can access on the basis of certain software or hardware without accessing source code or understanding the details of the internal working mechanism. A hook is a system mechanism provided in Windows to replace the "interrupt" mechanism of the Disk Operating System (DOS). After a specific system event has been hooked, the system notifies the hooking program as soon as that event occurs, so that the program can respond to the event immediately. A driver, in full a device driver, may be a special program added to the operating system; it contains information about a hardware device, and this information enables the computer to communicate with that device.

Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a malware identification method and apparatus according to an embodiment of the present disclosure may be applied.

As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.

The server 105 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The terminal devices 101, 102, 103 may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.

The method for identifying malicious programs provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, a malicious program identifying apparatus is generally disposed in the server 105. However, it is easily understood by those skilled in the art that the method for identifying malicious programs provided in the embodiment of the present disclosure may also be executed by the terminal devices 101, 102, and 103, and accordingly, the malicious program identifying apparatus may also be disposed in the terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment.

For example, in an exemplary embodiment, the operating systems of the terminal devices 101, 102, and 103 may respectively simulate corresponding virtual systems (i.e., virtual machines), the server obtains virtual memory information of the virtual machines by using the malicious program identification method provided in the embodiment of the present disclosure, performs feature extraction on the memory data to be detected determined according to the virtual memory information to obtain memory features to be detected, performs classification processing on the memory features to be detected, determines whether a malicious program exists in the virtual machines according to the classification processing result, and transmits the determined result to the terminal devices 101, 102, and 103, and the like. Alternatively, the terminal devices 101, 102, 103 themselves determine the result of whether a malicious program exists therein by the malicious program identification method provided by the embodiment of the present disclosure.

It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.

As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU) 201 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU 201, ROM 202, and RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.

The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card, a modem, or the like. The communication section 209 performs communication processing via a network such as the internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 210 as necessary, so that a computer program read out therefrom is installed into the storage section 208 as necessary.

In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. The computer program, when executed by a Central Processing Unit (CPU)201, performs various functions defined in the methods and apparatus of the present application. In some embodiments, the computer system 200 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below. For example, the electronic device may implement the steps shown in fig. 4 to 11, and the like.

The technical solution of the embodiment of the present disclosure is explained in detail below:

One anti-virus solution is based on the signatures of known malicious programs and is installed in the same system as the malicious program. Referring to fig. 3, fig. 3 shows a schematic block diagram of the core functional modules of such an anti-virus solution. The core functional modules of the antivirus software 300 may include a signature scanning module, a file checksum module, and a process behavior monitoring module. The antivirus software 300 can remove viruses and protect against them through these core functional modules, relying mainly on malware signatures. For example, the signature scanning module may scan a file, compare the scan information of the file with a virus signature library, and determine that the file is infected by a virus if the scan information matches any virus signature.

However, the above anti-virus solution is based on signatures of known malware and is installed in the same system as the malware. It can therefore detect only known malware, is easily discovered and circumvented by malware, and cannot discover unknown viruses. In addition, with the rapid development and popularization of cloud computing, virtual machine technology is increasingly used, and malicious network attacks are beginning to target virtual machines (VMs). A kernel rootkit hides its own presence in the Operating System (OS) kernel, obtains system administrator privileges, and modifies key kernel data structures, which makes it harder to detect than any other kind of malware. The above anti-virus solution therefore cannot detect kernel-level rootkit viruses.

Based on one or more of the problems described above, the present example embodiment provides a malicious program identification method. The method may be applied to the server 105, or to one or more of the terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment. Referring to fig. 4, the malicious program identification method may include the following steps S410 to S440:

Step S410: determining virtual memory information of the virtual machine, and determining memory data to be detected according to the virtual memory information.

Step S420: performing feature extraction processing on the memory data to be detected to obtain the memory features to be detected.

Step S430: performing classification and identification processing on the memory features to be detected to obtain a memory classification result of the virtual machine.

Step S440: determining whether a malicious program exists in the virtual machine according to the memory classification result.

In the malicious program identification method provided in the present exemplary embodiment, memory data to be detected is determined from the virtual memory information of a virtual machine; feature extraction processing is performed on the memory data to be detected to obtain the memory features to be detected; the memory features to be detected are classified and identified to obtain a memory classification result; and whether a malicious program exists in the virtual machine is determined according to the memory classification result. On the one hand, the virtual memory information can be acquired from outside the monitored virtual machine, so a malicious program running in the virtual machine cannot escape or interfere with the memory acquisition process. The virtual memory information is therefore acquired in a trusted manner, and the detection cannot easily be discovered or evaded by a kernel malicious program. On the other hand, the memory features extracted from the memory data to be detected are classified, whether a malicious program exists in the virtual machine is judged according to the classification result, and kernel-level malicious programs can be identified and discovered by combining kernel-level behavior matching.
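The four steps above can be sketched as a small pipeline. This is only an illustrative skeleton under stated assumptions: the class name `DetectionPipeline`, its field names, and the label string "abnormal" are hypothetical and do not come from the disclosure; each stage is passed in as a callable so that any acquisition, extraction, or classification method can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DetectionPipeline:
    """Hypothetical wiring of steps S410 to S440; every name here is illustrative."""

    # S410 (first half): obtain virtual memory information from outside the VM.
    acquire_memory: Callable[[str], bytes]
    # S410 (second half): determine the memory data to be detected.
    extract_data: Callable[[bytes], Dict[str, object]]
    # S420: turn the memory data into memory features.
    extract_features: Callable[[Dict[str, object]], List[float]]
    # S430: classify the features, returning "normal" or "abnormal".
    classify: Callable[[List[float]], str]

    def run(self, vm_id: str) -> bool:
        """Return True if the VM is judged to contain a malicious program (S440)."""
        dump = self.acquire_memory(vm_id)
        data = self.extract_data(dump)
        features = self.extract_features(data)
        label = self.classify(features)
        return label == "abnormal"
```

Keeping the stages as injected callables mirrors the point made above that acquisition happens outside the monitored virtual machine, independently of how features are later extracted and classified.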

The above steps of the present exemplary embodiment will be described in more detail below.

In step S410, the virtual memory information of the virtual machine is determined, and the memory data to be detected is determined according to the virtual memory information.

In this example embodiment, the virtual device may be a device that virtualizes one exclusive device into multiple logical devices through a virtualization technology, and is used by multiple user processes simultaneously. A Virtual Machine (VM) may be a complete computer system with complete hardware system functionality, which is emulated by software, running in a completely isolated environment. The work that can be done in a physical computer can be implemented in a virtual machine. The virtual memory information may be memory information generated when the virtual machine is running. The memory data to be detected can be memory data determined from the virtual memory information, and whether a malicious program exists in the virtual machine can be judged by analyzing and processing the memory data to be detected.

While the virtual machine is running, its virtual memory information may be obtained, for example, through Virtual Machine Introspection (VMI) technology. VMI may be a technique for monitoring the running state of a system-level virtual machine from the outside. N virtual systems can be simulated in one operating system, each virtual system corresponding to one virtual machine, and the operating system running at the outermost layer may be referred to as the parent machine. Because the virtual machines are created on the parent machine, the parent machine has higher authority and can monitor each virtual system.

Referring to fig. 5, fig. 5 schematically illustrates a data flow diagram of a malicious program identification method according to one embodiment of the present disclosure. In step S510, a data source may be determined. In the present disclosure, the virtual memory information may be obtained based on the virtual machine; for example, the virtual memory information generated while the virtual machine is running may be obtained by the parent machine in the virtual device. Because the virtual memory information is obtained by the parent machine rather than by the monitored VM, a malicious program running on the VM cannot escape or interfere with the memory acquisition process. If the program that acquires the memory were in the same virtual machine as the malicious program, it could be damaged or evaded by the malicious program while acquiring the virtual memory information. Acquiring the information from the parent machine is therefore a trusted way to obtain the virtual memory information of a virtual machine.

Take running virtual machines on the OpenStack cloud computing management platform as an example. OpenStack is a cloud operating system that can be used to manage an entire cloud computing environment; for example, OpenStack can be used to deploy public cloud, private cloud, and hybrid cloud environments. OpenStack can also be used to manage the various resources of a data center, and it provides a dashboard or API that simplifies users' daily operations. Referring to fig. 6, fig. 6 schematically illustrates an architecture diagram for running a virtual machine based on an OpenStack platform according to an embodiment of the present disclosure. In this architecture, the cloud computing management platform 600, i.e., the OpenStack platform, may provide a shared network and storage resource layer on which bare-metal machines, virtual machines, containers, and the like run. In addition, the bare-metal machines, virtual machines and containers can also support third-party services, built-in tools and the like. While the OpenStack cloud computing management platform is running, a running monitored virtual machine can be used as the data acquisition source from which the virtual memory information of the virtual machine is acquired.

For example, the virtual memory information of the virtual machine may be determined by the following steps.

In an example embodiment of the present disclosure, snapshot data of a virtual machine is obtained, and a volatile memory dump file is determined from the snapshot data; and taking the volatile memory dump file as virtual memory information.

The snapshot data may be an instant copy of a virtual machine disk file (VMware Virtual Machine Disk Format, VMDK) at a certain point in time. The snapshot data may include a volatile memory dump file and other information. A memory dump file, also called virtual memory, may be produced by virtualizing a space on the hard disk as memory for storing programs. A volatile memory dump file may refer to a memory dump file whose contents are lost if the environment in which it is stored no longer satisfies a certain condition.

In the running process of the virtual machine, snapshot data of the virtual machine can be obtained, a volatile memory dump file is selected from the snapshot data, and the volatile memory dump file is used as virtual memory information to perform data analysis on the virtual memory information.

For example, the memory data to be detected may be determined by the following steps.

In an example embodiment of the present disclosure, a memory information extraction plug-in is obtained; and extracting the information of the virtual memory information through the memory information extraction plug-in to obtain the memory data to be detected.

The memory information extraction plug-in can be a plug-in included in memory forensics software and can be used to extract useful information from the memory dump file. The memory data to be detected can be memory data extracted from the virtual memory information, and whether a malicious program exists in the virtual machine can be determined by analyzing the memory data to be detected.

Before information is extracted from the virtual memory information, the memory information extraction plug-in can be obtained from the memory forensics software. With reference to fig. 5, in step S520, after the memory information extraction plug-in is obtained, data acquisition may be performed on the virtual memory information through the plug-in, and the data used for analyzing whether a malicious program exists in the virtual machine is collected from the virtual memory information as the memory data to be detected.

For example, the Volatility framework, a heavyweight framework for volatile memory forensics, may be used to collect data from the virtual memory information. The present disclosure can use the Volatility framework to acquire useful information from a volatile memory dump file of an operating system as the memory data to be detected. Some common plug-ins provided by the Volatility framework are shown in table 1, and data collection on the virtual memory information may be performed through the plug-ins in table 1.

TABLE 1

Memory information extraction plug-in	Description
IDT	Outputs the Interrupt Descriptor Table (IDT) address of a processor
DriverIRP	Outputs the function addresses of a driver's IRPs
malfind2	Finds and extracts (often malicious) code injected into another process
callbacks	Prints system-wide notification routines
ssdt	Prints all SSDT entries

Specifically, the Interrupt Descriptor Table (IDT) plug-in may be used to output the address of the IDT. The DriverIRP plug-in may output the function addresses of the Input/Output Request Packets (IRPs) of a driver object (DRIVER_OBJECT). The malfind2 plug-in can be used to find and extract (often malicious) code injected into another process. The callbacks plug-in can be used to print system-wide notification routines. The ssdt plug-in may be used to print all System Service Descriptor Table (SSDT) entries.

In an example embodiment of the present disclosure, the memory data to be detected may include one or more of the following data: kernel module data, driver object data, device tree data, system service descriptor table data, and callback data.

The kernel module data may be information used to determine whether a hidden kernel module exists in the virtual machine. The driver object data may be data used to determine whether a driver object has been hooked. A hook is a system mechanism provided in Windows to replace the "interrupt" mechanism under DOS. After a program hooks a particular system event, the system notifies the program as soon as the hooked event occurs, so the program can respond to the event immediately. The device tree data may be used to check whether a malicious driver is attached to a legitimate device. The system service descriptor table data may be used to detect whether the SSDT has been modified or hooked. The callback data may be used to determine whether a rootkit is hiding itself.

Specifically, before the extraction of memory data is described, the working mode of the Windows system is introduced. Referring to fig. 7, fig. 7 schematically shows the structures related to a running process that the operating system creates after a process is executed, according to an embodiment of the present disclosure. In a Windows system, when a process is executed, the operating system creates the structures shown in fig. 7. In user mode, a service, an executable file (such as service.exe), a process and the like can be generated; in kernel mode, a corresponding driver structure, process structure and thread structure may be generated for the driver object.

The malicious programs may include user-level or kernel-level malicious programs. For example, the antivirus software 300 in fig. 3 is mainly a solution aimed at user-level malicious programs, whereas the present disclosure is mainly directed at kernel-level malicious programs (such as rootkits), which mainly attack the system kernel. After a process of the Windows system is executed, a driver structure, a process structure, a thread structure and the like are generated in kernel mode. According to the way rootkits attack, the memory data to be detected extracted from the virtual memory information can therefore include one or more of kernel module data, driver object data, device tree data, system service descriptor table data and callback data. Because the Volatility framework provides various plug-ins for extracting information from a memory dump file, the memory data to be detected can be acquired with the corresponding plug-ins provided by the Volatility framework.

In step S420, a feature extraction process is performed on the memory data to be detected to obtain the memory feature to be detected.

In this exemplary embodiment, the feature extraction processing may be a process of extracting the memory features to be detected from the memory data to be detected. The memory features to be detected can be memory features determined according to the way rootkits attack, and can be used to determine whether a malicious program exists in the virtual machine. The memory features to be detected can include one or more of a kernel module value, a driver object address value, a device tree value, a callback value, a system service descriptor table value and an object score, where the object score may include a first score and a second score.

Referring to fig. 5 again, in step S520, after the memory data to be detected is obtained, feature extraction processing may be performed on it to obtain the corresponding memory features to be detected. Specifically, features can be extracted from the memory data to be detected in two ways. The first way is to extract a memory data value directly from the memory data to be detected and use it as a memory feature to be detected. The acquired memory data to be detected can be stored as key-value pairs, and during feature extraction the corresponding memory data value can be read directly from the stored key-value pairs. The kernel module value, the driver object address value, the device tree value, the callback value, the system service descriptor table value and the like can be obtained in this way. The second way is to compute the memory feature from related data in the memory data to be detected. The object score may be a memory feature obtained in this way. For example, the object score may be obtained by determining a plurality of object state conditions from the driver object data and calculating the numerical values corresponding to those object state conditions.
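The two extraction paths described above (direct lookup from key-value pairs, plus computed scores) might be combined into one feature vector as in the following minimal sketch. The key names in `FEATURE_KEYS` and the default of 0.0 for a missing key are assumptions made for illustration; only the names `driver_score` and `driver_score_deep` appear in the disclosure.

```python
from typing import Dict, List

# Hypothetical key names for the directly extracted values; the disclosure does
# not specify how the key-value pairs are named.
FEATURE_KEYS = [
    "kernel_module",
    "driver_object_address",
    "device_tree",
    "callback",
    "ssdt",
]


def direct_features(memory_data: Dict[str, float]) -> List[float]:
    """First way: read each value straight from the stored key-value pairs.

    A missing key is mapped to 0.0, which is an assumption of this sketch.
    """
    return [float(memory_data.get(key, 0.0)) for key in FEATURE_KEYS]


def feature_vector(
    memory_data: Dict[str, float],
    driver_score: float,
    driver_score_deep: float,
) -> List[float]:
    """Append the two computed object scores (second way) to the direct values."""
    return direct_features(memory_data) + [float(driver_score), float(driver_score_deep)]
```

Fixing the key order in one place keeps the vectors of different virtual machines comparable when they are later fed to a classifier.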

The memory characteristics to be detected will be described in detail below.

(1) Kernel module value. The kernel module value may be a value used to determine whether the virtual machine contains a hidden kernel module. A rootkit is loaded into the kernel as a kernel module and hides its own module in order to hide its existence. When a kernel module is loaded into the kernel, a metadata structure KLDR_DATA_TABLE_ENTRY is generated, and all of these structures form a linked list. A rootkit may hide its existence by unlinking its entry from the list.
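One way to turn the unlinking behavior above into a number is to compare two views of the loaded modules: those reachable by walking the linked list, and those found by an independent scan of memory. The disclosure does not state how the kernel module value is computed, so the comparison below, the function names, and the use of base addresses as module identifiers are all assumptions of this sketch.

```python
from typing import Iterable, Set


def hidden_modules(listed: Iterable[int], scanned: Iterable[int]) -> Set[int]:
    """Return base addresses found by scanning but absent from the linked list.

    `listed`  : module base addresses reached by walking the module list.
    `scanned` : module base addresses found by an independent memory scan.
    A module that unlinked its own entry shows up only in `scanned`.
    """
    return set(scanned) - set(listed)


def kernel_module_value(listed: Iterable[int], scanned: Iterable[int]) -> int:
    """Hypothetical feature: the number of modules missing from the list."""
    return len(hidden_modules(listed, scanned))
```

A value of zero means both views agree; any positive value is a hint that some module has removed itself from the list.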

(2) Driver object address value. The driver object address value can be used to determine whether a driver object has been hooked. Each driver object contains a major function (MajorFunction) table whose functions handle different requests from the user mode of the operating system. A rootkit can hide itself by hooking these functions, so the results output by the DriverIRP plug-in (i.e., the driver object address values) can be used to confirm whether the driver object has been hooked.

For example, running a command of the form "volatility driverirp -f …/test.bin" with the Volatility framework may produce the following output:

0x0218a830 0x823b45b8 7 0 0xb2ef3000 361600 Tcpip Tcpip\Driver\Tcpip
[0][IRP_MJ_CREATE] => 0xb2ef94f9
[1][IRP_MJ_CREATE_NAMED_PIPE] => 0xb2ef94f9
[2][IRP_MJ_CLOSE] => 0xb2ef94f9
[3][IRP_MJ_READ] => 0xb2ef94f9
[4][IRP_MJ_WRITE] => 0xb2ef94f9
[5][IRP_MJ_QUERY_INFORMATION] => 0xb2ef94f9
[6][IRP_MJ_SET_INFORMATION] => 0xb2ef94f9
[7][IRP_MJ_QUERY_EA] => 0xb2ef94f9
[8][IRP_MJ_SET_EA] => 0xb2ef94f9
[9][IRP_MJ_FLUSH_BUFFERS] => 0xb2ef94f9
[10][IRP_MJ_QUERY_VOLUME_INFORMATION] => 0xb2ef94f9
[11][IRP_MJ_SET_VOLUME_INFORMATION] => 0xb2ef94f9
[12][IRP_MJ_DIRECTORY_CONTROL] => 0xb2ef94f9
[13][IRP_MJ_FILE_SYSTEM_CONTROL] => 0xb2ef94f9
[14][IRP_MJ_DEVICE_CONTROL] => 0xf8b615d0

Lines [0] to [14] in the output list the function addresses of the driver object, and "IRP_MJ_CREATE", "IRP_MJ_CREATE_NAMED_PIPE" and the like are object names. The output shows that lines [0] to [13] all point to the same address, "0xb2ef94f9", whereas the "IRP_MJ_DEVICE_CONTROL" object in line [14] points to a different address, indicating that the "IRP_MJ_DEVICE_CONTROL" object has been hooked.
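The reasoning applied to this output (most entries share one address, and the odd one out is suspicious) can be sketched in a few lines. The function names and the regular expression are assumptions of this sketch, not part of the Volatility framework; the pattern accepts both "=>" and the full-width "＝>" that sometimes appears in copied output.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches lines such as "[14][IRP_MJ_DEVICE_CONTROL] => 0xf8b615d0".
IRP_LINE = re.compile(r"\[(\d+)\]\[(\w+)\]\s*[=＝]>\s*(0x[0-9a-fA-F]+)")

IrpEntry = Tuple[int, str, int]  # (index, IRP name, handler address)


def parse_irp_table(text: str) -> List[IrpEntry]:
    """Extract (index, name, address) triples from DriverIRP-style output."""
    return [(int(i), name, int(addr, 16)) for i, name, addr in IRP_LINE.findall(text)]


def suspicious_entries(entries: List[IrpEntry]) -> List[IrpEntry]:
    """Return the entries whose address differs from the most common address."""
    if not entries:
        return []
    common_addr, _ = Counter(addr for _, _, addr in entries).most_common(1)[0]
    return [entry for entry in entries if entry[2] != common_addr]


sample = """
[12][IRP_MJ_DIRECTORY_CONTROL] => 0xb2ef94f9
[13][IRP_MJ_FILE_SYSTEM_CONTROL] => 0xb2ef94f9
[14][IRP_MJ_DEVICE_CONTROL] => 0xf8b615d0
"""
flagged = suspicious_entries(parse_irp_table(sample))
```

Here `flagged` contains only the `IRP_MJ_DEVICE_CONTROL` entry. The majority rule is just a heuristic: a legitimate driver may register distinct handlers for different requests, so a differing address is a hint to investigate rather than proof of a hook.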

(3) Device tree value. The device tree value may be used to check whether a malicious driver is attached to a legitimate device. In the Windows operating system, one device object is allowed to attach to the stack of another device so that both can handle the same Input/Output (I/O) requests, which gives a rootkit an opportunity to intercept legitimate IRPs. Thus, after the device tree is printed out, it can be used to check whether an abnormal or unknown driver is attached to a legitimate device.

(4) Callback value. The callback value may be used to determine whether a rootkit is hiding itself. If the rootkit hides itself, the corresponding module information appears as unknown, which is a very obvious anomaly.

(5) System service descriptor table value. The system service descriptor table value, i.e. the SSDT value, can be used to check whether the SSDT has been modified or hooked. Because many rootkits perform malicious functions by modifying or hooking the SSDT, the SSDT can be exported to detect whether it has been modified or hooked.

The ssdt plug-in may look up each system call table present in the memory sample and then enumerate each entry of that table. For each entry, the ssdt plug-in may print the module that contains it (if found), or display UNKNOWN if the entry points to a memory location that belongs neither to the kernel itself nor to any module in the kernel module list. Several common code-hiding techniques used by kernel rootkits cause entries to be classified as UNKNOWN.
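As a minimal sketch of how the UNKNOWN entries might be turned into a numeric SSDT value, the function below counts the entries whose owning module could not be resolved. The tuple layout and the choice of a simple count are assumptions; the disclosure does not define how the SSDT value is encoded.

```python
from typing import Iterable, Tuple

SsdtEntry = Tuple[int, str, str]  # (entry index, function name, owning module)


def ssdt_value(entries: Iterable[SsdtEntry]) -> int:
    """Count SSDT entries whose owner is reported as UNKNOWN.

    The comparison ignores case and surrounding whitespace so that minor
    formatting differences in the exported table do not hide an entry.
    """
    return sum(1 for _, _, owner in entries if owner.strip().upper() == "UNKNOWN")
```

For a clean system the count would be expected to be zero, so any positive value stands out when samples from many virtual machines are compared.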

(6) First score. The first score may be a score obtained by further calculation on the driver object under the first object state conditions, and may be represented by "driver_score".

(7) Second score. The second score may be a score obtained by further calculation on the driver object under the second object state conditions, and may be represented by "driver_score_deep".

For example, the object score may be obtained from the memory data to be detected through the following steps.

In one example embodiment of the present disclosure, a plurality of object state conditions for a driver object are determined from driver object data; determining object sub-scores respectively corresponding to the driver objects under the condition of each object state; and carrying out weighted calculation processing on the plurality of object sub-scores to obtain the object score.

The object state condition may be a state condition related to a type of the driver object, a name of the driver object, a main function, and the like. The object state condition may include a first object state condition and a second object state condition; the first object state condition may include a type of a driver object, a name of the driver object, a numerical value of a related position of the main function, and other related state conditions; the second object state condition may be a state condition associated with a start address, address length, etc. of the driver object. The object sub-score may be a score value corresponding to the driver object under each object state condition. The object sub-score may include a first sub-score and a second sub-score; the first sub-score may be a score value corresponding to the driver object in each first object state condition; the second sub-score may be a score value corresponding to the driver object in each second object state condition. The object score may be a score value obtained by performing a weighted calculation process on a plurality of object sub-scores, and for example, the object score may include a first score and a second score.

After the driver object information is acquired, a plurality of first object state conditions corresponding to the driver object may be determined from the driver object information, and a first sub-score corresponding to each first object state condition may be determined. And performing weighted calculation processing on the obtained plurality of first sub-scores to obtain corresponding first scores. The first object state condition and the corresponding first sub-score may specifically refer to table 2.

TABLE 2

Specifically, "DRIVER_OBJECT_32.Type == 0x04" may indicate that the value of the driver object type is 4; if "if (DRIVER_OBJECT_32.Type == 0x04)" is satisfied, the corresponding first sub-score is 2.

The computation logic of the chk_string function may be: string.MaximumLength >= string.Length && string.Buffer != NULL && iswprint(string), where iswprint() is a C/C++ library function that checks whether a given wide character is printable.

In addition, the two fields DRIVER_OBJECT_32.DriverName and DRIVER_OBJECT_32.HardwareDatabase are of a type that is defined in the code as follows:

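typedef struct _LSA_UNICODE_STRING {
    USHORT Length;          /* length of the string in bytes */
    USHORT MaximumLength;   /* size of the buffer in bytes */
    PWSTR  Buffer;          /* pointer to the wide-character data */
} LSA_UNICODE_STRING;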

Therefore, the purpose of chk_string(DRIVER_OBJECT_32.DriverName) and chk_string(DRIVER_OBJECT_32.HardwareDatabase) is to determine whether the input is a valid Unicode string structure. Similarly, when the "if (chk_string(DRIVER_OBJECT_32.DriverName))" condition is satisfied, the corresponding first sub-score is 2; when the "if (chk_string(DRIVER_OBJECT_32.HardwareDatabase))" condition is satisfied, the corresponding first sub-score is 1. "(DRIVER_OBJECT_32.MajorFunction[0]) >> 31" may represent obtaining bit 31 of the first MajorFunction entry. When the "if ((DRIVER_OBJECT_32.MajorFunction[0]) >> 31)" condition is satisfied, the corresponding first sub-score is 3. The first score can be obtained by performing weighted calculation processing on the first sub-scores in table 2.
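The four conditions and sub-scores described in the text can be put together as a small scoring function. This is a sketch under stated assumptions: the Python class and field names are hypothetical stand-ins for the C structures, `str.isprintable()` stands in for iswprint(), and the unit weights are an assumption because the disclosure does not give the weights used in the weighted calculation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class UnicodeString:
    """Python stand-in for the Unicode string structure (names are illustrative)."""
    length: int
    maximum_length: int
    buffer: Optional[str]  # None models a NULL Buffer pointer


@dataclass
class DriverObject:
    """Only the fields used by the first object state conditions."""
    type: int
    driver_name: UnicodeString
    hardware_database: UnicodeString
    major_function0: int  # MajorFunction[0] as a 32-bit address


def chk_string(s: UnicodeString) -> bool:
    """MaximumLength >= Length, Buffer is not NULL, and the text is printable."""
    return s.maximum_length >= s.length and s.buffer is not None and s.buffer.isprintable()


def driver_score(obj: DriverObject, weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the first sub-scores; unit weights are an assumption."""
    sub_scores = [
        2 if obj.type == 0x04 else 0,
        2 if chk_string(obj.driver_name) else 0,
        1 if chk_string(obj.hardware_database) else 0,
        3 if (obj.major_function0 >> 31) else 0,  # bit 31 set
    ]
    return sum(w * s for w, s in zip(weights, sub_scores))
```

With unit weights, an object that satisfies every condition scores 2 + 2 + 1 + 3 = 8, and an object that satisfies none scores 0.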

In addition, the second object state condition and the second sub-score corresponding to the driver object can be referred to table 3. In table 3, "%" represents the remainder calculation.

TABLE 3

The second sub-scores of the driver objects under the second object state conditions are shown in table 3. Similarly, the second score can be obtained by performing a weighted calculation process on the second sub-score in table 3.

Further, in some other exemplary embodiments, the first score and the second score may not be distinguished, and the first score and the second score may be combined into an object score to form a memory feature.

In step S430, the memory features to be detected are classified, identified, and processed to obtain the memory classification result of the virtual machine.

In this exemplary embodiment, the classification and identification process may be a process of performing classification processing on the determined memory features to be detected. The memory classification result may be a classification result of a memory state of the virtual machine obtained by classifying the memory features to be detected, and may be used to determine whether a malicious program exists in the virtual machine.

After the memory features to be detected are obtained, classification and identification processing can be performed on them. Kernel rootkit viruses are not common, so very few samples are available; supervised learning cannot be trained sufficiently on them and can hardly achieve a satisfactory effect, whereas unsupervised classification methods do not need labeled training samples. In addition, in memory forensics information, normal memory information far outnumbers malicious memory information, with normal memory information usually accounting for more than 95% of the total, and this imbalance is very difficult for supervised learning methods to handle. Therefore, the present disclosure can process the memory features to be detected with an unsupervised cluster analysis method to determine the memory classification result corresponding to the virtual machine.

In an example embodiment of the present disclosure, a pre-constructed memory information classification model is obtained; and classifying, identifying and processing the memory characteristics to be detected through a memory information classification model to obtain a memory classification result of the virtual machine. The memory information classification model can be a classification model trained by adopting a cluster analysis method of unsupervised learning.

Specifically, the memory features to be detected, obtained by performing feature extraction on the memory data to be detected, can be analyzed through the memory information classification model. When enough samples exist, an unsupervised machine learning algorithm can learn the memory features and divide the samples into normal and abnormal without supervision; combined with kernel-level behavior matching, this enables kernel rootkit viruses to be identified and discovered.

It should be noted that, in the present disclosure, the memory characteristics to be detected extracted in the memory data determining step and the characteristic extracting step may be directly input into the memory information classification model, and the memory information classification model determines the corresponding memory classification result. The virtual memory information can also be directly input into the memory information classification model, the memory data determination step and the feature extraction step are carried out by the memory information classification model to obtain the memory features to be detected, and then the memory features to be detected are classified to obtain the corresponding memory classification results. The present disclosure is not limited thereto in any particular way.

In an example embodiment of the present disclosure, the memory information classification model is obtained by training through the following steps: acquiring a memory information data set; the memory information data set comprises training memory data of a plurality of virtual machines; determining corresponding training memory characteristics according to the training memory data to generate a memory characteristic training set; and acquiring an initial clustering model, and training the initial clustering model based on a memory feature training set to obtain a memory information classification model.

The memory information data set may be a data set composed of training memory data of a plurality of virtual machines. The training memory data of the virtual machine may be memory data determined from the acquired virtual memory information. The training memory features may be memory features obtained by performing feature extraction processing on training memory data. The memory feature training set may be a data set consisting of a plurality of training memory features. The initial clustering model may be a pre-established cluster classification model.

After the memory features to be detected are determined, the memory features to be detected can be classified through a pre-constructed memory information classification model. Referring to fig. 8, fig. 8 schematically illustrates a training flow diagram of a memory information classification model according to an embodiment of the present disclosure. The memory information classification model is obtained by training through the following steps: in step S810, a memory information data set is acquired; the memory information dataset includes training memory data for a plurality of virtual machines. The method includes the steps that a memory information data set formed by training memory data of a plurality of virtual machines is obtained, the obtaining mode of the training memory data of the virtual machines is the same as that of the memory data to be detected, and the obtaining mode of the training memory data of the virtual machines is not repeated in the method. In step S820, a training memory feature corresponding to each training memory data is determined to generate a memory feature training set. And performing feature extraction processing on the acquired training memory data to obtain corresponding training memory features. The process of extracting the features of the training memory data is the same as the process of extracting the features of the memory data to be detected to obtain the features of the memory to be detected, which is not described in detail in this disclosure. After the obtained training memory features are extracted, a memory feature training set can be generated according to the training memory features. In step S830, an initial clustering model is obtained, and the initial clustering model is trained based on the memory feature training set to obtain a memory information classification model. 
After the initial clustering model is obtained, the obtained memory feature training set can be input into the initial clustering model, and the initial clustering model is trained based on the memory feature training set to obtain a memory information classification model.
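As a rough illustration, steps S810 to S830 can be sketched as a small data pipeline. This is a minimal sketch under stated assumptions, not the disclosed implementation: the record layout, the `extract` function and the `ReferenceModel` class are hypothetical stand-ins, and "training" is modeled simply as storing the reference feature vectors, which is what a neighbor-based model needs.

```python
from typing import Callable, Iterable, List, Sequence

Point = List[float]


def build_feature_training_set(
    memory_dataset: Iterable[dict],
    extract_features: Callable[[dict], Sequence[float]],
) -> List[Point]:
    """S810-S820: map each VM's training memory data to one feature vector."""
    return [[float(v) for v in extract_features(record)] for record in memory_dataset]


class ReferenceModel:
    """Hypothetical stand-in for the initial clustering model."""

    def __init__(self) -> None:
        self.points: List[Point] = []

    def fit(self, training_set: List[Point]) -> "ReferenceModel":
        # S830: keep a copy of the reference points for later neighbor queries.
        self.points = [list(p) for p in training_set]
        return self


# Hypothetical training records for two virtual machines.
dataset = [
    {"hidden_module": 0, "ssdt_hooked": 0},
    {"hidden_module": 0, "ssdt_hooked": 1},
]


def extract(record: dict) -> Sequence[float]:
    return (record["hidden_module"], record["ssdt_hooked"])


training_set = build_feature_training_set(dataset, extract)
model = ReferenceModel().fit(training_set)
```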

For example, the following steps may be adopted to perform classification and identification processing on the memory features to be detected through the memory information classification model.

In an example embodiment of the present disclosure, feature aggregation processing is performed on memory features to be detected to generate a corresponding object to be detected; determining a plurality of reference virtual machines, and respectively determining the reference memory characteristics of each reference virtual machine; performing feature aggregation processing on the reference memory features of each reference virtual machine to respectively generate corresponding adjacent objects; determining a local density deviation value between an object to be detected and each adjacent object; and determining a memory classification result according to the local density deviation value.

The characteristic aggregation processing may be a process of aggregating a plurality of memory characteristics to be detected. The object to be detected may be an object obtained by performing feature aggregation processing on a plurality of memory features to be detected. The reference virtual machine may be a virtual machine running in the present virtual device or a virtual machine running in another virtual device. The reference memory characteristics may be memory characteristics extracted from virtual memory information of the reference virtual machine. The adjacent object may be an object obtained by performing feature aggregation processing on a plurality of reference memory features. The local density deviation value may be obtained by comparing a local density value of the object to be detected with a local density value of each adjacent object.

After the plurality of memory features to be detected are determined, feature aggregation processing can be performed on them to generate the corresponding object to be detected. Specifically, feature aggregation processing may be performed on the kernel module value, the driver object address value, the device tree value, the callback value, the system service descriptor table value, the object score, and other memory features to be detected extracted from the virtual machine, so as to generate the object to be detected. After the plurality of reference virtual machines are determined, the reference memory features corresponding to each reference virtual machine can be determined through the memory data acquisition step and the feature extraction step. The reference memory features may be the kernel module values, driver object address values, device tree values, callback values, system service descriptor table values, object scores, and the like corresponding to the respective reference virtual machines. Feature aggregation processing is then performed on the reference memory features of each reference virtual machine to generate the adjacent objects. For the generated object to be detected and each adjacent object, a local density deviation value between the object to be detected and each adjacent object can be calculated, so as to determine a memory classification result according to the local density deviation value.

In an example embodiment of the present disclosure, the memory features to be detected are obtained; the memory features to be detected comprise a kernel module value, a driver object address value, a device tree value, a callback value, a system service descriptor table value and an object score; and feature aggregation processing is performed on the kernel module value, the driver object address value, the device tree value, the callback value, the system service descriptor table value and the object score to generate the object to be detected.

Specifically, the determined memory features to be detected are listed in table 4. The kernel module value, the driver object address value, the device tree value, the callback value and the system service descriptor table value may be Boolean data; the first score and the second score may be integer data. When the kernel module value is 0, it may indicate that there is no hidden kernel module; when the kernel module value is 1, it may indicate that there is a hidden kernel module. When the driver object address value is 0, it may indicate that the driver object is not hooked; when the driver object address value is 1, it may indicate that the driver object is hooked. When the device tree value is 0, it may indicate that there is no abnormal/unknown driver on the device; when the device tree value is 1, it may indicate that there is an abnormal/unknown driver on the device. When the callback value is 0, it may indicate that the rootkit does not hide itself; when the callback value is 1, it may indicate that the rootkit hides itself. When the system service descriptor table value is 0, it may indicate that the SSDT has not been modified or hooked; when the system service descriptor table value is 1, it may indicate that the SSDT has been modified or hooked.
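The feature encoding described above can be illustrated with a small data structure. The following is a minimal sketch, assuming that the five Boolean values and the two integer scores are simply concatenated into one 7-dimensional point; the class name, field names and example values are hypothetical, since table 4 is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryFeatures:
    kernel_module: bool   # True: a hidden kernel module exists
    driver_object: bool   # True: the driver object is hooked
    device_tree: bool     # True: abnormal/unknown driver on the device
    callback: bool        # True: the rootkit hides itself
    ssdt: bool            # True: the SSDT was modified or hooked
    first_score: int      # integer score (meaning assumed from table 4)
    second_score: int     # integer score (meaning assumed from table 4)

    def to_point(self) -> List[float]:
        """Aggregate the seven features into a single numeric point."""
        return [
            float(self.kernel_module),
            float(self.driver_object),
            float(self.device_tree),
            float(self.callback),
            float(self.ssdt),
            float(self.first_score),
            float(self.second_score),
        ]


# Hypothetical examples: a clean virtual machine and a suspicious one.
clean = MemoryFeatures(False, False, False, False, False, 0, 0)
suspect = MemoryFeatures(True, False, False, True, True, 3, 5)
```

Because Boolean and integer features share one vector, the scores can dominate distance calculations; whether to rescale them is a design choice the text does not specify.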

Feature aggregation processing is performed on the memory features to be detected in table 4 to generate an object to be detected, and the generated object to be detected can be represented by one point. By way of example, the following implementation may be used:

from sklearn.neighbors import LocalOutlierFactor

X = [[x1], [x2], [x3], [x4], [x5], [x6], [x7]]

clf = LocalOutlierFactor(n_neighbors=2)

clf.fit_predict(X)

where X may represent the memory features to be detected; since the present disclosure uses 7 memory features to be detected, the X array holds 7 values; LocalOutlierFactor() may represent the local outlier factor algorithm; and calling clf.fit_predict(X) performs feature aggregation processing on the memory features X to be detected and generates the corresponding object to be detected.

TABLE 4

Those skilled in the art will readily understand that the process of performing feature aggregation processing on the reference memory features to generate the adjacent objects is the same as the process described above, and it is not repeated in this disclosure. Therefore, after feature aggregation processing is performed on the reference memory features, the corresponding object points can be generated.

Clustering analysis may be the process of grouping a set of similar objects in a data set. These similar groups, called clusters, are widely used in machine learning for unsupervised classification tasks. In a practical application scenario, there are a variety of different clustering models, and each model corresponds to a variety of clustering algorithms. The most common cluster analysis algorithms may include connectivity-based cluster analysis algorithms, center-based cluster analysis algorithms, distribution-based cluster analysis algorithms, and density-based cluster analysis algorithms; wherein the density-based clustering algorithm may group objects using the concept of data density.

This disclosure takes as an example clustering with a local density algorithm based on the Local Outlier Factor (LOF), using the LOF algorithm for classification learning. Rather than looking for global deviation values as other density-based clustering methods do, the LOF algorithm attempts to find anomalous data objects by measuring the local deviation of a given data object relative to its neighbors. To estimate the density, the distances to these neighboring objects need to be considered first. By comparing the local density of an object with the local densities of its neighbors, regions of similar density can be determined. In addition, objects with a lower density than their neighbors can be detected, and these may be considered deviating objects. Referring to fig. 9, fig. 9 schematically illustrates the density of an object X compared with the density of its neighbors, according to one embodiment of the disclosure. The density of object X in fig. 9 is much lower than that of its neighbors.

In one example embodiment of the present disclosure, a distance between each pair of an object to be detected and each adjacent object is determined as an object distance; determining a plurality of neighborhood objects from a plurality of neighboring objects; determining the reachable distance between the object to be detected and a plurality of neighborhood objects according to the distances of the objects; a local density deviation value is determined based on each of the reachable distances.

Wherein the object distance may be a distance between any two objects. The neighborhood objects may be all neighboring objects in a certain neighborhood of the object to be detected, for example, the kth distance neighborhood of the object to be detected may be all objects (including object points of the kth distance) within the kth distance of the object to be detected. The reachable distance may be a distance between the object to be detected and a first specified number of nearest neighbors.

Specifically, before the local density deviation value is calculated, a plurality of adjacent objects corresponding to the object to be detected, and the object distance between the object to be detected and each adjacent object, may be determined. The k-th distance neighborhood of the object to be detected is then determined, and the objects in the k-th distance neighborhood are taken as neighborhood objects. The reachable distances between the object to be detected and the plurality of neighborhood objects are determined from the determined object distances. Further, a local density deviation value is determined based on the calculated reachable distances.

For example, the reachable distance may be calculated by the following steps.

In one example embodiment of the present disclosure, target adjacent objects corresponding to the object to be detected are determined according to the plurality of object distances; the target adjacent objects are the first specified number of adjacent objects nearest to the object to be detected; the distance between the object to be detected and a target adjacent object is taken as the target relative distance; and the reachable distance is determined according to the target relative distance.

The target adjacent objects may be the first specified number of adjacent objects nearest to the object to be detected, where the specified number may be denoted by k, and k may be any positive integer, such as 3, 5, or 9. The target relative distance may be the distance between the object to be detected and a target adjacent object. For the k-th distance (k-distance), assuming that the object to be detected is a point p in a set C, the k-th distance $d_k(p)$ of the point p is defined as $d_k(p) = d(p, o)$, where the point o satisfies: 1) there are at least k points $o' \in C \setminus \{p\}$ such that $d(p, o') \le d(p, o)$; 2) there are at most k-1 points $o' \in C \setminus \{p\}$ such that $d(p, o') < d(p, o)$. Referring to fig. 10, fig. 10 illustrates a schematic diagram of the k-th distance (k = 5) of an object p according to one embodiment of the present disclosure. The k-th distance of point p is the distance from point p to its k-th nearest point (excluding p itself); in fig. 10, $d_5(p)$ may be distance 1010.
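The k-th distance and the k-th distance neighborhood can be illustrated in a few lines of code. This is a minimal sketch that assumes Euclidean distance, which the text does not mandate; the function names and sample points are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def k_distance(p: Point, others: Sequence[Point], k: int) -> float:
    """Distance from p to its k-th nearest point in `others` (p excluded)."""
    if not 1 <= k <= len(others):
        raise ValueError("k must be between 1 and the number of other points")
    return sorted(math.dist(p, o) for o in others)[k - 1]


def k_neighborhood(p: Point, others: Sequence[Point], k: int) -> List[Point]:
    """All points within the k-th distance of p, including ties at that distance."""
    d_k = k_distance(p, others, k)
    return [o for o in others if math.dist(p, o) <= d_k]


# Hypothetical sample: distances from p are 1, 2, 3, 3 and 5.
p = (0.0, 0.0)
others = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 3.0), (5.0, 0.0)]
d3 = k_distance(p, others, 3)
hood = k_neighborhood(p, others, 3)
```

Because two points tie at the third distance, the neighborhood for k = 3 contains four points, which is why the definition speaks of "at least k" points.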

Specifically, the reachable distance between the object to be detected and the object in the neighborhood of the object can be calculated according to formula 1.

$\text{reachability-distance}_k(X, Y) = \max\{\,k\text{-distance}(Y),\ d(X, Y)\,\}$ (formula 1)

where reachability-distance_k(X, Y) may represent the k-th reachable distance from object X to object Y, which is the true distance between object X and object Y but never less than the k-th distance of object Y; k-distance(Y) may represent the k-th distance of object Y, i.e., the distance from object Y to its k-th nearest neighbor object (excluding object Y itself); and d(X, Y) may represent the distance between object X and object Y.
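Formula 1 can be checked on a small example. The sketch below assumes Euclidean distance and addresses points by their index in a list; the helper names and the sample points are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def k_distance(points: Sequence[Point], i: int, k: int) -> float:
    """k-th distance of points[i], measured against all other points."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]


def reachability_distance(points: Sequence[Point], x: int, y: int, k: int) -> float:
    """Formula 1: max of the k-th distance of Y and the true distance d(X, Y)."""
    return max(k_distance(points, y, k), math.dist(points[x], points[y]))


# Hypothetical points on a line: three close together and one far away.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
r01 = reachability_distance(pts, 0, 1, k=2)  # 1.0: d(X, Y) equals 2-distance of Y
r10 = reachability_distance(pts, 1, 0, k=2)  # 2.0: floored at the 2-distance of Y
r32 = reachability_distance(pts, 3, 2, k=2)  # 8.0: the true distance dominates
```

The first two values show that the reachable distance is not symmetric: it depends on the k-th distance of the second argument.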

As can be seen from the above definition of the reachable distance, objects belonging to the k nearest neighbors of X can be considered equidistant. Referring to fig. 11, fig. 11 shows a schematic diagram of object y and object z having the same reachable distance with respect to object X when k is 3, according to one embodiment of the present disclosure. When k is 3, the object y and the object z have the same reachable distance with respect to the object X.

In an example embodiment of the present disclosure, a reachable distance corresponding to the object to be detected is determined as a first reachable distance, and a first average reachable distance is determined according to the first reachable distance; a reachable distance corresponding to a neighborhood object is determined as a second reachable distance, and a second average reachable distance is determined according to the second reachable distance; the local reachability density of the object to be detected is determined according to the first average reachable distance; the local reachability density of the neighborhood object is determined according to the second average reachable distance; and a local density deviation value is determined according to the local reachability density of the object to be detected and the local reachability density of the neighborhood object.

The first reachable distance may be a reachable distance corresponding to the object to be detected, that is, a reachable distance from a neighborhood point of the object to be detected to the object to be detected. The first average reachable distance may be a distance obtained by averaging all the calculated first reachable distances. The second reachable distance may be a reachable distance corresponding to a neighborhood object. The second average reachable distance may be a distance obtained by averaging all the calculated second reachable distances. The local reachability density (lrd) of the object to be detected may be a density value determined from the first average reachable distance. The local reachability density of the neighborhood object may be a density value determined from the second average reachable distance.

Specifically, with the object to be detected denoted by X and a neighborhood object denoted by Y, the local reachability density lrd(X) of the object to be detected may be defined as the reciprocal of the average reachable distance between X and the points in the k-th distance neighborhood of X. The specific calculation of lrd(X) is given by formula 2.
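Formula 2 itself is not reproduced in this text. Consistent with the verbal definition above and with formula 1, it can be reconstructed in the standard LOF form; this is a reconstruction, not a copy of the original figure:

$$\mathrm{lrd}_k(X) = 1 \Big/ \left( \frac{\sum_{Y \in N_k(X)} \text{reachability-distance}_k(X, Y)}{|N_k(X)|} \right) \quad \text{(formula 2)}$$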

where N_k(X) may represent the k-th distance neighborhood of point X, i.e., all points within the k-th distance of point X, including those at exactly the k-th distance; and |N_k(X)| may represent the number of points in the k-th distance neighborhood of point X.

The local reachability density of each neighborhood object of the object to be detected can be calculated in the same way as the local reachability density of the object to be detected, which is not described in detail in this disclosure. After the local reachability densities of the neighborhood objects are calculated, a local density deviation value may be determined from the local reachability density of the object to be detected and the local reachability densities of the neighborhood objects. The local density deviation value may be the local outlier factor of the object to be detected X (taking k neighbors), denoted LOF_k(X). The local density deviation value may be defined as the average of the ratios of the local reachability densities of the neighborhood points of the object to be detected to the local reachability density of the object to be detected itself. Specifically, the local density deviation value of the object to be detected can be calculated according to formula 3, where LOF_k(X) may represent a local density score comparing the object to be detected with the neighboring objects in its neighborhood.
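Formula 3 is likewise not reproduced in this text. From the definition above (the average ratio of the neighbors' local reachability densities to that of the object itself), it can be reconstructed in the standard LOF form, offered here as a reconstruction only:

$$\mathrm{LOF}_k(X) = \frac{\sum_{Y \in N_k(X)} \dfrac{\mathrm{lrd}_k(Y)}{\mathrm{lrd}_k(X)}}{|N_k(X)|} = \frac{\sum_{Y \in N_k(X)} \mathrm{lrd}_k(Y)}{|N_k(X)| \cdot \mathrm{lrd}_k(X)} \quad \text{(formula 3)}$$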

In step S440, it is determined whether a malicious program exists in the virtual machine according to the memory classification result.

In this example embodiment, the memory classification result may be a classification result determined according to the local density deviation value. For example, for the LOF_k(X) of the object to be detected: if the value of LOF_k(X) is 1, the local density of the object to be detected X is the same as that of its neighbors; if the value of LOF_k(X) is considerably larger, for example 20, the object to be detected X is an abnormal value, and the larger the value, the further the object X deviates from the cluster.
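The whole calculation, from reachable distances to the final decision, can be sketched in plain Python. This is a minimal sketch, not the disclosed implementation: it assumes Euclidean distance, uses low-dimensional points for readability, adds a small constant `EPS` to avoid division by zero when reference points coincide, and applies a decision threshold of 1.5 that is a hypothetical choice, since the text gives no cut-off value.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
EPS = 1e-10  # assumption: guards against zero average reachable distance


def k_neighborhood(points: Sequence[Point], i: int, k: int) -> Tuple[List[int], float]:
    """Return (indices in the k-th distance neighborhood of points[i], k-distance)."""
    dists = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    d_k = dists[k - 1][0]
    return [j for d, j in dists if d <= d_k], d_k


def lof_scores(points: Sequence[Point], k: int) -> List[float]:
    """Local outlier factor of every point (reachable distance, lrd, then LOF)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    hoods, k_dists = zip(*(k_neighborhood(points, i, k) for i in range(n)))

    def reach(x: int, y: int) -> float:
        # Reachable distance: max of the k-distance of y and the true distance.
        return max(k_dists[y], math.dist(points[x], points[y]))

    # Local reachability density: reciprocal of the mean reachable distance.
    lrd = [
        1.0 / (sum(reach(i, j) for j in hoods[i]) / len(hoods[i]) + EPS)
        for i in range(n)
    ]
    # LOF: mean ratio of the neighbors' lrd to the point's own lrd.
    return [sum(lrd[j] for j in hoods[i]) / (len(hoods[i]) * lrd[i]) for i in range(n)]


def is_anomalous(reference: Sequence[Point], candidate: Point,
                 k: int = 2, threshold: float = 1.5) -> bool:
    """Score the candidate against the reference objects; threshold is hypothetical."""
    scores = lof_scores(list(reference) + [candidate], k)
    return scores[-1] > threshold


# Hypothetical reference objects forming one tight cluster.
reference = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
far = is_anomalous(reference, (10.0, 10.0))
near = is_anomalous(reference, (0.5, 0.5))
```

A candidate inside the cluster scores close to 1 and is not flagged, while a distant candidate scores far above 1 and is flagged.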

With continued reference to fig. 5, in step S530, an unsupervised machine learning approach may be adopted to perform cluster analysis processing on the extracted memory features to be detected; that is, the memory features to be detected are evaluated by the method described above to determine whether memory information with an abnormal value occurs. If an abnormal value appears in the classification result, this indicates that a malicious program exists in the virtual machine, and an alarm is raised, so as to discover and identify malicious programs at the kernel level.

It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

Further, in the present exemplary embodiment, a malicious program identification apparatus is also provided. The malicious program identification device can be applied to a server or terminal equipment. Referring to fig. 12, the malware identification apparatus 1200 may include a memory data determination module 1210, a memory characteristic determination module 1220, a first classification processing module 1230, and a result determination module 1240. Wherein:

The memory data determining module 1210 is configured to determine virtual memory information of the virtual machine, and determine memory data to be detected according to the virtual memory information.

The memory feature determining module 1220 is configured to perform feature extraction processing on the memory data to be detected to obtain the memory feature to be detected.

The first classification processing module 1230 is configured to perform classification, identification and processing on the memory features to be detected, so as to obtain a memory classification result of the virtual machine.

The result determining module 1240 is configured to determine whether a malicious program exists in the virtual machine according to the memory classification result.

The details of each module or unit in the above malicious program identification apparatus have been described in detail in the corresponding malicious program identification method, and therefore are not described herein again.
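The relationship between the four modules can be pictured as a simple composition in which each module's output feeds the next. The following is a minimal sketch; the class name, the callable signatures and the stub behaviors are hypothetical and only mirror the module numbering used above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class MalwareIdentificationApparatus:
    determine_memory_data: Callable[[Any], Any]                   # module 1210
    determine_memory_features: Callable[[Any], Sequence[float]]   # module 1220
    classify: Callable[[Sequence[float]], float]                  # module 1230
    determine_result: Callable[[float], bool]                     # module 1240

    def identify(self, virtual_machine: Any) -> bool:
        """Run the four modules in order and report whether malware is present."""
        memory_data = self.determine_memory_data(virtual_machine)
        features = self.determine_memory_features(memory_data)
        classification = self.classify(features)
        return self.determine_result(classification)


# Stub modules for illustration only; real modules would read a memory dump.
apparatus = MalwareIdentificationApparatus(
    determine_memory_data=lambda vm: vm["dump"],
    determine_memory_features=lambda data: [float(data["ssdt_hooked"])],
    classify=lambda feats: 20.0 if feats[0] else 1.0,
    determine_result=lambda score: score > 1.5,
)
infected = apparatus.identify({"dump": {"ssdt_hooked": True}})
healthy = apparatus.identify({"dump": {"ssdt_hooked": False}})
```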

It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A malware identification method comprising:

determining virtual memory information of a virtual machine, and determining memory data to be detected according to the virtual memory information;

performing feature extraction processing on the to-be-detected memory data to obtain to-be-detected memory features;

classifying, identifying and processing the memory features to be detected to obtain a memory classification result of the virtual machine;

and determining whether a malicious program exists in the virtual machine according to the memory classification result.

2. The method of claim 1, wherein determining the virtual memory information of the virtual machine comprises:

acquiring snapshot data of the virtual machine, and determining a volatile memory dump file from the snapshot data;

and taking the volatile memory dump file as the virtual memory information.

3. The method according to claim 1, wherein the determining the memory data to be detected according to the virtual memory information comprises:

acquiring a memory information extraction plug-in;

and extracting the information of the virtual memory information through the memory information extraction plug-in to obtain the memory data to be detected.

4. The method according to claim 1 or 3, wherein the memory data to be detected comprises one or more of the following data: kernel module data, driver object data, device tree data, system service descriptor table data, and callback data.

5. The method according to claim 1, wherein the memory data to be detected comprises driver object data, and the memory features to be detected comprise an object score of a driver object;

the performing feature extraction processing on the memory data to be detected to obtain the memory features to be detected comprises:

determining a plurality of object state conditions for the driver object from the driver object data;

determining object sub-scores respectively corresponding to the driver objects under the object state conditions;

and carrying out weighted calculation processing on the plurality of object sub-scores to obtain the object score.

6. The method of claim 1, further comprising:

acquiring a pre-constructed memory information classification model;

and classifying, identifying and processing the memory characteristics to be detected through the memory information classification model to obtain a memory classification result of the virtual machine.

7. The method of claim 6, wherein the memory information classification model is trained by:

acquiring a memory information data set; the memory information data set comprises training memory data of a plurality of virtual machines;

determining corresponding training memory characteristics according to the training memory data to generate a memory characteristic training set;

and acquiring an initial clustering model, and training the initial clustering model based on the memory feature training set to obtain the memory information classification model.

8. The method according to claim 6, wherein the classifying, identifying and processing the memory features to be detected through the memory information classification model to obtain the memory classification result of the virtual machine comprises:

performing characteristic aggregation processing on the memory characteristics to be detected to generate a corresponding object to be detected;

determining a plurality of reference virtual machines, and respectively determining the reference memory characteristics of each reference virtual machine;

performing the feature aggregation processing on the reference memory features of each reference virtual machine to respectively generate corresponding adjacent objects;

determining a local density deviation value between the object to be detected and each adjacent object;

and determining the memory classification result according to the local density deviation value.

9. The method according to claim 8, wherein the performing feature aggregation processing on the to-be-detected memory features to generate corresponding to-be-detected objects comprises:

acquiring the memory features to be detected; the memory features to be detected comprise a kernel module value, a driver object address value, a device tree value, a callback value, a system service descriptor table value and an object score;

and performing feature aggregation processing on the kernel module value, the driver object address value, the device tree value, the callback value, the system service descriptor table value and the object score to generate the object to be detected.

10. The method of claim 8, wherein determining local density deviation values between the object to be detected and each of the adjacent objects comprises:

determining the distance between the object to be detected and each adjacent object as an object distance;

determining a plurality of neighborhood objects from a plurality of the neighboring objects;

determining the reachable distance between the object to be detected and a plurality of neighborhood objects according to the object distances;

determining the local density deviation values based on each of the reachable distances.

11. The method according to claim 10, wherein said determining the reachable distance between the object to be detected and the plurality of neighborhood objects according to the plurality of object distances comprises:

determining target adjacent objects corresponding to the object to be detected according to the object distances; the target adjacent objects are the first specified number of adjacent objects nearest to the object to be detected;

taking the distance between the object to be detected and a target adjacent object as a target relative distance;

and determining the reachable distance according to the target relative distance.

12. The method of claim 10, wherein determining the local density deviation values based on each of the reachable distances comprises:

determining the reachable distance corresponding to the object to be detected as a first reachable distance, and determining a first average reachable distance according to the first reachable distance;

determining the reachable distance corresponding to the neighborhood object as a second reachable distance, and determining a second average reachable distance according to the second reachable distance;

determining the local reachability density of the object to be detected according to the first average reachable distance;

determining a local reachability density for the neighborhood object from the second average reachable distance;

and determining the local density deviation value according to the local reachability density of the object to be detected and the local reachability density of the neighborhood object.

13. A malware identification device, comprising:

the memory data determining module is used for determining virtual memory information of the virtual machine and determining memory data to be detected according to the virtual memory information;

the memory characteristic determining module is used for extracting the characteristics of the memory data to be detected to obtain the characteristics of the memory to be detected;

the first classification processing module is used for performing classification recognition processing on the memory features to be detected to obtain a memory classification result of the virtual machine;

and the result determining module is used for determining whether the malicious program exists in the virtual machine according to the memory classification result.

14. An electronic device, comprising:

a processor; and

a memory having stored thereon computer readable instructions which, when executed by the processor, implement the malware identification method of any one of claims 1-12.

15. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out a malware identification method as claimed in any one of claims 1 to 12.