CN116305127A - Virus detection model training method, device and storage medium - Google Patents

Virus detection model training method, device and storage medium

Info

Publication number
CN116305127A
Authority
CN
China
Prior art keywords
function
detection model
call graph
virus detection
symbol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310302861.4A
Other languages
Chinese (zh)
Inventor
王绪国
鲍迪
胡越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Asiainfo Technologies (chengdu) Inc
Original Assignee
Asiainfo Technologies (chengdu) Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Asiainfo Technologies (chengdu) Inc filed Critical Asiainfo Technologies (chengdu) Inc
Priority to CN202310302861.4A priority Critical patent/CN116305127A/en
Publication of CN116305127A publication Critical patent/CN116305127A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561 Virus type analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a virus detection model training method, a device, and a storage medium, which relate to the field of computers and are used to accurately detect the Mirai virus. The method comprises: acquiring a plurality of executable files and determining a plurality of function call graphs from them; determining, among the plurality of function call graphs, a first function call graph that includes a first function lacking function attribute information; determining at least one second function whose function-symbol similarity to the first function is greater than a first threshold; determining, among the at least one second function, a third function whose function symbol section is most similar to that of the first function; adding the function attribute information of the third function to the first function call graph as the attribute information of the first function, thereby obtaining a second function call graph; comparing the function information in the second function call graph with the function information in a virus database to determine whether the executable file corresponding to the first function call graph is a virus file; and, if so, training a virus detection model based on the second function call graph. The embodiments of the application are applied to the virus detection process.

Description

Virus detection model training method, device and storage medium
Technical Field
The present disclosure relates to the field of computers, and in particular, to a method and apparatus for training a virus detection model, and a storage medium.
Background
With the development of internet technology, Internet of Things (IoT) devices have come into wide use. However, the security protections of IoT devices are imperfect, so the Mirai virus has infected IoT devices in large numbers.
The source code of the Mirai virus is open source, and the virus is distributed as binary executable files, so it evolves rapidly. As the Mirai virus evolves, the function names and part of the symbol information of its functions may be deleted, which makes the virus difficult to detect accurately.
Therefore, how to accurately detect Mirai virus in the internet of things equipment is a technical problem that needs to be solved at present.
Disclosure of Invention
The application provides a virus detection model training method, a device and a storage medium, which are used for realizing accurate detection of Mirai virus.
In order to achieve the above purpose, the present application adopts the following technical scheme:
In a first aspect, the present application provides a virus detection model training method, including: a virus detection model training device acquires a plurality of executable files, each of which includes at least one function; the virus detection model training device determines a plurality of function call graphs based on the plurality of executable files, one executable file corresponding to one function call graph; the virus detection model training device determines a first function call graph among the plurality of function call graphs, the first function call graph including a first function that lacks function attribute information; the virus detection model training device determines, among the functions of the plurality of executable files, at least one second function whose function-symbol similarity to the first function is greater than a first threshold; the virus detection model training device determines, among the at least one second function, a third function whose function symbol section is most similar to that of the first function, a function symbol section including at least one function symbol; the virus detection model training device adds the function attribute information of the third function to the first function call graph as the attribute information of the first function, and determines a second function call graph; and the virus detection model training device compares the function information in the second function call graph with the function information in a virus database, determines whether the executable file corresponding to the first function call graph is a virus file, and, if so, trains a virus detection model based on the second function call graph.
With reference to the first aspect, in one possible implementation manner, the method further includes: the virus detection model training device compares, according to the BLAST algorithm, the function symbols of the first function with the function symbols of each function in the plurality of executable files, and determines the function-symbol similarity between the first function and each function, where the function symbols include the categories of the function symbols and the number of function symbols, the categories and the numbers being in one-to-one correspondence; and the virus detection model training device selects, from the functions of the plurality of executable files, at least one second function whose function-symbol similarity to the first function is greater than the first threshold.
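The coarse filtering step above can be pictured with a much-simplified stand-in for BLAST. The sketch below borrows only the seeding idea (shared fixed-length "words") and scores two symbol sequences by the Jaccard similarity of their k-mer sets; it is not the BLAST algorithm itself. The representation of a function as a list of symbol tokens, the token names, the value `k = 2`, and the threshold are all assumptions made for illustration and do not come from the application.

```python
from typing import Dict, List, Set, Tuple


def kmers(symbols: List[str], k: int = 2) -> Set[Tuple[str, ...]]:
    """Return the set of k-length windows ("words") of a symbol sequence."""
    return {tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)}


def seed_similarity(a: List[str], b: List[str], k: int = 2) -> float:
    """Jaccard similarity of the k-mer sets of two symbol sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def select_candidates(query: List[str], corpus: Dict[str, List[str]],
                      threshold: float, k: int = 2) -> List[str]:
    """Keep every function whose similarity to the query exceeds the threshold."""
    return [name for name, symbols in corpus.items()
            if seed_similarity(query, symbols, k) > threshold]


# Hypothetical data: symbol tokens of a stripped function and of known functions.
query = ["s1", "s2", "s3", "s4"]
corpus = {
    "f_a": ["s1", "s2", "s3", "s5"],
    "f_b": ["s1", "s2", "s3", "s4", "s5"],
    "f_c": ["s9", "s8"],
}
second_functions = select_candidates(query, corpus, threshold=0.4)
print(second_functions)
```

With this data, `f_a` and `f_b` pass the filter and `f_c` is discarded, so only two candidates remain for the more expensive fine-grained comparison.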
With reference to the first aspect, in one possible implementation manner, the method further includes: the virus detection model training device compares, according to the Smith-Waterman algorithm, the function symbol section of the first function with the function symbol section of each of the at least one second function, and determines the function-symbol-section similarity between the first function and each second function; and the virus detection model training device selects, from the at least one second function, the third function whose function symbol section is most similar to that of the first function.
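As an illustration of the fine-grained comparison, the following is a minimal sketch of Smith-Waterman local alignment scoring applied to sequences of symbol tokens. The scoring parameters (`match = 2`, `mismatch = -1`, `gap = -1`), the token names, and the helper `most_similar` are assumptions chosen for the example; the application does not specify them.

```python
from typing import Dict, Sequence


def smith_waterman(a: Sequence[str], b: Sequence[str],
                   match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local-alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # Local alignment: a cell never drops below zero.
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


def most_similar(query: Sequence[str], candidates: Dict[str, Sequence[str]]) -> str:
    """Pick the candidate with the highest local-alignment score (hypothetical helper)."""
    return max(candidates, key=lambda name: smith_waterman(query, candidates[name]))


# Hypothetical symbol sections of the first function and of two second functions.
first = ["s1", "s2", "s3", "s4"]
second = {
    "f_a": ["s1", "s2", "s3", "s5"],
    "f_b": ["s1", "s2", "s3", "s4", "s5"],
}
third = most_similar(first, second)
print(third)
```

Here `f_b` contains the whole query as a contiguous run, so it scores 8 against 6 for `f_a` and is selected as the third function.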
With reference to the first aspect, in one possible implementation manner, the second function call graph includes M functions, where M is a positive integer, and the method further includes: the virus detection model training device acquires the number of times each of the M functions is called by its preceding and succeeding functions; the virus detection model training device deletes, from the second function call graph, the N functions that are called the fewest times by their preceding and succeeding functions, to obtain a third function call graph, where N is a positive integer smaller than M; and the virus detection model training device inputs the functions in the third function call graph into an initial network model for training based on support vector machine (SVM) machine learning, and determines the virus detection model.
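The pruning step can be sketched as follows. This example assumes that "called by preceding and succeeding functions" can be approximated by the number of call edges incident to a function (as caller or callee); that reading, the edge-list representation, and the function names are assumptions, not details given by the application.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def prune_call_graph(nodes: List[str], edges: List[Edge],
                     n: int) -> Tuple[List[str], List[Edge]]:
    """Remove the n functions with the fewest incident call edges."""
    if not 0 < n < len(nodes):
        raise ValueError("n must be a positive integer smaller than the number of functions")
    counts = Counter({name: 0 for name in nodes})
    for caller, callee in edges:
        counts[caller] += 1
        counts[callee] += 1
    # sorted() is stable, so ties are broken by the original node order.
    removed = set(sorted(nodes, key=lambda name: counts[name])[:n])
    kept_nodes = [name for name in nodes if name not in removed]
    kept_edges = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return kept_nodes, kept_edges


# Hypothetical second function call graph with M = 4 functions.
nodes = ["main", "attack", "scan", "helper"]
edges = [("main", "attack"), ("main", "scan"), ("attack", "scan"), ("scan", "helper")]
third_nodes, third_edges = prune_call_graph(nodes, edges, n=1)
print(third_nodes, third_edges)
```

In this example `helper` has a single incident edge, so it is the one function removed, and the edge that pointed to it disappears with it.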
In a second aspect, an embodiment of the present application provides a virus detection model training apparatus, including: an acquisition unit configured to acquire a plurality of executable files, each of which includes at least one function; and a processing unit configured to determine a plurality of function call graphs based on the plurality of executable files, one executable file corresponding to one function call graph. The processing unit is further configured to determine a first function call graph among the plurality of function call graphs, the first function call graph including a first function that lacks function attribute information; the processing unit is further configured to determine, among the functions of the plurality of executable files, at least one second function whose function-symbol similarity to the first function is greater than a first threshold; the processing unit is further configured to determine, among the at least one second function, a third function whose function symbol section is most similar to that of the first function, a function symbol section including at least one function symbol; the processing unit is further configured to add the function attribute information of the third function to the first function call graph as the attribute information of the first function, and determine a second function call graph; and the processing unit is further configured to compare the function information in the second function call graph with the function information in a virus database, determine whether the executable file corresponding to the first function call graph is a virus file, and, if so, train a virus detection model based on the second function call graph.
With reference to the second aspect, in a possible implementation manner, the processing unit is further configured to: compare, according to the BLAST algorithm, the function symbols of the first function with the function symbols of each function in the plurality of executable files, and determine the function-symbol similarity between the first function and each function, where the function symbols include the categories of the function symbols and the number of function symbols, the categories and the numbers being in one-to-one correspondence; and select, from the functions of the plurality of executable files, at least one second function whose function-symbol similarity to the first function is greater than the first threshold.
With reference to the second aspect, in a possible implementation manner, the processing unit is further configured to: compare, according to the Smith-Waterman algorithm, the function symbol section of the first function with the function symbol section of each of the at least one second function, and determine the function-symbol-section similarity between the first function and each second function; and select, from the at least one second function, the third function whose function symbol section is most similar to that of the first function.
With reference to the second aspect, in one possible implementation manner, the second function call graph includes M functions, where M is a positive integer. The acquisition unit is further configured to acquire the number of times each of the M functions is called by its preceding and succeeding functions. The processing unit is further configured to: delete, from the second function call graph, the N functions that are called the fewest times by their preceding and succeeding functions, to obtain a third function call graph, where N is a positive integer smaller than M; and input the functions in the third function call graph into an initial network model for training based on support vector machine (SVM) machine learning, and determine the virus detection model.
In a third aspect, embodiments of the present application provide a virus detection model training apparatus, including: a processor and a memory; wherein the memory is configured to store computer-executable instructions that, when executed by the virus detection model training apparatus, cause the virus detection model training apparatus to perform the virus detection model training method as described in any one of the possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having instructions stored therein, which when executed by a processor of a virus detection model training apparatus, enable the virus detection model training apparatus to perform a virus detection model training method as described in any one of the possible implementations of the first aspect.
These and other aspects of the present application will be more readily apparent from the following description.
The scheme provides at least the following beneficial effects. In the prior art, as the Mirai virus evolves, the function names and part of the symbol information of its functions are usually deleted, leaving functions that lack function attribute information, so virus detection software cannot detect the Mirai virus promptly and accurately. In contrast, in the embodiments of the present application, first, because functions in the Mirai virus may lack function attribute information, the virus detection model training device identifies the first function lacking function attribute information in the function call graph generated from an executable file, and verifies whether the executable file corresponding to the first function is a virus file. Such functions can therefore be examined in a targeted manner, which improves the efficiency of detecting the Mirai virus. Second, a third function that has complete function attribute information and is most similar to the first function may exist among the plurality of executable files that include the executable file to which the first function belongs. The virus detection model training device therefore adds the attribute information of the third function to the first function call graph as the attribute information of the first function, so that the function information missing from the first function can be accurately restored.
In addition, when restoring the function information missing from the first function, the virus detection model training device first determines, among the functions of the plurality of executable files, at least one second function whose function-symbol similarity to the first function is greater than the first threshold, and then selects the third function from the at least one second function. The third function is thus found through a staged search, which improves the efficiency of restoring the missing function information. The virus detection model training device can then perform virus detection on the executable file corresponding to the first function whose missing function information has been restored, which improves detection efficiency. Finally, a virus detection model is trained on the executable files identified as viruses, which speeds up subsequent virus detection and enables accurate detection of the Mirai virus in Internet of Things devices.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a virus detection model detection system according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a training device for virus detection model according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for training a virus detection model according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for training a virus detection model according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for training a virus detection model according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for training a virus detection model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another training device for virus detection model according to an embodiment of the present application.
Detailed Description
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone.
The terms "first" and "second" and the like in the description and in the drawings are used for distinguishing between different objects or for distinguishing between different processes of the same object and not for describing a particular sequential order of objects.
Furthermore, references to the terms "comprising" and "having" and any variations thereof in the description of the present application are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, article, or apparatus.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more.
In order to more clearly describe the technical solutions in the embodiments of the present application, the following will explain the nouns in the embodiments of the present application.
1. Embedded device
An embedded device is a specialized computer system used to build an embedded system. The IEEE (Institute of Electrical and Electronics Engineers) defines embedded devices as devices used to control, monitor, or assist the operation of machines and equipment.
2. Internet of Things (IoT)
The IoT is a network, based on information carriers such as the internet, broadcast television networks, and traditional telecommunications networks, that interconnects all ordinary physical objects that can be independently addressed. Its important features are the instrumentation of ordinary objects, the interconnection of autonomous terminals, and pervasive intelligent services.
3. DDoS
A distributed denial-of-service (DDoS) attack is a malicious act that disrupts the normal traffic of a target server, service, or network by flooding the target or its surrounding infrastructure with large-scale internet traffic. A DDoS attack achieves its effect by using multiple compromised computer systems as sources of attack traffic. The exploited machines may include computers and other networked resources (e.g., IoT devices).
4. Mirai virus
The Mirai virus is a type of malware that takes remote control of IoT devices and turns them into part of a botnet, which is then used to launch large-scale distributed denial-of-service attacks. The Mirai botnet mainly targets networked consumer devices such as IP cameras and home routers. The Mirai virus was first discovered by the MalwareMustDie research group in August 2016.
5. Function call graph
A function call graph is a control flow graph that represents the call relationships between subroutines in a computer program. Each node in the function call graph represents a procedure, and each edge (f, g) indicates that procedure f calls procedure g. Thus, a cycle in the graph indicates recursive procedure calls.
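To make the definition concrete, here is a minimal sketch that stores a call graph as an adjacency mapping (caller to list of callees) and uses depth-first search to check whether it contains a cycle, that is, a direct or mutual recursion. The representation and the function names are illustrative assumptions.

```python
from typing import Dict, List

CallGraph = Dict[str, List[str]]  # caller -> callees

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def has_recursion(graph: CallGraph) -> bool:
    """Return True if the call graph contains a cycle (recursive calls)."""
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for callee in graph.get(node, []):
            state = color.get(callee, WHITE)
            if state == GRAY:  # back edge: callee is already on the path
                return True
            if state == WHITE and visit(callee):
                return True
        color[node] = BLACK
        return False

    return any(color.get(node, WHITE) == WHITE and visit(node) for node in graph)


# Hypothetical call graphs.
acyclic = {"main": ["init", "run"], "run": ["log"], "init": [], "log": []}
direct = {"main": ["walk"], "walk": ["walk"]}
mutual = {"a": ["b"], "b": ["a"]}
print(has_recursion(acyclic), has_recursion(direct), has_recursion(mutual))
```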
6. Graph kernel
In structure mining, a graph kernel is a kernel function that computes an inner product on graphs. A graph kernel can be intuitively understood as a function that measures the similarity of a pair of graphs. Graph kernels allow kernelized learning algorithms (e.g., support vector machines) to work directly on graphs, without feature extraction to convert the graphs into fixed-length, real-valued feature vectors. Graph kernels find application in bioinformatics, chemoinformatics (as a type of molecule kernel), and social network analysis.
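A minimal sketch of one simple graph kernel follows. It builds, for each graph, a histogram of shortest-path lengths between all ordered node pairs (using breadth-first search on unweighted, directed edges) and takes the inner product of the two histograms. This is a simplified, label-free variant of a shortest-path kernel chosen for illustration; the application does not state which kernel, if any, is used.

```python
from collections import Counter, deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # node -> successors; every node appears as a key


def path_length_histogram(graph: Graph) -> Counter:
    """Count shortest-path lengths over all ordered pairs of distinct nodes."""
    hist: Counter = Counter()
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for nxt in graph.get(node, []):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for target, d in dist.items():
            if target != source:
                hist[d] += 1
    return hist


def shortest_path_kernel(g1: Graph, g2: Graph) -> int:
    """Inner product of the two path-length histograms."""
    h1, h2 = path_length_histogram(g1), path_length_histogram(g2)
    return sum(h1[d] * h2[d] for d in h1)


# Hypothetical graphs: a chain of three nodes and a single edge.
chain = {"a": ["b"], "b": ["c"], "c": []}
pair = {"x": ["y"], "y": []}
print(shortest_path_kernel(chain, pair), shortest_path_kernel(chain, chain))
```

The chain has two paths of length 1 and one of length 2, while the single edge has one path of length 1, so the kernel values are 2 and 5 respectively.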
7. Support vector machine
In machine learning, a support vector machine (SVM, also known as a support vector network) is a supervised learning model with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories by Vapnik and colleagues (Boser et al., 1992; Guyon et al., 1993; Vapnik et al., 1997), the SVM is one of the most robust prediction methods and is based on the statistical learning framework, or VC theory, proposed by Vapnik (1982, 1995) and Chervonenkis (1974). Given a set of training examples, each labeled as belonging to one of two classes, the SVM training algorithm builds a model that assigns new examples to one class or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist for using SVMs in a probabilistic classification setting). The SVM maps training examples to points in space so as to maximize the width of the gap between the two classes. New examples are then mapped into the same space and predicted to belong to a class according to which side of the gap they fall on.
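The margin-maximizing idea can be illustrated with a small linear SVM trained by stochastic subgradient descent on the regularized hinge loss. This is a teaching sketch in pure Python, not the training procedure of the application; the learning rate, regularization strength, number of epochs, and the two-dimensional sample data are all assumptions.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def train_linear_svm(samples: List[Sample], lr: float = 0.1,
                     lam: float = 0.01, epochs: int = 100) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + hinge loss by stochastic subgradient descent."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks w on every step.
            w = [(1.0 - lr * lam) * wi for wi in w]
            if margin < 1.0:
                # The sample is misclassified or inside the margin: move toward it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify x by the side of the separating hyperplane it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable training data.
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.0, 3.0), 1),
           ((-2.0, -2.0), -1), ((-3.0, -1.0), -1), ((-1.0, -3.0), -1)]
w, b = train_linear_svm(samples)
print(predict(w, b, (2.5, 2.5)), predict(w, b, (-2.5, -2.0)))
```

In practice a library implementation with kernel support would normally be used; the sketch only shows how the hinge loss drives the weights until every training point lies outside the margin.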
8. Basic Local Alignment Search Tool (BLAST)
In bioinformatics, BLAST is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. A BLAST search enables a researcher to compare a subject protein or nucleotide sequence (called a query) with a library or database of sequences and to identify database sequences whose similarity to the query exceeds a certain threshold. For example, after a previously unknown gene is found in the mouse, scientists typically run a BLAST search of the human genome to see whether humans carry a similar gene; BLAST identifies sequences in the human genome that resemble the mouse gene based on sequence similarity.
9. Shortest path
In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimal. Finding the shortest route between two intersections on a road map can be modeled as a special case of the shortest path problem, in which the vertices correspond to intersections, the edges correspond to road segments, and each edge is weighted by the length of its road segment.
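The road-map example can be sketched with Dijkstra's algorithm, one standard way to solve the shortest path problem when edge weights are non-negative. The graph, its node names, and its weights are invented for illustration.

```python
import heapq
from typing import Dict, List, Tuple

WeightedGraph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, weight)]


def shortest_path(graph: WeightedGraph, source: str,
                  target: str) -> Tuple[float, List[str]]:
    """Return (total weight, node list) of a minimum-weight path (Dijkstra)."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                prev[nxt] = node
                heapq.heappush(heap, (candidate, nxt))
    if target not in dist:
        raise ValueError("target is not reachable from source")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical road map: intersections A to D, weights are segment lengths.
roads = {"A": [("B", 4), ("C", 1)], "C": [("B", 2)], "B": [("D", 5)], "D": []}
print(shortest_path(roads, "A", "D"))
```

The direct segment from A to B costs 4, but the detour through C costs 3, so the returned route is A, C, B, D with a total length of 8.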
The foregoing explains the terminology; the related art of the embodiments of the present application is described below.
With the development of internet technology, Internet of Things (IoT) devices have come into wide use. However, the security protections of IoT devices are imperfect, so the Mirai virus has infected IoT devices in large numbers.
The source code of the Mirai virus is open source, and the virus is distributed as binary executable files, so it evolves rapidly. As the Mirai virus evolves, the function names and part of the symbol information of its functions may be deleted, which makes the virus difficult to detect accurately.
In addition, because the IoT ecosystem spans multiple architectures (such as MIPS and ARM), it is difficult for experts to produce a unified detection method directly by writing rules.
Part of the code in a Mirai malware sample consists of statically linked standard library functions, and the rest consists of custom functions. In the related art, however, directly restoring symbol information is very difficult, and rule-based or machine-learning-based methods for restoring type information have not produced satisfactory results.
Therefore, how to detect the Mirai virus in Internet of Things devices is a technical problem that currently needs to be solved.
In order to solve the technical problems in the related art, the embodiments of the present application provide a virus detection model training method: a virus detection model training device acquires a plurality of executable files, each of which includes at least one function; the virus detection model training device determines a plurality of function call graphs based on the plurality of executable files, one executable file corresponding to one function call graph; the virus detection model training device determines a first function call graph among the plurality of function call graphs, the first function call graph including a first function that lacks function attribute information; the virus detection model training device determines, among the functions of the plurality of executable files, at least one second function whose function-symbol similarity to the first function is greater than a first threshold; the virus detection model training device determines, among the at least one second function, a third function whose function symbol section is most similar to that of the first function, a function symbol section including at least one function symbol; the virus detection model training device adds the function attribute information of the third function to the first function call graph as the attribute information of the first function, and determines a second function call graph; and the virus detection model training device compares the function information in the second function call graph with the function information in a virus database, determines whether the executable file corresponding to the first function call graph is a virus file, and, if so, trains a virus detection model based on the second function call graph.
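The last two steps of the method (attaching the third function's attributes to the first function, then comparing the restored call graph with a virus database) might look like the sketch below. The application does not say how call graphs are stored or how the comparison decides that a file is a virus, so the node layout, the function names, and the "at least half of the named functions appear in the database" rule are all hypothetical.

```python
import copy
from typing import Dict, Optional, Set

Node = Dict[str, object]   # {"name": Optional[str], "callees": list of node ids}
CallGraph = Dict[str, Node]  # node id (e.g. an address) -> node


def restore_attributes(graph: CallGraph, func_id: str, donor_name: str) -> CallGraph:
    """Return a copy of the graph in which func_id takes the donor's name."""
    restored = copy.deepcopy(graph)
    restored[func_id]["name"] = donor_name
    return restored


def looks_like_virus(graph: CallGraph, virus_names: Set[str],
                     min_ratio: float = 0.5) -> bool:
    """Hypothetical rule: enough named functions also occur in the virus database."""
    names = [node["name"] for node in graph.values() if node["name"] is not None]
    if not names:
        return False
    hits = sum(1 for name in names if name in virus_names)
    return hits / len(names) >= min_ratio


# Hypothetical first call graph: the function at 0x200 has lost its name.
first_graph: CallGraph = {
    "0x100": {"name": "main", "callees": ["0x200"]},
    "0x200": {"name": None, "callees": []},
}
virus_db = {"attack_init", "scanner_init"}  # illustrative database entries

second_graph = restore_attributes(first_graph, "0x200", donor_name="attack_init")
print(looks_like_virus(first_graph, virus_db), looks_like_virus(second_graph, virus_db))
```

Before restoration nothing in the graph matches the database; after the missing name is filled in, the same file is flagged, which is the effect the method relies on.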
In the prior art, as the Mirai virus evolves, the function names and part of the symbol information of its functions are usually deleted, leaving functions that lack function attribute information, so virus detection software cannot detect the Mirai virus promptly and accurately. In contrast, in the embodiments of the present application, first, because functions in the Mirai virus may lack function attribute information, the virus detection model training device identifies the first function lacking function attribute information in the function call graph generated from an executable file, and verifies whether the executable file corresponding to the first function is a virus file. Such functions can therefore be examined in a targeted manner, which improves the efficiency of detecting the Mirai virus. Second, a third function that has complete function attribute information and is most similar to the first function may exist among the plurality of executable files that include the executable file to which the first function belongs. The virus detection model training device therefore adds the attribute information of the third function to the first function call graph as the attribute information of the first function, so that the function information missing from the first function can be accurately restored.
In addition, when restoring the function information missing from the first function, the virus detection model training device first determines, among the functions of the plurality of executable files, at least one second function whose function-symbol similarity to the first function is greater than the first threshold, and then selects the third function from the at least one second function. The third function is thus found through a staged search, which improves the efficiency of restoring the missing function information. The virus detection model training device can then perform virus detection on the executable file corresponding to the first function whose missing function information has been restored, which improves detection efficiency. Finally, a virus detection model is trained on the executable files identified as viruses, which speeds up subsequent virus detection and enables accurate detection of the Mirai virus in Internet of Things devices.
The virus detection model training method can be applied to a virus detection model detection system. The virus detection model detection system provided in an embodiment of the present application is described in detail below with reference to FIG. 1. As shown in FIG. 1, the virus detection model detection system includes: a function preparation unit 11, a virus detection model training device 12, and a virus detection device 13.
The function preparation unit 11 is configured to: acquire a plurality of executable files and send them to the virus detection model training apparatus 12.
The virus detection model training apparatus 12 includes: a function processing unit 121 and a model training module 122.
The function processing unit 121 includes: architecture recognition unit 1211, function call graph generation unit 1212, function restoration unit 1213, and function pruning unit 1214.
The architecture identification unit 1211 is configured to: identify the architecture of an executable file, the architectures including the MIPS architecture and the ARM architecture.
The function call graph generation unit 1212 is configured to: determine a plurality of function call graphs based on the plurality of executable files.
The function restoration unit 1213 is configured to: determine, among the functions of the plurality of function call graphs, a third function whose function symbol section is most similar to that of the first function, and add the function attribute information of the third function to the first function call graph as the attribute information of the first function.
The function pruning unit 1214 is configured to: delete, from the second function call graph, the N functions that are called the fewest times by their preceding and succeeding functions, to obtain a third function call graph.
The model training module 122 is configured to: determine the function execution flow in the third function call graph by the shortest path method, input the functions in the third function call graph into an initial network model for training based on support vector machine (SVM) machine learning, and determine the virus detection model.
The virus detection device 13 is configured to: verify, according to the virus detection model and a target executable file, whether the target executable file is a virus file.
The basic hardware structure of the virus detection model training apparatus in the virus detection model training system includes elements included in the virus detection model training apparatus 200 shown in fig. 2. The hardware configuration of the virus detection model training apparatus 200 will be described below using the virus detection model training apparatus 200 shown in fig. 2 as an example.
As shown in fig. 2, the virus detection model training apparatus 200 includes at least one processor 201, a communication line 202, and at least one communication interface 204, and may also include a memory 203. The processor 201, the memory 203, and the communication interface 204 may be connected through a communication line 202.
The processor 201 may be a central processing unit (central processing unit, CPU), an application specific integrated circuit (application specific integrated circuit, ASIC), or one or more integrated circuits configured to implement embodiments of the present application, such as: one or more digital signal processors (digital signal processor, DSP), or one or more field programmable gate arrays (field programmable gate array, FPGA).
Communication line 202 may include a path for communicating information between the above-described components.
The communication interface 204, for communicating with other devices or communication networks, may use any transceiver-like device, such as ethernet, radio access network (radio access network, RAN), WLAN, etc.
The memory 203 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
In one possible design, the memory 203 may exist independent of the processor 201, that is, the memory 203 may be a memory external to the processor 201, where the memory 203 may be connected to the processor 201 through a communication line 202, for storing execution instructions or application program codes, and the execution is controlled by the processor 201 to implement a virus detection model training method provided in the embodiments described below. In yet another possible design, the memory 203 may be integrated with the processor 201, i.e., the memory 203 may be an internal memory of the processor 201, e.g., the memory 203 may be a cache, may be used to temporarily store some data and instruction information, etc.
As one implementation, processor 201 may include one or more CPUs, such as CPU0 and CPU1 in fig. 2. As another implementation, virus detection model training apparatus 200 may include a plurality of processors, such as processor 201 and processor 207 in fig. 2. As yet another implementation, virus detection model training apparatus 200 may further include an output device 205 and an input device 206.
The following describes in detail the virus detection model training method provided in the embodiment of the present application with reference to fig. 3, and as shown in fig. 3, the virus detection model training method includes:
S301, the virus detection model training device acquires a plurality of executable files.
Wherein each executable file of the plurality of executable files includes at least one function.
Optionally, the plurality of executable files includes a Mirai sample and a Benign sample.
In one possible implementation manner, the virus detection model training device obtains a Mirai sample from an online virus analysis platform; the Benign sample is obtained from the Linux operating system and the GNU software package, so that the data of the executable file can be ensured to be non-threatening.
Illustratively, as shown in table 1, the virus detection model training device obtains from the online virus analysis platform: 12091 Mirai samples of the MIPS architecture with symbol information; 6672 Mirai samples of the MIPS architecture without symbol information; 21330 Mirai samples of the ARM architecture with symbol information; and 7363 Mirai samples of the ARM architecture without symbol information.
Illustratively, as shown in table 1, the virus detection model training device obtains from the Linux operating system and the GNU software package: 4167 Benign samples of the MIPS architecture without symbol information; and 1025 Benign samples of the ARM architecture without symbol information.
TABLE 1 executable file information
[Table 1 appears as an image in the original publication; the sample counts it reports are listed in the two preceding paragraphs.]
S302, the virus detection model training device determines a plurality of function call graphs based on a plurality of executable files.
Wherein one executable file corresponds to one function call graph.
In one possible implementation, the virus detection model training apparatus generates a control flow graph from an executable file using the angr software, where the control flow graph is used to characterize the call relationships between functions. The virus detection model training device obtains the entry points and jump points of the functions at the assembly language level through a disassembly tool, and generates the function call graph corresponding to the executable file according to the call relationships among the functions and the entry points and jump points of the functions.
It should be explained that, because executable files of different architectures differ in binary representation after back-end optimization, a single analysis tool cannot analyze them uniformly. To solve this problem, the related art introduces an intermediate representation (IR) so that one analysis tool can be used across architectures without being affected by the differences produced by back-end optimization. However, because intermediate representation conversion in the related art suffers from low accuracy and information loss, the accuracy with which the analysis tool analyzes the executable file is seriously reduced. For this reason, the virus detection model training device in the embodiment of the application identifies the architecture type of the executable file and generates the binary file according to the architecture type.
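Illustratively, the architecture identification described above can be sketched as follows. This is a minimal sketch that assumes the executable files are in ELF format, which the embodiment does not state explicitly; `identify_architecture` and `make_header` are hypothetical helper names introduced only for this illustration. The `e_machine` values 8 (MIPS) and 40 (ARM) come from the ELF specification.

```python
import struct

# e_machine values defined by the ELF specification.
EM_MIPS = 8
EM_ARM = 40
ARCH_NAMES = {EM_MIPS: "MIPS", EM_ARM: "ARM"}


def identify_architecture(header: bytes) -> str:
    """Return 'MIPS', 'ARM', or 'unknown' from the first 20 bytes of a file.

    Assumption: the file is an ELF executable.
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "unknown"
    # EI_DATA (byte 5 of e_ident): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ARCH_NAMES.get(machine, "unknown")


def make_header(ei_data: int, machine: int) -> bytes:
    """Build a synthetic 20-byte ELF header prefix, for demonstration only."""
    order = "<" if ei_data == 1 else ">"
    # 4 magic bytes + EI_CLASS, EI_DATA, EI_VERSION + 9 padding bytes = 16.
    e_ident = b"\x7fELF" + bytes([1, ei_data, 1]) + bytes(9)
    return e_ident + struct.pack(order + "HH", 2, machine)  # e_type, e_machine


if __name__ == "__main__":
    print(identify_architecture(make_header(2, EM_MIPS)))
    print(identify_architecture(make_header(1, EM_ARM)))
```

In practice the 20 header bytes would be read from the start of each acquired executable file rather than built synthetically.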
Optionally, the virus detection model training device compiles executable files of different architectures into binary files based on an open source standard library of the C language, and generates a control flow graph through angr software. The functions in the control flow graph are represented by binary, and the virus detection model training device obtains the names of the customized functions through a malicious software binary file containing symbol information.
It should be explained that, since each function is composed of a plurality of instructions, each instruction includes an operator and an operand, and the semantics of the instruction are mainly concentrated on the operator, the virus detection model training device simplifies the instruction by deleting the operand, and reduces the complexity of the instruction.
Optionally, the virus detection model training device classifies the operators (operators) according to their semantic similarity.
Illustratively, the virus detection model training apparatus represents MOV, MOVS, MVN in the MOV-class operator with M; JMP-class operators are denoted by B.
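Illustratively, the operand deletion and operator classification described above can be sketched as follows. Only the M class (MOV, MOVS, MVN) and the B class (JMP) are named in the embodiment; the placeholder symbol `X` for operators outside the table, and the names `OPERATOR_CLASS` and `simplify`, are assumptions made for this sketch.

```python
# Operator-to-class table. Only the M and B classes appear in the text;
# a real table would cover the full operator set of each architecture.
OPERATOR_CLASS = {
    "MOV": "M",
    "MOVS": "M",
    "MVN": "M",
    "JMP": "B",
}
UNKNOWN = "X"  # assumed placeholder for operators not listed in the table


def simplify(instructions):
    """Drop operands and map each operator to its one-letter class symbol."""
    symbols = []
    for instruction in instructions:
        parts = instruction.split()
        if not parts:
            continue  # skip blank lines
        operator = parts[0].upper()  # the first token is the operator
        symbols.append(OPERATOR_CLASS.get(operator, UNKNOWN))
    return "".join(symbols)


if __name__ == "__main__":
    body = ["MOV r0, r1", "MVN r2, r3", "JMP label", "ADD r0, r0, #1"]
    print(simplify(body))  # MMBX
```

Each function is thereby reduced to a short string of class symbols, which is the form used by the similarity comparisons in the later steps.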
S303, the virus detection model training device determines a first function call graph in the function call graphs.
The first function call graph is a function call graph that includes a first function whose function attribute information is missing.
Optionally, the function attribute information includes a name of the function and function symbol information.
It should be explained that, as the Mirai virus evolves, the function names and part of the symbol information of functions in the Mirai virus are usually deleted, which yields functions that lack function attribute information. Therefore, the virus detection model training device finds all first functions with missing function attribute information, so that the missing function attribute information can be restored and it can be determined whether the executable files corresponding to the first functions are virus files.
S304, the virus detection model training device determines at least one second function with the similarity of the function symbols with the first function being larger than a first threshold value in a plurality of functions of a plurality of executable files.
In one possible implementation manner, the virus detection model training device obtains, for each function of the plurality of executable files, the categories of its function symbols and the number of function symbols in each category, and compares these per-category counts with the corresponding per-category counts of the first function. The device thereby determines the function symbol similarity between each function of the plurality of executable files and the first function, and selects at least one second function whose similarity is greater than the first threshold.
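Illustratively, a comparison of per-category symbol counts can be sketched as follows. The embodiment does not define the similarity measure; the ratio used here (shared counts divided by the larger counts, summed over all categories) is an assumption chosen only to make the idea concrete, and `symbol_similarity` is a hypothetical name.

```python
from collections import Counter


def symbol_similarity(first: str, other: str) -> float:
    """Compare two functions by the number of symbols in each category.

    Assumed measure: sum of per-category minimum counts divided by the
    sum of per-category maximum counts (1.0 means identical counts).
    """
    first_counts, other_counts = Counter(first), Counter(other)
    categories = set(first_counts) | set(other_counts)
    if not categories:
        return 0.0
    shared = sum(min(first_counts[c], other_counts[c]) for c in categories)
    total = sum(max(first_counts[c], other_counts[c]) for c in categories)
    return shared / total


if __name__ == "__main__":
    print(symbol_similarity("MMB", "MMB"))  # 1.0
    print(symbol_similarity("MMB", "MBB"))  # 0.5
```

Because this measure ignores symbol order, it is cheap to compute and suits a coarse first-stage filter; order is taken into account only in the later segment comparison.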
S305, the virus detection model training device determines a third function with the maximum similarity with the function symbol section of the first function in at least one second function.
Wherein the function symbol segment comprises at least one function symbol.
S306, the virus detection model training device adds function attribute information of the third function as attribute information of the first function into the first function call graph, and determines a second function call graph.
In a possible implementation manner, the virus detection model training device compares the function attribute information of the third function with the attribute information of the first function to determine the missing function attribute information of the first function; and adding the missing function attribute information of the first function to the first function in the first function call graph, and determining a second function call graph.
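Illustratively, the restoration of missing attribute information can be sketched as follows. Representing function attribute information as a dictionary, treating `None` or an empty string as "missing", and the names `restore_attributes` and `example_fn` are all assumptions made for this sketch.

```python
def restore_attributes(first: dict, third: dict) -> dict:
    """Fill attributes missing from the first function with the third's.

    Attributes the first function already has are kept unchanged; the
    input dictionaries are not modified.
    """
    restored = dict(first)
    for key, value in third.items():
        if restored.get(key) in (None, ""):
            restored[key] = value
    return restored


if __name__ == "__main__":
    first_function = {"name": None, "symbols": "MMB", "address": 0x400}
    third_function = {"name": "example_fn", "symbols": "MMBB"}
    print(restore_attributes(first_function, third_function))
```

Only the missing name is copied in this example; the symbol sequence and address of the first function stay as they were in the first function call graph.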
S307, the virus detection model training device compares the function information in the second function call graph with the function information in the virus database to determine whether the executable file corresponding to the first function call graph is a virus file.
In a possible implementation manner, if the virus detection model training device searches the function information in the second function call graph in the virus database, the executable file corresponding to the first function call graph is a virus file; if the virus detection model training device does not find the function information in the second function call graph in the virus database, the executable file corresponding to the first function call graph is a non-virus file.
S308, if so, the virus detection model training device trains a virus detection model based on the second function call graph.
In a possible implementation manner, the virus detection model training device inputs the function in the second function call graph into the initial network model for training based on machine learning of the Support Vector Machine (SVM) to determine a virus detection model.
The scheme at least brings the following beneficial effects:
In the prior art, as the Mirai virus evolves, the function names and part of the symbol information of functions in the Mirai virus are usually deleted, yielding functions with missing function attribute information, so that virus detection software cannot detect the Mirai virus in a timely and accurate manner. Compared with the prior art, in the embodiment of the application, firstly, because functions in the Mirai virus may lack function attribute information, the virus detection model training device determines the first function lacking function attribute information from the function call graphs generated from the executable files, and verifies whether the executable file corresponding to the first function is a virus file. Functions can therefore be detected in a targeted manner, which improves the efficiency of detecting the Mirai virus. Secondly, among the functions of the plurality of executable files there may exist a third function that has complete function attribute information and is most similar to the first function. Therefore, the virus detection model training device adds the attribute information of the third function to the first function call graph as the attribute information of the first function, so that the function information missing from the first function can be accurately restored.
In addition, when restoring the function information missing from the first function, the virus detection model training device first determines, among the functions of the plurality of executable files, at least one second function whose function symbol similarity to the first function is greater than a first threshold, and then selects the third function from the at least one second function. Searching for the third function in stages in this way improves the efficiency of restoring the missing function information of the first function. The virus detection model training device can then perform virus detection on the executable file corresponding to the first function whose missing function information has been restored, which improves detection efficiency. Finally, for executable files identified as viruses, a virus detection model is trained to speed up subsequent virus detection, so that Mirai viruses in Internet of Things devices are detected accurately.
Referring to fig. 3, as shown in fig. 4, the above-mentioned process of determining at least one second function with a similarity of function symbols with the first function being greater than the first threshold value in the functions of the executable files by the virus detection model training apparatus may be specifically implemented as follows S401-S403.
S401, the virus detection model training device compares the function symbol of the first function with the function symbol of each function in the plurality of executable files according to the BLAST algorithm.
Wherein the function symbols include: the categories of the function symbols and the numbers of the function symbols, and the categories correspond one-to-one to the numbers.
In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is an algorithm and program for comparing primary biological sequence information. The application builds on the BLAST algorithm from bioinformatics: the virus detection model training device improves the BLAST algorithm and applies the improved algorithm to the mapping of functions with missing function symbol information, so as to quickly find functions similar to the first function.
In bioinformatics, the BLAST algorithm queries sequences over a limited alphabet: DNA nucleotide sequences (4 elements in total, namely A, T, C, and G) and protein sequences (20 amino acids in total). The BLAST algorithm as improved by the virus detection model training device supports function name mapping over simplified instruction sequences with an alphabet of 26 elements. As with BLAST queries in bioinformatics, the simplified instruction sequences are queried by indexing seeds. The virus detection model training device improves the reading speed of index seeds by introducing single instruction, multiple data (SIMD) processing.
It should be explained that, before comparing the function symbols of the first function with the function symbols of each function in the plurality of executable files, the virus detection model training device analyzes each function in the plurality of executable files to obtain the categories of function symbols of each function and the number of function symbols in each category. Furthermore, the virus detection model training device constructs a first memory pool from these categories and numbers, where the first memory pool includes each target function, the categories of function symbols corresponding to that target function, and the number of symbols in each category.
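Illustratively, the seed indexing idea borrowed from BLAST can be sketched as follows. This is a simplified, scalar sketch rather than the improved BLAST of the embodiment: it omits hit extension, scoring, and SIMD acceleration, and the seed length `k = 3` and the names `build_seed_index` and `candidates` are assumptions.

```python
from collections import defaultdict


def build_seed_index(functions: dict, k: int = 3) -> dict:
    """Map every length-k seed to the names of the functions containing it.

    `functions` maps a function name to its simplified instruction sequence.
    """
    index = defaultdict(set)
    for name, sequence in functions.items():
        for i in range(len(sequence) - k + 1):
            index[sequence[i:i + k]].add(name)
    return index


def candidates(query: str, index: dict, k: int = 3) -> set:
    """Return the functions that share at least one seed with the query."""
    hits = set()
    for i in range(len(query) - k + 1):
        hits |= index.get(query[i:i + k], set())
    return hits


if __name__ == "__main__":
    database = {"f1": "MMBMA", "f2": "AAAAA"}
    seed_index = build_seed_index(database)
    print(candidates("XMMBX", seed_index))  # {'f1'}
```

Looking up seeds in a prebuilt index avoids comparing the query against every stored sequence in full, which is what makes this style of search fast on large sample sets.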
In one possible implementation, the virus detection model training apparatus inputs the class of function symbols and the number of function symbols of the first function into BLAST query software, and compares the function symbols of the first function with the function symbols of each of the plurality of executable files in the first memory pool.
Optionally, the virus detection model training device inputs the categories of the function symbols and the function symbols into separate thread pools.
S402, the virus detection model training device determines the similarity of the function symbols of the first function and each function.
Illustratively, the similarity of the first function to function A is 80%; the similarity of the first function to function B is 90%; the similarity of the first function to function C is 63%; the similarity of the first function to function D is 75%.
S403, the virus detection model training device selects at least one second function with the similarity of function symbols with the first function being larger than a first threshold value from a plurality of functions of a plurality of executable files.
Illustratively, when the first threshold is 68%, since the similarities of function A, function B, and function D are all greater than the first threshold, the at least one second function includes: function A, function B, and function D; since the similarity of function C is smaller than the first threshold, the at least one second function does not include function C.
The scheme at least brings the following beneficial effects: in the embodiment of the application, the virus detection model training device can rapidly and accurately query, through the BLAST algorithm, a plurality of second functions whose function symbols are similar to those of the first function, which improves the efficiency of the matching query. Because the function symbol similarity between each second function and the first function is greater than the first threshold, the range in which the function most similar to the first function must be searched is reduced.
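Illustratively, the threshold selection in S403 can be sketched with the example values above. Similarities are written as integer percentages, and `select_second_functions` is a hypothetical name introduced for this sketch.

```python
def select_second_functions(similarities: dict, first_threshold: int) -> list:
    """Keep the functions whose similarity exceeds the first threshold."""
    return [
        name
        for name, similarity in similarities.items()
        if similarity > first_threshold
    ]


if __name__ == "__main__":
    # Similarity of the first function to each function, in percent.
    similarities = {"A": 80, "B": 90, "C": 63, "D": 75}
    print(select_second_functions(similarities, 68))  # ['A', 'B', 'D']
```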
Referring to fig. 3, as shown in fig. 5, the above-mentioned process of determining, by the virus detection model training apparatus, the third function having the greatest similarity with the function symbol segment of the first function among the at least one second function may be specifically implemented by following S501 to S503.
S501, the virus detection model training device compares the function symbol segments in the first function with the function symbol segments of at least one second function according to a Smith-Waterman algorithm.
In a possible implementation manner, the virus detection model training device treats function symbol segments in the first function that are formed by the same function symbols as one and the same function symbol segment. In this way, when several identical function symbol segments are to be compared, only one of them needs to be compared, which reduces the number of comparisons performed by the virus detection model training device and improves comparison efficiency.
S502, the virus detection model training device determines similarity of function symbol segments of the first function and each second function.
Illustratively, the similarity of function symbol segment m in the first function to function symbol segment a in function A is 95%; the similarity of function symbol segment m in the first function to function symbol segment b in function B is 95.6%; the similarity of function symbol segment m in the first function to function symbol segment d in function D is 94%.
S503, the virus detection model training device selects a third function with the maximum similarity with the function symbol section of the first function from at least one second function.
In a possible implementation manner, the virus detection model training device queries the third function from a second memory pool through SSE4 (Streaming SIMD Extensions 4) matrix parallel instructions, where the second memory pool includes the at least one second function.
It should be explained that SSE4 is used to accelerate the reading and writing of matrix data.
Illustratively, the virus detection model training device selects function B, which has the greatest similarity to the function symbol segment of the first function, from function A, function B, and function D as the third function.
The scheme at least brings the following beneficial effects: in the embodiment of the application, the virus detection model training device compares the function symbol segment in the first function with the function symbol segment of at least one second function through the Smith-Waterman algorithm. Compared with the prior art, in which the function symbols of viruses are compared one by one in sequence, this improves comparison efficiency, and the third function with the maximum similarity to the function symbol segment of the first function can be determined more quickly and accurately.
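Illustratively, the segment comparison and selection in S501 to S503 can be sketched as follows. This is a plain, scalar Smith-Waterman local alignment without the SSE4 acceleration mentioned above; the scoring values (match 2, mismatch -1, gap -1), the use of the raw alignment score as the similarity, and the names `smith_waterman` and `most_similar` are assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                           # start a new local alignment
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            best = max(best, score[i][j])
    return best


def most_similar(first: str, second_functions: dict) -> str:
    """Pick the second function whose segment aligns best with `first`."""
    return max(
        second_functions,
        key=lambda name: smith_waterman(first, second_functions[name]),
    )


if __name__ == "__main__":
    seconds = {"A": "MMXX", "B": "MMBM", "D": "BBBB"}
    print(most_similar("MMBM", seconds))  # B
```

Because the alignment is local, a matching segment is found even when it is surrounded by unrelated symbols, which suits functions whose bodies were partly rewritten between virus variants.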
In a possible implementation manner, after S306, the second function call graph includes M functions, where M is a positive integer, and the virus detection model training device prunes the functions in the second function call graph. The following describes a procedure in which the virus detection model training apparatus prunes the function in the second function call graph. As a possible embodiment of the present application, in connection with fig. 3, as shown in fig. 6, the above-mentioned process may be implemented by the following S601-S603.
S601, a virus detection model training device acquires the number of times that each function in M functions is called by a front function and a back function.
It should be noted that, as shown in fig. 1, functions in the function call graph are connected by arrows; the function at the start of an arrow is the called function, and the function that the arrow points to is the calling function.
In one possible implementation manner, the virus detection model training device obtains a first number of functions starting from the function to be detected (i.e. functions calling the function to be detected) and a second number of functions passing through the function to be detected (i.e. functions called by the function to be detected) from the second function call graph, and the virus detection model training device determines the times of calling the current function by the front and back functions in the M functions according to the first number and the second number.
Optionally, the M functions include network communication class functions.
S602, deleting N functions with the least times of being called by the front and back functions in the second function call graph by the virus detection model training device to obtain a third function call graph.
Wherein N is smaller than M, and N is a positive integer.
It should be explained that executable files exhibiting the malicious behavior of Mirai and its variants differ from Trojans: the functions they call are mostly functions that are used at high frequency. Therefore, the virus detection model training apparatus deletes the N functions in the second function call graph that are called the fewest times by front and back functions.
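Illustratively, the pruning in S601 and S602 can be sketched as follows. The call graph is represented here as a dictionary mapping each function to the set of functions it calls; this representation, the tie-breaking by function name, and the names `prune_call_graph`, `main`, `connect`, `send`, and `log` are assumptions made for the sketch.

```python
def prune_call_graph(graph: dict, n: int) -> dict:
    """Delete the n functions with the fewest caller and callee links.

    For each function, the first number is how many functions call it and
    the second number is how many functions it calls; their sum is used as
    the call count. Ties are broken by name so the result is deterministic.
    """
    functions = set(graph)
    for callees in graph.values():
        functions |= set(callees)
    if not 0 < n < len(functions):
        raise ValueError("n must be a positive integer smaller than M")

    caller_count = {f: 0 for f in functions}
    for callees in graph.values():
        for callee in callees:
            caller_count[callee] += 1
    call_count = {
        f: caller_count[f] + len(graph.get(f, ())) for f in functions
    }

    removed = set(sorted(functions, key=lambda f: (call_count[f], f))[:n])
    return {
        f: {c for c in graph.get(f, ()) if c not in removed}
        for f in functions - removed
    }


if __name__ == "__main__":
    second_graph = {
        "main": {"connect", "send", "log"},
        "connect": {"send"},
        "send": set(),
        "log": set(),
    }
    third_graph = prune_call_graph(second_graph, 1)
    print(sorted(third_graph))  # ['connect', 'main', 'send']
```

In this example `log` has a call count of 1, the lowest of the four functions, so it is the one removed when N is 1.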
S603, the virus detection model training device inputs the function in the third function call graph into the initial network model for training based on machine learning of the support vector machine SVM, and determines a virus detection model.
Optionally, the virus detection model training device determines the function execution flow in the third function call graph through a Shortest Path method.
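Illustratively, a shortest path over the third function call graph can be sketched as follows. The embodiment does not specify which shortest path algorithm is used or how edges are weighted; this sketch assumes an unweighted graph and uses breadth-first search, and the function names in the example graph are hypothetical.

```python
from collections import deque


def shortest_call_path(graph: dict, start: str, goal: str):
    """Return the shortest call chain from start to goal, or None.

    `graph` maps each function to the set of functions it calls.
    Neighbors are visited in sorted order so the result is deterministic.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for callee in sorted(graph.get(node, ())):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


if __name__ == "__main__":
    third_graph = {
        "main": {"init", "attack"},
        "init": {"connect"},
        "attack": {"connect"},
        "connect": {"send"},
        "send": set(),
    }
    print(shortest_call_path(third_graph, "main", "send"))
```

The resulting call chains could then serve as features for the SVM training in S603, although how the embodiment encodes them is not stated.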
The scheme at least brings the following beneficial effects: in the embodiment of the application, the virus detection model training device prunes the functions in the second function call graph, so that an accurate virus detection model is obtained.
It can be seen that the above technical solutions provided in the embodiments of the present application are mainly described from the method perspective. To achieve the above functions, it includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
According to the embodiment of the application, the virus detection model training device can be divided into the functional modules according to the method example, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. Optionally, the division of the modules in the embodiments of the present application is schematic, which is merely a logic function division, and other division manners may be actually implemented.
Fig. 7 is a schematic structural diagram of a virus detection model training device 70 according to an embodiment of the present application. The virus detection model training apparatus 70 includes: an acquisition unit 701 and a processing unit 702.
An obtaining unit 701, configured to obtain a plurality of executable files, where each executable file in the plurality of executable files includes at least one function; a processing unit 702, configured to determine a plurality of function call graphs based on the plurality of executable files, where one executable file corresponds to one function call graph; the processing unit 702 is further configured to determine a first function call graph in the plurality of function call graphs, where the first function call graph is a function call graph that includes a first function whose function attribute information is missing; the processing unit 702 is further configured to determine, among the functions of the plurality of executable files, at least one second function whose function symbol similarity to the first function is greater than a first threshold; the processing unit 702 is further configured to determine, from the at least one second function, a third function with the maximum similarity to the function symbol segment of the first function, where the function symbol segment includes at least one function symbol; the processing unit 702 is further configured to add the function attribute information of the third function to the first function call graph as the attribute information of the first function, and determine a second function call graph; the processing unit 702 is further configured to compare the function information in the second function call graph with the function information in the virus database, determine whether the executable file corresponding to the first function call graph is a virus file, and if yes, train the virus detection model based on the second function call graph.
Optionally, the processing unit 702 is further configured to: compare the function symbols of the first function with the function symbols of each function in the plurality of executable files according to the BLAST algorithm, and determine the function symbol similarity between the first function and each function, where the function symbols include the categories of the function symbols and the numbers of the function symbols, and the categories correspond one-to-one to the numbers; and select, from the functions of the plurality of executable files, at least one second function whose function symbol similarity to the first function is greater than the first threshold.
Optionally, the processing unit 702 is further configured to: comparing the function symbol section in the first function with the function symbol section of at least one second function according to a Smith-Waterman algorithm, and determining the similarity of the function symbol sections of the first function and each second function; from the at least one second function, a third function having the greatest similarity to the function symbol segment of the first function is selected.
Optionally, the second function call graph includes M functions, where M is a positive integer; the acquisition unit 701 is further configured to: acquiring the number of times that each function in M functions is called by the front and back function; the processing unit 702 is further configured to: deleting N functions with the least times of being called by the front and back items of functions in the second function call graph to obtain a third function call graph; wherein N is smaller than M, N is a positive integer; based on machine learning of the support vector machine SVM, the function in the third function call graph is input into the initial network model for training, and a virus detection model is determined.
Wherein the processing unit 702 may be a processor or a controller, which may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. A processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like. The communication unit may be a transceiver circuit or a communication interface, etc. The memory module may be a memory. When the processing unit 702 is a processor, the communication unit is a communication interface, and the storage module is a memory, the virus detection model training device according to the embodiments of the present application may be the virus detection model training device shown in fig. 2.
From the foregoing description of the embodiments, those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division of functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the network node may be divided into different functional modules to implement all or part of the functions described above. For the specific working processes of the above system, modules, and network node, refer to the corresponding processes in the foregoing method embodiments; they are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing instructions; when a computer executes the instructions, the computer performs each step of the method flow shown in the foregoing method embodiments.
The embodiment of the application also provides a chip, which comprises a processor and a communication interface, wherein the communication interface is coupled with the processor, and the processor is used for running a computer program or instructions to realize the virus detection model training method in the method embodiment.
Embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the virus detection model training method of the above method embodiments.
The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), registers, an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any other form of computer-readable storage medium known to those skilled in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). In embodiments of the present invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Since the apparatus, device, computer-readable storage medium, and computer program product in the embodiments of the present invention can be applied to the above method, for the technical effects they can achieve, refer to the above method embodiments; these are not repeated here.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for training a virus detection model, the method comprising:
acquiring a plurality of executable files, wherein each executable file in the plurality of executable files comprises at least one function;
determining a plurality of function call graphs based on the plurality of executable files; an executable file corresponds to a function call graph;
determining a first function call graph among the plurality of function call graphs, wherein the first function call graph comprises a first function whose function attribute information is missing;
determining, among the plurality of functions of the plurality of executable files, at least one second function whose function symbol similarity to the first function is greater than a first threshold;
determining, among the at least one second function, a third function having the greatest similarity to the function symbol segment of the first function; the function symbol segment comprises at least one function symbol;
adding the function attribute information of the third function as the attribute information of the first function into the first function call graph, and determining a second function call graph;
comparing the function information in the second function call graph with function information in a virus database to determine whether the executable file corresponding to the first function call graph is a virus file;
if yes, training a virus detection model based on the second function call graph.
2. The method of claim 1, wherein the determining, among the plurality of functions of the plurality of executable files, at least one second function whose function symbol similarity to the first function is greater than a first threshold comprises:
comparing the function symbol of the first function with the function symbol of each function in the plurality of executable files according to a BLAST algorithm, and determining the similarity between the function symbol of the first function and the function symbol of each function; the function symbol comprises categories of function symbols and counts of function symbols, the categories and the counts being in one-to-one correspondence;
selecting, from the plurality of functions of the plurality of executable files, the at least one second function whose function symbol similarity to the first function is greater than the first threshold.
3. The method of claim 2, wherein determining a third function of the at least one second function having a greatest similarity to a function symbol segment of the first function comprises:
comparing the function symbol segments in the first function with the function symbol segments of the at least one second function according to a Smith-Waterman algorithm, and determining the similarity of the function symbol segments of the first function and each second function;
and selecting, from the at least one second function, the third function having the greatest similarity to the function symbol segment of the first function.
4. A method according to any one of claims 1-3, wherein the second function call graph includes M functions, where M is a positive integer;
the training of the virus detection model based on the second function call graph comprises the following steps:
acquiring the number of times each of the M functions is called by its preceding and following functions;
deleting, from the second function call graph, the N functions called the fewest times by their preceding and following functions, to obtain a third function call graph; wherein N is less than M, and N is a positive integer;
and inputting the functions in the third function call graph into an initial network model for training based on support vector machine (SVM) machine learning, to determine the virus detection model.
5. A virus detection model training apparatus, the apparatus comprising an acquisition unit and a processing unit, wherein:
the acquisition unit is configured to acquire a plurality of executable files, wherein each executable file in the plurality of executable files comprises at least one function;
the processing unit is used for determining a plurality of function call graphs based on the plurality of executable files; an executable file corresponds to a function call graph;
the processing unit is further configured to determine a first function call graph among the plurality of function call graphs, wherein the first function call graph comprises a first function whose function attribute information is missing;
the processing unit is further configured to determine, among the plurality of functions of the plurality of executable files, at least one second function whose function symbol similarity to the first function is greater than a first threshold;
the processing unit is further configured to determine, among the at least one second function, a third function having the greatest similarity to the function symbol segment of the first function; the function symbol segment comprises at least one function symbol;
the processing unit is further configured to add function attribute information of the third function as attribute information of the first function to the first function call graph, and determine a second function call graph;
the processing unit is further configured to compare function information in the second function call graph with function information in a virus database, and determine whether an executable file corresponding to the first function call graph is a virus file;
and the processing unit is further configured to, if so, train a virus detection model based on the second function call graph.
6. The apparatus of claim 5, wherein the processing unit is further configured to:
comparing the function symbol of the first function with the function symbol of each function in the plurality of executable files according to a BLAST algorithm, and determining the similarity between the function symbol of the first function and the function symbol of each function; the function symbol comprises categories of function symbols and counts of function symbols, the categories and the counts being in one-to-one correspondence;
selecting, from the plurality of functions of the plurality of executable files, the at least one second function whose function symbol similarity to the first function is greater than the first threshold.
7. The apparatus of claim 6, wherein the processing unit is further configured to:
comparing the function symbol segments in the first function with the function symbol segments of the at least one second function according to a Smith-Waterman algorithm, and determining the similarity of the function symbol segments of the first function and each second function;
and selecting, from the at least one second function, the third function having the greatest similarity to the function symbol segment of the first function.
8. The apparatus according to any one of claims 5-7, wherein the second function call graph includes M functions, where M is a positive integer;
the acquisition unit is further configured to: acquire the number of times each of the M functions is called by its preceding and following functions;
the processing unit is further configured to:
delete, from the second function call graph, the N functions called the fewest times by their preceding and following functions, to obtain a third function call graph; wherein N is less than M, and N is a positive integer;
and input the functions in the third function call graph into an initial network model for training based on support vector machine (SVM) machine learning, to determine the virus detection model.
9. A virus detection model training device, comprising: a processor and a memory; wherein the memory is configured to store computer-executable instructions that, when executed by the virus detection model training apparatus, cause the virus detection model training apparatus to perform the virus detection model training method of any one of claims 1-4.
10. A computer-readable storage medium comprising instructions that, when executed by a virus detection model training apparatus, cause the virus detection model training apparatus to perform the virus detection model training method of any one of claims 1-4.
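As a rough illustration of the screening and attribute fill-in steps recited in claims 1 and 2, the sketch below compares category-to-count profiles of function symbols and then copies the attributes of the selected function into the first function's entry. It is not BLAST: a weighted Jaccard similarity is used as a simplified stand-in, and the profile format, the threshold value, the attribute fields, and all names are hypothetical.

```python
def profile_similarity(p, q):
    """Weighted Jaccard similarity between two category -> count profiles."""
    keys = set(p) | set(q)
    total_max = sum(max(p.get(k, 0), q.get(k, 0)) for k in keys)
    if total_max == 0:
        return 0.0
    total_min = sum(min(p.get(k, 0), q.get(k, 0)) for k in keys)
    return total_min / total_max


def screen_candidates(first_profile, profiles, threshold):
    """Return names of functions whose similarity exceeds the threshold."""
    return [
        name
        for name, profile in profiles.items()
        if profile_similarity(first_profile, profile) > threshold
    ]


def fill_missing_attributes(graph_attrs, first_name, third_attrs):
    """Return a copy of graph_attrs with first_name given third_attrs."""
    updated = dict(graph_attrs)
    updated[first_name] = dict(third_attrs)
    return updated


# Hypothetical profiles: symbol category -> number of occurrences.
first_profile = {"mov": 4, "call": 2}
profiles = {"f_a": {"mov": 4, "call": 1}, "f_b": {"xor": 5}}
second_functions = screen_candidates(first_profile, profiles, threshold=0.5)
print(second_functions)  # ['f_a']

# Hypothetical attribute table for the first function call graph.
first_graph_attrs = {"sub_401000": None, "main": {"name": "main"}}
second_graph_attrs = fill_missing_attributes(
    first_graph_attrs, "sub_401000", {"name": "f_a"}
)
```

The input table is copied rather than modified, so the first function call graph stays available alongside the resulting second function call graph.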

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310302861.4A CN116305127A (en) 2023-03-24 2023-03-24 Virus detection model training method, device and storage medium

Publications (1)

Publication Number Publication Date
CN116305127A (en) 2023-06-23

Family

ID=86813022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310302861.4A Pending CN116305127A (en) 2023-03-24 2023-03-24 Virus detection model training method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116305127A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination