CN114254319A - Network virus identification method and device, computer equipment and storage medium - Google Patents

Network virus identification method and device, computer equipment and storage medium

Info

Publication number
CN114254319A
Authority
CN
China
Prior art keywords
virus
target
detected
model
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111521591.3A
Other languages
Chinese (zh)
Inventor
潘佳斌
董雷
童志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Antiy Technology Group Co Ltd
Original Assignee
Antiy Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Antiy Technology Group Co Ltd
Priority to CN202111521591.3A
Publication of CN114254319A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561 Virus type analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The application provides a network virus identification method and device, computer equipment, and a storage medium. It relates to the technical field of computer security and is intended to improve the accuracy of network virus identification. The method mainly comprises: determining original features and virus labels respectively corresponding to sample program codes of a plurality of virus types, wherein the original features comprise static features and dynamic features; performing neural network learning according to the original features and the virus labels to obtain a target neural network model and its model parameters; based on the model parameters of the target neural network model, performing neural network learning on the original features and virus labels corresponding to sample program codes of a target virus type to obtain a target type virus identification model; and identifying, based on the target type virus identification model, whether a sample to be detected belongs to network viruses of the target type.

Description

Network virus identification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for identifying a network virus, a computer device, and a storage medium.
Background
Malicious code recognition is, in essence, a complex and ultra-large-scale task of classifying and discriminating network viruses. Traditional methods, which extract discriminative feature fragments through manual analysis or automation, struggle to generalize well enough to discover unknown samples, and they lag behind new threats. A well-trained machine learning classification model can therefore supplement the traditional capability to recognize network viruses.
In the traditional technology, for problems in a specific domain, the size of the training sample set severely constrains model performance. On the one hand, using all available data, including data not fully related to the domain, can improve the stability of the model but limits its sensitivity to domain-specific problems; on the other hand, relying only on in-domain data aggravates the shortage of training data, amplifies overfitting of the artificial intelligence model, and limits the practicality of the model. As a result, existing models identify network viruses with low accuracy.
Disclosure of Invention
The embodiment of the application provides a network virus identification method, a network virus identification device, computer equipment and a storage medium, which are used for improving the accuracy of network virus identification.
The embodiment of the invention provides a network virus identification method, which comprises the following steps:
determining original characteristics and virus labels respectively corresponding to a plurality of types of virus sample program codes, wherein the original characteristics comprise static characteristics and dynamic characteristics;
performing neural network learning according to the original characteristics and the virus labels to obtain a target neural network model and model parameters thereof;
based on the model parameters of the target neural network model, carrying out neural network learning on original characteristics and virus labels corresponding to the sample program codes of the target viruses to obtain a target virus identification model;
and identifying whether the sample to be detected belongs to the network viruses of the target type or not based on the target type virus identification model.
The embodiment of the invention provides a network virus identification device, which comprises:
a determining module, configured to determine original features and virus labels respectively corresponding to sample program codes of a plurality of virus types, wherein the original features comprise static features and dynamic features;
the training module is used for carrying out neural network learning according to the original characteristics and the virus labels to obtain a target neural network model and model parameters thereof;
the training module is further used for carrying out neural network learning on original characteristics corresponding to the sample program codes of the target viruses and the virus labels based on the model parameters of the target neural network model to obtain a target virus identification model;
and the identification module is used for identifying whether the sample to be detected belongs to the network viruses of the target type or not based on the target type virus identification model.
A computer device comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the network virus identification method.
A computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the network virus identification method described above.
A computer program product comprising a computer program which, when executed by a processor, implements the above-described network virus identification method.
The invention provides a network virus identification method and device, computer equipment, and a storage medium. First, original features and virus labels respectively corresponding to sample program codes of a plurality of virus types are determined, the original features comprising static features and dynamic features. Next, neural network learning is performed according to the original features and the virus labels to obtain a target neural network model. Then, based on the model parameters of the target neural network model, neural network learning is performed on the original features and virus labels corresponding to sample program codes of the target virus type to obtain a target type virus identification model. Finally, whether the sample to be detected belongs to network viruses of the target type is identified according to the target type virus identification model. The invention combines transfer learning with a deep neural network model structure: it trains a target neural network model on the original features and virus labels of all virus types, and then applies the model parameters of that model to a domain-specific model (for the target type virus), thereby reusing basic knowledge information. Meanwhile, the model continues to be trained with training samples from the application domain, in line with the actual requirements of the specific problem domain, which strengthens the model's expressive power for that domain and improves the identification accuracy for the target type virus.
Drawings
Fig. 1 is a flowchart of a network virus identification method provided in the present application;
Fig. 2 is a flowchart of another network virus identification method provided in the present application;
Fig. 3 is a schematic structural diagram of a network virus identification device provided in the present application;
Fig. 4 is a schematic diagram of a computer device provided in the present application.
Detailed Description
To make the above technical solutions easier to understand, the technical solutions of the embodiments of the present application are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments and the specific features therein are detailed descriptions of the technical solutions of the present application rather than limitations on them, and that the embodiments and the technical features therein may be combined with each other where no conflict arises.
Referring to fig. 1, a method for identifying a network virus according to an embodiment of the present invention specifically includes steps S101 to S104:
Step S101, determining original features and virus labels respectively corresponding to sample program codes of a plurality of virus types.
The original features comprise static features and dynamic features, and refer to malicious code feature information extracted from sample program codes by means such as static and dynamic feature analysis. Specifically, the static features obtained through static analysis include file format information, file attribute information, character string information, binary information, and instruction feature information; the dynamic features obtained through dynamic analysis include local behavior features, network behavior features, API call features, and the like. The embodiment of the present invention places no particular limitation on these.
In the embodiment of the present invention, a virus label indicates the type of a virus, and there are as many virus labels as there are virus types in the embodiment. Viruses can be divided into categories such as virus, trojan, and worm; each category contains a number of malicious code families, each family may have a number of variants, and each variant may have a number of files. The different sample classes here may be any of the different malicious code variants.
It should be noted that, in addition to the virus type, the virus label in this embodiment may also represent the form in which the virus appears, such as a self-extracting package or a packed (shelled) file; this embodiment does not specifically limit the form.
Further, after the original features are obtained, they need to be preprocessed according to their feature value types. A feature value type is the raw representation in which a feature is extracted. For a person, for example, height and weight are numerical values, gender is a Boolean variable, and a fingerprint is a picture. In this embodiment, the feature value types include numerical values (number of file resources, number of file sections), Boolean variables (whether an executable section exists), serialized data (a disassembled instruction sequence), graph-structure features (a system call flow graph), and the like; this embodiment places no particular limitation on them.
In this embodiment, the preprocessing includes at least one of the following: if the feature value type is a numerical feature or a coded feature, normalizing the corresponding feature; if the feature value type is a sequence feature, performing word vectorization on the corresponding feature using an Embedding method; and if the feature value type is a relational graph feature, performing graph vectorization on the corresponding feature.
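The preprocessing rules above can be illustrated with a minimal Python sketch. The function names, the min-max normalization, the averaged token vectors, and the out-degree graph vector are all illustrative assumptions; the source only names normalization, Embedding-based word vectorization, and graph vectorization without fixing a concrete technique.

```python
from typing import Dict, List, Tuple


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max normalize a numeric feature into [0, 1] (assumed scheme)."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def embed_sequence(tokens: List[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average per-token vectors from a lookup table; unknown tokens map to zeros."""
    if not tokens:
        return [0.0] * dim
    total = [0.0] * dim
    for tok in tokens:
        vec = table.get(tok, [0.0] * dim)
        for i in range(dim):
            total[i] += vec[i]
    return [x / len(tokens) for x in total]


def vectorize_graph(edges: List[Tuple[str, str]], nodes: List[str]) -> List[float]:
    """Toy graph vectorization: out-degree of each node in a fixed node order."""
    degree = {n: 0 for n in nodes}
    for src, _dst in edges:
        if src in degree:
            degree[src] += 1
    return [float(degree[n]) for n in nodes]


def preprocess(feature_type: str, value, **ctx) -> List[float]:
    """Dispatch on the feature value type, mirroring the three rules in the text."""
    if feature_type == "numeric":
        return [normalize(float(value), ctx["lo"], ctx["hi"])]
    if feature_type == "sequence":
        return embed_sequence(value, ctx["table"], ctx["dim"])
    if feature_type == "graph":
        return vectorize_graph(value, ctx["nodes"])
    raise ValueError(f"unknown feature value type: {feature_type}")
```

For instance, `preprocess("numeric", 5, lo=0, hi=10)` would scale a hypothetical file-section count of 5 into the unit interval, while a disassembled instruction sequence would go through `embed_sequence`.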
Step S102, performing neural network learning according to the original features and the virus labels to obtain a target neural network model and its model parameters.
In this embodiment, a target neural network model is constructed. Its structure includes, but is not limited to, a CNN, an RNN, or a BERT network, and the constructed model has a multilayer structure. Specifically, after the original features and virus labels corresponding to the sample program codes of the multiple virus types are determined, the original features preprocessed by normalization, Embedding, and similar methods are used as the model input and the virus labels as the output for neural network learning, and training yields the target neural network model and its model parameters.
Step S103, performing neural network learning on the original features and virus labels corresponding to the sample program codes of the target type virus, based on the model parameters of the target neural network model, to obtain a target type virus identification model.
The target type virus may be any type of virus, for example an Advanced Persistent Threat (APT) or a Downloader; this embodiment does not specifically limit it.
The target type virus identification model in the embodiment of the invention is divided into two major structural parts: the multilayer structure connected to the input layer is the model transfer (migration) layer, and the structure connected to the output layer is the model domain layer. The model transfer layer retains more of the generalized knowledge information related to the target neural network model; the model domain layer retains more of the basic information related to the target neural network model.
The target type virus identification model reuses the transfer layer structure and parameters of the target neural network model, and its domain layer structure is rebuilt according to the actual requirements of the domain. With the transfer layer parameters reused, the parameters of the target type virus identification model are trained on domain sample information (samples of the target type virus). When the model parameters become stable, the newly built model, which matches the domain requirements, is exported.
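The reuse of transfer-layer parameters with a rebuilt domain layer can be sketched in plain Python as follows. This is only an illustration under several assumptions not stated in the source: a single tanh layer stands in for the multilayer transfer layer, the base parameters are random placeholders rather than the result of step S102, the transfer layer is kept frozen during domain training, and the domain layer is a single sigmoid unit trained by full-batch gradient descent.

```python
import math
import random
from typing import List


def hidden(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Transfer (migration) layer: one tanh layer standing in for the multilayer part."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(X, y, W, b, v, c) -> float:
    """Mean binary cross-entropy of the domain layer on top of the transfer layer."""
    total = 0.0
    for x, t in zip(X, y):
        p = sigmoid(sum(vi * hi for vi, hi in zip(v, hidden(x, W, b))) + c)
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(X)


def fine_tune(X, y, base_W, base_b, epochs=200, lr=0.5):
    # Reuse (copy) the transfer-layer parameters of the base model.
    W = [row[:] for row in base_W]
    b = base_b[:]
    # Rebuild the domain layer from scratch: a single sigmoid output unit.
    v, c = [0.0] * len(W), 0.0
    for _ in range(epochs):
        grad_v, grad_c = [0.0] * len(v), 0.0
        for x, t in zip(X, y):
            h = hidden(x, W, b)
            err = sigmoid(sum(vi * hi for vi, hi in zip(v, h)) + c) - t
            grad_v = [g + err * hi for g, hi in zip(grad_v, h)]
            grad_c += err
        # Only the domain layer is updated; the transfer layer stays frozen (assumption).
        v = [vi - lr * g / len(X) for vi, g in zip(v, grad_v)]
        c -= lr * grad_c / len(X)
    return W, b, v, c


random.seed(0)
# Placeholder for parameters learned on all virus types (assumption: random values here).
base_W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
base_b = [0.0, 0.0, 0.0]
# Hypothetical preprocessed features of target-type (1) and other (0) samples.
X = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 1], [0, 0, 0, 1]]
y = [1, 1, 0, 0]
W, b, v, c = fine_tune(X, y, base_W, base_b)
```

Whether the transfer layer is frozen or further fine-tuned on domain samples is a design choice the text leaves open; freezing is used here only because it keeps the sketch short.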
Step S104, identifying, based on the target type virus identification model, whether the sample to be detected belongs to network viruses of the target type.
In an optional embodiment provided by the present invention, identifying, based on the target type virus identification model, whether the sample to be detected belongs to network viruses of the target type includes: acquiring the original features corresponding to the sample code to be detected, wherein the original features comprise static features and dynamic features; and inputting those original features into the target type virus identification model to determine the virus classification result of the sample code to be detected.
In this embodiment, after the original features corresponding to the sample code to be detected are obtained, they need to be preprocessed. The preprocessing may specifically be: determining the feature value type corresponding to each feature in the original features; and preprocessing the corresponding features according to the feature value types to obtain the processed original features. The preprocessing includes at least one of the following: if the feature value type is a numerical feature or a coded feature, normalizing the corresponding feature; if the feature value type is a sequence feature, performing word vectorization on the corresponding feature; and if the feature value type is a relational graph feature, performing graph vectorization on the corresponding feature.
The invention provides a network virus identification method. First, original features and virus labels respectively corresponding to sample program codes of a plurality of virus types are determined, the original features comprising static features and dynamic features. Next, neural network learning is performed according to the original features and the virus labels to obtain a target neural network model. Then, based on the model parameters of the target neural network model, neural network learning is performed on the original features and virus labels corresponding to sample program codes of the target virus type to obtain a target type virus identification model. Finally, whether the sample to be detected belongs to network viruses of the target type is identified according to the target type virus identification model. The invention combines transfer learning with a deep neural network model structure: it trains a target neural network model on the original features and virus labels of all virus types, and then applies the model parameters of that model to a domain-specific model (for the target type virus), thereby reusing basic knowledge information. Meanwhile, the model continues to be trained with training samples from the application domain, in line with the actual requirements of the specific problem domain, which strengthens the model's expressive power for that domain and improves the identification accuracy for the target type virus.
Referring to fig. 2, another network virus identification method according to an embodiment of the present invention includes steps S201 to S205:
Step S201, obtaining the original features corresponding to the sample code to be detected.
The original features include static features and dynamic features. It should be noted that the detailed description of step S201 in this embodiment is the same as the description of the corresponding step in Fig. 1 and is not repeated here.
Step S202, calculating the similarity between the original characteristics corresponding to the sample code to be detected and the characteristics of various types of viruses in the virus library.
The virus library stores various types of viruses and corresponding virus characteristics, and the virus characteristics also include static characteristics and dynamic characteristics. After the original features corresponding to the sample code to be detected are obtained, the similarity between the static features in the original features and the static features of various viruses in the virus library is calculated, the similarity between the dynamic features in the original features and the dynamic features of various viruses in the virus library is calculated, and then the sum of the similarity between the static features and the similarity between the dynamic features is calculated.
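As a rough illustration of the per-type similarity computation described above, the sketch below treats static and dynamic features as sets of tokens and uses Jaccard similarity. The source does not name a similarity measure, so Jaccard, the set representation, and the dictionary keys are all assumptions made for the example.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as 0.0 here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(sample: Dict[str, Set[str]], entry: Dict[str, Set[str]]) -> float:
    """Sum of static-feature similarity and dynamic-feature similarity."""
    static_sim = jaccard(sample["static"], entry["static"])
    dynamic_sim = jaccard(sample["dynamic"], entry["dynamic"])
    return static_sim + dynamic_sim


# Hypothetical feature tokens for a sample and one virus-library entry.
sample = {"static": {"pe", "upx", "str:cmd"}, "dynamic": {"api:connect"}}
entry = {"static": {"pe", "upx"}, "dynamic": {"api:connect", "api:send"}}
score = combined_similarity(sample, entry)
```

In practice the function would be called once per virus type in the library, giving one score per type.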
In an optional embodiment provided by the present invention, after obtaining the similarity between the static features in the original features and the static features of various viruses in the virus library, and the similarity between the dynamic features in the original features and the dynamic features of various viruses in the virus library, the similarity of the sample code to be detected may be obtained through weighted calculation.
For example, the similarity between the static features in the original features and the static features of various viruses in the virus library is 80%, the similarity between the dynamic features in the original features and the dynamic features of various viruses in the virus library is 50%, and if the weight value of the dynamic features is 0.8 and the weight value corresponding to the static features is 0.2, the similarity of the sample code to be detected obtained through weighting calculation is 56%.
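The weighted calculation in this example can be checked with a few lines of Python. The function and parameter names are illustrative assumptions; the numbers are the ones given in the text.

```python
def weighted_similarity(static_sim: float, dynamic_sim: float,
                        static_weight: float, dynamic_weight: float) -> float:
    """Weighted sum of the static and dynamic similarities."""
    return static_weight * static_sim + dynamic_weight * dynamic_sim


# Values from the example: static 80%, dynamic 50%, weights 0.2 and 0.8.
score = weighted_similarity(0.80, 0.50, static_weight=0.2, dynamic_weight=0.8)
print(f"{score:.0%}")  # prints 56%
```

The arithmetic is 0.2 × 0.80 + 0.8 × 0.50 = 0.16 + 0.40 = 0.56, which matches the 56% stated above.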
In step S203, if the similarity exceeds the threshold, the virus type corresponding to the virus feature in the virus library is determined as the target type.
The threshold may be set according to actual requirements, and this embodiment does not specifically limit this. Specifically, the present embodiment determines, as the target category, a virus category corresponding to a virus feature whose similarity exceeds a threshold in the virus library.
For example, the virus library includes virus features of 5 virus types: virus type 1, virus type 2, virus type 3, virus type 4, and virus type 5. Suppose the similarity between the original features corresponding to the sample code to be detected and the virus features is 65% for virus type 1, 60% for virus type 2, 90% for virus type 3, 54% for virus type 4, and 89% for virus type 5. If the threshold is 85%, virus type 3 and virus type 5 are determined to be target types. After the target types are determined, the original features corresponding to the sample code to be detected are input into the type 3 virus identification model and the type 5 virus identification model respectively, to determine whether the sample code to be detected belongs to virus type 3 or virus type 5.
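The threshold selection in this example can be sketched as a simple filter. The function name and the dictionary layout are assumptions for illustration; the similarity values and the 85% threshold come from the text.

```python
from typing import Dict, List


def select_target_types(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Return every virus type whose similarity exceeds the threshold."""
    return [name for name, sim in similarities.items() if sim > threshold]


similarities = {
    "virus type 1": 0.65,
    "virus type 2": 0.60,
    "virus type 3": 0.90,
    "virus type 4": 0.54,
    "virus type 5": 0.89,
}
targets = select_target_types(similarities, threshold=0.85)
```

With these inputs only virus type 3 and virus type 5 pass the filter, so only their identification models need to be run in the following steps.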
Step S204, inputting the original features corresponding to the sample code to be detected into each corresponding target type virus identification model, and determining the probability value of the sample code to be detected in each target type virus identification model.
Step S205, determining the result of the target type virus identification model with the highest probability value as the virus classification result of the sample code to be detected.
For example, if the target types determined in step S203 are virus type 3 and virus type 5, the original features corresponding to the sample code to be detected are input into the type 3 virus identification model and the type 5 virus identification model respectively. If the type 3 virus identification model gives a probability of 80% that the sample code to be detected is virus type 3, and the type 5 virus identification model gives a probability of 40% that it is virus type 5, the virus classification result of the sample code to be detected is determined to be virus type 3, since that model yields the highest probability value.
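Steps S204 and S205 amount to running each candidate model and keeping the result with the highest probability. In the sketch below, each model is represented by a hypothetical callable that returns a probability; the stub lambdas simply return the 80% and 40% values from the example, and the names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

# A model is assumed to map preprocessed features to a probability in [0, 1].
Model = Callable[[List[float]], float]


def classify(features: List[float], models: Dict[str, Model]) -> Tuple[str, float]:
    """Run every target-type model and return the type with the highest probability."""
    probabilities = {name: model(features) for name, model in models.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]


# Stub models reproducing the probabilities in the example.
models = {
    "virus type 3": lambda f: 0.80,
    "virus type 5": lambda f: 0.40,
}
result = classify([0.1, 0.2], models)
```

With an 80% probability from the type 3 model and 40% from the type 5 model, selecting the highest probability value yields virus type 3.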
In the embodiment of the invention, transfer learning is combined with a deep neural network model structure: the target neural network model is trained on the original features and virus labels of all virus types, and the multilayer transfer layer of that model is then applied to a domain-specific model (for the target type virus), so that basic knowledge information is reused. Identification models for different virus types are obtained by training on type-specific feature data, so that whether a given program code belongs to a virus type can be determined by the corresponding model, which improves the accuracy of virus identification.
In an optional embodiment provided by the present invention, in order to verify the accuracy of the identification result of the target type virus identification model, after determining the result of the target type virus identification model with the highest probability value as the virus classification result of the sample code to be detected, the sample code to be detected is input into a sandbox for execution, and the execution result corresponding to the sample code to be detected is determined; verifying whether the sample code to be detected is consistent with the result of the target type virus identification model with the highest probability value according to the execution result; and updating and training the target type virus identification model with the highest probability value according to the verification result.
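One way to picture this verification loop is the sketch below. It assumes the sandbox execution has already produced a label for the sample, and it collects mismatching samples into a queue for later update training; the function name, the queue, and the choice to retrain only on mismatches are illustrative assumptions rather than details given in the source.

```python
from typing import List, Tuple


def verify_and_collect(features: List[float], predicted: str, sandbox_label: str,
                       retrain_queue: List[Tuple[List[float], str]]) -> bool:
    """Compare the model result with the sandbox result; queue mismatches for retraining."""
    consistent = predicted == sandbox_label
    if not consistent:
        # Assumption: the sandbox result is used as the corrected label for update training.
        retrain_queue.append((features, sandbox_label))
    return consistent


queue: List[Tuple[List[float], str]] = []
ok_first = verify_and_collect([0.1, 0.2], "virus type 3", "virus type 3", queue)
ok_second = verify_and_collect([0.3, 0.4], "virus type 3", "virus type 5", queue)
```

After these two calls the queue holds only the second sample, which could then be fed back into update training of the corresponding identification model.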
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, an apparatus for identifying a network virus is provided, where the apparatus for identifying a network virus corresponds to the method for identifying a network virus in the foregoing embodiment one to one. As shown in fig. 3, the functional modules of the network virus identification apparatus are described in detail as follows:
a determining module 31, configured to determine original features and virus tags respectively corresponding to multiple types of virus sample program codes, where the original features include static features and dynamic features;
the training module 32 is used for performing neural network learning according to the original features and the virus labels to obtain a target neural network model and model parameters thereof;
the training module 32 is further configured to perform neural network learning on original features and virus labels corresponding to sample program codes of the target virus type based on the model parameters of the target neural network model to obtain a target virus type identification model;
and the identifying module 33 is configured to identify whether the sample to be detected belongs to the network virus of the target category based on the target category virus identification model.
In an alternative embodiment, the identification module 33 is specifically configured to:
acquiring original features corresponding to a sample code to be detected, wherein the original features comprise static features and dynamic features;
and inputting the original characteristics corresponding to the sample code to be detected into the target type virus identification model, and determining the virus classification result of the sample code to be detected.
In an optional embodiment, the apparatus further comprises: a pre-processing module 34;
the determining module 31 is further configured to determine a feature value type corresponding to each feature in the original features;
and the preprocessing module 34 is configured to preprocess the corresponding features according to the feature value types to obtain processed original features.
In an alternative embodiment, the preprocessing comprises at least one of the following:
if the feature value type is a numerical feature or an encoded feature, performing normalization on the corresponding feature;
if the feature value type is a sequence feature, performing word vectorization on the corresponding feature;
and if the feature value type is a relational graph feature, performing graph vectorization on the corresponding feature.
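The dispatch on feature value type can be sketched in Python as below. The concrete transforms are assumptions chosen for brevity and are not specified by the source: min-max scaling for normalization, a bag-of-words count vector as a simple stand-in for word vectorization, and per-node out-degree as a simple stand-in for graph vectorization. The type names `"numeric"`, `"encoded"`, `"sequence"` and `"graph"` are likewise hypothetical.

```python
from collections import Counter


def normalize(values):
    """Min-max scale numeric or encoded values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def vectorize_sequence(tokens, vocabulary):
    """Count vector over a fixed vocabulary (stand-in for learned word vectors)."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def vectorize_graph(edges, nodes):
    """Out-degree of each node in a fixed node order (stand-in for graph embeddings)."""
    out_degree = Counter(src for src, _dst in edges)
    return [float(out_degree[n]) for n in nodes]


def preprocess(feature_type, value, **context):
    """Choose the preprocessing step from the feature value type."""
    if feature_type in ("numeric", "encoded"):
        return normalize(value)
    if feature_type == "sequence":
        return vectorize_sequence(value, context["vocabulary"])
    if feature_type == "graph":
        return vectorize_graph(value, context["nodes"])
    raise ValueError(f"unknown feature value type: {feature_type}")


scaled = preprocess("numeric", [2, 4, 6])
calls = preprocess("sequence", ["open", "write", "open"],
                   vocabulary=["open", "write", "connect"])
graph = preprocess("graph", [("a", "b"), ("a", "c"), ("b", "c")],
                   nodes=["a", "b", "c"])
```

Each branch returns a plain list of floats, so the processed features can be concatenated regardless of their original type.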
In an alternative embodiment, the identification module 33 is specifically configured to:
calculate the similarity between the original features corresponding to the sample code to be detected and the various types of virus features in a virus library, where the virus library stores the virus type corresponding to each type of virus feature;
and if the similarity exceeds a threshold, determine the virus type corresponding to that virus feature in the virus library as the target type.
In an alternative embodiment, the identification module 33 is specifically configured to:
input the original features corresponding to the sample code to be detected into each corresponding target type virus identification model, and determine the probability value of the sample code to be detected under each target type virus identification model;
and determine the result of the target type virus identification model with the highest probability value as the virus classification result of the sample code to be detected.
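The two embodiments above, similarity-based selection of target types followed by choosing the most probable model result, can be combined in a short Python sketch. The source does not name a similarity measure, so cosine similarity is an assumption here, as are the 0.8 threshold, the library contents and the constant-valued toy models.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_target_types(features, virus_library, threshold):
    """Return the virus types whose stored features are similar enough to the sample."""
    return [vtype for vtype, ref in virus_library.items()
            if cosine_similarity(features, ref) > threshold]


def classify(features, virus_library, models, threshold=0.8):
    """Run only the models of the selected types and keep the most probable result."""
    candidates = select_target_types(features, virus_library, threshold)
    if not candidates:
        return None, 0.0
    scores = {vtype: models[vtype](features) for vtype in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical virus library: one reference feature vector per virus type.
library = {"worm": [1.0, 0.0, 0.2],
           "trojan": [0.0, 1.0, 0.1],
           "ransomware": [0.9, 0.1, 0.9]}

# Stand-ins for trained target type models; each returns a fixed probability.
models = {"worm": lambda f: 0.62,
          "trojan": lambda f: 0.99,
          "ransomware": lambda f: 0.91}

sample = [0.9, 0.1, 0.5]
candidates = select_target_types(sample, library, 0.8)
label, probability = classify(sample, library, models)
```

In this toy run the trojan model would report the highest probability, but it is never consulted because the sample is not similar to the stored trojan features; the similarity check narrows the set of models before the highest probability is taken.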
In an optional embodiment, the apparatus further comprises a verification module 35;
the determining module 31 is further configured to input the sample code to be detected into a sandbox for execution, and determine an execution result corresponding to the sample code to be detected;
the verification module 35 is configured to verify, according to the execution result, whether the sample code to be detected is consistent with the result of the target type virus identification model with the highest probability value;
the training module 32 is further configured to update and train the target type virus identification model with the highest probability value according to the verification result.
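A minimal Python sketch of this verification loop follows. The sandbox is replaced by a lookup of pre-recorded behaviour, and the idea of queueing each verified sample with a 1/0 label for later update training is an illustrative assumption; the source only states that the model is updated according to the verification result.

```python
def verify_and_collect(sample_id, predicted_type, run_in_sandbox, retraining_queue):
    """Compare the model result with the sandbox result and queue the sample for update training."""
    observed_type = run_in_sandbox(sample_id)
    consistent = observed_type == predicted_type
    # Label for the predicted type's model: 1 if the sandbox confirms the result, else 0.
    label = 1 if consistent else 0
    retraining_queue.setdefault(predicted_type, []).append((sample_id, label))
    return consistent


def fake_sandbox(sample_id):
    # Stand-in for real sandbox execution: looks up pre-recorded behaviour.
    observed = {"sample-a": "ransomware", "sample-b": "worm"}
    return observed[sample_id]


queue = {}
ok_a = verify_and_collect("sample-a", "ransomware", fake_sandbox, queue)
ok_b = verify_and_collect("sample-b", "ransomware", fake_sandbox, queue)
```

Samples the sandbox contradicts become negative examples for the model that produced the result, which is one plausible way to feed the verification result back into training.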
For specific limitations of the network virus identification apparatus, reference may be made to the above limitations of the network virus identification method, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in Fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. When executed by the processor, the computer program implements a network virus identification method.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
determining original characteristics and virus labels respectively corresponding to a plurality of types of virus sample program codes, wherein the original characteristics comprise static characteristics and dynamic characteristics;
performing neural network learning according to the original characteristics and the virus labels to obtain a target neural network model and model parameters thereof;
based on the model parameters of the target neural network model, performing neural network learning on the original characteristics and virus labels corresponding to sample program codes of a target virus type to obtain a target type virus identification model;
and identifying, based on the target type virus identification model, whether a sample to be detected belongs to the network viruses of the target type.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
determining original characteristics and virus labels respectively corresponding to a plurality of types of virus sample program codes, wherein the original characteristics comprise static characteristics and dynamic characteristics;
performing neural network learning according to the original characteristics and the virus labels to obtain a target neural network model and model parameters thereof;
based on the model parameters of the target neural network model, performing neural network learning on the original characteristics and virus labels corresponding to sample program codes of a target virus type to obtain a target type virus identification model;
and identifying, based on the target type virus identification model, whether a sample to be detected belongs to the network viruses of the target type.
In one embodiment, a computer program product is provided, the computer program product comprising a computer program which, when executed by a processor, performs the steps of:
determining original characteristics and virus labels respectively corresponding to a plurality of types of virus sample program codes, wherein the original characteristics comprise static characteristics and dynamic characteristics;
performing neural network learning according to the original characteristics and the virus labels to obtain a target neural network model and model parameters thereof;
based on the model parameters of the target neural network model, performing neural network learning on the original characteristics and virus labels corresponding to sample program codes of a target virus type to obtain a target type virus identification model;
and identifying, based on the target type virus identification model, whether a sample to be detected belongs to the network viruses of the target type.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method for identifying a network virus, the method comprising:
determining original characteristics and virus labels respectively corresponding to a plurality of types of virus sample program codes, wherein the original characteristics comprise static characteristics and dynamic characteristics;
performing neural network learning according to the original characteristics and the virus labels to obtain a target neural network model and model parameters thereof;
based on the model parameters of the target neural network model, performing neural network learning on the original characteristics and virus labels corresponding to sample program codes of a target virus type to obtain a target type virus identification model;
and identifying, based on the target type virus identification model, whether a sample to be detected belongs to the network viruses of the target type.
2. The method according to claim 1, wherein the identifying whether the sample to be detected belongs to the network viruses of the target type based on the target type virus identification model comprises:
acquiring original features corresponding to a sample code to be detected, wherein the original features comprise static features and dynamic features;
and inputting the original characteristics corresponding to the sample code to be detected into the target type virus identification model, and determining the virus classification result of the sample to be detected.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
determining the characteristic value types respectively corresponding to all the characteristics in the original characteristics;
and preprocessing the corresponding features according to the feature value types to obtain the processed original features.
4. The method of claim 3, wherein the preprocessing comprises at least one of:
if the characteristic value type is a numerical characteristic or an encoded characteristic, performing normalization processing on the corresponding characteristic;
if the characteristic value type is a sequence type characteristic, performing word vectorization on the corresponding characteristic;
and if the characteristic value type is a relational graph characteristic, performing graph vectorization on the corresponding characteristic.
5. The method according to claim 2, wherein before inputting the original features corresponding to the sample code to be detected into the target type virus identification model and determining the virus classification result of the sample to be detected, the method further comprises:
calculating the similarity between the original characteristics corresponding to the sample code to be detected and various types of virus characteristics in a virus library; the virus library stores virus types respectively corresponding to various types of virus characteristics;
and if the similarity exceeds a threshold value, determining the virus type corresponding to the virus characteristics in the virus library as the target type.
6. The method according to claim 5, wherein the inputting the original features corresponding to the sample code to be detected into the target type virus identification model and determining the virus classification result of the sample code to be detected comprises:
respectively inputting the original characteristics corresponding to the sample code to be detected into corresponding target type virus identification models, and determining the probability value of the sample code to be detected in each target type virus identification model;
and determining the result of the target type virus identification model with the highest probability value as the virus classification result of the sample code to be detected.
7. The method of claim 5, wherein after determining the result of the target type virus identification model with the highest probability value as the virus classification result of the sample code to be detected, the method further comprises:
inputting the sample code to be detected into a sandbox for execution, and determining an execution result corresponding to the sample code to be detected;
verifying whether the sample code to be detected is consistent with the result of the target type virus identification model with the highest probability value according to the execution result;
and updating and training the target type virus identification model with the highest probability value according to the verification result.
8. An apparatus for identifying a network virus, the apparatus comprising:
a determining module, configured to determine original characteristics and virus labels respectively corresponding to a plurality of types of virus sample program codes, wherein the original characteristics comprise static characteristics and dynamic characteristics;
a training module, configured to perform neural network learning according to the original characteristics and the virus labels to obtain a target neural network model and model parameters thereof;
the training module being further configured to perform, based on the model parameters of the target neural network model, neural network learning on the original characteristics and virus labels corresponding to sample program codes of a target virus type to obtain a target type virus identification model;
and an identification module, configured to identify, based on the target type virus identification model, whether a sample to be detected belongs to the network viruses of the target type.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the network virus identification method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the network virus identification method according to any one of claims 1 to 7.
CN202111521591.3A 2021-12-13 2021-12-13 Network virus identification method and device, computer equipment and storage medium Pending CN114254319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111521591.3A CN114254319A (en) 2021-12-13 2021-12-13 Network virus identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114254319A true CN114254319A (en) 2022-03-29

Family

ID=80794985

Country Status (1)

Country Link
CN (1) CN114254319A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334781A (en) * 2018-03-07 2018-07-27 腾讯科技(深圳)有限公司 Method for detecting virus, device, computer readable storage medium and computer equipment
CN110363003A (en) * 2019-07-25 2019-10-22 哈尔滨工业大学 A kind of Android virus static detection method based on deep learning
CN110428052A (en) * 2019-08-01 2019-11-08 江苏满运软件科技有限公司 Construction method, device, medium and the electronic equipment of deep neural network model
CN111046959A (en) * 2019-12-12 2020-04-21 上海眼控科技股份有限公司 Model training method, device, equipment and storage medium
CN111783088A (en) * 2020-06-03 2020-10-16 杭州迪普科技股份有限公司 Malicious code family clustering method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination