CN113569241A

CN113569241A - Virus detection method and device

Info

Publication number: CN113569241A
Application number: CN202110857502.6A
Authority: CN
Inventors: 唐侃毅; 周波; 褚军
Original assignee: New H3C Technologies Co Ltd
Current assignee: New H3C Technologies Co Ltd
Priority date: 2021-07-28
Filing date: 2021-07-28
Publication date: 2021-10-29

Abstract

The application provides a virus detection method and a virus detection device. The method comprises the following steps: extracting static characteristics of an executable file to be detected; inputting the static characteristics into a static detection model to obtain a result value representing a static detection result; if the result value hits a preset dynamic detection condition threshold, extracting the running characteristics of the executable file; inputting the running characteristics into a dynamic detection model, and determining whether the executable file is a virus file according to an output result of the dynamic detection model. By combining static detection with dynamic detection, the method balances detection efficiency and detection accuracy.

Description

Virus detection method and device

Technical Field

The present application relates to the field of computer security technologies, and in particular, to a method and an apparatus for detecting a virus.

Background

Computer viruses generally refer to artificially created programs that damage computer information or systems. They are destructive, infectious, and latent, and with the rapid development of network technologies they also spread quickly.

A computer virus, as a program, is executable and therefore, it usually exists in the form of an executable file. In order to distinguish between normal executable files and virus files, virus detection needs to be performed on the executable files.

Currently, virus detection for executable files generally employs a single detection method, i.e., the same detection method is performed for all executable files, such as a static detection method or a dynamic detection method.

The static detection method mainly analyzes static characteristics of the executable file (characteristics of the file when it is not running) to determine whether the executable file is a virus file. This method has low detection accuracy on virus files that have been packed (shelled) or encrypted.

The dynamic detection method mainly analyzes dynamic characteristics of the executable file (characteristics exhibited while the file is running) to determine whether the executable file is a virus file. This method is time-consuming, and therefore its detection efficiency is low.

Disclosure of Invention

In view of the above, the present application provides a virus detection method and apparatus that balance virus detection accuracy and detection efficiency.

In order to achieve the purpose of the application, the application provides the following technical scheme:

in a first aspect, the present application provides a method for virus detection, the method comprising:

extracting static characteristics of an executable file to be detected;

inputting the static characteristics into a static detection model to obtain a result value for representing a static detection result;

if the result value hits a preset dynamic detection condition threshold value, extracting the running characteristics of the executable file;

inputting the running characteristics into a dynamic detection model, and determining whether the executable file is a virus file according to an output result of the dynamic detection model.

Optionally, the method further includes:

and if the result value does not hit the dynamic detection condition threshold value, determining whether the executable file is a virus file according to the result value.

Optionally, the static features include byte features, import features, text features, and attribute features, where the byte features include a first byte feature determined based on the number of occurrences of the byte value and a second byte feature determined based on the byte entropy.

Optionally, extracting text features of the executable file includes:

for each readable character in the American Standard Code for Information Interchange (ASCII) code table, counting the number of times the readable character occurs in the executable file;

and performing hash operation of a preset dimension on a data combination consisting of the corresponding occurrence times of each readable character to obtain the text characteristics of the preset dimension, wherein the preset dimension is greater than the number of the readable characters in the ASCII code table.

Optionally, extracting the running characteristics of the executable file includes:

acquiring running information of the executable file during simulated running, wherein the running information comprises the name of each called Application Programming Interface (API), the number of the thread calling the API, and the sequence number of the API call within the thread;

and extracting operation features from the operation information according to a preset feature extraction rule, wherein the operation features comprise global features, local features, API sequence features and API probability features of the API.

Optionally, the dynamic detection model is a pre-trained fusion model composed of a plurality of detection models, the plurality of detection models include at least one Text Convolutional Neural network (Text-CNN) model, and the inputting the operating characteristics into the dynamic detection model includes:

inputting the API sequence features into the at least one Text-CNN model;

inputting the features of the operating features except the API sequence features into the detection models of the dynamic detection model except the at least one Text-CNN model.

In a second aspect, the present application provides a virus detection apparatus, the apparatus comprising:

the extraction unit is used for extracting the static characteristics of the executable file to be detected;

the input unit is used for inputting the static characteristics into a static detection model to obtain a result value for representing a static detection result;

the extraction unit is further configured to extract an operation feature of the executable file if the result value hits a preset dynamic detection condition threshold;

the input unit is further configured to input the operation characteristics into a dynamic detection model, and determine whether the executable file is a virus file according to an output result of the dynamic detection model.

Optionally, the apparatus further comprises:

and the determining unit is used for determining whether the executable file is a virus file according to the result value if the result value does not hit the dynamic detection condition threshold value.

Optionally, the extracting unit extracts a text feature of the executable file, including:

counting the number of times of occurrence of each readable character in the executable file aiming at each readable character in the ASCII code table;

Optionally, the extracting unit extracts the dynamic feature of the executable file, including:

acquiring running information of the executable file during simulation running, wherein the running information comprises the name of a called API, the number of a thread calling the API and a sequence number called by the API in the thread;

Optionally, the dynamic detection model is a pre-trained fusion model composed of a plurality of detection models, the plurality of detection models includes at least one Text-CNN model, and the inputting unit inputs the operation characteristic into the dynamic detection model, including:

inputting the API sequence features into the at least one Text-CNN model;

As can be seen from the above description, in the embodiment of the present application, static detection is performed on an executable file first, and when the static detection cannot accurately determine the file type (normal file or virus file) of the executable file, dynamic detection is performed on the executable file, so as to improve detection accuracy. On the contrary, if the file type of the executable file can be accurately determined by static detection, dynamic detection does not need to be performed on the executable file, so that the detection efficiency is improved. It can be seen that the detection efficiency and the detection accuracy can be effectively considered.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a flow chart of a virus detection method according to an embodiment of the present disclosure;

FIG. 2 is a block diagram of a static detection flow shown in an embodiment of the present application;

FIG. 3 is a block diagram of a dynamic detection flow shown in an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a virus detection apparatus according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application.

The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the embodiments of the present application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the embodiments of the present application. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.

The application provides a virus detection method, which combines static detection and dynamic detection to achieve the aim of integrally considering both virus detection efficiency and detection accuracy.

For the purpose of making the objects, aspects and advantages of the present application more apparent, the following detailed description of the present application is made with reference to the accompanying drawings and specific embodiments:

Referring to FIG. 1, a flowchart of a virus detection method according to an embodiment of the present application is shown. As shown in FIG. 1, the process may include the following steps:

step 101, extracting static characteristics of an executable file to be detected.

In the Windows operating system, the executable file may be a Portable Executable (PE) file. Common PE files include executable (EXE) files, Dynamic Link Library (DLL) files, system (SYS) files, and the like.

Here, the static features of an executable file refer to features of the executable file when it is not running.

For one embodiment, the static features may include byte features, import features, text features, and attribute features.

The following describes the extraction of these several features:

Byte features

Each file actually exists on disk in binary form. As an example, the binary form of an executable file may be represented as: 0x01 0x05 0x03 0x01 … …. It can be seen that the executable file consists of a series of bytes.

The embodiment of the application aims at extracting byte characteristics of the executable file in the binary form. Here, the extracted byte features include a first byte feature determined based on the number of occurrences of the byte value and a second byte feature determined based on the byte entropy. It is to be understood that the first byte characteristic and the second byte characteristic are named for convenience of distinguishing and are not used for limitation.

As an example, the number (also referred to as dimension) of the first byte features required to be extracted may be determined according to the value range (0-255) of a single byte, and then, for each byte value (0x00, 0x01, 0x02, … …, 0xff), the number of times each byte value appears in the executable file may be counted separately. For example, a first byte signature of 256 dimensions [0, 2, 0, 1, … …, 5] can be obtained when 0x00 appears 0 times in the file, 0x01 appears 2 times in the file, 0x02 appears 0 times in the file, 0x03 appears 1 time in the file, … …, 0xff appears 5 times in the file. The 256-dimensional first byte characteristic can reflect the byte value distribution of the executable file.
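The counting described above can be sketched in a few lines of Python. This is a minimal illustration only; the function name `byte_histogram` and the four-byte sample are hypothetical and are not taken from the application.

```python
from collections import Counter


def byte_histogram(data: bytes) -> list[int]:
    """Return a 256-dimensional vector whose entry v counts byte value v."""
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    return [counts.get(value, 0) for value in range(256)]


# Hypothetical four-byte sample: 0x01 0x05 0x03 0x01
sample = bytes([0x01, 0x05, 0x03, 0x01])
hist = byte_histogram(sample)
print(len(hist), hist[0x01], hist[0x03], hist[0x05])
```

For a real file, `data` would be the full file content read in binary mode.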

As an example, the preset moving window size is 1024 bytes and the moving step is 256 bytes. The window moves step by step over the binary form of the executable file in the preset moving step. The byte entropy of each window is calculated in turn from the 1024 bytes at the window's position; specifically, the byte entropy can be calculated by the following formula:

$$E = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (1)$$

where n represents the number of values a byte can take; p_i represents the probability of the i-th byte value (byte value for short) within the window; and E represents the byte entropy of the window, which characterizes the uncertainty of the byte values within the window.

And generating a multidimensional second byte characteristic corresponding to the executable file according to the byte entropy of each window and the occurrence frequency of each byte value in the window. See table 1 for a 256 x 8 dimensional second byte feature example.

TABLE 1

Here, the horizontal axis represents byte values, including the 256 byte values from 0x00 to 0xff; the vertical axis represents byte entropy, including the 8 byte entropies from 0 to 7.

For each byte value (256 byte values from 0x00 to 0xff), the number of times each byte value appears in the window is counted, and the byte entropy of the window is calculated according to equation (1).

As an example, if it is determined that 0x00 appears 10 times in the window, 0x01 appears 2 times in the window, 0x02 appears 20 times in the window, … …, 0xfe appears 0 times in the window, 0xff appears 5 times in the window, and the byte entropy of the current window is 1 according to the above statistical and calculation method, the number of occurrences of each byte value of the statistics is recorded in the row having the byte entropy of 1, as shown in table 2.

TABLE 2

Similarly, the above process is performed for each window and the results of the process are accumulated in the above table until the window slides over all bytes of the executable file to obtain the 256 × 8 dimensional second byte characteristic of the executable file.
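The sliding-window accumulation can be sketched as follows. Several details here are assumptions rather than statements of the application: the entropy uses a base-2 logarithm, the entropy row is chosen by truncating the entropy to an integer (with an entropy of exactly 8 clamped to row 7), and trailing bytes that no full-step window position reaches are ignored.

```python
import math
from collections import Counter

WINDOW = 1024      # window size in bytes (example value from the text)
STEP = 256         # moving step in bytes (example value from the text)
ENTROPY_ROWS = 8   # rows for byte entropies 0..7


def window_entropy(counts: Counter, total: int) -> float:
    """Shannon entropy in bits of the byte values in one window."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def byte_entropy_histogram(data: bytes) -> list[list[int]]:
    """Accumulate per-window byte counts into an 8 x 256 table."""
    table = [[0] * 256 for _ in range(ENTROPY_ROWS)]
    if not data:
        return table
    last_start = max(len(data) - WINDOW, 0)
    for start in range(0, last_start + 1, STEP):
        window = data[start:start + WINDOW]
        counts = Counter(window)
        entropy = window_entropy(counts, len(window))
        # Assumption: truncate the entropy to pick a row, clamping 8.0 to row 7.
        row = min(int(entropy), ENTROPY_ROWS - 1)
        for value, count in counts.items():
            table[row][value] += count
    return table


# Hypothetical input: every byte value appears 4 times in a single window.
table = byte_entropy_histogram(bytes(range(256)) * 4)
```

Flattening `table` row by row gives a 256 × 8 = 2048-dimensional vector.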

Import features

The import table of the executable file is mainly used for recording system resource information required by file operation, such as a name of a system function required to be called, a name of a dynamic link library, and the like, and the system resource information is recorded in the import table in a character string form. In order to facilitate computer processing, the embodiment of the present application adopts a preset hash algorithm to convert a character string in an import table into a number as an extracted import feature.

The use of hash algorithms to convert strings into numbers is a mature technology and is not described in detail here. However, the dimension of the conversion can be set according to actual requirements, for example, a 256-dimensional hash algorithm is adopted to perform string/number conversion, so as to obtain 256-dimensional import characteristics.
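One common way to turn a variable number of strings into a fixed-length numeric vector is the hashing trick, sketched below. The application does not name the hash algorithm, so the use of SHA-256 and of bucket counting is an assumption, and the import entries shown are illustrative values only.

```python
import hashlib


def hash_strings(strings, dim=256):
    """Map strings to a dim-dimensional count vector (hashing trick)."""
    vec = [0] * dim
    for text in strings:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Assumption: the first 8 digest bytes select the bucket.
        bucket = int.from_bytes(digest[:8], "big") % dim
        vec[bucket] += 1
    return vec


# Hypothetical import-table entries in "library:function" form.
imports = [
    "kernel32.dll:CreateFileA",
    "kernel32.dll:WriteFile",
    "advapi32.dll:RegOpenKeyExA",
]
import_feature = hash_strings(imports)
```

A cryptographic digest is used here only because Python's built-in `hash` for strings is randomized per process, which would make the feature vector unstable between runs.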

Text features

The executable file may also be opened as text, which contains strings of letters, numbers, and symbols. These letters, numbers, and symbols are usually the characters with ASCII code values 0x20 to 0x7e in the ASCII code table, and are usually called readable characters or printable characters.

The embodiment of the application counts the number of times each readable character in the ASCII code table occurs in the executable file. For example, the character "!" (corresponding to an ASCII code value of 0x21) appears 4 times in the file, the character "#" (corresponding to an ASCII code value of 0x23) appears 17 times in the file, the character "a" (corresponding to an ASCII code value of 0x61) appears 30 times in the file, the character "b" (corresponding to an ASCII code value of 0x62) appears 107 times in the file, … ….

Since the number of readable characters in the ASCII code table is only 95, only 95 statistical values, or 95-dimensional features, can be obtained through the above statistics. In order to improve the importance of the text features, the text features obtained through statistics are expanded to obtain text features with larger dimensionality.

Specifically, hash operation of a preset dimension is performed on a data combination composed of the corresponding occurrence times of each readable character, so as to obtain the text feature of the preset dimension. Here, the preset dimension is larger than the number of readable characters in the ASCII code table, for example, the preset dimension is 256 dimensions.

Through dimension extension, the importance of text features can be improved, all dimension features can be controlled within a certain range, and the phenomenon that some features correspond to too large numerical values and some features correspond to too small numerical values is avoided.
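The two steps (counting the 95 readable characters, then expanding to a larger preset dimension by hashing) might look like the sketch below. How the application combines the counts before hashing is not specified; here each character is hashed to one of 256 buckets and its count is added to that bucket, which is an assumption made purely for illustration.

```python
import hashlib

PRINTABLE = range(0x20, 0x7F)  # the 95 readable ASCII characters


def printable_counts(data: bytes) -> dict[str, int]:
    """Count how often each readable ASCII character occurs in data."""
    counts = {chr(value): 0 for value in PRINTABLE}
    for byte in data:
        if 0x20 <= byte <= 0x7E:
            counts[chr(byte)] += 1
    return counts


def hashed_text_feature(counts: dict[str, int], dim: int = 256) -> list[int]:
    """Spread the 95 counts over dim buckets (assumed hashing scheme)."""
    vec = [0] * dim
    for char, count in counts.items():
        digest = hashlib.sha256(char.encode("ascii")).digest()
        vec[int.from_bytes(digest[:8], "big") % dim] += count
    return vec


# Hypothetical input: four readable characters and one non-readable byte.
counts = printable_counts(b"ab!!\x00")
text_feature = hashed_text_feature(counts)
```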

Attribute features

Here, the attribute feature refers to other accessory features that the file has in addition to the aforementioned main features (byte feature, import feature, text feature), such as a file header feature, a file general feature, a file section feature.

The file header features refer to features extracted from the file header information. The file header information mainly describes which machine the file runs on, its sections, its link time, and so on. For example, the file header includes a machine code (Machine) field, which identifies the machine code of the Central Processing Unit (CPU) on which the file runs; a number-of-sections field, which identifies the number of sections present in the file; a creation-time field, which identifies when the file was created; and so on. Among this information, items in numeric format (for example, the number of sections) may be used directly as features, and items in text format may be used after being converted into numeric format.

The file general features refer to features extracted from general information about the file. This general information typically includes: file size, size of the file in memory, whether it is in debug format, output information, input information, number of accessed resources, file signature, file flags, etc. Some of this information is in numeric format (e.g., file size and size of the file in memory) and can be used directly as features; some is in text format (e.g., output information, input information, file signature, and file flags) and can be used as features after being converted into numeric format by a hash algorithm.

The section feature of the file refers to a feature extracted based on information of each section included in the executable file. Here, it should be noted that information of each section constituting the executable file, for example, whether a section is readable, writable, executable, and the like, is recorded in the file section table, and therefore, the embodiment of the present application may extract section features of the executable file according to the information of each section recorded in the file section table. For example, the number of sections with a length of 0, the number of sections named empty, the number of readable executable sections, the number of writable sections, the size of sections, etc. are counted. Similarly, if the information related to the digital format can be directly used as the feature, the information related to the text format needs to be converted into the digital format for use.
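As a rough illustration of the section statistics listed above, the sketch below works on already-parsed section records. The `Section` record type, its field names, and the sample values are hypothetical; parsing a real section table is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical record for one entry of the file section table."""
    name: str
    size: int
    readable: bool
    writable: bool
    executable: bool


def section_features(sections: list[Section]) -> dict[str, int]:
    """Compute the simple section counts mentioned in the text."""
    return {
        "zero_length": sum(1 for s in sections if s.size == 0),
        "empty_name": sum(1 for s in sections if not s.name),
        "readable_executable": sum(
            1 for s in sections if s.readable and s.executable
        ),
        "writable": sum(1 for s in sections if s.writable),
        "total_size": sum(s.size for s in sections),
    }


# Hypothetical section table with three entries.
sections = [
    Section(".text", 4096, True, False, True),
    Section(".data", 512, True, True, False),
    Section("", 0, True, True, True),
]
features = section_features(sections)
```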

Through the above processing, static features required for static detection are obtained, for example, 2304-dimensional byte features (256-dimensional first byte features +256 × 8-dimensional second byte features), 256-dimensional import features, 256-dimensional text features, and 1024-dimensional attribute features.

Step 102, inputting the static characteristics into a static detection model to obtain a result value for representing a static detection result.

As an example, the static detection model may be a Multi-Layer Perceptron (MLP) neural network. The network may consist of 1 input layer, 5 hidden layers, and 1 output layer. The multidimensional static features obtained in step 101 (for example, 2304 + 256 + 256 + 1024 = 3840-dimensional static features) are fed to the input layer, each hidden layer consists of 512 nodes, and the output layer outputs a 1-dimensional result value. The result value is usually between 0 and 1, where, for example, 0 represents a normal file and 1 represents a virus file: the closer the result value is to 0, the more likely the file is a normal file, and the closer it is to 1, the more likely the file is a virus file.
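The shape of such a network can be illustrated with a small forward pass in plain Python. This is only a structural sketch with random, untrained weights: the ReLU hidden activation, the sigmoid output, the weight initialization, and the reduced layer sizes used in the demonstration are all assumptions. With the sizes from the text, the layer list would be [3840, 512, 512, 512, 512, 512, 1].

```python
import math
import random


def init_mlp(layer_sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, features):
    """Run one feature vector through the network, returning a value in (0, 1)."""
    x = list(features)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            x = [1.0 / (1.0 + math.exp(-t)) for t in z]  # assumed sigmoid output
        else:
            x = [max(0.0, t) for t in z]                 # assumed ReLU hidden layers
    return x[0]


# Reduced sizes keep the demo fast: 16 inputs, 5 hidden layers of 8 nodes, 1 output.
layers = init_mlp([16, 8, 8, 8, 8, 8, 1])
score = forward(layers, [1.0] * 16)
```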

Referring to fig. 2, a static detection flow diagram is shown in the embodiment of the present application.

Step 103, if the result value hits a preset dynamic detection condition threshold value, extracting the running characteristics of the executable file.

As can be seen from the analysis of the result values in step 102, when the result values approach 0 or 1, the file type (normal file or virus file) can be accurately predicted; when the result value is far from 0 or 1, for example, in the interval of 0.2-0.8, the prediction accuracy will be greatly reduced.

In order to meet the requirements of the overall detection efficiency and the detection accuracy, the dynamic detection condition threshold can be preset according to the actual application scene, for example, the dynamic detection condition threshold is preset to be 0.2-0.8.

If the result value output in step 102 does not hit the condition threshold, for example, the result value is 0.9512, which is close to 1, then the executable file can be accurately determined to be a virus file; for another example, if the result value is 0.084 and approaches 0, the executable file can be accurately determined to be a normal file.

If the result value output in step 102 hits the condition threshold, for example, the output result value is 0.4, which indicates that the static detection cannot accurately determine the file type, at this time, the dynamic detection may be used to improve the file detection accuracy. Therefore, the sandbox can be used for simulating the running of the executable file so as to extract the running characteristics of the executable file.
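The two-stage decision can be sketched as follows, using the example interval 0.2 to 0.8. The function names, the inclusive interval bounds, and the callable that stands in for the dynamic stage are hypothetical; the expensive sandbox run happens only when the static score falls inside the interval.

```python
from typing import Callable

LOW, HIGH = 0.2, 0.8  # example dynamic detection condition threshold from the text


def hits_dynamic_condition(score: float) -> bool:
    """True when the static score is too uncertain to trust on its own."""
    return LOW <= score <= HIGH  # assumption: bounds are inclusive


def classify(score: float, dynamic_detector: Callable[[], bool]) -> str:
    """Return 'virus' or 'normal', calling the dynamic stage only if needed."""
    if hits_dynamic_condition(score):
        return "virus" if dynamic_detector() else "normal"
    return "virus" if score > HIGH else "normal"


# The stand-in dynamic detectors below are placeholders for the sandbox stage.
print(classify(0.9512, lambda: False))  # static result is decisive
print(classify(0.084, lambda: True))    # static result is decisive
print(classify(0.4, lambda: True))      # falls through to dynamic detection
```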

Specifically, the running information of the executable file is obtained. The running information may include the name of each called API, the number of the thread that called the API, and the sequence number of the API call within that thread. Table 3 shows an example of the running information of an executable file (file1):

File name	API name	Thread number	Call order in thread
file1	RegKeyExAapi1	2332	0
file1	CpFileAapi1	2332	1
file1	OpenSCAapi1	2332	2
file1	CrtServiceAapi	2332	3
file1	RegKeyExAapi1	2468	0
file1	CpFileAapi1	2468	1
file1	OpenSCAapi1	2468	2
file1	CrtServiceAapi	2468	3
file1	StartServiceA	2468	4
file1	NtCreateThreadEx	2468	5

TABLE 3

Taking the entry 1 as an example, the executable file1 first calls RegKeyExAapi1 when running, the RegKeyExAapi1 is called by the thread with the number 2332, and the RegKeyExAapi1 is the first API called by the thread 2332, and the corresponding call sequence number is 0.

And after all the running information of the executable file is acquired, extracting the running characteristics from the running information according to a preset characteristic extraction rule. The operational characteristics may include global characteristics, local characteristics, API sequence characteristics, and API probability characteristics.

Each dynamic feature extraction will be explained below:

global features

Global features generally refer to features extracted from a single item of running information, such as features extracted only from the thread number or only from the call sequence number.

Here, a global feature extracted only from the thread number is taken as an example. Table 4 shows an example of the global features derived from the thread numbers in Table 3.

API name	Count	Mean	Standard deviation	Minimum	25%	50%	75%	Maximum
CpFileAapi1	2	2400	96.17	2332	2366	2400	2434	2468
CrtServiceAapi	2	2400	96.17	2332	2366	2400	2434	2468
NtCreateThreadEx	1	2468	N/A	2468	2468	2468	2468	2468
OpenSCAapi1	2	2400	96.17	2332	2366	2400	2434	2468
RegKeyExAapi1	2	2400	96.17	2332	2366	2400	2434	2468
StartServiceA	1	2468	N/A	2468	2468	2468	2468	2468

TABLE 4

Taking the first entry as an example, the count of 2 indicates that CpFileAapi1 is called 2 times in file1; the mean 2400 is the average of the numbers (2332 and 2468) of the threads that called CpFileAapi1; 96.17 is the sample standard deviation of those thread numbers; 2332 is the minimum number of a thread calling CpFileAapi1; 2468 is the maximum number of a thread calling CpFileAapi1; 2366 is the thread number value at the 25% quantile between the minimum and maximum thread numbers; 2400 is the value at the 50% quantile; and 2434 is the value at the 75% quantile. Of course, thread number values at other quantiles (e.g., 0.2, 0.4, 0.6, 0.8) may also be extracted as needed.
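The figures in Table 4 can be reproduced from the rows of Table 3 with the sketch below. Note that the value 96.17 corresponds to the sample standard deviation of 2332 and 2468, and that the quantiles are linearly interpolated between the minimum and the maximum; the helper names are hypothetical.

```python
import statistics
from collections import defaultdict

# (API name, thread number, call order) rows copied from Table 3.
RUN_INFO = [
    ("RegKeyExAapi1", 2332, 0), ("CpFileAapi1", 2332, 1),
    ("OpenSCAapi1", 2332, 2), ("CrtServiceAapi", 2332, 3),
    ("RegKeyExAapi1", 2468, 0), ("CpFileAapi1", 2468, 1),
    ("OpenSCAapi1", 2468, 2), ("CrtServiceAapi", 2468, 3),
    ("StartServiceA", 2468, 4), ("NtCreateThreadEx", 2468, 5),
]


def quantile(ordered, q):
    """Linearly interpolated quantile of an already sorted list."""
    position = (len(ordered) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def describe(values):
    """Summary statistics matching the columns of Table 4."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        # A single value has no sample standard deviation.
        "std": statistics.stdev(ordered) if len(ordered) > 1 else None,
        "min": ordered[0],
        "25%": quantile(ordered, 0.25),
        "50%": quantile(ordered, 0.50),
        "75%": quantile(ordered, 0.75),
        "max": ordered[-1],
    }


def global_thread_features(run_info):
    """Group thread numbers by API name and summarize each group."""
    threads = defaultdict(list)
    for api, thread, _order in run_info:
        threads[api].append(thread)
    return {api: describe(numbers) for api, numbers in sorted(threads.items())}


thread_features = global_thread_features(RUN_INFO)
row = thread_features["CpFileAapi1"]
```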

Local features

The local features generally refer to features extracted in conjunction with a plurality of operational information, and may include second-order local features and higher-order local features.

Here, a second-order local feature refers to a feature extracted from a combination of two items of running information. For example, based on the combination of the file name and the thread number, statistics such as the count, maximum, minimum, mean, and variance of the API call sequence numbers in the corresponding thread are computed; as another example, based on the combination of the file name and the thread number, features such as the count of each API in the corresponding thread are computed.

Higher-order local features refer to features extracted from a combination of more than two items of running information. For example, based on the combination of the file name, the API, and the thread number, statistics such as the count, maximum, minimum, mean, and variance of the API call sequence numbers are computed.

API sequence characteristics

The API sequence features are used to characterize the API call order of the file.

As shown in table 3, API names are usually represented in a character string form, and the embodiment of the present application needs to convert the API names in the character string form into a number form, and then characterize the calling order of the API based on the number form, that is, obtain API sequence features represented based on the number form.

As an example, all APIs called by a file may first be sorted by ASCII code, giving the following API order: CpFileAapi1, CrtServiceAapi, NtCreateThreadEx, OpenSCAapi1, RegKeyExAapi1, StartServiceA. Then 0 may be defined to represent CpFileAapi1, 1 to represent CrtServiceAapi, 2 to represent NtCreateThreadEx, 3 to represent OpenSCAapi1, 4 to represent RegKeyExAapi1, and 5 to represent StartServiceA, so that the API sequence feature of file1 can be represented as [4, 0, 3, 1, 4, 0, 3, 1, 5, 2].
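The encoding can be checked with a short sketch that uses the API names in the order they appear in Table 3. Python's default string sort compares code points, which for these names is the same as ASCII order; the variable names are hypothetical.

```python
# API names in call order, as listed in Table 3.
calls = [
    "RegKeyExAapi1", "CpFileAapi1", "OpenSCAapi1", "CrtServiceAapi",
    "RegKeyExAapi1", "CpFileAapi1", "OpenSCAapi1", "CrtServiceAapi",
    "StartServiceA", "NtCreateThreadEx",
]

# Assign each distinct API name its rank in ASCII order.
vocabulary = {name: index for index, name in enumerate(sorted(set(calls)))}

# Replace every call by its numeric identifier.
api_sequence = [vocabulary[name] for name in calls]
print(api_sequence)  # [4, 0, 3, 1, 4, 0, 3, 1, 5, 2]
```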

API probabilistic features

As one example, an N-gram model may be employed to extract API probability features.

For example, a 2-gram model assumes that the occurrence of one API is related to the previous API. In this case, the embodiment of the present application may treat each pair of consecutive APIs as a whole and extract statistics, such as the count, the maximum value, and the minimum value, of the thread numbers or call sequence numbers corresponding to that whole in the executable file.

For another example, a 20-gram model assumes that the occurrence of one API is related to the previous 19 APIs. In this case, the embodiment of the present application may treat each run of 20 consecutive APIs as a whole and extract the same statistics, such as the count, the maximum value, and the minimum value, of the thread numbers or call sequence numbers corresponding to that whole in the executable file.
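One possible reading of the N-gram extraction is sketched below: N consecutive API calls within a thread are treated as one unit, and the count, minimum, and maximum of the thread numbers in which that unit occurs are collected. Building the N-grams per thread and the choice of statistics are assumptions; the rows are those of Table 3.

```python
from collections import defaultdict

# (API name, thread number, call order) rows copied from Table 3.
RUN_INFO = [
    ("RegKeyExAapi1", 2332, 0), ("CpFileAapi1", 2332, 1),
    ("OpenSCAapi1", 2332, 2), ("CrtServiceAapi", 2332, 3),
    ("RegKeyExAapi1", 2468, 0), ("CpFileAapi1", 2468, 1),
    ("OpenSCAapi1", 2468, 2), ("CrtServiceAapi", 2468, 3),
    ("StartServiceA", 2468, 4), ("NtCreateThreadEx", 2468, 5),
]


def ngram_thread_stats(run_info, n=2):
    """Count, min and max thread number for every API n-gram."""
    # Rebuild each thread's call list in call order.
    per_thread = defaultdict(list)
    for api, thread, _order in sorted(run_info, key=lambda r: (r[1], r[2])):
        per_thread[thread].append(api)

    # Record the thread number for every n-gram occurrence.
    threads_by_gram = defaultdict(list)
    for thread, apis in per_thread.items():
        for start in range(len(apis) - n + 1):
            threads_by_gram[tuple(apis[start:start + n])].append(thread)

    return {
        gram: {"count": len(ts), "min": min(ts), "max": max(ts)}
        for gram, ts in threads_by_gram.items()
    }


bigram_stats = ngram_thread_stats(RUN_INFO, n=2)
```

With `n=20` nothing is produced for this small example, because no thread in Table 3 makes 20 calls.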

The above is the run feature extraction for executable files.

Step 104, inputting the running characteristics into the dynamic detection model, and determining whether the executable file is a virus file according to an output result of the dynamic detection model.

In the embodiment of the application, the dynamic detection model is a pre-trained fusion model composed of a plurality of detection models. The plurality of detection models may include eXtreme Gradient Boosting (XGBoost, abbreviated as XGB), MLP, Light Gradient Boosting Machine (LightGBM, abbreviated as LGB), and Text-CNN models.

Referring to FIG. 3, a block diagram of a dynamic detection flow according to an embodiment of the present application is shown. It includes a dynamic detection model formed by a plurality of detection models: 4 XGB models (XGB1 to XGB4), 2 MLP models (MLP1, MLP2), 4 LGB models (LGB1 to LGB4), and 2 Text-CNN models (TEXT-CNN1, TEXT-CNN2).

XGB1 to XGB4 are XGB models trained with different hyperparameters; LGB1 to LGB4 are LGB models trained with different hyperparameters; MLP1 and MLP2 have hidden layers of different depths; TEXT-CNN1 may be a Text-CNN model that uses 7 conventional convolution kernels of different sizes; TEXT-CNN2 may be a Text-CNN model that uses 16 dilated (hole) convolution kernels of different sizes, i.e., the receptive field of each convolution kernel is enlarged by inserting holes into it.

Specifically, when the running features obtained in step 103 are input into the dynamic detection model shown in FIG. 3, the API sequence features may be input into TEXT-CNN1 and TEXT-CNN2, and the running features other than the API sequence features may be input into XGB1 to XGB4, MLP1, MLP2, and LGB1 to LGB4, respectively. The outputs of the models are then fused to obtain the final detection result, from which it can be accurately determined whether the executable file is a virus file.
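The routing of features to the sub-models can be sketched with placeholder callables. The application does not state how the sub-model outputs are fused, so the simple average and the 0.5 decision threshold used here are assumptions, and the constant-valued stand-in models are hypothetical.

```python
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]  # maps features to a score in [0, 1]


def fuse(
    sequence_models: Sequence[Model],
    other_models: Sequence[Model],
    api_sequence: Sequence[float],
    other_features: Sequence[float],
    threshold: float = 0.5,
):
    """Route features to the sub-models and average their scores."""
    # API sequence features go only to the Text-CNN style models.
    scores = [model(api_sequence) for model in sequence_models]
    # All remaining running features go to the remaining models.
    scores += [model(other_features) for model in other_models]
    fused = sum(scores) / len(scores)  # assumption: unweighted average
    return fused, fused >= threshold


# Placeholder models returning fixed scores: 2 sequence models, 10 others.
text_cnns = [lambda seq: 0.9, lambda seq: 0.7]
others = [lambda feats: 0.8] * 10
fused_score, is_virus = fuse(text_cnns, others, [4, 0, 3, 1], [2.0, 2400.0])
```

In practice the averaging step could be replaced by weighted voting or a trained meta-model; the routing of inputs would stay the same.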

At this point, the virus detection process is completed.

As can be seen from the above virus detection process, in the embodiment of the application static detection is first performed on the executable file, and dynamic detection is performed only when the file type (normal file or virus file) of the executable file cannot be accurately determined through static detection, which improves the detection accuracy. Conversely, if the file type of the executable file can be accurately determined through static detection, dynamic detection is not needed, which improves the detection efficiency. Therefore, the embodiment of the application effectively balances detection efficiency and detection accuracy.
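The two-stage flow summarized above can be sketched as a short function. The sketch assumes that the preset dynamic detection condition is an uncertainty band between two values `low` and `high`, and that a static result value above the band means a virus file; these values, the function name `detect_virus`, and the callable parameters are hypothetical and are not taken from the embodiment.

```python
from typing import Any, Callable, Tuple


def detect_virus(
    executable: Any,
    extract_static: Callable[[Any], Any],
    static_model: Callable[[Any], float],
    extract_running: Callable[[Any], Any],
    dynamic_model: Callable[[Any], bool],
    low: float = 0.3,   # assumed lower bound of the uncertain band
    high: float = 0.7,  # assumed upper bound of the uncertain band
) -> Tuple[bool, str]:
    """Return (is_virus, stage) where stage names the deciding detector."""
    result_value = static_model(extract_static(executable))
    if low <= result_value <= high:
        # Static detection is inconclusive: run the costlier dynamic stage.
        return dynamic_model(extract_running(executable)), "dynamic"
    # Static detection is conclusive: running features are never extracted.
    return result_value > high, "static"
```

Because running features are extracted only inside the uncertain band, the cost of simulated execution is paid only for files that static detection cannot settle.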

The method provided by the embodiment of the present application is described above, and the virus detection apparatus provided by the embodiment of the present application is described below:

referring to fig. 4, a schematic structural diagram of a virus detection apparatus provided in the embodiment of the present application is shown. The device includes: an extraction unit 401 and an input unit 402, wherein:

an extracting unit 401, configured to extract a static feature of an executable file to be detected;

an input unit 402, configured to input the static feature into a static detection model, so as to obtain a result value representing a static detection result;

the extracting unit 401 is further configured to extract an operation feature of the executable file if the result value hits a preset dynamic detection condition threshold;

the input unit 402 is further configured to input the operation characteristic into a dynamic detection model, and determine whether the executable file is a virus file according to an output result of the dynamic detection model.

As an embodiment, the apparatus further comprises:

As one embodiment, the static features include byte features, import features, text features, and attribute features, wherein the byte features include a first byte feature determined based on a number of occurrences of a byte value and a second byte feature determined based on a byte entropy.
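The two byte features can be illustrated with a minimal sketch. It assumes the first byte feature is a 256-bin count of byte values and approximates the second with the Shannon entropy of the whole file; the exact entropy-based construction is not specified in this passage, and the function names `byte_histogram` and `byte_entropy` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def byte_histogram(data: bytes) -> List[int]:
    """Number of occurrences of each byte value 0..255."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )
```

Packed or encrypted content tends toward an entropy near 8 bits per byte, while plain text and padding score much lower, which is why entropy is a common static indicator.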

As an embodiment, the extracting unit 401 extracts a text feature of the executable file, including:

As an embodiment, the extracting unit 401 extracts the dynamic feature of the executable file, including:

As an embodiment, the dynamic detection model is a pre-trained fusion model composed of a plurality of detection models, the plurality of detection models includes at least one Text-CNN model, and the inputting unit 402 inputs the operation features into the dynamic detection model, including:

inputting the API sequence features into the at least one Text-CNN model;

Thus, the description of the apparatus is completed. In the embodiment of the application, static detection is first performed on the executable file, and dynamic detection is performed only when static detection cannot accurately determine the file type (normal file or virus file) of the executable file, which improves the detection accuracy. Conversely, if the file type of the executable file can be accurately determined through static detection, dynamic detection is not needed, which improves the detection efficiency. It can be seen that detection efficiency and detection accuracy are effectively balanced.

The above description is only a preferred embodiment of the present application, and should not be taken as limiting the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present application shall be included in the scope of the present application.

Claims

1. A method for detecting a virus, the method comprising:

extracting static characteristics of an executable file to be detected;

2. The method of claim 1, wherein the method further comprises:

3. The method of claim 1, wherein the static features comprise byte features, import features, text features, and attribute features, wherein the byte features comprise a first byte feature determined based on a number of occurrences of a byte value and a second byte feature determined based on a byte entropy.

4. The method of claim 3, wherein extracting textual features of the executable file comprises:

counting the occurrence times of each readable character in the American Standard Code for Information Interchange (ASCII) code table in the executable file;

5. The method of claim 1, wherein said extracting dynamic features of said executable file comprises:

acquiring running information of the executable file during simulation running, wherein the running information comprises the name of a called Application Program Interface (API), the number of a thread calling the API and a sequence number called by the API in the thread;

6. The method of claim 5, wherein the dynamic detection model is a pre-trained fusion model consisting of a plurality of detection models, the plurality of detection models including at least one Text convolutional neural network Text-CNN model, the inputting the operational features into the dynamic detection model comprising:

inputting the API sequence features into the at least one Text-CNN model;

7. A virus detection apparatus, the apparatus comprising:

8. The apparatus of claim 7, wherein the static features comprise byte features, import features, text features, and attribute features, wherein the byte features comprise a first byte feature determined based on a number of occurrences of a byte value and a second byte feature determined based on a byte entropy.

9. The apparatus of claim 8, wherein the extraction unit extracts a text feature of the executable file, comprising:

10. The apparatus of claim 7, wherein the extraction unit to extract dynamic features of the executable file comprises: