CN110879888A

CN110879888A - Virus file detection method, device and equipment

Info

Publication number: CN110879888A
Application number: CN201911122399.XA
Authority: CN
Inventors: 王春磊
Original assignee: New H3C Big Data Technologies Co Ltd
Current assignee: New H3C Big Data Technologies Co Ltd
Priority date: 2019-11-15
Filing date: 2019-11-15
Publication date: 2020-03-13

Abstract

The application provides a virus file detection method, apparatus and device. The method comprises the following steps: performing word segmentation processing on a character string represented by a file to be detected to obtain word segmentation features of the file to be detected and a feature matrix of the file to be detected; for each element in the feature matrix, converting the value of the element into a gray value to obtain a gray-scale image to be detected corresponding to the file to be detected; inputting the gray-scale image to be detected into a trained virus classifier; and determining whether the file to be detected is a virus file according to the classification result output by the virus classifier. File identification is thereby converted into image identification, and the high identification accuracy of the classifier is used to identify virus files that have been slightly changed or upgraded, so that the probability of missed detection of virus files is reduced.

Description

Virus file detection method, device and equipment

Technical Field

The present application relates to the field of network communication technologies, and in particular, to a method, an apparatus, and a device for detecting a virus file.

Background

Virus files can cause considerable harm to computers, for example by illegally acquiring computer privileges, illegally accessing private computers, illegally controlling computer resources, or hijacking user assets. In order to protect against virus files, it is necessary to identify them.

At present, virus files are mainly detected by the following two methods:

Method 1: partial texts or character strings are extracted from virus samples to serve as feature codes, and the feature codes are stored in a virus library. When a file to be detected is received, the feature codes of the file are extracted in the same extraction mode and compared with the feature codes in the virus library. If a consistent feature code exists, the file to be detected is determined to be a virus file.

Method 2: a hash operation is performed on each virus sample, and the hash value is stored in a virus library. When a file to be detected is received, the same hash operation is performed on the file to be detected, and the resulting hash value is compared with the hash values in the virus library. If a consistent hash value exists, the file to be detected is determined to be a virus file.

However, both methods can identify a virus file only when the file to be detected is completely consistent with a known virus sample. If the file to be detected has been slightly modified or upgraded on the basis of a known virus sample, the existing detection methods cannot identify it as a virus file, and the detection is missed.
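The limitation of the hash-based method can be illustrated with a minimal sketch. The text does not name a hash algorithm, so SHA-256 is an assumption here, as are the helper names `sha256_hex` and `is_known_virus` and the one-entry virus library.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hash a file's bytes; SHA-256 is an assumed choice, not one named in the text."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical virus library holding the hash of a single known sample.
known_sample = bytes.fromhex("8A345D673AB3043D4A220D5F")
virus_library = {sha256_hex(known_sample)}


def is_known_virus(data: bytes) -> bool:
    """Exact-match lookup: True only if the whole file hashes to a stored value."""
    return sha256_hex(data) in virus_library


# A variant that differs from the known sample only in its final byte.
variant = known_sample[:-1] + b"\x60"

print(is_known_virus(known_sample))  # the unmodified sample is found
print(is_known_virus(variant))       # the one-byte variant is missed
```

Because any change to the input produces a different hash value, the lookup fails for the variant even though it is almost identical to the known sample.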

Summary of the Application

In view of this, the present application provides a method, an apparatus, and a device for detecting a virus file, so as to reduce the probability of missing detection of the virus file.

In order to achieve the purpose of the application, the application provides the following technical scheme:

in a first aspect, the present application provides a method for detecting a virus file, the method comprising:

performing word segmentation processing on a character string represented by a file to be detected to obtain word segmentation characteristics of the file to be detected and a characteristic matrix of the file to be detected;

for each element in the feature matrix, converting the value of the element into a gray value to obtain a gray image to be detected corresponding to the file to be detected;

inputting the gray-scale image to be detected into a trained virus classifier;

and determining whether the file to be detected is a virus file or not according to the classification result output by the virus classifier.

Optionally, before inputting the grayscale image to be detected into the trained virus classifier, the method further includes:

dividing a virus sample set into a training sample set and a testing sample set, wherein the virus sample set comprises a plurality of known virus samples;

training the deep learning model by using the virus samples in the training sample set to obtain a virus classifier;

verifying the classification accuracy of the virus classifier by using the virus samples in the test sample set for the virus classifier obtained by training;

and if the classification accuracy reaches a preset accuracy threshold, determining that the virus classifier is trained.

Optionally, the method further includes:

and if the classification accuracy rate does not reach the preset accuracy rate threshold value, selecting a part of virus samples from the test sample set to continue training the deep learning model until the classification accuracy rate of the trained virus classifier reaches the preset accuracy rate threshold value.

Optionally, the word segmentation processing is performed on the character string represented by the file to be detected to obtain the word segmentation characteristics of the file to be detected and the characteristic matrix of the file to be detected, and the word segmentation processing includes:

dividing the character string represented by the file to be detected into N word segmentation features according to the principle that a preset number of characters are divided into one word segmentation feature, and two adjacent word segmentation features in the character string do not comprise characters in the same position, wherein N is a positive integer;

and constructing a feature matrix of the file to be detected based on the N word segmentation features.

Optionally, the converting the value of the element into a gray value includes:

and based on the gray value range, carrying out normalization processing on the values of the elements to obtain corresponding gray values.

In a second aspect, the present application provides a virus file detection apparatus, the apparatus comprising:

the word segmentation unit is used for performing word segmentation processing on the character string represented by the file to be detected to obtain word segmentation characteristics of the file to be detected and a characteristic matrix of the file to be detected;

the conversion unit is used for converting the value of each element in the characteristic matrix into a gray value to obtain a to-be-detected gray map corresponding to the to-be-detected file;

the input unit is used for inputting the gray-scale image to be detected into the trained virus classifier;

and the first determining unit is used for determining whether the file to be detected is a virus file or not according to the classification result output by the virus classifier.

Optionally, the apparatus further comprises:

the dividing unit is used for dividing a virus sample set into a training sample set and a testing sample set, wherein the virus sample set comprises a plurality of known virus samples;

the training unit is used for training the deep learning model by using the virus samples in the training sample set to obtain a virus classifier;

the verification unit is used for verifying the classification accuracy of the virus classifier by using the virus samples in the test sample set for the virus classifier obtained by training;

and the second determining unit is used for determining that the virus classifier is trained if the classification accuracy reaches a preset accuracy threshold.

Optionally, the training unit is further configured to select a part of the virus samples from the test sample set to continue training the deep learning model if the classification accuracy does not reach a preset accuracy threshold, until the classification accuracy of the trained virus classifier reaches the preset accuracy threshold.

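Optionally, when performing word segmentation processing on the character string represented by the file to be detected to obtain the word segmentation characteristics of the file to be detected and the characteristic matrix of the file to be detected, the word segmentation unit is used for:

dividing the character string represented by the file to be detected into N word segmentation features according to the principle that a preset number of characters are divided into one word segmentation feature, and two adjacent word segmentation features in the character string do not comprise characters in the same position, wherein N is a positive integer;

and constructing a feature matrix of the file to be detected based on the N word segmentation features.

Optionally, when converting the value of the element into a gray value, the conversion unit is used for:

carrying out normalization processing on the value of the element based on the gray value range to obtain the corresponding gray value.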

In a third aspect, the present application provides an apparatus comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to implement the virus file detection method described above.

In a fourth aspect, the present application provides a machine-readable storage medium having stored therein machine-executable instructions, which when executed by a processor, implement the above-mentioned virus file detection method.

From the above description, it can be seen that in the application, the file identification is converted into the image identification, and the virus file slightly changed or upgraded can be identified by utilizing the characteristic of high identification accuracy of the classifier, so that the omission probability of the virus file is reduced.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a flowchart illustrating a method for detecting a virus file according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating an implementation of training a virus classifier according to an embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating an implementation of step 101 according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a virus file detection apparatus according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a hardware structure of an apparatus according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the embodiments of the present application. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.

The embodiment of the application provides a virus file detection method. In the method, file identification is converted into image identification, and virus files which are slightly changed or upgraded can be identified by utilizing the characteristic of high identification accuracy of the classifier, so that the omission probability of the virus files is reduced.

In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application are described in detail below with reference to the accompanying drawings and specific embodiments:

Referring to FIG. 1, a flowchart of a virus file detection method according to an embodiment of the present application is shown. The process can be applied to any device that needs virus defense, such as a personal computer or a server. The application is not limited to a particular type of device.

As shown in fig. 1, the process may include the following steps:

Step 101, performing word segmentation processing on the character string represented by the file to be detected to obtain word segmentation characteristics of the file to be detected and a characteristic matrix of the file to be detected.

In order to protect against virus files, virus detection needs to be performed on the received files. Here, the received file is referred to as a file to be detected.

A file to be detected is usually represented as a character string, for example, the file "8A345D673AB3043D4A220D5F" represented as a hexadecimal string.

In this step, word segmentation processing is performed on the character string represented by the file to be detected to obtain the word segmentation characteristics of the file to be detected and the characteristic matrix of the file to be detected. In one example, the elements in the feature matrix of the file to be detected are the word segmentation features of the file to be detected.

The specific process of word segmentation processing involved in this step is described later and is not detailed here.

Step 102, converting the value of each element in the characteristic matrix into a gray value to obtain a gray-scale image to be detected corresponding to the file to be detected.

Here, the elements of the feature matrix are the word segmentation features obtained after the word segmentation processing in step 101.

In this step, the value of each element is converted into a corresponding gray value, and a gray-scale image corresponding to the file to be detected is obtained. Here, the gray-scale image corresponding to the file to be detected is referred to as the gray-scale image to be detected.

The process of converting the value of an element into a gray value in this step is described later and is not detailed here.

Step 103, inputting the gray-scale image to be detected into the trained virus classifier.

The virus classifier is a classifier trained on known virus samples. The process of training the virus classifier is described later and is not detailed here.

In this step, the gray-scale image to be detected obtained in step 102 is input into the trained virus classifier, and the classification result output by the virus classifier can be obtained.

Step 104, determining whether the file to be detected is a virus file or not according to the classification result output by the virus classifier.

The classification result output by the virus classifier can directly indicate whether the file to be detected is a virus file or not. For example, when the classification result is a first value, the file to be detected is represented as a virus file; and when the classification result is a second value, the file to be detected is not a virus file.

Thus, the flow shown in fig. 1 is completed.

As can be seen from the flow shown in fig. 1, in the embodiment of the present application, file identification is converted into image identification, and a virus file that is slightly changed or upgraded can be identified by using the characteristic of high identification accuracy of a classifier, so that the probability of missed detection of the virus file is reduced.

The process of training the virus classifier is described below. Referring to fig. 2, an implementation process of training a virus classifier is shown in an embodiment of the present application.

As shown in fig. 2, the process may include the following steps:

Step 201, dividing a virus sample set into a training sample set and a testing sample set.

Here, the virus sample set includes a plurality of known virus samples.
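Step 201 can be sketched as a shuffled split. The text does not specify how the set is divided, so the 80/20 ratio, the fixed random seed and the function name `split_samples` are illustrative assumptions.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the virus sample set and cut it into training and test sets.

    The 80/20 ratio and the seed are illustrative defaults, not values from the text.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


training_set, test_set = split_samples([f"sample_{i}" for i in range(10)])
print(len(training_set), len(test_set))
```

Shuffling before the cut keeps the two sets from reflecting the order in which the samples were collected.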

Step 202, training the deep learning model by using the virus samples in the training sample set to obtain a virus classifier.

Deep learning is a sophisticated form of machine learning that achieves good recognition results in speech and image recognition. A typical deep learning model is the Convolutional Neural Network (CNN) model.

As an embodiment, the virus classifier can be obtained by training a convolutional neural network model by using virus samples in a training sample set.

Specifically, each virus sample in the training sample set is converted into a corresponding virus sample gray-scale map. For the conversion process, refer to the processing described above (step 101 and step 102) that converts the file to be detected into the gray-scale image to be detected; details are not repeated here.

The convolutional neural network model is then configured. For example, the model is set to include two convolutional layers with 32 and 64 convolution kernels, respectively. The activation function of the convolutional neural network model is the ReLU activation function.

The gray-scale map of each virus sample obtained by conversion is input into the configured convolutional neural network model. In the model, the gray-scale map of a virus sample is first processed by the convolutional layers to obtain a virus sample information sequence, and the sequence is then input into a plurality of LSTM (Long Short-Term Memory) units to obtain virus features. The extracted virus features are input into a mean pooling layer for smoothing, the smoothed virus features are input into a Dropout layer for dimension reduction, and the result is finally passed to a Sigmoid function, which outputs the classification result. The internal processing of the convolutional neural network model is prior art and is not described in detail here.

The convolutional neural network model is trained by using all virus samples in the training sample set to obtain the virus classifier.

Step 203, verifying the classification accuracy of the virus classifier by using the virus samples in the test sample set for the virus classifier obtained by training.

After the virus classifier is obtained through step 202, the classification accuracy of the virus classifier needs to be verified.

Therefore, in the embodiment of the application, each virus sample in the test sample set is input into the trained virus classifier, and whether the virus classifier is classified correctly is determined according to the classification result output by the virus classifier.

For example, a virus file is input into a virus classifier, and if the classification result output by the virus classifier is a virus file, the classification is determined to be correct; and if the output classification result is a normal file, determining that the classification is wrong.

The embodiment of the application counts the number of the correctly classified virus samples, and takes the ratio of the number of the correctly classified virus samples to the total number of the virus samples in the test sample set as the classification accuracy of the virus classifier.
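The accuracy computation described above can be sketched as follows. The callable `classify` is a hypothetical stand-in for the trained virus classifier and is assumed to return True when it labels a sample as a virus file; since every sample in the test sample set is a known virus, True is the correct answer for each of them.

```python
def classification_accuracy(classify, test_samples):
    """Ratio of correctly classified virus samples to all samples in the test set.

    `classify` is a hypothetical callable that returns True for "virus file".
    """
    samples = list(test_samples)
    if not samples:
        raise ValueError("the test sample set is empty")
    correct = sum(1 for sample in samples if classify(sample))
    return correct / len(samples)


# Stub classifier for illustration only: it misses one of four known virus samples.
def stub_classifier(sample):
    return sample != "missed_variant"


accuracy = classification_accuracy(
    stub_classifier, ["virus_a", "virus_b", "virus_c", "missed_variant"]
)
print(accuracy)
```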

Step 204, if the classification accuracy reaches a preset accuracy threshold, determining that the virus classifier is trained and meets the use requirement.

Thus, the flow shown in fig. 2 is completed.

As can be seen from the flow shown in fig. 2, in the embodiment of the present application, the virus classifier is trained by using the training sample set, and the classification accuracy of the virus classifier is verified by using the test sample set. After the virus classifier passes the verification, the virus classifier is used for identifying the virus file, so that the accuracy of virus file identification is ensured.

As an embodiment, if the classification accuracy obtained in step 203 does not reach the preset accuracy threshold, it indicates that the classification accuracy of the virus classifier obtained through the training in step 202 is low, and the accuracy of virus file identification cannot be guaranteed. Therefore, in the embodiment of the application, part of the virus samples can be selected from the test sample set, and the deep learning model is continuously trained until the classification accuracy of the trained virus classifier reaches the preset accuracy threshold, so that the training of the virus classifier is completed.
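The train, verify and continue-training cycle can be sketched as a loop. The callables `train_round` and `evaluate` are hypothetical placeholders for the training and verification steps, and the `max_rounds` cap is an added safeguard that the text does not mention.

```python
def train_until_accurate(train_round, evaluate, threshold, max_rounds=10):
    """Repeat training until the verified accuracy reaches `threshold`.

    train_round(i): hypothetical callable performing training round i (round 0
        uses the training sample set; later rounds continue training with
        samples selected from the test sample set).
    evaluate(): hypothetical callable returning the current classification accuracy.
    max_rounds: assumed safety cap so the loop always terminates.
    """
    for round_index in range(max_rounds):
        train_round(round_index)
        accuracy = evaluate()
        if accuracy >= threshold:
            return round_index + 1, accuracy
    raise RuntimeError("accuracy threshold not reached within max_rounds")


# Illustration with made-up accuracies for three successive rounds.
simulated = iter([0.80, 0.90, 0.97])
rounds_used, final_accuracy = train_until_accurate(
    train_round=lambda i: None,
    evaluate=lambda: next(simulated),
    threshold=0.95,
)
print(rounds_used, final_accuracy)
```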

In addition, when a new virus that the virus classifier cannot identify is found, the new virus sample can be added and the virus classifier retrained, so that the virus classifier can identify the new virus. In the prior art, virus features must be extracted manually and a virus feature library must be maintained; in contrast, the embodiment of the application can update the virus classifier automatically.

The following describes a process of performing word segmentation processing on the character string represented by the file to be detected in step 101. Referring to fig. 3, a flow of implementing step 101 is shown in the embodiment of the present application.

As shown in fig. 3, the process may include the following steps:

Step 301, dividing a character string represented by a file to be detected into N word segmentation features according to a principle that a preset number of characters are divided into one word segmentation feature and two adjacent word segmentation features in the character string do not include characters in the same position.

Here, N is a positive integer.

As an example, the preset number may be 2. Take the file "8A345D673AB3043D4A220D5F" represented by a hexadecimal character string as an example. Every 2 hexadecimal characters are divided into one word segmentation feature, and two adjacent word segmentation features do not include characters at the same position. The word segmentation features obtained after division are: 8A, 34, 5D, 67, 3A, B3, 04, 3D, 4A, 22, 0D, 5F.

As another example, the preset number may be 3. The file "8A345D673AB3043D4A220D5F" represented by a hexadecimal string is still taken as an example. Every 3 hexadecimal characters are divided into one word segmentation feature, and two adjacent word segmentation features do not include characters at the same position. The word segmentation features obtained after division are: 8A3, 45D, 673, AB3, 043, D4A, 220, D5F.

The above two examples are merely illustrative, and the present application does not limit the preset number.

It should be noted that, under the existing word segmentation principle, two adjacent word segmentation features generally include characters at the same position, that is, they overlap. Still taking the file "8A345D673AB3043D4A220D5F" represented by a hexadecimal character string as an example, dividing every 2 hexadecimal characters into one word segmentation feature yields the following word segmentation features: 8A, A3, 34, 45, 5D, D6, 67, 73, 3A, AB, B3, 30, 04, 43, 3D, D4, 4A, A2, 22, 20, 0D, D5, 5F.
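The two segmentation principles can be compared with a short sketch. It uses the 24-character example string written without stray spaces and in uppercase; the function names are illustrative, and keeping a shorter trailing chunk when the length is not a multiple of the preset number is an assumption, since the text does not cover that case.

```python
def segment_non_overlapping(text, size):
    """Split `text` into consecutive chunks of `size` characters with no shared positions.

    A trailing chunk shorter than `size` is kept as is (an assumption).
    """
    return [text[i:i + size] for i in range(0, len(text), size)]


def segment_overlapping(text, size):
    """Sliding window with step 1, in which adjacent features share characters."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


hex_string = "8A345D673AB3043D4A220D5F"

print(segment_non_overlapping(hex_string, 2))   # 12 features
print(segment_non_overlapping(hex_string, 3))   # 8 features
print(len(segment_overlapping(hex_string, 2)))  # 23 features
```

For the same string and the same preset number of 2, the non-overlapping principle yields 12 features where the overlapping principle yields 23.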

The inventor compared the file identification accuracy obtained with the method provided by the application when the word segmentation features are divided according to the existing word segmentation principle and when they are divided according to the word segmentation principle of the embodiment of the application. The comparison result is shown in Table 1.

TABLE 1

It can be seen that, when the same file information is covered and each word segmentation feature contains the same number of characters, the division mode of the present application produces fewer word segmentation features while hardly affecting the recognition accuracy, so that the operation complexity is reduced and the word segmentation efficiency is improved.

Step 302, constructing a feature matrix of the file to be detected based on the N word segmentation features.

In the embodiment of the application, the structure and the operation complexity of the deep learning model can be comprehensively considered, and the size of the feature matrix is preset, for example, the size of the feature matrix is preset to be M rows × K columns.

If the number N of the word segmentation features obtained in step 301 is greater than M × K, M × K word segmentation features may be selected from the N word segmentation features; for example, the first M × K word segmentation features are selected to form a feature matrix of M rows × K columns.

Taking the 12 word segmentation features 8A, 34, 5D, 67, 3A, B3, 04, 3D, 4A, 22, 0D, and 5F obtained in step 301 as an example, with a preset feature matrix size of 3 × 3, the first 9 word segmentation features are selected to form the 3 × 3 feature matrix shown in Table 2.

8A	34	5D
67	3A	B3
04	3D	4A

TABLE 2

If the number N of the word segmentation features obtained in step 301 is smaller than M × K, zeros may be padded to form a feature matrix of M rows × K columns.

Taking the 8 word segmentation features 8A3, 45D, 673, AB3, 043, D4A, 220, and D5F obtained in step 301 as an example, with a preset feature matrix size of 3 × 3, the number of word segmentation features (8) is smaller than the number of matrix elements (9), so one word segmentation feature 0 is padded to form the 3 × 3 feature matrix shown in Table 3.

8A3	45D	673
AB3	043	D4A
220	D5F	0

TABLE 3
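Step 302 can be sketched as a truncate-or-pad routine that reproduces Tables 2 and 3. The function name `build_feature_matrix` and the row-major fill order are assumptions consistent with the two tables; the padding element "0" follows the text.

```python
def build_feature_matrix(features, rows, cols, pad="0"):
    """Arrange word segmentation features into a rows x cols matrix, row by row.

    Extra features beyond rows * cols are dropped (the first ones are kept);
    missing positions are filled with `pad`.
    """
    needed = rows * cols
    selected = list(features[:needed])
    selected += [pad] * (needed - len(selected))
    return [selected[r * cols:(r + 1) * cols] for r in range(rows)]


two_char = ["8A", "34", "5D", "67", "3A", "B3", "04", "3D", "4A", "22", "0D", "5F"]
three_char = ["8A3", "45D", "673", "AB3", "043", "D4A", "220", "D5F"]

table_2 = build_feature_matrix(two_char, 3, 3)    # 12 features, first 9 kept
table_3 = build_feature_matrix(three_char, 3, 3)  # 8 features, one "0" padded

for row in table_2 + table_3:
    print("\t".join(row))
```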

The flow shown in fig. 3 is completed.

The process shown in fig. 3 implements the word segmentation processing and yields the feature matrix corresponding to the file to be detected.

Next, a process of converting the value of each element in the feature matrix into a gray scale value in step 102 to obtain a gray scale image to be detected corresponding to the file to be detected is described.

As can be seen from the foregoing description, the elements in the feature matrix are word segmentation features. Based on different word segmentation principles, the value ranges of the obtained word segmentation characteristics (elements) are different. For example, based on the principle that 2 hexadecimal characters are divided into a word segmentation feature, the decimal value range of the obtained word segmentation feature is 0-255; based on the principle that 3 hexadecimal characters are divided into a word segmentation characteristic, the decimal value range of the obtained word segmentation characteristic is 0-4095.

Therefore, in the embodiment of the application, the value of each element (word segmentation feature) is normalized according to the gray value range (0-255), so as to obtain the gray value corresponding to each pixel point. In one example, the normalization formula is:

G = F / H × D

where G represents the gray value; F is the decimal number corresponding to the value of the element; H is the maximum value of the decimal value range of the element plus 1; and D is the maximum value of the gray value range plus 1.

Taking the feature matrix shown in table 2 as an example, each element in the feature matrix is composed of 2 hexadecimal characters, and the corresponding decimal value range is 0-255, which is the same as the gray value range, so the value of each element can be directly converted into the gray value of the corresponding pixel point. The result of the conversion is shown in table 4.

138	52	93
103	58	179
4	61	74

TABLE 4

Taking the feature matrix shown in table 3 as an example, each element in the feature matrix is composed of 3 hexadecimal characters, the corresponding decimal value range is 0-4095, and the gray value range is 0-255, so the value of each element needs to be normalized and converted into a gray value within the range of 0-255.

Taking the first element 8A3 in table 3 as an example, the corresponding decimal number is 2211, so the gray value of the pixel point corresponding to the element is 2211 / 4096 × 256 ≈ 138. By analogy, the gray value of the pixel point corresponding to each element is obtained, as shown in table 5.

138	70	103
171	4	213
34	214	0

TABLE 5
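The normalization G = F / H × D can be sketched as follows and checked against Tables 4 and 5. Two points are assumptions rather than statements from the text: results are rounded half up to the nearest integer (the value 70 for element 45D in Table 5 is consistent with rounding rather than truncation), and results are capped at 255 so that the largest element value cannot leave the gray value range. The function name `to_gray` is illustrative.

```python
def to_gray(feature, chars_per_feature, gray_levels=256):
    """Map a hexadecimal word segmentation feature to a gray value in 0..gray_levels-1.

    F = int(feature, 16); H = 16 ** chars_per_feature; D = gray_levels.
    Rounding half up and clamping to gray_levels - 1 are assumptions.
    """
    value = int(feature, 16)
    value_range = 16 ** chars_per_feature
    gray = int(value / value_range * gray_levels + 0.5)
    return min(gray, gray_levels - 1)


table_2 = [["8A", "34", "5D"], ["67", "3A", "B3"], ["04", "3D", "4A"]]
table_3 = [["8A3", "45D", "673"], ["AB3", "043", "D4A"], ["220", "D5F", "0"]]

gray_2 = [[to_gray(f, 2) for f in row] for row in table_2]
gray_3 = [[to_gray(f, 3) for f in row] for row in table_3]

for row in gray_2 + gray_3:
    print("\t".join(str(g) for g in row))
```

Passing the number of characters per feature explicitly, instead of inferring it from the string length, keeps the padded element 0 on the same scale as the other elements of its matrix.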

At this point, the file to be detected has been converted into the corresponding gray-scale image to be detected.

The method provided by the embodiment of the present application has been described above. The apparatus provided by the embodiment of the present application is described below:

Referring to FIG. 4, a schematic structural diagram of an apparatus provided in an embodiment of the present application is shown. The apparatus includes: a word segmentation unit 401, a conversion unit 402, an input unit 403, and a first determination unit 404, wherein:

the word segmentation unit 401 is configured to perform word segmentation on a character string represented by a file to be detected to obtain word segmentation characteristics of the file to be detected and a characteristic matrix of the file to be detected;

a converting unit 402, configured to convert, for each element in the feature matrix, a value of the element into a gray scale value, so as to obtain a to-be-detected gray scale map corresponding to the to-be-detected file;

an input unit 403, configured to input the grayscale image to be detected into a trained virus classifier;

a first determining unit 404, configured to determine whether the file to be detected is a virus file according to the classification result output by the virus classifier.

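As an embodiment, the apparatus further comprises:

a dividing unit, configured to divide a virus sample set into a training sample set and a testing sample set, where the virus sample set includes a plurality of known virus samples;

a training unit, configured to train the deep learning model by using the virus samples in the training sample set to obtain a virus classifier;

a verification unit, configured to verify, for the virus classifier obtained by training, the classification accuracy of the virus classifier by using the virus samples in the test sample set;

a second determining unit, configured to determine that the virus classifier is trained if the classification accuracy reaches a preset accuracy threshold.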

As an embodiment, the training unit is further configured to, if the classification accuracy does not reach a preset accuracy threshold, select a part of the virus samples from the test sample set to continue training the deep learning model until the classification accuracy of the trained virus classifier reaches the preset accuracy threshold.

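As an embodiment, when performing word segmentation on the character string represented by the file to be detected to obtain the word segmentation characteristics of the file to be detected and the characteristic matrix of the file to be detected, the word segmentation unit 401 is configured to:

divide the character string represented by the file to be detected into N word segmentation features according to the principle that a preset number of characters are divided into one word segmentation feature and two adjacent word segmentation features in the character string do not include characters in the same position, where N is a positive integer;

and construct a feature matrix of the file to be detected based on the N word segmentation features.

As an embodiment, when converting the value of the element into a gray value, the converting unit 402 is configured to:

normalize the value of the element based on the gray value range to obtain the corresponding gray value.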

The description of the apparatus shown in fig. 4 is thus completed. In the embodiment of the application, the file identification is converted into the image identification, and the virus file slightly changed or upgraded can be identified by utilizing the characteristic of high identification accuracy of the classifier, so that the omission probability of the virus file is reduced.

The following describes the device provided in an embodiment of the present invention:

Fig. 5 is a schematic diagram of a hardware structure of a device according to an embodiment of the present invention. The device may include a processor 501 and a machine-readable storage medium 502 storing machine-executable instructions. The processor 501 and the machine-readable storage medium 502 may communicate via a system bus 503. By reading and executing the machine-executable instructions in the machine-readable storage medium 502 that correspond to the virus file detection logic, the processor 501 may perform the virus file detection method described above.

The machine-readable storage medium 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium 502 may include at least one of the following storage media: volatile memory, non-volatile memory, or another type of storage medium. The volatile memory may be a random access memory (RAM), and the non-volatile memory may be a flash memory, a storage drive (e.g., a hard disk drive), a solid state disk, or a storage disk (e.g., a compact disk or a DVD).

Embodiments of the present invention also provide a machine-readable storage medium, such as machine-readable storage medium 502 in fig. 5, comprising machine-executable instructions that are executable by processor 501 in a device to implement the virus file detection method described above.

The description of the device shown in Fig. 5 is thus completed.

The above description is only a preferred embodiment of the present application and should not be taken as limiting the present application. Any modifications, equivalents, improvements, and the like made within the spirit and principle of the present application shall fall within the scope of protection of the present application.

Claims

1. A method for virus file detection, the method comprising:

inputting the gray-scale image to be detected into a trained virus classifier;

2. The method of claim 1, wherein before inputting the gray scale image to be detected into the trained virus classifier, the method further comprises:

3. The method of claim 2, wherein the method further comprises:

4. The method of claim 1, wherein the performing word segmentation on the character string represented by the file to be detected to obtain word segmentation characteristics of the file to be detected and a characteristic matrix of the file to be detected comprises:

5. The method of claim 1, wherein said converting the value of the element to a grayscale value comprises:

6. A virus file detection apparatus, comprising:

7. The apparatus of claim 6, wherein the apparatus further comprises:

8. The apparatus of claim 7, wherein:

the training unit is further configured to, if the classification accuracy does not reach a preset accuracy threshold, select some of the virus samples from the test sample set to continue training the deep learning model until the classification accuracy of the trained virus classifier reaches the preset accuracy threshold.

9. The apparatus according to claim 6, wherein the word segmentation unit performs word segmentation on the character string represented by the file to be detected to obtain word segmentation characteristics of the file to be detected and a characteristic matrix of the file to be detected, and includes:

10. The apparatus of claim 6, wherein the conversion unit converts the values of the elements into grayscale values, comprising:

11. A device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to carry out the method steps of any one of claims 1 to 5.

12. A machine-readable storage medium having stored therein machine-executable instructions which, when executed by a processor, cause the processor to perform the method steps of any one of claims 1 to 5.