CN111400707A - File macro virus detection method, device, equipment and storage medium - Google Patents

File macro virus detection method, device, equipment and storage medium

Publication number
CN111400707A
Authority
CN
China
Prior art keywords: file, macro, detected, code, files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010162723.7A
Other languages
Chinese (zh)
Inventor
杨玉华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202010162723.7A
Publication of CN111400707A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/565 Static detection by checking file integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis

Abstract

The application discloses a file macro virus detection method, which comprises the following steps: determining a file to be detected among the obtained target files; extracting macro code features from the file to be detected; and determining, based on the macro code features of the file to be detected and by using a pre-trained detection model, whether the file to be detected carries a macro virus. With the technical scheme provided by the embodiments of the application, the situation in which a file carrying a macro virus is opened or executed, the macro virus is triggered, and the user's computer or documents are damaged can be avoided, and a plurality of files to be detected can be detected continuously in real time. Moreover, by using the macro code features of the file to be detected together with the pre-trained detection model, whether the file carries a macro virus can be determined accurately, which improves the identification accuracy for macro viruses and the ability to detect unknown macro viruses. The application also discloses a file macro virus detection apparatus, device and storage medium, which have corresponding technical effects.

Description

File macro virus detection method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer application technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting macro viruses of a file.
Background
A macro virus is a computer virus that resides in the macros of common office documents or templates, and is characterized by wide prevalence, strong infectivity, and great harm. Once a document carrying a macro virus is opened, the macro virus may be executed, thereby endangering the computer.
At present, one method for detecting macro viruses in files is to use a sandbox dynamic detection technology to place files in a sandbox for execution, obtain an execution report, and determine whether macro viruses are carried.
This method has certain disadvantages. First, a system environment has to be started and destroyed each time a file is detected, so files cannot be detected in real time. In addition, many macro viruses employ anti-sandbox techniques, so this method cannot accurately identify the macro viruses in a file.
Disclosure of Invention
The application aims to provide a method, a device, equipment and a storage medium for detecting macro viruses of files so as to detect whether the macro viruses are carried in the files in real time and improve the detection accuracy.
In order to solve the technical problem, the application provides the following technical scheme:
a file macro virus detection method comprises the following steps:
determining a file to be detected in the obtained target files;
extracting macro code features in the file to be detected;
and determining whether the to-be-detected file carries macro viruses or not by utilizing a pre-trained detection model based on the macro code characteristics of the to-be-detected file.
In a specific embodiment of the present application, the determining a file to be detected in the obtained target file includes:
carrying out format recognition on the target file;
judging whether the target file is an office document file or not according to a format identification result;
if the target file is an office document file, judging whether the target file contains a macro code;
and if the target file contains the macro code, determining the target file as a file to be detected.
In a specific embodiment of the present application, the extracting macro code features in the file to be detected includes:
judging whether the to-be-detected file contains a macro code;
if the to-be-detected file contains the macro code, extracting the macro code from the to-be-detected file to obtain a macro code file;
and extracting macro code features in the macro code file.
In a specific embodiment of the present application, the extracting a macro code from the file to be detected to obtain a macro code file includes:
extracting macro codes line by line in the file to be detected by utilizing a macro code extraction tool;
and splicing the extracted macro codes to obtain a macro code file.
In a specific embodiment of the present application, the determining, based on the macro code feature of the file to be detected and by using a pre-trained detection model, whether the file to be detected carries a macro virus includes:
inputting the macro code characteristics of the file to be detected into a pre-trained detection model to obtain a predicted value;
and determining whether the file to be detected carries the macro virus or not according to the predicted value.
In a specific embodiment of the present application, the determining whether the file to be detected carries a macro virus according to the predicted value includes:
comparing the predicted value with a preset threshold value;
and judging whether the file to be detected carries the macro virus or not according to the comparison result.
In one embodiment of the present application, the detection model is obtained by pre-training through the following steps:
establishing a detection initial model;
acquiring a white file and a black file carrying macro codes;
respectively extracting macro code features in the white file and the black file;
and training the detection initial model based on the macro code characteristics of the white files and the black files to obtain the trained detection model.
In one embodiment of the present application, the macro code features of the white file and the black file include at least one of the following features:
information entropy, registry operation, file reading and writing, network request, shell creation, code conversion, automatic execution, and string obfuscation and splicing.
A file macro virus detection apparatus comprising:
the file determining module is used for determining a file to be detected in the obtained target files;
the characteristic extraction module is used for extracting macro code characteristics in the file to be detected;
and the macro virus detection module is used for determining whether the to-be-detected file carries macro viruses or not by utilizing a pre-trained detection model based on the macro code characteristics of the to-be-detected file.
A file macro virus detection device, comprising:
a memory for storing a computer program;
a processor, configured to implement the steps of any one of the file macro virus detection methods when executing the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the file macro virus detection methods described above.
By applying the technical scheme provided by the embodiments of the application, after the file to be detected is determined among the obtained target files, the macro code features of the file to be detected can be extracted, and whether the file carries a macro virus can be determined from those features by using a pre-trained detection model. The macro code features are obtained from the macro code of the file to be detected without opening or executing the file. This avoids the situation in which a file carrying a macro virus is opened or executed, the macro virus is triggered, and the user's computer or documents are damaged; it also allows a plurality of files to be detected continuously in real time. Moreover, by using the macro code features of the file to be detected together with the pre-trained detection model, whether the file carries a macro virus can be determined accurately, which improves the identification accuracy for macro viruses and the ability to detect unknown macro viruses.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating an implementation of a file macro virus detection method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a file macro virus detection apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a file macro virus detection device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to FIG. 1, which shows an implementation flowchart of a file macro virus detection method provided in an embodiment of the present application, the method may include the following steps:
S110: determining a file to be detected among the obtained target files.
A macro virus is usually written in VBA (Visual Basic for Applications). It depends on the office document itself and is independent of the operating system. It is often propagated through mail attachments and similar channels, with the mail subject or body inducing the user to open the document and enable macros. After the macro virus code is executed, it often releases files or pulls a PE virus from a remote server to consolidate the result of the attack.
The embodiment of the application inspects only the files themselves and can therefore cover multiple protocol scenarios, such as e-mail, intranet SMB (Server Message Block) file sharing, FTP (File Transfer Protocol), and HTTP (HyperText Transfer Protocol) file transfer.
In practical applications, macro virus detection may be performed in any of the above scenarios: when a file is observed in the scenario, that file may be determined as a target file, and the file to be detected is then determined among the target files. Alternatively, macro virus detection may be performed on a file directory: according to a received detection instruction, each file in a specified directory is determined as a target file, and the file to be detected is then determined among the target files. Alternatively, certain specified files carrying macro code may be determined as target files, and the file to be detected is then determined among them.
The file to be detected is a file that currently needs to be checked for macro viruses.
S120: extracting macro code features from the file to be detected.
In the embodiment of the application, it can first be determined whether the file to be detected carries macro code; if it does, the macro code features of the file to be detected can be extracted. The macro code features may include at least one of the following features: information entropy, registry operation, file reading and writing, network request, shell (command shell) creation, code conversion, automatic execution, and string obfuscation and splicing.
Among these, the information entropy feature: measures the amount of information in the macro code;
registry operation feature: if the macro code is macro virus code, it often writes to the registry in order to persist;
file read-write feature: if the macro code is macro virus code, it often releases files: some malicious files contained in the file package may be written to the victim's disk, and an infectious macro virus may infect the victim's other normal files;
network request feature: if the macro code is macro virus code, it is usually only a payload carrier and often pulls the main virus body from the Internet for execution, which requires a network request; normal documents usually do not exhibit this behavior;
shell creation feature: if the macro code is macro virus code, it often creates a wshell (Windows shell application) environment or a PowerShell (a command-line shell and scripting environment) environment to execute commands, which normal documents typically do not do;
code conversion feature: this is a technique often used for macro virus obfuscation; key code is hidden through code conversion to evade signature scanning;
automatic execution feature: if the macro code is macro virus code, the macro code block can be executed automatically when the user opens the document and enables macros; normal documents use this less often;
string obfuscation and splicing feature: if the macro code is macro virus code, it hides key code or information, such as a malicious URL, from the antivirus engine by obfuscating and splicing strings.
Of course, in practical application, other macro code features may also be extracted according to practical situations.
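As an illustration of how the features listed above might be computed from macro code text, the following is a minimal sketch. The keyword patterns, the feature names, and the functions `shannon_entropy` and `extract_features` are hypothetical choices made for this example; the application does not disclose its actual rules.

```python
import math
import re
from collections import Counter

# Hypothetical keyword patterns, one per feature family named in the text.
# They are illustrative only and are not taken from the application.
FEATURE_PATTERNS = {
    "registry_operation": r"RegWrite|RegRead|SaveSetting",
    "file_read_write": r"Open\s.+\sFor\s+(?:Output|Binary|Append)|SaveToFile|FileCopy",
    "network_request": r"XMLHTTP|WinHttp|URLDownloadToFile|https?://",
    "shell_creation": r"WScript\.Shell|powershell|\bShell\s*\(",
    "code_conversion": r"\bChr[WB]?\s*\(|\bAsc\s*\(|StrReverse|Base64",
    "auto_execution": r"\b(?:AutoOpen|Auto_Open|Document_Open|Workbook_Open|AutoExec)\b",
}


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(macro_code: str) -> dict:
    """Map macro code text to a dictionary of numeric features."""
    features = {"information_entropy": shannon_entropy(macro_code)}
    for name, pattern in FEATURE_PATTERNS.items():
        # Each keyword feature is the number of pattern matches in the code.
        features[name] = len(re.findall(pattern, macro_code, flags=re.IGNORECASE))
    # String splicing: count adjacent quoted fragments joined with & or +.
    features["string_splicing"] = len(re.findall(r'"\s*[&+]\s*"', macro_code))
    return features
```

Counting matches rather than recording a yes/no flag keeps more information for a tree-based model, but either encoding would fit the description.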
The macro code features are extracted from the macro code of the file to be detected without opening or executing the file. This avoids the situation in which a file carrying a macro virus is opened or executed, the macro virus is triggered, and the user's computer or documents are damaged, and it reduces the risk of infection.
After the macro code features in the file to be detected are extracted, the operation of step S130 may be continuously performed.
S130: determining, based on the macro code features of the file to be detected and by using a pre-trained detection model, whether the file to be detected carries a macro virus.
In the embodiment of the application, a large number of white files and black files carrying macro code can be obtained in advance. A white file is a file whose macro code is normal code rather than macro virus code, and a black file is a file whose macro code is abnormal code, that is, macro virus code. The detection model can be trained with both white files and black files, or with white files only or black files only.
After the macro code features in the file to be detected are obtained, whether the file to be detected carries macro viruses or not can be determined by utilizing a pre-trained detection model based on the macro code features of the file to be detected. Specifically, the macro code features of the file to be detected can be input into a pre-trained detection model, the macro code features are analyzed by using the detection model to obtain a classification result, and whether the file to be detected carries macro viruses or not is determined based on the classification result.
If the file to be detected is determined to carry the macro virus, the file to be detected can be prohibited from being opened or executed, and alarm information is output, so that a user can process the file to be detected in time.
By applying the method provided by the embodiment of the application, after the file to be detected is determined among the obtained target files, the macro code features of the file to be detected can be extracted, and whether the file carries a macro virus can be determined from those features by using a pre-trained detection model. The macro code features are obtained from the macro code of the file to be detected without opening or executing the file. This avoids the situation in which a file carrying a macro virus is opened or executed, the macro virus is triggered, and the user's computer or documents are damaged; it also allows a plurality of files to be detected continuously in real time. Moreover, by using the macro code features of the file to be detected together with the trained detection model, whether the file carries a macro virus can be determined accurately, which improves the identification accuracy for macro viruses and the ability to detect unknown macro viruses.
In one embodiment of the present application, step S110 may include the steps of:
step one: performing format recognition on the target file;
step two: judging, according to the format recognition result, whether the target file is an office document file, and if so, executing step three;
step three: judging whether the target file contains macro code, and if so, executing step four;
step four: determining the target file as the file to be detected.
For convenience of description, the above four steps are combined for illustration.
In the embodiment of the present application, the target file may be obtained first. The target file may be a file in the currently monitored mail attachment or may be a file in a designated directory.
After the target file is obtained, format recognition can be performed on it. A document file usually has a specific file structure, so its format can be identified fairly accurately through magic numbers, file header fields, and the like.
According to the format recognition result, it can be judged whether the target file is an office document file. Specifically, the judgment may use the mime_type (MIME media type value) carried in the format recognition result: if the mime_type is doc, excel, or ppt, the target file may be determined to be an office document file.
If the target file is judged to be an office document file, it may further be judged whether the target file contains macro code. Specifically, a macro code extraction tool, such as oletools, may be used to attempt to extract the macro code from the target file line by line. If macro code can be extracted, it may be determined that the target file contains macro code; if the extraction fails, it may be determined that the target file does not contain macro code, and no subsequent operations need be performed on the target file.
When the target file contains macro code, the target file can be determined as the file to be detected and subsequent detection operations can be performed. If macro code extraction has already been attempted on the target file with the macro code extraction tool, the extracted macro code can be used to obtain the macro code features.
In the embodiment of the application, not all files are determined as files to be detected. Instead, only target files that format recognition identifies as office document files and that contain macro code are determined as files to be detected and subjected to macro virus detection, which avoids wasting detection resources and improves detection efficiency.
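The format recognition and macro presence checks described above can be sketched with magic numbers alone. The sketch below is an assumption-laden illustration, not the application's implementation: the function names are hypothetical, it recognizes only the OLE2 compound file signature and ZIP-based OOXML packages, and it checks for macro code only in OOXML packages (by looking for a `vbaProject.bin` part). Legacy OLE2 files are conservatively kept as candidates, since confirming their macro content needs a real parser such as oletools.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls/.ppt container
ZIP_MAGIC = b"PK\x03\x04"                        # OOXML documents are ZIP packages


def identify_format(data: bytes) -> str:
    """Return 'ole2', 'ooxml', or 'other' based on the leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as package:
                # Every OOXML package has this part at its root.
                if "[Content_Types].xml" in package.namelist():
                    return "ooxml"
        except zipfile.BadZipFile:
            pass
    return "other"


def ooxml_has_macro_part(data: bytes) -> bool:
    """True if the OOXML package contains a VBA project part."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return any(name.lower().endswith("vbaproject.bin")
                   for name in package.namelist())


def is_candidate(data: bytes) -> bool:
    """Decide whether a target file should become a file to be detected."""
    kind = identify_format(data)
    if kind == "ooxml":
        return ooxml_has_macro_part(data)
    # Assumption of this sketch: OLE2 files are kept and checked later
    # by a macro extraction tool.
    return kind == "ole2"
```

Files that are neither OLE2 nor OOXML are dropped at this stage, which matches the stated goal of not spending detection resources on irrelevant files.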
In one embodiment of the present application, step S120 may include the steps of:
step one: judging whether the file to be detected contains macro code, and if so, executing step two;
step two: extracting the macro code from the file to be detected to obtain a macro code file;
step three: extracting macro code features from the macro code file.
For convenience of description, the above three steps are combined for illustration.
After the file to be detected is determined, whether the file to be detected contains the macro code or not can be judged, if yes, the macro code can be extracted from the file to be detected, and the macro code file is obtained. Specifically, macro codes can be extracted line by line in the file to be detected by using a macro code extraction tool, such as oletools, and then the extracted macro codes are spliced to obtain a macro code file.
After the macro code file is obtained, macro code features may be extracted in the macro code file.
In the embodiment of the application, the macro code is first extracted from the file to be detected to obtain a macro code file, and the macro code features are then extracted from the macro code file. A document file is actually a file package; if the whole file were used as the object of feature extraction, the features would be easily disturbed by unrelated content, whereas the main attack technique of a document file lies in its macro code. Extracting the macro code first and then extracting features from the macro code file therefore gives better results.
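The extract-then-splice step can be sketched as follows. `splice_macros` and `extract_modules_with_oletools` are hypothetical helper names introduced for this example; the second one relies on the third-party oletools package (its `VBA_Parser` class) and is imported lazily so that the splicing logic can be used and tested without it. The module header comment format is also an assumption.

```python
from typing import Iterable, Iterator, Tuple


def splice_macros(modules: Iterable[Tuple[str, str]]) -> str:
    """Join (module_name, vba_code) pairs line by line into one macro code text."""
    parts = []
    for module_name, vba_code in modules:
        # A VBA comment line marks where each module starts.
        parts.append(f"' --- module: {module_name} ---")
        for line in vba_code.splitlines():
            line = line.rstrip()
            if line:  # drop blank lines, keep indentation
                parts.append(line)
    return "\n".join(parts)


def extract_modules_with_oletools(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (module_name, vba_code) pairs from an office document.

    Requires the third-party oletools package to be installed.
    """
    from oletools.olevba import VBA_Parser  # lazy import, optional dependency

    parser = VBA_Parser(path)
    try:
        if parser.detect_vba_macros():
            for _file, _stream, vba_filename, vba_code in parser.extract_macros():
                yield vba_filename, vba_code
    finally:
        parser.close()
```

A caller would typically write `splice_macros(extract_modules_with_oletools(path))` to a file, or pass the resulting text directly to feature extraction.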
In one embodiment of the present application, step S130 may include the steps of:
step one: inputting the macro code features of the file to be detected into the pre-trained detection model to obtain a predicted value;
step two: determining whether the file to be detected carries a macro virus according to the predicted value.
For convenience of description, the above two steps are combined for illustration.
After the macro code features in the file to be detected are obtained, the macro code features of the file to be detected can be input into the detection model to obtain a predicted value. The detection model may be trained based on pre-obtained white and/or black files carrying macro code.
Depending on the type of detection model, the predicted value and the likelihood that the file to be detected carries a macro virus may be related in different ways, and different thresholds may be set for different detection models. The predicted value is compared with a preset threshold, and whether the file to be detected carries a macro virus is judged according to the comparison result. In some detection models, a larger predicted value means a higher likelihood that the file carries a macro virus and a smaller predicted value means a lower likelihood; in that case, the file can be judged to carry a macro virus when the predicted value is greater than the preset threshold. In other detection models, a larger predicted value means a lower likelihood and a smaller predicted value means a higher likelihood; in that case, the file can be judged to carry a macro virus when the predicted value is smaller than the preset threshold.
In practical applications, the threshold may be set based on historical data and experience.
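The two comparison directions described above can be captured in one small function. This is a minimal sketch; the function name and the `higher_is_malicious` flag are hypothetical, and the threshold values in the usage lines are arbitrary.

```python
def carries_macro_virus(predicted_value: float, threshold: float,
                        higher_is_malicious: bool = True) -> bool:
    """Compare a model's predicted value with a preset threshold.

    higher_is_malicious selects the direction: True for models whose larger
    outputs mean a higher likelihood of a macro virus, False for models
    whose smaller outputs mean a higher likelihood.
    """
    if higher_is_malicious:
        return predicted_value > threshold
    return predicted_value < threshold


# Usage with arbitrary example values:
print(carries_macro_virus(0.83, threshold=0.5))                              # True
print(carries_macro_virus(0.83, threshold=0.5, higher_is_malicious=False))  # False
```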
In one embodiment of the present application, the detection model may be obtained by pre-training by:
step one: establishing a detection initial model;
step two: acquiring white files and black files carrying macro code;
step three: extracting macro code features from the white files and the black files respectively;
step four: training the detection initial model based on the macro code features of the white files and the black files to obtain the trained detection model.
For convenience of description, the above four steps are combined for illustration.
In the embodiment of the present application, a detection initial model may be established first. The detection initial model may be an XGBoost (eXtreme Gradient Boosting) model, and may also be a decision tree model, a naive bayes model, or the like.
XGBoost is a machine learning algorithm whose core idea is to integrate a plurality of tree models to form a strong classifier.
White files and black files can be obtained through file collection and similar means. Both the white files and the black files carry macro code; the macro code in a white file is normal code, and the macro code in a black file is macro virus code.
And then obtaining the macro code characteristics in the white file and the black file. Specifically, for each of the white file and the black file, a macro code extraction tool may be used to extract macro codes line by line in the file, and the extracted macro codes are spliced to obtain a macro code file, and a macro code feature is extracted from the macro code file.
The macro code features of the white files and the black files include at least one of the following features: information entropy, registry operation, file reading and writing, network request, shell creation, code conversion, automatic execution, and string obfuscation and splicing.
Based on the macro code features of the white files and the black files, the detection initial model can be trained to obtain the trained detection model. Specifically, one part of the white files and black files can be added to a training set and the other part to a test set. The training set is used to train the detection initial model and adjust its parameters, and the test set is used to test the trained detection initial model. If the accuracy of the test result is higher than a set accuracy threshold, the currently trained detection initial model can be determined as the trained detection model. If the accuracy is not higher than the accuracy threshold, training of the detection initial model can be repeated, or some white files and black files can be added to the training set before training is repeated, until the accuracy of the test result is higher than the accuracy threshold. The trained detection model can then be used to detect whether the file to be detected carries a macro virus.
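The train, test, and repeat procedure above can be sketched in pure Python. To keep the example dependency-free, it swaps in a one-level decision tree (a decision stump) in place of XGBoost; a decision tree is one of the model types the text allows, but the stump, the 70/30 split, the reshuffling on retry, and all function names are assumptions of this sketch.

```python
import random
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label); 1 = black, 0 = white
Model = Tuple[int, float, int]        # (feature index, threshold, polarity)


def predict(model: Model, features: Sequence[float]) -> int:
    index, threshold, polarity = model
    above = features[index] > threshold
    return int(above) if polarity == 1 else int(not above)


def accuracy(model: Model, samples: List[Sample]) -> float:
    return sum(predict(model, f) == y for f, y in samples) / len(samples)


def train_stump(samples: List[Sample]) -> Model:
    """Pick the single feature/threshold split with the most correct labels."""
    best, best_correct = (0, 0.0, 1), -1
    for index in range(len(samples[0][0])):
        for threshold in sorted({f[index] for f, _ in samples}):
            for polarity in (1, -1):
                candidate = (index, threshold, polarity)
                correct = sum(predict(candidate, f) == y for f, y in samples)
                if correct > best_correct:
                    best, best_correct = candidate, correct
    return best


def train_until_accurate(samples: List[Sample], accuracy_threshold: float = 0.9,
                         max_rounds: int = 5, seed: int = 0
                         ) -> Tuple[Optional[Model], float]:
    """Repeat split/train/test until test accuracy exceeds the threshold."""
    rng = random.Random(seed)
    score = 0.0
    for _ in range(max_rounds):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        split = int(len(shuffled) * 0.7)  # 70% training set, 30% test set
        train_set, test_set = shuffled[:split], shuffled[split:]
        model = train_stump(train_set)
        score = accuracy(model, test_set)
        if score > accuracy_threshold:
            return model, score
    return None, score  # threshold not reached; more samples may be needed
```

Returning `None` when the threshold is never reached leaves it to the caller to add more white and black files to the sample set and try again, which corresponds to the second retry option in the text.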
In practical application, the detection model can be trained according to a set time interval or when a training instruction is received, so that parameters in the detection model are continuously adjusted, and the detection accuracy is higher.
Corresponding to the above method embodiments, the present application embodiment further provides a file macro virus detection apparatus, and the file macro virus detection apparatus described below and the file macro virus detection method described above may be referred to in correspondence.
Referring to fig. 2, the apparatus includes the following modules:
a file determining module 210, configured to determine a file to be detected in the obtained target files;
the feature extraction module 220 is configured to extract and obtain macro code features in the file to be detected;
and the macro virus detection module 230 is configured to determine whether the to-be-detected file carries macro viruses or not by using a pre-trained detection model based on the macro code features of the to-be-detected file.
By applying the apparatus provided by the embodiment of the application, after the file to be detected is determined among the obtained target files, the macro code features of the file to be detected can be extracted, and whether the file carries a macro virus can be determined from those features by using a pre-trained detection model. The macro code features are obtained from the macro code of the file to be detected without opening or executing the file. This avoids the situation in which a file carrying a macro virus is opened or executed, the macro virus is triggered, and the user's computer or documents are damaged; it also allows a plurality of files to be detected continuously in real time. Moreover, by using the macro code features of the file to be detected together with the trained detection model, whether the file carries a macro virus can be determined accurately, which improves the identification accuracy for macro viruses and the ability to detect unknown macro viruses.
In one embodiment of the present application, the file determining module 210 is configured to:
carrying out format recognition on the target file;
judging whether the target file is an office document file or not according to the format identification result;
if the target file is an office document file, judging whether the target file contains macro codes or not;
and if the target file contains the macro code, determining the target file as the file to be detected.
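The screening steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: format recognition is approximated by leading magic bytes, a macro in an OOXML package is assumed to show up as a `vbaProject.bin` part, and a macro in a legacy OLE2 document is guessed from a crude byte search for the UTF-16LE stream name `_VBA_PROJECT`. The function names `identify_format`, `contains_macro` and `is_file_to_detect` are hypothetical.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls/.ppt container
ZIP_MAGIC = b"PK\x03\x04"                        # OOXML packages are ZIP archives


def identify_format(data: bytes) -> str:
    """Return 'ole2', 'ooxml' or 'other' based on the file's leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                # An OOXML package carries this part at its root.
                if "[Content_Types].xml" in zf.namelist():
                    return "ooxml"
        except zipfile.BadZipFile:
            pass
    return "other"


def contains_macro(data: bytes, fmt: str) -> bool:
    """Heuristic macro check; reads bytes only and never opens the document."""
    if fmt == "ooxml":
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return any(n.lower().endswith("vbaproject.bin") for n in zf.namelist())
    if fmt == "ole2":
        # Assumption: stream names in OLE2 directories are UTF-16LE encoded,
        # so a raw byte search is used here as a rough indicator only.
        return "_VBA_PROJECT".encode("utf-16-le") in data
    return False


def is_file_to_detect(data: bytes) -> bool:
    """A target file becomes a file to be detected only if it is an office
    document and contains macro code."""
    fmt = identify_format(data)
    return fmt != "other" and contains_macro(data, fmt)
```

Files that fail either check are skipped, so only office documents that actually contain macro code reach feature extraction.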
In one embodiment of the present application, the feature extraction module 220 is configured to:
judging whether the file to be detected contains macro codes or not;
if the to-be-detected file contains the macro code, extracting the macro code from the to-be-detected file to obtain a macro code file;
and extracting macro code features in the macro code file.
In one embodiment of the present application, the feature extraction module 220 is configured to:
extracting macro codes line by line in the file to be detected by utilizing a macro code extraction tool;
and splicing the extracted macro codes to obtain a macro code file.
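The splicing step can be illustrated with a short sketch. The application does not name the macro code extraction tool, so the sketch assumes a hypothetical extractor that yields `(module_name, source)` pairs (an open-source tool such as olevba from the oletools project could play that role). The function name `splice_macro_code` and the separator comment format are illustrative assumptions.

```python
from typing import Iterable, Tuple


def splice_macro_code(modules: Iterable[Tuple[str, str]]) -> str:
    """Join extracted macro modules into a single macro code text.

    `modules` yields (module_name, source) pairs, as a hypothetical
    extraction tool might return them.
    """
    out_lines = []
    for name, source in modules:
        # A VBA comment line marks where each module starts.
        out_lines.append(f"' ---- module: {name} ----")
        # Walk the source line by line, normalising line endings and
        # dropping trailing whitespace.
        for line in source.splitlines():
            out_lines.append(line.rstrip())
    return "\n".join(out_lines) + "\n"
```

The returned text can be written to disk as the macro code file, or passed directly to feature extraction.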
In one embodiment of the present application, the macro virus detection module 230 is configured to:
inputting the macro code characteristics of the file to be detected into a detection model trained in advance to obtain a predicted value;
and determining whether the file to be detected carries the macro virus or not according to the predicted value.
In one embodiment of the present application, the macro virus detection module 230 is configured to:
comparing the predicted value with a preset threshold value;
and judging whether the file to be detected carries the macro virus or not according to the comparison result.
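The comparison step can be written as a one-line decision rule. The sketch below assumes that the predicted value is a score in which larger means more likely malicious, and that a score equal to the threshold counts as a detection. The default threshold of 0.5 and the name `carries_macro_virus` are illustrative assumptions, since the application does not fix either.

```python
def carries_macro_virus(predicted_value: float, threshold: float = 0.5) -> bool:
    """Compare the model's predicted value with a preset threshold.

    Assumption: higher scores indicate a macro virus, and a score equal
    to the threshold is treated as a detection.
    """
    return predicted_value >= threshold
```

Raising the threshold trades missed detections for fewer false alarms; in practice the value would be tuned on held-out samples.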
In a specific embodiment of the present application, the apparatus further includes a model training module, configured to train and obtain the detection model through the following steps:
establishing a detection initial model;
acquiring a white file and a black file carrying macro codes;
respectively extracting macro code features in the white file and the black file;
and training the detection initial model based on the macro code characteristics of the white files and the black files to obtain a trained detection model.
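The training steps can be sketched with a small classifier. The application does not specify the model type, so logistic regression trained by stochastic gradient descent is swapped in here purely for illustration. The feature vectors are assumed to be plain lists of numbers, with white files labelled 0 and black files labelled 1. All names and hyperparameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Return a predicted value in (0, 1); higher means more likely a macro virus."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_detection_model(
    white: List[Sequence[float]],
    black: List[Sequence[float]],
    epochs: int = 500,
    lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Fit weights and bias on white (label 0) and black (label 1) feature vectors."""
    samples = [(x, 0.0) for x in white] + [(x, 1.0) for x in black]
    n_features = len(samples[0][0])
    weights = [0.0] * n_features  # the "detection initial model"
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            # Gradient of the log loss with respect to the logit.
            err = predict(weights, bias, x) - y
            for j in range(n_features):
                weights[j] -= lr * err * x[j]
            bias -= lr * err
    return weights, bias
```

After training, `predict` returns the predicted value for a new feature vector. Any other classifier trained on the same labelled features would fit the described steps equally well.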
In one embodiment of the present application, the macro code features of the white and black files include at least one of the following features:
information entropy, registry operations, file reading and writing, network requests, shell creation, encoding conversion, automatic execution, and string obfuscation and concatenation.
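A minimal sketch of how such features might be computed from macro code text is given below. The entropy is the Shannon entropy over characters, and the remaining features are simple case-insensitive pattern counts. The keyword patterns are illustrative assumptions chosen from common VBA identifiers, not the feature definitions of the application, and a practical feature set would be considerably larger.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; each maps a feature name to a regex.
FEATURE_PATTERNS = {
    "registry_operation": r"\b(RegWrite|RegRead|RegDelete)\b",
    "file_read_write": r"\b(Open|Kill|FileCopy|CreateTextFile|SaveToFile)\b",
    "network_request": r"(XMLHTTP|WinHttpRequest|URLDownloadToFile)",
    "shell_creation": r"(\bShell\b|WScript\.Shell|powershell|cmd\.exe)",
    "code_conversion": r"\b(Chr|ChrW|Asc|StrConv|Hex)\s*\(",
    "auto_execution": r"\b(AutoOpen|AutoExec|Document_Open|Workbook_Open)\b",
    "string_concatenation": r"\"\s*[&+]\s*\"",
}


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(macro_code: str) -> dict:
    """Return a feature dictionary: entropy plus one match count per pattern."""
    features = {"entropy": shannon_entropy(macro_code)}
    for name, pattern in FEATURE_PATTERNS.items():
        features[name] = len(re.findall(pattern, macro_code, flags=re.IGNORECASE))
    return features
```

The dictionary values can be placed in a fixed order to form the numeric feature vector that is passed to the detection model.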
Corresponding to the above method embodiment, an embodiment of the present application further provides a file macro virus detection device, including:
a memory for storing a computer program;
and the processor is used for realizing the steps of the file macro virus detection method when executing the computer program.
Fig. 3 is a schematic structural diagram of the file macro virus detection device. As shown in fig. 3, the file macro virus detection device may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 communicate with one another through the communication bus 13.
In the embodiment of the present application, the processor 10 may be a Central Processing Unit (CPU), an application specific integrated circuit, a digital signal processor, a field programmable gate array or other programmable logic device, etc.
The processor 10 may call a program stored in the memory 11, and in particular, the processor 10 may perform operations in an embodiment of the file macro virus detection method.
The memory 11 is used for storing one or more programs. A program may include program code, and the program code includes computer operation instructions. In this embodiment, the memory 11 stores at least a program for implementing the following functions:
determining a file to be detected in the obtained target files;
extracting macro code features in a file to be detected;
and determining whether the file to be detected carries macro viruses or not by utilizing a pre-trained detection model based on the macro code characteristics of the file to be detected.
In one possible implementation, the memory 11 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a feature extraction function or a model training function), and the like; the data storage area may store data created during use, such as feature data and model data.
Further, the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other non-volatile solid-state storage device.
The communication interface 12 may be an interface of a communication module for connecting with other devices or systems.
It should be noted that the structure shown in fig. 3 does not constitute a limitation on the file macro virus detection device in the embodiment of the present application. In practical applications, the file macro virus detection device may include more or fewer components than those shown in fig. 3, or a combination of certain components.
Corresponding to the above method embodiments, the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the file macro virus detection method are implemented.
The embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), internal memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Specific examples are used herein to explain the principle and implementation of the present application, and the above description of the embodiments is only intended to help in understanding the technical solution and core idea of the present application. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (11)

1. A file macro virus detection method is characterized by comprising the following steps:
determining a file to be detected in the obtained target files;
extracting macro code features in the file to be detected;
and determining whether the to-be-detected file carries macro viruses or not by utilizing a pre-trained detection model based on the macro code characteristics of the to-be-detected file.
2. The method according to claim 1, wherein the determining the files to be detected in the obtained target files comprises:
carrying out format recognition on the target file;
judging whether the target file is an office document file or not according to a format identification result;
if the target file is an office document file, judging whether the target file contains a macro code;
and if the target file contains the macro code, determining the target file as a file to be detected.
3. The method according to claim 1, wherein the extracting macro code features in the file to be detected comprises:
judging whether the to-be-detected file contains a macro code;
if the to-be-detected file contains the macro code, extracting the macro code from the to-be-detected file to obtain a macro code file;
and extracting macro code features in the macro code file.
4. The method according to claim 3, wherein the extracting macro codes from the file to be detected to obtain a macro code file comprises:
extracting macro codes line by line in the file to be detected by utilizing a macro code extraction tool;
and splicing the extracted macro codes to obtain a macro code file.
5. The method according to claim 1, wherein the determining whether the file to be detected carries macro viruses or not by using a pre-trained detection model based on the macro code features of the file to be detected comprises:
inputting the macro code characteristics of the file to be detected into a pre-trained detection model to obtain a predicted value;
and determining whether the file to be detected carries the macro virus or not according to the predicted value.
6. The method according to claim 5, wherein the determining whether the file to be detected carries the macro virus according to the predicted value comprises:
comparing the predicted value with a preset threshold value;
and judging whether the file to be detected carries the macro virus or not according to the comparison result.
7. The method according to any one of claims 1 to 6, wherein the detection model is obtained by pre-training by:
establishing a detection initial model;
acquiring a white file and a black file carrying macro codes;
respectively extracting macro code features in the white file and the black file;
and training the detection initial model based on the macro code characteristics of the white files and the black files to obtain the trained detection model.
8. The method of claim 7, wherein the macro-code features of the white and black files comprise at least one of:
information entropy, registry operations, file reading and writing, network requests, shell creation, encoding conversion, automatic execution, and string obfuscation and concatenation.
9. A file macro virus detection apparatus, comprising:
the file determining module is used for determining a file to be detected in the obtained target files;
the characteristic extraction module is used for extracting macro code characteristics in the file to be detected;
and the macro virus detection module is used for determining whether the to-be-detected file carries macro viruses or not by utilizing a pre-trained detection model based on the macro code characteristics of the to-be-detected file.
10. A file macro virus detection device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the file macro virus detection method according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the file macro virus detection method according to any one of claims 1 to 8.
CN202010162723.7A 2020-03-10 2020-03-10 File macro virus detection method, device, equipment and storage medium Pending CN111400707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010162723.7A CN111400707A (en) 2020-03-10 2020-03-10 File macro virus detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111400707A true CN111400707A (en) 2020-07-10

Family

ID=71428708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010162723.7A Pending CN111400707A (en) 2020-03-10 2020-03-10 File macro virus detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111400707A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102841999A (en) * 2012-07-16 2012-12-26 北京奇虎科技有限公司 Method and device for detecting macro virus of files
CN103246847A (en) * 2013-05-13 2013-08-14 腾讯科技(深圳)有限公司 Method and device for scanning and killing macro viruses
WO2014183434A1 (en) * 2013-05-13 2014-11-20 Tencent Technology (Shenzhen) Company Limited Method and device for removing macro virus
CN109063482A (en) * 2018-08-09 2018-12-21 博彦科技股份有限公司 Macrovirus recognition methods, device, storage medium and processor

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580045A (en) * 2020-12-11 2021-03-30 杭州安恒信息技术股份有限公司 Method, device and medium for detecting malicious document based on macro encryption
CN112818347A (en) * 2021-02-22 2021-05-18 深信服科技股份有限公司 File label determination method, device, equipment and storage medium
CN112818347B (en) * 2021-02-22 2024-04-09 深信服科技股份有限公司 File tag determining method, device, equipment and storage medium
CN113065132A (en) * 2021-03-25 2021-07-02 深信服科技股份有限公司 Confusion detection method and device for macro program, electronic equipment and storage medium
CN113065132B (en) * 2021-03-25 2023-11-03 深信服科技股份有限公司 Method and device for detecting confusion of macro program, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200710