CN112883375A - Malicious file identification method, device, equipment and storage medium - Google Patents

Info

Publication number
CN112883375A
Authority
CN
China
Prior art keywords
file
sample
sample file
malicious
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110146065.7A
Other languages
Chinese (zh)
Inventor
刘彬彬
杨达明
李泽莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202110146065.7A
Publication of CN112883375A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The invention belongs to the technical field of malicious file identification, and discloses a malicious file identification method, device, equipment and storage medium. The method comprises the following steps: acquiring a sample file, and performing feature extraction on the sample file to obtain a first feature of the sample file; obtaining, based on memory mapping, a dimension-reduced second feature from the first feature; and performing artificial intelligence prediction on the sample file according to the second feature to identify whether the sample file is a malicious file. Compared with the existing approach, which loads all the features of the sample file into memory for feature dimension reduction before identifying malicious files, the method effectively addresses the problem of excessive runtime memory usage during malicious file identification.

Description

Malicious file identification method, device, equipment and storage medium
Technical Field
The present invention relates to the technical field of malicious file identification, and in particular, to a malicious file identification method, apparatus, device, and storage medium.
Background
Malicious file identification usually begins with a dimension reduction operation. Some algorithms perform poorly on high-dimensional data, and dimension reduction improves their usability. Dimension reduction can also resolve multicollinearity by deleting redundant features. A dimension reduction model generally has a fairly complex data structure. For example, the original data structure comprises the first-order features, the vector dimension information that maps the first-order features to the second-order features, the number of first-order features, the number of second-order features, and the like; the vector dimension information generally includes dimension coordinates, weights, and similar information. Such a data structure often needs a large amount of memory, and excessive memory occupancy can make the computer stutter and run sluggishly. How to optimize memory usage during the dimension reduction operation has therefore become an urgent problem.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a malicious file identification method, a malicious file identification device, malicious file identification equipment and a storage medium, and aims to solve the technical problem that memory occupation is too high when malicious file identification is carried out in the prior art.
In order to achieve the above object, the present invention provides a malicious file identification method, including the following steps:
acquiring a sample file, and performing feature extraction on the sample file to acquire a first feature of the sample file;
obtaining a second feature subjected to dimension reduction processing according to the first feature based on a memory mapping mode;
and carrying out artificial intelligence prediction on the sample file according to the second characteristic so as to identify whether the sample file is a malicious file.
In a possible embodiment, the obtaining the second feature subjected to the dimension reduction processing according to the first feature based on the memory mapping manner includes:
performing dimension reduction operation on the first feature based on a memory mapping mode to obtain a second feature.
In a possible embodiment, said performing artificial intelligence prediction on said sample file according to said second feature comprises:
extracting character string features of the sample file according to the second features;
and predicting the character string characteristics by using a preset prediction model so as to identify whether the sample file is a malicious file.
In a possible embodiment, the obtaining the second feature subjected to the dimension reduction processing according to the first feature based on the memory mapping manner includes:
acquiring a mapping relation table;
the mapping relation table comprises a mapping relation between the position pointers of the first characteristic and the second characteristic;
and determining the second characteristic through a memory mapping mode according to the position pointer of the second characteristic.
In a possible embodiment, the determining the second feature according to the location pointer of the second feature by a memory mapping method includes:
performing pointer offset operation on the position pointer of the second characteristic to obtain an offset position pointer;
and determining the second characteristic through a memory mapping mode based on the offset position pointer.
In a possible embodiment, after the step of performing artificial intelligence prediction on the sample file according to the second feature, the method further includes:
when the sample file is identified to be a malicious file, loading a family classification model to a memory;
and carrying out virus name classification on the sample file according to the family classification model to obtain a virus name classification result of the sample file. Because the family classification model is loaded into the memory dynamically, only when the sample file is identified as a malicious file, memory occupation is reduced, which solves the technical problem of excessive memory occupation caused by keeping the model loaded when no virus name classification is being carried out.
In a possible embodiment, the step of performing virus name classification on the sample file according to the family classification model to obtain a virus name classification result of the sample file includes:
obtaining virus characteristics in the sample file according to the family classification model;
and carrying out virus name classification on the sample file according to the virus characteristics to obtain a virus name classification result of the sample file.
In a possible embodiment, after the step of obtaining the virus name classification result of the sample file, the method further includes:
and sending the virus name classification result to a target terminal so that the target terminal analyzes and displays the sample file and the corresponding virus name classification result.
In a possible embodiment, the sample file is a portable executable file.
In addition, in order to achieve the above object, the present invention further provides a malicious file identification apparatus, which includes an obtaining module, a dimension reduction module, and an artificial intelligence prediction module;
the acquisition module is used for acquiring a sample file, and extracting the characteristics of the sample file to acquire the first characteristics of the sample file;
the dimension reduction module is used for obtaining a second feature subjected to dimension reduction processing according to the first feature based on a memory mapping mode;
and the artificial intelligence prediction module is used for carrying out artificial intelligence prediction on the sample file according to the second characteristic so as to identify whether the sample file is a malicious file.
In addition, in order to achieve the above object, the present invention further provides a malicious file identification device, including: a memory, a processor, and a malicious file identification program stored in the memory and executable on the processor, the malicious file identification program being configured to implement the steps of the malicious file identification method described above.
In addition, to achieve the above object, the present invention further provides a storage medium having a malicious file identification program stored thereon, where the malicious file identification program implements the steps of the malicious file identification method as described above when executed by a processor.
The method comprises the steps of obtaining a sample file, and extracting characteristics of the sample file to obtain first characteristics of the sample file; obtaining a second feature subjected to dimension reduction processing according to the first feature based on a memory mapping mode; and carrying out artificial intelligence prediction on the sample file according to the second characteristic so as to identify whether the sample file is a malicious file. Compared with the existing mode of loading all the characteristics of the sample file into the memory for characteristic dimension reduction and further identifying the malicious file, the method provided by the invention can effectively solve the problem of overhigh operating memory during malicious file identification.
Drawings
FIG. 1 is a schematic structural diagram of a malicious file identification device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a malicious file identification method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a malicious file identification method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a malicious file identification method according to a third embodiment of the present invention;
FIG. 5 is a schematic diagram of a functional implementation flow of an embodiment of a malicious file identification method according to the present invention;
FIG. 6 is a block diagram illustrating a first embodiment of a malicious file identification apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a malicious file identification device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the malicious file identification device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in FIG. 1 does not limit the malicious file identification device, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a data storage module, a network communication module, a user interface module, and a malicious file identification program.
In the malicious file identification device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 may be disposed in the malicious file identification device, which invokes, through the processor 1001, the malicious file identification program stored in the memory 1005 and executes the malicious file identification method provided by the embodiment of the present invention.
Based on the malicious file identification device, an embodiment of the present invention provides a malicious file identification method, and referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the malicious file identification method according to the present invention.
In this embodiment, the malicious file identification method includes the following steps:
Step S10: acquiring a sample file, and performing feature extraction on the sample file to obtain a first feature of the sample file.
It should be noted that the execution subject of this embodiment may be a computing device capable of network communication and program execution, such as a mobile phone, a tablet, or a personal computer. This embodiment and the following embodiments are described by taking an artificial intelligence engine as an example.
It should be understood that the sample file may be a portable executable (PE) file awaiting feature extraction for dimension reduction, virus scanning and removal, classification, or the like. A PE file is an executable file on the Microsoft Windows operating system; for example, the common EXE, DLL, OCX, SYS, and COM files may be PE files. Feature extraction converts arbitrary data (such as text or images) into numerical features that can be used for machine learning, and includes text feature extraction, dictionary feature extraction, and the like. Performing feature extraction on the sample file may mean extracting specific character strings and similar content from the sample file, according to its file structure and format, to form a first feature that the artificial intelligence engine can analyze. The first feature may be a byte statistical feature, a character string feature, a section information feature, an import/export function feature, a file information feature, or the like of the sample file.
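For illustration only, the following Python sketch shows one way two of the first-feature types named above (a byte statistical feature and a character string feature) might be computed. The function names, the four-character minimum string length, and the returned dictionary layout are assumptions made for this example and are not specified by the embodiment.

```python
import re
import struct
from collections import Counter

# Runs of at least four printable ASCII bytes (an illustrative threshold).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def looks_like_pe(data: bytes) -> bool:
    """Check the 'MZ' DOS magic and the PE signature located via e_lfanew."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 32-bit little-endian field at 0x3C holds the PE header offset.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def byte_histogram(data: bytes) -> list:
    """Byte statistical feature: relative frequency of each byte value."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]


def printable_strings(data: bytes) -> list:
    """Character string feature: printable ASCII runs found in the file."""
    return PRINTABLE_RUN.findall(data)


def extract_first_feature(data: bytes) -> dict:
    """Bundle the illustrative features into one hypothetical structure."""
    return {
        "is_pe": looks_like_pe(data),
        "byte_histogram": byte_histogram(data),
        "strings": printable_strings(data),
    }
```

Section information and import/export function features would require parsing the PE headers in more detail and are omitted from this sketch.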
Step S20: obtaining, based on a memory mapping mode, a second feature subjected to dimension reduction processing according to the first feature.
It should be noted that the dimension reduction processing may reduce the number of features along some dimensions so that the resulting features are uncorrelated with one another; the dimension reduction method may be feature selection, principal component analysis, or the like. Memory mapping maps a file or other object into the address space of a process, establishing a one-to-one correspondence between the disk address of the file and a range of virtual addresses in the virtual address space of the process. Once the mapping is established, the process can read and write this memory range through pointers, and the system automatically writes dirty pages back to the corresponding file on disk; that is, operations on the file are completed without calling system calls such as read and write.
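As a minimal sketch of the memory mapping behavior described above, the following Python example uses the standard `mmap` module to read and modify a file through a mapped view instead of explicit read and write calls. The helper names are hypothetical and chosen only for this illustration.

```python
import mmap


def read_slice_via_mmap(path, offset, length):
    """Read bytes from a file through a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing the mapping touches only the pages that back the slice.
            return mm[offset:offset + length]


def patch_byte_via_mmap(path, offset, value):
    """Overwrite one byte through a writable mapping; return the old value."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
            old = mm[offset]
            mm[offset] = value  # plain memory store, no write() call
            mm.flush()          # ask the system to write dirty pages back
            return old
```

Note that `mmap` cannot map an empty file, so both helpers assume the file at `path` already has content.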
It should be understood that a dimension reduction model generally has a fairly complex data structure. For example, the original data structure is composed of the first-order features, the vector dimension information that maps the first-order features to the second-order features, the number of first-order features, the number of second-order features, and the like; the vector dimension information generally includes dimension coordinates, weights, and the like. Such a data structure usually needs a large amount of memory. Existing feature dimension reduction must load the entire data structure into memory to work properly, at which point memory occupancy becomes too high and can make the computer stutter and run sluggishly. In this embodiment, the information that would otherwise have to be stored in memory for dimension reduction is accessed through memory mapping, so that when the dimension reduction information is needed it is read through pointers, which reduces the excessive memory usage of the artificial intelligence engine at run time.
Step S30: performing artificial intelligence prediction on the sample file according to the second feature to identify whether the sample file is a malicious file.
It should be noted that artificial intelligence here may mean training a machine learning algorithm on a large amount of historical data, finding patterns in that data, and using them to make predictions. Performing artificial intelligence prediction on the sample file according to the second feature may be: extracting a character string feature of the sample file according to the second feature, and performing artificial intelligence prediction on the character string feature with a preset prediction model to obtain a black-and-white attribute value of the sample file. The preset prediction model may be an artificial intelligence model or any other model capable of identifying the black-and-white attribute value of the sample file. When the black-and-white attribute value indicates a black file, the sample file is a malicious file. The black-and-white attribute value may be obtained using an existing artificial-intelligence-based method for judging whether a file is black or white, or the like; this embodiment imposes no limitation here.
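The embodiment leaves the preset prediction model open. Purely as an assumed stand-in, the sketch below scores a dimension-reduced feature vector with a logistic function and maps the score to a black or white label. The weights, bias, and 0.5 threshold are hypothetical parameters that a real engine would obtain from training, and the embodiment's intermediate string-feature step is not modeled.

```python
import math


def predict_black_white(second_feature, weights, bias, threshold=0.5):
    """Return ("black" or "white", score) for a dimension-reduced feature.

    The logistic model here is an illustrative assumption, not the
    prediction model of the embodiment.
    """
    z = bias + sum(w * x for w, x in zip(weights, second_feature))
    # Numerically stable sigmoid: never exponentiates a large positive value.
    if z >= 0:
        score = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        score = e / (1.0 + e)
    label = "black" if score >= threshold else "white"
    return label, score
```

A score at or above the threshold is treated as a black (malicious) file; any other score is treated as a white file.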
The embodiment acquires a sample file, and performs feature extraction on the sample file to acquire a first feature of the sample file; obtaining a second feature subjected to dimension reduction processing according to the first feature based on a memory mapping mode; and performing artificial intelligence prediction on the sample file according to the second characteristic to identify whether the sample file is a malicious file, and compared with the existing mode of loading all the characteristics of the sample file into a memory to perform characteristic dimension reduction and further identifying the malicious file, the method of the embodiment can effectively solve the problem of overhigh running memory during malicious file identification.
Referring to fig. 3, fig. 3 is a flowchart illustrating a malicious file identification method according to a second embodiment of the present invention.
Based on the first embodiment described above, in the present embodiment, the step S20 includes:
Step S201: acquiring a mapping relation table, wherein the mapping relation table comprises the mapping relation between the position pointers of the first feature and the second feature.
It should be noted that the mapping relation between the position pointers of the first feature and the second feature may be determined according to a mapping relation algorithm. The mapping relation algorithm may optimize a data structure according to the feature dimension reduction model, where the data structure may be the data structure relating the first feature to the second feature, so that the dimension reduction model can perform dynamic mapping and addressing through memory mapping and find the corresponding feature mapping dimension according to an address in memory.
Step S202: performing a pointer offset operation on the position pointer of the second feature to obtain an offset position pointer.
Step S203: determining the second feature through memory mapping based on the offset position pointer, that is, performing feature dimension reduction on the first feature according to the mapping relation in memory to obtain the dimension-reduced feature.
It should be understood that, during dimension reduction processing, the mapping relation may be stored in memory. Determining the second feature through memory mapping based on the offset position pointer may be: reading memory data through pointers, obtaining the pointer information of the first feature according to the mapping relation, performing pointer offset according to the pointer information of the first feature and computing the mapping relation between the first feature and the second feature, and thereby obtaining the pointer information of the dimension-reduced second feature. In other words, the data structure that requires dimension reduction can be reduced through memory mapping using the pointer information and the mapping relation, so that the dimension reduction operation is performed flexibly and efficiently, and no corresponding dimension reduction data structure has to be held in the user space of the operating system. This reduces the memory used by the artificial intelligence engine during dimension reduction processing and optimizes memory space.
The embodiment obtains a mapping relation table; the mapping relation table comprises a mapping relation between the position pointers of the first characteristic and the second characteristic; performing pointer offset operation on the position pointer of the second characteristic to obtain an offset position pointer; and determining the second characteristic through a memory mapping mode based on the offset position pointer. When the dimension reduction operation is carried out, the dynamic mapping and addressing are directly carried out in a pointer mode without storing information such as a data structure of the dimension reduction data and the like in the memory, corresponding characteristic information is found according to the pointer information in the memory, the dimension reduction operation is completed, and the use of the memory is optimized.
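The table lookup and pointer offset steps of this embodiment can be sketched in Python as follows. The on-disk layout is an assumption made only for this example: a header holding the first-order and second-order feature counts, an index giving each first-order feature's byte offset and entry count, and entries of (dimension coordinate, weight). The embodiment does not prescribe this format, and all names are hypothetical.

```python
import mmap
import struct

HEADER = struct.Struct("<II")  # first-order count, second-order count
INDEX = struct.Struct("<II")   # per first-order feature: byte offset, entry count
ENTRY = struct.Struct("<If")   # second-order dimension coordinate, weight


def write_table(path, mapping, n_second):
    """Serialize mapping[i] = [(coordinate, weight), ...] for feature i."""
    data_start = HEADER.size + len(mapping) * INDEX.size
    index_blob, entry_blob = b"", b""
    for entries in mapping:
        index_blob += INDEX.pack(data_start + len(entry_blob), len(entries))
        for coord, weight in entries:
            entry_blob += ENTRY.pack(coord, weight)
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(mapping), n_second) + index_blob + entry_blob)


def reduce_features(path, first_features):
    """Map {first-order id: value} to a dense second-order vector."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_first, n_second = HEADER.unpack_from(mm, 0)
        second = [0.0] * n_second
        for fid, value in first_features.items():
            if not 0 <= fid < n_first:
                continue  # unknown feature id: no mapping entry
            # Pointer offset into the index, then into the entry area.
            offset, count = INDEX.unpack_from(mm, HEADER.size + fid * INDEX.size)
            for k in range(count):
                coord, weight = ENTRY.unpack_from(mm, offset + k * ENTRY.size)
                second[coord] += value * weight
        return second
```

Only the index slots and entries for features actually present in the sample are read, so the operating system pages in just those parts of the table rather than the whole structure.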
Referring to fig. 4, fig. 4 is a flowchart illustrating a malicious file identification method according to a third embodiment of the present invention.
Based on the foregoing embodiments, in this embodiment, after the step S30, the method further includes:
Step S301: loading the family classification model into memory when the sample file is identified as a malicious file.
It should be noted that identifying the sample file as a malicious file may mean identifying the black-and-white attribute value of the sample file and, when that value indicates a black file, determining that the sample file is malicious. Identifying the black-and-white attribute value may be: obtaining a black-and-white classification model, performing black-and-white classification on the sample file according to the model and the second feature of the sample file, and judging whether the sample file is a black file. A black file is a malicious file; a white file indicates that the sample file contains no computer virus.
It is understood that a computer virus is a set of computer instructions or program code that its author inserts into a computer program to destroy computer functions or data; it affects the use of the computer, can replicate itself, and is characterized by transmissibility, concealment, infectivity, latency, triggerability, expressiveness, or destructiveness. Computer viruses are regarded as the foremost threat to data security. Since 1987, computer viruses have received widespread attention around the world, and China first discovered a computer virus in 1989. At present, new viruses are evolving toward greater destructiveness, greater concealment, higher infection rates, and faster propagation. Therefore, the basic knowledge of computer viruses must be studied in depth to strengthen protection against them.
It should be noted that the family classification model may be a model built from data analysis using artificial intelligence prediction, a neural network, or the like, that can classify the virus name of the sample file. The virus name may be a family characteristic of a virus, used to distinguish and identify virus families. Virus name classification, that is, family classification, may assign a virus name to the computer virus in the sample file according to the characteristics of that virus.
It should be understood that the family classification model is only used to classify the virus names of the computer viruses in the sample files when the sample files are black files, i.e., malicious files, so that the use frequency of the family classification model in an actual use scene is low.
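The on-demand loading described here can be sketched as a lazy holder that builds the family classification model only the first time a black file is encountered. The class and function names, and the callables used for the black-and-white check and the loader, are hypothetical placeholders for illustration.

```python
class LazyModel:
    """Hold a model that is loaded on first use rather than at start-up."""

    def __init__(self, loader):
        self._loader = loader  # callable that builds or loads the model
        self._model = None
        self.load_count = 0    # exposed only to make the behavior observable

    def get(self):
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model


def scan(second_feature, is_black, family_model):
    """Classify the family only when the sample is judged to be black."""
    if not is_black(second_feature):
        return ("white", None)  # family model is never touched
    classify = family_model.get()
    return ("black", classify(second_feature))
```

Whether the model should also be released after use is a separate design choice that this sketch does not address.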
Step S302: obtaining the virus characteristics in the sample file according to the family classification model, and performing virus name classification on the sample file according to the virus characteristics to obtain the virus name classification result of the sample file.
It should be noted that the virus characteristics may be the characteristics of the computer virus or its different effects on the operating system. Some virus programs delete files, encrypt data on disk, or even destroy the entire system and its data so that the data cannot be recovered, causing irreparable loss. The side effects of a virus program range from reduced system efficiency in mild cases to system crashes and data loss in severe ones. The expressiveness of a virus program reflects the real intent of the virus designer. The virus name refers to a virus family characteristic used to distinguish and identify virus families; viruses are classified and named according to their characteristics for convenient management. Classifying the virus name of the sample file according to the virus characteristics may be based on the linking mode, attack model, attack mode, and the like of the virus. For example, when a virus, at run time, releases one or more new viruses from its own body into a system directory and the released viruses cause the damage, it is named a virus-planting program virus; when a virus directly damages the user's computer once the user clicks it, it is named a destructive program virus; and so on.
Step S303: sending the virus name classification result to a target terminal so that the target terminal analyzes and displays the sample file and the corresponding virus name classification result.
It should be understood that the target terminal may be a visual display screen. The target terminal analyzes and displays the sample file and the corresponding virus name classification result so that the user can selectively view or remove the file.
Referring to fig. 5, fig. 5 is a schematic functional implementation flow diagram of an embodiment of the malicious file identification method according to the present invention.
As shown in fig. 5, the artificial intelligence engine first obtains a sample file and performs feature extraction on it to obtain a first feature of the sample file. It then performs feature dimension reduction on the first feature based on memory mapping to obtain a second feature of the sample file. Next, it obtains a black-and-white classification model, performs black-and-white classification on the sample file according to the model and the second feature, and judges whether the sample file is a black file. When the sample file is a black file, that is, a malicious file, the engine dynamically loads the family classification model and performs virus name classification on the sample file according to that model to obtain the virus name of the sample file. When the sample file is not a black file, that is, a white file, the sample file is not processed and the family classification model does not need to be loaded, which further reduces memory usage.
According to the embodiment, artificial intelligence prediction is carried out on the sample file according to the second characteristic, a black and white attribute value of the sample file is obtained, and a family classification model is dynamically loaded; obtaining virus characteristics in the sample file according to the family classification model; and carrying out virus name classification on the sample file according to the virus characteristics to obtain a virus name classification result of the sample file. The family classification model is used for classifying the virus names of the computer viruses in the sample files only when the sample files are black files, so that the use frequency of the family classification model in an actual use scene is low.
Referring to fig. 6, fig. 6 is a block diagram of the first embodiment of the apparatus of the present invention.
As shown in fig. 6, the malicious file identification apparatus provided in the embodiment of the present invention includes an obtaining module 10, a dimension reduction module 20, and an artificial intelligence prediction module 30;
the obtaining module 10 is configured to obtain a sample file, perform feature extraction on the sample file, and obtain a first feature of the sample file;
the dimension reduction module 20 is configured to obtain a second feature subjected to dimension reduction processing according to the first feature based on a memory mapping manner;
the artificial intelligence prediction module 30 is configured to perform artificial intelligence prediction on the sample file according to the second feature, so as to identify whether the sample file is a malicious file.
In this embodiment, a sample file is obtained, and feature extraction is performed on the sample file to obtain a first feature of the sample file; a second feature subjected to dimension reduction processing is obtained according to the first feature based on a memory mapping manner; and artificial intelligence prediction is performed on the sample file according to the second feature, so as to identify whether the sample file is a malicious file. Compared with the existing approach of loading all features of the sample file into memory for feature dimension reduction and then identifying the malicious file, this embodiment can effectively solve the problem of excessive runtime memory usage during malicious file identification.
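As a hedged illustration of why a memory mapping manner can lower runtime memory, the following Python sketch keeps the first feature in a file and reads only the retained dimensions through a memory map. The on-disk layout (little-endian float64 values), the function names, and the use of plain index selection as the dimension reduction are assumptions of this sketch and are not taken from the embodiment.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = struct.calcsize("<d")  # 8 bytes per stored feature value


def write_first_feature(values, path):
    """Persist the full first feature as little-endian float64 values."""
    with open(path, "wb") as fh:
        fh.write(struct.pack(f"<{len(values)}d", *values))


def reduce_by_memory_mapping(path, kept_indices):
    """Return only the kept dimensions, read through a read-only memory map.

    The operating system pages in just the regions that are touched, so the
    whole first feature is never copied into a Python object.
    """
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [
                struct.unpack_from("<d", mm, index * FLOAT_SIZE)[0]
                for index in kept_indices
            ]
```

For example, after `write_first_feature([0.5, 1.5, 2.5, 3.5, 4.5], path)`, calling `reduce_by_memory_mapping(path, [0, 3])` yields the two retained values without reading the other three into a list.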
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, for technical details that are not described in detail in this embodiment, reference may be made to the malicious file identification method provided in any embodiment of the present invention, and details are not described herein again.
A second embodiment of the device according to the invention is proposed on the basis of the first embodiment of the device according to the invention described above.
In this embodiment, the dimension reduction module 20 is further configured to perform a dimension reduction operation on the first feature based on a memory mapping manner to obtain the second feature.
Further, the artificial intelligence prediction module 30 is further configured to extract character string features of the sample file according to the second feature, and to predict on the character string features by using a preset prediction model, so as to identify whether the sample file is a malicious file.
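The embodiment does not state how the character string features are derived from the second feature, nor what the preset prediction model is. Purely as an illustrative simplification, the sketch below extracts printable ASCII strings directly from raw bytes and scores them with a hypothetical table of keyword weights; the regular expression, the weights, and the threshold are all assumptions.

```python
import re

# Runs of at least four printable ASCII bytes are treated as strings (an assumption).
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data: bytes):
    """Return the printable ASCII strings found in the given bytes."""
    return [match.group().decode("ascii") for match in PRINTABLE.finditer(data)]


def predict_malicious(strings, weights, threshold=1.0):
    """Toy stand-in for a preset prediction model.

    Sums the weight of every keyword that occurs in at least one string and
    reports the sample as malicious when the sum reaches the threshold.
    """
    score = sum(
        weight
        for keyword, weight in weights.items()
        if any(keyword in s for s in strings)
    )
    return score >= threshold
```

A real preset prediction model would be trained rather than hand-weighted; the keyword table here only makes the data flow concrete.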
Further, the dimension reduction module 20 is further configured to obtain a mapping relation table, where the mapping relation table includes a mapping relation between position pointers of the first feature and the second feature, and to determine the second feature in a memory mapping manner according to the position pointer of the second feature.
Further, the dimension reduction module 20 is further configured to perform a pointer offset operation on the position pointer of the second feature to obtain an offset position pointer, and to determine the second feature in a memory mapping manner based on the offset position pointer.
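One possible reading of the mapping relation table and the pointer offset operation is sketched below in Python. The file header, the float32 record layout, the function names, and the interpretation of the pointer offset as adding a fixed base offset are hypothetical choices made for illustration only.

```python
import mmap
import os
import struct
import tempfile

HEADER = b"FEAT0001"          # hypothetical 8-byte file header
RECORD = struct.Struct("<f")  # one float32 value per second-feature entry


def build_store(path, second_values):
    """Write header plus records; return {first-feature index: position pointer}.

    Each position pointer is relative to the start of the data area, so it
    must be offset by the header length before it can be dereferenced.
    """
    table = {}
    with open(path, "wb") as fh:
        fh.write(HEADER)
        for first_index, value in second_values.items():
            table[first_index] = fh.tell() - len(HEADER)
            fh.write(RECORD.pack(value))
    return table


def lookup(path, table, first_index, base_offset=len(HEADER)):
    """Apply the pointer offset, then read the value through a memory map."""
    offset_pointer = table[first_index] + base_offset  # pointer offset operation
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, offset_pointer)[0]
```

Under these assumptions only the small table lives in process memory, while each second-feature value is fetched from the mapped file when it is needed.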
Further, the malicious file identification apparatus further includes a virus name classification module, where the virus name classification module is configured to load a family classification model into memory when the sample file is identified as a malicious file, and to perform virus name classification on the sample file according to the family classification model to obtain a virus name classification result of the sample file.
Further, the virus name classification module is further configured to obtain virus features in the sample file according to the family classification model, and to perform virus name classification on the sample file according to the virus features to obtain the virus name classification result of the sample file.
Further, the virus name classification module is further configured to send the virus name classification result to a target terminal, so that the target terminal analyzes and displays the sample file and the corresponding virus name classification result.
Other embodiments or specific implementation manners of the malicious file identification apparatus of the present invention may refer to the above method embodiments, and are not described herein again.
In addition, an embodiment of the present invention further provides a storage medium, where a malicious file identification program is stored on the storage medium, and when executed by a processor, the malicious file identification program implements the steps of the malicious file identification method described above.
In addition, an embodiment of the present invention further provides a malicious file identification device, where the malicious file identification device includes a memory, a processor, and a malicious file identification program that is stored on the memory and can run on the processor, and the malicious file identification program, when executed by the processor, implements the steps of the malicious file identification method described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can certainly also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., a ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (12)

1. A malicious file identification method is characterized by comprising the following steps:
acquiring a sample file, and performing feature extraction on the sample file to acquire a first feature of the sample file;
obtaining, based on a memory mapping manner, a second feature subjected to dimension reduction processing according to the first feature;
and performing artificial intelligence prediction on the sample file according to the second feature, so as to identify whether the sample file is a malicious file.
2. The method for identifying malicious files according to claim 1, wherein the obtaining of the second feature subjected to the dimension reduction processing according to the first feature based on the memory mapping manner includes:
performing a dimension reduction operation on the first feature based on a memory mapping manner to obtain the second feature.
3. The malicious file identification method according to claim 1, wherein the performing artificial intelligence prediction on the sample file according to the second feature comprises:
extracting character string features of the sample file according to the second feature;
and predicting on the character string features by using a preset prediction model, so as to identify whether the sample file is a malicious file.
4. The method for identifying malicious files according to claim 1, wherein the obtaining of the second feature subjected to the dimension reduction processing according to the first feature based on the memory mapping manner includes:
acquiring a mapping relation table, wherein the mapping relation table comprises a mapping relation between position pointers of the first feature and the second feature;
and determining the second feature in a memory mapping manner according to the position pointer of the second feature.
5. The malicious file identification method according to claim 4, wherein the determining the second feature in a memory mapping manner according to the position pointer of the second feature comprises:
performing a pointer offset operation on the position pointer of the second feature to obtain an offset position pointer;
and determining the second feature in a memory mapping manner based on the offset position pointer.
6. The method according to claim 1, wherein after the step of performing artificial intelligence prediction on the sample file according to the second feature to identify whether the sample file is a malicious file, the method further comprises:
when the sample file is identified as a malicious file, loading a family classification model into memory;
and performing virus name classification on the sample file according to the family classification model to obtain a virus name classification result of the sample file.
7. The malicious file identification method according to claim 6, wherein the step of performing virus name classification on the sample file according to the family classification model to obtain the virus name classification result of the sample file comprises:
obtaining virus features in the sample file according to the family classification model;
and performing virus name classification on the sample file according to the virus features to obtain the virus name classification result of the sample file.
8. The malicious file identification method according to claim 7, wherein after the step of obtaining the virus name classification result of the sample file, the method further comprises:
sending the virus name classification result to a target terminal, so that the target terminal analyzes and displays the sample file and the corresponding virus name classification result.
9. The malicious file identification method according to any one of claims 1 to 8, wherein the sample file is a portable executable file.
10. A malicious file identification apparatus, characterized by comprising an acquisition module, a dimension reduction module, and an artificial intelligence prediction module;
the acquisition module is configured to acquire a sample file and perform feature extraction on the sample file to obtain a first feature of the sample file;
the dimension reduction module is configured to obtain, based on a memory mapping manner, a second feature subjected to dimension reduction processing according to the first feature;
and the artificial intelligence prediction module is configured to perform artificial intelligence prediction on the sample file according to the second feature, so as to identify whether the sample file is a malicious file.
11. A malicious file identification device, characterized in that the device comprises: a memory, a processor, and a malicious file identification program stored on the memory and executable on the processor, the malicious file identification program being configured to implement the steps of the malicious file identification method according to any one of claims 1 to 9.
12. A storage medium having stored thereon a malicious file identification program which, when executed by a processor, implements the steps of the malicious file identification method according to any one of claims 1 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110146065.7A CN112883375A (en) 2021-02-03 2021-02-03 Malicious file identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110146065.7A CN112883375A (en) 2021-02-03 2021-02-03 Malicious file identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112883375A true CN112883375A (en) 2021-06-01

Family

ID=76056786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110146065.7A Pending CN112883375A (en) 2021-02-03 2021-02-03 Malicious file identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112883375A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093994A (en) * 2023-09-18 2023-11-21 卫士通(广州)信息安全技术有限公司 Suspected virus file analysis method, system, equipment and storable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359439A (en) * 2018-10-26 2019-02-19 北京天融信网络安全技术有限公司 Software detecting method, device, equipment and storage medium
CN110444193A (en) * 2018-01-31 2019-11-12 腾讯科技(深圳)有限公司 The recognition methods of voice keyword and device
CN110619213A (en) * 2018-06-20 2019-12-27 深信服科技股份有限公司 Malicious software identification method, system and related device based on multi-model features
CN111190893A (en) * 2018-11-15 2020-05-22 华为技术有限公司 Method and device for establishing feature index
CN112231696A (en) * 2020-10-30 2021-01-15 奇安信科技集团股份有限公司 Malicious sample identification method and device, computing equipment and medium

Similar Documents

Publication Publication Date Title
US10867038B2 (en) System and method of detecting malicious files with the use of elements of static analysis
US9239922B1 (en) Document exploit detection using baseline comparison
CN107688743B (en) Malicious program detection and analysis method and system
US8256000B1 (en) Method and system for identifying icons
CN113268768B (en) Desensitization method, device, equipment and medium for sensitive data
CN113127125B (en) Page automatic adaptation method, device, equipment and storage medium
CN111435391A (en) Method and apparatus for automatically determining interactive GUI elements to be interacted with in GUI
CN112148305A (en) Application detection method and device, computer equipment and readable storage medium
CN114693192A (en) Wind control decision method and device, computer equipment and storage medium
CN111597553A (en) Process processing method, device, equipment and storage medium in virus searching and killing
CN103679027A (en) Searching and killing method and device for kernel level malware
CN111448552A (en) Observation and classification of device events
CN110543756B (en) Device identification method and device, storage medium and electronic device
US20200210382A1 (en) System and method of deletion of files and counteracting their restoration
CN112883375A (en) Malicious file identification method, device, equipment and storage medium
CN111488574B (en) Malicious software classification method, system, computer equipment and storage medium
CN111881446A (en) Method and device for identifying malicious codes of industrial internet
CN115495737A (en) Malicious program invalidation method, device, equipment and storage medium
WO2022041714A1 (en) Document processing method and apparatus, electronic device, storage medium, and program
CN114579965A (en) Malicious code detection method and device and computer readable storage medium
CN114461833A (en) Picture evidence obtaining method and device, computer equipment and storage medium
EP3674876B1 (en) System and method of deletion of files and counteracting their restoration
CN111262818B (en) Virus detection method, system, device, equipment and storage medium
CN111240696A (en) Method for extracting similar modules of mobile malicious program
KR102662965B1 (en) Apparatus and method for detecting ai based malignant code in structured document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination