CN114943078A - File identification method and device - Google Patents

Info

Publication number
CN114943078A
Authority
CN
China
Prior art keywords
file
identified
virus
hash
header
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210588323.1A
Other languages
Chinese (zh)
Inventor
郭玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Security Technologies Co Ltd
Original Assignee
New H3C Security Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Security Technologies Co Ltd
Priority to CN202210588323.1A
Publication of CN114943078A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/565 Static detection by checking file integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Virology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a file identification method and device, and relates to the technical field of security. The method comprises the following steps: performing file identification on a received file to be identified; when the file to be identified is a file of a set type, extracting a file header from the file to be identified; performing hash calculation on target information in the file header to obtain a local hash value; matching the local hash value against a virus hash feature library, wherein the virus hash feature library comprises characteristic hash values of viruses; and when the matching succeeds, confirming that the file to be identified is a virus file. With this method, the accuracy of the virus detection result is improved without occupying a large amount of memory when virus detection is performed on files passing through a network device.

Description

File identification method and device
Technical Field
The present application relates to the field of security technologies, and in particular, to a file identification method and apparatus.
Background
Deep Packet Inspection (DPI) deep security is a security mechanism that inspects and controls network traffic passing through network devices based on application layer information. As network security threats grow increasingly complex, many malicious activities (e.g., worms, viruses, spam, and vulnerabilities) are hidden in the application layer payload of data packets. Traditional security protection relies only on network layer and transport layer inspection and cannot meet network security requirements. Therefore, a network device must have a DPI function to inspect and control application layer information, thereby ensuring the security of data content and improving the security of the network.
At present, antivirus detection mainly detects virus files transmitted in a network by means of pattern string matching and full-text hashing. Pattern string matching detects malicious files with a static signature scanning technique, but it requires large-scale signature rules to be configured, which occupies more memory and places higher demands on the memory of network devices. Full-text hashing depends strongly on file integrity: if messages arrive out of order during file transmission, the hash of the file is computed incorrectly, which affects the accuracy of the virus file detection result.
Therefore, how to improve the accuracy of the virus detection result without occupying a large amount of memory when performing virus identification on files passing through a network device is a technical problem worth considering.
Disclosure of Invention
In view of this, the present application provides a file identification method and apparatus, so as to improve accuracy of a virus detection result without occupying more memory when performing virus detection on a file passing through a network device.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of the present application, there is provided a file identification method, including:
carrying out file identification on the received file to be identified;
when the file to be identified is a file of a set type, extracting a file header from the file to be identified;
performing hash calculation on the target information in the file header to obtain a local hash value;
matching the local hash value with a virus hash feature library, wherein the virus hash feature library comprises the characteristic hash value of the virus;
and when the matching is successful, confirming that the file to be identified is a virus file.
Optionally, the file to be identified includes an executable file under a Windows operating system; the file header comprises an image file header and an optional image header;
performing hash calculation on the target information in the file header to obtain a local hash value, including:
performing hash calculation on the image file header to obtain an intermediate hash value;
extracting, according to the machine type code, target optional image header information matching the machine type code from the optional image header;
and performing hash calculation according to the intermediate hash value and the target optional image header information to obtain the local hash value.
Optionally, the file identification method provided in this embodiment further includes:
and when the matching is not successful, performing pattern string feature matching on the file to be identified so as to identify whether the file to be identified is a virus file.
Optionally, the file identification method provided in this embodiment further includes:
and when the local hash value is not successfully matched with the virus hash feature library, or when the file to be identified is not identified as a virus file by pattern string feature matching, performing full-text hash calculation on the file to be identified, provided that it is a complete file, to obtain a hash result, and identifying whether the file to be identified is a virus file according to the hash result.
Optionally, extracting a file header from the file to be identified includes:
and disassembling the file to be identified by using the file analysis plug-in corresponding to the set type so as to extract a file header in the file to be identified.
According to a second aspect of the present application, there is provided a file identification apparatus comprising:
the identification module is used for carrying out file identification on the received file to be identified;
the extraction module is used for extracting a file header from the file to be identified when the file to be identified is a file of a set type;
the hash calculation module is used for carrying out hash calculation on the target information in the file header to obtain a local hash value;
the first matching module is used for matching the local hash value with a virus hash feature library, wherein the virus hash feature library comprises the feature hash value of the virus;
and the confirming module is used for confirming that the file to be identified is a virus file when the matching result of the first matching module is that the matching is successful.
Optionally, the file to be identified includes an executable file under a Windows operating system; the file header comprises an image file header and an optional image header;
the hash calculation module is specifically configured to perform hash calculation on the image file header to obtain an intermediate hash value; extract, according to the machine type code, target optional image header information matching the machine type code from the optional image header; and perform hash calculation according to the intermediate hash value and the target optional image header information to obtain the local hash value.
Optionally, the file identification apparatus provided in this embodiment further includes:
and the second matching module is used for performing pattern string feature matching on the file to be identified to identify whether the file to be identified is a virus file or not when the matching result of the first matching module is that the matching is not successful.
Optionally, the file identification apparatus provided in this embodiment further includes:
and the third matching module is used for carrying out full-text hash calculation on the file to be identified to obtain a hash result when the matching result of the first matching module is that the matching is not successful or the matching result of the second matching module is that the file to be identified is not identified as a virus file, and identifying whether the file to be identified is a virus file according to the hash result.
Optionally, the extracting module is specifically configured to disassemble the file to be identified by using the file parsing plug-in corresponding to the set type, so as to extract a file header in the file to be identified.
According to a third aspect of the present application, there is provided an electronic device comprising a processor and a machine-readable storage medium, the machine-readable storage medium storing a computer program executable by the processor, the processor being caused by the computer program to perform the method provided by the first aspect of the embodiments of the present application.
According to a fourth aspect of the present application, there is provided a machine-readable storage medium storing a computer program which, when invoked and executed by a processor, causes the processor to perform the method provided by the first aspect of the embodiments of the present application.
The beneficial effects of the embodiment of the application are as follows:
in the file identification method and device provided by the embodiments of the application, file identification is performed on the received file to be identified; when the file to be identified is identified as a file of a set type, a file header is extracted from it; hash calculation is then performed on target information in the file header to obtain a local hash value; the local hash value is matched against a virus hash feature library; and when the matching succeeds, the file to be identified is confirmed to be a virus file. In this embodiment, only the file header of the file to be identified needs to be hashed, and the resulting local hash value can be matched against the virus hash feature library to identify whether the file is a virus file. Therefore, when virus detection is performed on files passing through a network device, a large amount of memory need not be occupied and the accuracy of the virus detection result is improved. In addition, because the hash need not be computed over the whole file to be identified, file identification is faster.
Drawings
Fig. 1 is a schematic flowchart of a file identification method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a file identification device according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a hardware structure of an electronic device implementing a file identification method according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. In the following description, the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the corresponding listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The file identification method provided in the present application is explained in detail below.
Referring to Fig. 1, Fig. 1 is a flowchart of the file identification method provided in the present application. The method may be applied to a network security device, which may be, but is not limited to, a firewall or the like. When the network security device implements the method, the following steps may be included:
S101, carrying out file identification on the received file to be identified.
In this step, after the traffic enters the network security device, the file transmitted in the traffic is identified, and for convenience of description, the transmitted file may be referred to as the file to be identified.
Optionally, before the file to be identified is identified, application identification may be performed on the traffic, and step S101 is performed when an application of a set protocol is identified; that is, file identification is performed on files to be identified that belong to an application conforming to the set protocol.
Specifically, in some scenarios that only require application control, once the application corresponding to a data stream is identified, the data stream is considered safe to a certain extent and deep packet inspection need not be performed, which improves packet processing performance and saves packet processing time. In such scenarios, to further improve the security of data streams entering the network, this embodiment proposes performing the flow shown in Fig. 1 after the application is identified.
In addition, other scenarios may also require application identification. To meet their actual requirements while changing their existing implementation flow as little as possible, application identification is performed first, and the flow shown in Fig. 1 is executed afterwards.
It should be noted that the set protocol may be, but is not limited to, HTTP, FTP (File Transfer Protocol), and the like.
S102, when the file to be identified is a file with a set type, extracting a file header from the file to be identified.
In this step, the file type of the file to be identified is determined first. Because certain types of executable files account for a large proportion of traffic, and viruses generally infect such files, the present application sets in advance the file types that require virus identification, so as to ensure the security of files and of the network. When the file type of the file to be identified is one of the set types, virus detection needs to be performed on the file, and the file header is extracted from it.
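The type check in steps S101 and S102 can be illustrated with a minimal sketch. The application does not say how the file type is recognized; the magic-number table, the `SET_TYPES` set, and both function names below are assumptions made purely for illustration.

```python
# Hypothetical magic-number table; a real device would cover many more types.
MAGIC_TABLE = {
    b"MZ": "pe",            # DOS/PE executables start with "MZ"
    b"PK\x03\x04": "zip",   # ZIP archives
    b"%PDF": "pdf",
}

# Assumed configuration: only PE files are routed to header hashing.
SET_TYPES = {"pe"}


def identify_file_type(first_bytes: bytes) -> str:
    """Guess the file type from the leading bytes of the transferred file."""
    for magic, name in MAGIC_TABLE.items():
        if first_bytes.startswith(magic):
            return name
    return "unknown"


def needs_header_hash(first_bytes: bytes) -> bool:
    """Return True when the file is of a set type and its header should be extracted."""
    return identify_file_type(first_bytes) in SET_TYPES
```

Only files for which `needs_header_hash` returns True would proceed to header extraction in this sketch; all others skip the local hash stage.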
S103, performing hash calculation on the target information in the file header to obtain a local hash value.
In this step, the target information is characteristic information for virus identification, that is, the characteristic information may be changed by a virus, and based on this, target information of the header is extracted, and then hash calculation is performed on the target information to obtain the local hash value.
S104, matching the local hash value with a virus hash feature library.
The virus hash feature library comprises feature hash values of viruses.
In this step, in order to identify whether a virus exists in a file, a virus feature library is configured in advance, and hash calculation is performed on the virus features of currently known viruses to obtain their feature hash values, from which the virus hash feature library is generated. On this basis, when the local hash value is matched against the virus hash feature library, if the library contains the local hash value, the matching is confirmed to be successful and step S105 is executed, that is, the file to be identified is confirmed to be a virus file; if the library does not contain the local hash value, the local hash value is confirmed not to match the virus hash feature library.
It should be noted that the virus hash feature library may be dynamically updated, and as the number of viruses increases, hash calculation is performed on the virus features of the newly added viruses to obtain the feature hash value of the newly added viruses, and then the feature hash value is updated into the virus hash feature library.
It should be noted that the virus hash feature library may further include virus identifiers, that is, the library records the correspondence between each virus identifier and its virus feature. Thus, when the local hash value is matched against the library, it is possible to recognize not only whether the file to be identified contains a virus but also which virus it is. Specifically, when the local hash value is found in the virus hash feature library, the presence of a virus in the file to be identified is confirmed, and the virus identifier can be determined from the correspondence between virus identifiers and virus features. This improves the accuracy of the virus identification result, and once the result is presented to the user, the user can conveniently take effective countermeasures against the identified virus.
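A virus hash feature library that supports dynamic updates and returns a virus identifier on a match could be sketched as a simple dictionary. The class name, its methods, and the choice of MD5 hex digests as keys are illustrative assumptions, not the structure used by the application.

```python
import hashlib
from typing import Dict, Optional


class VirusHashLibrary:
    """Maps feature hash values to virus identifiers (illustrative structure)."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}  # hex digest -> virus identifier

    def add(self, feature: bytes, virus_id: str) -> str:
        """Hash a virus feature and record it; called again as new viruses appear."""
        digest = hashlib.md5(feature).hexdigest()
        self._entries[digest] = virus_id
        return digest

    def match(self, local_hash: str) -> Optional[str]:
        """Return the virus identifier for a matching hash, or None if absent."""
        return self._entries.get(local_hash)
```

A lookup is a single dictionary access, so the memory cost grows with the number of stored hashes rather than with the size of the signature patterns.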
S105, when the matching is successful, confirming that the file to be identified is a virus file.
By implementing the file identification method, file identification is performed on the received file to be identified; when the file to be identified is identified as a file of the set type, a file header is extracted from it; hash calculation is then performed on target information in the file header to obtain a local hash value; the local hash value is matched against a virus hash feature library; and when the matching succeeds, the file to be identified is confirmed to be a virus file. In this embodiment, only the file header of the file to be identified needs to be hashed, and the resulting local hash value can be matched against the virus hash feature library to identify whether the file is a virus file. Therefore, when virus detection is performed on files passing through a network device, a large amount of memory need not be occupied and the accuracy of the virus detection result is improved. In addition, because the hash need not be computed over the whole file to be identified, file identification is faster.
Optionally, the file to be identified may be, but is not limited to, an executable file, an office file, a compressed file, and the like.
Optionally, the set type may be, but is not limited to, a PE executable file under a Windows operating system. In this case, the file header comprises an image file header and an optional image header, and step S103 may be performed as follows: performing hash calculation on the image file header to obtain an intermediate hash value; extracting, according to the machine type code, target optional image header information matching the machine type code from the optional image header; and performing hash calculation according to the intermediate hash value and the target optional image header information to obtain the local hash value.
Optionally, step S102 may be performed as follows: disassembling the file to be identified by using the file parsing plug-in corresponding to the set type, so as to extract the file header from the file to be identified.
Specifically, the file to be identified may be disassembled using the file parsing plug-in corresponding to its type. When the identified file type is a PE executable file, the image file header and the optional image header can be parsed out while the PE executable file is parsed by the file parsing plug-in for PE executable files.
On this basis, because traffic enters the network security device continuously, the file to be identified is also identified continuously. That is, following the order of the contents in the file header, the image file header is identified first, and the optional image header is identified once the remaining contents of the file header are subsequently received.
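The incremental identification described above can be sketched as a small collector that is fed bytes as they arrive. The class name `HeaderCollector` is hypothetical, and the sketch assumes the fed bytes begin at the image file header (that is, the DOS header and PE signature have already been skipped).

```python
import struct


class HeaderCollector:
    """Accumulates bytes until the image file header and optional image header are complete."""

    FILE_HEADER_SIZE = 20  # size of IMAGE_FILE_HEADER in bytes

    def __init__(self) -> None:
        self._buf = bytearray()
        self.file_header = None      # set once 20 bytes have arrived
        self.optional_header = None  # set once SizeOfOptionalHeader more bytes have arrived

    def feed(self, chunk: bytes) -> bool:
        """Add newly received bytes; return True once both headers are available."""
        self._buf += chunk
        if self.file_header is None and len(self._buf) >= self.FILE_HEADER_SIZE:
            self.file_header = bytes(self._buf[:self.FILE_HEADER_SIZE])
        if self.file_header is not None and self.optional_header is None:
            # SizeOfOptionalHeader is a 16-bit field at offset 16 of the file header.
            (opt_size,) = struct.unpack_from("<H", self.file_header, 16)
            end = self.FILE_HEADER_SIZE + opt_size
            if len(self._buf) >= end:
                self.optional_header = bytes(self._buf[self.FILE_HEADER_SIZE:end])
        return self.optional_header is not None
```

With this shape, the intermediate hash could be computed as soon as `file_header` is set, without waiting for the rest of the file.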
It should be noted that, because the machine type codes supported by viruses differ in bit width, the target optional image header used for the second hash is selected according to the bit width of the machine type code. That is, after the image file header is parsed, the machine type code is parsed from it; then, after the optional image header is extracted, the target optional image header information consistent with the bit width of the parsed machine type code is extracted from the optional image header, and hash calculation is performed based on the intermediate hash value and the target optional image header information to obtain the local hash value. The local hash value is thereby adapted to the machine type code supported by the virus, so that whether the file to be identified is a virus file can be identified accurately based on it.
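The two-stage calculation can be sketched as follows. This is one possible reading, under several assumptions: the second stage hashes the concatenation of the intermediate digest and the selected fields; the selected fields are AddressOfEntryPoint, ImageBase, and SizeOfImage; and MD5 is the hash. The function name `local_hash` is hypothetical. The field offsets follow the published PE32 and PE32+ optional header layouts.

```python
import hashlib
import struct

FILE_HEADER_SIZE = 20
MACHINE_I386 = 0x014C   # 32-bit x86
MACHINE_AMD64 = 0x8664  # 64-bit x64


def local_hash(pe_header: bytes) -> str:
    """Compute a local hash from bytes starting at the image file header."""
    file_header = pe_header[:FILE_HEADER_SIZE]
    machine, _, _, _, _, opt_size, _ = struct.unpack("<HHIIIHH", file_header)

    # Stage 1: intermediate hash over the image file header.
    intermediate = hashlib.md5(file_header).digest()

    # Stage 2: pick optional-header fields using the layout that matches the machine code.
    optional = pe_header[FILE_HEADER_SIZE:FILE_HEADER_SIZE + opt_size]
    (entry_point,) = struct.unpack_from("<I", optional, 16)
    (size_of_image,) = struct.unpack_from("<I", optional, 56)
    if machine == MACHINE_I386:
        (image_base,) = struct.unpack_from("<I", optional, 28)  # 32-bit ImageBase
    elif machine == MACHINE_AMD64:
        (image_base,) = struct.unpack_from("<Q", optional, 24)  # 64-bit ImageBase
    else:
        raise ValueError(f"unsupported machine type code: {machine:#x}")

    target = struct.pack("<IQI", entry_point, image_base, size_of_image)
    return hashlib.md5(intermediate + target).hexdigest()
```

Because only header fields enter the calculation, bytes that follow the optional image header do not affect the result.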
Optionally, based on any one of the above embodiments, in this embodiment, when the matching is not successful, pattern string feature matching is performed on the file to be identified, so as to identify whether the file to be identified is a virus file.
Specifically, when matching based on the local hash value is unsuccessful, it can be determined to some extent that the file to be identified is not a virus file. To perform virus detection on the file more reliably, this embodiment performs pattern string feature matching on the file to further determine whether it is a virus file, thereby further improving the virus identification result.
It should be noted that the method based on pattern string feature matching may be implemented by referring to the currently provided method, and this embodiment does not limit this.
Further, when the local hash value is not successfully matched with the virus hash feature library, or the file to be identified is not identified as a virus file by pattern string feature matching, full-text hash calculation is performed on the file to be identified, provided that it is a complete file, to obtain a hash result, and whether the file is a virus file is identified according to the hash result.
Specifically, when matching of the local hash value against the virus hash feature library is unsuccessful, in order to identify viruses in the file more accurately, this embodiment determines whether the current file to be identified is complete; if it is a complete file, hash calculation is performed on the file to obtain a hash result, and whether the file is a virus file is determined according to the hash result, which further improves the accuracy of the virus detection result.
If pattern string feature matching is also unsuccessful, then both local hash matching and pattern string feature matching have failed. In this case, this embodiment likewise determines whether the current file to be identified is complete; if it is a complete file, hash calculation is performed on the file to obtain a hash result, and whether the file is a virus file is determined according to the hash result, which further improves the accuracy of the virus detection result.
It should be noted that virus identification based on a complete file to be identified can be implemented according to the method provided at present, and this embodiment does not limit this.
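The fallback order described above (local hash, then pattern string matching, then full-text hash on complete files) can be sketched as a short cascade. The function names, the plain substring scan standing in for pattern string matching, and the use of MD5 are illustrative assumptions; an actual device would use its existing pattern matching engine.

```python
import hashlib
from typing import Iterable, Optional, Set


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def identify_virus(
    header: bytes,
    data: bytes,
    is_complete: bool,
    header_hashes: Set[str],
    patterns: Iterable[bytes],
    full_hashes: Set[str],
) -> Optional[str]:
    """Return the name of the stage that flagged the file, or None if no stage matched."""
    # Stage 1: local hash over the file header only.
    if md5_hex(header) in header_hashes:
        return "local-hash"
    # Stage 2: pattern string matching over the received bytes (naive substring scan).
    if any(p in data for p in patterns):
        return "pattern-string"
    # Stage 3: full-text hash, only meaningful when the whole file has been received.
    if is_complete and md5_hex(data) in full_hashes:
        return "full-text-hash"
    return None
```

Ordering the stages this way means the cheapest check runs first, and the full-text hash is attempted only when the file is known to be complete.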
Optionally, the hash calculation may use, but is not limited to, the MD5 message-digest algorithm or the like.
It should be noted that the parsed image file header may include, but is not limited to, the machine type code (Machine), the number of sections in the file to be identified, the size of the optional image header, and the like.
When the file to be identified is a PE executable file, the extracted file header may be referred to as the PE file header. On this basis, when the PE executable file is disassembled using the corresponding file parsing plug-in, the image file header and the optional image header can be obtained, as can information such as the DOS header, the DOS stub, and the PE signature.
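How a parsing plug-in might locate these parts can be sketched with the standard PE layout: the DOS header starts with "MZ", its `e_lfanew` field at offset 0x3C points to the PE signature, and the image file header follows the 4-byte signature. The function name `locate_pe_headers` is hypothetical.

```python
import struct


def locate_pe_headers(data: bytes):
    """Return (file_header_offset, optional_header_offset) for a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a DOS/PE file")
    # e_lfanew: 32-bit little-endian offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    file_header_off = e_lfanew + 4               # IMAGE_FILE_HEADER follows the signature
    optional_header_off = file_header_off + 20   # optional header follows the 20-byte file header
    return file_header_off, optional_header_off
```

The bytes between the DOS header and `e_lfanew` are the DOS stub, which this sketch simply skips.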
First, the contents of the above-mentioned image file header are introduced as follows:
The machine type code, denoted Machine, identifies the CPU type the file targets, which may be, but is not limited to, 32-bit or 64-bit; for example, the Machine code for 32-bit Intel x86 chips is 0x14C.
The number of sections in the file to be identified, denoted NumberOfSections, indicates how many sections exist in the PE file, i.e., the number of entries such as .data and .text in the PE section table.
The size of the optional image header, denoted SizeOfOptionalHeader, gives the size of the optional image header structure (PE Optional Header).
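The fields above can be read from the 20-byte image file header with a single `struct` call. The tuple type and function name below are illustrative; the field order and widths follow the standard IMAGE_FILE_HEADER layout.

```python
import struct
from collections import namedtuple

ImageFileHeader = namedtuple(
    "ImageFileHeader",
    "machine number_of_sections time_date_stamp pointer_to_symbol_table "
    "number_of_symbols size_of_optional_header characteristics",
)


def parse_image_file_header(buf: bytes, offset: int = 0) -> ImageFileHeader:
    """Decode the little-endian IMAGE_FILE_HEADER found at the given offset."""
    return ImageFileHeader._make(struct.unpack_from("<HHIIIHH", buf, offset))
```

For a 32-bit x86 executable, `machine` would read 0x14C, and `size_of_optional_header` tells the parser how many bytes of optional image header follow.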
The optional image header is denoted PE Optional Header; its structure is IMAGE_OPTIONAL_HEADER32, and it has 9 main members, specifically:
Magic: the magic number, which indicates whether the file to be identified is a 32-bit or 64-bit PE file.
AddressOfEntryPoint: the program entry address, which indicates the starting address of the code that the program executes first.
ImageBase: the image base address, i.e., the preferred address at which the PE file is mapped into memory (the virtual memory range of a 32-bit process is 0 to 0x7FFFFFFF).
SectionAlignment: the memory alignment granularity, i.e., the alignment granularity when the PE file is mapped into memory. FileAlignment: the disk alignment granularity, i.e., the alignment granularity of the PE file when stored on disk. The former sets the minimum unit of a section in memory, and the latter sets the minimum unit of a section in the disk file.
SizeOfImage: the total size of the image of the PE file in memory, i.e., the size of the space occupied by the PE image in virtual memory.
SizeOfHeaders: the size of the whole PE header, i.e., the total size of DOS header + PE signature + standard PE header + optional PE header + section table.
Subsystem: the subsystem used by the user interface, which distinguishes system driver files from ordinary executable files.
NumberOfRvaAndSizes: the number of directory entries, i.e., the number of elements in the DataDirectory array.
DataDirectory: the data directory table, an array of IMAGE_DATA_DIRECTORY structures.
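The members listed above can be read at fixed offsets once the Magic value tells the parser whether the layout is PE32 (0x10B) or PE32+ (0x20B). The function name and the returned dictionary keys are illustrative; the offsets follow the published optional header layouts.

```python
import struct

PE32_MAGIC = 0x10B       # 32-bit optional header
PE32_PLUS_MAGIC = 0x20B  # 64-bit optional header


def parse_optional_header(opt: bytes) -> dict:
    """Extract the main members of the optional image header."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic == PE32_MAGIC:
        (image_base,) = struct.unpack_from("<I", opt, 28)
        count_off = 92    # offset of NumberOfRvaAndSizes in PE32
    elif magic == PE32_PLUS_MAGIC:
        (image_base,) = struct.unpack_from("<Q", opt, 24)
        count_off = 108   # offset of NumberOfRvaAndSizes in PE32+
    else:
        raise ValueError(f"unknown optional header magic: {magic:#x}")

    (entry_point,) = struct.unpack_from("<I", opt, 16)
    section_alignment, file_alignment = struct.unpack_from("<II", opt, 32)
    size_of_image, size_of_headers = struct.unpack_from("<II", opt, 56)
    (subsystem,) = struct.unpack_from("<H", opt, 68)
    (count,) = struct.unpack_from("<I", opt, count_off)

    # Each data directory entry is a (VirtualAddress, Size) pair of 32-bit values.
    directories = []
    for i in range(count):
        off = count_off + 4 + 8 * i
        if off + 8 > len(opt):
            break  # stop at the end of the available bytes
        directories.append(struct.unpack_from("<II", opt, off))

    return {
        "magic": magic,
        "address_of_entry_point": entry_point,
        "image_base": image_base,
        "section_alignment": section_alignment,
        "file_alignment": file_alignment,
        "size_of_image": size_of_image,
        "size_of_headers": size_of_headers,
        "subsystem": subsystem,
        "number_of_rva_and_sizes": count,
        "data_directory": directories,
    }
```

Fields up to Subsystem share the same offsets in both layouts except ImageBase, which is why only ImageBase and the directory count need a per-layout branch.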
It should be noted that a virus file may be determined according to information such as a program entry address, a node structure address, a timestamp, and a size of a space occupied by the PE image in the virtual memory.
By adopting the file identification method provided by any embodiment of the application, detection is insensitive to local modifications of a file: when a virus file is altered (for example, its code section or appended data changes), its file header information remains fixed, so similar variant viruses can be detected by the same rule, which improves the practicability and generality of the method. In addition, to reduce the false alarm rate, when local hash rules are extracted by analyzing a large number of samples, rules prone to false positives, such as those for PE packing (shell adding), bundling, and infection-type samples, are excluded; the excluded samples can be covered by supplementary detection through full-text hashing and pattern string feature matching, which further improves the accuracy of file identification.
Therefore, with the file identification method provided by any embodiment of the application, local hashing achieves a certain aggregation rate: because many samples share one local hash value, a local hash rule set can cover a large number of virus samples with fewer entries than a full-text hash library, which reduces memory usage.
In addition, because the PE file information of interest is located at the head of the file, the local hash matching method only needs to process the first several bytes of the PE file. Neither full-text hash calculation and matching nor AC (Aho-Corasick) matching over the whole file is required, which greatly reduces CPU utilization and greatly improves file identification performance.
Moreover, because the local hash can be calculated from the first packet in most scenarios, the file identification method provided by the application can improve the identification rate when out-of-order messages (e.g., IP-layer, TCP, or application-layer disorder) are not reassembled.
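A minimal Python sketch of this first-packet behaviour is given below. It is an assumption-laden illustration, not the application's implementation: the class name `FirstBytesHasher`, the parameter `needed`, and the use of SHA-256 are hypothetical, and the sketch assumes that the first payloads fed to it start at file offset 0.

```python
import hashlib
from typing import Optional


class FirstBytesHasher:
    """Compute a hash as soon as the first `needed` bytes of a file have arrived."""

    def __init__(self, needed: int) -> None:
        self.needed = needed
        self.buffer = bytearray()
        self.digest: Optional[str] = None

    def feed(self, payload: bytes) -> Optional[str]:
        """Add one payload; return the digest once enough leading bytes are buffered."""
        if self.digest is None:
            self.buffer += payload
            if len(self.buffer) >= self.needed:
                head = bytes(self.buffer[: self.needed])
                self.digest = hashlib.sha256(head).hexdigest()
                self.buffer.clear()  # later payloads are not needed for this hash
        return self.digest
```

Once the digest is available, later payloads do not change it, so disorder among the remaining packets does not affect this stage.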
Based on the same inventive concept, the application also provides a file identification device corresponding to the file identification method. The file identification device may be implemented by referring to the above description of the file identification method, which is not repeated here.
Referring to fig. 2, fig. 2 shows a file identification device provided in a network security device according to an exemplary embodiment of the present application, where the device includes:
the identification module 201 is used for identifying the received file to be identified;
the extracting module 202 is configured to extract a file header from the file to be identified when the file to be identified is a file of a set type;
the hash calculation module 203 is configured to perform hash calculation on the target information in the file header to obtain a local hash value;
a first matching module 204, configured to match the local hash value with a virus hash feature library, where the virus hash feature library includes a feature hash value of a virus;
a confirming module 205, configured to confirm that the file to be identified is a virus file when the matching result of the first matching module is that matching is successful.
Optionally, based on the foregoing embodiment, the file to be identified in this embodiment includes an executable file under a Windows operating system; the file header comprises an image file header and an optional image header;
on this basis, the hash calculation module 203 is specifically configured to perform hash calculation on the image file header to obtain an intermediate hash value; extract, according to the machine type code, target optional image header information matching the machine type code from the optional image header; and perform hash calculation according to the intermediate hash value and the target optional image header information to obtain the local hash value.
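The two-stage calculation performed by the hash calculation module can be sketched in Python as follows. This is a hedged illustration only: the function name `compute_local_hash`, the use of SHA-256, and the rule for selecting the target optional image header information (the fixed part of the PE32 or PE32+ optional header, chosen by machine type code) are assumptions made for the example, since this passage does not fix them.

```python
import hashlib
import struct

# Assumed selection rule: how many leading bytes of the optional image header
# count as "target information" for each machine type code.
TARGET_INFO_SIZE = {
    0x014C: 96,    # IMAGE_FILE_MACHINE_I386, PE32 fixed part
    0x8664: 112,   # IMAGE_FILE_MACHINE_AMD64, PE32+ fixed part
}


def compute_local_hash(image_file_header: bytes, optional_header: bytes) -> str:
    """Hash the image file header, then hash that result with the selected fields."""
    if len(image_file_header) < 20:
        raise ValueError("image file header must be at least 20 bytes")

    # Stage 1: intermediate hash over the 20-byte image file header.
    intermediate = hashlib.sha256(image_file_header[:20]).digest()

    # The machine type code is the first field of the image file header.
    (machine,) = struct.unpack_from("<H", image_file_header, 0)
    size = TARGET_INFO_SIZE.get(machine)
    if size is None:
        raise ValueError("unsupported machine type code")
    if len(optional_header) < size:
        raise ValueError("optional image header is truncated")
    target_info = optional_header[:size]

    # Stage 2: local hash over the intermediate value and the target information.
    return hashlib.sha256(intermediate + target_info).hexdigest()
```

With this selection rule, bytes of the optional image header beyond the selected range (for example, the data directory entries) do not influence the local hash value.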
Optionally, based on any one of the foregoing embodiments, the file identification apparatus provided in this embodiment further includes:
a second matching module (not shown in the figure), configured to perform pattern string feature matching on the file to be identified, so as to identify whether the file to be identified is a virus file, when the matching result of the first matching module is that the matching is not successful.
Further, the file identification device provided in this embodiment further includes:
a third matching module (not shown in the figure), configured to, when the matching result of the first matching module 204 is that the matching is not successful, or when the matching result of the second matching module (not shown in the figure) is that the file to be identified is not identified as a virus file, and the file to be identified is a complete file, perform full-text hash calculation on the file to be identified to obtain a hash result, and identify whether the file to be identified is a virus file according to the hash result.
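The fallback order of the three matching modules can be sketched in Python as shown below. The sketch is illustrative and rests on assumptions: the function name `identify_file` and its parameters are hypothetical, the local hash function is passed in as a callable, SHA-256 stands in for the full-text hash, and a plain substring search replaces a real multi-pattern matcher such as Aho-Corasick.

```python
import hashlib
from typing import Callable, Iterable, Optional, Set


def identify_file(
    data: bytes,
    is_complete: bool,
    local_hash_fn: Callable[[bytes], str],
    virus_local_hashes: Set[str],
    pattern_strings: Iterable[bytes],
    virus_full_hashes: Set[str],
) -> Optional[str]:
    """Return the name of the stage that flagged the file, or None if no stage did."""
    # First matching module: local hash against the virus hash feature library.
    if local_hash_fn(data) in virus_local_hashes:
        return "local-hash"

    # Second matching module: pattern string feature matching
    # (naive substring search used here for brevity).
    if any(pattern in data for pattern in pattern_strings):
        return "pattern-string"

    # Third matching module: full-text hash, only meaningful for a complete file.
    if is_complete and hashlib.sha256(data).hexdigest() in virus_full_hashes:
        return "full-text-hash"

    return None
```

The cheapest check runs first, and the full-text hash is attempted only when the earlier stages did not flag the file and the whole file is available.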
Optionally, based on any one of the foregoing embodiments, the extracting module 202 is specifically configured to parse the file to be identified by using a file parsing plug-in corresponding to the set type, so as to extract the file header from the file to be identified.
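One way to organize per-type parsing plug-ins is a registry keyed by file type, as in the following Python sketch. All names here (`PARSERS`, `register_parser`, `detect_type`, `extract_file_header`) are hypothetical, type detection is reduced to a check of the "MZ" signature, and the PE plug-in returns the PE signature, image file header, and optional image header as "the file header" purely for illustration.

```python
import struct
from typing import Callable, Dict, Optional

# Registry mapping a file type name to its header-extraction plug-in.
PARSERS: Dict[str, Callable[[bytes], bytes]] = {}


def register_parser(file_type: str):
    """Decorator that registers a parsing plug-in for one file type."""
    def decorator(func: Callable[[bytes], bytes]) -> Callable[[bytes], bytes]:
        PARSERS[file_type] = func
        return func
    return decorator


def detect_type(data: bytes) -> Optional[str]:
    """Very small type check; only the PE 'MZ' signature is recognized here."""
    return "pe" if data[:2] == b"MZ" else None


@register_parser("pe")
def extract_pe_header(data: bytes) -> bytes:
    """Return PE signature + image file header + optional image header."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # SizeOfOptionalHeader sits at offset 16 of the 20-byte image file header.
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 4 + 16)
    return data[e_lfanew:e_lfanew + 4 + 20 + opt_size]


def extract_file_header(data: bytes) -> Optional[bytes]:
    """Dispatch to the plug-in for the detected type; None if the type is not set."""
    file_type = detect_type(data)
    parser = PARSERS.get(file_type) if file_type else None
    return parser(data) if parser else None
```

Supporting another set type then only requires registering one more plug-in, without changing the dispatch function.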
In the file identification device provided by any embodiment of the application, after the received file to be identified is identified, when the file to be identified is identified as a file of a set type, a file header is extracted from the file to be identified; hash calculation is then performed on the target information in the file header to obtain a local hash value; the local hash value is matched with a virus hash feature library; and when the matching is successful, the file to be identified is confirmed to be a virus file. Because only the file header of the file to be identified needs local hash calculation, and the resulting local hash value can be matched against the virus hash feature library to identify whether the file is a virus file, virus detection of files passing through the network device does not need to occupy much memory, and the accuracy of the virus detection result is improved. In addition, because hash calculation need not be performed over the whole file to be identified, file identification is faster.
Based on the same inventive concept, the embodiment of the present application provides an electronic device, which may be, but is not limited to, the network security device. As shown in fig. 3, the electronic device includes a processor 301 and a machine-readable storage medium 302. The machine-readable storage medium 302 stores a computer program executable by the processor 301, and the computer program causes the processor 301 to execute the file identification method provided in any embodiment of the present application. In addition, the electronic device further includes a communication interface 303 and a communication bus 304, wherein the processor 301, the communication interface 303, and the machine-readable storage medium 302 communicate with each other through the communication bus 304.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The machine-readable storage medium 302 may be a memory, and the memory may include a Random Access Memory (RAM) or a DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory), and may also include a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
As for the embodiments of the electronic device and the machine-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing method embodiments, the description is relatively simple, and reference may be made to the partial description of the method embodiments for relevant points.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The implementation process of the functions and actions of each unit/module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the units/modules described as separate parts may or may not be physically separate, and the parts displayed as units/modules may or may not be physical units/modules, may be located in one place, or may be distributed on a plurality of network units/modules. Some or all of the units/modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only a preferred embodiment of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A file identification method, comprising:
carrying out file identification on the received file to be identified;
when the file to be identified is a file of a set type, extracting a file header from the file to be identified;
performing hash calculation on the target information in the file header to obtain a local hash value;
matching the local hash value with a virus hash feature library, wherein the virus hash feature library comprises the characteristic hash value of the virus;
and when the matching is successful, confirming that the file to be identified is a virus file.
2. The method of claim 1, wherein the file to be identified comprises an executable file under a Windows operating system; the file header comprises an image file header and an optional image header;
performing hash calculation on the target information in the file header to obtain a local hash value, including:
performing hash calculation on the image file header to obtain an intermediate hash value;
extracting, according to the machine type code, target optional image header information matching the machine type code from the optional image header;
and performing hash calculation according to the intermediate hash value and the target optional image header information to obtain the local hash value.
3. The method of claim 1, further comprising:
and when the matching is not successful, performing pattern string feature matching on the file to be identified so as to identify whether the file to be identified is a virus file.
4. The method of claim 3, further comprising:
when the local hash value is not successfully matched with the virus hash feature library, or when the file to be identified is not identified as a virus file by the pattern string feature matching, and the file to be identified is a complete file, performing full-text hash calculation on the file to be identified to obtain a hash result, and identifying whether the file to be identified is a virus file according to the hash result.
5. The method of claim 1, wherein extracting a header from the file to be identified comprises:
parsing the file to be identified by using the file parsing plug-in corresponding to the set type, so as to extract the file header from the file to be identified.
6. A file identification device, comprising:
the identification module is used for carrying out file identification on the received file to be identified;
the extraction module is used for extracting a file header from the file to be identified when the file to be identified is a file of a set type;
the hash calculation module is used for carrying out hash calculation on the target information in the file header to obtain a local hash value;
the first matching module is used for matching the local hash value with a virus hash feature library, wherein the virus hash feature library comprises the feature hash value of the virus;
and the confirming module is used for confirming that the file to be identified is a virus file when the matching result of the first matching module is that the matching is successful.
7. The apparatus of claim 6, wherein the file to be identified comprises an executable file under a Windows operating system; the file header comprises an image file header and an optional image header;
the hash calculation module is specifically configured to perform hash calculation on the image file header to obtain an intermediate hash value; extract, according to the machine type code, target optional image header information matching the machine type code from the optional image header; and perform hash calculation according to the intermediate hash value and the target optional image header information to obtain the local hash value.
8. The apparatus of claim 6, further comprising:
and the second matching module is used for performing pattern string feature matching on the file to be identified when the matching result of the first matching module is that the matching is not successful so as to identify whether the file to be identified is a virus file.
9. The apparatus of claim 8, further comprising:
and the third matching module is used for carrying out full-text hash calculation on the file to be identified to obtain a hash result when the matching result of the first matching module is that the matching is not successful or the matching result of the second matching module is that the file to be identified is not identified as a virus file, and identifying whether the file to be identified is a virus file according to the hash result.
10. The apparatus of claim 6,
the extracting module is specifically configured to parse the file to be identified by using the file parsing plug-in corresponding to the set type, so as to extract the file header from the file to be identified.
CN202210588323.1A 2022-05-27 2022-05-27 File identification method and device Pending CN114943078A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210588323.1A CN114943078A (en) 2022-05-27 2022-05-27 File identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210588323.1A CN114943078A (en) 2022-05-27 2022-05-27 File identification method and device

Publications (1)

Publication Number Publication Date
CN114943078A true CN114943078A (en) 2022-08-26

Family

ID=82910035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210588323.1A Pending CN114943078A (en) 2022-05-27 2022-05-27 File identification method and device

Country Status (1)

Country Link
CN (1) CN114943078A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination