CN116401661A - File detection method, device, equipment and storage medium - Google Patents

Publication number
CN116401661A
Authority
CN
China
Prior art keywords
file
identified
hash
import table
virus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310300540.0A
Other languages
Chinese (zh)
Inventor
郭玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Security Technologies Co Ltd
Original Assignee
New H3C Security Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Security Technologies Co Ltd
Priority to CN202310300540.0A
Publication of CN116401661A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a file detection method, device, equipment and storage medium, and relates to the technical field of security. The method comprises: performing file identification on a file to be identified; when the file to be identified is a file of a set type, extracting import table information from the file to be identified; performing hash calculation on the import table information to obtain an import table hash value; matching the import table hash value against a virus hash feature library, wherein the virus hash feature library comprises import table hash values of virus files; and when the matching succeeds, confirming that the file to be identified is a virus file. The accuracy of the virus detection result is thereby improved without occupying additional memory during virus detection.

Description

File detection method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of security technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting a file.
Background
DPI (Deep Packet Inspection) deep security is a security mechanism in which network devices inspect and control network traffic based on application-layer information. As network security threats grow more complex, many malicious behaviors (e.g., worms, spam, and vulnerability exploitation) are hidden in the application-layer payload of data packets. Traditional security protection relies only on network-layer and transport-layer detection and cannot meet network security requirements. A network device therefore needs a DPI function to inspect and control application-layer information, so as to ensure the security of data content and improve network security.
At present, antivirus detection mainly identifies virus files transmitted over the network by pattern string matching and full-text hashing. Pattern string matching detects malicious files with static signature scanning, but it requires a large set of signature rules, which occupies considerable memory and places high demands on the memory of the network device. Full-text hashing depends heavily on the integrity of the file: if packets arrive out of order during file transmission, the file hash is computed incorrectly, which reduces the accuracy of virus file detection.
Therefore, how to improve the accuracy of virus file detection without occupying additional memory when identifying viruses in files is one of the technical problems to be considered.
Disclosure of Invention
In view of this, the present application provides a method, an apparatus, a device, and a storage medium for detecting a file, so as to improve accuracy of a virus detection result without occupying more memory when detecting a virus for the file.
Specifically, the application is realized by the following technical scheme:
According to a first aspect of the present application, there is provided a file detection method, including:
carrying out file identification on the file to be identified;
when the file to be identified is a file of a set type, extracting import table information from the file to be identified;
carrying out hash calculation on the information of the import table to obtain a hash value of the import table;
matching the import table hash value with a virus hash feature library, wherein the virus hash feature library comprises import table hash values of virus files;
and when the matching is successful, confirming that the file to be identified is a virus file.
According to a second aspect of the present application, there is provided a file detection apparatus comprising:
the identification module is used for carrying out file identification on the file to be identified;
the extraction module is used for extracting import table information from the file to be identified when the file to be identified is a file of a set type;
the hash calculation module is used for carrying out hash calculation on the information of the import table to obtain a hash value of the import table;
the first matching module is used for matching the import table hash value with a virus hash feature library, wherein the virus hash feature library comprises the import table hash value of a virus file;
and the confirming module is used for confirming that the file to be identified is a virus file when the matching result of the first matching module is that the matching is successful.
According to a third aspect of the present application there is provided an electronic device comprising a processor and a machine-readable storage medium storing a computer program executable by the processor, the processor being caused by the computer program to perform the method provided by the first aspect of the embodiments of the present application.
According to a fourth aspect of the present application there is provided a machine-readable storage medium storing a computer program which, when invoked and executed by a processor, causes the processor to perform the method provided by the first aspect of the embodiments of the present application.
The beneficial effects of the embodiment of the application are that:
In the file detection method, device, equipment and storage medium provided by the embodiments of the application, file identification is performed on the file to be identified; when the file to be identified is a file of a set type, import table information is extracted from the file to be identified; hash calculation is performed on the import table information to obtain an import table hash value; the import table hash value is matched against a virus hash feature library, wherein the virus hash feature library comprises import table hash values of virus files; and when the matching succeeds, the file to be identified is confirmed to be a virus file. Virus detection is thus performed through the import table of the file to be identified. Because only the import table of the file is examined, no additional memory is required, and the accuracy of the virus detection result is improved. In addition, because the hash is not computed over the entire file to be identified, detection speed is improved, CPU utilization is reduced, and file detection performance is improved.
Drawings
Fig. 1 is a schematic flowchart of a file detection method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of an import table according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a file detection apparatus according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the hardware structure of an electronic device for implementing a file detection method according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with aspects as described herein.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the corresponding listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first message may also be referred to as a second message, and similarly, a second message may also be referred to as a first message, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
The file detection method provided in the present application is described in detail below.
Referring to Fig. 1, Fig. 1 is a flowchart of a file detection method provided in the present application. The method may be applied to a network security device, which may be, but is not limited to, a firewall. The embodiments are described taking a network security device as an example. When the network security device implements the file detection method, the method may include the following steps:
S101, carrying out file identification on the file to be identified.
In this step, after the traffic enters the network security device, the file transmitted in the traffic is identified, and for convenience of description, the transmitted file may be referred to as the file to be identified.
S102, when the file to be identified is a file of a set type, extracting import table information from the file to be identified.
In this step, the file type of the file to be identified is identified first. Certain types of executable files account for a very large share of network traffic, and viruses generally infect such files. Therefore, to ensure the security of files and of the network, the file types that require virus detection are set in advance. When the file type of the file to be identified is one of the set types, virus detection needs to be performed on the file, and the import table information is extracted from it.
It should be noted that the file of the set type may be, but is not limited to, a PE file, which is the executable file format used under the Windows operating system. Because PE executable files account for a high proportion of current network traffic, the application proposes performing virus detection on PE files. The import table is the mechanism by which a PE file imports APIs from other third-party programs in order to call them. In the antivirus field, the import table of an application is important: in general, the overall behavior of a program can be inferred from its import table. All API functions provided by the system on the Windows platform are exposed through import and export tables, so if an application calls system functions, this information is reflected in its import table. That is, for each piece of software (including malware), the import table hash is unique, because the compiler builds the import address table (Import Address Table, IAT) based on the order in which each function appears in the source code.
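As an illustration of the type check in S101 and S102, the following is a minimal sketch, assuming the set type is PE and that the identification relies on the standard PE signatures (the `MZ` magic at offset 0 and the `PE\0\0` signature at the offset stored at 0x3C). The function name `is_pe_file` is hypothetical and is not taken from the application, which does not specify how file identification is performed.

```python
import struct


def is_pe_file(data: bytes) -> bool:
    """Return True if the buffer begins like a Windows PE file (hypothetical helper)."""
    # DOS header: 'MZ' magic at offset 0; e_lfanew (offset of the PE header) at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The PE signature 'PE\0\0' must be present at e_lfanew.
    # An out-of-range offset yields an empty slice, which simply fails the comparison.
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```

Only a file that passes such a check would proceed to import table extraction; other file types would skip the import table hash path.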
S103, carrying out hash calculation on the information of the import table to obtain a hash value of the import table.
In this step, since the virus file can be identified based on the import table, hash calculation can be performed on the import table information after the import table information is extracted, thereby obtaining the import table hash value.
S104, matching the hash value of the import table with a virus hash feature library.
The virus hash feature library includes the import table hash values of virus files. Specifically, the import table hash value of each known virus file may be computed in advance from that file's import table information, and the virus hash feature library may then be constructed from each virus file and its corresponding import table hash value.
On this basis, to identify whether a file contains a virus, the network security device, after obtaining the import table hash value of the file to be identified, matches it against the import table hash values of virus files in the virus hash feature library to determine whether the library contains it. If the virus hash feature library includes the import table hash value, the match is confirmed as successful and step S105 is executed, that is, the file to be identified is confirmed to be a virus file. If the virus hash feature library does not include the import table hash value, the import table hash value is confirmed not to match the virus hash feature library.
It is worth noting that the virus hash feature library can be updated dynamically. As new viruses appear, hash calculation is performed on the import table information of each newly added virus file to obtain its feature hash value, which is then added to the virus hash feature library.
It should be noted that the virus hash feature library may further include virus identifiers of virus files; that is, the library records the correspondence between virus identifiers and virus features. When the import table hash value is matched against the virus hash feature library, it is then possible to identify not only whether the file to be identified contains a virus but also which virus it is. Specifically, when the import table hash value is found in the virus hash feature library, the file to be identified is confirmed to contain a virus, and the virus identifier can be determined from the correspondence between virus identifiers and virus features. This improves the accuracy of the virus identification result and, once the result is displayed to the user, makes it easier for the user to take effective countermeasures against the identified virus.
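The library described above, including dynamic updates and the correspondence between virus identifiers and hash values, can be sketched as a simple in-memory map. This is only an illustrative sketch: the class name `VirusHashLibrary`, its method names, and the sample hash and identifier are hypothetical, and the application does not specify the storage structure actually used.

```python
from typing import Dict, Optional


class VirusHashLibrary:
    """Hypothetical in-memory virus hash feature library keyed by import table hash."""

    def __init__(self) -> None:
        # Maps an import table hash (lowercase hex string) to a virus identifier.
        self._virus_id_by_hash: Dict[str, str] = {}

    def add(self, import_table_hash: str, virus_id: str) -> None:
        """Dynamic update: register the hash of a newly added virus file."""
        self._virus_id_by_hash[import_table_hash.lower()] = virus_id

    def match(self, import_table_hash: str) -> Optional[str]:
        """Return the virus identifier on a hit, or None when there is no match."""
        return self._virus_id_by_hash.get(import_table_hash.lower())


library = VirusHashLibrary()
# Placeholder hash value and identifier, for illustration only.
library.add("0123456789ABCDEF0123456789ABCDEF", "Example.Virus.A")
```

A hit both confirms the file as a virus file (S105) and yields the identifier that can be reported to the user.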
And S105, when the matching is successful, confirming that the file to be identified is a virus file.
Therefore, the virus detection of the file to be identified based on the import table is realized, and the hash calculation is carried out only by extracting the information of the import table from the file to be identified, and the full-text hash calculation is not required to be executed, so that the file detection efficiency is greatly improved, and meanwhile, the memory is saved.
In the file detection method, file identification is performed on the file to be identified; when the file to be identified is a file of a set type, import table information is extracted from the file to be identified; hash calculation is performed on the import table information to obtain an import table hash value; the import table hash value is matched against a virus hash feature library, wherein the virus hash feature library comprises import table hash values of virus files; and when the matching succeeds, the file to be identified is confirmed to be a virus file. Virus detection is thus performed through the import table of the file to be identified. Because only the import table of the file is examined, no additional memory is required, and the accuracy of the virus detection result is improved. In addition, because the hash is not computed over the entire file to be identified, detection speed is improved, CPU utilization is reduced, and file detection performance is improved.
Optionally, the import table information includes a library identifier of each dynamic link library and an import table function identifier corresponding to the dynamic link library. On this basis, step S103 may be performed according to the following procedure: and carrying out hash calculation processing on each library identifier and each corresponding import table function identifier to obtain an import table hash value.
Specifically, the import table is a collection that records the dynamic link libraries used by the PE file. Each DLL occupies one element in the import table, and that element describes specific information about the imported DLL, such as the last modification time of the DLL, the names or ordinals of the functions imported from it, and the function addresses after the DLL is loaded.
On this basis, in practical application, the file to be identified may include at least one dynamic link library dll, and then the library identifier of each dynamic link library and the identifier of the import table function corresponding to the dynamic link library are recorded in the import table, that is, the above-mentioned import table function identifier. Therefore, in the embodiment, when performing hash calculation, the library identifier of each dynamic link library and the corresponding import table function identifier can be extracted, and then hash calculation is performed together, so as to obtain the import table hash value.
Further, the step of performing hash calculation on each library identifier and each corresponding import table function identifier to obtain the import table hash value may be performed as follows: for the library identifier of each dynamic link library, removing the suffix of the library identifier to obtain a modified library identifier; splicing the modified library identifier with the import table function identifiers corresponding to the dynamic link library to obtain a first spliced string for the dynamic link library; splicing the first spliced strings of all dynamic link libraries to obtain a second spliced string; and performing message-digest algorithm MD5 calculation on the second spliced string to obtain the import table hash value.
Specifically, the description takes the library name as the library identifier of a dynamic link library and the import table function name as the import table function identifier. Suppose the import table information includes two dynamic link libraries, kernel32.dll and kernel33.dll, where the import table function names corresponding to kernel32.dll are GetCommandLineA and GetVersion, and those corresponding to kernel33.dll are GetCommandLineB and SetEnvironmentVariableB. Hash calculation on the two library names and their import table function names then proceeds by the following rule. First, the suffixes of kernel32.dll and kernel33.dll are removed, giving the modified library identifiers kernel32 and kernel33. Next, kernel32 is spliced with each of its import table function names, separated by the character "."; the resulting entries are separated from one another by the character ","; and the spliced string is converted to lowercase. This gives the first spliced string of kernel32.dll: kernel32.getcommandlinea,kernel32.getversion. Similarly, the first spliced string of kernel33.dll is: kernel33.getcommandlineb,kernel33.setenvironmentvariableb. The two first spliced strings are then spliced a second time, separated by the character ",", giving the second spliced string: kernel32.getcommandlinea,kernel32.getversion,kernel33.getcommandlineb,kernel33.setenvironmentvariableb. Finally, MD5 hash calculation is performed on the second spliced string to obtain the import table hash value.
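The splicing rule in this example can be sketched in a few lines. This is a minimal sketch, assuming "." separates the library name from the function name and "," separates entries, as the example strings suggest; the function names `build_import_string` and `import_table_hash` are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple

# (library name, imported function names), in import table order.
ImportEntry = Tuple[str, List[str]]


def build_import_string(imports: Iterable[ImportEntry]) -> str:
    """Build the second spliced string from the import table information."""
    entries = []
    for library, functions in imports:
        # Remove the suffix (e.g. ".dll") to get the modified library identifier.
        base = library.rsplit(".", 1)[0]
        for function in functions:
            # Splice library and function with ".", then convert to lowercase.
            entries.append(f"{base}.{function}".lower())
    # Entries of all libraries are joined with ",".
    return ",".join(entries)


def import_table_hash(imports: Iterable[ImportEntry]) -> str:
    """MD5 over the spliced string, returned as a hex digest."""
    return hashlib.md5(build_import_string(imports).encode("utf-8")).hexdigest()


example = [
    ("kernel32.dll", ["GetCommandLineA", "GetVersion"]),
    ("kernel33.dll", ["GetCommandLineB", "SetEnvironmentVariableB"]),
]
print(build_import_string(example))
print(import_table_hash(example))
```

Because the order of entries is preserved, two files that import the same functions in a different order produce different hash values, which is consistent with the statement that the hash depends on the order of functions in the import table.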
For a better understanding of the import table, take the import table structure shown in Fig. 2 as an example. The dynamic link library name and the import table function names can be obtained as follows. The dynamic link library name is stored at the location indicated by the Name field in the import table shown in Fig. 2, and each import table function name is stored in the Name field of the entry pointed to by IMAGE_THUNK_DATA in the import table shown in Fig. 2. To extract the dynamic link library name and the import table function names from the import table, the following can be done: because the virtual address points to OriginalFirstThunk, Name can be located from the virtual address and the offset of Name from OriginalFirstThunk, and the name of the dynamic link library that Name points to, KERNEL32.dll, can then be determined. In addition, OriginalFirstThunk points to the import name table (INT), which records the import table function names corresponding to KERNEL32.dll. The import table function names for KERNEL32.dll can therefore also be found starting from the virtual address; for example, GetCommandLineA and GetVersion, pointed to by IMAGE_THUNK_DATA in the INT shown in Fig. 2, are import table function names. On this basis, hash calculation can be performed on the library name of the dynamic link library and the import table function names to obtain the import table hash value, which is then matched against the virus hash feature library to confirm whether the library is hit. When the library is hit, the file to be identified is a virus file.
The import table is an array of structures, each element being one structure. The structure of one element is as follows:
[Figure BDA0004145709070000081: structure of an import table element]
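Since the figure is not available in text form, the following sketch shows the 20-byte element layout as documented for the Windows IMAGE_IMPORT_DESCRIPTOR structure, which is assumed here to be what the figure depicts. The Python names (`ImportDescriptor`, `read_descriptors`) are hypothetical, and the reader takes a file offset directly, leaving RVA-to-offset translation out of scope.

```python
import struct
from typing import List, NamedTuple


class ImportDescriptor(NamedTuple):
    """One import table element (five little-endian 32-bit fields)."""
    original_first_thunk: int  # RVA of the import name table (INT)
    time_date_stamp: int       # timestamp field
    forwarder_chain: int       # forwarder chain index
    name: int                  # RVA of the null-terminated DLL name
    first_thunk: int           # RVA of the import address table (IAT)


DESCRIPTOR_FORMAT = "<5I"
DESCRIPTOR_SIZE = struct.calcsize(DESCRIPTOR_FORMAT)  # 20 bytes


def read_descriptors(data: bytes, offset: int) -> List[ImportDescriptor]:
    """Read elements starting at a file offset until the all-zero terminator."""
    descriptors = []
    while offset + DESCRIPTOR_SIZE <= len(data):
        fields = struct.unpack_from(DESCRIPTOR_FORMAT, data, offset)
        descriptor = ImportDescriptor(*fields)
        if not any(descriptor):
            break  # an all-zero element marks the end of the array
        descriptors.append(descriptor)
        offset += DESCRIPTOR_SIZE
    return descriptors
```

Each returned element corresponds to one imported DLL; its `name` field leads to the library name and its `original_first_thunk` field leads to the list of imported functions.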
The IMAGE_THUNK_DATA structure consists of a single union. In general, its four-byte AddressOfData member is used to obtain the address of IMAGE_IMPORT_BY_NAME, as follows:
[Figure BDA0004145709070000091: definition of the IMAGE_THUNK_DATA structure]
It should be noted that an IMAGE_THUNK_DATA32 structure occupies four bytes and indexes a function name or an ordinal, but the indexing is conditional. If the most significant bit of the four bytes is 0, the four-byte value is the RVA of IMAGE_IMPORT_BY_NAME. If the most significant bit is 1, the value is not used to index IMAGE_IMPORT_BY_NAME; instead, the most significant bit is removed and the remaining 31-bit value is the export ordinal of the imported function in the export table.
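The conditional interpretation of the four-byte value can be sketched as follows. This is an illustrative sketch that follows the description above (keeping the remaining 31 bits as the ordinal); the function name `decode_thunk32` and the result labels are hypothetical.

```python
from typing import Tuple

ORDINAL_FLAG32 = 0x80000000  # most significant bit of the 32-bit value


def decode_thunk32(value: int) -> Tuple[str, int]:
    """Classify a 32-bit thunk value as a name RVA or an ordinal."""
    if value & ORDINAL_FLAG32:
        # Highest bit is 1: drop it; the remaining 31 bits are the ordinal.
        return ("ordinal", value & 0x7FFFFFFF)
    # Highest bit is 0: the whole value is the RVA of the by-name entry.
    return ("name_rva", value)


print(decode_thunk32(0x00002345))  # ('name_rva', 9029)
print(decode_thunk32(0x80000010))  # ('ordinal', 16)
```

When a function is imported by ordinal, no function name string is available from the import table, so an implementation has to decide how to represent such an entry in the spliced string; the application does not state how this case is handled.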
Based on this, with the above method, some packed samples, small tools, and small programs may share the same import table hash value (because they import few functions, the compiled results may be identical). The same hash value can thus be used to detect similar virus variants, which avoids the need to store a large number of hash values in the virus hash library. Meanwhile, to reduce the false positive rate, when import table hash rules are extracted from the analysis of massive samples, rules prone to false positives, such as those for packed, bundled, or infected PE files, can be removed, and the removed portion is then covered by complementary detection using full-text hash and pattern string feature rules.
Optionally, based on any one of the foregoing embodiments, in this embodiment, in order to better ensure accuracy of a virus detection result, it is further provided that pattern string feature matching is performed on the file to be identified, so as to identify whether the file to be identified is a virus file.
Specifically, when matching based on the import table hash value does not succeed, it can be concluded only to a certain extent that the file to be identified is not a virus file. To make virus detection on the file more thorough, this embodiment proposes performing pattern string feature matching on the file to be identified to further confirm whether it is a virus file, thereby further improving the accuracy of the virus identification result.
It should be noted that pattern string feature matching may be implemented with reference to existing methods, which is not limited in this embodiment.
Further, when the import table hash value is not successfully matched against the virus hash feature library, or when pattern string feature matching does not identify the file to be identified as a virus file, it is confirmed whether the complete file to be identified has been collected. When the file to be identified is complete, full-text hash calculation is performed on it to obtain a hash result, and whether the file to be identified is a virus file is identified according to the hash result.
Specifically, when matching the import table hash value against the virus hash feature library does not succeed, then in order to identify viruses in the file more accurately, this embodiment proposes confirming whether the current file to be identified is complete. If it is a complete file, hash calculation is performed on the file to be identified to obtain a hash result, and whether the file is a virus file is then confirmed according to the hash result, further improving the accuracy of the virus detection result.
Likewise, when pattern string feature matching does not succeed, which means that both the import table hash matching and the pattern string feature matching have failed, this embodiment proposes confirming whether the current file to be identified is complete. If it is a complete file, hash calculation is performed on the file to be identified to obtain a hash result, and whether the file is a virus file is then confirmed according to the hash result, further improving the accuracy of the virus detection result.
It should be noted that virus identification based on the complete file to be identified may be implemented according to existing methods, which is not limited in this embodiment.
After the file to be identified has been detected with the file detection method provided by any of the above embodiments, a follow-up action can be performed on it. For example, if the file to be identified is detected to be a virus file, it can be discarded; if it is detected not to be a virus file, that is, it is a safe file, it can be released so that it is processed according to the subsequent flow.
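The order of checks and the resulting action can be summarized in a short sketch. It assumes the three detectors are supplied as callables, since the application leaves pattern string matching and full-text hash matching to existing methods; the name `detect_file` and the action labels "discard" and "release" are hypothetical.

```python
from typing import Callable

Detector = Callable[[bytes], bool]  # returns True when the data matches a virus rule


def detect_file(
    data: bytes,
    is_complete: bool,
    import_hash_match: Detector,
    pattern_match: Detector,
    full_hash_match: Detector,
) -> str:
    """Return "discard" for a virus file and "release" otherwise."""
    # 1. Import table hash matching against the virus hash feature library.
    if import_hash_match(data):
        return "discard"
    # 2. Complementary pattern string feature matching.
    if pattern_match(data):
        return "discard"
    # 3. Full-text hash, only when the complete file has been collected.
    if is_complete and full_hash_match(data):
        return "discard"
    return "release"
```

Placing the import table hash first keeps the common case cheap, and the full-text hash is attempted only when the complete file is available, which matches the constraint described for out-of-order or partial transfers.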
With the file detection method provided by any embodiment of the application, detection is insensitive to local modifications of a file: even when a virus file is changed (for example, a code segment or appended data is changed), the file header information remains fixed, so similar virus variants can be detected by the same rule. This improves the practicability and generality of the file detection method provided by the application. In addition, to reduce the false positive rate, when import table hash rules are extracted from the analysis of massive samples, rules prone to false positives, such as those for packed, bundled, or infected PE files, are removed, and the removed portion can be covered by complementary detection through full-text hash and pattern string feature matching, further improving the accuracy of file identification.
With the file detection method provided by any embodiment of the application, import table hashing improves the rule merging rate to a certain extent, so that, compared with a full-text hash algorithm, the number of rules is reduced by a certain amount.
In addition, for the import table matching method, as the information of the concerned PE file is at the file head, only the first plurality of bytes of the PE file need to be calculated and processed, hash calculation and matching are not needed to be carried out on the whole file, or AC matching processing is carried out on the whole file, so that the CPU utilization rate is greatly reduced, and meanwhile, the file identification performance is greatly improved. Meanwhile, according to the file detection method, the hash rule of the import table is tested by hundreds of millions of white sample data, and the false alarm rate is within the false alarm rate standard of the anti-virus security industry.
Furthermore, since in most scenarios the import table hash can be calculated from the first packet, any file detection method provided by the application can improve the recognition rate even when out-of-order traffic, including disorder at the IP layer, the TCP layer, and the application layer, is not reassembled.
Based on the same inventive concept, the application also provides a file detection apparatus corresponding to the file detection method. For the implementation of the file detection apparatus, reference may be made to the description of the file detection method above, which is not repeated here.
Referring to fig. 3, fig. 3 shows a file detection apparatus provided in an exemplary embodiment of the present application, which is arranged in a network security device, the apparatus including:
the identifying module 301 is configured to identify a file to be identified;
the extracting module 302 is configured to extract import table information from the file to be identified when the file to be identified is a file of a set type;
a hash calculation module 303, configured to perform hash calculation on the import table information to obtain an import table hash value;
a first matching module 304, configured to match the import table hash value with a virus hash feature library, where the virus hash feature library includes an import table hash value of a virus file;
and the confirming module 305 is configured to confirm that the file to be identified is a virus file when the matching result of the first matching module is that the matching is successful.
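The cooperation of modules 301 to 305 can be sketched in Python as follows. This is only an illustrative arrangement: the class name `FileDetector`, its field names, and the use of injected callables are assumptions, and MD5 is used for the hash step because a later embodiment names it.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional, Set

@dataclass
class FileDetector:
    # Identifying module 301: decides whether the file is of the set type.
    is_set_type: Callable[[bytes], bool]
    # Extracting module 302: returns import table information, or None.
    extract_import_info: Callable[[bytes], Optional[str]]
    # Virus hash feature library consulted by the first matching module 304.
    virus_hashes: Set[str]

    def hash_import_info(self, info: str) -> str:
        # Hash calculation module 303 (MD5 assumed here).
        return hashlib.md5(info.encode("utf-8")).hexdigest()

    def is_virus(self, data: bytes) -> bool:
        if not self.is_set_type(data):
            return False  # not flagged by this path
        info = self.extract_import_info(data)
        if info is None:
            return False
        # Modules 304 and 305: a successful match confirms a virus file.
        return self.hash_import_info(info) in self.virus_hashes
```

A `False` result only means the import table path did not flag the file; other checks may still apply to it.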
Optionally, based on the foregoing embodiment, in this embodiment, the import table information includes a library identifier of each dynamically linked library and an import table function identifier corresponding to the dynamically linked library. On this basis, the hash calculation module 303 is specifically configured to perform hash calculation on each library identifier and each corresponding import table function identifier, so as to obtain an import table hash value.
In the file detection apparatus provided by this embodiment, virus detection is performed on the file to be identified through its import table. Because only the import table of the file is examined, little additional memory is occupied and the accuracy of the virus detection result is improved. In addition, since the hash is not calculated over the whole file to be identified, the file detection speed is improved, CPU utilization is reduced, and file detection performance is improved.
Further, the hash calculation module 303 is specifically configured to: for the library identifier of each dynamic link library, remove the suffix of the library identifier to obtain a modified library identifier; splice the modified library identifier with the import table function identifiers corresponding to the dynamic link library to obtain a first spliced string of the dynamic link library; splice the first spliced strings of all dynamic link libraries to obtain a second spliced string; and apply the message-digest algorithm MD5 to the second spliced string to obtain the import table hash value.
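The four steps above can be sketched in Python as follows. The text does not fix the delimiters or the letter case, so lowercasing, the `.` between library and function, and the `,` between entries are assumptions, and the function name `import_table_hash` is hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple

def import_table_hash(imports: Iterable[Tuple[str, List[str]]]) -> str:
    """Compute an import table hash from (library, [functions]) pairs."""
    first_spliced = []
    for lib, funcs in imports:
        # Step 1: drop the suffix of the library identifier (e.g. ".dll").
        stem = lib.rsplit(".", 1)[0].lower()
        # Step 2: splice the modified library identifier with each function.
        entries = [f"{stem}.{fn.lower()}" for fn in funcs]
        if entries:
            first_spliced.append(",".join(entries))
    # Step 3: splice the per-library strings into one string.
    second_spliced = ",".join(first_spliced)
    # Step 4: MD5 over the final spliced string.
    return hashlib.md5(second_spliced.encode("utf-8")).hexdigest()
```

For example, `import_table_hash([("KERNEL32.dll", ["CreateFileA", "ReadFile"])])` hashes the string `kernel32.createfilea,kernel32.readfile`. Because only library and function names enter the hash, changes to the code section or appended data leave the value unchanged.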
Optionally, based on any one of the foregoing embodiments, the file detection apparatus provided in this embodiment further includes:
and a second matching module (not shown in the figure) for performing pattern string feature matching on the file to be identified so as to identify whether the file to be identified is a virus file.
Optionally, based on any one of the foregoing embodiments, the file detection apparatus provided in this embodiment further includes:
and a third matching module (not shown in the figure), configured to: when the first matching module confirms that the import table hash value is not successfully matched with the virus hash feature library, or when the second matching module does not identify the file to be identified as a virus file during pattern string feature matching, perform full-text hash calculation on the file to be identified to obtain a hash result and, if the file to be identified is a complete file, identify whether it is a virus file according to the hash result.
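One possible way to chain the three matching modules is sketched below in Python. The ordering (import table hash, then pattern strings, then full-text hash), the use of MD5 for the full-text hash, the plain substring search standing in for pattern string matching, and the name `detect_with_fallback` are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, Optional, Set

def detect_with_fallback(
    data: bytes,
    is_complete: bool,
    import_hash: Optional[str],
    import_hash_db: Set[str],
    patterns: Iterable[bytes],
    fulltext_hash_db: Set[str],
) -> bool:
    """Return True if any of the three checks flags the file as a virus."""
    # First matching module: import table hash against the feature library.
    if import_hash is not None and import_hash in import_hash_db:
        return True
    # Second matching module: pattern string feature matching.
    if any(p in data for p in patterns):
        return True
    # Third matching module: the full-text hash is only meaningful
    # when the whole file is available.
    if is_complete:
        return hashlib.md5(data).hexdigest() in fulltext_hash_db
    return False
```

In this arrangement the expensive whole-file hash runs only after the cheaper checks have missed, which matches the supplementary role the description gives to full-text hashing.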
Based on the same inventive concept, embodiments of the present application provide an electronic device, which may be, but is not limited to, the network security device described above. As shown in fig. 4, the electronic device includes a processor 401 and a machine-readable storage medium 402, the machine-readable storage medium 402 storing a computer program executable by the processor 401, the processor 401 being caused by the computer program to perform a file detection method provided by any of the embodiments of the present application. The electronic device further comprises a communication interface 403 and a communication bus 404, wherein the processor 401, the communication interface 403 and the machine readable storage medium 402 communicate with each other via the communication bus 404.
The communication bus of the electronic device mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single bold line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The machine-readable storage medium 402 may be a memory, which may include Random Access Memory (RAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), or Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
For the electronic device and the machine-readable storage medium embodiments, the description is relatively simple, and reference should be made to the description of the method embodiments for relevant points, since the method content involved is substantially similar to that of the method embodiments described above.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The implementation process of the functions and roles of each unit/module in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be repeated here.
For the apparatus embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative: the units/modules described as separate components may or may not be physically separate, and the components shown as units/modules may or may not be physical units/modules, that is, they may be located in one place or distributed over a plurality of network units/modules. Some or all of the units/modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the present application without creative effort.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims (12)

1. A file detection method, comprising:
carrying out file identification on the file to be identified;
when the file to be identified is a file of a set type, extracting import table information from the file to be identified;
carrying out hash calculation on the information of the import table to obtain a hash value of the import table;
matching the import table hash value with a virus hash feature library, wherein the virus hash feature library comprises import table hash values of virus files;
and when the matching is successful, confirming that the file to be identified is a virus file.
2. The method of claim 1, wherein the import table information includes a library identification of each dynamically linked library and an import table function identification corresponding to the dynamically linked library;
performing hash calculation on the information of the import table to obtain a hash value of the import table, including:
and carrying out hash calculation processing on each library identifier and each corresponding import table function identifier to obtain an import table hash value.
3. The method of claim 2, wherein performing a hash calculation on each library identifier and each corresponding import table function identifier to obtain an import table hash value comprises:
removing the suffix of the library identifier aiming at the library identifier of each dynamic link library to obtain a modified library identifier;
splicing the modified library identifier and the import table function identifiers corresponding to the dynamic link library to obtain a first spliced string of the dynamic link library;
performing splicing processing on the first spliced strings of each dynamic link library to obtain a second spliced string;
and carrying out message-digest algorithm MD5 calculation processing on the second spliced string to obtain an import table hash value.
4. The method as recited in claim 1, further comprising:
and carrying out pattern string feature matching on the file to be identified so as to identify whether the file to be identified is a virus file or not.
5. The method as recited in claim 4, further comprising:
when the hash value of the import table is not successfully matched with the virus hash feature library or the file to be identified is not identified as a virus file when pattern string feature matching is carried out on the file to be identified, carrying out full-text hash calculation on the file to be identified to obtain a hash result, and identifying whether the file to be identified is a virus file according to the hash result.
6. A file detection apparatus, comprising:
the identification module is used for carrying out file identification on the file to be identified;
the extraction module is used for extracting import table information from the file to be identified when the file to be identified is a file of a set type;
the hash calculation module is used for carrying out hash calculation on the information of the import table to obtain a hash value of the import table;
the first matching module is used for matching the import table hash value with a virus hash feature library, wherein the virus hash feature library comprises the import table hash value of a virus file;
and the confirming module is used for confirming that the file to be identified is a virus file when the matching result of the first matching module is that the matching is successful.
7. The apparatus of claim 6, wherein the import table information includes a library identification for each dynamically linked library and an import table function identification corresponding to the dynamically linked library;
the hash calculation module is specifically configured to perform hash calculation processing on each library identifier and each corresponding import table function identifier to obtain an import table hash value.
8. The apparatus of claim 7, wherein
the hash calculation module is specifically configured to remove a suffix of a library identifier for each dynamically linked library to obtain a modified library identifier; splice the modified library identifier and the import table function identifiers corresponding to the dynamic link library to obtain a first spliced string of the dynamic link library; perform splicing processing on the first spliced strings of each dynamic link library to obtain a second spliced string; and carry out message-digest algorithm MD5 calculation processing on the second spliced string to obtain an import table hash value.
9. The apparatus as recited in claim 6, further comprising:
and the second matching module is used for carrying out pattern string feature matching on the file to be identified so as to identify whether the file to be identified is a virus file or not.
10. The apparatus as recited in claim 9, further comprising:
and the third matching module is used for carrying out full-text hash calculation on the file to be identified to obtain a hash result when the first matching module confirms that the hash value of the import table is not successfully matched with the virus hash feature library or the second matching module does not identify that the file to be identified is a virus file when carrying out pattern string feature matching on the file to be identified, and identifying whether the file to be identified is the virus file according to the hash result.
11. An electronic device comprising a processor and a machine-readable storage medium storing a computer program executable by the processor, the processor being caused by the computer program to perform the method of any one of claims 1-5.
12. A machine-readable storage medium storing a computer program which, when invoked and executed by a processor, causes the processor to perform the method of any one of claims 1-5.
CN202310300540.0A 2023-03-23 2023-03-23 File detection method, device, equipment and storage medium Pending CN116401661A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310300540.0A CN116401661A (en) 2023-03-23 2023-03-23 File detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310300540.0A CN116401661A (en) 2023-03-23 2023-03-23 File detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116401661A true CN116401661A (en) 2023-07-07

Family

ID=87019084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310300540.0A Pending CN116401661A (en) 2023-03-23 2023-03-23 File detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116401661A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination