WO2022021501A1 - Method and device for determining malicious files - Google Patents

Method and device for determining malicious files

Info

Publication number
WO2022021501A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
blocks
detected
preset
feature value
Prior art date
Application number
PCT/CN2020/110318
Other languages
English (en)
French (fr)
Inventor
赵烨
袁巍
路鹏
王海旭
Original Assignee
山石网科通信技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山石网科通信技术股份有限公司
Priority to US17/439,827 (US20230153432A1)
Publication of WO2022021501A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/16 Implementing security features at a particular protocol layer
    • H04L63/168 Implementing security features at a particular protocol layer above the transport layer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/564 Static detection by virus signature recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/565 Static detection by checking file integrity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms

Definitions

  • the present application relates to the technical field of information processing, and in particular, to a method and device for determining malicious files.
  • a file bearer layer protocol (e.g., SMB, the Server Message Block protocol)
  • The detection engine decodes the traffic layer by layer according to the protocol and extracts the file blocks.
  • The hash feature value of each block is calculated following the order of the file blocks, and the hash feature value of the whole file is obtained once the file has been fully received.
  • Algorithms commonly used in the industry to calculate the hash feature value of malicious files, such as MD5, must process the file blocks in order; when the file data blocks arrive out of order, the hash feature value cannot be calculated and malicious files cannot be detected. In the worst case, the intermediate network security device must cache all the file blocks before calculating the hash feature value, which consumes too much of the device's memory. If the device does not have enough space to cache the file blocks, the in-order hash calculation cannot be completed, and large malicious files cannot be detected.
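  • The order dependence described above can be illustrated with a minimal Python sketch. The block contents and the helper name md5_of_blocks are hypothetical and serve only to show that feeding the same blocks to MD5 in a different order yields a different digest.

```python
import hashlib

def md5_of_blocks(blocks):
    """Feed blocks to MD5 in the order given and return the hex digest."""
    h = hashlib.md5()
    for block in blocks:
        h.update(block)
    return h.hexdigest()

# Hypothetical file split into three blocks.
blocks_in_order = [b"block-0 ", b"block-1 ", b"block-2 "]
# The same blocks as they might arrive under out-of-order transmission.
blocks_out_of_order = [blocks_in_order[2], blocks_in_order[0], blocks_in_order[1]]

in_order_digest = md5_of_blocks(blocks_in_order)
out_of_order_digest = md5_of_blocks(blocks_out_of_order)
whole_file_digest = hashlib.md5(b"".join(blocks_in_order)).hexdigest()

# Incremental hashing matches the whole-file hash only when blocks are in order.
print(in_order_digest == whole_file_digest)
print(in_order_digest == out_of_order_digest)
```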
  • the main purpose of the present application is to provide a malicious file determination method and device to solve the problem in the related art that it is difficult to detect whether the to-be-detected file is a malicious file if the device does not have enough space to cache all the file blocks of the to-be-detected file.
  • A method for determining malicious files includes: judging whether the received multiple file blocks meet preset requirements, wherein the multiple file blocks are all file blocks of the file to be detected, and the preset requirements at least include size requirements and sorting requirements of the multiple file blocks; if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block, wherein the header file block is the file block that has been cached in the order of the file blocks in the device cache area; dividing the multiple file blocks into a preset number of subfiles, and calculating the order-independent hash feature value of each subfile; and, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, determining whether the file to be detected is a malicious file.
  • Calculating the order-independent hash feature value of each subfile includes: dividing each subfile into a plurality of data blocks of a preset number of bits, and calculating the order-independent hash feature value of each such data block; and accumulating the order-independent hash feature values of the data blocks to obtain the order-independent hash feature value of each subfile.
  • The method further includes: if any of the data blocks into which a subfile is divided is smaller than the preset number of bits, padding that data block so that every data block has the preset number of bits.
  • Judging whether the file to be detected is a malicious file includes: obtaining the length of the file to be detected; and performing a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block, and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file.
  • Performing the matching query in the preset database to determine whether the file to be detected is a malicious file includes: if the preset database stores data information whose length is the same as that of the file to be detected, whose header file block hash feature value is the same, and whose order-independent hash feature value of each subfile is the same, determining that the file to be detected is a malicious file.
  • Before judging whether the received multiple file blocks meet the preset requirements, the method further includes: judging the size of the file to be detected; and, if the size of the file to be detected exceeds the preset size, executing the step of judging whether the received multiple file blocks meet the preset requirements.
  • the method is applied to a scenario in which the data block of the to-be-detected file is acquired in the out-of-order transmission of the application layer protocol in the network security device.
  • A malicious file determination device includes: a first judgment unit configured to judge whether the received multiple file blocks meet the preset requirements, wherein the multiple file blocks are all file blocks of the file to be detected, and the preset requirements include at least size requirements and sorting requirements of the multiple file blocks; a first calculation unit configured to calculate the hash feature value of the header file block when the multiple file blocks do not meet the preset requirements, wherein the header file block is the file block that has been cached in the order of the file blocks in the device cache area; a second calculation unit configured to divide the multiple file blocks into a preset number of subfiles and calculate the order-independent hash feature value of each subfile; and a second judgment unit configured to determine whether the file to be detected is a malicious file based on the hash feature value of the header file block and the order-independent hash feature value of each subfile.
  • A non-volatile storage medium includes a stored program, wherein the program executes any one of the above-mentioned methods for determining malicious files.
  • a processor is provided, and the processor is configured to run a program, wherein when the program runs, the method for determining a malicious file described in any one of the above is executed.
  • The following steps are adopted: judging whether the received multiple file blocks meet the preset requirements, wherein the multiple file blocks are all file blocks of the file to be detected, and the preset requirements include at least size requirements and sorting requirements of the multiple file blocks; if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block, wherein the header file block is the file block that has been cached in the order of the file blocks in the device cache area; dividing the multiple file blocks into a preset number of subfiles, and calculating the order-independent hash feature value of each subfile; and determining whether the file to be detected is a malicious file based on the hash feature value of the header file block and the order-independent hash feature value of each subfile. This solves the problem in the related art that it is difficult to detect whether the file to be detected is a malicious file if the device does not have enough space to cache all the file blocks of the file to be detected.
  • By judging whether the file to be detected is a malicious file based on the hash feature value of the file blocks that have been cached in the order of the file blocks and the order-independent hash feature value of each subfile, whether the file to be detected is malicious can be detected even when the device does not have enough space to cache all of its file blocks.
  • FIG. 1 is a flowchart of a method for determining a malicious file provided according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of calculation of supplementary features in a method for determining malicious files provided according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of a method for determining a malicious file provided according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an apparatus for determining a malicious file provided according to an embodiment of the present application.
  • Malicious files are scripts used to damage computer systems or steal user privacy, such as viruses and Trojans.
  • a method for determining malicious files is provided.
  • FIG. 1 is a flowchart of a method for determining a malicious file according to an embodiment of the present application. As shown in Figure 1, the method includes the following steps:
  • Step S101 judging whether the received multiple file blocks meet the preset requirements, wherein the multiple file blocks are all file blocks of the file to be detected, and the preset requirements at least include: size requirements of multiple file blocks and multiple file blocks sorting requirements.
  • Step S102 if the multiple file blocks do not meet the preset requirements, calculate the hash characteristic value of the first file block, wherein the first file block is the file block that has been cached in the order of the file blocks in the device cache area.
  • If the total size of all file blocks of the file to be detected exceeds the preset size and the file blocks are not arranged in sequence, the file blocks of the file to be detected are considered not to meet the preset requirements.
  • For example, the upper limit of the device cache area is 2M and the total size of all file blocks of the acquired file to be detected is 5M; that is, the size of the file to be detected exceeds 2M, and the file blocks are not arranged in sequence.
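  • The check in step S101 might be sketched as follows. This is a minimal illustration, assuming each received block is described by a hypothetical (offset, size) pair and that "M" means 1024 * 1024 bytes; neither detail is specified in the application.

```python
def meets_preset_requirements(blocks, cache_limit):
    """Return False when the blocks exceed the cache limit AND arrived out of order.

    blocks: list of (offset, size) tuples in arrival order (hypothetical layout).
    cache_limit: upper limit of the device cache area, in bytes.
    """
    total_size = sum(size for _, size in blocks)
    offsets = [offset for offset, _ in blocks]
    in_order = offsets == sorted(offsets)
    # The requirements fail only if the file is too large to cache
    # and its blocks are not arranged in sequence.
    return not (total_size > cache_limit and not in_order)

MB = 1024 * 1024  # assumption: "M" in the text is read as 2**20 bytes

# 5M of blocks arriving out of order against a 2M cache limit.
arrived = [(2 * MB, MB), (0, MB), (4 * MB, MB), (1 * MB, MB), (3 * MB, MB)]
needs_fallback = not meets_preset_requirements(arrived, cache_limit=2 * MB)
print(needs_fallback)
```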
  • The calculation is performed in the device cache. In Figure 2, the part in the dotted box is the file blocks that have been cached in the order of the file blocks in the device cache. These file blocks are sorted, and the hash feature value of the sorted file blocks in the device cache is calculated (corresponding to the step in Figure 2 in which the blocks are sorted and the header Hash result is calculated).
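  • One way to picture the header hash calculation is the sketch below. It assumes the cache is a hypothetical mapping from byte offset to block data and uses MD5 only because the application names it as a typical algorithm; the function name header_hash is illustrative.

```python
import hashlib

def header_hash(cached_blocks):
    """Hash the contiguous run of cached blocks that starts at offset 0.

    cached_blocks: dict mapping byte offset -> block bytes (hypothetical layout).
    Returns (hex digest, number of bytes covered by the header run).
    """
    h = hashlib.md5()
    next_offset = 0
    while next_offset in cached_blocks:
        data = cached_blocks[next_offset]
        if not data:
            break  # guard against an empty block causing an endless loop
        h.update(data)
        next_offset += len(data)
    return h.hexdigest(), next_offset

# Blocks at offsets 0 and 4 are contiguous; the block at offset 12 is not,
# because the block at offset 8 has not been cached.
cache = {4: b"EFGH", 0: b"ABCD", 12: b"MNOP"}
digest, covered = header_hash(cache)
print(digest, covered)
```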
  • Step S103 Divide the multiple file blocks into a preset number of sub-files, and calculate a sequence-independent hash feature value of each sub-file.
  • Hash value of the file header: the hash feature value of the file blocks that have been cached in the order of the file blocks in the device cache.
  • Cumulative sum of the files: the order-independent hash feature value of each subfile.
  • Calculating the order-independent hash feature value of each subfile includes: dividing each subfile into a plurality of data blocks of a preset number of bits, and calculating the order-independent hash feature value of each such data block; the order-independent hash feature values of the data blocks are then accumulated to obtain the order-independent hash feature value of each subfile.
  • The algorithm for calculating the order-independent hash feature value of a data block is one in which the data units of the input data block can participate in the calculation in any order and the result is the same.
  • The algorithm for calculating the order-independent hash feature values is a cumulative sum.
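  • A cumulative sum over fixed-width data units can be sketched as follows. The little-endian interpretation of each 64-bit unit and the wraparound modulo 2**64 are assumptions made for this illustration; the application does not specify either.

```python
import struct

MASK64 = (1 << 64) - 1  # assumption: the running sum wraps at 64 bits

def cumulative_sum(data: bytes) -> int:
    """Sum the 64-bit units of data; len(data) must be a multiple of 8."""
    total = 0
    for (word,) in struct.iter_unpack("<Q", data):  # assumption: little-endian
        total = (total + word) & MASK64
    return total

data = bytes(range(32))                          # four 64-bit units
words = [data[i:i + 8] for i in range(0, 32, 8)]
shuffled = b"".join(reversed(words))             # same units, different order

# Addition is commutative, so the order of the units does not matter.
print(cumulative_sum(data) == cumulative_sum(shuffled))
```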
  • For example, the size of the file to be detected is 10M, the device cache area can buffer 1M of data, and the 10M is divided into 5 subfiles.
  • 0-2M is the first subfile; because of out-of-order transmission of the application layer protocol, it is received out of order by the device and cannot be completely cached.
  • After receiving the data transmitted out of order, the device divides it into several 64-bit data blocks, calculates the order-independent hash feature value of each 64-bit data block, and then accumulates these values to obtain the order-independent hash feature value of the 0-2M subfile.
  • The hash feature values of the 2-4M, 4-6M, 6-8M, and 8-10M subfiles are calculated in the same way as that of the 0-2M subfile, which will not be repeated here.
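  • The per-subfile accumulation can be sketched without caching the blocks, as below. The sketch assumes every block carries its byte offset, that offsets and block lengths are multiples of 8 bytes, and that 64-bit units are little-endian with wraparound; these are illustrative assumptions, and a tiny 80-byte "file" with 16-byte subfiles stands in for the 10M file with 2M subfiles.

```python
import struct

MASK64 = (1 << 64) - 1
WORD_BYTES = 8  # 64-bit data blocks, as in the example above

def add_block(sums, offset, data, subfile_size):
    """Fold one received file block into the running sum of its subfile(s).

    Assumes offset and len(data) are multiples of 8 bytes (illustrative only).
    """
    for i in range(0, len(data), WORD_BYTES):
        (word,) = struct.unpack_from("<Q", data, i)
        index = (offset + i) // subfile_size  # which subfile this unit belongs to
        sums[index] = (sums[index] + word) & MASK64

file_data = bytes(range(80))   # stand-in for the file to be detected
SUBFILE_SIZE = 16              # 80 / 16 = 5 subfiles
blocks = [(off, file_data[off:off + 24]) for off in range(0, 80, 24)]
arrival = [blocks[2], blocks[0], blocks[3], blocks[1]]  # out-of-order arrival

sums_in_order = [0] * 5
for off, chunk in blocks:
    add_block(sums_in_order, off, chunk, SUBFILE_SIZE)

sums_out_of_order = [0] * 5
for off, chunk in arrival:
    add_block(sums_out_of_order, off, chunk, SUBFILE_SIZE)

print(sums_in_order == sums_out_of_order)
```

  • Because each block is folded into the running sums as soon as it arrives, only five 64-bit accumulators need to be kept in this sketch, regardless of the arrival order.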
  • The Hash value of the file header (the hash feature value of the file blocks that have been cached in the order of the file blocks in the device cache) and the cumulative sums of the file (the order-independent hash feature value of each subfile) are illustrated in Table 2.
  • Step S104 based on the hash feature value of the header file block and the sequence-independent hash feature value of each subfile, determine whether the file to be detected is a malicious file.
  • Whether the file to be detected is a malicious file is determined through the six values in Table 2 (the Hash value of the file header, cumulative sum 1, ..., cumulative sum 5).
  • The method further includes: if any of the data blocks into which a subfile is divided is smaller than the preset number of bits, padding that data block so that every data block has the preset number of bits.
  • For example, the preset number of bits is 64, but the tail of the 8-10M data block is less than 64 bits: the tail is 58 bits, which can be padded with 0 to make up 64 bits. This ensures that the hash feature value of each data block (the cumulative sums in Table 2) can be calculated subsequently, for more accurate identification of malicious files.
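  • Zero padding of a short tail can be sketched as below. The sketch works at byte granularity, which is an assumption: the 58-bit tail in the example is not a whole number of bytes, so a real implementation would need a bit-level convention that the application does not describe. The function name pad_to_word is hypothetical.

```python
def pad_to_word(data: bytes, word_bytes: int = 8) -> bytes:
    """Append zero bytes so that len(data) becomes a multiple of word_bytes."""
    remainder = len(data) % word_bytes
    if remainder == 0:
        return data  # already aligned, nothing to pad
    return data + b"\x00" * (word_bytes - remainder)

tail = b"\x01\x02\x03\x04\x05"   # a 5-byte tail, shorter than one 64-bit unit
padded = pad_to_word(tail)
print(len(padded), padded)
```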
  • Determining whether the file to be detected is a malicious file based on the hash feature value of the header file block and the order-independent hash feature value of each subfile includes: obtaining the length of the file to be detected; and performing a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block, and the order-independent hash feature value of each subfile, to determine whether the file to be detected is a malicious file.
  • The preset database pre-stores, for all malicious files in the historical data, the file length, the hash feature value of the file blocks that can be cached in the device cache area, and the order-independent hash feature value of each subfile.
  • A matching query is performed in the preset database to quickly determine whether the file to be detected is a malicious file.
  • If the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block, and the same order-independent hash feature value of each subfile, the file to be detected is determined to be a malicious file.
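  • The matching query can be sketched as an exact lookup on the combined values. The Signature type, the in-memory set standing in for the preset database, and all sample values are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Tuple

class Signature(NamedTuple):
    length: int                      # length of the file, in bytes
    header_hash: str                 # hash feature value of the header file block
    subfile_sums: Tuple[int, ...]    # order-independent value of each subfile

def is_malicious(signature, known_signatures):
    """A file matches only if length, header hash, and every subfile sum agree."""
    return signature in known_signatures

# Hypothetical database contents (made-up values, not real malware data).
database = {
    Signature(80, "aaaa", (1, 2, 3, 4, 5)),
}

same = Signature(80, "aaaa", (1, 2, 3, 4, 5))
different_sum = Signature(80, "aaaa", (1, 2, 3, 4, 6))

print(is_malicious(same, database))
print(is_malicious(different_sum, database))
```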
  • The method is applied to a scenario in which the data blocks of the file to be detected are acquired under out-of-order transmission of an application layer protocol (for example, the Server Message Block protocol, SMB) in a network security device.
  • In this scenario, the detection of malicious files can be realized by using the above method.
  • The size of the file to be detected can be determined first; if the size of the file to be detected exceeds the preset size, the step of determining whether the received multiple file blocks meet the preset requirements is performed. That is, if the size of the transmitted file is known in advance, the method for determining malicious files provided by the embodiments of the present application is applied only to files to be detected whose size exceeds the preset size.
  • the cumulative sum of each subfile is calculated; if not, the device waits until the hash feature value of the file blocks cached in the order of the file blocks in the cache area (the header file blocks) has been calculated, and then calculates the cumulative sum of each subfile. If the multiple file blocks can be cached sequentially, that is, all file blocks can be cached in order in the cache area, the hash feature value of the entire file is calculated.
  • the above method solves the problem in the related art that if the device does not have enough space to cache all the file blocks of the file to be detected, it is difficult to detect whether the file to be detected is a malicious file.
  • Even when the device does not have enough space to cache all the file blocks of the file to be detected, whether the file to be detected is a malicious file can still be detected.
  • the embodiment of the present application also provides a malicious file determination device. It should be noted that the malicious file determination device of the present application embodiment can be used to execute the malicious file determination method provided by the present application embodiment. The following describes the device for determining malicious files provided by the embodiments of the present application.
  • FIG. 4 is a schematic diagram of an apparatus for determining a malicious file according to an embodiment of the present application.
  • the device includes: a first judgment unit 401, a first calculation unit 402, a second calculation unit 403 and a second judgment unit 404.
  • The first judgment unit 401 is configured to judge whether the received multiple file blocks meet the preset requirements, wherein the multiple file blocks are all file blocks of the file to be detected, and the preset requirements at least include size requirements and sorting requirements of the multiple file blocks.
  • The first calculation unit 402 is configured to calculate the hash feature value of the header file block when the multiple file blocks do not meet the preset requirements, wherein the header file block is the file block that has been cached in the order of the file blocks in the device cache area.
  • The second calculation unit 403 is configured to divide the multiple file blocks into a preset number of subfiles and calculate the order-independent hash feature value of each subfile.
  • the second judging unit 404 is configured to judge whether the file to be detected is a malicious file based on the hash feature value of the header file block and the sequence-independent hash feature value of each subfile.
  • The first judgment unit 401 judges whether the received multiple file blocks meet the preset requirements, wherein the multiple file blocks are all file blocks of the file to be detected, and the preset requirements at least include size requirements and sorting requirements of the multiple file blocks; when the multiple file blocks do not meet the preset requirements, the first calculation unit 402 calculates the hash feature value of the header file block, wherein the header file block is the file block that has been cached in the order of the file blocks in the device cache area; the second calculation unit 403 divides the multiple file blocks into a preset number of subfiles and calculates the order-independent hash feature value of each subfile; and the second judgment unit 404 judges whether the file to be detected is a malicious file based on the hash feature value of the header file block and the order-independent hash feature value of each subfile.
  • By judging whether the file to be detected is a malicious file based on the hash feature value of the file blocks that have been cached in the order of the file blocks and the order-independent hash feature value of each subfile, whether the file to be detected is malicious can be detected even when the device does not have enough space to cache all of its file blocks.
  • The second calculation unit 403 includes: a first calculation module, configured to divide each subfile into a plurality of data blocks of a preset number of bits and calculate the order-independent hash feature value of each such data block; and a second calculation module, configured to accumulate the order-independent hash feature values of the data blocks to obtain the order-independent hash feature value of each subfile.
  • The device further includes: a padding unit, configured to, if any of the data blocks into which a subfile is divided is smaller than the preset number of bits, pad that data block so that every data block has the preset number of bits.
  • The second judgment unit 404 includes: an obtaining module, configured to obtain the length of the file to be detected; and a query module, configured to perform a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block, and the order-independent hash feature value of each subfile, to determine whether the file to be detected is a malicious file.
  • The query module includes: a determination module, configured to determine that the file to be detected is a malicious file if the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block, and the same order-independent hash feature value of each subfile.
  • The device further includes: a third judgment unit, configured to judge the size of the file to be detected before it is judged whether the received multiple file blocks meet the preset requirements; and an execution unit, configured to execute the step of judging whether the received multiple file blocks meet the preset requirements when the size of the file to be detected exceeds the preset size.
  • the apparatus is applied to a scenario in which data blocks of a file to be detected are acquired in out-of-order transmission of an application layer protocol in a network security device.
  • the malicious file determination device includes a processor and a memory.
  • The above-mentioned first judgment unit 401, first calculation unit 402, second calculation unit 403, and second judgment unit 404 are all stored in the memory as program units, and the processor executes these program units stored in the memory to realize the corresponding functions.
  • The processor includes a kernel, and the kernel calls the corresponding program unit from the memory.
  • One or more kernels may be provided; by adjusting the kernel parameters, whether the file to be detected is malicious can be detected even when the device does not have enough space to cache all the file blocks of the file to be detected.
  • Memory may include non-persistent memory in computer readable media, random access memory (RAM) and/or non-volatile memory, such as read only memory (ROM) or flash memory (flash RAM), the memory including at least one memory chip.
  • An embodiment of the present invention provides a storage medium on which a program is stored, and when the program is executed by a processor, a method for determining the malicious file is implemented.
  • An embodiment of the present invention provides a processor for running a program, wherein the method for determining the malicious file is executed when the program is running.
  • An embodiment of the present invention provides a device.
  • The device includes a processor, a memory, and a program stored in the memory and runnable on the processor.
  • When the processor executes the program, the following steps are implemented: judging whether the received multiple file blocks meet preset requirements, wherein the multiple file blocks are all file blocks of the file to be detected, and the preset requirements include at least size requirements and sorting requirements of the multiple file blocks; if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block, wherein the header file block is the file block that has been cached in the order of the file blocks in the device cache area; dividing the multiple file blocks into a preset number of subfiles, and calculating the order-independent hash feature value of each subfile; and determining whether the file to be detected is a malicious file based on the hash feature value of the header file block and the order-independent hash feature value of each subfile.
  • Calculating the order-independent hash feature value of each subfile includes: dividing each subfile into a plurality of data blocks of a preset number of bits, and calculating the order-independent hash feature value of each such data block; and accumulating the order-independent hash feature values of the data blocks to obtain the order-independent hash feature value of each subfile.
  • The method further includes: if any of the data blocks into which a subfile is divided is smaller than the preset number of bits, padding that data block so that every data block has the preset number of bits.
  • The following steps may also be implemented: judging whether the file to be detected is a malicious file based on the hash feature value of the header file block and the order-independent hash feature value of each subfile includes: obtaining the length of the file to be detected; and performing a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block, and the order-independent hash feature value of each subfile, to determine whether the file to be detected is a malicious file.
  • The following steps may also be implemented: performing the matching query in the preset database to determine whether the file to be detected is a malicious file includes: if the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block, and the same order-independent hash feature value of each subfile, determining that the file to be detected is a malicious file.
  • When the processor executes the program, the following steps may also be implemented: before judging whether the received multiple file blocks meet the preset requirements, judging the size of the file to be detected; and, if the size of the file to be detected exceeds the preset size, performing the step of judging whether the received multiple file blocks meet the preset requirements.
  • When the processor executes the program, the method may also be applied to a scenario in which the data blocks of the file to be detected are acquired under out-of-order transmission of the application layer protocol in the network security device.
  • the devices in this article can be servers, PCs, PADs, mobile phones, and so on.
  • The present application also provides a computer program product which, when executed on a data processing device, is suitable for executing a program initialized with the following method steps: judging whether the received multiple file blocks meet preset requirements, wherein the multiple file blocks are all file blocks of the file to be detected, and the preset requirements include at least: size requirements of the multiple file blocks and sorting requirements of the multiple file blocks; if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block, wherein the header file block is the file block that has been cached in the order of the file blocks in the device cache area; dividing the multiple file blocks into a preset number of subfiles, and calculating the order-independent hash feature value of each subfile; based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, determining whether the file to be detected is a malicious file.
  • When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: calculating the order-independent hash feature value of each subfile includes: dividing each subfile into a plurality of data blocks of preset bits, and calculating the order-independent hash feature value of each data block of preset bits; performing a cumulative-sum calculation on the order-independent hash feature values of the data blocks of preset bits to obtain the order-independent hash feature value of each subfile.
  • When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: the method further includes: if there is a data block smaller than the preset bits among the data blocks into which each subfile is divided, padding the data block smaller than the preset bits so that each data block is a data block of preset bits.
  • When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, determining whether the file to be detected is a malicious file, includes: obtaining the length of the file to be detected; performing a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to determine whether the file to be detected is a malicious file.
  • When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: performing a matching query in the preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to determine whether the file to be detected is a malicious file, includes: if the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block, and the same order-independent hash feature value of each subfile, determining that the file to be detected is a malicious file.
  • When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: before judging whether the received multiple file blocks meet the preset requirements, the method further includes: judging the size of the file to be detected; if the size of the file to be detected exceeds the preset size, performing the step of judging whether the received multiple file blocks meet the preset requirements.
  • When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: the method is applied to a scenario in which a network security device obtains the data blocks of the file to be detected from out-of-order transmission by an application-layer protocol.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory in computer readable media, random access memory (RAM) and/or non-volatile memory in the form of, for example, read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Abstract

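A method and device for determining a malicious file. The method includes: judging whether multiple received file blocks meet preset requirements (S101), where the multiple file blocks are all the file blocks of a file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks; if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block (S102), where the header file block is the file block that has been cached in file-block order in the device cache area; dividing the multiple file blocks into a preset number of subfiles and calculating the order-independent hash feature value of each subfile (S103); and judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file (S104). The method solves the problem in the related art that it is difficult to detect whether a file to be detected is malicious when the device does not have enough space to cache all of its file blocks.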

Description

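Method and device for determining malicious file
This application claims priority to Chinese patent application No. 202010754323.5, filed on July 30, 2020 and entitled "Method and device for determining malicious file", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of information processing, and in particular to a method and device for determining a malicious file.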
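Background
Malicious file detection is widely used in current network security devices. Typically, the decoder for a file-carrying protocol in the network security device (for example, SMB, the Server Message Block protocol) extracts the file content from the traffic, calculates a checksum, and compares it with pre-generated checksums of malicious files; if a match is found, the file is determined to be malicious. In existing solutions, a malicious file may be determined as follows. First, the detection engine decodes the traffic layer by layer according to the protocol and extracts the file blocks. The hash feature value is calculated over the file blocks in order, and the hash feature value of the whole file is obtained once the file has been completely received. The whole-file hash feature value is then looked up in a feature library of malicious-file hash feature values; if it equals the hash feature value of a known malicious file in the library, the file is determined to be malicious.
However, the algorithms the industry uses to calculate hash feature values of malicious files (such as MD5) must be run over the file blocks in order; they cannot calculate a hash feature value for detecting malicious files when the file data blocks arrive out of order. If the file is out of order, an intermediate network security device must, in the worst case, cache all the file blocks before it can calculate the hash feature value, which places excessive demands on device memory. If the device does not have enough space to cache the file blocks, the ordered calculation of the hash feature value cannot be completed, and large malicious files cannot be detected.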
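To make this limitation concrete, the following minimal Python sketch may help. It is illustrative only: the block contents are made up, MD5 from the standard `hashlib` module stands in for the ordered hash, and the helper names are hypothetical. It shows that feeding the same blocks to MD5 in a different order changes the digest, whereas a plain sum of the blocks' integer values does not depend on arrival order.

```python
import hashlib

def md5_of_blocks(blocks):
    """Feed the blocks to MD5 in the order given and return the hex digest."""
    h = hashlib.md5()
    for block in blocks:
        h.update(block)
    return h.hexdigest()

def sum_of_blocks(blocks):
    """Add the blocks as big-endian integers; addition is commutative."""
    return sum(int.from_bytes(block, "big") for block in blocks)

in_order = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]   # hypothetical file blocks
shuffled = [in_order[2], in_order[0], in_order[1]]   # same blocks, different arrival order

order_changes_md5 = md5_of_blocks(in_order) != md5_of_blocks(shuffled)
order_changes_sum = sum_of_blocks(in_order) != sum_of_blocks(shuffled)
```

Here `order_changes_md5` is true and `order_changes_sum` is false, which is the property the rest of the description relies on.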
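No effective solution has yet been proposed for the problem in the related art that it is difficult to detect whether a file to be detected is malicious when the device does not have enough space to cache all of its file blocks.
Summary
The main purpose of the present application is to provide a method and device for determining a malicious file, so as to solve the problem in the related art that it is difficult to detect whether a file to be detected is malicious when the device does not have enough space to cache all of its file blocks.
To achieve the above purpose, according to one aspect of the present application, a method for determining a malicious file is provided. The method includes: judging whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of a file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks; if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block, where the header file block is the file block that has been cached in file-block order in the device cache area; dividing the multiple file blocks into a preset number of subfiles, and calculating the order-independent hash feature value of each subfile; and judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file.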
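Further, calculating the order-independent hash feature value of each subfile includes: dividing each subfile into multiple data blocks of a preset number of bits, and calculating the order-independent hash feature value of each such data block; and performing a cumulative-sum calculation on the order-independent hash feature values of the data blocks to obtain the order-independent hash feature value of each subfile.
Further, the method also includes: if any of the data blocks into which a subfile is divided is smaller than the preset number of bits, padding that data block so that every data block has the preset number of bits.
Further, judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file includes: obtaining the length of the file to be detected; and performing a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file.
Further, performing the matching query in the preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file, includes: if the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block and the same order-independent hash feature value of each subfile, determining that the file to be detected is a malicious file.
Further, before judging whether the multiple received file blocks meet the preset requirements, the method also includes: judging the size of the file to be detected; and if the size of the file to be detected exceeds a preset size, performing the step of judging whether the multiple received file blocks meet the preset requirements.
Further, the method is applied to a scenario in which a network security device obtains the data blocks of the file to be detected from out-of-order transmission by an application-layer protocol.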
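To achieve the above purpose, according to one aspect of the present application, a device for determining a malicious file is provided, including: a first judging unit, configured to judge whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of a file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks; a first calculating unit, configured to calculate the hash feature value of the header file block when the multiple file blocks do not meet the preset requirements, where the header file block is the file block that has been cached in file-block order in the device cache area; a second calculating unit, configured to divide the multiple file blocks into a preset number of subfiles and calculate the order-independent hash feature value of each subfile; and a second judging unit, configured to judge, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file.
To achieve the above purpose, according to one aspect of the present application, a non-volatile storage medium is provided, the storage medium including a stored program, where the program performs the method for determining a malicious file described in any one of the above.
To achieve the above purpose, according to one aspect of the present application, a processor is provided, the processor being used to run a program, where the program, when run, performs the method for determining a malicious file described in any one of the above.
The present application adopts the following steps: judging whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of a file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks; if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block, where the header file block is the file block that has been cached in file-block order in the device cache area; dividing the multiple file blocks into a preset number of subfiles, and calculating the order-independent hash feature value of each subfile; and judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file. This solves the problem in the related art that it is difficult to detect whether a file to be detected is malicious when the device does not have enough space to cache all of its file blocks. Because the judgment is based on the hash feature value of the file blocks already cached in file-block order in the device cache area and on the order-independent hash feature value of each subfile, the file to be detected can be checked even when the device does not have enough space to cache all of its file blocks.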
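Brief Description of the Drawings
The accompanying drawings, which form a part of the present application, are provided for a further understanding of the present application. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
Figure 1 is a flowchart of the method for determining a malicious file provided according to an embodiment of the present application;
Figure 2 is a schematic diagram of calculating the supplementary features in the method for determining a malicious file provided according to an embodiment of the present application;
Figure 3 is a schematic diagram of the method for determining a malicious file provided according to an embodiment of the present application; and
Figure 4 is a schematic diagram of the device for determining a malicious file provided according to an embodiment of the present application.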
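Detailed Description
It should be noted that, where there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, for the purposes of the embodiments of the present application described herein. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.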
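For ease of description, some of the nouns or terms involved in the embodiments of the present application are explained below:
Malicious file: for example, a malicious file is a script, such as a virus or a Trojan, that is used to damage a computer system or steal user privacy.
According to an embodiment of the present application, a method for determining a malicious file is provided.
Figure 1 is a flowchart of the method for determining a malicious file according to an embodiment of the present application. As shown in Figure 1, the method includes the following steps:
Step S101: judge whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of the file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks.
Step S102: if the multiple file blocks do not meet the preset requirements, calculate the hash feature value of the header file block, where the header file block is the file block that has been cached in file-block order in the device cache area.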
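If the total size of all the file blocks of the file to be detected exceeds the preset size and the file blocks are not arranged in order, the file blocks of the file to be detected are considered not to meet the preset requirements. For example, suppose the upper limit of the device cache area is 2M and the file blocks obtained for the file to be detected total 5M, that is, the file to be detected is larger than 2M, and the file blocks are not arranged in order; in that case, the hash feature value of the file blocks already cached in file-block order in the device cache area is calculated. As shown in Figure 2, the part inside the dashed box is the file blocks cached in the device cache area. These cached file blocks are sorted, and the hash feature value of the sorted file blocks in the device cache area is calculated (corresponding to "calculate the header Hash result after sorting" in Figure 2).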
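The header calculation in step S102 could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `header_hash` helper is hypothetical, blocks are modelled as `(offset, data)` pairs, the header is taken to be the first `cache_limit` bytes of the file, overlapping retransmissions are not handled, and MD5 is assumed as the hash because it is the algorithm the background names.

```python
import hashlib

def header_hash(blocks, cache_limit):
    """Return the MD5 hex digest of the first `cache_limit` bytes of the file.

    `blocks` is an iterable of (offset, data) pairs in arrival order.
    Only data that falls inside the header window is kept in memory.
    """
    cached = {}
    for offset, data in blocks:
        if offset < cache_limit:
            # Keep only the part of the block inside the header window.
            cached[offset] = data[:cache_limit - offset]

    digest = hashlib.md5()
    expected = 0
    for offset in sorted(cached):          # restore file order before hashing
        if offset != expected:
            raise ValueError("header window has a gap or an overlap")
        digest.update(cached[offset])
        expected += len(cached[offset])
    return digest.hexdigest()
```

For instance, with an 8-byte window the blocks `[(4, b"efgh"), (8, b"ijkl"), (0, b"abcd")]` produce the same digest as hashing `b"abcdefgh"` directly, whatever order they arrive in.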
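Step S103: divide the multiple file blocks into a preset number of subfiles, and calculate the order-independent hash feature value of each subfile.
It should be noted that, for a malicious file whose size exceeds the preset size threshold (for example, a detection-device buffer limit of 2M), in addition to keeping the original malicious-file feature definition, the file header Hash value (the hash feature value of the file blocks already cached in file-block order in the device cache area) and the file cumulative sums (the order-independent hash feature value of each subfile) shown in Table 1 below are added as a second kind of file feature. The supplementary file feature can be used on its own to identify a malicious file.
Table 1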
Figure PCTCN2020110318-appb-000001
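Optionally, in the method for determining a malicious file provided by the embodiments of the present application, calculating the order-independent hash feature value of each subfile includes: dividing each subfile into multiple data blocks of a preset number of bits, and calculating the order-independent hash feature value of each such data block; and performing a cumulative-sum calculation on the order-independent hash feature values of the data blocks to obtain the order-independent hash feature value of each subfile.
It should be noted that an algorithm for calculating an order-independent hash feature value lets the data units within the input data block take part in the calculation in any order and always gives the same result; for example, the algorithm for calculating the order-independent hash feature value is a cumulative sum.
For example, the file to be detected is 10M in size, the device cache area can cache 1M of data, and the 10M file is divided into 5 subfiles. 0-2M is the first subfile; because the application-layer protocol transmits out of order, this subfile is received by the device out of order and cannot be fully cached. On receiving the out-of-order data, the device divides it into a number of 64-bit data blocks, calculates the order-independent hash feature value of each 64-bit data block, and then performs a cumulative-sum calculation on these values to obtain the order-independent hash feature value of the 0-2M subfile. The hash feature values of the 2-4M, 4-6M, 6-8M and 8-10M subfiles are calculated in the same way as that of the 0-2M subfile and are not described again here. Table 2 below illustrates the calculation of the file header Hash value (the hash feature value of the file blocks already cached in file-block order in the device cache area) and the file cumulative sums (the order-independent hash feature value of each subfile).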
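The per-subfile cumulative sum described above could look roughly like the following Python sketch. Several details are assumptions made for illustration and are not stated in the source: each 64-bit data block's integer value is used directly as its order-independent hash, the running sum wraps modulo 2^64, blocks are modelled as `(offset, data)` pairs whose offsets and lengths are multiples of 8 bytes, and the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1   # assumption: cumulative sums wrap modulo 2**64
WORD = 8                 # 64-bit data blocks, as in the example above

def word_sum(data: bytes) -> int:
    """Sum the 64-bit words of `data`; the result is independent of word order."""
    if len(data) % WORD:
        raise ValueError("length must be a multiple of 8 bytes (pad the tail first)")
    total = 0
    for i in range(0, len(data), WORD):
        total = (total + int.from_bytes(data[i:i + WORD], "big")) & MASK64
    return total

def subfile_sums(blocks, subfile_size, subfile_count):
    """Accumulate one sum per subfile from (offset, data) blocks in any order."""
    sums = [0] * subfile_count
    for offset, data in blocks:
        while data:
            index = offset // subfile_size
            # Split a block that straddles a subfile boundary.
            take = min(len(data), (index + 1) * subfile_size - offset)
            sums[index] = (sums[index] + word_sum(data[:take])) & MASK64
            offset += take
            data = data[take:]
    return sums
```

Because each arriving block is folded into its subfile's running sum and can then be discarded, only `subfile_count` integers need to be kept, however the blocks are ordered.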
Table 2
Figure PCTCN2020110318-appb-000002
Figure PCTCN2020110318-appb-000003
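Step S104: judge, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file.
For example, the six values in Table 2 (file header Hash value, cumulative sum 1, ..., cumulative sum 5) can be used to determine whether the file to be detected is a malicious file.
Through the above steps, it is possible to detect whether the file to be detected is a malicious file even when the device does not have enough space to cache all of its file blocks.
Optionally, in the method for determining a malicious file provided by the embodiments of the present application, the method also includes: if any of the data blocks into which a subfile is divided is smaller than the preset number of bits, padding that data block so that every data block has the preset number of bits.
For example, the preset number of bits is 64. If the tail of the 8-10M data is shorter than 64 bits, say 58 bits, it can be padded with 0s to make up 64 bits. This keeps the subsequent calculation of the hash feature values of the data blocks (the cumulative sums in Table 2) accurate, so that malicious files are identified more accurately.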
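A byte-oriented sketch of this padding step is given below. It is an illustration under assumptions: the `pad_to_word` helper is hypothetical, the zeros are appended at the end of the short block (the source does not say where they go), and the sketch pads whole bytes, so the 58-bit tail in the example above would need bit-level handling that is not shown here.

```python
def pad_to_word(chunk: bytes, word_bytes: int = 8) -> bytes:
    """Append zero bytes so that len(chunk) is a multiple of `word_bytes`."""
    missing = -len(chunk) % word_bytes   # 0 when the chunk is already aligned
    return chunk + b"\x00" * missing
```

The signature database and the detection device have to pad in exactly the same way, otherwise the cumulative sum of the last subfile will not match.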
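To make it more efficient to determine whether the file to be detected is a malicious file, optionally, in the method for determining a malicious file provided by the embodiments of the present application, judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file includes: obtaining the length of the file to be detected; and performing a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file.
In the above solution, the preset database stores in advance, for all the malicious files in the historical data, the file length, the hash feature value of the file blocks that the device cache area can cache, and the order-independent hash feature value of each subfile. A matching query in the preset database based on the obtained length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile quickly determines whether the file to be detected is a malicious file. Specifically, if the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block and the same order-independent hash feature value of each subfile, the file to be detected is determined to be a malicious file.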
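One way to picture the matching query is an exact lookup on a composite key. The sketch below is hypothetical: the `Signature` type, its field names and the use of an in-memory set in place of the preset database are all assumptions made for illustration.

```python
from typing import NamedTuple, Set, Tuple

class Signature(NamedTuple):
    """Supplementary feature of one file (field names are illustrative)."""
    length: int                     # file length in bytes
    header_hash: str                # hash of the in-order cached header blocks
    subfile_sums: Tuple[int, ...]   # one order-independent sum per subfile

def is_malicious(candidate: Signature, known: Set[Signature]) -> bool:
    """A file is flagged only if all three components match a stored entry."""
    return candidate in known
```

Since a named tuple hashes and compares field by field, a candidate whose length, header hash and every subfile sum equal those of a stored entry is found in a single set lookup, and a difference in any one component means no match.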
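Optionally, in the method for determining a malicious file provided by the embodiments of the present application, the method is applied to a scenario in which a network security device obtains the data blocks of the file to be detected from out-of-order transmission by an application-layer protocol (for example, the Server Message Block protocol, SMB).
Because the SMB protocol does not guarantee that a file is sent in order, the above method can be used to detect malicious files. Before detection, the size of the file to be detected can be judged; if it exceeds the preset size, the step of judging whether the multiple received file blocks meet the preset requirements is performed. In other words, if the size of the transmitted file is known in advance, the method for determining a malicious file provided by the embodiments of the present application is applied only to files to be detected that exceed the preset size.
As shown in Figure 3, when multiple file blocks are received, the data is parsed and it is judged whether there is global disorder. Global disorder means that the file blocks of the file to be detected cannot be cached in order; if the file blocks cannot be cached in order, the file in the cache area is judged to be globally disordered. The hash feature value of the file blocks already cached in file-block order in the cache area (the header file blocks) is then calculated. Because it may take several receptions before all the file blocks that should be stored in the cache area have arrived, it is judged whether the calculation of the header hash feature value has finished. If it has, the cumulative sum of each subfile is calculated; if not, the cumulative sums are calculated after the header hash calculation finishes. If the file blocks can be cached in order, that is, the whole file can be cached in order in the cache area, the hash feature value of the whole file is calculated. When judging whether the file to be detected is malicious, it is further judged whether the file blocks could not be cached in order. If so, the feature made up of the header hash feature value and the cumulative sums is used to judge whether the file to be detected is a malicious file; if not, the calculated whole-file hash feature value is used.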
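The branching in Figure 3 can be summarised in a small sketch. The function and label names below are hypothetical, and the disorder test is one assumed reading of the description: the blocks count as globally disordered only when they arrive out of order and their total size also exceeds what the cache area can hold.

```python
def is_globally_disordered(blocks, cache_limit):
    """True when blocks arrive out of order AND the file cannot be fully cached."""
    total = 0
    expected = 0
    in_order = True
    for offset, data in blocks:        # blocks: (offset, data) in arrival order
        if offset != expected:
            in_order = False
        expected = offset + len(data)
        total += len(data)
    return total > cache_limit and not in_order

def choose_feature(blocks, cache_limit):
    """Pick which feature is compared against the malicious-file database."""
    if is_globally_disordered(blocks, cache_limit):
        return "header_hash_plus_subfile_sums"
    return "whole_file_hash"
```

Under this reading, a small out-of-order file can still be sorted in the cache and hashed as a whole, and a large in-order file can be hashed as a stream, so only the combination of both conditions needs the supplementary feature.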
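The above method solves the problem in the related art that it is difficult to detect whether a file to be detected is malicious when the device does not have enough space to cache all of its file blocks. Because the judgment is based on the hash feature value of the file blocks already cached in file-block order and on the order-independent hash feature value of each subfile, the file to be detected can be checked even when the device does not have enough space to cache all of its file blocks.
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
The embodiments of the present application also provide a device for determining a malicious file. It should be noted that the device for determining a malicious file of the embodiments of the present application can be used to execute the method for determining a malicious file provided by the embodiments of the present application. The device is described below.
Figure 4 is a schematic diagram of the device for determining a malicious file according to an embodiment of the present application. As shown in Figure 4, the device includes: a first judging unit 401, a first calculating unit 402, a second calculating unit 403 and a second judging unit 404.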
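Specifically, the first judging unit 401 is configured to judge whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of the file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks.
The first calculating unit 402 is configured to calculate the hash feature value of the header file block when the multiple file blocks do not meet the preset requirements, where the header file block is the file block that has been cached in file-block order in the device cache area.
The second calculating unit 403 is configured to divide the multiple file blocks into a preset number of subfiles and calculate the order-independent hash feature value of each subfile.
The second judging unit 404 is configured to judge, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file.
In the device for determining a malicious file provided by the embodiments of the present application, the first judging unit 401 judges whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of the file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks; the first calculating unit 402 calculates the hash feature value of the header file block when the multiple file blocks do not meet the preset requirements, where the header file block is the file block that has been cached in file-block order in the device cache area; the second calculating unit 403 divides the multiple file blocks into a preset number of subfiles and calculates the order-independent hash feature value of each subfile; and the second judging unit 404 judges, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file. Because the judgment is based on the hash feature value of the file blocks already cached in file-block order in the device cache area and on the order-independent hash feature value of each subfile, the file to be detected can be checked even when the device does not have enough space to cache all of its file blocks.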
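Optionally, in the device for determining a malicious file provided by the embodiments of the present application, the second calculating unit 403 includes: a first calculating module, configured to divide each subfile into multiple data blocks of a preset number of bits and calculate the order-independent hash feature value of each such data block; and a second calculating module, configured to perform a cumulative-sum calculation on the order-independent hash feature values of the data blocks to obtain the order-independent hash feature value of each subfile.
Optionally, in the device for determining a malicious file provided by the embodiments of the present application, the device also includes: a padding unit, configured to pad any data block that is smaller than the preset number of bits among the data blocks into which a subfile is divided, so that every data block has the preset number of bits.
Optionally, in the device for determining a malicious file provided by the embodiments of the present application, the second judging unit 404 includes: an obtaining module, configured to obtain the length of the file to be detected; and a query module, configured to perform a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file.
Optionally, in the device for determining a malicious file provided by the embodiments of the present application, the query module includes: a determining module, configured to determine that the file to be detected is a malicious file when the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block and the same order-independent hash feature value of each subfile.
Optionally, in the device for determining a malicious file provided by the embodiments of the present application, the device also includes: a third judging unit, configured to judge the size of the file to be detected before it is judged whether the multiple received file blocks meet the preset requirements; and an executing unit, configured to perform the step of judging whether the multiple received file blocks meet the preset requirements when the size of the file to be detected exceeds the preset size.
Optionally, in the device for determining a malicious file provided by the embodiments of the present application, the device is applied to a scenario in which a network security device obtains the data blocks of the file to be detected from out-of-order transmission by an application-layer protocol.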
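The device for determining a malicious file includes a processor and a memory. The first judging unit 401, the first calculating unit 402, the second calculating unit 403, the second judging unit 404 and so on are all stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided; by adjusting kernel parameters, the file to be detected can be checked for maliciousness even when the device does not have enough space to cache all of its file blocks.
The memory may include non-permanent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, and the program, when executed by a processor, implements the method for determining a malicious file.
An embodiment of the present invention provides a processor, the processor being used to run a program, where the program, when run, performs the method for determining a malicious file.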
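An embodiment of the present invention provides a device. The device includes a processor, a memory, and a program stored in the memory and runnable on the processor. When the processor executes the program, the following steps are implemented: judging whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of a file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks; if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block, where the header file block is the file block that has been cached in file-block order in the device cache area; dividing the multiple file blocks into a preset number of subfiles, and calculating the order-independent hash feature value of each subfile; and judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file.
When the processor executes the program, the following steps may also be implemented: calculating the order-independent hash feature value of each subfile includes: dividing each subfile into multiple data blocks of a preset number of bits, and calculating the order-independent hash feature value of each such data block; and performing a cumulative-sum calculation on the order-independent hash feature values of the data blocks to obtain the order-independent hash feature value of each subfile.
When the processor executes the program, the following steps may also be implemented: the method also includes: if any of the data blocks into which a subfile is divided is smaller than the preset number of bits, padding that data block so that every data block has the preset number of bits.
When the processor executes the program, the following steps may also be implemented: judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file includes: obtaining the length of the file to be detected; and performing a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file.
When the processor executes the program, the following steps may also be implemented: performing the matching query in the preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file, includes: if the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block and the same order-independent hash feature value of each subfile, determining that the file to be detected is a malicious file.
When the processor executes the program, the following steps may also be implemented: before judging whether the multiple received file blocks meet the preset requirements, the method also includes: judging the size of the file to be detected; and if the size of the file to be detected exceeds the preset size, performing the step of judging whether the multiple received file blocks meet the preset requirements.
When the processor executes the program, the following steps may also be implemented: the method is applied to a scenario in which a network security device obtains the data blocks of the file to be detected from out-of-order transmission by an application-layer protocol. The device in this document may be a server, a PC, a PAD, a mobile phone, and so on.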
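The present application also provides a computer program product which, when executed on a data processing device, is suitable for executing a program initialized with the following method steps: judging whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of a file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks; if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block, where the header file block is the file block that has been cached in file-block order in the device cache area; dividing the multiple file blocks into a preset number of subfiles, and calculating the order-independent hash feature value of each subfile; and judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file.
When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: calculating the order-independent hash feature value of each subfile includes: dividing each subfile into multiple data blocks of a preset number of bits, and calculating the order-independent hash feature value of each such data block; and performing a cumulative-sum calculation on the order-independent hash feature values of the data blocks to obtain the order-independent hash feature value of each subfile.
When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: the method also includes: if any of the data blocks into which a subfile is divided is smaller than the preset number of bits, padding that data block so that every data block has the preset number of bits.
When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file includes: obtaining the length of the file to be detected; and performing a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file.
When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: performing the matching query in the preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file, includes: if the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block and the same order-independent hash feature value of each subfile, determining that the file to be detected is a malicious file.
When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: before judging whether the multiple received file blocks meet the preset requirements, the method also includes: judging the size of the file to be detected; and if the size of the file to be detected exceeds the preset size, performing the step of judging whether the multiple received file blocks meet the preset requirements.
When executed on a data processing device, it is also suitable for executing a program initialized with the following method steps: the method is applied to a scenario in which a network security device obtains the data blocks of the file to be detected from out-of-order transmission by an application-layer protocol.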
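Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system) and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.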
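In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
The memory may include non-permanent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.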
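It should also be noted that the terms "include", "contain" or any other variation of them are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, commodity or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.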

Claims (10)

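  1. A method for determining a malicious file, including:
    judging whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of a file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks;
    if the multiple file blocks do not meet the preset requirements, calculating the hash feature value of the header file block, where the header file block is the file block that has been cached in file-block order in the device cache area;
    dividing the multiple file blocks into a preset number of subfiles, and calculating the order-independent hash feature value of each subfile;
    judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file.
  2. The method according to claim 1, where calculating the order-independent hash feature value of each subfile includes:
    dividing each subfile into multiple data blocks of a preset number of bits, and calculating the order-independent hash feature value of each such data block;
    performing a cumulative-sum calculation on the order-independent hash feature values of the data blocks to obtain the order-independent hash feature value of each subfile.
  3. The method according to claim 2, where the method also includes:
    if any of the data blocks into which a subfile is divided is smaller than the preset number of bits, padding that data block so that every data block has the preset number of bits.
  4. The method according to claim 1, where judging, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file includes:
    obtaining the length of the file to be detected;
    performing a matching query in a preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file.
  5. The method according to claim 4, where performing the matching query in the preset database based on the length of the file to be detected, the hash feature value of the header file block and the order-independent hash feature value of each subfile, to judge whether the file to be detected is a malicious file, includes:
    if the preset database stores data information with the same length as the file to be detected, the same hash feature value of the header file block and the same order-independent hash feature value of each subfile, determining that the file to be detected is a malicious file.
  6. The method according to claim 1, where, before judging whether the multiple received file blocks meet the preset requirements, the method also includes:
    judging the size of the file to be detected;
    if the size of the file to be detected exceeds a preset size, performing the step of judging whether the multiple received file blocks meet the preset requirements.
  7. The method according to claim 1, where the method is applied to a scenario in which a network security device obtains the data blocks of the file to be detected from out-of-order transmission by an application-layer protocol.
  8. A device for determining a malicious file, including:
    a first judging unit, configured to judge whether multiple received file blocks meet preset requirements, where the multiple file blocks are all the file blocks of a file to be detected, and the preset requirements include at least a size requirement and an ordering requirement for the multiple file blocks;
    a first calculating unit, configured to calculate the hash feature value of the header file block when the multiple file blocks do not meet the preset requirements, where the header file block is the file block that has been cached in file-block order in the device cache area;
    a second calculating unit, configured to divide the multiple file blocks into a preset number of subfiles and calculate the order-independent hash feature value of each subfile;
    a second judging unit, configured to judge, based on the hash feature value of the header file block and the order-independent hash feature value of each subfile, whether the file to be detected is a malicious file.
  9. A non-volatile storage medium, where the storage medium includes a stored program, and the program performs the method for determining a malicious file according to any one of claims 1 to 7.
  10. A processor, where the processor is used to run a program, and the program, when run, performs the method for determining a malicious file according to any one of claims 1 to 7.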
PCT/CN2020/110318 2020-07-30 2020-08-20 Method and device for determining malicious file WO2022021501A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/439,827 US20230153432A1 (en) 2020-07-30 2020-08-20 Method and Device for Determining Malicious File

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010754323.5 2020-07-30
CN202010754323.5A CN111881448B (zh) 2020-07-30 2020-07-30 Method and device for determining malicious file

Publications (1)

Publication Number Publication Date
WO2022021501A1 true WO2022021501A1 (zh) 2022-02-03

Family

ID=73204576

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110318 WO2022021501A1 (zh) 2020-07-30 2020-08-20 Method and device for determining malicious file

Country Status (3)

Country Link
US (1) US20230153432A1 (zh)
CN (1) CN111881448B (zh)
WO (1) WO2022021501A1 (zh)

Also Published As

Publication number Publication date
CN111881448A (zh) 2020-11-03
CN111881448B (zh) 2022-10-14
US20230153432A1 (en) 2023-05-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20947055

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20947055

Country of ref document: EP

Kind code of ref document: A1