CN110990897A - File fingerprint generation method and device - Google Patents

File fingerprint generation method and device

Info

Publication number
CN110990897A
Authority
CN
China
Prior art keywords
fingerprint
file
main body
hash value
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911291250.4A
Other languages
Chinese (zh)
Inventor
陈德勇
尹家奇
邵燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wuyou Chuangxiang Information Technology Co Ltd
Original Assignee
Beijing Wuyou Chuangxiang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wuyou Chuangxiang Information Technology Co Ltd filed Critical Beijing Wuyou Chuangxiang Information Technology Co Ltd
Priority to CN201911291250.4A priority Critical patent/CN110990897A/en
Publication of CN110990897A publication Critical patent/CN110990897A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/64Protecting data integrity, e.g. using checksums, certificates or signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a method and a device for generating a file fingerprint, wherein the method comprises the following steps: step S1, acquiring the type characteristics of the file, and taking the acquired type characteristic value as a pre-fingerprint; step S2, grouping the files, obtaining sampling file data according to a sampling point position formula, and obtaining a main body fingerprint hash value as a main body fingerprint by using a digest algorithm; step S3, counting the size of the file, converting the file size into a hash value as a post fingerprint; and step S4, splicing the front fingerprint, the main body fingerprint and the rear fingerprint to obtain a new file fingerprint.

Description

File fingerprint generation method and device
Technical Field
The invention relates to the technical field of file fingerprint algorithms, and in particular to a file fingerprint generation method and device suited to scenarios that require rapid calculation of the fingerprint of a large file.
Background
With the rapid development of storage technology and cloud services, the growth rate of data, particularly cloud data, has multiplied. Mass data storage uses application software, through cluster applications, grid technology, distributed file systems and similar functions, to make a large number of storage devices of different types in a network work together and jointly provide data storage and service access to the outside. Therefore, when faced with the large data volumes of heterogeneous systems, how to quickly compare and identify content changes in data and files and respond accordingly becomes a bottleneck for deploying large-scale services.
Existing file comparison methods generally use the file fingerprint algorithm MD5 (Message-Digest Algorithm 5) to verify that information is transmitted completely and consistently. MD5 is one of the hash algorithms (also called digest algorithms) widely used in computing, and mainstream programming languages generally provide an MD5 implementation. Reducing data to a fixed-length value is the basic principle of a hash algorithm. MD5 works as follows: it processes the input in 512-bit blocks, each divided into sixteen 32-bit sub-blocks; after a series of processing steps, the output consists of four 32-bit words which, concatenated, form a 128-bit hash value. The MD5 algorithm loops over the message, and the number of iterations equals the number of 512-bit blocks in the message.
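The block-by-block processing described above can be illustrated with a minimal Python sketch. It uses the standard `hashlib` module; the helper name `md5_of_stream` and the 64 KiB read size are illustrative assumptions and do not come from the patent.

```python
import hashlib
import io

def md5_of_stream(stream, read_size=64 * 1024):
    """Return the 128-bit MD5 digest of everything readable from stream."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break
        # Every byte of the input is fed to MD5, so the work grows
        # with the number of 512-bit blocks in the message.
        digest.update(chunk)
    return digest.digest()

data = b"example content" * 1000
fingerprint = md5_of_stream(io.BytesIO(data))
print(len(fingerprint) * 8)  # digest length in bits
```

Because the whole input passes through `update`, a full MD5 pass over a large file has to read the entire file, which is the cost the invention aims to reduce.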
As can be seen from the above MD5 algorithm steps, the computation time and computing resources required by MD5 grow as the file content grows, because every 512-bit block of the file must be processed. For smaller files (below 1 GB), the time and computing resources needed to compute the file fingerprint still meet usage requirements, but for large files (above 1 GB) both keep growing with the file capacity, and MD5 clearly cannot satisfy scenarios that require the file fingerprint to be computed quickly. How to ensure that the computing time and computing resources for calculating a file fingerprint rise only steadily and within limits as file capacity increases, rather than growing in step with the capacity, is therefore a problem to be solved urgently, and it is necessary to improve the MD5 file fingerprint algorithm to meet the demand for fast fingerprint calculation on large files.
Disclosure of Invention
In order to overcome the defects of the prior art, the present invention provides a method and an apparatus for generating a file fingerprint, so as to increase the file fingerprint calculation speed and ensure the uniqueness of the file fingerprint.
In order to achieve the above object, the present invention provides a method for generating a file fingerprint, comprising the following steps:
step S1, acquiring the type characteristics of the file, and taking the acquired type characteristic value as a pre-fingerprint;
step S2, grouping the files, obtaining sampling file data according to a sampling point position formula, and obtaining a main body fingerprint hash value as a main body fingerprint by using a digest algorithm;
step S3, counting the size of the file, converting the file size into a hash value as a post fingerprint;
and step S4, splicing the front fingerprint, the main body fingerprint and the rear fingerprint to obtain a new file fingerprint.
Preferably, in step S1, several bits of data from the file header are extracted as the type feature of the file, the numerical range of the type feature value is expanded, and the resulting type feature value is taken as the pre-fingerprint.
Preferably, in step S1, the first 32 bits of data in the file header are extracted as the type feature of the file, and the numerical range of the type feature value is expanded through hash calculation to obtain a 32-bit hash value as the pre-fingerprint.
Preferably, in step S1, the hash formula used in the hash calculation is:
$$\sum_{i=0}^{n-1} s[i] \cdot 31^{n-1-i}$$
wherein s is a 32-bit characteristic value array of the file type characteristic, n is the length of the characteristic value array, and n is 32.
Preferably, in step S2, the files are grouped according to 64K size, sample file data is obtained through the sample point position formula, and a digest algorithm is used to obtain a 128-bit hash value of the main body fingerprint as the main body fingerprint.
Preferably, the sampling point position formula is as follows:
Figure BDA0002319121910000031
where i is the ith sample.
Preferably, in step S3, the file capacity of the file is obtained and converted into a 32-bit binary representation as the post-fingerprint.
Preferably, in step S4, the 32-bit hash value obtained in step S1 is used as a front fingerprint, the 128-bit hash value obtained in step S2 is used as a main fingerprint, and the 32-bit hash value obtained in step S3 is used as a back fingerprint to be spliced to obtain a new 192-bit hash value, so as to obtain a new file fingerprint.
Preferably, after step S4, the method further includes the following steps:
step S5, comparing for equality the file fingerprints generated for different files; files whose file fingerprint hash values are identical are determined to be the same file, and files whose file fingerprint hash values differ are determined not to be the same file.
In order to achieve the above object, the present invention further provides a file fingerprint generating device, including:
a pre-fingerprint generating unit, configured to acquire the type feature of the file and take the obtained type feature value as the pre-fingerprint;
a main body fingerprint generating unit, configured to group the file, obtain sampled file data according to a sampling point position formula, and obtain a main body fingerprint hash value as the main body fingerprint by using a digest algorithm;
a post-fingerprint generating unit, configured to determine the size of the file and convert the file size into a hash value as the post-fingerprint;
and a splicing unit, configured to splice the pre-fingerprint, the main body fingerprint and the post-fingerprint to obtain a new file fingerprint.
Compared with the prior art, the method and device for generating a file fingerprint of the present invention acquire the type feature of the file and take the resulting type feature value as the pre-fingerprint; group the file, obtain sampled file data according to a sampling point position formula, and use a digest algorithm to obtain the main body fingerprint hash value as the main body fingerprint; determine the size of the file and convert it into a hash value as the post-fingerprint; and finally splice the pre-fingerprint, the main body fingerprint and the post-fingerprint to obtain the new file fingerprint.
Drawings
FIG. 1 is a flowchart illustrating steps of a method for generating a file fingerprint according to the present invention;
FIG. 2 is a system architecture diagram of a file fingerprint generation apparatus according to the present invention;
FIG. 3 is a flowchart of a process for generating a file fingerprint according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described below with reference to the accompanying drawings; other advantages and capabilities of the invention will be readily apparent to those skilled in the art from this disclosure. The invention may also be practiced in other embodiments, and its details may be modified in various respects without departing from its spirit and scope.
FIG. 1 is a flowchart illustrating steps of a method for generating a file fingerprint according to the present invention. As shown in fig. 1, the method for generating a file fingerprint of the present invention includes the following steps:
Step S1, extracting several bits of data from the file header as the type feature of the file, expanding the numerical range of the type feature value, and taking the resulting type feature value as the pre-fingerprint. In the embodiment of the invention, the first 32 bits of data of the file header are extracted as the type feature of the file, and the numerical range of the type feature value is expanded through hash calculation to obtain a 32-bit hash value as the pre-fingerprint. Expanding the numerical range in this way spreads the values over a larger range and so reduces the possibility of feature collisions.
In the embodiment of the present invention, the hash formula used in the hash calculation is:
s[0]*31^(n-1)+s[1]*31^(n-2)+s[2]*31^(n-3)+...+s[n-1]
The above hash formula can be written compactly as:
$$\sum_{i=0}^{n-1} s[i] \cdot 31^{n-1-i}$$
where s is the 32-bit characteristic value array of the file type feature and n is the length of the characteristic value array; in this embodiment, n is 32.
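A minimal Python sketch of the polynomial hash above, evaluated with Horner's rule. Two points are assumptions rather than statements from the text: the array s is taken to be the first 32 header bits (one 0/1 value per element, most significant bit first), and the result is reduced modulo 2^32 so that it fits the stated 32-bit pre-fingerprint. The function names are hypothetical.

```python
def header_bits(header, n=32):
    """Unpack the first n bits of header into a list of 0/1 values (MSB first)."""
    return [(header[i // 8] >> (7 - i % 8)) & 1 for i in range(n)]

def type_feature_hash(s, multiplier=31, bits=32):
    """Evaluate s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], kept to `bits` bits."""
    mask = (1 << bits) - 1
    h = 0
    for value in s:
        # Horner's rule: each step multiplies the running total by 31
        # and adds the next element, then truncates (assumed) to 32 bits.
        h = (h * multiplier + value) & mask
    return h

pdf_header = b"%PDF-1.7"          # "%PDF" is the PDF magic number
s = header_bits(pdf_header)        # n = 32 elements
pre_fingerprint = type_feature_hash(s)
print(f"{pre_fingerprint:08x}")
```

Truncating at every step gives the same value as reducing the full polynomial modulo 2^32 once at the end, so the loop never has to hold large intermediate integers.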
The reasons for selecting 31 as the hash multiplier in the present invention are as follows:
(1) 31 is a prime number, and a prime multiplier can effectively reduce the collision rate of the hash algorithm during hash calculation.
(2) An odd multiplier retains more information than an even one during hash calculation. Multiplying by 2 is equivalent to a shift operation, so if an even multiplier is selected and the multiplication overflows, numerical information is lost.
(3) Besides being prime and odd, 31 has a further useful property: multiplication by 31 can be reduced to a shift and a subtraction for better performance. For example, for any positive integer m:
31*m = (32-1)*m = 32*m - m = (m << 5) - m
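The shift-and-subtract identity can be checked directly. The short Python sketch below only demonstrates that the two expressions agree; Python integers are arbitrary precision, so it says nothing about the performance benefit, which applies to fixed-width machine arithmetic. The function name `times_31` is illustrative.

```python
def times_31(m):
    """Multiply m by 31 using one left shift and one subtraction."""
    return (m << 5) - m  # m << 5 is 32*m

for m in (0, 1, 7, 12345, 2**40 + 3):
    assert times_31(m) == 31 * m
```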
Step S2, grouping the file, obtaining the sampled file data through a sampling formula, and obtaining the main body fingerprint hash value as the main body fingerprint by using the digest algorithm MD5. In the embodiment of the invention, the file is grouped by 64K size, the sampled file data is obtained through a sampling formula, and the digest algorithm MD5 is used to obtain a 128-bit main body fingerprint hash value as the main body fingerprint.
Specifically, in step S2, the file is grouped by 64K size, the sampling positions are calculated by the following sampling formula, the sampled data is spliced back together into a single piece of data, and the spliced data is hashed with the MD5 algorithm to give the main body fingerprint; by the nature of MD5, the main body fingerprint is 128 bits in size.
In the embodiment of the present invention, the sampling point position is calculated by the following sampling formula:
Figure BDA0002319121910000052
where i is the ith sample.
The sampling positions calculated by the sampling formula give the sampling rule for the sampled file data, so that a large-capacity file is checked in an incremental manner: the more samples a large-capacity file requires, the larger the subsequent sampling intervals become, which balances efficiency against accuracy.
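The sampling formula itself survives only as an image reference in this text, so the Python sketch below does not reproduce it. It shows the surrounding mechanics under stated assumptions: the file is treated as consecutive 64 KiB groups, a pluggable `sample_index` function maps the i-th sample to a group index, and the sampled groups are fed to MD5 in order (equivalent to hashing their concatenation). The default rule `hypothetical_sample_index` is a placeholder chosen only because its gaps widen as i grows; it is not the patent's formula.

```python
import hashlib
import io

BLOCK_SIZE = 64 * 1024  # the 64K group size from the text

def hypothetical_sample_index(i):
    """Placeholder rule, NOT the patent's formula: group index of the i-th sample.

    Yields 0, 1, 3, 7, 15, ... so the interval between samples keeps growing.
    """
    return (1 << i) - 1

def main_body_fingerprint(stream, size, sample_index=hypothetical_sample_index):
    """MD5 over the sampled 64K groups of a seekable stream of `size` bytes."""
    digest = hashlib.md5()
    i = 0
    while True:
        offset = sample_index(i) * BLOCK_SIZE
        if offset >= size:
            break
        stream.seek(offset)
        digest.update(stream.read(BLOCK_SIZE))  # append this sample to the spliced data
        i += 1
    return digest.digest()

data = bytes(range(256)) * 1024  # 256 KiB, i.e. four 64K groups
body = main_body_fingerprint(io.BytesIO(data), len(data))
print(len(body) * 8)  # size of the main body fingerprint in bits
```

With a rule whose intervals widen like this, the number of groups read grows much more slowly than the file size, which is the efficiency and accuracy trade-off the paragraph above describes.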
Step S3, determining the size of the file and converting it into a hash value with the same number of bits as the pre-fingerprint, to serve as the post-fingerprint. Specifically, the file size is obtained and converted into a 32-bit binary representation as the post-fingerprint.
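A minimal Python sketch of the size-to-32-bit conversion. The text does not specify the byte order or how sizes that do not fit in 32 bits are handled; the sketch assumes big-endian encoding and keeps only the low 32 bits of larger sizes. The function name `post_fingerprint` is illustrative.

```python
import struct

def post_fingerprint(file_size):
    """Encode a file size in bytes as a 32-bit big-endian value (4 bytes)."""
    # Assumption: sizes of 4 GiB or more are reduced modulo 2**32.
    return struct.pack(">I", file_size & 0xFFFFFFFF)

print(post_fingerprint(1024).hex())
```

In practice the size would typically come from a call such as `os.path.getsize(path)`.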
Step S4, splicing the pre-fingerprint, the main body fingerprint and the post-fingerprint to obtain a new file fingerprint. In the embodiment of the present invention, the 32-bit hash value obtained in step S1 as the pre-fingerprint, the 128-bit hash value obtained in step S2 as the main body fingerprint, and the 32-bit hash value obtained in step S3 as the post-fingerprint are spliced into a new 192-bit hash value, which is the new file fingerprint.
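The splice is a plain concatenation: 32 + 128 + 32 = 192 bits, or 4 + 16 + 4 = 24 bytes. A minimal Python sketch, assuming the three parts are already available as byte strings; the function name and the length check are illustrative additions.

```python
def splice_fingerprint(pre, body, post):
    """Concatenate a 4-byte pre-, 16-byte main body and 4-byte post-fingerprint."""
    if (len(pre), len(body), len(post)) != (4, 16, 4):
        raise ValueError("expected parts of 4, 16 and 4 bytes")
    return pre + body + post

fingerprint = splice_fingerprint(b"\x01" * 4, b"\x02" * 16, b"\x03" * 4)
print(len(fingerprint) * 8)  # total fingerprint length in bits
```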
Preferably, after step S4, the method for generating a file fingerprint according to the present invention further includes the following step:
Step S5, comparing for equality the file fingerprints (192-bit hash values in the embodiment of the present invention) generated for different files: files with identical hash values are determined to be the same file, and files with different hash values are determined not to be the same file.
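The equality comparison can be sketched in Python as grouping file names by fingerprint. The mapping from names to fingerprints and the function name `group_by_fingerprint` are hypothetical; the fingerprints are assumed to be 24-byte values produced as in step S4.

```python
from collections import defaultdict

def group_by_fingerprint(fingerprints):
    """Return lists of names whose fingerprints are byte-for-byte identical."""
    groups = defaultdict(list)
    for name, fp in fingerprints.items():
        groups[fp].append(name)
    # Only groups with more than one member are reported as "the same file".
    return [sorted(names) for names in groups.values() if len(names) > 1]

same = group_by_fingerprint({
    "a.bin": b"\x00" * 24,
    "b.bin": b"\x00" * 24,
    "c.bin": b"\x01" * 24,
})
print(same)
```

Under this scheme, equality is judged on the header bits, the sampled data and the file size only.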
Fig. 2 is a system architecture diagram of a file fingerprint generation apparatus according to the present invention. As shown in fig. 2, the present invention provides a file fingerprint generation apparatus, including:
The pre-fingerprint generating unit 201 is configured to extract several bits of data from the file header as the type feature of the file, expand the numerical range of the type feature value, and take the resulting type feature value as the pre-fingerprint. In the embodiment of the present invention, the pre-fingerprint generating unit 201 extracts the first 32 bits of data of the file header as the type feature of the file and expands the numerical range of the type feature value through hash calculation to obtain a 32-bit hash value as the pre-fingerprint. Expanding the numerical range in this way spreads the values over a larger range and so reduces the possibility of feature collisions.
In the embodiment of the present invention, the hash formula used in the hash calculation is:
s[0]*31^(n-1)+s[1]*31^(n-2)+s[2]*31^(n-3)+...+s[n-1]
The above hash formula can be written compactly as:
$$\sum_{i=0}^{n-1} s[i] \cdot 31^{n-1-i}$$
where s is the 32-bit characteristic value array of the file type feature and n is the length of the characteristic value array; in this embodiment, n is 32.
The reasons for selecting 31 as the hash multiplier in the present invention are as follows:
(1) 31 is a prime number, and a prime multiplier can effectively reduce the collision rate of the hash algorithm during hash calculation.
(2) An odd multiplier retains more information than an even one during hash calculation. Multiplying by 2 is equivalent to a shift operation, so if an even multiplier is selected and the multiplication overflows, numerical information is lost.
(3) Besides being prime and odd, 31 has a further useful property: multiplication by 31 can be reduced to a shift and a subtraction for better performance. For example, for any positive integer m:
31*m = (32-1)*m = 32*m - m = (m << 5) - m
The main body fingerprint generating unit 202 is configured to group the file, obtain the sampled file data through a sampling formula, and obtain the main body fingerprint hash value as the main body fingerprint by using the digest algorithm MD5. In an embodiment of the present invention, the main body fingerprint generating unit 202 groups the file by 64K size, obtains the sampled file data through a sampling formula, and uses the digest algorithm MD5 to obtain a 128-bit main body fingerprint hash value as the main body fingerprint.
Specifically, the main body fingerprint generating unit 202 groups the file by 64K size, calculates the sampling positions by the following sampling formula, splices the sampled data back together into a single piece of data, and hashes the spliced data with the MD5 algorithm to give the main body fingerprint; by the nature of MD5, the main body fingerprint is 128 bits in size.
In the embodiment of the present invention, the sampling point position is calculated by the following sampling formula:
Figure BDA0002319121910000081
where i is the ith sample.
The sampling positions calculated by the sampling formula give the sampling rule for the sampled file data, so that a large-capacity file is checked in an incremental manner: the more samples a large-capacity file requires, the larger the subsequent sampling intervals become, which balances efficiency against accuracy.
The post-fingerprint generating unit 203 is configured to determine the size of the file and convert it into a hash value with the same number of bits as the pre-fingerprint, to serve as the post-fingerprint. Specifically, the post-fingerprint generating unit 203 obtains the file capacity and converts it into a 32-bit binary representation as the post-fingerprint.
The splicing unit 204 is configured to splice the pre-fingerprint, the main body fingerprint and the post-fingerprint to obtain a new file fingerprint. In the embodiment of the present invention, the 32-bit hash value obtained by the pre-fingerprint generating unit 201 as the pre-fingerprint, the 128-bit hash value obtained by the main body fingerprint generating unit 202 as the main body fingerprint, and the 32-bit hash value obtained by the post-fingerprint generating unit 203 as the post-fingerprint are spliced into a new 192-bit hash value, which is the new file fingerprint.
Preferably, the file fingerprint generating device of the present invention further includes:
a comparison unit, configured to compare for equality the file fingerprints (192-bit hash values in the embodiment of the invention) generated for different files, so that files with identical hash values are determined to be the same file and files with different hash values are determined to be different files.
Examples
As shown in fig. 3, in this embodiment the present invention is used to generate the file fingerprint of a large file. The file fingerprint generation process for the large file to be calculated is as follows:
Step 1, extracting the first 32 bits of data of the file header of the large file as the type feature of the file, expanding the numerical range of the type feature value through hash calculation so as to reduce the possibility of feature collisions, and taking the type feature value obtained by the hash calculation as the pre-fingerprint.
Step 2, grouping the file by 64K size, calculating the sampling positions by the following sampling point position formula, splicing the sampled data back together into a single piece of data, and hashing the spliced data with the MD5 algorithm to give the main body fingerprint, which by the nature of MD5 is 128 bits in size.
The sampling point position formula is as follows:
Figure BDA0002319121910000091
where i is the ith sample.
Step 3, obtaining the file capacity and converting it into a 32-bit binary representation as the post-fingerprint.
Step 4, splicing the resulting 32-bit pre-fingerprint, 128-bit main body fingerprint and 32-bit post-fingerprint to obtain a new 192-bit file fingerprint.
A 1 GB file was used as test material to compare the calculation time of the MD5 algorithm with that of the present invention. The comparison result is as follows:
Figure BDA0002319121910000092
It can be seen that, with the present invention, the overall process takes 1/10 of the time of the MD5 digest algorithm, that is, an order of magnitude less computation time.
In summary, the method and device for generating a file fingerprint of the present invention obtain the type feature value of the file as the pre-fingerprint; group the file, obtain sampled file data according to the sampling point position formula, and use a digest algorithm to obtain the main body fingerprint hash value as the main body fingerprint; determine the size of the file and convert it into a hash value as the post-fingerprint; and finally splice the pre-fingerprint, the main body fingerprint and the post-fingerprint to obtain the new file fingerprint.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.

Claims (10)

1. A file fingerprint generation method comprises the following steps:
step S1, acquiring the type characteristics of the file, and taking the acquired type characteristic value as a pre-fingerprint;
step S2, grouping the files, obtaining sampling file data according to a sampling point position formula, and obtaining a main body fingerprint hash value as a main body fingerprint by using a digest algorithm;
step S3, counting the size of the file, converting the file size into a hash value as a post fingerprint;
and step S4, splicing the front fingerprint, the main body fingerprint and the rear fingerprint to obtain a new file fingerprint.
2. A method of generating a file fingerprint according to claim 1, wherein: in step S1, a plurality of bits of data of the file header of the file are extracted as the type feature of the file, the range of the type feature value is expanded, and the obtained type feature value is used as the pre-fingerprint.
3. A method of generating a file fingerprint according to claim 2, wherein: in step S1, the first 32 bits of data of the file header of the file are extracted as the type feature of the file, and the numerical range of the type feature value is expanded by hash calculation to obtain a 32-bit hash value as the pre-fingerprint.
4. A method for generating a file fingerprint according to claim 3, wherein in step S1, the hash formula used in the hash calculation is:
$$\sum_{i=0}^{n-1} s[i] \cdot 31^{n-1-i}$$
wherein s is a 32-bit characteristic value array of the file type characteristic, n is the length of the characteristic value array, and n is 32.
5. A method of generating a file fingerprint according to claim 2, wherein: in step S2, the files are grouped according to 64K size, sample file data is obtained through the sample point position formula, and a digest algorithm is used to obtain a 128-bit hash value of the main body fingerprint as the main body fingerprint.
6. The method for generating a file fingerprint according to claim 5, wherein the sampling point position formula is as follows:
Figure FDA0002319121900000012
where i is the ith sample.
7. A method for generating a file fingerprint according to claim 5, wherein: in step S3, the file capacity of the file is obtained and converted into 32-bit binary representation as the post-fingerprint.
8. A method for generating a file fingerprint according to claim 7, wherein: in step S4, the 32-bit hash value obtained in step S1 is used as the front fingerprint, the 128-bit hash value obtained in step S2 is used as the main fingerprint, and the 32-bit hash value obtained in step S3 is used as the back fingerprint to be spliced to obtain a new 192-bit hash value, so as to obtain a new file fingerprint.
9. The method for generating a file fingerprint according to claim 8, further comprising the following steps after step S4:
step S5, the equivalent comparison is performed on the file fingerprints generated by different files, and the files with the same file fingerprint hash value are determined to be the same file, and the files with different file fingerprint hash values are determined not to be the same file.
10. An apparatus for generating a file fingerprint, comprising:
the preposed fingerprint generating unit is used for acquiring the type characteristics of the file and taking the obtained type characteristic value as a preposed fingerprint;
the main body fingerprint generating unit is used for grouping the files, obtaining sampling file data according to a sampling point position formula and obtaining a main body fingerprint hashed value as a main body fingerprint by using a digest algorithm;
the post fingerprint generating unit is used for counting the size of the file, converting the file into a hash value and taking the hash value as a post fingerprint;
and the splicing unit is used for splicing the front fingerprint, the main body fingerprint and the rear fingerprint to obtain a new file fingerprint.
CN201911291250.4A 2019-12-16 2019-12-16 File fingerprint generation method and device Pending CN110990897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911291250.4A CN110990897A (en) 2019-12-16 2019-12-16 File fingerprint generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911291250.4A CN110990897A (en) 2019-12-16 2019-12-16 File fingerprint generation method and device

Publications (1)

Publication Number Publication Date
CN110990897A true CN110990897A (en) 2020-04-10

Family

ID=70093787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911291250.4A Pending CN110990897A (en) 2019-12-16 2019-12-16 File fingerprint generation method and device

Country Status (1)

Country Link
CN (1) CN110990897A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214549A (en) * 2020-10-10 2021-01-12 中育数据(广州)科技有限公司 File feature code generation method and device and electronic equipment
CN114745348A (en) * 2022-05-26 2022-07-12 北京中睿天下信息技术有限公司 Mail fingerprint extraction method and system
CN114827309A (en) * 2022-04-19 2022-07-29 深信服科技股份有限公司 Equipment fingerprint generation method, device, equipment and readable storage medium
CN112214549B (en) * 2020-10-10 2024-06-04 中育数据(广州)科技有限公司 File feature code generation method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945241A (en) * 2011-10-28 2013-02-27 新游游戏株式会社 Hash data structure used for file comparison,hash comparison system and method
CN103699610A (en) * 2013-12-13 2014-04-02 乐视网信息技术(北京)股份有限公司 Method for generating file verification information, file verifying method and file verifying equipment
KR101667756B1 (en) * 2015-11-04 2016-10-19 한림대학교 산학협력단 Archive file de-duplication apparatus and method
CN108733843A (en) * 2018-05-29 2018-11-02 厦门市美亚柏科信息股份有限公司 File test method based on hash algorithm and sample Hash library generating method
CN110058952A (en) * 2018-01-18 2019-07-26 株洲中车时代电气股份有限公司 A kind of method of calibration and system of files in embedded equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214549A (en) * 2020-10-10 2021-01-12 中育数据(广州)科技有限公司 File feature code generation method and device and electronic equipment
CN112214549B (en) * 2020-10-10 2024-06-04 中育数据(广州)科技有限公司 File feature code generation method and device and electronic equipment
CN114827309A (en) * 2022-04-19 2022-07-29 深信服科技股份有限公司 Equipment fingerprint generation method, device, equipment and readable storage medium
CN114827309B (en) * 2022-04-19 2024-02-23 深信服科技股份有限公司 Equipment fingerprint generation method, device, equipment and readable storage medium
CN114745348A (en) * 2022-05-26 2022-07-12 北京中睿天下信息技术有限公司 Mail fingerprint extraction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200410