CN110096483A - Duplicate file detection method, terminal and server
- Publication number
- CN110096483A (CN201910380465.7A)
- Authority
- CN
- China
- Prior art keywords
- file
- processed
- hash value
- server
- terminal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/137—Hash-based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/61—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
An embodiment of the present invention provides a duplicate file detection method, a terminal and a server. The method includes: when sending to the server a file to be processed that the user needs to upload, the terminal obtains the size of the file to be processed, detects the target numerical interval to which that size belongs, calculates the hash value of the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval, and sends transmission information containing the hash value of the file to be processed to the server; the server determines, according to the transmission information, whether the file to be processed is a duplicate file and sends a response result to the terminal, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file. With this processing, the server can obtain the hash value of the file to be processed without waiting for the whole file to finish transferring, and can therefore determine earlier whether the file to be processed is a duplicate file.
Description
Technical field
The present invention relates to the technical field of computer networks, and in particular to a duplicate file detection method, a terminal and a server.
Background
With the rapid development of computer network technology, users can not only conveniently watch their favorite videos online through a video terminal, but can also upload videos that they have shot themselves or obtained in other ways to a video server, so as to share the uploaded videos with other users. As the server receives more and more files, such as videos, uploaded by users, some of these files will inevitably be duplicates. To avoid storing duplicate files, the server needs to check each file uploaded by a user to determine whether it is a duplicate file.
In the prior art, therefore, to avoid storing duplicate files, the hash value of an uploaded file is calculated after the upload completes, and that hash value is compared with the hash values of the stored files to judge whether the uploaded file is a duplicate file.
However, in the course of implementing the present invention, the inventor found that the prior art has at least the following problem: because the prior art judges whether a file is a duplicate by calculating the hash value of the already uploaded file, it cannot detect in a timely manner whether a file uploaded by a user is a duplicate file.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a duplicate file detection method, a terminal and a server that can detect in a timely manner whether a file uploaded by a user is a duplicate file. The specific technical solutions are as follows:
In a first aspect, to achieve the above purpose, an embodiment of the present invention discloses a duplicate file detection method, the method including:
The terminal obtains the file to be processed that the user needs to upload to the server;
When sending the file to be processed to the server, the terminal obtains the size of the file to be processed;
The terminal detects the target numerical interval to which the size of the file to be processed belongs, where different numerical intervals correspond to different file hash value calculation modes;
The terminal calculates the hash value of the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval;
The terminal sends transmission information containing the hash value of the file to be processed to the server;
The terminal receives the server's response result for the transmission information, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
Optionally, the terminal calculating the hash value of the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval includes:
The terminal processes the data contained in the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval to obtain a candidate hash value;
The terminal calculates the hash value of data containing the candidate hash value and the size of the file to be processed, and uses the calculated hash value as the hash value of the file to be processed.
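As an illustrative sketch only, the second step above (hashing data that contains the candidate hash value and the file size) might look as follows in Python. The description does not say how the two values are combined, so the function name `final_hash`, the `"<candidate>:<size>"` text encoding and the use of SHA-1 for this pass are all assumptions made for the example.

```python
import hashlib


def final_hash(candidate_hex: str, file_size: int) -> str:
    """Hash data that contains the candidate hash value and the file size.

    Assumption: the two values are joined as "<candidate>:<size>" and
    encoded as UTF-8; the description does not fix an encoding.
    """
    payload = f"{candidate_hex}:{file_size}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


# Two files with the same sampled content but different sizes
# end up with different final hash values.
same_samples = "0123456789abcdef"
print(final_hash(same_samples, 50_000_000))
print(final_hash(same_samples, 60_000_000))
```

Folding the size into the final value means that two files whose sampled regions happen to match, but whose lengths differ, are not reported as duplicates.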
Optionally, the target numerical interval is (0, A); the terminal processing the data contained in the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval to obtain the candidate hash value includes:
The terminal calculates the full hash value of the file to be processed and uses the full hash value as the candidate hash value.
Optionally, the target numerical interval is [A, B), where B > A; the terminal processing the data contained in the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval to obtain the candidate hash value includes:
The terminal calculates the hash value of data containing the preset head and the preset tail of the file to be processed, and uses the calculated hash value as the candidate hash value.
Optionally, the target numerical interval is [B, +∞); the terminal processing the data contained in the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval to obtain the candidate hash value includes:
The terminal calculates the hash value of data containing the preset head, the preset tail and the preset middle of the file to be processed, and uses the calculated hash value as the candidate hash value.
Optionally, the transmission information further includes the size of the file to be processed.
In a second aspect, to achieve the above purpose, an embodiment of the present invention discloses a duplicate file detection method, the method including:
The server receives transmission information, sent by the terminal, that contains the hash value of the file to be processed, where the transmission information is sent by the terminal to the server when the terminal sends the file to be processed to the server;
The server determines, according to the transmission information, whether the file to be processed is a duplicate file;
The server sends a response result to the terminal, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
Optionally, the server determining, according to the transmission information, whether the file to be processed is a duplicate file includes:
The server detects whether the hash values of the locally stored files include a hash value identical to the hash value of the file to be processed;
If the hash values of the locally stored files include a hash value identical to the hash value of the file to be processed, the server determines that the file to be processed is a duplicate file;
If the hash values of the locally stored files do not include a hash value identical to the hash value of the file to be processed, the server determines that the file to be processed is a non-duplicate file.
Optionally, the transmission information further includes the size of the file to be processed;
The server determining, according to the transmission information, whether the file to be processed is a duplicate file includes:
The server determines, according to the size of the file to be processed, the target numerical interval to which the file to be processed belongs;
The server detects whether the hash values of the stored files corresponding to the target numerical interval include a hash value identical to the hash value of the file to be processed;
If the hash values of the stored files corresponding to the target numerical interval include a hash value identical to the hash value of the file to be processed, the server determines that the file to be processed is a duplicate file;
If the hash values of the stored files corresponding to the target numerical interval do not include a hash value identical to the hash value of the file to be processed, the server determines that the file to be processed is a non-duplicate file.
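The server-side check described above, in the variant where the transmission information also carries the file size, can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the class name `HashIndex`, the helper `interval_index`, and the threshold values (taken from the 40M and 128M examples given for A and B in this description, read here as multiples of 1024 * 1024 bytes) are assumptions.

```python
from bisect import bisect_right

MB = 1024 * 1024
# Assumed interval boundaries A and B (example values from the description).
THRESHOLDS = [40 * MB, 128 * MB]


def interval_index(size: int) -> int:
    """Return 0 for (0, A), 1 for [A, B), 2 for [B, +inf)."""
    if size <= 0:
        raise ValueError("file size must be positive")
    return bisect_right(THRESHOLDS, size)


class HashIndex:
    """Hypothetical store of known hash values, grouped by size interval."""

    def __init__(self) -> None:
        self._by_interval: dict[int, set[str]] = {}

    def add(self, size: int, file_hash: str) -> None:
        self._by_interval.setdefault(interval_index(size), set()).add(file_hash)

    def is_duplicate(self, size: int, file_hash: str) -> bool:
        # Only the hash values stored for the target interval are searched.
        return file_hash in self._by_interval.get(interval_index(size), set())


index = HashIndex()
index.add(10 * MB, "hash-of-stored-file")
print(index.is_duplicate(12 * MB, "hash-of-stored-file"))  # same interval
print(index.is_duplicate(50 * MB, "hash-of-stored-file"))  # other interval
```

Grouping stored hash values by interval narrows each lookup to files that were hashed with the same calculation mode as the incoming file.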
In a third aspect, to achieve the above purpose, an embodiment of the present invention discloses a terminal, the terminal including a transceiver and a processor;
The transceiver is configured to obtain the file to be processed that the user needs to upload to the server, and to obtain the size of the file to be processed when sending the file to be processed to the server;
The processor is configured to detect the target numerical interval to which the size of the file to be processed belongs, where different numerical intervals correspond to different file hash value calculation modes, and to calculate the hash value of the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval;
The transceiver is further configured to send transmission information containing the hash value of the file to be processed to the server, and to receive the server's response result for the transmission information, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
Optionally, the processor is specifically configured to process the data contained in the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval to obtain a candidate hash value, to calculate the hash value of data containing the candidate hash value and the size of the file to be processed, and to use the calculated hash value as the hash value of the file to be processed.
Optionally, the target numerical interval is (0, A);
The processor is specifically configured to calculate the full hash value of the file to be processed and to use the full hash value as the candidate hash value.
Optionally, the target numerical interval is [A, B), where B > A;
The processor is specifically configured to calculate the hash value of data containing the preset head and the preset tail of the file to be processed, and to use the calculated hash value as the candidate hash value.
Optionally, the target numerical interval is [B, +∞);
The processor is specifically configured to calculate the hash value of data containing the preset head, the preset tail and the preset middle of the file to be processed, and to use the calculated hash value as the candidate hash value.
Optionally, the transmission information further includes the size of the file to be processed.
In a fourth aspect, to achieve the above purpose, an embodiment of the present invention discloses a server, the server including a transceiver and a processor;
The transceiver is configured to receive transmission information, sent by the terminal, that contains the hash value of the file to be processed, where the transmission information is sent by the terminal to the server when the terminal sends the file to be processed to the server;
The processor is configured to determine, according to the transmission information, whether the file to be processed is a duplicate file;
The transceiver is further configured to send a response result to the terminal, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
Optionally, the processor is specifically configured to detect whether the hash values of the locally stored files include a hash value identical to the hash value of the file to be processed; if they do, to determine that the file to be processed is a duplicate file; and if they do not, to determine that the file to be processed is a non-duplicate file.
Optionally, the transmission information further includes the size of the file to be processed;
The processor is specifically configured to determine, according to the size of the file to be processed, the target numerical interval to which the file to be processed belongs; to detect whether the hash values of the stored files corresponding to the target numerical interval include a hash value identical to the hash value of the file to be processed; if they do, to determine that the file to be processed is a duplicate file; and if they do not, to determine that the file to be processed is a non-duplicate file.
In another aspect of the implementation of the present invention, a duplicate file detection system is further provided, the system including a terminal and a server;
The terminal is configured to obtain the file to be processed that the user needs to upload to the server; to obtain the size of the file to be processed when sending the file to be processed to the server; to detect the target numerical interval to which the size of the file to be processed belongs, where different numerical intervals correspond to different file hash value calculation modes; to calculate the hash value of the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval; and to send transmission information containing the hash value of the file to be processed to the server;
The server is configured to receive the transmission information, sent by the terminal, that contains the hash value of the file to be processed; to determine, according to the transmission information, whether the file to be processed is a duplicate file; and to send a response result to the terminal, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
The terminal is further configured to receive the server's response result for the transmission information.
In another aspect of the implementation of the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute any of the duplicate file detection methods described in the first aspect above.
In another aspect of the implementation of the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute any of the duplicate file detection methods described in the second aspect above.
In another aspect of the implementation of the present invention, an embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to execute any of the duplicate file detection methods described in the first aspect above.
In another aspect of the implementation of the present invention, an embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to execute any of the duplicate file detection methods described in the second aspect above.
An embodiment of the present invention provides a duplicate file detection method. When sending to the server a file to be processed that the user needs to upload, the terminal can obtain the size of the file to be processed, detect the target numerical interval to which that size belongs, calculate the hash value of the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval, and send transmission information containing the hash value of the file to be processed to the server. The server determines, according to the transmission information, whether the file to be processed is a duplicate file and sends a response result to the terminal, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file. With this processing, the server can obtain the hash value of the file to be processed without waiting for the whole file to finish transferring, and can therefore determine earlier whether the file to be processed is a duplicate file.
Of course, implementing any product or method of the present invention does not necessarily require achieving all of the advantages described above at the same time.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flow chart of a duplicate file detection method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of a duplicate file detection method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a terminal provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of a server provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of a duplicate file detection system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below with reference to the drawings in the embodiments of the present invention.
Because the prior art judges whether a file is a duplicate by calculating the hash value of the already uploaded file, it cannot detect in a timely manner whether a file uploaded by a user is a duplicate file.
To solve the above problem, the present invention provides a duplicate file detection method that can be applied to a terminal and to a server respectively. The terminal and the server can communicate with each other over a network, and the terminal can be a browser or another kind of terminal.
The terminal can obtain the file to be processed that the user needs to upload to the server, and send the file to be processed to the server. When sending the file to be processed to the server, the terminal can also obtain the size of the file to be processed and detect the target numerical interval to which that size belongs. The terminal can then calculate the hash value of the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval, and send transmission information containing the hash value of the file to be processed to the server.
The server can receive the transmission information, sent by the terminal, that contains the hash value of the file to be processed, and determine, according to the transmission information, whether the file to be processed is a duplicate file. The server can then send a response result to the terminal, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
With this processing, while sending the file to be processed to the server, the terminal can also send the hash value of the file to be processed to the server, so the server can determine earlier whether the file to be processed is a duplicate file.
The present invention is described in detail below with specific embodiments.
Referring to Fig. 1, Fig. 1 is a flow chart of a duplicate file detection method provided by an embodiment of the present invention. The method can be applied to a terminal and can include the following steps:
S101: The terminal obtains the file to be processed that the user needs to upload to the server.
The file to be processed can be a network resource of any format; for example, it can be a video file, an audio file, or a file such as the installation package of an application. There can be one file to be processed or several. If there are several, the terminal can process each file to be processed in turn according to the duplicate file detection method of the present invention.
The terminal can obtain the file that the user needs to upload to the server (that is, the file to be processed) in order to upload it.
In one implementation, if the terminal is a browser, an "upload" button can be provided in the display interface of the terminal. When the user clicks the "upload" button, the terminal can display a list of files available for upload, where the files in the list are files local to the terminal. The user can select the file to be processed from the files local to the terminal, and the terminal can accordingly obtain the file to be processed.
S102: When sending the file to be processed to the server, the terminal obtains the size of the file to be processed.
The size of the file to be processed is the amount of storage space it occupies; for example, the size of the file to be processed can be 556MB, or it can be 1000MB.
When sending the file to be processed to the server, the terminal can also obtain the size of the file to be processed, so as to perform the processing that corresponds to that size.
S103: The terminal detects the target numerical interval to which the size of the file to be processed belongs.
Different numerical intervals correspond to different file hash value calculation modes.
The way the numerical intervals are divided can be set by a technician based on experience. For example, file sizes greater than 0 and less than a first threshold can be assigned to one numerical interval; file sizes greater than or equal to the first threshold and less than a second threshold, where the second threshold is greater than the first threshold, can be assigned to another numerical interval; and file sizes greater than or equal to a third threshold, where the third threshold is greater than the second threshold, can be assigned to a further numerical interval. The first threshold, the second threshold and the third threshold are all positive numbers.
After determining the size of the file to be processed, the terminal can determine the numerical interval to which that size belongs (that is, the target numerical interval), and can then process the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval.
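A minimal sketch of the interval detection in S103 is shown below, under the assumption that only two boundaries A and B are used, as in the three cases this description gives for S104. The values 40M and 128M are the example values given for A and B, interpreted here as multiples of 1024 * 1024 bytes; the function name and the mode labels are hypothetical.

```python
MB = 1024 * 1024
A = 40 * MB    # assumed first boundary (example value from the description)
B = 128 * MB   # assumed second boundary (example value from the description)


def target_mode(file_size: int) -> str:
    """Map a file size to the label of its hash value calculation mode."""
    if file_size <= 0:
        raise ValueError("file size must be positive")
    if file_size < A:            # interval (0, A)
        return "full"
    if file_size < B:            # interval [A, B)
        return "head_tail"
    return "head_tail_middle"    # interval [B, +inf)


for size in (5 * MB, 40 * MB, 556 * MB):
    print(size, target_mode(size))
```

Because the boundaries are configuration values, the terminal and the server would need to agree on them for their interval decisions to match.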
S104: The terminal calculates the hash value of the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval.
The terminal can calculate the hash value of the file to be processed according to a preset algorithm, which can be sha1 (Secure Hash Algorithm 1) or another algorithm.
In one implementation, the terminal can process the data contained in the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval, and use the processing result as the hash value of the file to be processed.
In another implementation, so that the calculated hash value more effectively reflects the uniqueness of the file to be processed, the method by which the terminal calculates the hash value of the file to be processed can include the following steps:
Step 1: process the data contained in the file to be processed according to the file hash value calculation mode corresponding to the target numerical interval to obtain a candidate hash value.
Depending on the size of the file to be processed, the method by which the terminal calculates the candidate hash value can include the following cases:
Case 1: when the target numerical interval is (0, A), the terminal calculates the full hash value of the file to be processed and uses the full hash value as the candidate hash value.
The value of A can be set by a technician based on experience; for example, A can be 40M.
In one implementation, when the terminal determines that the size of the file to be processed belongs to (0, 40M), the file to be processed is relatively small, so the terminal can perform a hash operation on all the data contained in the file to be processed; that is, the terminal can calculate the full hash value of the file to be processed and use it as the candidate hash value.
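Case 1 can be illustrated with the following sketch, which reads the whole file in chunks and feeds every byte to SHA-1. The function name `full_hash`, the chunk size and the use of an in-memory stream for the demonstration are assumptions made for the example.

```python
import hashlib
import io


def full_hash(stream, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-1 hex digest of all data read from a binary stream."""
    digest = hashlib.sha1()
    # Read in chunks so the whole file never has to sit in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Demonstration with an in-memory stream; a real caller would pass
# a file opened in binary mode, for example open(path, "rb").
demo_stream = io.BytesIO(b"example video bytes")
candidate = full_hash(demo_stream)
print(candidate)
```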
Case 2: when the target numerical interval is [A, B), where B > A, the terminal calculates the hash value of data containing the preset head and the preset tail of the file to be processed, and uses the calculated hash value as the candidate hash value.
The value of B and the sizes of the preset head and the preset tail can be set by a technician based on experience; for example, B can be 128M, and the preset head and the preset tail can each be 20M.
In one implementation, when the terminal determines that the size of the file to be processed belongs to [40M, 128M), calculating the full hash value of the file to be processed would consume considerable computing resources and take a long time.
Therefore, the terminal can sample the file to be processed; that is, the terminal can obtain the data of the preset head and the data of the preset tail of the file to be processed, splice the data of the preset head and the preset tail together, perform a hash operation on the spliced data, and use the result of the operation as the candidate hash value.
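The head-and-tail sampling of Case 2 might be sketched as follows. Feeding the head and then the tail to the same SHA-1 object gives the same digest as hashing the spliced bytes, without building one large buffer. The function name, the parameter `part` (the preset head and tail size, 20M in the example above) and the in-memory demonstration are assumptions.

```python
import hashlib
import io


def head_tail_hash(stream, size: int, part: int = 20 * 1024 * 1024) -> str:
    """Return the SHA-1 hex digest of the first and last `part` bytes."""
    digest = hashlib.sha1()
    stream.seek(0)
    digest.update(stream.read(part))      # preset head
    # For sizes in [A, B) with A >= 2 * part the two regions do not overlap.
    stream.seek(max(size - part, 0))
    digest.update(stream.read(part))      # preset tail
    return digest.hexdigest()


# Small-scale demonstration: 100 bytes, sampling 10 bytes at each end.
data = bytes(range(100))
candidate = head_tail_hash(io.BytesIO(data), len(data), part=10)
print(candidate)
```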
Case 3: when the target numerical interval is [B, +∞), the terminal calculates the hash value of data containing the preset head, the preset tail and the preset middle of the file to be processed, and uses the calculated hash value as the candidate hash value.
In one implementation, when the terminal determines that the size of the file to be processed belongs to [128M, +∞), the file to be processed is relatively large, so calculating its full hash value would consume considerable computing resources and take a long time.
In addition, if the terminal processed only the data of the preset head and the preset tail of the file to be processed, the resulting candidate hash value would be less effective at identifying the file.
Therefore, the terminal can obtain the data of the preset head, the data of the preset tail and the data of the preset middle of the file to be processed, splice the data of the preset head, the preset tail and the preset middle together, perform a hash operation on the spliced data, and use the result of the operation as the candidate hash value.
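A sketch of Case 3 follows. How the preset middle is located and the order in which the three regions are spliced are not specified at this point in the description, so both are assumptions here: the middle block is centred in the file, and the regions are hashed in the order head, tail, middle. The function name and the `part` parameter are likewise hypothetical.

```python
import hashlib
import io


def head_tail_middle_hash(stream, size: int, part: int = 20 * 1024 * 1024) -> str:
    """Return the SHA-1 hex digest of head, tail and middle samples."""
    offsets = [
        0,                             # preset head
        max(size - part, 0),           # preset tail
        max((size - part) // 2, 0),    # preset middle (assumed: centred)
    ]
    digest = hashlib.sha1()
    for offset in offsets:
        stream.seek(offset)
        digest.update(stream.read(part))
    return digest.hexdigest()


# Small-scale demonstration: 200 bytes, three 10-byte samples.
data = bytes(range(200))
candidate = head_tail_middle_hash(io.BytesIO(data), len(data), part=10)
print(candidate)
```

Only three fixed-size regions are read, so the cost of this step stays constant however large the file is.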
As can be seen from Cases 1 to 3 above, for files to be processed of different sizes, the terminal can process the file in different ways to obtain different sampled data blocks, and then obtain the hash value of the file to be processed from the sampled data blocks.
In one implementation, a correspondence between file size and sample count can be stored in the terminal. According to this correspondence, the terminal can determine the target sample count corresponding to the size of the file to be processed. The terminal can then obtain that number of data blocks from the file to be processed and perform a hash operation on them to obtain the candidate hash value.
The correspondence between file size and sample count is shown in Table (1).
Table (1)
| File size (D) | Sample count (S) |
| D < 40M | 1 |
| 40M ≤ D < 128M | 2 |
| 128M ≤ D < 512M | 3 |
| 512M ≤ D < 1G | 4 |
| 1G ≤ D < 4G | 5 |
| 4G ≤ D | 6 |
By table (1) as it can be seen that when file to be processed is less than 40M, number of samples 1, at this point, terminal can not be to be processed
File is sampled, sampled data block, that is, file to be processed itself, all numbers that terminal can directly include to file to be processed
According to Hash operation is carried out, using operation result as hash value to be selected.
When the file to be processed is greater than or equal to 40M and smaller than 128M, the number of samples is 2; that is, the terminal may obtain 2 data blocks of a preset size from the data in the file to be processed as the sampled data blocks.
When the file to be processed is greater than or equal to 128M and smaller than 512M, the number of samples is 3; that is, the terminal may obtain 3 data blocks of the preset size from the data in the file to be processed as the sampled data blocks.
When the file to be processed is greater than or equal to 512M and smaller than 1G, the number of samples is 4; that is, the terminal may obtain 4 data blocks of the preset size from the data in the file to be processed as the sampled data blocks.
When the file to be processed is greater than or equal to 1G and smaller than 4G, the number of samples is 5; that is, the terminal may obtain 5 data blocks of the preset size from the data in the file to be processed as the sampled data blocks.
When the file to be processed is greater than or equal to 4G, the number of samples is 6; that is, the terminal may obtain 6 data blocks of the preset size from the data in the file to be processed as the sampled data blocks.
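The lookup in Table (1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `target_sample_count` is hypothetical, and the sketch assumes that "M" and "G" in the table denote binary megabytes and gigabytes.

```python
from bisect import bisect_right

MB = 1024 ** 2  # assumption: "M" in Table (1) means 2**20 bytes
GB = 1024 ** 3  # assumption: "G" in Table (1) means 2**30 bytes

# Exclusive upper bounds of the size intervals in Table (1).
# The last interval (4G <= D) has no upper bound.
_UPPER_BOUNDS = [40 * MB, 128 * MB, 512 * MB, 1 * GB, 4 * GB]
_SAMPLE_COUNTS = [1, 2, 3, 4, 5, 6]


def target_sample_count(file_size: int) -> int:
    """Return the number of samples S for a file of size D bytes."""
    # bisect_right places a size equal to a bound into the next interval,
    # which matches the "lower bound inclusive" intervals of the table.
    return _SAMPLE_COUNTS[bisect_right(_UPPER_BOUNDS, file_size)]
```

Keeping the bounds in a sorted list means that adding or moving an interval only changes the two lists, not the lookup logic.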
The preset size may be 20M. When the number of sampled data blocks is greater than or equal to 2, the sampled data blocks may include the data of the preset head and the preset tail of the file to be processed.
For two files of different formats, the header data of the two files differ considerably, and so do their tail data. Therefore, when the terminal samples the file to be processed and the target number of samples is greater than or equal to 2, the sampled data blocks obtained by the terminal may include the data of the preset head and the preset tail of the file, so that the sampled data blocks reflect the uniqueness of the file to be processed more accurately.
In addition to the data of the preset head and the preset tail of the file to be processed, the terminal may determine the other sampled data blocks (that is, the data blocks of the preset middle part, which may be called middle sampled data blocks) according to a preset rule.
In one implementation, if the target number of samples is odd and greater than 2, the terminal may obtain a data block of the preset size at the midpoint of the data in the file to be processed (which may be called the midpoint sampled data block); the remaining middle sampled data blocks are distributed evenly on both sides of the midpoint at a preset interval.
If the target number of samples is even and greater than 2, no sample is taken at the midpoint of the data in the file to be processed, and the middle sampled data blocks are distributed evenly on both sides of the midpoint at the preset interval.
For example, if the file to be processed is greater than or equal to 128M and smaller than 512M, there is 1 middle sampled data block: the midpoint sampled data block obtained at the midpoint of the data in the file to be processed.
If the file to be processed is greater than or equal to 512M and smaller than 1G, the preset interval may be 128M and there are 2 middle sampled data blocks, each at a distance of 128M from the midpoint of the data in the file to be processed.
If the file to be processed is greater than or equal to 1G and smaller than 4G, the preset interval may be 256M and there are 3 middle sampled data blocks. One of them is the midpoint sampled data block at the midpoint of the data in the file to be processed; the other two lie on either side of the midpoint, each at a distance of 256M from it.
If the file to be processed is greater than or equal to 4G, the preset interval may be 512M and there are 4 middle sampled data blocks. They lie on both sides of the midpoint of the data in the file to be processed, at distances of 512M and 1024M from it respectively.
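The placement rule described above can be sketched as byte ranges. The text does not say whether a distance is measured to the start or to the centre of a block, so this sketch assumes that each middle block is centred on its sampling point; the names `middle_centers` and `sample_ranges` are hypothetical.

```python
MB = 1024 ** 2
BLOCK = 20 * MB  # preset block size given in the text


def middle_centers(file_size: int, n_middle: int, interval: int) -> list:
    """Return the sampling points of the middle blocks, in ascending order."""
    mid = file_size // 2
    # An odd count samples the midpoint itself; an even count skips it.
    centers = [mid] if n_middle % 2 == 1 else []
    # The remaining blocks sit in pairs at k * interval on each side.
    for k in range(1, n_middle // 2 + 1):
        centers += [mid - k * interval, mid + k * interval]
    return sorted(centers)


def sample_ranges(file_size: int, n_samples: int, interval: int) -> list:
    """Return (start, end) byte ranges of all sampled data blocks."""
    if n_samples == 1:
        return [(0, file_size)]  # the whole file is the single sample
    ranges = [(0, BLOCK)]  # preset head
    for center in middle_centers(file_size, n_samples - 2, interval):
        start = center - BLOCK // 2  # assumption: block centred on the point
        ranges.append((start, start + BLOCK))
    ranges.append((file_size - BLOCK, file_size))  # preset tail
    return ranges
```

For a 1G file with 5 samples and a 256M interval, this yields the head block, three middle blocks around 256M, 512M and 768M, and the tail block.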
Step 2: calculate the hash value of data comprising the candidate hash value and the size of the file to be processed, and use the calculated hash value as the hash value of the file to be processed.
After obtaining the candidate hash value, the terminal may splice the candidate hash value and the size of the file to be processed, then perform a hash operation on the spliced data and use the result as the hash value of the file to be processed.
It can be seen that the hash value obtained by the method of this embodiment reflects not only the data in the file to be processed but also its size, and can therefore effectively reflect the uniqueness of the file to be processed.
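Step 2 can be sketched in a few lines. The text does not name a hash algorithm or specify how the size is encoded before splicing, so SHA-256 and a decimal text encoding are assumptions made here for illustration; `final_file_hash` is a hypothetical name.

```python
import hashlib


def final_file_hash(candidate_hash: str, file_size: int) -> str:
    """Hash the candidate hash value spliced with the file size."""
    # Assumption: the size is appended as decimal text after the candidate hash.
    spliced = f"{candidate_hash}{file_size}".encode("utf-8")
    # Assumption: SHA-256 stands in for the unspecified hash operation.
    return hashlib.sha256(spliced).hexdigest()
```

Because the size is part of the hashed data, two files whose sampled blocks happen to match but whose sizes differ produce different final hash values.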
S105: The terminal sends, to the server, sending information containing the hash value of the file to be processed.
After obtaining the hash value of the file to be processed, the terminal may send the sending information containing that hash value to the server.
Correspondingly, after receiving the sending information, the server may determine from it whether the file to be processed is a duplicate file and return to the terminal a response result for the sending information. The response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file. The processing steps of the server are described in detail in a subsequent embodiment.
The terminal may then obtain the response result sent by the server.
In one implementation, if the terminal is a browser, the main process in the terminal may be used to upload the file to be processed. The main process slices the file to be processed using the File object and XMLHttpRequest, and sends the sliced file to the server by asynchronous upload.
Meanwhile, the terminal may send the sending information containing the hash value of the file to be processed to the server through an independent Web Worker thread.
It can be seen that, with the duplicate file detection method of the embodiment of the present invention, the terminal can also send the hash value of the file to be processed while it is sending the file itself to the server. The server can obtain the hash value without waiting for the whole file to finish transmitting, and can therefore determine earlier whether the file to be processed is a duplicate file. In addition, because the terminal is responsible for calculating the hash value of the file to be processed, the computing load on the server is reduced.
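The idea of sending the hash value in parallel with the sliced upload can be sketched with two concurrent tasks. The text describes a browser using XMLHttpRequest and a Web Worker; the Python thread pool below is only a stand-in for that arrangement, and `send_hash`, `upload_slices` and the in-memory `sent` list are hypothetical placeholders for real network calls.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def send_hash(data: bytes, sent: list) -> str:
    """Stand-in for the worker thread: compute the hash and 'send' it."""
    digest = hashlib.sha256(data).hexdigest()  # assumption: SHA-256
    sent.append(("hash", digest))
    return digest


def upload_slices(data: bytes, slice_size: int, sent: list) -> int:
    """Stand-in for the main upload: 'send' the file slice by slice."""
    count = 0
    for start in range(0, len(data), slice_size):
        sent.append(("slice", data[start:start + slice_size]))
        count += 1
    return count


def upload_with_early_hash(data: bytes, slice_size: int = 4):
    """Run the hash message and the sliced upload concurrently."""
    sent = []  # records what would have gone over the network
    with ThreadPoolExecutor(max_workers=2) as pool:
        hash_future = pool.submit(send_hash, data, sent)
        slice_future = pool.submit(upload_slices, data, slice_size, sent)
        return hash_future.result(), slice_future.result(), sent
```

The point of the arrangement is that the hash message does not wait for the last slice, so the server can answer the duplicate check while the upload is still in progress.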
Optionally, the sending information may also include the size of the file to be processed; that is, when sending the hash value of the file to be processed to the server, the terminal may also send the size of the file.
Correspondingly, the server may combine the hash value and the size of the file to be processed to determine whether the file is a duplicate file, which improves the efficiency of the duplicate file detection method.
Referring to Fig. 2, Fig. 2 is a flow chart of a duplicate file detection method provided by an embodiment of the present invention. The method may be applied to a server and may include the following steps:
S201: The server receives sending information, sent by the terminal, that contains the hash value of the file to be processed.
The sending information may be sent by the terminal to the server when the terminal sends the file to be processed to the server. The file to be processed may be a network resource of any format; for example, it may be a video file, an audio file, or a file such as an application installation package.
The terminal may obtain the file that the user needs to upload to the server (that is, the file to be processed) and then send the file to be processed to the server.
When sending the file to be processed to the server, the terminal may also obtain the size of the file, detect the target numerical interval to which the size belongs, calculate the hash value of the file according to the file hash value calculation method corresponding to the target numerical interval, and send the sending information containing the hash value to the server. For the processing performed by the terminal, refer to the detailed description in the embodiment above.
Correspondingly, the server receives the sending information containing the hash value of the file to be processed.
S202: The server determines, according to the sending information, whether the file to be processed is a duplicate file.
After obtaining the sending information, the server may extract the hash value of the file to be processed. Correspondingly, S202 may include the following step:
The server detects whether the hash values of the locally stored files include a hash value identical to that of the file to be processed. If so, the server determines that the file to be processed is a duplicate file; if not, the server determines that the file to be processed is a non-duplicate file.
In one implementation, after obtaining the hash value of the file to be processed, the server may query the hash values of all locally stored files to judge whether any of them is identical to the hash value of the file to be processed. If one exists, the server may determine that a file identical to the file to be processed has already been stored, that is, the file to be processed is a duplicate file. If none exists, the server may determine that no identical file has been stored, that is, the file to be processed is a non-duplicate file.
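The server-side check described above amounts to a membership test on the stored hash values. A minimal sketch follows; the function name `build_response` and the response format are assumptions, and an in-memory set stands in for whatever storage the server actually uses.

```python
def build_response(file_hash: str, stored_hashes: set) -> dict:
    """Return a response result stating whether the file is a duplicate."""
    # A set gives average constant-time lookup of the received hash value.
    is_duplicate = file_hash in stored_hashes
    return {"hash": file_hash, "duplicate": is_duplicate}
```

In this sketch, `build_response("abc", {"abc", "def"})` reports a duplicate, while a hash value absent from the set is reported as a non-duplicate file.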
It can be seen that, with the duplicate file detection method of the embodiment of the present invention, the terminal can also send the hash value of the file to be processed while it is sending the file itself to the server. The server can obtain the hash value without waiting for the whole file to finish transmitting, and can therefore determine earlier whether the file to be processed is a duplicate file.
S203: The server sends a response result to the terminal.
The response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
After determining whether the file to be processed is a duplicate file, the server may send the response result to the terminal to inform the terminal whether the file to be processed is a duplicate file.
In addition, to improve the efficiency of the duplicate file detection method, the sending information may also include the size of the file to be processed. Correspondingly, S202 may include the following step:
The server determines, according to the size of the file to be processed, the target numerical interval to which the file belongs, and detects whether the hash values of the stored files corresponding to the target numerical interval include a hash value identical to that of the file to be processed. If so, the server determines that the file to be processed is a duplicate file; if not, the server determines that the file to be processed is a non-duplicate file.
In one implementation, after extracting the size and the hash value of the file to be processed, the server may determine the numerical interval to which the size belongs (that is, the target numerical interval). For the numerical intervals, refer to the detailed description in the embodiment above.
The server may then query the hash values of the stored files corresponding to the target numerical interval and judge whether any of them is identical to the hash value of the file to be processed. If one exists, the server may determine that a file identical to the file to be processed has already been stored, that is, the file to be processed is a duplicate file. If none exists, the server may determine that no identical file has been stored, that is, the file to be processed is a non-duplicate file.
With the above processing, the server only needs to query the hash values of the stored files corresponding to the target numerical interval rather than the hash values of all locally stored files. This saves query time and improves the efficiency of the duplicate file detection method.
In addition, when the server determines that the file to be processed is a non-duplicate file, it may store the file and record the correspondence between the file and its hash value. When the terminal later uploads the same file again, the server can determine that the uploaded file is a duplicate file.
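The interval-restricted query and the recording of new files can be sketched together. This is an illustration under stated assumptions: the class name `HashIndex` is hypothetical, the interval bounds are example values standing in for A and B, and the sketch records only the hash value, whereas the text also has the server store the file itself.

```python
from bisect import bisect_right
from collections import defaultdict

MB = 1024 ** 2


class HashIndex:
    """Stored hash values grouped by the size interval of their files."""

    def __init__(self, bounds):
        self._bounds = sorted(bounds)      # interval boundaries in bytes
        self._buckets = defaultdict(set)   # interval index -> hash values

    def _interval(self, size: int) -> int:
        # Index of the numerical interval that the size falls into.
        return bisect_right(self._bounds, size)

    def check_and_record(self, file_hash: str, size: int) -> bool:
        """Return True for a duplicate; otherwise record the hash value."""
        bucket = self._buckets[self._interval(size)]
        if file_hash in bucket:
            return True
        bucket.add(file_hash)  # a later upload of the same file is a duplicate
        return False


# Example bounds only; the text leaves the values of A and B open.
index = HashIndex([40 * MB, 128 * MB])
```

Only one bucket is consulted per request, which is the saving in query time that the text describes.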
Corresponding to the method embodiment of Fig. 1, refer to Fig. 3, which is a structural diagram of a terminal provided by an embodiment of the present invention. The terminal may include a transceiver 301 and a processor 302.
The transceiver 301 is configured to obtain the file to be processed that a user needs to upload to a server, and to obtain the size of the file to be processed when sending the file to the server.
The processor 302 is configured to detect the target numerical interval to which the size of the file to be processed belongs, where different numerical intervals correspond to different file hash value calculation methods, and to calculate the hash value of the file to be processed according to the file hash value calculation method corresponding to the target numerical interval.
The transceiver 301 is further configured to send, to the server, sending information containing the hash value of the file to be processed, and to receive the server's response result for the sending information, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
Optionally, the processor 302 is specifically configured to process the data in the file to be processed according to the file hash value calculation method corresponding to the target numerical interval to obtain a candidate hash value, to calculate the hash value of data comprising the candidate hash value and the size of the file to be processed, and to use the calculated hash value as the hash value of the file to be processed.
Optionally, the target numerical interval is (0, A).
The processor 302 is specifically configured to calculate the full hash value of the file to be processed and use the full hash value as the candidate hash value.
Optionally, the target numerical interval is [A, B), where B > A.
The processor 302 is specifically configured to calculate the hash value of data comprising the preset head and the preset tail of the file to be processed, and use the calculated hash value as the candidate hash value.
Optionally, the target numerical interval is [B, +∞).
The processor 302 is specifically configured to calculate the hash value of data comprising the preset head, the preset tail and the preset middle part of the file to be processed, and use the calculated hash value as the candidate hash value.
Optionally, the sending information further includes the size of the file to be processed.
Corresponding to the method embodiment of Fig. 2, refer to Fig. 4, which is a structural diagram of a server provided by an embodiment of the present invention. The server may include a transceiver 401 and a processor 402.
The transceiver 401 is configured to receive sending information, sent by a terminal, that contains the hash value of the file to be processed, where the sending information is sent by the terminal to the server when the terminal sends the file to be processed to the server.
The processor 402 is configured to determine, according to the sending information, whether the file to be processed is a duplicate file.
The transceiver 401 is further configured to send a response result to the terminal, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
Optionally, the processor 402 is specifically configured to detect whether the hash values of the locally stored files include a hash value identical to that of the file to be processed; if such a hash value exists, to determine that the file to be processed is a duplicate file; and if no such hash value exists, to determine that the file to be processed is a non-duplicate file.
Optionally, the sending information further includes the size of the file to be processed.
The processor 402 is specifically configured to determine, according to the size of the file to be processed, the target numerical interval to which the file belongs; to detect whether the hash values of the stored files corresponding to the target numerical interval include a hash value identical to that of the file to be processed; if such a hash value exists, to determine that the file to be processed is a duplicate file; and if no such hash value exists, to determine that the file to be processed is a non-duplicate file.
Referring to Fig. 5, Fig. 5 is a structural diagram of a duplicate file detection system provided by an embodiment of the present invention. The system may include a terminal 501 and a server 502.
The terminal 501 is configured to obtain the file to be processed that a user needs to upload to the server 502; to obtain the size of the file to be processed when sending the file to the server 502; to detect the target numerical interval to which the size of the file to be processed belongs, where different numerical intervals correspond to different file hash value calculation methods; to calculate the hash value of the file to be processed according to the file hash value calculation method corresponding to the target numerical interval; and to send, to the server 502, sending information containing the hash value of the file to be processed.
The server 502 is configured to receive the sending information, sent by the terminal 501, that contains the hash value of the file to be processed; to determine, according to the sending information, whether the file to be processed is a duplicate file; and to send a response result to the terminal 501, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
The terminal 501 is further configured to receive the response result of the server 502 for the sending information.
An embodiment of the present invention also provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the duplicate file detection method provided by the embodiment of the present invention.
Specifically, the duplicate file detection method includes:
obtaining the file to be processed that a user needs to upload to a server;
obtaining the size of the file to be processed when sending the file to be processed to the server;
detecting the target numerical interval to which the size of the file to be processed belongs, where different numerical intervals correspond to different file hash value calculation methods;
calculating the hash value of the file to be processed according to the file hash value calculation method corresponding to the target numerical interval;
sending, to the server, sending information containing the hash value of the file to be processed;
receiving the server's response result for the sending information, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
It should be noted that the other implementations of the duplicate file detection method are the same as in the method embodiments above and are not repeated here.
By running the instructions stored in the computer-readable storage medium provided by the embodiment of the present invention, the hash value of the file to be processed can also be sent to the server while the file itself is being sent. The server can obtain the hash value without waiting for the whole file to finish transmitting, and can therefore determine earlier whether the file to be processed is a duplicate file.
An embodiment of the present invention also provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the duplicate file detection method provided by the embodiment of the present invention.
Specifically, the duplicate file detection method includes:
receiving sending information, sent by a terminal, that contains the hash value of the file to be processed, where the sending information is sent by the terminal to the server when the terminal sends the file to be processed to the server;
determining, according to the sending information, whether the file to be processed is a duplicate file;
sending a response result to the terminal, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
It should be noted that the other implementations of the duplicate file detection method are the same as in the method embodiments above and are not repeated here.
By running the instructions stored in the computer-readable storage medium provided by the embodiment of the present invention, the hash value of the file to be processed can be obtained without waiting for the whole file to finish transmitting, so whether the file to be processed is a duplicate file can be determined earlier.
An embodiment of the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the duplicate file detection method provided by the embodiment of the present invention.
Specifically, the duplicate file detection method includes:
obtaining the file to be processed that a user needs to upload to a server;
obtaining the size of the file to be processed when sending the file to be processed to the server;
detecting the target numerical interval to which the size of the file to be processed belongs, where different numerical intervals correspond to different file hash value calculation methods;
calculating the hash value of the file to be processed according to the file hash value calculation method corresponding to the target numerical interval;
sending, to the server, sending information containing the hash value of the file to be processed;
receiving the server's response result for the sending information, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
It should be noted that the other implementations of the duplicate file detection method are the same as in the method embodiments above and are not repeated here.
By running the computer program product provided by the embodiment of the present invention, the hash value of the file to be processed can also be sent to the server while the file itself is being sent. The server can obtain the hash value without waiting for the whole file to finish transmitting, and can therefore determine earlier whether the file to be processed is a duplicate file.
An embodiment of the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the duplicate file detection method provided by the embodiment of the present invention.
Specifically, the duplicate file detection method includes:
receiving sending information, sent by a terminal, that contains the hash value of the file to be processed, where the sending information is sent by the terminal to the server when the terminal sends the file to be processed to the server;
determining, according to the sending information, whether the file to be processed is a duplicate file;
sending a response result to the terminal, where the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
It should be noted that the other implementations of the duplicate file detection method are the same as in the method embodiments above and are not repeated here.
By running the computer program product provided by the embodiment of the present invention, the hash value of the file to be processed can be obtained without waiting for the whole file to finish transmitting, so whether the file to be processed is a duplicate file can be determined earlier.
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article or device. Unless further limited, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the terminal, server, system, computer-readable storage medium and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (18)
1. A duplicate file detection method, characterized in that the method comprises:
a terminal obtaining the file to be processed that a user needs to upload to a server;
the terminal obtaining the size of the file to be processed when sending the file to be processed to the server;
the terminal detecting the target numerical interval to which the size of the file to be processed belongs, wherein different numerical intervals correspond to different file hash value calculation methods;
the terminal calculating the hash value of the file to be processed according to the file hash value calculation method corresponding to the target numerical interval;
the terminal sending, to the server, sending information containing the hash value of the file to be processed;
the terminal receiving the server's response result for the sending information, wherein the response result contains information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
2. The method according to claim 1, characterized in that the terminal calculating the hash value of the file to be processed according to the file hash value calculation method corresponding to the target numerical interval comprises:
the terminal processing the data in the file to be processed according to the file hash value calculation method corresponding to the target numerical interval to obtain a candidate hash value;
calculating the hash value of data comprising the candidate hash value and the size of the file to be processed, and using the calculated hash value as the hash value of the file to be processed.
3. The method according to claim 2, characterized in that the target numerical interval is (0, A); and the terminal processing the data in the file to be processed according to the file hash value calculation method corresponding to the target numerical interval to obtain a candidate hash value comprises:
the terminal calculating the full hash value of the file to be processed and using the full hash value as the candidate hash value.
4. The method according to claim 2 or 3, characterized in that the target numerical interval is [A, B), wherein B > A; and the terminal processing the data in the file to be processed according to the file hash value calculation method corresponding to the target numerical interval to obtain a candidate hash value comprises:
the terminal calculating the hash value of data comprising the preset head and the preset tail of the file to be processed, and using the calculated hash value as the candidate hash value.
5. The method according to claim 4, wherein the target value interval is [B, +∞), and the terminal processing the data contained in the file to be processed according to the file hash value calculation method corresponding to the target value interval, to obtain a candidate hash value, comprises:
The terminal calculates a hash value of data comprising a preset head, a preset tail and a preset middle of the file to be processed, and uses the calculated hash value as the candidate hash value.
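The three size tiers of claims 3 to 5 can be sketched as a single function. This is an illustration under stated assumptions, not the patented implementation: SHA-256, the parameter names `a`, `b` and `chunk` (the thresholds A and B and the preset segment length), the order in which segments are hashed, and taking the middle segment at `size // 2` are all hypothetical choices.

```python
import hashlib
import io


def candidate_hash(stream, size: int, a: int, b: int, chunk: int) -> bytes:
    """Compute a candidate hash from a seekable binary stream of known size."""
    h = hashlib.sha256()  # assumption: the claims name no specific algorithm
    stream.seek(0)
    if size < a:
        # Interval (0, A): hash the whole file.
        for block in iter(lambda: stream.read(1 << 20), b""):
            h.update(block)
        return h.digest()
    # Intervals [A, B) and [B, +inf): hash the preset head and preset tail.
    h.update(stream.read(chunk))
    stream.seek(max(size - chunk, 0))
    h.update(stream.read(chunk))
    if size >= b:
        # Interval [B, +inf): additionally hash a preset middle segment.
        stream.seek(size // 2)
        h.update(stream.read(chunk))
    return h.digest()


# Usage with tiny, purely illustrative thresholds.
data = bytes(range(50))
print(candidate_hash(io.BytesIO(data), len(data), a=10, b=100, chunk=4).hex())
```

The trade-off is visible in the middle tier: bytes outside the sampled head and tail do not affect the candidate hash, so larger files are hashed faster at the cost of a higher chance of a false match.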
6. The method according to claim 1, wherein the transmission information further includes the size of the file to be processed.
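One possible shape for the transmission information and the response result is sketched below. The claims do not define a wire format, so the JSON encoding and the field names `hash`, `size` and `duplicate` are hypothetical.

```python
import json


def build_transmission_info(file_hash: str, file_size: int) -> str:
    """Serialize the hash value and the file size sent by the terminal."""
    # Field names are assumptions; the claims only require that the hash
    # (and, in claim 6, the size) be included.
    return json.dumps({"hash": file_hash, "size": file_size}, sort_keys=True)


def is_duplicate(response: str) -> bool:
    """Read the duplicate / non-duplicate flag from the server's response."""
    return bool(json.loads(response)["duplicate"])
```

A terminal following this sketch would skip the upload when `is_duplicate` returns `True` and upload the file otherwise.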
7. A duplicate file detection method, characterized in that the method comprises:
A server receives transmission information, sent by a terminal, comprising a hash value of a file to be processed, wherein the transmission information is sent to the server by the terminal when the terminal sends the file to be processed to the server;
The server determines, according to the transmission information, whether the file to be processed is a duplicate file;
The server sends a response result to the terminal, wherein the response result includes information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
8. The method according to claim 7, wherein the server determining, according to the transmission information, whether the file to be processed is a duplicate file comprises:
The server detects whether the hash values of the locally stored files include a hash value identical to the hash value of the file to be processed;
If the hash values of the locally stored files include a hash value identical to the hash value of the file to be processed, the server determines that the file to be processed is a duplicate file;
If the hash values of the locally stored files do not include a hash value identical to the hash value of the file to be processed, the server determines that the file to be processed is a non-duplicate file.
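The server-side check of claim 8 amounts to a membership test over the stored hash values. A minimal in-memory sketch follows; the class name `HashIndex` and its methods are hypothetical, and a real server would presumably back the lookup with persistent storage.

```python
class HashIndex:
    """In-memory record of the hash values of files stored on the server."""

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    def add(self, file_hash: str) -> None:
        """Record the hash value of a newly stored file."""
        self._hashes.add(file_hash)

    def is_duplicate(self, file_hash: str) -> bool:
        """Return True if an identical hash value is already stored."""
        return file_hash in self._hashes
```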
9. The method according to claim 7, wherein the transmission information further includes the size of the file to be processed;
The server determining, according to the transmission information, whether the file to be processed is a duplicate file comprises:
The server determines, according to the size of the file to be processed, the target value interval to which the file to be processed belongs;
The server detects whether the hash values of the stored files corresponding to the target value interval include a hash value identical to the hash value of the file to be processed;
If the hash values of the stored files corresponding to the target value interval include a hash value identical to the hash value of the file to be processed, the server determines that the file to be processed is a duplicate file;
If the hash values of the stored files corresponding to the target value interval do not include a hash value identical to the hash value of the file to be processed, the server determines that the file to be processed is a non-duplicate file.
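Claim 9 narrows the lookup to the stored hash values of the interval that matches the reported size. The sketch below assumes the three intervals (0, A), [A, B) and [B, +∞) from the terminal-side claims; the names `interval_of` and `IntervalHashIndex` and the integer bucket labels are hypothetical.

```python
def interval_of(size: int, a: int, b: int) -> int:
    """Map a file size to a bucket: 0 for (0, A), 1 for [A, B), 2 for [B, +inf)."""
    if size < a:
        return 0
    if size < b:
        return 1
    return 2


class IntervalHashIndex:
    """Stored hash values grouped by the size interval of the file they belong to."""

    def __init__(self, a: int, b: int) -> None:
        self.a, self.b = a, b
        self._buckets: dict[int, set[str]] = {0: set(), 1: set(), 2: set()}

    def add(self, file_hash: str, size: int) -> None:
        self._buckets[interval_of(size, self.a, self.b)].add(file_hash)

    def is_duplicate(self, file_hash: str, size: int) -> bool:
        # Only hashes stored under the same interval are compared.
        return file_hash in self._buckets[interval_of(size, self.a, self.b)]
```

Grouping by interval keeps each comparison among hash values produced by the same calculation method, and it reduces the number of stored values searched per request.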
10. A terminal, characterized in that the terminal comprises a transceiver and a processor;
The transceiver is configured to obtain a file to be processed that a user needs to upload to a server, and to obtain the size of the file to be processed when sending the file to be processed to the server;
The processor is configured to detect the target value interval to which the size of the file to be processed belongs, wherein different value intervals correspond to different file hash value calculation methods, and to calculate the hash value of the file to be processed according to the file hash value calculation method corresponding to the target value interval;
The transceiver is further configured to send, to the server, transmission information comprising the hash value of the file to be processed, and to receive the server's response result for the transmission information, wherein the response result includes information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
11. The terminal according to claim 10, wherein the processor is specifically configured to process the data contained in the file to be processed according to the file hash value calculation method corresponding to the target value interval, to obtain a candidate hash value; and to calculate a hash value of data comprising the candidate hash value and the size of the file to be processed, and use the calculated hash value as the hash value of the file to be processed.
12. The terminal according to claim 11, wherein the target value interval is (0, A);
The processor is specifically configured to calculate a full hash value of the file to be processed, and to use the full hash value as the candidate hash value.
13. The terminal according to claim 11 or 12, wherein the target value interval is [A, B), where B > A;
The processor is specifically configured to calculate a hash value of data comprising a preset head and a preset tail of the file to be processed, and to use the calculated hash value as the candidate hash value.
14. The terminal according to claim 13, wherein the target value interval is [B, +∞);
The processor is specifically configured to calculate a hash value of data comprising a preset head, a preset tail and a preset middle of the file to be processed, and to use the calculated hash value as the candidate hash value.
15. The terminal according to claim 10, wherein the transmission information further includes the size of the file to be processed.
16. A server, characterized in that the server comprises a transceiver and a processor;
The transceiver is configured to receive transmission information, sent by a terminal, comprising a hash value of a file to be processed, wherein the transmission information is sent to the server by the terminal when the terminal sends the file to be processed to the server;
The processor is configured to determine, according to the transmission information, whether the file to be processed is a duplicate file;
The transceiver is further configured to send a response result to the terminal, wherein the response result includes information that the file to be processed is a duplicate file or information that the file to be processed is a non-duplicate file.
17. The server according to claim 16, wherein the processor is specifically configured to detect whether the hash values of the locally stored files include a hash value identical to the hash value of the file to be processed; if the hash values of the locally stored files include a hash value identical to the hash value of the file to be processed, to determine that the file to be processed is a duplicate file; and if the hash values of the locally stored files do not include a hash value identical to the hash value of the file to be processed, to determine that the file to be processed is a non-duplicate file.
18. The server according to claim 16, wherein the transmission information further includes the size of the file to be processed;
The processor is specifically configured to determine, according to the size of the file to be processed, the target value interval to which the file to be processed belongs; to detect whether the hash values of the stored files corresponding to the target value interval include a hash value identical to the hash value of the file to be processed; if the hash values of the stored files corresponding to the target value interval include a hash value identical to the hash value of the file to be processed, to determine that the file to be processed is a duplicate file; and if the hash values of the stored files corresponding to the target value interval do not include a hash value identical to the hash value of the file to be processed, to determine that the file to be processed is a non-duplicate file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910380465.7A CN110096483B (en) | 2019-05-08 | 2019-05-08 | Duplicate file detection method, terminal and server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110096483A (en) | 2019-08-06 |
CN110096483B (en) | 2021-04-30 |
Family
ID=67447375
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910380465.7A Active CN110096483B (en) | 2019-05-08 | 2019-05-08 | Duplicate file detection method, terminal and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110096483B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103309975A (en) * | 2013-06-09 | 2013-09-18 | 华为技术有限公司 | Duplicated data deleting method and apparatus |
CN103714123A (en) * | 2013-12-06 | 2014-04-09 | 西安工程大学 | Methods for deleting duplicated data and controlling reassembly versions of cloud storage segmented objects of enterprise |
CN103870514A (en) * | 2012-12-18 | 2014-06-18 | 华为技术有限公司 | Repeating data deleting method and device |
CN107360254A (en) * | 2017-08-22 | 2017-11-17 | 北京奇艺世纪科技有限公司 | A kind of document down loading method, device, server and terminal |
CN108520077A (en) * | 2018-04-20 | 2018-09-11 | 广东五科技股份有限公司 | A kind of method and apparatus avoiding repeated downloads |
CN108540566A (en) * | 2018-04-18 | 2018-09-14 | 暴风集团股份有限公司 | file uploading method, device, system and client and server |
CN109213738A (en) * | 2018-11-20 | 2019-01-15 | 武汉理工光科股份有限公司 | A kind of cloud storage file-level data de-duplication searching system and method |
CN109657480A (en) * | 2017-10-11 | 2019-04-19 | 中国移动通信有限公司研究院 | A kind of document handling method, equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110096483B (en) | 2021-04-30 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108156056B (en) | Network quality measuring method and device | |
US9444701B2 (en) | Identifying remote machine operating system | |
US8984055B2 (en) | Relay device, information processing system, and computer-readable recording medium | |
US9996852B2 (en) | System and method for measuring and improving the efficiency of social media campaigns | |
JP2010117757A (en) | Performance monitoring system and performance monitoring method | |
CN110247986A (en) | A kind of document transmission method, device and electronic equipment | |
CN107196848B (en) | Information push method and device | |
CN108206769B (en) | Method, apparatus, device and medium for filtering network quality alarms | |
CN108345601A (en) | Search result ordering method and device | |
US10057155B2 (en) | Method and apparatus for determining automatic scanning action | |
CN110120971A (en) | A kind of gray scale dissemination method, device and electronic equipment | |
CN109756401A (en) | A kind of test method, device, electronic equipment and storage medium | |
CN109688483A (en) | A kind of method, apparatus and electronic equipment obtaining video | |
CN104202418B (en) | Recommend the method and system of the content distributing network of business for content supplier | |
CN112118151A (en) | Network speed measuring method, device, system, electronic equipment and storage medium | |
JP2011039826A (en) | Apparatus and method for evaluating propagation state, and program | |
JP5957419B2 (en) | QoE estimation apparatus, QoE estimation method and program | |
CN111147330A (en) | Network quality evaluation method and device, storage medium and processor | |
CN107948015B (en) | A kind of Analysis on Quality of Service method, apparatus and network system | |
CN110096483A (en) | A kind of duplicate file detection method, terminal and server | |
CN103532931B (en) | Method and system for testing transmission performance of data stream, and server | |
JP4927180B2 (en) | User waiting time estimation apparatus, user waiting time estimation method, and program | |
CN103701821B (en) | File type identification method and device | |
CN106972986B (en) | The detection method and its system of IDC network of computer room quality | |
CN102932400A (en) | Method and device for identifying uniform resource locator primary links |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||