CN104410692A - Method and system for uploading duplicated files - Google Patents
Method and system for uploading duplicated files
- Publication number
- CN104410692A (application CN201410712783.6A)
- Authority
- CN
- China
- Prior art keywords
- file
- uploaded
- check value
- service end
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method and a system for uploading duplicate files. In the method, for a file whose size is greater than or equal to a first threshold, the client calculates the check value of a verification segment of the file, and the server performs quick matching on the verification segment check value and the file size. If quick matching succeeds, the client calculates the check value of the whole file, and the server performs exact matching on the file verification information. If exact matching succeeds, the server creates a mapping from the file to be uploaded to the existing file record; otherwise, the server stores the file verification information and generates a file record after the file is uploaded. When quick matching finds no match, calculating the verification information of the whole file to be uploaded is avoided. Exact matching guarantees that the file to be uploaded is consistent with the file matched on the server. Because the file verification information is stored in advance during exact matching, the server does not need to calculate it again.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and system for uploading duplicate files.
Background art
With the spread of Internet technology and mobile devices, the amount of information that people create and share grows day by day. Network storage, such as net disks and cloud disks, offers users an efficient way of handling information: a file is uploaded once and can then be accessed from multiple terminals at any time. However, the massive data held in network storage inevitably contains a large amount of duplicate data, mainly public resources that different users upload separately. This duplicate data occupies a large amount of network bandwidth during upload and also puts unnecessary pressure on the network storage servers.
To reduce the waste caused by duplicate data, the techniques commonly used at present are file upload based on duplicate data verification, and data deduplication. The features and limitations of current file upload schemes based on duplicate data verification include:
(1) The file is divided into blocks and duplicates are compared block by block, so duplicate data blocks in similar files can be handled. For example, Chinese patent CN102571952A discloses a system and method for transferring files that compares the file to be uploaded block by block. However, when network bandwidth is small, block-by-block comparison generates a large number of requests and responses between client and server, which hurts transfer efficiency. In addition, the server must store and compare the check value of every data block and maintain mapping relations among a huge number of blocks. This puts heavy pressure on the server, often requires a dedicated server, and is hard to implement on small and medium-sized servers.
(2) If the file is not divided into blocks, existing schemes must calculate a check value over the entire content of the file to be uploaded before the server can look up whether an identical file exists. For example, Chinese patent CN103929453A discloses a method, apparatus and system for processing uploaded data, in which the index information of the data to be uploaded is used to determine whether the data is already stored on the storage server. When the file is large, this calculation is noticeably time-consuming. Some improved schemes extract part of the file content to calculate the check value in order to shorten the calculation time, but they cannot guarantee that the file to be uploaded and the file found on the server are identical as a whole.
Summary of the invention
The object of the present invention is to overcome the above defects of the prior art by providing a method and system for uploading duplicate files that can cope with limited network bandwidth, reduce communication between client and server, reduce processing pressure inside the server, and achieve fast and exact comparison between the client's file to be uploaded and the files already on the server.
The object of the present invention can be achieved through the following technical solutions:
A method for uploading duplicate files, which transfers a file from a client to a server, comprises the following steps:
(1) the client judges whether the size of the file to be uploaded is not less than a first threshold; if so, step (2) is performed; if not, the client uploads the file and step (8) is performed;
(2) the client extracts the verification segment of the file to be uploaded and calculates the verification segment check value;
(3) the server performs quick matching according to the verification segment check value and the file size, and judges whether a match exists on the server; if so, step (4) is performed; if not, the client uploads the file and step (8) is performed;
(4) the client calculates the overall check value of the whole file to be uploaded;
(5) the server performs exact matching according to the verification segment check value, the overall check value and the file description information, and judges whether a match exists on the server; if so, step (6) is performed; if not, the client uploads the file and step (7) is performed;
(6) the server adds a mapping record from the file to be uploaded to the existing file record;
(7) the server records the verification segment check value, the overall check value and the file description information, and forms and saves the file record corresponding to the file to be uploaded;
(8) the server receives the file to be uploaded, calculates its file verification information, and forms and saves the file record corresponding to the file to be uploaded.
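The branching in steps (1) to (8) can be sketched as a small decision function. This is only an illustrative sketch: the names `Outcome` and `decide_upload` are hypothetical, and the two callables stand in for the server round trips of quick matching and exact matching, whose transport the text does not specify.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the three ways the method can end."""
    PLAIN_UPLOAD = "upload; server calculates verification info (step 8)"
    MAPPED = "no upload; server adds a mapping record (step 6)"
    UPLOAD_WITH_SAVED_FINGERPRINT = "upload; server keeps client check values (step 7)"


def decide_upload(file_size, first_threshold, quick_match, exact_match):
    """Walk steps (1)-(8).

    quick_match and exact_match are zero-argument callables that stand in
    for the server requests and return True when the server finds a match.
    """
    if file_size < first_threshold:      # step (1): small file, no duplicate check
        return Outcome.PLAIN_UPLOAD
    if not quick_match():                # steps (2)-(3): segment check value + size
        return Outcome.PLAIN_UPLOAD
    if exact_match():                    # steps (4)-(5): overall check value
        return Outcome.MAPPED
    return Outcome.UPLOAD_WITH_SAVED_FINGERPRINT
```

Because the callables are only invoked when needed, the whole-file check value is never calculated for a file that fails quick matching, which is the saving the method aims for.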
In step (2), the verification segment is extracted in one of the following ways:
A. data of the verification segment length is extracted from the head of the file to be uploaded and used as the verification segment; or
B. with the size of the file to be uploaded as a parameter, the start position of the verification segment is obtained by a predefined procedure, and data of the verification segment length is extracted from that position and used as the verification segment.
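The two extraction modes can be sketched as follows. The text does not say what the predefined procedure in mode B is, so `segment_offset` below (start at the middle of the file, clamped so the segment fits) is purely an assumption for illustration; the function names are hypothetical as well.

```python
import hashlib
import os


def segment_offset(file_size: int, seg_len: int) -> int:
    """Mode B, hypothetical rule: start at the middle of the file,
    clamped so that a full segment fits. Client and server must use
    the same rule, whatever it is."""
    return max(0, min(file_size // 2, file_size - seg_len))


def extract_segment(path: str, seg_len: int, from_header: bool = True) -> bytes:
    """Mode A (from_header=True) reads from offset 0;
    mode B derives the offset from the file size."""
    size = os.path.getsize(path)
    offset = 0 if from_header else segment_offset(size, seg_len)
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(seg_len)


def segment_check_value(segment: bytes) -> str:
    """MD5 of the verification segment, as a hex string."""
    return hashlib.md5(segment).hexdigest()
```

Only `seg_len` bytes are read in either mode, so the cost is independent of the file size.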
In step (2), the methods for calculating the verification segment check value include the MD5 algorithm.
In step (4), the methods used to calculate the overall check value of the whole file to be uploaded include the MD5 algorithm and/or the SHA-1 hash algorithm and/or the CRC32 check algorithm.
The file record comprises at least a record number, file verification information and file description information. The file verification information comprises the verification segment check value and the overall check value, and the file description information comprises the file name, the client modification time and the file size.
Step (7) may be replaced with:
The server records the verification segment check value, the overall check value and the file description information, and sends a record number and a match-failure indication to the client. The client detects whether the file data has changed between the calculation of the file verification information and the formal upload of the file. If it has, the client uploads the file directly and step (8) is performed. If it has not, the client uploads the file with the received record number as one of the upload identifiers, and the server uses the record number to combine the information of the received file with the information already saved into a complete file record.
In step (8), when the server calculates the file verification information of the received file:
The server judges whether the file size is not less than a second threshold. If so, it calculates the file verification information and forms the file record from the file verification information and the information of the received file; if not, it forms the file record directly from the information of the received file.
A system for uploading duplicate files, which transfers a file from a client to a server, comprises a check value calculation module, a file management module and a first communication module arranged in the client, and a check value matching module, a file information management module and a second communication module arranged in the server, wherein
The check value calculation module comprises:
a unit for calculating the verification segment check value when the file size is not less than the first threshold; and
a unit for calculating the overall check value of the whole file to be uploaded when quick matching finds a match;
The file management module comprises:
a unit for judging the relation between the file size and the first threshold;
a unit for extracting the verification segment from a file to be uploaded whose size is not less than the first threshold and passing it to the check value calculation module; and
a unit for passing the file to be uploaded to the first communication module;
The first communication module comprises:
a unit for uploading the check value information and the file description information;
a unit for uploading the file to be uploaded to the server when the file size is less than the first threshold, when quick matching returns no match, or when exact matching returns no match; and
a unit for receiving responses from the server;
The check value matching module implements quick matching and exact matching, which includes matching the verification segment check value, the overall check value of the file to be uploaded and the file description information, and passes the matching result to the second communication module;
The file information management module comprises:
a unit for saving file records;
a unit for adding a mapping from the file to be uploaded to the corresponding existing file record when exact matching on the server succeeds; and
a unit for calculating and saving the file verification information when the size of a file received by the server is not less than the second threshold and no file verification information exists for it;
The second communication module comprises:
a unit for receiving the check values or the file uploaded by the client; and
a unit for returning the results of quick matching, exact matching and file upload to the client.
The file management module further comprises:
a unit for monitoring whether the file data changes between the calculation of the overall check value of the whole file to be uploaded and the formal upload of the file.
The file information management module further comprises:
a unit for clearing the saved file check values and file description information of a file whose exact matching failed and which has not been uploaded within a certain time.
Compared with the prior art, the beneficial effects of the present invention are:
1. The server stores the verification information of each file, including the verification segment check value and the overall check value of the whole file, which serve both quick matching and exact matching. By first uploading the verification segment check value to the server for matching, it can be judged quickly whether the server may hold identical content; if nothing matches, the file is uploaded directly, which avoids the time spent on an unnecessary check value calculation over the whole file.
2. If the file has gone through exact matching, is finally uploaded, and its data has not changed during the upload, the server can keep the overall check value calculated in advance by the client, avoiding the waste of resources caused by repeated calculation. If the file is never uploaded, or changes during the upload, the server clears the invalid record after a certain time and recalculates the check value information.
In summary, through quick matching the present invention avoids time that might otherwise be spent calculating a check value over the whole file and speeds up the checking of duplicate files. Through exact matching it guarantees that the file to be uploaded and the file matched on the server are identical as a whole. By storing the overall check value in advance and by not checking the file block by block, it is better suited to scenarios with limited network bandwidth and reduces the pressure on the server.
Brief description of the drawings
To set out the details of the embodiments of the present invention more clearly, some flow charts, an application scenario diagram and a module interaction diagram related to the embodiments are given below. The drawings are exemplary illustrations of the embodiments and are not limiting. Those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flow chart of the first embodiment of the file upload method of the present invention;
Fig. 2 is a flow chart of the second embodiment of the file upload method of the present invention;
Fig. 3 is a flow chart of the first embodiment in which the server adds the overall check value;
Fig. 4 is a flow chart of the second embodiment in which the server adds the overall check value;
Fig. 5 is an application scenario diagram of the file upload system embodiment of the present invention;
Fig. 6 is a diagram of the modules of the file upload system embodiment of the present invention and the interactions between them.
Detailed description of the embodiments
The object, design, technical solutions and advantages of the present invention are described in detail below with reference to the drawings of the embodiments. It should be understood that the described embodiments serve to explain the present invention and do not limit it in any way. Through the embodiments, those of ordinary skill in the art can better understand the content of the invention. All related embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Current file upload methods based on duplicate file verification need to calculate the check value of the whole file directly, which takes a long time and is wasted effort for files that turn out not to be duplicates. Some improved schemes verify only part of the file content, which cannot guarantee that the file to be uploaded and the file matched on the server are identical as a whole.
One feature of the present invention is a two-pass matching method, consisting of quick matching followed by exact matching, that speeds up the checking of duplicate files.
Fig. 1 shows the flow of the first embodiment of the file upload method of the present invention, which comprises:
Step 101: the client judges whether the size of the file to be uploaded is not less than the first threshold. If it is, go to step 102; otherwise, go to step 111.
The first threshold is a file size predefined by the client at or above which duplicate checking is performed. It can be at the MB level, for example 10 MB. It can be estimated and adjusted according to network conditions in the actual application, or it can be a threshold obtained in some other way that is suited to improving the transfer efficiency of duplicate files.
Step 102: the client extracts the verification segment from the file to be uploaded, calculates the verification segment check value, and uploads the verification segment check value and the file size to the server.
The verification segment is extracted for the purpose of quick matching. It should not be too long, otherwise the point of fast calculation on the client and quick matching on the server is lost. It should not be too short either, otherwise it is not representative of the file, and a large number of different files may share the same verification segment. The file size is uploaded to further improve the accuracy of quick matching.
The verification segment length can be at the KB level, for example 200 KB, or another size determined by the computing capability of the client. The length has the following effect: the longer the verification segment, the more accurate the matching result but the longer the calculation time on the client; the shorter the verification segment, the less accurate the matching result but the shorter the time the client spends calculating the check value.
The verification segment can be taken from the head of the file or from another specific position in the file; alternatively, with the file size as a parameter, its start position can be obtained by a function that is identical on the client and the server.
The verification segment check value can be calculated with the MD5 message-digest algorithm, the SHA-1 hash algorithm, or another method agreed between client and server. Because the extracted verification segment is short, even a relatively complex check method is fast enough for quick matching.
Note that in the present invention the verification segment is not limited to one contiguous data region of the file. Optionally, content whose total length equals the verification segment length can be extracted in several passes from different regions of the file, concatenated, and used to calculate the check value.
Some existing schemes also verify by extracting part of the file content, but they use this as the only matching method. The limitation of verifying extracted partial content is evident: however complex the extraction method, as long as the extracted content does not cover the whole file, it cannot guarantee that the file to be uploaded and the file matched on the server are identical as a whole.
Step 103: the server judges whether a match exists for the verification segment check value and the file size. This step performs quick matching. If there is no match, the server does not hold an identical file; go to step 111. If there is a match, the server may hold an identical file; go to step 104.
In this embodiment, a file that already exists on the server is called an original file, and each original file corresponds to one file record. A file record comprises at least a record number and a file fingerprint. The file fingerprint includes but is not limited to file verification information and file description information; the file verification information comprises the verification segment check value and the overall check value, and the file description information comprises the file name, the client modification time and the file size.
Note that a match in quick matching only indicates that the server may hold an identical file, because the verification segment and the file size cannot be used to judge the consistency of the whole file. The reason is given in the description of step 102.
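One way the server side of quick matching could be organised is an index keyed by the pair (verification segment check value, file size). This is a sketch under assumptions: the class name `QuickIndex` and the in-memory dictionary are illustrative only, and a real server would more likely use a database index on the same pair.

```python
from collections import defaultdict


class QuickIndex:
    """In-memory stand-in for the server's quick-matching lookup."""

    def __init__(self):
        # (segment_check_value, file_size) -> set of record numbers
        self._by_key = defaultdict(set)

    def add(self, record_no, segment_check_value, file_size):
        self._by_key[(segment_check_value, file_size)].add(record_no)

    def candidates(self, segment_check_value, file_size):
        """Records that MAY be identical to the file; an empty set means
        the server certainly holds no identical file."""
        return set(self._by_key.get((segment_check_value, file_size), ()))
```

Several records can share one key, which is why a hit is only a candidate and exact matching is still required.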
Step 104: the client calculates the overall check value of the whole file to be uploaded, and uploads the overall check value, the verification segment check value and the file description information to the server.
The overall check value of the file can be calculated with the MD5 message-digest algorithm and/or the SHA-1 hash algorithm, or with another method agreed between client and server. In this embodiment, the MD5 check and the CRC32 check are used together. Other check algorithms can also be used, alone or in combination, to calculate the file check value.
In this embodiment, the verification segment check value and the file description information are uploaded as well; the file description information includes but is not limited to the file name, the client modification time and the file size.
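A minimal sketch of calculating the MD5 and CRC32 check values of a whole file in a single pass, using Python's standard `hashlib` and `zlib` modules. The function name and the 1 MiB chunk size are illustrative choices, not taken from the text.

```python
import hashlib
import zlib


def overall_check_values(path, chunk_size=1 << 20):
    """Stream the file once and return (md5_hex, crc32_hex)."""
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
    return md5.hexdigest(), format(crc & 0xFFFFFFFF, "08x")
```

Reading in chunks keeps memory use flat for large files, and combining two independent check values lowers the chance of a false exact match compared with either one alone.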
Step 105: the server judges whether a match exists for the fingerprint information of the file to be uploaded. This step performs exact matching. If there is no match, the server does not hold an identical file; go to step 107. If there is a match, the server holds an identical file; go to step 106.
As described for step 103, the file fingerprint that is matched in this embodiment includes but is not limited to the MD5 and CRC32 check values of the file, the verification segment check value and the file size.
Step 106: the server adds a mapping from the file to be uploaded to the existing file record, and the upload ends.
In this embodiment, if the server holds a match for the file to be uploaded, a mapping record from the file to be uploaded to the existing file record is added. The mapping is usually many-to-one: however many times a duplicate file is uploaded, by the same user or by different users, every upload is mapped to the same file record. A mapping record does not contain the file fingerprint. When the server performs quick matching and exact matching, it matches only against file records.
In general, a user's operation on an uploaded duplicate file is split into an operation on the mapping record and, after following the mapping, an operation on the original file. The rest of this paragraph is given only for completeness of description; it is not part of the claims of the present invention and should not bring the present invention into conflict with other inventions. To read a duplicate file, the file record is generally obtained first through the mapping record, and the actual storage location of the original file is then obtained from the file record. When two or more mapping records point to a file record, a delete operation deletes only the mapping record.
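The many-to-one mapping and the delete rule can be illustrated with a small in-memory model. The class `DedupStore` and its fields are hypothetical. The text only states that deletion removes just the mapping while two or more mappings exist; removing the file record when the last mapping goes away is an assumption made here for the sake of a complete example.

```python
class DedupStore:
    """Toy model of file records and the mapping records that point to them."""

    def __init__(self):
        self.file_records = {}  # record_no -> storage location of the original file
        self.mappings = {}      # (user, file_name) -> record_no

    def add_file_record(self, record_no, location):
        self.file_records[record_no] = location

    def add_mapping(self, user, file_name, record_no):
        if record_no not in self.file_records:
            raise KeyError(record_no)
        self.mappings[(user, file_name)] = record_no

    def read_location(self, user, file_name):
        # mapping record -> file record -> actual storage location
        return self.file_records[self.mappings[(user, file_name)]]

    def delete(self, user, file_name):
        """Remove one mapping; return True if other mappings still use the record."""
        record_no = self.mappings.pop((user, file_name))
        still_referenced = record_no in self.mappings.values()
        if not still_referenced:
            # Assumption (not stated in the text): drop the record with its last mapping.
            del self.file_records[record_no]
        return still_referenced
```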
Step 107: the server records the file fingerprint and returns a record number and a match-failure message.
At this point the server has determined through exact matching that it holds no file identical to the file to be uploaded. Because the client may go on to upload the whole file, the server in this embodiment saves the file's fingerprint in advance. If the file does not change during the upload, then once the upload completes the server combines the uploaded file's information with the fingerprint to produce a complete file record. This is another feature of the present invention: it avoids the waste of resources caused by the server calculating the overall check value again. If the file is never uploaded, or its content changes during the upload, the server clears the invalid record after a certain time.
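The pre-saved fingerprint with time-based cleanup could look like the sketch below. The class name, the `ttl_seconds` parameter and the injectable clock are assumptions; the text only says that invalid records are cleared after a certain time.

```python
import itertools
import time


class PendingFingerprints:
    """Fingerprints saved after a failed exact match, awaiting the upload."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._next_no = itertools.count(1)
        self._pending = {}  # record_no -> (fingerprint, time saved)

    def save(self, fingerprint):
        """Store the client-calculated fingerprint and return its record number."""
        record_no = next(self._next_no)
        self._pending[record_no] = (fingerprint, self._clock())
        return record_no

    def claim(self, record_no):
        """Called when the upload arrives; returns the fingerprint,
        or None if the record is unknown or has expired."""
        self.purge()
        entry = self._pending.pop(record_no, None)
        return entry[0] if entry else None

    def purge(self):
        """Drop every record older than the time limit."""
        now = self._clock()
        expired = [n for n, (_, saved) in self._pending.items() if now - saved > self._ttl]
        for n in expired:
            del self._pending[n]
```

When `claim` returns `None`, the server would fall back to calculating the verification information itself, as in step 111.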
Step 108: the client detects whether the file data has stayed unchanged between the calculation of the check value of the whole file to be uploaded and the formal upload of the file. If the file data is unchanged, go to step 109; otherwise, go to step 111.
Step 109: the client uploads the file and the record number to the server.
The client passes the record number returned by the server as one of the parameters of the file upload.
Step 110: using the record number, the server combines the saved file fingerprint with the uploaded file's information into a complete file record, and the upload ends.
If a record number accompanies the file when the server receives it, the file's fingerprint already exists on the server. The fingerprint and the file's other information can therefore be combined directly into a complete file record, and the server does not need to calculate the overall check value again.
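The text does not say how the client detects a change in step 108. One simple possibility, shown here purely as an assumption, is to snapshot the file's size and modification time before calculating the check value and compare them again just before the upload; the function names are hypothetical.

```python
import os


def snapshot(path):
    """Take (size, modification time in ns) as a cheap change signal."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def unchanged_since(path, snap):
    """True if size and modification time still equal the earlier snapshot."""
    return snapshot(path) == snap
```

A client would call `snapshot` before step 104 and `unchanged_since` in step 108. This heuristic can miss an edit that preserves both size and timestamp, which is one reason the second embodiment locks the file instead.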
Step 111: the client uploads the file, the server calculates the file's verification information and forms a file record from the verification information and the uploaded file's information, and the upload ends.
The files uploaded in this step are: files smaller than the first threshold, files for which quick matching found no match, and files for which exact matching found no match and whose data changed between the calculation of the file check value and the formal upload. After receiving the file, the server first generates a file record without verification information, then calculates the overall check value using the flow in Fig. 3 or Fig. 4 and writes the calculated verification information into the file record.
Fig. 2 shows the flow of the second embodiment of the file upload method of the present invention. It differs from the first embodiment in how the client treats the file during upload. In the first embodiment, the client must monitor whether the file data changes during the upload process and act accordingly in step 108. In the second embodiment, the client locks the file during the upload, so it does not need to monitor whether the file data changes. The explanations and illustrations of the first threshold, the handling of the verification segment and the check value calculation are the same as in the first embodiment. The steps of the second embodiment are:
Step 201: the client judges whether the size of the file to be uploaded is not less than the first threshold. If it is, go to step 202; otherwise, go to step 208.
Step 202: the client calculates the verification segment check value of the file to be uploaded. In this step, the client uploads the verification segment check value together with the file size to the server.
Step 203: the server judges whether quick matching succeeds. In quick matching, the server judges whether a match exists for the verification segment check value and the file size. If there is no match, the server does not hold an identical file; go to step 208. If there is a match, the server may hold an identical file; go to step 204.
Step 204: the client calculates the verification information of the whole file to be uploaded. The client uploads the file check value, the verification segment check value and the file description information to the server together.
Step 205: the server judges whether exact matching succeeds. The server looks up a match for the fingerprint of the file to be uploaded. If there is no match, the server does not hold an identical file; go to step 207. If there is a match, the server holds an identical file; go to step 206.
Step 206: the server adds a mapping from the file to be uploaded to the existing file record, and the upload ends.
Step 207: the server records the overall check value and forms a complete file record once the file has been uploaded, and the upload ends. This step can be divided into three sub-steps:
(1) the server records the file fingerprint and returns a record number and a match-failure message;
(2) the client uploads the file and the record number to the server;
(3) using the record number, the server combines the saved file fingerprint with the uploaded file's information into a complete file record, and the upload ends.
Step 208: the client uploads the file, the server calculates the file's verification information and forms a file record from the verification information and the uploaded file's information, and the upload ends.
The files uploaded in this step are files smaller than the first threshold and files for which quick matching found no match. After receiving the file, the server first generates a file record without verification information, then calculates the file verification information using the flow in Fig. 3 or Fig. 4 and writes the calculated verification information into the file record.
Fig. 3 shows the flow of the first embodiment in which the server adds file verification information. The purpose of adding file verification information on the server is as follows: some file records on the server may have no verification information, for example those of files uploaded in step 111 of Fig. 1 and step 208 of Fig. 2. For these, the verification segment check value and the overall check value must be calculated so that the file records meet the needs of quick matching and exact matching on the server. The embodiment shown in Fig. 3 processes one or more file records that have no verification information; the file record entries can come from a message queue to which they are pushed for the processing module after the server receives a file. Its steps are:
Step 301: obtain a file record that has no verification information.
Step 302: judge whether the file size is not less than the second threshold. If it is, go to step 303; otherwise, the flow ends.
The second threshold is a file size predefined by the server at or above which the check value is calculated. There is no necessary connection between the second threshold, the first threshold and the verification segment length, but the following relation generally holds: verification segment length ≤ second threshold ≤ first threshold. This is based on the following considerations. The first threshold set by the client may change; for example, when the server has little storage space but strong processing capability, the first threshold can be set to a smaller value. Keeping the second threshold no greater than the first threshold ensures that the server can find every possible file record when matching. The first and second thresholds are generally not smaller than the verification segment length, because transferring small files directly is already very efficient.
In this embodiment, the second threshold is set equal to the verification segment length, that is, at the KB level, for example 200 KB. The larger the second threshold, the lower the calculation load on the server, so its size can be chosen according to the needs of the actual application, the computing capability of the server and similar factors.
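The relation between the three sizes can be captured in a small configuration object that rejects inconsistent settings. The default values mirror the examples in the text (200 KB and 10 MB); the class name `Thresholds` and the choice to raise an error are illustrative assumptions, since the text describes the relation as typical rather than mandatory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Sizes in bytes; defaults follow the examples given in the embodiment."""
    segment_length: int = 200 * 1024          # verification segment length
    second_threshold: int = 200 * 1024        # server calculates check values at or above this
    first_threshold: int = 10 * 1024 * 1024   # client attempts duplicate check at or above this

    def __post_init__(self):
        # Enforce: segment length <= second threshold <= first threshold
        if not (self.segment_length <= self.second_threshold <= self.first_threshold):
            raise ValueError(
                "expected segment_length <= second_threshold <= first_threshold"
            )
```

With the second threshold no greater than the first, every file the client tries to match is large enough that the server will have calculated verification information for any stored copy of it.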
Step 303: calculate the file verification information and save it in the file record, and the flow ends.
This step uses the same calculation method as the client to calculate the file's verification information, such as the MD5 check value and CRC32 check value of the whole file and the verification segment check value. The way the verification segment is obtained must also be the same as on the client.
Fig. 4 shows the flow of the second embodiment in which the server adds file verification information. In this flow, file records that have no verification information are fetched and processed automatically in a loop. Its steps are:
Step 401: obtain a file record that has no verification information.
Step 402: judge whether the file size is not less than the second threshold. If it is, go to step 403; otherwise, go to step 404. The second threshold is determined in the same way as in the embodiment shown in Fig. 3.
Step 403: calculate the file verification information and save it in the file record.
Step 404: judge whether any unprocessed file record remains. If so, go to step 401; if not, the flow ends.
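The loop of Fig. 4 can be sketched as below. Representing a file record as a dictionary with `path`, `size` and `verification` keys, and passing the calculation in as a callable, are assumptions made to keep the example self-contained.

```python
def backfill_verification(records, second_threshold, compute):
    """Fill in verification info for records that lack it.

    records: list of dicts with 'path', 'size' and 'verification' keys
             (hypothetical record layout).
    compute: callable taking a path and returning the verification info,
             using the same algorithm as the client.
    Returns the number of records updated.
    """
    updated = 0
    for rec in records:                          # step 404: continue while records remain
        if rec.get("verification") is not None:  # step 401: only records lacking info
            continue
        if rec["size"] >= second_threshold:      # step 402
            rec["verification"] = compute(rec["path"])  # step 403
            updated += 1
    return updated
```

Records below the second threshold are skipped and keep no verification information, so they can never be returned by quick matching.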
Fig. 5 shows an application scenario of an embodiment of the file uploading system of the invention. As the figure shows, the file uploading method of the invention applies to a general client/server architecture.
There may be one or more clients in Fig. 5, including but not limited to PC clients, Web clients and mobile clients. There may be one or more servers; multiple servers may be independent of one another or may form a storage service network. The client may be implemented by any communication device, whether existing, under development or developed in the future, that can perform the client functions in Fig. 6. Likewise, the server may be implemented by any communication device, whether existing, under development or developed in the future, that can provide the server functions in Fig. 6.
Fig. 6 shows the modules of an embodiment of the file uploading system of the invention and the interactions between them. The modules and their functions are as follows:
The client comprises a document management module 601, a check value computing module 602 and a first communication module 603. Their functions are:
Document management module 601: maintains the list of files to be processed; stores the first threshold, the verification segment length and the method of obtaining the verification segment; compares the size of the file to be uploaded with the first threshold; extracts the verification segment from any file to be uploaded whose size is not less than the first threshold and passes it to the check value computing module; monitors whether the file data changes between computing the check value of the whole file and uploading the file, or locks the file during the upload process; passes the file to be sent to the communication module;
Check value computing module 602: implements the same check value calculation methods as the server, such as MD5 and CRC32; computes the verification segment check value when the file size is not less than the first threshold; computes the check value of the whole file to be uploaded when quick matching finds a match;
First communication module 603: establishes communication with the server, uploads check value information or files, and receives the server's responses; uploads files whose size is less than the first threshold; uploads the file to the server when quick matching returns no match; uploads the file to the server when exact matching returns no match.
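The change monitoring performed by module 601 could be approximated by comparing file metadata before hashing and before upload. This is a heuristic sketch only: the patent does not say how changes are detected, and `FileSnapshot`, `take_snapshot` and `changed_since` are hypothetical names. Size and modification time can miss some edits, which is one reason the text also offers locking the file as an alternative.

```python
import os
from typing import NamedTuple


class FileSnapshot(NamedTuple):
    size: int
    mtime_ns: int


def take_snapshot(path: str) -> FileSnapshot:
    """Record the size and modification time before computing the overall check value."""
    st = os.stat(path)
    return FileSnapshot(st.st_size, st.st_mtime_ns)


def changed_since(path: str, before: FileSnapshot) -> bool:
    """Return True if the file looks different from the earlier snapshot."""
    return take_snapshot(path) != before
```

A client would take the snapshot just before computing the overall check value and call `changed_since` immediately before the formal upload.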
The server comprises an archive information administration module 604, a check value matching module 605 and a second communication module 606. Their functions are:
Archive information administration module 604: maintains the archive information of the files already on the server; stores the second threshold, the verification segment length and the method of obtaining the verification segment; implements the same check value calculation methods as the client, such as MD5 and CRC32; when the server's exact matching of a file to be uploaded succeeds, adds a mapping from the file to be uploaded to the existing file record; when a file received by the server is not smaller than the second threshold and has no overall check value, computes the overall check value and saves it to the file record; pre-stores the overall check value when the server performs exact matching. If the corresponding file is ultimately not uploaded, or its content changes during upload, that check information is invalid, and invalid records older than a certain period are removed.
Check value matching module 605: matches the verification segment check value, the overall check value of the file to be uploaded and the file description information, and passes the matching result to the communication module;
Second communication module 606: establishes communication with the client, receives the check values or files uploaded by the client, and returns the results of quick matching, exact matching and file upload to the client.
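The division of work between modules 604 and 605 can be illustrated with an in-memory record store. The class and method names below are hypothetical, and the sketch assumes that the only file description field used for matching is the file size.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FileRecord:
    record_id: int
    size: int
    segment_md5: str
    whole_md5: Optional[str] = None


@dataclass
class RecordStore:
    records: Dict[int, FileRecord] = field(default_factory=dict)
    mappings: Dict[str, int] = field(default_factory=dict)  # uploaded name -> record id

    def quick_match(self, size: int, segment_md5: str) -> bool:
        """Quick matching: file size plus verification segment check value."""
        return any(r.size == size and r.segment_md5 == segment_md5
                   for r in self.records.values())

    def exact_match(self, size: int, segment_md5: str, whole_md5: str) -> Optional[FileRecord]:
        """Exact matching: additionally compares the overall check value."""
        for r in self.records.values():
            if (r.size, r.segment_md5, r.whole_md5) == (size, segment_md5, whole_md5):
                return r
        return None

    def add_mapping(self, name: str, record: FileRecord) -> None:
        """Map a duplicate upload onto the existing record instead of storing it again."""
        self.mappings[name] = record.record_id
```

A linear scan keeps the example short; a real server would index records by size and check value.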
The interactions between the modules include:
Document management module 601 and check value computing module 602: when quick matching is needed, the document management module passes the extracted verification segment to the check value computing module; when exact matching is needed, the document management module passes the file content to the check value computing module;
Clearly, because modules 601 and 602 are both on the client, passing the file content only requires module 601 to pass information such as the address and length of the file or verification segment to module 602. For interactions between other modules that are both on the client or both on the server, file content is passed in a similar way;
Document management module 601 and first communication module 603: the document management module passes the content of the file to be uploaded to the communication module;
Check value computing module 602 and first communication module 603: the check value computing module passes the verification segment check value and the file check value to the communication module;
Archive information administration module 604 and check value matching module 605: the check value matching module requests file records from the archive information administration module; the archive information administration module returns the file records to the check value matching module;
Archive information administration module 604 and second communication module 606: when the communication module receives a file that has no check information and whose size is not less than the second threshold, it notifies the archive information administration module to compute the check information and save it to the file record.
Check value matching module 605 and second communication module 606: the communication module notifies the check value matching module to perform quick matching; the communication module notifies the check value matching module to perform exact matching; the check value matching module returns the matching result to the communication module.
First communication module 603 and second communication module 606: module 603 sends requests to module 606, including check information matching requests and file upload requests; module 606 returns responses to module 603, including quick matching results, exact matching results and file upload results.
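The request and response exchange between modules 603 and 606 can be simulated end to end with an in-memory server. This is a simplified sketch under assumptions: `Server` and `upload` are illustrative names, the verification segment is taken from the file head, the threshold values are hypothetical, and the pre-storing of check values after a failed exact match is omitted, so the server recomputes them on receipt.

```python
import hashlib

FIRST_THRESHOLD = 1024 * 1024  # hypothetical client threshold
SECOND_THRESHOLD = 200 * 1024  # server threshold (200K in this embodiment)
SEGMENT_LEN = 200 * 1024       # verification segment length


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class Server:
    """In-memory stand-in for the server side."""

    def __init__(self) -> None:
        self.records = set()  # tuples of (size, segment MD5, whole-file MD5)
        self.bytes_received = 0

    def quick_match(self, size: int, segment_md5: str) -> bool:
        return any(r[0] == size and r[1] == segment_md5 for r in self.records)

    def exact_match(self, size: int, segment_md5: str, whole_md5: str) -> bool:
        return (size, segment_md5, whole_md5) in self.records

    def receive(self, data: bytes) -> None:
        self.bytes_received += len(data)
        if len(data) >= SECOND_THRESHOLD:  # only large files get check information
            self.records.add((len(data), md5(data[:SEGMENT_LEN]), md5(data)))


def upload(server: Server, data: bytes) -> str:
    """Client-side decision flow: small file, quick matching, then exact matching."""
    if len(data) < FIRST_THRESHOLD:
        server.receive(data)
        return "uploaded"
    segment_md5 = md5(data[:SEGMENT_LEN])
    if not server.quick_match(len(data), segment_md5):
        server.receive(data)               # no candidate, so the whole file is never hashed
        return "uploaded"
    if server.exact_match(len(data), segment_md5, md5(data)):
        return "deduplicated"              # the server only adds a mapping
    server.receive(data)
    return "uploaded"
```

Uploading the same large payload twice through `upload` transfers its bytes once; the second call ends at exact matching. A file that shares the size and head segment but differs later passes quick matching, fails exact matching and is uploaded in full.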
Claims (10)
1. A method for uploading duplicate files, implementing the transfer of a file from a client to a server, characterized in that it comprises the following steps:
(1) the client determines whether the size of the file to be uploaded is not less than a first threshold; if so, step (2) is performed; if not, the client uploads the file and step (8) is performed;
(2) the client extracts the verification segment of the file to be uploaded and computes the verification segment check value;
(3) the server performs quick matching according to the verification segment check value and the file size, and determines whether a matching entry exists on the server; if so, step (4) is performed; if not, the client uploads the file and step (8) is performed;
(4) the client computes the overall check value of the whole file to be uploaded;
(5) the server performs exact matching according to the verification segment check value, the overall check value and the file description information, and determines whether a matching entry exists on the server; if so, step (6) is performed; if not, the client uploads the file and step (7) is performed;
(6) the server adds a mapping record from the file to be uploaded to the existing file record;
(7) the server records the verification segment check value, the overall check value and the file description information, forms the file record corresponding to the file to be uploaded and saves it;
(8) the server receives the file to be uploaded, computes its file check information, forms the file record corresponding to the file to be uploaded and saves it.
2. The method for uploading duplicate files according to claim 1, characterized in that, in said step (2), the verification segment is extracted by:
A. extracting, from the head of the file to be uploaded, data whose size equals the verification segment length as the verification segment; or
B. taking the size of the file to be uploaded as a parameter, obtaining the start position of the verification segment by a predefined processing method, and extracting data whose size equals the verification segment length as the verification segment.
3. The method for uploading duplicate files according to claim 1, characterized in that, in said step (2), the method of computing the verification segment check value comprises the MD5 algorithm.
4. The method for uploading duplicate files according to claim 1, characterized in that, in said step (4), the overall check value of the whole file to be uploaded is computed using the MD5 algorithm and/or the SHA-1 hash algorithm and/or the CRC32 check algorithm.
5. The method for uploading duplicate files according to claim 1, characterized in that said file record comprises at least a record number, file check information and file description information; said file check information comprises the verification segment check value and the overall check value; and said file description information comprises the file name, the client modification time and the file size.
6. The method for uploading duplicate files according to claim 1, characterized in that said step (7) is replaced with:
the server records the verification segment check value, the overall check value and the file description information, and sends the record number and a match-failure indication to the client; the client detects whether the file data has changed between computing the file check information and formally uploading the file; if so, it uploads the file directly and step (8) is performed; if not, it uploads the file using the received record number as one of the upload identifiers, and the server uses the record number to combine the information of the received file with the previously saved information into a complete file record.
7. The method for uploading duplicate files according to claim 1, characterized in that, in said step (8), the server's computation of the file check information of the received file comprises:
determining whether the file size is not less than a second threshold; if so, computing the file check information and forming the file record from the file check information and the information of the received file; if not, forming the file record directly from the information of the received file.
8. A system for uploading duplicate files, implementing the transfer of a file from a client to a server, characterized in that it comprises a check value computing module, a document management module and a first communication module arranged in the client, and a check value matching module, an archive information administration module and a second communication module arranged in the server, wherein
said check value computing module comprises:
a unit for computing the verification segment check value when the file size is not less than the first threshold; and
a unit for computing the overall check value of the whole file to be uploaded when quick matching finds a match;
the document management module comprises:
a unit for determining the relation between the file size and the first threshold;
a unit for extracting the verification segment from a file to be uploaded whose size is not less than the first threshold and passing it to the check value computing module; and
a unit for passing the file to be uploaded to the first communication module;
the first communication module comprises:
a unit for uploading the check value information and the file description information;
a unit for uploading the file to be uploaded to the server when the file size is less than the first threshold, when quick matching returns no match, or when exact matching returns no match; and
a unit for receiving the server's responses;
the check value matching module implements quick matching and exact matching, including matching the verification segment check value, the overall check value of the file to be uploaded and the file description information, and passes the matching result to the second communication module;
the archive information administration module comprises:
a unit for saving file records;
a unit for adding a mapping from the file to be uploaded to the corresponding existing file record when the server's exact matching succeeds; and
a unit for computing and saving the file check information when a file received by the server is not smaller than the second threshold and has no file check information;
the second communication module comprises:
a unit for receiving the check values or files uploaded by the client; and
a unit for returning the results of quick matching, exact matching and file upload to the client.
9. The system for uploading duplicate files according to claim 8, characterized in that said document management module further comprises:
a unit for monitoring whether the file data changes between computing the overall check value of the whole file to be uploaded and formally uploading the file.
10. The system for uploading duplicate files according to claim 8, characterized in that said archive information administration module further comprises:
a unit for removing the saved file check values and file description information of a file whose exact matching failed and which has not been uploaded within a certain period.
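The clean-up described in claim 10 can be sketched as a periodic purge of pre-stored records. The names `PendingRecord` and `purge_stale` and the one-hour limit are assumptions; the claims only say "a certain period".

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

PENDING_TTL_SECONDS = 3600.0  # hypothetical value for "a certain period"


@dataclass
class PendingRecord:
    whole_md5: str          # check value pre-stored when exact matching failed
    created_at: float       # time at which the record was pre-stored
    uploaded: bool = False  # set once the file itself has arrived


def purge_stale(pending: Dict[int, PendingRecord], now: Optional[float] = None) -> int:
    """Remove pre-stored records whose file was not uploaded within the time limit."""
    now = time.time() if now is None else now
    stale = [rid for rid, rec in pending.items()
             if not rec.uploaded and now - rec.created_at > PENDING_TTL_SECONDS]
    for rid in stale:
        del pending[rid]
    return len(stale)
```

Passing `now` explicitly makes the function easy to test; a server would typically call it from a scheduled job.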
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410712783.6A CN104410692B (en) | 2014-11-28 | 2014-11-28 | A kind of method and system uploaded for duplicate file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104410692A true CN104410692A (en) | 2015-03-11 |
CN104410692B CN104410692B (en) | 2019-03-22 |
Family
ID=52648287
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410712783.6A Active CN104410692B (en) | 2014-11-28 | 2014-11-28 | A kind of method and system uploaded for duplicate file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104410692B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105787041A (en) * | 2016-02-26 | 2016-07-20 | 中国银联股份有限公司 | Large file comparison method and comparison system based on data characteristic codes |
CN106101274A (en) * | 2016-08-10 | 2016-11-09 | 玉环看知信息科技有限公司 | A kind of document transmission method, Apparatus and system |
CN106649676A (en) * | 2016-12-15 | 2017-05-10 | 北京锐安科技有限公司 | Duplication eliminating method and device based on HDFS storage file |
CN106845278A (en) * | 2016-12-26 | 2017-06-13 | 武汉斗鱼网络科技有限公司 | A kind of file verification method and system |
CN108205632A (en) * | 2016-12-20 | 2018-06-26 | 北京小米移动软件有限公司 | System area method of calibration and device |
CN108229162A (en) * | 2016-12-15 | 2018-06-29 | 中标软件有限公司 | A kind of implementation method of cloud platform virtual machine completeness check |
CN109309651A (en) * | 2017-07-28 | 2019-02-05 | 阿里巴巴集团控股有限公司 | A kind of document transmission method, device, equipment and storage medium |
CN110457278A (en) * | 2018-05-07 | 2019-11-15 | 百度在线网络技术(北京)有限公司 | A kind of document copying method, device, equipment and storage medium |
CN110704439A (en) * | 2019-09-27 | 2020-01-17 | 北京智道合创科技有限公司 | Data storage method and device |
CN110995679A (en) * | 2019-11-22 | 2020-04-10 | 杭州迪普科技股份有限公司 | File data flow control method, device, equipment and storage medium |
CN111314314A (en) * | 2020-01-20 | 2020-06-19 | 苏州浪潮智能科技有限公司 | Method and system for verifying integrity of website download file |
CN112631514A (en) * | 2020-12-17 | 2021-04-09 | 龙存科技(北京)股份有限公司 | File duplicate removal method and system applied to cloud disk system |
CN114168537A (en) * | 2021-11-27 | 2022-03-11 | 深圳市连用科技有限公司 | Method for uploading file and terminal equipment |
CN114401147A (en) * | 2022-01-20 | 2022-04-26 | 山西晟视汇智科技有限公司 | New energy power station communication message comparison method and system based on abstract algorithm |
CN114422503A (en) * | 2022-01-24 | 2022-04-29 | 深圳市云语科技有限公司 | Method for intelligently selecting file transmission mode of multi-node file transmission system |
CN114615258A (en) * | 2022-03-28 | 2022-06-10 | 重庆长安汽车股份有限公司 | Method and device for uploading large files to file server in fragmented manner |
CN116527539A (en) * | 2023-05-15 | 2023-08-01 | 合芯科技(苏州)有限公司 | Data consistency verification method and device and computer equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103248711A (en) * | 2013-05-23 | 2013-08-14 | 华为技术有限公司 | File uploading method and server |
CN103581230A (en) * | 2012-07-26 | 2014-02-12 | 深圳市腾讯计算机系统有限公司 | File transmission system and method, receiving end and sending end |
CN103714123A (en) * | 2013-12-06 | 2014-04-09 | 西安工程大学 | Methods for deleting duplicated data and controlling reassembly versions of cloud storage segmented objects of enterprise |
Also Published As
Publication number | Publication date |
---|---|
CN104410692B (en) | 2019-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 201112 Shanghai, Minhang District, United Airlines route 1188, building second layer A-1 unit 8; Applicant after: SHANGHAI EISOO INFORMATION TECHNOLOGY CO., LTD.; Address before: 201112 Shanghai, Minhang District, United Airlines route 1188, building second layer A-1 unit 8; Applicant before: Shanghai Eisoo Software Co.,Ltd. |
COR | Change of bibliographic data | ||
GR01 | Patent grant | ||