CN104679830A - File processing method and device - Google Patents

Info

Publication number
CN104679830A
Authority
CN
China
Prior art keywords
file
fragmentation
burst
uploaded
file fragmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510051466.9A
Other languages
Chinese (zh)
Inventor
宋健
魏泽涛
薛伟
胡勇
陈翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LeTV Information Technology Beijing Co Ltd
Original Assignee
LeTV Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LeTV Information Technology Beijing Co Ltd
Priority to CN201510051466.9A
Publication of CN104679830A
Legal status: Pending

Abstract

The invention provides a file processing method and device to solve the problem that excessive memory occupancy degrades database performance and lowers data processing capacity. The file processing method includes: when it is detected that a client has started uploading a file segment of a to-be-uploaded file, determining, according to segment description information stored in an in-memory database, that the content data of the file segment is allowed to be uploaded; receiving the content data of the file segment uploaded by the client; after the content data of all file segments of the to-be-uploaded file has been uploaded, generating meta-information of the file and storing it in the in-memory database; and transferring the meta-information from the in-memory database to a disk database. File processing efficiency is thereby improved, memory occupancy is reduced, impact on database performance is avoided, and cost is lowered.

Description

File processing method and device
Technical field
The present invention relates to the field of file technology, and in particular to a file processing method and device.
Background
A file system is a mechanism for providing users with access to data. From the system's perspective, a file system organizes and allocates the space of a file storage device, and is responsible for storing files and for protecting and retrieving the stored files. When a file uploaded by a client is received, the file is stored in the corresponding storage space; when a request to download a file is received, the corresponding file is looked up in the storage space and the file found is delivered. File upload and download are thus provided for the user's convenience.
After the content data of a file has been uploaded, meta-information of the file is generated. The meta-information records information about the file and describes its attributes, such as the file's fingerprint, name, creation time, modification and access times, file permissions, and mapping information. In current methods, because an in-memory database offers fast queries, the generated meta-information is usually stored in an in-memory database to improve query speed.
However, in the above method a large amount of information is stored in the in-memory database. As the business scale keeps growing, memory occupancy rises significantly, and excessive memory occupancy in turn causes database performance problems and reduces data processing capacity.
Summary of the invention
The invention provides a file processing method and device to solve the problem that excessive memory occupancy causes database performance problems and reduces data processing capacity.
To solve the above problem, the invention discloses a file processing method, comprising:
When it is detected that a client has started uploading a file segment of a to-be-uploaded file, determining, according to segment description information stored in an in-memory database, that the content data of the file segment is allowed to be uploaded;
Receiving the content data of the file segment uploaded by the client;
After the content data of all file segments of the to-be-uploaded file has been uploaded, generating meta-information of the file and storing it in the in-memory database;
Transferring the meta-information from the in-memory database to a disk database.
Preferably, before determining, when it is detected that the client has started uploading a file segment of the to-be-uploaded file, that the content data of the file segment is allowed to be uploaded according to the segment description information stored in the in-memory database, the method further comprises:
Determining, according to a file upload request sent by the client, the segment description information of the file segments corresponding to the to-be-uploaded file, and storing the segment description information in the in-memory database;
Returning the segment description information to the client according to the file upload request, so that the client uploads the file segments of the to-be-uploaded file according to the segment description information.
Preferably, before determining, according to the file upload request sent by the client, the segment description information of the file segments corresponding to the to-be-uploaded file, the method further comprises:
Receiving the file upload request sent by the client, and detecting that the file upload request is a legitimate request.
Preferably, the file segment comprises a segment identifier and content data, and the segment description information comprises a segment identifier and a segment status, wherein the segment status is one of not uploaded, uploading, and uploaded;
Determining, according to the segment description information of the file segment stored in the in-memory database, that the content data of the file segment is allowed to be uploaded comprises:
Looking up, in the segment description information of the file segment, the segment status corresponding to the segment identifier of the file segment;
When the segment status is not uploaded, determining that the content data of the file segment is allowed to be uploaded.
Preferably, before looking up, in the segment description information of the file segment, the segment status corresponding to the segment identifier of the file segment, the method further comprises:
Determining, according to the segment description information of the file segment, that the file segment has not expired.
Preferably, the segment description information further comprises a segment expiration time;
Determining, according to the segment description information of the file segment, that the file segment has not expired comprises:
Obtaining the current time, and looking up, in the segment description information of the file segment, the segment expiration time corresponding to the segment identifier of the file segment;
Comparing the current time with the segment expiration time corresponding to the segment identifier of the file segment;
If the current time is earlier than the expiration time, determining that the file segment has not expired.
Preferably, the file segment comprises a segment identifier, and the segment description information comprises a segment identifier and a segment status;
Before receiving the content data of the file segment uploaded by the client, the method further comprises:
Updating the segment status corresponding to the segment identifier of the file segment in the segment description information to uploading;
After receiving the content data of the file segment uploaded by the client, the method further comprises:
When the content data of the file segment has been uploaded, updating the segment status corresponding to the segment identifier of the file segment in the segment description information to uploaded.
To solve the above problem, the invention also discloses a file processing device, comprising:
An upload determination module, configured to determine, when it is detected that a client has started uploading a file segment of a to-be-uploaded file, that the content data of the file segment is allowed to be uploaded according to segment description information stored in an in-memory database;
A content receiving module, configured to receive the content data of the file segment uploaded by the client;
An information generation module, configured to generate meta-information of the file and store it in the in-memory database after the content data of all file segments of the to-be-uploaded file has been uploaded;
An information transfer module, configured to transfer the meta-information from the in-memory database to a disk database.
Preferably, the device further comprises:
An information determination module, configured to determine, according to a file upload request sent by the client, the segment description information of the file segments corresponding to the to-be-uploaded file, and to store the segment description information in the in-memory database, before the upload determination module detects that the client has started uploading a file segment and determines that its content data is allowed to be uploaded;
An information returning module, configured to return the segment description information to the client according to the file upload request, so that the client uploads the file segments of the to-be-uploaded file according to the segment description information.
Preferably, the device further comprises:
A request receiving module, configured to receive the file upload request sent by the client and detect that the file upload request is a legitimate request, before the information determination module determines the segment description information of the file segments corresponding to the to-be-uploaded file according to the file upload request.
Compared with the prior art, the invention has the following advantages:
In the invention, when it is detected that a client has started uploading a file segment of a to-be-uploaded file, it is determined, according to segment description information stored in an in-memory database, that the content data of the file segment is allowed to be uploaded, and the content data of the file segment uploaded by the client is received. Then, after the content data of all file segments of the to-be-uploaded file has been uploaded, meta-information of the file is generated and stored in the in-memory database. Finally, the meta-information of the file is transferred from the in-memory database to a disk database. On the one hand, the in-memory database holds the relevant information while file segments are being uploaded, which improves file processing efficiency. On the other hand, once the content data of the file has been uploaded, the meta-information of the file is ultimately moved from the in-memory database to the disk database, which reduces memory occupancy, avoids an impact on database performance, and lowers cost.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of a file processing method according to Embodiment one of the invention;
Fig. 2 is a flowchart of the steps of a file processing method according to Embodiment two of the invention;
Fig. 3 is a structural block diagram of a file processing device according to Embodiment three of the invention;
Fig. 4 is a structural block diagram of a file processing device according to Embodiment four of the invention.
Detailed description of the embodiments
To make the above objects, features, and advantages of the invention more apparent, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one
Referring to Fig. 1, a flowchart of the steps of a file processing method according to Embodiment one of the invention is shown.
The file processing method of this embodiment may comprise the following steps:
Step 101, when it is detected that a client has started uploading a file segment of a to-be-uploaded file, determine, according to segment description information stored in an in-memory database, that the content data of the file segment is allowed to be uploaded.
In this embodiment, a client may upload a file in the form of file segments. A file segment may comprise its content data and its segment description information, and the segment description information has been stored in the in-memory database in advance. Therefore, when the server detects that the client has started uploading a file segment of the to-be-uploaded file, it can decide, according to the segment description information stored in the in-memory database, whether the client is allowed to upload the content data of that file segment. In this embodiment, it is determined according to the segment description information stored in the in-memory database that the content data of the file segment is allowed to be uploaded.
Step 102, receive the content data of the file segment uploaded by the client.
Once step 101 has determined that the client is allowed to upload the content data of the file segment, the server can go on to receive that content data. After the content data of the file segment has been received, it can be stored in a preset distributed storage space.
Step 103, after the content data of all file segments of the to-be-uploaded file has been uploaded, generate meta-information of the file and store it in the in-memory database.
After the content data of all file segments of the to-be-uploaded file has been uploaded, the meta-information of the file can be generated. The meta-information records information about the file and describes its attributes, such as the file's fingerprint, name, creation time, modification and access times, file permissions, and mapping information. Once generated, the meta-information can be stored in the in-memory database, where it provides the basis for subsequent queries about the file.
Step 104, transfer the meta-information from the in-memory database to a disk database.
In this embodiment, if a large amount of information were all kept in the in-memory database, the occupancy of the in-memory database would rise, which would affect database performance and increase cost. Therefore, after the meta-information of the file has been generated, it is further transferred from the in-memory database to the disk database. The meta-information is then no longer stored in the in-memory database, so it no longer occupies memory space.
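The transfer in step 104 can be sketched as follows. This is only an illustration under stated assumptions: a Python dict stands in for the in-memory database, SQLite stands in for the disk database, and the names `memory_db`, `open_disk_db`, `transfer_meta`, and the `file_meta` table are hypothetical. The patent does not name any particular database product or schema.

```python
import json
import sqlite3

# Hypothetical stand-in for the in-memory database (the patent names no product).
memory_db = {}


def open_disk_db(path=":memory:"):
    """Open the stand-in disk database; pass a file path to persist on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_meta ("
        "fingerprint TEXT PRIMARY KEY, meta TEXT NOT NULL)"
    )
    return conn


def transfer_meta(fingerprint, disk_db):
    """Move one file's meta-information from memory to disk (step 104)."""
    key = f"meta:{fingerprint}"
    meta = memory_db[key]
    # Commit the row on disk first, then drop the in-memory copy,
    # so the meta-information is never lost between the two stores.
    with disk_db:
        disk_db.execute(
            "INSERT OR REPLACE INTO file_meta VALUES (?, ?)",
            (fingerprint, json.dumps(meta)),
        )
    del memory_db[key]
```

Writing to disk before deleting from memory is a design choice of this sketch, not something the patent specifies.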
This embodiment gives only a brief introduction to each of the above steps; the detailed process of each step is described in Embodiment two below.
In this embodiment of the invention, on the one hand, the in-memory database holds the relevant information while file segments are being uploaded, which improves file processing efficiency. On the other hand, once the content data of the file has been uploaded, the meta-information of the file is ultimately moved from the in-memory database to the disk database, which reduces memory occupancy, avoids an impact on database performance, and lowers cost.
Embodiment two
Referring to Fig. 2, a flowchart of the steps of a file processing method according to Embodiment two of the invention is shown.
The file processing method of this embodiment may comprise the following steps:
Step 201, receive the file upload request sent by the client, and detect whether the file upload request is a legitimate request.
When a client needs to upload a file, it first sends a file upload request to the server. The file upload request may contain related information, such as the URL (Uniform Resource Locator) from which the request was sent and the name and size of the to-be-uploaded file. In this embodiment, whether the file upload request is legitimate can be detected from the information the request contains.
The following describes how to detect whether the file upload request is legitimate, taking the URL from which the request was sent as an example.
For example, a blacklist can be used. A URL blacklist containing multiple unsafe URLs is set in advance, and the URL from which the file upload request was sent is matched against the blacklist. If the blacklist contains an unsafe URL that matches this URL, the file upload request is determined to be illegitimate; otherwise, it is determined to be legitimate.
As another example, a whitelist can be used. A URL whitelist containing multiple safe URLs is set in advance, and the URL from which the file upload request was sent is matched against the whitelist. If the whitelist contains a safe URL that matches this URL, the file upload request is determined to be legitimate; otherwise, it is determined to be illegitimate.
Note that the matching here may be exact matching or partial matching. Those skilled in the art can configure this according to actual conditions, and this embodiment places no limitation on it.
Of course, this embodiment may also use other ways of detecting, from the URL from which the request was sent, whether the file upload request is legitimate. For example, it can be detected whether the URL contains preset specific characters; if it does, the file upload request is determined to be illegitimate, and otherwise it is determined to be legitimate. This embodiment places no limitation on this.
If the file upload request is detected to be legitimate, step 202 is performed. If it is detected to be illegitimate, the file uploaded by the client may be refused, and a notification may be returned to the client to inform it that it is currently not allowed to upload the file.
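The three URL checks described for step 201 can be sketched as below. This is a minimal illustration using exact matching only; the list contents, the forbidden characters, and the function names are hypothetical, since the patent leaves the lists and the matching mode to the implementer.

```python
# Hypothetical example lists; real entries would come from server configuration.
URL_BLACKLIST = {"http://bad.example/upload"}
URL_WHITELIST = {"http://client.example/upload"}


def is_legitimate_by_blacklist(url):
    """Legitimate unless the URL matches a blacklisted (unsafe) URL."""
    return url not in URL_BLACKLIST


def is_legitimate_by_whitelist(url):
    """Legitimate only if the URL matches a whitelisted (safe) URL."""
    return url in URL_WHITELIST


def is_legitimate_by_characters(url, forbidden="<>"):
    """Legitimate unless the URL contains any preset specific character."""
    return not any(ch in url for ch in forbidden)
```

Partial matching, which the text also permits, would replace the set membership tests with prefix or substring comparisons.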
Step 202, determine, according to the file upload request, the segment description information of the file segments corresponding to the to-be-uploaded file, and store the segment description information in the in-memory database.
In this embodiment, step 202 may comprise the following sub-steps a1 to a3:
Sub-step a1, look up, according to the file upload request, whether the in-memory database already stores segment description information of the file segments corresponding to the to-be-uploaded file; if so, perform sub-step a2; if not, perform sub-step a3;
The file upload request may also contain information such as the name and fingerprint of the to-be-uploaded file, and this information can be used to look up whether the in-memory database already stores segment description information of the file segments corresponding to the to-be-uploaded file. For example, the segment description information stored in the in-memory database may include the fingerprint of the file to which each segment belongs, so the fingerprint of the to-be-uploaded file can be searched for among those fingerprints. If it is found, it is determined that the in-memory database already stores the segment description information of the file segments corresponding to the to-be-uploaded file.
Sub-step a2, take the segment description information found as the segment description information of the file segments corresponding to the to-be-uploaded file;
Note that in this case the segment description information does not need to be stored in the in-memory database again.
Sub-step a3, segment the to-be-uploaded file according to the file upload request to obtain the segment description information of the file segments corresponding to the to-be-uploaded file, and store the segment description information in the in-memory database.
In one preferred embodiment of the invention, the to-be-uploaded file may be segmented according to a segmentation rule preset on the server. In this case, the file upload request may also contain the size of the to-be-uploaded file. The process of segmenting the to-be-uploaded file according to the file upload request may therefore be as follows. When the size of the to-be-uploaded file is less than or equal to a preset standard size, the whole file is treated as one file segment. When the size of the to-be-uploaded file is greater than the preset standard size, file segments whose size equals the standard size are marked off in order, starting from the first byte of the file's content data, until no further segment of the standard size can be marked off; any remaining content data forms the last file segment. For example, if the standard size is 16MB and the to-be-uploaded file is 40MB, 3 file segments are marked off: the first and second file segments are 16MB each, and the third is 8MB. As another example, if the standard size is 16MB and the file is 48MB, 3 file segments are marked off, and the first, second, and third file segments are 16MB each.
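The server-side rule above can be sketched as a function that returns byte ranges. This is an illustrative sketch; the function name, the `(start, end)` representation with an exclusive end, and the 16MB default are assumptions taken from the worked example rather than from any stated implementation.

```python
MB = 1024 * 1024


def split_by_standard_size(file_size, standard_size=16 * MB):
    """Return (start, end) byte ranges for each file segment, end exclusive.

    Files no larger than the standard size become a single segment;
    larger files are cut into standard-size segments from the first byte,
    and whatever remains forms the last, shorter segment.
    """
    if file_size <= standard_size:
        return [(0, file_size)]
    return [
        (start, min(start + standard_size, file_size))
        for start in range(0, file_size, standard_size)
    ]
```

Calling `split_by_standard_size(40 * MB)` yields three ranges of 16MB, 16MB, and 8MB, matching the first example in the text.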
In another preferred embodiment of the invention, the to-be-uploaded file may be segmented according to a segmentation rule specified by the client. In this case, the file upload request may also contain the size of the to-be-uploaded file and the number of file segments. The process of segmenting the to-be-uploaded file according to the file upload request may therefore be as follows. Determine whether, when the to-be-uploaded file is divided into the specified number of file segments, the size of each file segment is less than or equal to the preset standard size. If so, divide the to-be-uploaded file into the specified number of file segments. If not, mark off file segments whose size equals the standard size in order, starting from the first byte of the file's content data, until no further segment of the standard size can be marked off; any remaining content data forms the last file segment.
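A sketch of the client-specified rule follows. It assumes that dividing the file into the requested number of segments means near-equal sizes, which the patent does not state; the function name and the 16MB default are likewise hypothetical.

```python
MB = 1024 * 1024


def split_by_client_count(file_size, segment_count, standard_size=16 * MB):
    """Return (start, end) byte ranges, end exclusive.

    Assumption: the client's segment count implies near-equal segments.
    If the largest such segment fits within the standard size, honor the
    client's count; otherwise fall back to standard-size segmentation.
    """
    base, extra = divmod(file_size, segment_count)
    largest = base + (1 if extra else 0)
    if largest <= standard_size:
        ranges, start = [], 0
        for index in range(segment_count):
            # The first `extra` segments carry one additional byte each.
            end = start + base + (1 if index < extra else 0)
            ranges.append((start, end))
            start = end
        return ranges
    # Fallback: standard-size segments, with the remainder as the last one.
    return [
        (start, min(start + standard_size, file_size))
        for start in range(0, file_size, standard_size)
    ]
```

With a 40MB file, a request for 4 segments is honored (10MB each), while a request for 2 segments would need 20MB per segment and so falls back to 16MB, 16MB, and 8MB.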
Through the above process, after the to-be-uploaded file has been divided into one or more file segments, the segment description information of each file segment is determined. The segment description information may include: a segment identifier (for example, a sequence number for each file segment); a segment expiration time (set for each file segment according to actual conditions); a segment status (the segment status may be not uploaded, uploading, or uploaded; after segmentation, the status of every file segment is set to not uploaded, for example by setting the segment status to the segment identifier to indicate not uploaded); a segment position (the start and end positions of each file segment within the whole file); and so on. In this embodiment, the segment description information may be stored in the form of a hash table, with a single hash table used to maintain all of the segment description information.
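One way to picture the hash table of segment description information is the sketch below, where a Python dict keyed by segment identifier plays the role of the hash table. The class name, field names, and the one-hour default lifetime are hypothetical; only the four kinds of field and the "status equals identifier means not uploaded" convention come from the text.

```python
import time
from dataclasses import dataclass


@dataclass
class SegmentDescriptor:
    segment_id: int    # sequence number of the file segment
    expires_at: float  # segment expiration time (seconds since the epoch)
    status: int        # equals segment_id while the segment is not uploaded
    start: int         # start position within the whole file
    end: int           # end position within the whole file (exclusive)


def build_descriptor_table(ranges, ttl_seconds=3600, now=None):
    """Build one hash table (dict) holding every segment's description."""
    now = time.time() if now is None else now
    return {
        segment_id: SegmentDescriptor(
            segment_id, now + ttl_seconds, segment_id, start, end
        )
        for segment_id, (start, end) in enumerate(ranges, start=1)
    }
```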
Step 203, return the segment description information to the client according to the file upload request, so that the client uploads the file segments of the to-be-uploaded file according to the segment description information.
In this embodiment, the in-memory database supports atomic operations (an atomic operation is one that cannot be interrupted by the thread scheduling mechanism; when multiple threads access a resource, it guarantees that no other thread accesses the same resource at the same time). The client can therefore upload file segments with multiple threads, which further improves the upload efficiency of the file.
Accordingly, the file upload request may also contain upload-thread information of the client, such as the number and names of its upload threads, and the server returns the segment description information to the client according to this upload-thread information. The specific process may be as follows. If the number of remaining not-yet-uploaded file segments of the to-be-uploaded file is greater than or equal to the number of upload threads, then, in the order of the segment identifiers of the remaining file segments and starting from the first one, the segment description information of as many file segments as there are upload threads is returned to the client's upload threads, with the description information of one file segment returned to each upload thread. If the number of remaining not-yet-uploaded file segments is less than the number of upload threads, then, in the order of the segment identifiers of the remaining file segments, the segment description information of each file segment other than the last is returned to one upload thread apiece, and the segment description information of the last file segment is returned to all of the remaining upload threads.
For example, suppose the client has 2 upload threads, named upload thread 1 and upload thread 2, and the to-be-uploaded file has 3 remaining not-yet-uploaded file segments, whose segment identifiers are segment 1, segment 2, and segment 3. The segment description information of segment 1 is then returned to upload thread 1, and that of segment 2 is returned to upload thread 2. As another example, suppose the client has 2 upload threads, named upload thread 1 and upload thread 2, and the to-be-uploaded file has 1 remaining not-yet-uploaded file segment, whose segment identifier is segment 3. The segment description information of segment 3 is then returned to both upload thread 1 and upload thread 2.
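The allocation of segment description information to upload threads can be sketched as a pure function from remaining segment identifiers and thread names to an assignment. The function name and the dict return type are assumptions made for illustration, and returning an empty assignment when nothing remains is a choice of this sketch that the text does not cover.

```python
def assign_segments(remaining_segment_ids, thread_names):
    """Map each upload thread name to the segment identifier it receives."""
    remaining = sorted(remaining_segment_ids)
    if not remaining:
        return {}  # assumption: nothing left to hand out
    if len(remaining) >= len(thread_names):
        # One segment per thread, in segment-identifier order.
        return dict(zip(thread_names, remaining))
    # Fewer segments than threads: all but the last segment go to one
    # thread each, and every leftover thread receives the last segment.
    assignment = dict(zip(thread_names, remaining[:-1]))
    for name in thread_names[len(remaining) - 1:]:
        assignment[name] = remaining[-1]
    return assignment
```

When several threads receive the same last segment, the status check in step 204 decides which one is actually allowed to upload it.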
After each upload thread of the client receives segment description information, it uploads a file segment of the to-be-uploaded file accordingly; for example, it extracts the file segment at the corresponding position from the to-be-uploaded file according to the segment position and uploads that file segment to the server. The uploaded file segment may comprise a segment identifier and content data, the content data being the actual data of the file segment.
Step 204, when it is detected that the client has started uploading a file segment of the to-be-uploaded file, determine, according to the segment description information stored in the in-memory database, whether the content data of the file segment is allowed to be uploaded.
After the client receives the segment description information of the file segments, each upload thread can upload the file segment corresponding to the segment description information it received. When the server receives a file segment uploaded by the client, it can determine that the client has started uploading a file segment of the to-be-uploaded file. It then further determines, according to the segment description information stored in the in-memory database, whether the client is allowed to upload the content data of the file segment. If the upload is allowed, step 205 is performed. If it is not allowed, the content data of the file segment uploaded by the client may be refused, and a notification may be returned to the client to inform it that it is currently not allowed to upload the content data of this file segment.
In one preferred embodiment of the invention, step 204 may comprise the following sub-step b1:
Sub-step b1, look up, in the segment description information of the file segment, the segment status corresponding to the segment identifier of the file segment; when the segment status is not uploaded, determine that the content data of the file segment is allowed to be uploaded; when the segment status is uploading or uploaded, determine that the content data of the file segment is not allowed to be uploaded.
In this embodiment, the above approach ensures that each file segment can be uploaded by only one upload thread; multiple threads or multiple clients are not allowed to upload the same file segment.
For example, suppose the client has 2 upload threads, named upload thread 1 and upload thread 2, and the to-be-uploaded file has 1 remaining not-yet-uploaded file segment, whose segment identifier is segment 3. After the segment description information of segment 3 has been returned to upload thread 1 and upload thread 2, both threads upload segment 3. If upload thread 1 is detected first to start uploading segment 3, and the segment status of segment 3 obtained from the segment description information stored in the in-memory database is not uploaded, then upload thread 1 is allowed to upload the content data of segment 3, and the segment status of segment 3 is updated to uploading. When upload thread 2 is then detected to start uploading segment 3, the segment status of segment 3 obtained from the segment description information stored in the in-memory database is uploading, so upload thread 2 is not allowed to upload the content data of segment 3. A notification may also be returned to the client to inform it that upload thread 2 is not allowed to upload the content data of segment 3.
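The check in sub-step b1, combined with the status update to uploading, behaves like an atomic check-and-set. The sketch below emulates that with a `threading.Lock` around a Python dict; the class and method names are hypothetical, and a real in-memory database would supply its own atomic operation instead of this lock.

```python
import threading

NOT_UPLOADED, UPLOADING, UPLOADED = "not_uploaded", "uploading", "uploaded"


class SegmentStatusTable:
    """Stand-in for segment statuses held in an in-memory database."""

    def __init__(self, segment_ids):
        self._status = {sid: NOT_UPLOADED for sid in segment_ids}
        self._lock = threading.Lock()  # emulates the database's atomicity

    def try_begin_upload(self, segment_id):
        """Allow the upload only if the segment is still not uploaded.

        The check and the update to UPLOADING happen under one lock, so
        two threads racing for the same segment cannot both succeed.
        """
        with self._lock:
            if self._status.get(segment_id) != NOT_UPLOADED:
                return False
            self._status[segment_id] = UPLOADING
            return True
```

In the example with upload thread 1 and upload thread 2 both holding segment 3, exactly one call to `try_begin_upload(3)` returns `True`.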
In another preferred embodiment of the invention, step 204 may comprise the following sub-steps c1 to c2:
Sub-step c1, determine, according to the segment description information of the file segment, whether the file segment has expired; if it has not expired, perform sub-step c2; if it has expired, determine that the content data of the file segment is not allowed to be uploaded;
Sub-step c2, look up, in the segment description information of the file segment, the segment status corresponding to the segment identifier of the file segment; when the segment status is not uploaded, determine that the content data of the file segment is allowed to be uploaded; when the segment status is uploading or uploaded, determine that the content data of the file segment is not allowed to be uploaded.
In yet another preferred embodiment of the invention, step 204 may comprise the following sub-steps d1 to d3:
Sub-step d1, look up, in the segment description information of the file segment, the segment status corresponding to the segment identifier of the file segment;
Sub-step d2, when the segment status is not uploaded, determine, according to the segment description information of the file segment, whether the file segment has expired; if it has not expired, determine that the content data of the file segment is allowed to be uploaded; if it has expired, determine that the content data of the file segment is not allowed to be uploaded;
Sub-step d3, when the segment status is uploading or uploaded, determine that the content data of the file segment is not allowed to be uploaded.
The above process of determining, according to the segment description information, whether the file segment has expired may comprise: obtaining the current time, and looking up, in the segment description information of the file segment, the segment expiration time corresponding to the segment identifier of the file segment; comparing the current time with that segment expiration time; if the current time is earlier than the expiration time, determining that the file segment has not expired; otherwise, determining that the file segment has expired.
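The expiry comparison and the status check can be combined as in sub-steps c1 to c2. The sketch below is illustrative; the function name, the string status values, and the use of epoch seconds for times are assumptions.

```python
import time


def may_upload(status, expires_at, now=None):
    """Sub-steps c1 and c2: check expiry first, then the segment status."""
    now = time.time() if now is None else now
    # c1: the segment is unexpired only if the current time is strictly
    # earlier than its expiration time.
    if not now < expires_at:
        return False
    # c2: only a segment that is still not uploaded may be uploaded.
    return status == "not_uploaded"
```

Sub-steps d1 to d3 reach the same decision with the two checks in the opposite order.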
Step 205: receive the content data of the file segment uploaded by the client.
In this embodiment, when it is determined that uploading the content data of the file segment is allowed, the content data of the file segment uploaded by the client can be received, and the received content data can be stored in a preset distributed storage space.
In this embodiment, after it is determined that uploading the content data of the file segment is allowed, and before the content data uploaded by the client is received, the segment state corresponding to the segment identifier of this file segment in the segment description information may also be updated to "uploading" (for example, the segment state may be set to the current timestamp to indicate that the upload is in progress). When the content data of this file segment has finished uploading, the segment state corresponding to the segment identifier of this file segment in the segment description information is updated to "uploaded" (for example, the segment state may be set to the negative of the segment identifier to indicate that the upload is complete).
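The example encoding mentioned above (a timestamp for a segment that is being uploaded, the negative of the segment identifier for a segment that has been uploaded) could look roughly like the sketch below. Several details are assumptions rather than part of the source: the class `SegmentStateStore` is hypothetical, segment identifiers are assumed to be positive integers, "not uploaded" is assumed to be encoded as `0`, and a `threading.Lock` merely stands in for whatever atomic operation the in-memory database provides.

```python
import threading
import time

NOT_UPLOADED = 0  # assumed sentinel; the source does not say how "not uploaded" is encoded


class SegmentStateStore:
    """Toy stand-in for the in-memory database that holds segment states."""

    def __init__(self, segment_ids):
        self._lock = threading.Lock()
        self._state = {sid: NOT_UPLOADED for sid in segment_ids}

    def try_begin_upload(self, sid, now=None):
        """Atomically move a segment from 'not uploaded' to 'uploading'."""
        stamp = int(time.time() if now is None else now)
        with self._lock:
            if self._state[sid] != NOT_UPLOADED:
                return False  # already uploading or uploaded
            self._state[sid] = stamp  # positive timestamp marks "uploading"
            return True

    def finish_upload(self, sid):
        """Mark a segment as 'uploaded' by storing the negative of its identifier."""
        with self._lock:
            self._state[sid] = -sid

    def is_uploaded(self, sid):
        with self._lock:
            return self._state[sid] == -sid
```

Performing the check and the update under one lock means that two concurrent requests for the same segment cannot both be admitted, which is the kind of guarantee an atomic operation is meant to give.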
It should be noted that, when the segment description information is returned to the client in step 203 in response to the file upload request, an ID (identity) assigned to this client may be returned at the same time. During a resumed (breakpoint) transfer, this ID can be used to determine whether the upload comes from the same client, and the upload is allowed if it does. The ID can also be used in queries to look up information about a particular upload, such as which machine performed the upload, the size of the file, the state of the segments, and the fingerprint of the file. The embodiments of the present invention do not discuss this in further detail.
Step 206: judge whether the content data of all file segments of the file to be uploaded has been uploaded.
After the content data of the file segment in step 205 has been uploaded, it is further judged whether the content data of all file segments of the file to be uploaded has been uploaded. If not, return to step 203 so that the client continues to upload the content data of the remaining file segments; if so, perform step 207.
In this embodiment, whether the content data of all file segments of the file to be uploaded has been uploaded can be judged from the segment description information, stored in the in-memory database, of the file segments corresponding to this file. For example, the judgment can be based on the segment states: when the segment state of every file segment of the file to be uploaded is "uploaded", it is determined that the content data of all file segments of the file to be uploaded has been uploaded.
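The completeness check of step 206 amounts to a scan over the segment states. The sketch below assumes, for illustration only, that the states are available as a mapping from segment identifier to one of the strings `"not_uploaded"`, `"uploading"` and `"uploaded"`; the function names are hypothetical.

```python
from typing import Dict, List


def all_segments_uploaded(states: Dict[int, str]) -> bool:
    """True only if there is at least one segment and every segment is 'uploaded'."""
    return bool(states) and all(s == "uploaded" for s in states.values())


def remaining_segments(states: Dict[int, str]) -> List[int]:
    """Identifiers of segments that still have to finish uploading, in ascending order."""
    return sorted(sid for sid, s in states.items() if s != "uploaded")
```

When `all_segments_uploaded` returns `False`, the flow goes back to step 203 and `remaining_segments` indicates which segments are still outstanding.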
Step 207: after the content data of all file segments of the file to be uploaded has been uploaded, generate the meta information of the file and store it in the in-memory database.
After the content data of a file segment uploaded by the client is received, it is stored in the preset distributed storage space. During storage, the content data of the file segments is stored in the order of their segment identifiers, so that once the content data of all file segments of the file to be uploaded has been uploaded, the complete content data of the file is held in the distributed storage space. The meta information of the file can then be generated. This meta information may include the fingerprint of the file, the storage location of the file in the distributed storage space, the size of the file, the name of the file, and similar information. After the meta information of the file is generated, it is stored in the in-memory database.
In a preferred embodiment of the present invention, the file can also be verified after the content data of all its file segments has been uploaded. The fingerprint of the file is calculated from the received content data and compared with the fingerprint contained in the file upload request received in step 201. If the two are identical, the file is determined to have been uploaded correctly and the subsequent steps can proceed. If they differ, an error may have occurred during the upload (for example, the file content may have been tampered with), and a message can be returned to the client to notify it that the uploaded file is faulty.
In this embodiment, the fingerprint of the file can be calculated as follows. When the size of the file is less than or equal to a preset threshold (for example, 512 KB), a unique hash value is calculated over the content data of the file, and this hash value serves as the fingerprint of the file. When the size of the file is greater than the preset threshold, multiple data blocks whose total size equals the preset threshold are extracted from the content data of the file and concatenated in their original order into one spliced data block; a unique hash value is then calculated over the combination of the spliced data block and the size of the file, and this hash value serves as the fingerprint of the file.
The unique hash value of the content data can be calculated with, for example, the MD5 algorithm (Message Digest Algorithm 5). Calculating the fingerprint as a hash value in this way can further improve processing efficiency.
Multiple data blocks can be extracted from the content data of the file in the following way: the content data of the file is divided evenly into m data blocks, and a data block of size n is extracted from each of them, where the product of m and n equals the preset threshold. The contiguous block of size n may be taken starting from the first byte of each data block or starting from another byte, and the extraction method may be the same for every data block or may differ. For example, if the preset threshold is 512 KB and the size of the file is 800 KB, the content data of the file is divided evenly into 8 data blocks of 100 KB each, and a data block of 64 KB is extracted from each of them. This way of extracting multiple data blocks is only an example, and other methods can also be used in this embodiment, such as extraction at a predetermined interval, where the interval can be calculated from the size of the file, the preset threshold and the size of each data block. The data blocks may be of equal or different sizes; the embodiments of the present invention place no limitation on this.
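The sampled fingerprint described above can be illustrated with the following sketch. It is one possible reading, not the definitive scheme: the function name `fingerprint` is hypothetical, the sample is taken from the start of each of the m parts, the threshold is assumed to be divisible by m, and the "combination" of the spliced block with the file size is assumed to mean appending the size as decimal ASCII text, which the source does not specify.

```python
import hashlib

THRESHOLD = 512 * 1024  # preset threshold from the example (512 KB)


def fingerprint(data: bytes, threshold: int = THRESHOLD, m: int = 8) -> str:
    """MD5 of the whole content for small files, MD5 of a sample plus the size otherwise."""
    size = len(data)
    if size <= threshold:
        return hashlib.md5(data).hexdigest()
    n = threshold // m  # bytes sampled per part, so m * n equals the threshold
    part = size // m    # length of each part; leftover trailing bytes are not sampled
    # Take the first n bytes of each part and splice them in their original order.
    spliced = b"".join(data[i * part:i * part + n] for i in range(m))
    # Assumed combination rule: append the file size as decimal ASCII text.
    return hashlib.md5(spliced + str(size).encode("ascii")).hexdigest()
```

With the numbers from the example, an 800 KB file is split into 8 parts of 100 KB and 64 KB is hashed from each, so only 512 KB of content is read for the hash. The trade-off is that a change confined to the unsampled bytes does not alter this fingerprint.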
Step 208: transfer the meta information from the in-memory database to the disk database.
In this embodiment, in order to reduce memory usage, the meta information of the file is further transferred from the in-memory database to the disk database; that is, the in-memory database is no longer used to store the meta information of the file. However, because the disk database does not support atomic operations and its access efficiency is lower, the in-memory database is still used to store the segment description information during the file upload process, so as to support atomic operations and improve upload efficiency. In other words, this embodiment combines an in-memory database with a disk database, which both preserves the efficiency of the file upload process and avoids excessive memory usage.
After the above process, when a file is subsequently queried, a file query request sent by the client is received, and the meta information of the file is first looked up in the disk database according to the request. If it is found in the disk database, the content data of the corresponding file is retrieved from the distributed storage space according to the meta information found. If it is not found in the disk database, the meta information of the file is then looked up in the in-memory database according to the request (this covers the case in which the meta information has been generated and stored in the in-memory database but has not yet been transferred to the disk database). If it is found in the in-memory database, the content data of the corresponding file is retrieved from the distributed storage space according to the meta information found. If it is not found in the in-memory database either, it is determined that the client has not uploaded this file, and a "not found" message or similar can be returned.
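The transfer of step 208 and the two-stage query can be sketched together. In this illustration both databases are modelled as plain dictionaries keyed by a hypothetical `file_key`; the function names and the choice to write to disk before deleting from memory are assumptions, not details given in the source.

```python
from typing import Optional


def transfer_meta(file_key: str, memory_db: dict, disk_db: dict) -> bool:
    """Step 208: move one file's meta information from memory to disk."""
    meta = memory_db.get(file_key)
    if meta is None:
        return False  # nothing to transfer
    disk_db[file_key] = meta  # write to the disk store first
    del memory_db[file_key]   # then release the memory copy
    return True


def find_meta(file_key: str, memory_db: dict, disk_db: dict) -> Optional[dict]:
    """Query path: disk database first, in-memory database as the fallback."""
    meta = disk_db.get(file_key)
    if meta is None:
        meta = memory_db.get(file_key)
    return meta  # None means the file was never uploaded
```

Writing to the disk store before deleting the memory copy keeps the record present in at least one store throughout the transfer.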
In the embodiments of the present invention, the operations of the file upload process take advantage of the in-memory database, which supports efficient atomic operations and therefore offers higher efficiency. The disk database is used to store the meta information, which reduces memory usage: operations that would consume memory are converted into a disk-based solution, lowering the storage cost of searching massive data, while atomic operations ensure the accuracy of the retrieved information.
For simplicity of description, each of the foregoing method embodiments is expressed as a series of actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps can be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
Embodiment Three
Referring to Fig. 3, a structural block diagram of a file processing device according to Embodiment Three of the present invention is shown.
The file processing device of this embodiment may comprise the following modules:
Upload determination module 301, configured to determine, when it is detected that the client starts uploading a file segment of the file to be uploaded, according to the segment description information stored in the in-memory database, that uploading the content data of the file segment is allowed;
Content receiving module 302, configured to receive the content data of the file segment uploaded by the client;
Information generation module 303, configured to generate, after the content data of all file segments of the file to be uploaded has been uploaded, the meta information of the file and store it in the in-memory database;
Information transfer module 304, configured to transfer the meta information from the in-memory database to the disk database.
In the embodiments of the present invention, when it is detected that the client starts uploading a file segment of the file to be uploaded, it is determined from the segment description information stored in the in-memory database that uploading the content data of this file segment is allowed, and the content data of the file segment uploaded by the client is received. Then, after the content data of all file segments of the file to be uploaded has been uploaded, the meta information of the file is generated and stored in the in-memory database. Finally, the meta information is transferred from the in-memory database to the disk database. Thus, on the one hand, the in-memory database holds the relevant information while file segments are being uploaded, which improves file processing efficiency; on the other hand, once the content data of the file has been uploaded, the meta information of the file is ultimately transferred from the in-memory database to the disk database, which reduces memory usage, avoids affecting database performance, and lowers cost.
Embodiment Four
Referring to Fig. 4, a structural block diagram of a file processing device according to Embodiment Four of the present invention is shown.
The file processing device of this embodiment may comprise the following modules:
Request receiving module 401, configured to receive the file upload request sent by the client, before the information determination module determines the segment description information of the file segments corresponding to the file to be uploaded according to that request, and to detect whether the file upload request is a legitimate request; if it is a legitimate request, the information determination module is invoked;
Information determination module 402, configured to determine, according to the file upload request sent by the client, the segment description information of the file segments corresponding to the file to be uploaded, and to store the segment description information in the in-memory database, before the upload determination module determines that uploading the content data of a file segment is allowed;
Information returning module 403, configured to return the segment description information to the client in response to the file upload request, so that the client uploads the file segments of the file to be uploaded according to the segment description information;
Upload determination module 404, configured to determine, when it is detected that the client starts uploading a file segment of the file to be uploaded, according to the segment description information stored in the in-memory database, that uploading the content data of the file segment is allowed;
Content receiving module 405, configured to receive the content data of the file segment uploaded by the client;
State updating module 406, configured to update the segment state corresponding to the segment identifier of the file segment in the segment description information to "uploading" before the content receiving module receives the content data of the file segment uploaded by the client, and to update that segment state to "uploaded" when the content data of the file segment has finished uploading;
Information generation module 407, configured to generate, after the content data of all file segments of the file to be uploaded has been uploaded, the meta information of the file and store it in the in-memory database;
Information transfer module 408, configured to transfer the meta information from the in-memory database to the disk database.
In a preferred embodiment of the present invention, a file segment comprises a segment identifier and content data, and the segment description information comprises the segment identifier and a segment state, where the segment state is one of "not uploaded", "uploading" and "uploaded". The upload determination module may comprise the following submodules:
State lookup submodule, configured to look up, in the segment description information of the file segment, the segment state corresponding to the segment identifier of the file segment;
Upload determination submodule, configured to determine that uploading the content data of the file segment is allowed when the segment state is "not uploaded", and to determine that uploading the content data of the file segment is not allowed when the segment state is "uploading" or "uploaded".
In a preferred embodiment of the present invention, the segment description information further comprises a segment expiry time. The upload determination module may further comprise the following submodule:
Expiry detection submodule, configured to determine from the segment description information of the file segment whether the file segment has expired; if it has not expired, the state lookup submodule is invoked; if it has expired, it is determined that uploading the content data of the file segment is not allowed.
Specifically, the expiry detection submodule is configured to obtain the current time and look up, in the segment description information of the file segment, the segment expiry time corresponding to the segment identifier of the file segment; to compare the current time with that segment expiry time; and, if the current time is earlier than the expiry time, to determine that the file segment has not expired, and otherwise to determine that the file segment has expired.
In the embodiments of the present invention, the operations of the file upload process take advantage of the in-memory database, which supports efficient atomic operations and therefore offers higher efficiency. The disk database is used to store the meta information, which reduces memory usage: operations that would consume memory are converted into a disk-based solution, lowering the storage cost of searching massive data, while atomic operations ensure the accuracy of the retrieved information.
Because the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar the embodiments may be referred to one another.
The present invention can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
The file processing method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A file processing method, characterized by comprising:
when it is detected that a client starts uploading a file segment of a file to be uploaded, determining, according to segment description information stored in an in-memory database, that uploading the content data of the file segment is allowed;
receiving the content data of the file segment uploaded by the client;
after the content data of all file segments of the file to be uploaded has been uploaded, generating meta information of the file and storing it in the in-memory database;
transferring the meta information from the in-memory database to a disk database.
2. The method according to claim 1, characterized in that, before determining, when it is detected that the client starts uploading a file segment of the file to be uploaded, according to the segment description information stored in the in-memory database, that uploading the content data of the file segment is allowed, the method further comprises:
determining, according to a file upload request sent by the client, the segment description information of the file segments corresponding to the file to be uploaded, and storing the segment description information in the in-memory database;
returning the segment description information to the client in response to the file upload request, so that the client uploads the file segments of the file to be uploaded according to the segment description information.
3. The method according to claim 2, characterized in that, before determining, according to the file upload request sent by the client, the segment description information of the file segments corresponding to the file to be uploaded, the method further comprises:
receiving the file upload request sent by the client, and detecting that the file upload request is a legitimate request.
4. The method according to claim 1, characterized in that the file segment comprises a segment identifier and content data, and the segment description information comprises the segment identifier and a segment state, wherein the segment state is one of not uploaded, uploading and uploaded;
determining, according to the segment description information of the file segment stored in the in-memory database, that uploading the content data of the file segment is allowed comprises:
looking up, in the segment description information of the file segment, the segment state corresponding to the segment identifier of the file segment;
when the segment state is not uploaded, determining that uploading the content data of the file segment is allowed.
5. The method according to claim 4, characterized in that, before looking up, in the segment description information of the file segment, the segment state corresponding to the segment identifier of the file segment, the method further comprises:
determining, according to the segment description information of the file segment, that the file segment has not expired.
6. The method according to claim 5, characterized in that the segment description information further comprises a segment expiry time;
determining, according to the segment description information of the file segment, that the file segment has not expired comprises:
obtaining the current time, and looking up, in the segment description information of the file segment, the segment expiry time corresponding to the segment identifier of the file segment;
comparing the current time with the segment expiry time corresponding to the segment identifier of the file segment;
if the current time is earlier than the expiry time, determining that the file segment has not expired.
7. The method according to claim 1, characterized in that the file segment comprises a segment identifier, and the segment description information comprises the segment identifier and a segment state;
before receiving the content data of the file segment uploaded by the client, the method further comprises:
updating the segment state corresponding to the segment identifier of the file segment in the segment description information to uploading;
after receiving the content data of the file segment uploaded by the client, the method further comprises:
when the content data of the file segment has finished uploading, updating the segment state corresponding to the segment identifier of the file segment in the segment description information to uploaded.
8. A file processing device, characterized by comprising:
an upload determination module, configured to determine, when it is detected that a client starts uploading a file segment of a file to be uploaded, according to segment description information stored in an in-memory database, that uploading the content data of the file segment is allowed;
a content receiving module, configured to receive the content data of the file segment uploaded by the client;
an information generation module, configured to generate, after the content data of all file segments of the file to be uploaded has been uploaded, meta information of the file and store it in the in-memory database;
an information transfer module, configured to transfer the meta information from the in-memory database to a disk database.
9. The device according to claim 8, characterized by further comprising:
an information determination module, configured to determine, according to a file upload request sent by the client, the segment description information of the file segments corresponding to the file to be uploaded, and to store the segment description information in the in-memory database, before the upload determination module determines that uploading the content data of the file segment is allowed;
an information returning module, configured to return the segment description information to the client in response to the file upload request, so that the client uploads the file segments of the file to be uploaded according to the segment description information.
10. The device according to claim 9, characterized by further comprising:
a request receiving module, configured to receive the file upload request sent by the client, before the information determination module determines the segment description information of the file segments corresponding to the file to be uploaded according to that request, and to detect that the file upload request is a legitimate request.
CN201510051466.9A 2015-01-30 2015-01-30 File processing method and device Pending CN104679830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510051466.9A CN104679830A (en) 2015-01-30 2015-01-30 File processing method and device

Publications (1)

Publication Number Publication Date
CN104679830A true CN104679830A (en) 2015-06-03

Family

ID=53314872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510051466.9A Pending CN104679830A (en) 2015-01-30 2015-01-30 File processing method and device

Country Status (1)

Country Link
CN (1) CN104679830A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20180831