CN109062987A - A file processing method and device - Google Patents

A file processing method and device

Info

Publication number
CN109062987A
Authority
CN
China
Prior art keywords
node data
file
database
content
mapping relations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810714009.7A
Other languages
Chinese (zh)
Inventor
冉世友
陈正
殷舒
刘胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Union Mobile Pay Co Ltd
Original Assignee
Union Mobile Pay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Union Mobile Pay Co Ltd
Priority to CN201810714009.7A
Publication of CN109062987A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

Embodiments of the present invention relate to the technical field of data processing, and in particular to a file processing method and device that reduce storage space usage and save resources. An embodiment of the present invention includes: for first node data in a target file, if it is determined that the content of the first node data differs from the content of every node data stored in a database, storing the first node data in the database and determining a first position of the first node data in the target file, where the first node data is any node data in the target file; forming a mapping between the first position and the content of the first node data; and adding the mapping to an index file of the database.

Description

A file processing method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a file processing method and device.
Background
With the continuous development of information technology, sending, receiving, and storing files have become important steps in information processing. In general, when files are stored and compressed, multiple files are often stored or transmitted together. Before a file is sent, the original file can be compressed to obtain a compressed package smaller than the original file, and the compressed package is transmitted. After the compressed package is received, the original file is recovered by decompressing it, which reduces resource consumption during file transmission.
When a large number of similar files must be handled, such as electronic contracts, the series of files is usually stored or compressed directly. This occupies a large amount of space and wastes resources.
Summary of the invention
The present application provides a file processing method and device to reduce storage space usage and save resources.
A file processing method provided by an embodiment of the present invention includes:
For first node data in a target file, if it is determined that the content of the first node data differs from the content of every node data stored in a database, storing the first node data in the database, and determining a first position of the first node data in the target file; the first node data is any node data in the target file;
Forming a mapping between the first position and the content of the first node data;
Adding the mapping to an index file of the database.
Optionally, the method further includes:
If it is determined that the content of second node data stored in the database is the same as the content of the first node data, forming a mapping between the first position and the content of the second node data;
Adding the mapping to the index file of the database.
Optionally, the mappings in the index file further include the hash value corresponding to the content of the node data;
Storing the first node data in the database if it is determined that the content of the first node data differs from the content of every node data stored in the database includes:
Determining the hash value of the first node data from the content of the first node data;
Determining whether a hash value identical to the hash value of the first node data exists in the database;
If not, storing the first node data in the database, and adding the hash value of the first node data to the index file;
Forming the mapping between the first position and the content of the first node data includes:
Forming a mapping between the first position and the hash value of the first node data.
Optionally, the target file is any one of multiple files to be processed, and the multiple files to be processed are of the same file type;
The node data stored in the database is node data of any of the multiple files to be processed.
Optionally, storing the first node data in the database includes:
Compressing the content of the first node data and then storing it in the database;
After the mapping is added to the index file of the database, the method further includes:
Compressing the index file and storing it in the database.
An embodiment of the present invention also provides a file processing apparatus, including:
A storage unit, configured to, for first node data in a target file, if it is determined that the content of the first node data differs from the content of every node data stored in a database, store the first node data in the database and determine a first position of the first node data in the target file; the first node data is any node data in the target file;
A mapping unit, configured to form a mapping between the first position and the content of the first node data;
An indexing unit, configured to add the mapping to an index file of the database.
Optionally, the mapping unit is further configured to:
If it is determined that the content of second node data stored in the database is the same as the content of the first node data, form a mapping between the first position and the content of the second node data.
Optionally, the mappings in the index file further include the hash value corresponding to the content of the node data;
The storage unit is further configured to:
Determine the hash value of the first node data from the content of the first node data;
Determine whether a hash value identical to the hash value of the first node data exists in the database;
If not, store the first node data in the database, and add the hash value of the first node data to the index file;
The mapping unit is further configured to:
Form a mapping between the first position and the hash value of the first node data.
Optionally, the target file is any one of multiple files to be processed, and the multiple files to be processed are of the same file type;
The node data stored in the database is node data of any of the multiple files to be processed.
Optionally, the apparatus further includes a compression unit, configured to:
Compress the content of the first node data;
Compress the index file.
An embodiment of the present invention also provides an electronic device, including:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above method.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the above method.
In the embodiments of the present invention, any node data in the target file is taken as the first node data, and the first node data is compared with all node data stored in the database. If it is determined that the content of the first node data differs from the content of every stored node data, the first node data is stored in the database, the first position of the first node data in the target file is determined, a mapping between the first position and the content of the first node data is formed, and the mapping is added to the index file of the database. In this way, only node data not already in the database is stored, duplicate file content is not stored again, and both storage space and transmission resources are saved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a file processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the specific file processing method provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic structural diagram of the PDF files provided by Embodiment 2 of the present invention;
Fig. 4 to Fig. 8 are tree diagrams of the node data of file 1 to file 5, respectively, provided by Embodiment 2 of the present invention;
Fig. 9 is a schematic structural diagram of a file processing apparatus provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a file processing method. As shown in Fig. 1, the file processing method provided by the embodiment of the present invention includes the following steps:
Step 101: for first node data in a target file, if it is determined that the content of the first node data differs from the content of every node data stored in a database, store the first node data in the database, and determine a first position of the first node data in the target file; the first node data is any node data in the target file.
Step 102: form a mapping between the first position and the content of the first node data.
Step 103: add the mapping to the index file of the database.
In the embodiments of the present invention, any node data in the target file is taken as the first node data, and the first node data is compared with all node data stored in the database. If it is determined that the content of the first node data differs from the content of every stored node data, the first node data is stored in the database, the first position of the first node data in the target file is determined, a mapping between the first position and the content of the first node data is formed, and the mapping is added to the index file of the database. In this way, only node data not already in the database is stored, duplicate file content is not stored again, and both storage space and transmission resources are saved.
The above steps cover the case where the content of the first node data differs from the content of every node data stored in the database. There is also the case where the content of the first node data is the same as the content of some node data stored in the database. For that case, the embodiments of the present invention further include:
If it is determined that the content of second node data stored in the database is the same as the content of the first node data, forming a mapping between the first position and the content of the second node data;
Adding the mapping to the index file of the database.
In the embodiments of the present invention, if it is determined that the content of the first node data is the same as the content of second node data stored in the database, the first node data need not be stored again; it is only necessary to form a mapping between the first position and the content of the second node data and add that mapping to the index file of the database. When the first node data of the target file is later needed, the second node data can be found in the database through the mapping between the first position and the content of the second node data. Since the content of the first node data is the same as the content of the second node data, this yields the first node data.
As can be seen, in the embodiments of the present invention the target file is divided into multiple node data, and each node data is compared with the node data stored in the database. If the database does not yet store the content of the node data, the node data is stored in the database. If the database already holds the content of the node data, the node data is not stored again; it is only necessary to form a mapping between the content of the node data and the position of the node data in the target file and add the mapping to the index file. The mappings of all node data of a target file form the index file of that target file. When the target file needs to be retrieved, the contents of all its node data can be looked up in the database according to the mappings in the index file and combined into the target file.
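The store-or-reference logic and the later reconstruction of a file can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiments: the dict standing in for the database, the list standing in for the index file, the integer content ids, and the names `store_file` and `restore_file` are all hypothetical. Contents are compared directly here; hash-based comparison is a separate refinement.

```python
# Hypothetical sketch: `database` and `index` stand in for the database and
# index file of the text; every name below is an illustrative assumption.

def store_file(file_name, nodes, database, index):
    """Store only unseen node contents; map each position to a content id."""
    for position, content in nodes:
        # Look for a stored node whose content equals this node's content.
        content_id = next((k for k, v in database.items() if v == content), None)
        if content_id is None:
            # Content is not in the database yet: store it under a new id.
            content_id = len(database)
            database[content_id] = content
        # Record the mapping from the node's position to the stored content.
        index.append((file_name, position, content_id))


def restore_file(file_name, database, index):
    """Rebuild a file's nodes, ordered by position, from the index mappings."""
    entries = [(pos, cid) for name, pos, cid in index if name == file_name]
    return [(pos, database[cid]) for pos, cid in sorted(entries)]


database, index = {}, []
store_file("file1", [(0, b"header"), (10, b"body A"), (20, b"header")], database, index)
store_file("file2", [(0, b"header"), (10, b"body B")], database, index)
```

In this toy run five nodes are indexed but only three distinct contents are stored, and `restore_file` returns each file's original node sequence.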
To make storage and lookup easier, the embodiments of the present invention compute a hash value from the content of the node data, so the mappings in the index file further include the hash value corresponding to the content of the node data.
In step 101 above, storing the first node data in the database if it is determined that the content of the first node data differs from the content of every node data stored in the database includes:
Determining the hash value of the first node data from the content of the first node data;
Determining whether a hash value identical to the hash value of the first node data exists in the database;
If not, storing the first node data in the database, and adding the hash value of the first node data to the index file;
Step 102, forming the mapping between the first position and the content of the first node data, includes:
Forming a mapping between the first position and the hash value of the first node data.
Hashing transforms an input of arbitrary length (also called the pre-image) into an output of fixed length by means of a hash algorithm; that output is the hash value. A hash function has the following basic property: if two hash values computed with the same function differ, the original inputs of the two hash values also differ. Conversely, with a suitable hash function, two node data with different contents will in practice have different hash values, since collisions are extremely unlikely. Therefore, whether the contents of two node data are the same can be determined by comparing their hash values. In the embodiments of the present invention, the hash value of node data effectively serves as an identifier of its content: node data with different names but identical content still correspond to the same hash value. The hash value of the first node data is determined from its content, and it is then determined whether a hash value identical to that of the first node data exists in the database. If so, node data with the same content as the first node data already exists in the database, so the first node data need not be stored again; only a mapping between the first position of the first node data in the target file and the hash value of the first node data is formed and saved in the index file. If not, the content of every node data in the database differs from the content of the first node data, so the first node data is stored in the database, and a mapping between the first position and the hash value of the first node data is formed and saved in the index file.
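The hash-based check can be sketched as below. The source does not name a hash algorithm, so SHA-256 from Python's standard `hashlib` is an assumption, as are the helper names `node_hash` and `add_node` and the use of a dict keyed by hash value as the database.

```python
import hashlib


def node_hash(content: bytes) -> str:
    """Return a fixed-length identifier derived only from the node's content."""
    # SHA-256 is an assumed choice; the source does not specify the algorithm.
    return hashlib.sha256(content).hexdigest()


def add_node(position, content, database, index):
    """Store the content only if its hash is not yet a key in the database."""
    digest = node_hash(content)
    is_new = digest not in database
    if is_new:
        database[digest] = content
    # The mapping links the position to the hash, whether or not it was new.
    index.append((position, digest))
    return is_new


database, index = {}, []
first = add_node(0, b"party information", database, index)
second = add_node(40, b"party information", database, index)
```

The second call finds the hash already present, so only a new mapping is recorded and the content is stored once.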
Preferably, the embodiments of the present invention are suited to processing multiple files: the target file is any one of multiple files to be processed, and the multiple files to be processed are of the same file type; the node data stored in the database is node data of any of the multiple files to be processed. Comparing node data across a series of files of the same type finds more identical node data, which reduces the node data stored in the database and avoids the extra workload caused by storing excess data in the database.
To save further database space, the data can be compressed before the node data is stored or transmitted. Storing the first node data in the database includes:
Compressing the content of the first node data and then storing it in the database;
After the mapping is added to the index file of the database, the method further includes:
Compressing the index file and storing it in the database.
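Compressing both the node content and the index file might look like the following sketch. The source does not specify a compression algorithm or an index file format, so `zlib` and a JSON encoding of the index are assumptions, and the function names are hypothetical.

```python
import json
import zlib


def compress_node(content: bytes) -> bytes:
    """Compress node content before it is written to the database."""
    return zlib.compress(content)


def compress_index(index: list) -> bytes:
    """Serialize the index as JSON (an assumed format), then compress it."""
    return zlib.compress(json.dumps(index).encode("utf-8"))


def load_index(blob: bytes) -> list:
    """Reverse of compress_index: decompress and parse the index."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


content = b"contract clause " * 50          # repetitive text compresses well
stored = compress_node(content)
index = [["file1", 0, "hash-of-node"]]      # placeholder mapping entry
index_blob = compress_index(index)
```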
For a clearer understanding of the present invention, the above process is described in detail below with specific embodiments. The specific steps of Embodiment 1 are shown in Fig. 2 and include:
Step 201: select any one of the multiple files as the target file, and determine the multiple node data in the target file.
Step 202: for any node data in the target file, compute the hash value of the node data.
Step 203: determine whether the hash value of the node data is already in the database; if so, execute step 205; otherwise, execute step 204.
Step 204: compress the node data and store it in the database, where the key in the database is the hash value of the node data and the value is the compressed node data.
Step 205: establish a mapping between the position of the node data in the target file and its hash value, and add the mapping to the index file.
Step 206: determine whether hash values have been computed for all node data in the target file; if so, execute step 207; otherwise, execute step 202.
Step 207: determine whether every one of the multiple files has been used as the target file; if so, execute step 208; otherwise, execute step 201.
Step 208: compress the index file.
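Steps 201 to 208 can be read as two nested loops, sketched below. This is an illustrative sketch rather than the embodiment's code: SHA-256, `zlib`, the JSON index format, the function name `process_files`, and the representation of a file as a list of (position, content) pairs are all assumptions. The sketch also assumes that step 205 follows step 204, so newly stored nodes are indexed as well.

```python
import hashlib
import json
import zlib


def process_files(files):
    """files: dict mapping file name -> list of (position, content bytes)."""
    database = {}   # key: hash of node data, value: compressed node data
    index = []      # entries: [file name, position, hash]
    for file_name, nodes in files.items():                  # steps 201, 207
        for position, content in nodes:                     # steps 202, 206
            digest = hashlib.sha256(content).hexdigest()    # step 202
            if digest not in database:                      # step 203
                database[digest] = zlib.compress(content)   # step 204
            index.append([file_name, position, digest])     # step 205
    # Step 208: compress the finished index file.
    compressed_index = zlib.compress(json.dumps(index).encode("utf-8"))
    return database, compressed_index


files = {
    "file1": [(0, b"c1"), (1, b"c2"), (2, b"c1")],
    "file2": [(0, b"c3"), (1, b"c2"), (2, b"c3")],
}
database, compressed_index = process_files(files)
index = json.loads(zlib.decompress(compressed_index))
```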
Embodiment 2 is illustrated with online loan electronic contracts, where five PDF (Portable Document Format) files need to be compressed. The five PDF files are named file 1, file 2, file 3, file 4, and file 5, as shown in Fig. 3. The names are only for ease of illustration and do not imply any order.
First, the beginning and end of each electronic contract contain the same information about the parties holding rights, so each of the five files has identical data at its beginning and end. For example, file 1 has identical data 1 at its beginning and end, file 2 has identical data 3 at its beginning and end, and the other files are similar.
Second, file 1, file 2, file 3, and file 4 are generated from the same contract template. Node data with different numbers have different content, and node data with the same number have the same content.
Finally, file 5 is a contract newly generated after the contract template was changed. The content changed in the template is inserted as new data before data 4, namely new data 8.
The node data tree diagrams corresponding to the five PDF files are shown in Fig. 4 to Fig. 8. Note that Fig. 4 to Fig. 8 show only the node data that differ and their positions in the tree, where data 1 to data 8 are denoted c1 to c8; the other node data, not shown, do not differ between the files. In Embodiment 2, processing proceeds as follows:
For ease of description, processing starts from file 1; it could equally start from any other file.
First, the hash value of the Pages Root in file 1 is computed, and it is determined whether that hash value is in the index file of the database. Since it is not, the Pages Root data is compressed and put into the database, where the key is the hash value of the Pages Root and the value is the compressed Pages Root data content.
A mapping between the position of the Pages Root in file 1 and the hash value of the Pages Root is generated and added to the index file. Since the cross-reference table of a PDF maintains the position (offset address) of each node data in the entire file, the offset address of the Pages Root can be used directly as the position of the Pages Root in file 1.
For the child nodes under the Pages Root in file 1 and the child nodes under each Page, the operations of computing the hash value of each node data, compressing and storing it, and establishing the mapping are repeated. Note that for Page3 of file 1, the data c1 it contains has the same content as the data c1 contained in Page1, so the hash value of data c1 in Page3 already exists in the database. Therefore, only a mapping between the position of data c1 of Page3 in file 1 and the hash value of data c1 needs to be generated and added to the index file.
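The traversal of the Pages Root, its Page children, and their child nodes can be sketched with a toy tree. Everything here is a stand-in chosen for illustration: the `Node` class, the offsets, and the byte contents are invented, real PDF parsing is not shown, and compression is omitted. The offsets play the role of the positions that the text takes from the cross-reference table.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    offset: int          # assumed position in the file (e.g. an xref offset)
    content: bytes
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and all its descendants, parents first."""
    yield node
    for child in node.children:
        yield from walk(child)


def index_tree(root, database, index):
    """Hash every node, store each distinct content once, index every node."""
    for node in walk(root):
        digest = hashlib.sha256(node.content).hexdigest()
        database.setdefault(digest, node.content)   # no-op if already stored
        index.append((node.offset, digest))


# Toy stand-in for file 1: c1 appears under both Page1 and Page3.
root = Node("Pages Root", 0, b"pages-root", [
    Node("Page1", 100, b"page1", [Node("c1", 110, b"shared party info")]),
    Node("Page2", 200, b"page2", [Node("c2", 210, b"clause text")]),
    Node("Page3", 300, b"page3", [Node("c1", 310, b"shared party info")]),
])
database, index = {}, []
index_tree(root, database, index)
```

Seven nodes are indexed, but the second occurrence of c1 adds only a mapping, so the database holds six entries.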
File 1 to file 5 are traversed, performing the above operations on each, to generate the index file shown in Table 1.
Table 1
In addition, the correspondence in the database between the hash values of the data and the compressed data is shown in Table 2.
Table 2
Key                Value
Pages Root hash    Compressed Pages Root node data of file 1
Page hash          Compressed Page node data of file 1
c1 hash            Compressed c1 node data of file 1
c2 hash            Compressed c2 node data of file 1
c3 hash            Compressed c3 node data of file 2
c4 hash            Compressed c4 node data of file 2
c5 hash            Compressed c5 node data of file 3
c6 hash            Compressed c6 node data of file 5
c7 hash            Compressed c7 node data of file 4
c8 hash            Compressed c8 node data of file 5
For file 1 to file 5, assume that each difference content node data (that is, each of data c1 to c8) is 10k in size, that files 1 to 4 are each 100k, and that file 5 is 110k. Then the node data that is completely duplicated across the five files totals 70k, the difference content nodes total 160k, and the duplicated difference content nodes total 80k. Assume the compression software used can reach a compression rate of 30% on text. Because the index file accounts for a very small share of the size, the index file size is ignored and only the compression of the original data in the files is considered. The compression ratio of this scheme is then (70+160-80) * 30% / 510 = 8.8%.
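The arithmetic behind the 8.8% figure can be checked with a few lines. The numbers are taken from the example; the variable names are illustrative, and the 30% rate is treated here as the ratio of compressed size to original size, which is how the formula in the text uses it.

```python
# Sizes in k, taken from the worked example.
file_sizes = [100, 100, 100, 100, 110]   # file 1 to file 5
fully_duplicated = 70                    # node data identical in all files
difference_total = 160                   # all difference content nodes
difference_duplicated = 80               # difference nodes that repeat
compression_rate = 0.30                  # assumed compressed/original ratio

# Data that actually has to be stored once duplicates are removed.
stored_before_compression = (
    fully_duplicated + difference_total - difference_duplicated
)
stored_after_compression = stored_before_compression * compression_rate
overall_ratio = stored_after_compression / sum(file_sizes)
print(f"{overall_ratio:.1%}")  # prints 8.8%
```

That is, 150k of distinct data compresses to about 45k, against 510k of original files.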
An embodiment of the present invention also provides a file processing apparatus which, as shown in Fig. 9, includes:
A storage unit 901, configured to, for first node data in a target file, if it is determined that the content of the first node data differs from the content of every node data stored in a database, store the first node data in the database and determine a first position of the first node data in the target file; the first node data is any node data in the target file;
A mapping unit 902, configured to form a mapping between the first position and the content of the first node data;
An indexing unit 903, configured to add the mapping to the index file of the database.
Preferably, the mapping unit 902 is further configured to:
If it is determined that the content of second node data stored in the database is the same as the content of the first node data, form a mapping between the first position and the content of the second node data.
Preferably, the mappings in the index file further include the hash value corresponding to the content of the node data;
The storage unit 901 is further configured to:
Determine the hash value of the first node data from the content of the first node data;
Determine whether a hash value identical to the hash value of the first node data exists in the database;
If not, store the first node data in the database, and add the hash value of the first node data to the index file;
The mapping unit 902 is further configured to:
Form a mapping between the first position and the hash value of the first node data.
Preferably, the target file is any one of multiple files to be processed, and the multiple files to be processed are of the same file type;
The node data stored in the database is node data of any of the multiple files to be processed.
Preferably, the apparatus further includes a compression unit 904, configured to:
Compress the content of the first node data;
Compress the index file.
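One way to picture the division of work among units 901 to 904 is as cooperating objects, as in the sketch below. The class and method names are hypothetical and only loosely mirror the units in Fig. 9; SHA-256 and `zlib` are assumed choices, and a real apparatus could equally be implemented in hardware or as a single module.

```python
import hashlib
import zlib


class CompressionUnit:                    # loosely mirrors unit 904
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)


class StorageUnit:                        # loosely mirrors unit 901
    def __init__(self, database, compressor):
        self.database = database
        self.compressor = compressor

    def store_if_new(self, content: bytes) -> str:
        """Store compressed content under its hash unless already present."""
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.database:
            self.database[digest] = self.compressor.compress(content)
        return digest


class MappingUnit:                        # loosely mirrors unit 902
    def make_mapping(self, position, digest):
        return (position, digest)


class IndexUnit:                          # loosely mirrors unit 903
    def __init__(self):
        self.index_file = []

    def add(self, mapping):
        self.index_file.append(mapping)


class FileProcessingApparatus:
    def __init__(self):
        self.database = {}
        self.storage = StorageUnit(self.database, CompressionUnit())
        self.mapping = MappingUnit()
        self.indexing = IndexUnit()

    def process_node(self, position, content: bytes):
        digest = self.storage.store_if_new(content)
        self.indexing.add(self.mapping.make_mapping(position, digest))


apparatus = FileProcessingApparatus()
apparatus.process_node(0, b"same")
apparatus.process_node(8, b"same")
```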
Based on the same principle, the present invention also provides an electronic device which, as shown in Fig. 10, includes:
A processor 1001, a memory 1002, a transceiver 1003, and a bus interface 1004, where the processor 1001, the memory 1002, and the transceiver 1003 are connected through the bus interface 1004;
The processor 1001 is configured to read the program in the memory 1002 and perform the following method:
For first node data in a target file, if it is determined that the content of the first node data differs from the content of every node data stored in a database, storing the first node data in the database, and determining a first position of the first node data in the target file; the first node data is any node data in the target file;
Forming a mapping between the first position and the content of the first node data;
Adding the mapping to the index file of the database.
Further, the processor 1001 is specifically configured to:
If it is determined that the content of second node data stored in the database is the same as the content of the first node data, form a mapping between the first position and the content of the second node data;
Add the mapping to the index file of the database.
Further, the processor 1001 is specifically configured to:
Determine the hash value of the first node data from the content of the first node data;
Determine whether a hash value identical to the hash value of the first node data exists in the database;
If not, store the first node data in the database, and add the hash value of the first node data to the index file;
Form a mapping between the first position and the hash value of the first node data.
Further, the processor 1001 is specifically configured to:
Compress the content of the first node data and then store it in the database;
Compress the index file and store it in the database.
An embodiment of the present application provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to perform any one of the methods described above.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, a person skilled in the art can make additional changes and modifications to these embodiments once the basic inventive concept is known. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, a person skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A file processing method, characterized by comprising:
for first node data in a target file, if it is determined that the content of the first node data differs from the content of every item of node data stored in a database, storing the first node data in the database and determining a first position of the first node data in the target file, wherein the first node data is any node data in the target file;
forming a mapping relationship between the first position and the content of the first node data; and
adding the mapping relationship to an index file of the database.
2. The method according to claim 1, characterized by further comprising:
if it is determined that the content of second node data stored in the database is identical to the content of the first node data, forming a mapping relationship between the first position and the content of the second node data; and
adding the mapping relationship to the index file of the database.
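The two branches of claims 1 and 2 can be illustrated with a minimal sketch. It assumes, purely for illustration, that node data is a string, that the database is an in-memory list, and that the index is a dictionary from a node's position in the target file to the slot holding its content; the name `store_file` and these data structures are hypothetical and are not taken from the patent.

```python
from typing import Dict, List


def store_file(nodes: List[str], database: List[str], index: Dict[int, int]) -> None:
    """Store each distinct node once and map every position to stored content."""
    for position, content in enumerate(nodes):
        if content in database:
            # Identical content is already stored (the "second node data" case):
            # nothing new is stored, the position is only mapped to it.
            stored_at = database.index(content)
        else:
            # Content differs from everything stored: store it first.
            database.append(content)
            stored_at = len(database) - 1
        # Add the position -> content mapping to the (in-memory) index.
        index[position] = stored_at


database: List[str] = []
index: Dict[int, int] = {}
store_file(["header", "body", "header"], database, index)
```

In this sketch the repeated node occupies one slot in `database` while both of its positions remain recorded in `index`, which is where the space saving comes from.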
3. The method according to claim 1, characterized in that the mapping relationships in the index file further include hash values corresponding to the content of node data;
wherein storing the first node data in the database if it is determined that the content of the first node data differs from the content of every item of node data stored in the database comprises:
determining a hash value of the first node data from the content of the first node data;
determining whether a hash value identical to the hash value of the first node data exists in the database; and
if not, storing the first node data in the database and adding the hash value of the first node data to the index file;
and wherein forming the mapping relationship between the first position and the content of the first node data comprises:
forming a mapping relationship between the first position and the hash value of the first node data.
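The hash-based variant of claim 3 can be sketched as follows. The claim does not name a hash function here, so SHA-256 is an assumption, as are the function names `digest` and `add_node` and the use of dictionaries for the database and the index.

```python
import hashlib
from typing import Dict


def digest(content: bytes) -> str:
    """Hash value of a node's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def add_node(position: int, content: bytes,
             database: Dict[str, bytes], index: Dict[int, str]) -> str:
    """Store content only if its hash is new, then map position -> hash."""
    h = digest(content)
    if h not in database:
        # No identical hash exists yet, so the node data is stored.
        database[h] = content
    # The index records the hash, not the content itself.
    index[position] = h
    return h


database: Dict[str, bytes] = {}
index: Dict[int, str] = {}
for pos, node in enumerate([b"alpha", b"beta", b"alpha"]):
    add_node(pos, node, database, index)
```

Comparing fixed-length hash values avoids comparing full node contents against everything already stored. A real system would also have to decide how to treat hash collisions, which this sketch ignores.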
4. The method according to any one of claims 1 to 3, characterized in that the target file is any one of a plurality of files to be processed, the plurality of files to be processed having the same file type; and
the node data stored in the database is node data of any of the plurality of files to be processed.
5. The method according to claim 4, characterized in that storing the first node data in the database comprises:
compressing the content of the first node data and storing the compressed content in the database;
and in that, after the mapping relationship is added to the index file of the database, the method further comprises:
compressing the index file and storing it in the database.
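The compression steps of claim 5 can be sketched with Python's standard `zlib` and `json` modules. The claim does not specify a compression algorithm or an index format, so DEFLATE via `zlib`, a JSON-serialized index, the reserved key `"__index__"`, and the `"file:position"` index keys are all assumptions made for illustration.

```python
import json
import zlib
from typing import Dict

INDEX_KEY = "__index__"  # hypothetical reserved key for the stored index file


def store_compressed(database: Dict[str, bytes], key: str, content: bytes) -> None:
    """Compress a node's content before storing it in the database."""
    database[key] = zlib.compress(content)


def save_index(database: Dict[str, bytes], index: Dict[str, str]) -> None:
    """Serialize the index, compress it, and store it in the same database."""
    raw = json.dumps(index, sort_keys=True).encode("utf-8")
    database[INDEX_KEY] = zlib.compress(raw)


def load_index(database: Dict[str, bytes]) -> Dict[str, str]:
    """Decompress and parse the stored index."""
    return json.loads(zlib.decompress(database[INDEX_KEY]).decode("utf-8"))


database: Dict[str, bytes] = {}
store_compressed(database, "h1", b"<node>" * 200)
# Two files of the same type share one stored node, as in claim 4.
save_index(database, {"a.xml:0": "h1", "b.xml:0": "h1"})
```

Keeping the compressed index in the same database as the node data means a single store holds everything needed to look up which stored content belongs at which position.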
6. A file processing apparatus, characterized by comprising:
a storage unit configured to, for first node data in a target file, if it is determined that the content of the first node data differs from the content of every item of node data stored in a database, store the first node data in the database and determine a first position of the first node data in the target file, wherein the first node data is any node data in the target file;
a mapping unit configured to form a mapping relationship between the first position and the content of the first node data; and
an indexing unit configured to add the mapping relationship to an index file of the database.
7. The apparatus according to claim 6, characterized in that the mapping unit is further configured to:
if it is determined that the content of second node data stored in the database is identical to the content of the first node data, form a mapping relationship between the first position and the content of the second node data.
8. The apparatus according to claim 6, characterized in that the mapping relationships in the index file further include hash values corresponding to the content of node data;
the storage unit is further configured to:
determine a hash value of the first node data from the content of the first node data;
determine whether a hash value identical to the hash value of the first node data exists in the database; and
if not, store the first node data in the database and add the hash value of the first node data to the index file;
and the mapping unit is further configured to:
form a mapping relationship between the first position and the hash value of the first node data.
9. The apparatus according to any one of claims 6 to 8, characterized in that the target file is any one of a plurality of files to be processed, the plurality of files to be processed having the same file type; and
the node data stored in the database is node data of any of the plurality of files to be processed.
10. The apparatus according to claim 9, characterized by further comprising a compression unit configured to:
compress the content of the first node data; and
compress the index file.
CN201810714009.7A 2018-06-29 2018-06-29 A kind of document handling method and device Pending CN109062987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810714009.7A CN109062987A (en) 2018-06-29 2018-06-29 A kind of document handling method and device

Publications (1)

Publication Number Publication Date
CN109062987A (en) 2018-12-21

Family

ID=64818878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810714009.7A Pending CN109062987A (en) 2018-06-29 2018-06-29 A kind of document handling method and device

Country Status (1)

Country Link
CN (1) CN109062987A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102629247A (en) * 2011-12-31 2012-08-08 成都市华为赛门铁克科技有限公司 Method, device and system for data processing
CN103455631A (en) * 2013-09-22 2013-12-18 广州中国科学院软件应用技术研究所 Method, device and system for processing data
CN105677904A (en) * 2016-02-04 2016-06-15 杭州数梦工场科技有限公司 Distributed file system based small file storage method and device
CN106407462A (en) * 2016-10-10 2017-02-15 北京恒华伟业科技股份有限公司 File processing method and system

Similar Documents

Publication Publication Date Title
CN101009516B (en) A method, system and device for data synchronization
CN103246730B (en) File memory method and equipment, document sending method and equipment
CN103324552B (en) Two benches list example duplicate removal data back up method
CN103384884B (en) A kind of file compression method, file decompression method, device and server
CN105868305A (en) A fuzzy matching-supporting cloud storage data dereplication method
CN110347651B (en) Cloud storage-based data synchronization method, device, equipment and storage medium
US7702641B2 (en) Method and system for comparing and updating file trees
CN104881466B (en) The processing of data fragmentation and the delet method of garbage files and device
CN105069111A (en) Similarity based data-block-grade data duplication removal method for cloud storage
CN103116615B (en) A kind of data index method and server based on version vector
CN110557124B (en) Data compression method and device
TW201423426A (en) System and method for diving document into data parts and uploading the data parts
CN109684284A (en) Sliding piecemeal data de-duplication method based on edge calculations
CN106844477A (en) To synchronous method after block catenary system, block lookup method and block chain
CN113051347B (en) Method, system, equipment and storage medium for synchronizing data between heterogeneous databases
CN108573014A (en) A kind of file synchronisation method, device, electronic equipment and readable storage medium storing program for executing
CN106547911A (en) A kind of access method and system of mass small documents
CN104079623A (en) Method and system for controlling multilevel cloud storage synchrony
CN104956340A (en) Scalable data deduplication
CN108090186A (en) A kind of electric power data De-weight method on big data platform
CN109947759A (en) A kind of data directory method for building up, indexed search method and device
CN109062987A (en) A kind of document handling method and device
CN109302449A (en) Method for writing data, method for reading data, device and server
KR102060198B1 (en) Generating sketches sensitive to high-overlap estimation
CN111414339A (en) File processing method, system, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221