CN107229420A - Data storage method, data reading method, data deletion method and data operation system
- Publication number
- CN107229420A (application CN201710392823.7A)
- Authority
- CN
- China
- Prior art keywords
- data block
- target
- cryptographic hash
- data
- corresponding relation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/137—Hash-based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1737—Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0652—Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a data storage method, comprising: determining a first data block to be stored and a first identifier of the first data block; determining whether the correspondence between hash values and object identifiers includes the first identifier; if it does, processing the first data block using the determined write operation method to obtain a target data block; if it does not, taking the first data block as the target data block; calculating a target hash value of the target data block; if the target hash value is found in the correspondence between hash values and object identifiers, not storing the target data block, locating the original data block corresponding to the target hash value, and incrementing the reference count value in the original data block by one; if the target hash value is not found, storing the correspondence between the target hash value and the first identifier, compressing the target data block, and storing the compressed target data block. The application saves storage space on the storage medium.
Description
Technical field
The application relates to the field of computer technology, and in particular to a data storage method, a data reading method, a data deletion method and a data operation system.
Background
Ceph is an open-source distributed unified storage system. Its design goal is to build a high-performance, highly scalable and highly available storage system on inexpensive storage media, providing file storage, block storage and object storage through a single unified system.
However, when a Ceph-based storage medium stores data, a user may store the same data multiple times, so multiple copies of identical data are kept on the storage medium. This wastes storage space on the storage medium.
Summary of the invention
In view of this, the application provides a data storage method, a data reading method, a data deletion method and a data operation system, so as to save storage space on the storage medium. The technical solution is as follows:
According to one aspect of the application, a data storage method is provided, comprising:
Determining a first data block to be stored and a first identifier of the first data block;
Determining whether the stored correspondence between hash values and object identifiers includes the first identifier, wherein the object identifiers in the correspondence are identifiers of stored data blocks;
If it does, determining the type of write operation to perform on the first data block according to the data block length of the first data block and the data block length of the existing second data block corresponding to the first identifier, and processing the first data block using the write operation method corresponding to that write operation type to obtain a target data block;
If it does not, taking the first data block as the target data block;
Calculating a target hash value of the target data block;
If the target hash value is found in the correspondence between hash values and object identifiers, not storing the target data block, locating the original data block corresponding to the target hash value according to the target hash value, and incrementing the reference count value in the original data block by one;
If the target hash value is not found in the correspondence between hash values and object identifiers, adding the correspondence between the target hash value and the first identifier to the correspondence between hash values and object identifiers, compressing the target data block, storing the compressed target data block, and incrementing the reference count value of the target data block by one.
Preferably, determining the type of write operation to perform on the first data block according to the data block length of the first data block and the data block length of the existing second data block corresponding to the first identifier comprises:
When the data block length of the first data block is equal to the data block length of the existing second data block corresponding to the first identifier, determining that the write operation type to perform on the first data block is rewrite;
When the data block length of the first data block is less than the data block length of the existing second data block corresponding to the first identifier, determining that the write operation type to perform on the first data block is modify-write.
Preferably, when the write operation type to perform on the first data block is determined to be rewrite, processing the first data block using the write operation method corresponding to the write operation type to obtain the target data block comprises:
Deleting, from the correspondence between hash values and object identifiers, the correspondence between the first identifier and the hash value of the second data block, and decrementing the reference count value in the second data block by one;
Taking the first data block as the target data block.
Preferably, when the write operation type to perform on the first data block is determined to be modify-write, processing the first data block using the write operation method corresponding to the write operation type to obtain the target data block comprises:
Deleting, from the correspondence between hash values and object identifiers, the correspondence between the first identifier and the hash value of the second data block, and decrementing the reference count value in the second data block by one;
Decompressing the second data block to obtain the second data content of the second data block;
Merging the second data content with the first data content of the first data block, and taking the data block obtained after merging as the target data block.
Preferably, the method further comprises:
When the reference count value in the second data block is 0, deleting the second data block.
According to another aspect of the application, a data reading method is also provided, comprising:
Determining the target identifier of the target data block to be read;
Obtaining, from the stored correspondence between hash values and object identifiers, the target hash value corresponding to the target identifier;
Obtaining the target data block based on the target hash value;
Decompressing the target data block and reading the data of the target data block.
Preferably, after determining the target identifier of the target data block to be read, the method further comprises:
Determining whether the correspondence between hash values and object identifiers includes the target identifier, wherein the object identifiers in the correspondence are identifiers of stored data blocks;
If it does, performing the step of obtaining, from the stored correspondence between hash values and object identifiers, the target hash value corresponding to the target identifier;
If it does not, returning an error message, wherein the error message indicates that the target identifier does not exist in the storage device.
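The reading steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two dictionaries stand in for the correspondence table and for the stored blocks, zlib is an assumed compression algorithm, and the name `read_block` is hypothetical rather than taken from the application.

```python
import zlib


def read_block(hash_by_oid: dict, blocks: dict, oid: str) -> bytes:
    """Look up the hash for `oid`, fetch the compressed block, decompress it.

    hash_by_oid: object identifier -> hash value (assumed table layout)
    blocks:      hash value -> compressed block bytes (assumed back end)
    """
    if oid not in hash_by_oid:
        # Corresponds to the error message for a missing target identifier.
        raise KeyError(f"object identifier {oid!r} does not exist in the storage device")
    target_hash = hash_by_oid[oid]
    return zlib.decompress(blocks[target_hash])
```

Raising `KeyError` is one possible way to report the missing identifier; the application only requires that an error message be returned.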
According to yet another aspect of the application, a data deletion method is also provided, comprising:
Determining the target identifier of the target data block to be deleted;
Obtaining, from the stored correspondence between hash values and object identifiers, the target hash value corresponding to the target identifier;
Deleting, from the stored correspondence between hash values and object identifiers, the record corresponding to the target identifier and the target hash value;
Obtaining the target data block based on the target hash value, and decrementing the reference count value in the target data block by one.
Preferably, the method further comprises:
When the reference count value in the target data block is 0, deleting the target data block.
Preferably, after determining the target identifier of the target data block to be deleted, the method further comprises:
Determining whether the correspondence between hash values and object identifiers includes the target identifier, wherein the object identifiers in the correspondence are identifiers of stored data blocks;
If it does, performing the step of obtaining, from the stored correspondence between hash values and object identifiers, the target hash value corresponding to the target identifier;
If it does not, returning an error message, wherein the error message indicates that the target identifier does not exist in the storage device.
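The deletion steps above, including the existence check and the removal of a block whose reference count reaches 0, might look like the sketch below. The dictionary layout, the `[ref_count, payload]` list and the name `delete_block` are assumptions made for illustration only.

```python
def delete_block(hash_by_oid: dict, blocks: dict, oid: str) -> None:
    """Remove `oid` from the table and release one reference on its block.

    hash_by_oid: object identifier -> hash value (assumed table layout)
    blocks:      hash value -> [ref_count, compressed bytes] (assumed layout)
    """
    if oid not in hash_by_oid:
        raise KeyError(f"object identifier {oid!r} does not exist in the storage device")
    # Delete the record for the target identifier and remember its hash.
    target_hash = hash_by_oid.pop(oid)
    entry = blocks[target_hash]
    entry[0] -= 1  # reference count minus one
    if entry[0] == 0:
        # No identifier refers to this data any more, so the block is removed.
        del blocks[target_hash]
```

With two identifiers pointing at the same hash, the first deletion only lowers the count; the data itself is removed on the second.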
According to yet another aspect of the application, a data operation system is also provided, comprising:
A terminal, for sending files to a distributed device;
A distributed device, for performing a pooling operation on a file to obtain multiple data blocks, wherein the header of each data block includes metadata information and the metadata information includes the reference count information of the data block, and for setting up a correspondence between hash values and object identifiers and using that correspondence to perform data storage operations, data reading operations and data deletion operations.
When storing data, if the correspondence between hash values and object identifiers includes the target hash value of the target data block, the application no longer stores the target data block but increments the reference count value in the original data block corresponding to the target hash value by one. If the correspondence does not include the target hash value of the target data block, the application stores the correspondence between the target hash value and the first identifier, compresses the target data block, and stores the compressed target data block. During data storage, multiple copies of identical data are therefore not stored repeatedly: the storage space holds only one copy of identical data, which saves storage space on the storage medium. In addition, what the application stores is the compressed data block. Compared with the prior-art approach of storing data directly, this reduces the storage space each data block needs and further saves storage space on the storage medium.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the application; those of ordinary skill in the art can derive other drawings from the provided drawings without creative effort.
Fig. 1 is a schematic structural diagram of a data operation system provided by an embodiment of the application;
Fig. 2 is a schematic diagram of the format of a data block stored at the OSD end in the application;
Fig. 3 is a flow chart of a data storage method provided by an embodiment of the application;
Fig. 4 is a flow chart of a data reading method provided by an embodiment of the application;
Fig. 5 is a flow chart of a data deletion method provided by an embodiment of the application.
Detailed description
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments of the application. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the application without creative effort fall within the scope of protection of the application.
Explanation of terms:
Distributed storage: data is stored in a distributed manner across multiple data storage servers.
PG: Placement Group. A virtual concept in the distributed device.
OSD: Object-based Storage Device.
Hash value: in this embodiment, the data value obtained by performing a hash operation on an object.
To help those skilled in the art understand the application scenario of the application, a data operation system is provided. Referring to Fig. 1, it comprises a terminal 100 and a distributed device 200.
The terminal 100 is used to send files to the distributed device 200.
The distributed device 200 is used to perform a pooling operation on a file to obtain multiple data blocks (oid), each of which has an object identifier (oid_id). The header of each data block includes metadata information, and the metadata information includes the reference count information of the data block. The distributed device 200 maintains a correspondence between hash values and object identifiers and uses it to perform data storage operations, data deletion operations and data reading operations.
In the application, a data block is first deduplicated and then compressed. Because compressed data blocks differ in length, the length of the data blocks stored at the OSD end is not fixed. The format in which data blocks are stored at the OSD end is shown in Fig. 2; the size of each data block is not a fixed length. In particular, the application adds metadata information to the header of the data block. The metadata information includes the reference count of the data block, the hash algorithm used and the compression algorithm, among other information, and the actual compressed data of the data block is stored after the header.
The reference count represents the number of times identical data has been stored, that is, the number of times the data block has been deduplicated.
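The block format described above (a metadata header followed by the compressed payload) can be sketched as below. The application does not specify field widths or encodings, so the header layout, the numeric algorithm ids and the names `pack_block` and `unpack_block` are purely hypothetical.

```python
import struct
import zlib

# Hypothetical header layout (not specified in the application):
# reference count (uint32), hash algorithm id (uint8),
# compression algorithm id (uint8), payload length (uint32).
HEADER = struct.Struct("<IBBI")
HASH_SHA256 = 1  # assumed id for the hash algorithm
COMP_ZLIB = 1    # assumed id for the compression algorithm


def pack_block(data: bytes, ref_count: int = 1) -> bytes:
    """Compress the payload and prepend the metadata header."""
    payload = zlib.compress(data)
    return HEADER.pack(ref_count, HASH_SHA256, COMP_ZLIB, len(payload)) + payload


def unpack_block(blob: bytes):
    """Return (ref_count, hash_id, comp_id, original data)."""
    ref_count, hash_id, comp_id, size = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:HEADER.size + size]
    return ref_count, hash_id, comp_id, zlib.decompress(payload)
```

Because the payload length depends on how well the data compresses, blocks packed this way vary in size, which matches the remark that stored blocks are not of fixed length.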
The distributed device 200 stores the correspondence between hash values and object identifiers. A hash value is the data value obtained by performing a hash operation on a data block and uniquely represents that data block. An object identifier is the identifier that represents a data block in the distributed device. In the distributed device, after a data block is stored to an OSD, the correspondence between the object identifier and the hash value of that data block is created in the correspondence between hash values and object identifiers.
Specifically, the distributed device 200 can use a fingerprint library to store the correspondence between hash values and object identifiers. The fingerprint library is implemented by a distributed KV (Key-Value) database and stores the correspondence between data block hash values and data block identifiers (oid). In a distributed implementation, to prevent a single point of failure from causing loss of fingerprint data, a Redis-based distributed database is set up on each client and each storage node to hold the KV database. As a reliable distributed database, Redis keeps the data fingerprints consistent across the clients with relatively high synchronization efficiency, and ensures that when the fingerprint data of a single storage node fails, the fingerprint data is recovered when the environment restarts.
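As a rough picture of what the fingerprint library holds, the sketch below uses a plain in-memory dictionary as a stand-in for the distributed KV database. It does not use or imitate any Redis API, and the class and method names are hypothetical.

```python
class FingerprintStore:
    """In-memory stand-in for the KV fingerprint library (oid -> hash value)."""

    def __init__(self):
        self._hash_by_oid = {}

    def put(self, oid: str, digest: str) -> None:
        """Record the correspondence between an object identifier and a hash."""
        self._hash_by_oid[oid] = digest

    def get(self, oid: str):
        """Return the hash value for `oid`, or None if it is not recorded."""
        return self._hash_by_oid.get(oid)

    def remove(self, oid: str) -> None:
        """Delete the record for `oid` if present."""
        self._hash_by_oid.pop(oid, None)

    def has_oid(self, oid: str) -> bool:
        return oid in self._hash_by_oid

    def has_hash(self, digest: str) -> bool:
        # Linear scan; a real deployment would likely keep a reverse index.
        return digest in self._hash_by_oid.values()
```

The two lookups correspond to the two checks in the storage flow: whether an identifier is already known, and whether a hash value has already been stored.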
The application builds a distributed deduplication, compression and storage method for Ceph. While the original data stream is being issued, the object identifier (oid_id) of each data block is intercepted and redirected, and the subsequent deduplication and compressed-storage operations are then performed. The objects operated on in the application are the data chunks in Ceph (hereinafter referred to as data blocks): deduplication is performed with the data block as the unit, and compression is applied to the data blocks that need to be stored, which are then issued to the back-end storage (hard disk) for storage.
The data storage method, data reading method and data deletion method provided by the application are implemented in the RADOS layer and are transparent to the upper-layer RBD, RGW and file system, so no upper-layer code needs to be modified.
Referring specifically to Fig. 3, which shows a flow chart of the data storage method provided by the application, the method comprises:
Step 101, determine the first data block to be stored and the first identifier of the first data block.
Step 102, determine whether the stored correspondence between hash values and object identifiers includes the first identifier. If it does, perform step 103; if it does not, perform step 104. The object identifiers in the correspondence between hash values and object identifiers are identifiers of stored data blocks.
The distributed device 200 stores the correspondence between hash values and object identifiers, which can specifically be a table. The table can be searched to check whether the first identifier exists in it. If it exists, the back end has already stored a data block with the first identifier.
Step 103, determine the type of write operation to perform on the first data block according to the data block length of the first data block and the data block length of the existing second data block corresponding to the first identifier, and process the first data block using the write operation method corresponding to that write operation type to obtain the target data block.
The write operation types in the application can include rewrite and modify-write.
Rewrite means that the data block to be written (the first data block) already exists in the storage system and its data block length is equal to the data block length of the existing data block in the storage system; the data block to be written overwrites the existing data block in the storage system as a whole.
Modify-write means that the data block to be written (the first data block) already exists in the storage system and its data block length is only a part of the data block length of the existing data block in the storage system.
Thus, when the data block length of the first data block is equal to the data block length of the existing second data block corresponding to the first identifier, the write operation type to perform on the first data block is determined to be rewrite; when the data block length of the first data block is less than the data block length of the existing second data block corresponding to the first identifier, the write operation type to perform on the first data block is determined to be modify-write.
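The length comparison that selects the write operation type can be sketched as a small function. The names `WriteType` and `classify_write` are hypothetical, and the text does not say what happens when the new block is longer than the existing one, so this sketch simply rejects that case.

```python
from enum import Enum


class WriteType(Enum):
    REWRITE = "rewrite"
    MODIFY_WRITE = "modify-write"


def classify_write(new_len: int, existing_len: int) -> WriteType:
    """Pick the write operation type from the two data block lengths."""
    if new_len == existing_len:
        return WriteType.REWRITE
    if new_len < existing_len:
        return WriteType.MODIFY_WRITE
    # Assumption: this case is not covered by the description.
    raise ValueError("new block is longer than the existing block")
```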
Specifically, when the write operation type to perform on the first data block is determined to be rewrite, the application deletes, from the correspondence between hash values and object identifiers, the correspondence between the first identifier and the hash value of the second data block, and decrements the reference count value in the second data block by one. The newly written first data block is then taken as the target data block.
When the write operation type to perform on the first data block is determined to be modify-write, the application deletes, from the correspondence between hash values and object identifiers, the correspondence between the first identifier and the hash value of the second data block, and decrements the reference count value in the second data block by one. At the same time, the second data block is fetched from the back end and decompressed to obtain the second data content of the second data block. The second data content is then merged with the first data content of the first data block, and the data block obtained after merging is taken as the target data block.
The merge process in the application varies with the difference between the second data content and the first data content. For example, if the first data content replaces part of the data in the second data content, the first data content replaces the corresponding part of the second data content. If the first data content adds content to the second data content, the first data content is added to the second data content. Data merging is a mature existing technique and is not described further here.
In particular, when the reference count value in the second data block is 0, the application directly deletes the second data block.
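One simple way to picture the merge in a modify-write is an overlay at a byte offset, which covers both the replace case and the append case mentioned above. The offset parameter and the name `merge_block` are assumptions; the application leaves the merge technique to existing methods.

```python
def merge_block(old: bytes, new: bytes, offset: int) -> bytes:
    """Overlay `new` onto `old` starting at `offset`.

    Bytes of `old` covered by `new` are replaced; if `new` runs past the
    end of `old`, the extra bytes are appended.
    """
    if offset < 0 or offset > len(old):
        raise ValueError("offset must lie within or at the end of the old block")
    return old[:offset] + new + old[offset + len(new):]
```

For instance, overlaying `b"XY"` on `b"abcdef"` at offset 2 replaces the middle two bytes, while using offset 6 appends them.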
Step 104, take the first data block as the target data block.
If the correspondence between hash values and object identifiers does not include the first identifier, the first data block is directly taken as the target data block.
Step 105, calculate the target hash value of the target data block.
The application calculates the target hash value of the target data block using a preset hash algorithm. Information about the hash algorithm used for the target data block can be stored in the header metadata information of the target data block.
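Computing the target hash value with a configurable algorithm might look like this. The application does not name the preset hash algorithm, so SHA-256 here is only an assumed default, and `block_hash` is a hypothetical helper.

```python
import hashlib


def block_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` under the named hash algorithm."""
    return hashlib.new(algorithm, data).hexdigest()
```

Keeping the algorithm name as a parameter mirrors the idea of recording which algorithm was used in the block header, so a block can later be verified with the same algorithm.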
Step 106, if the target hash value is found in the correspondence between hash values and object identifiers, do not store the target data block; locate the original data block corresponding to the target hash value according to the target hash value, and increment the reference count value in the original data block by one.
Preferably, the application can also store the correspondence between the target hash value and the first identifier.
Step 107, if the target hash value is not found in the correspondence between hash values and object identifiers, add the correspondence between the target hash value and the first identifier to the correspondence between hash values and object identifiers, compress the target data block, store the compressed target data block, and increment the reference count value in the target data block by one.
When storing the correspondence between the target hash value and the first identifier, the application can synchronously update that correspondence into the KV database.
To avoid repeated storage, after calculating the target hash value of the target data block and before storing the target data block, the application first searches the correspondence between hash values and object identifiers for the target hash value. If it is included, the target data block has already been stored in an OSD; if it is not included, the target data block has not been stored in an OSD.
In the application, if the target hash value is found in the correspondence between hash values and object identifiers, the target data block has been stored before. To avoid repeated storage, the target data block is therefore not stored again; instead, the original data block corresponding to the target hash value is located according to the target hash value, and the reference count value in the original data block is incremented by one to record that the target data block has been stored once more.
Preferably, if the correspondence between the target hash value and the first identifier does not exist in the correspondence between hash values and object identifiers, that correspondence is added to the correspondence between hash values and object identifiers, thereby storing the correspondence between the target hash value and the first identifier.
If the target hash value is not found in the correspondence between hash values and object identifiers, the target data block has not been stored in an OSD. The correspondence between the target hash value and the first identifier is therefore stored, that is, added to the correspondence between hash values and object identifiers, and the target data block is compressed. The storage region (target OSD) of the target data block is then determined based on the target hash value, and the compressed target data block is stored to the determined storage region. At the same time, the application increments the reference count value in the target data block by one. For a data block stored for the first time, the reference count value after the increment is 1.
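Steps 105 to 107 can be summarized in a short sketch. It assumes SHA-256 as the preset hash algorithm, zlib as the compression algorithm, and in-memory dictionaries in place of the fingerprint library and the OSD back end; it also assumes the identifier is new (the step 104 path) and does not model choosing a target OSD from the hash. The class name `DedupStore` is hypothetical.

```python
import hashlib
import zlib


class DedupStore:
    """Toy deduplicating store: one compressed copy per distinct content."""

    def __init__(self):
        self.hash_by_oid = {}  # object identifier -> hash value
        self.blocks = {}       # hash value -> [ref_count, compressed bytes]

    def store(self, oid: str, target: bytes) -> bool:
        """Store `target` under `oid`; return True if new data was written."""
        digest = hashlib.sha256(target).hexdigest()  # step 105
        self.hash_by_oid[oid] = digest               # record oid -> hash
        if digest in self.blocks:
            # Step 106: identical content already stored, only count it.
            self.blocks[digest][0] += 1
            return False
        # Step 107: first copy of this content, compress and store it.
        self.blocks[digest] = [1, zlib.compress(target)]
        return True
```

Storing the same content under two identifiers leaves a single compressed copy with a reference count of 2, which is the space saving the description is after.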
It should be noted that the compression of the target data block in step 107 of the present application may also be performed after step 105, that is, the target data block may be compressed immediately after its target hash value has been calculated. The present application does not limit this execution order.
The above embodiment of the present application provides an implementation of data deduplication and compressed storage. The deduplication system uses inline, fixed-length, block-level, source-side deduplication: when a data block is written, deduplication is performed first, the data block is then compressed, and the result is sent to the OSD side for storage.
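As a rough illustration of fixed-length, block-level deduplication, the following Python sketch splits a byte string into equal-sized blocks and counts how many distinct blocks remain after hashing. The function names, the block size used in the example, and the choice of SHA-256 are assumptions of this sketch and are not specified by the application.

```python
import hashlib


def split_fixed(data: bytes, block_size: int) -> list:
    """Split data into consecutive blocks of block_size bytes (the last may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def unique_block_count(data: bytes, block_size: int) -> int:
    """Count distinct blocks by hash value; identical blocks share one hash."""
    return len({hashlib.sha256(block).digest() for block in split_fixed(data, block_size)})
```

For example, 24 bytes made of the runs `A*8`, `B*8`, `A*8` split into three 8-byte blocks, of which only two are distinct, so only two blocks would need to be stored.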
Therefore, with the data storage method provided by the present application, when data is stored: if the correspondence between hash values and object identifiers includes the target hash value of the target data block, the target data block is not stored again, and the reference count value in the original data block corresponding to the target hash value is incremented by one; if the correspondence does not include the target hash value, the correspondence between the target hash value and the first identifier is stored, the target data block is compressed, and the compressed target data block is stored. During data storage, multiple copies of identical data are not stored repeatedly, so the storage space holds only one copy of identical data, which saves storage space on the storage medium. Moreover, what is stored is the compressed data block. Compared with the prior-art approach of storing data directly, this reduces the storage space required by each data block and further saves storage space on the storage medium.
Based on the above embodiment of the present application, and on the basis of the data storage method shown in Fig. 3, the present application further provides a data reading method which, as shown in Fig. 4, includes:
Step 201: determine the target identifier of the target data block to be read.
Step 202: determine whether the correspondence between hash values and object identifiers includes the target identifier. If it does, perform step 203; if it does not, perform step 206. The object identifiers in the correspondence between hash values and object identifiers are the identifiers of stored data blocks.
Step 203: obtain, from the stored correspondence between hash values and object identifiers, the target hash value corresponding to the target identifier.
Step 204: obtain the target data block based on the target hash value.
In the present application, a PG identifier is calculated from the target hash value, an OSD identifier is calculated from the PG identifier, and the target data block is obtained from the OSD using the OSD identifier. The specific implementation of step 204 is very mature in the prior art and is not described again here.
Step 205: decompress the target data block and read the data of the target data block.
Step 206: return error prompt information, where the error prompt information indicates that the target identifier does not exist in the storage device.
If the correspondence between hash values and object identifiers does not include the target identifier, the target data block corresponding to the target identifier has not been stored. Error prompt information is therefore returned to the terminal to notify the user that the target identifier is wrong.
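Steps 201 to 206 can be sketched in Python as shown below. This is a minimal sketch under stated assumptions: the hash-to-PG and PG-to-OSD calculations are replaced by simple modulo operations (a real deployment would use its own placement algorithm), `PG_COUNT` and `OSD_COUNT` are arbitrary example values, each OSD is modelled as a dictionary from hash value to compressed bytes, and zlib stands in for the unspecified compression algorithm. All function and class names are hypothetical.

```python
import hashlib
import zlib

PG_COUNT = 8   # hypothetical number of placement groups
OSD_COUNT = 3  # hypothetical number of OSDs


class TargetIdNotFound(LookupError):
    """Error prompt: the target identifier does not exist in the storage device."""


def pg_for_hash(hash_hex: str) -> int:
    # Assumption: the PG identifier is the hash value modulo the PG count.
    return int(hash_hex, 16) % PG_COUNT


def osd_for_pg(pg_id: int) -> int:
    # Assumption: the OSD identifier is the PG identifier modulo the OSD count.
    return pg_id % OSD_COUNT


def read_block(target_id: str, id_to_hash: dict, osds: list) -> bytes:
    # Steps 202 and 206: report an error if the identifier is unknown.
    target_hash = id_to_hash.get(target_id)
    if target_hash is None:
        raise TargetIdNotFound(target_id)
    # Steps 203 and 204: hash value -> PG -> OSD -> compressed block.
    osd = osds[osd_for_pg(pg_for_hash(target_hash))]
    # Step 205: decompress and return the block's data.
    return zlib.decompress(osd[target_hash])
```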
Based on the above embodiment of the present application, and on the basis of the data storage method shown in Fig. 3, the present application further provides a data deletion method which, as shown in Fig. 5, includes:
Step 301: determine the target identifier of the target data block to be deleted.
Step 302: determine whether the correspondence between hash values and object identifiers includes the target identifier. If it does, perform step 303; if it does not, perform step 306. The object identifiers in the correspondence between hash values and object identifiers are the identifiers of stored data blocks.
Step 303: obtain, from the correspondence between hash values and object identifiers, the target hash value corresponding to the target identifier.
Step 304: delete, from the stored correspondence between hash values and object identifiers, the record that associates the target identifier with the target hash value.
Step 305: obtain the target data block based on the target hash value, and decrement the reference count value in the target data block by one.
In particular, when the reference count value in the target data block is 0, all users have deleted the target object corresponding to the target hash value, so the target data block can be deleted to free storage space on the storage medium.
Step 306: return error prompt information, where the error prompt information indicates that the target identifier does not exist in the storage device.
If the correspondence between hash values and object identifiers does not include the target identifier, the target data block corresponding to the target identifier has not been stored. Error prompt information is therefore returned to the terminal to notify the user that the target identifier is wrong.
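Steps 301 to 306 can be sketched in Python as follows. The sketch makes simplifying assumptions: the reference count is kept in a separate dictionary `ref_counts` rather than in the block itself, `blocks` is an in-memory dictionary keyed by hash value, and the function name `delete_block` is hypothetical.

```python
def delete_block(target_id: str, id_to_hash: dict, ref_counts: dict, blocks: dict) -> bool:
    """Delete one identifier; return True if the underlying block was reclaimed."""
    # Steps 302 and 306: report an error if the identifier is unknown.
    target_hash = id_to_hash.get(target_id)
    if target_hash is None:
        raise LookupError(f"target identifier {target_id!r} does not exist")

    # Step 304: remove the record linking the identifier to the hash value.
    del id_to_hash[target_id]

    # Step 305: decrement the reference count of the block.
    ref_counts[target_hash] -= 1

    # When no identifier references the block any more, free its storage.
    if ref_counts[target_hash] == 0:
        del ref_counts[target_hash]
        del blocks[target_hash]
        return True
    return False
```

With two identifiers pointing at the same block, the first deletion only lowers the count and the second deletion reclaims the block.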
If the functions described in the method of this embodiment are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
1. A data storage method, characterized by comprising:
determining a first data block to be stored and a first identifier of the first data block;
determining whether a stored correspondence between hash values and object identifiers includes the first identifier, wherein the object identifiers in the correspondence between hash values and object identifiers are identifiers of stored data blocks;
if so, determining a write operation type to be performed on the first data block according to the data block length of the first data block and the data block length of an existing second data block corresponding to the first identifier, and processing the first data block using the write operation method corresponding to the write operation type to obtain a target data block;
if not, determining the first data block to be the target data block;
calculating a target hash value of the target data block;
if the target hash value is found in the correspondence between hash values and object identifiers, not storing the target data block, locating the original data block corresponding to the target hash value according to the target hash value, and incrementing the reference count value in the original data block by one;
if the target hash value is not found in the correspondence between hash values and object identifiers, adding the correspondence between the target hash value and the first identifier to the correspondence between hash values and object identifiers, compressing the target data block, storing the compressed target data block, and incrementing the reference count value in the target data block by one.
2. The method according to claim 1, characterized in that determining the write operation type to be performed on the first data block according to the data block length of the first data block and the data block length of the existing second data block corresponding to the first identifier comprises:
when the data block length of the first data block is equal to the data block length of the existing second data block corresponding to the first identifier, determining that the write operation type to be performed on the first data block is rewrite;
when the data block length of the first data block is less than the data block length of the existing second data block corresponding to the first identifier, determining that the write operation type to be performed on the first data block is modify-write.
3. The method according to claim 2, characterized in that, when the write operation type to be performed on the first data block is determined to be rewrite, processing the first data block using the write operation method corresponding to the write operation type to obtain the target data block comprises:
deleting, from the correspondence between hash values and object identifiers, the correspondence between the first identifier and the hash value of the second data block, and decrementing the reference count value in the second data block by one;
determining the first data block to be the target data block.
4. The method according to claim 2, characterized in that, when the write operation type to be performed on the first data block is determined to be modify-write, processing the first data block using the write operation method corresponding to the write operation type to obtain the target data block comprises:
deleting, from the correspondence between hash values and object identifiers, the correspondence between the first identifier and the hash value of the second data block, and decrementing the reference count value in the second data block by one;
decompressing the second data block to obtain second data content of the second data block;
merging the second data content with first data content of the first data block, and determining the data block obtained after merging to be the target data block.
5. The method according to claim 3 or 4, characterized in that the method further comprises:
when the reference count value in the second data block is 0, deleting the second data block.
6. A data reading method, characterized by comprising:
determining a target identifier of a target data block to be read;
obtaining, from a stored correspondence between hash values and object identifiers, a target hash value corresponding to the target identifier;
obtaining the target data block based on the target hash value;
decompressing the target data block and reading the data of the target data block.
7. The method according to claim 6, characterized in that, after determining the target identifier of the target data block to be read, the method further comprises:
determining whether the correspondence between hash values and object identifiers includes the target identifier, wherein the object identifiers in the correspondence between hash values and object identifiers are identifiers of stored data blocks;
if so, performing the step of obtaining, from the stored correspondence between hash values and object identifiers, the target hash value corresponding to the target identifier;
if not, returning error prompt information, wherein the error prompt information indicates that the target identifier does not exist in the storage device.
8. A data deletion method, characterized by comprising:
determining a target identifier of a target data block to be deleted;
obtaining, from a stored correspondence between hash values and object identifiers, a target hash value corresponding to the target identifier;
deleting, from the stored correspondence between hash values and object identifiers, the record that associates the target identifier with the target hash value;
obtaining the target data block based on the target hash value, and decrementing the reference count value in the target data block by one.
9. The method according to claim 8, characterized in that the method further comprises:
when the reference count value in the target data block is 0, deleting the target data block.
10. The method according to claim 8 or 9, characterized in that, after determining the target identifier of the target data block to be deleted, the method further comprises:
determining whether the correspondence between hash values and object identifiers includes the target identifier, wherein the object identifiers in the correspondence between hash values and object identifiers are identifiers of stored data blocks;
if so, performing the step of obtaining, from the stored correspondence between hash values and object identifiers, the target hash value corresponding to the target identifier;
if not, returning error prompt information, wherein the error prompt information indicates that the target identifier does not exist in the storage device.
11. A data operation system, characterized by comprising:
a terminal, configured to send a file to a distributed device;
the distributed device, configured to perform a pooling operation on the file to obtain multiple data blocks, wherein the header of each data block includes metadata information and the metadata information includes reference count information of the data block; to set a correspondence between hash values and object identifiers; and to use the correspondence between hash values and object identifiers to perform data storage operations, data read operations and data deletion operations.
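For illustration only, the write-type decision of claim 2 and the merge step of claim 4 can be sketched in Python as follows. The names `WriteType`, `classify_write` and `merge_modify` are hypothetical. The claims do not say how the two data contents are merged, so the `offset` parameter (the new content overwrites the existing content starting at that position) is purely an assumption of this sketch. The claims also do not cover a new block longer than the existing one, so that case returns `None` here.

```python
from enum import Enum
from typing import Optional


class WriteType(Enum):
    REWRITE = "rewrite"
    MODIFY_WRITE = "modify-write"


def classify_write(new_len: int, existing_len: int) -> Optional[WriteType]:
    """Pick the write operation type from the two data block lengths."""
    if new_len == existing_len:
        return WriteType.REWRITE
    if new_len < existing_len:
        return WriteType.MODIFY_WRITE
    return None  # not specified by the claims


def merge_modify(existing: bytes, new: bytes, offset: int = 0) -> bytes:
    """Assumed merge rule: overwrite part of the existing content at offset."""
    if offset < 0 or offset + len(new) > len(existing):
        raise ValueError("new content must fit inside the existing block")
    return existing[:offset] + new + existing[offset + len(new):]
```

In both cases the resulting target data block is then hashed, deduplicated and compressed in the same way as any other block.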
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710392823.7A CN107229420B (en) | 2017-05-27 | 2017-05-27 | Data storage method, reading method, deleting method and data operating system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710392823.7A CN107229420B (en) | 2017-05-27 | 2017-05-27 | Data storage method, reading method, deleting method and data operating system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107229420A (en) | 2017-10-03 |
CN107229420B (en) | 2020-05-26 |
Family ID: 59933455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710392823.7A (Active; CN107229420B (en)) | Data storage method, reading method, deleting method and data operating system | 2017-05-27 | 2017-05-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107229420B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108009025A (en) * | 2017-12-13 | 2018-05-08 | 北京小米移动软件有限公司 | Data storage method and device |
CN108197159A (en) * | 2017-12-11 | 2018-06-22 | 厦门集微科技有限公司 | Data reading and writing method and device based on distributed file system |
CN109086172A (en) * | 2018-09-21 | 2018-12-25 | 郑州云海信息技术有限公司 | A kind of method and relevant apparatus of data processing |
CN110399340A (en) * | 2019-06-28 | 2019-11-01 | 苏州浪潮智能科技有限公司 | A kind of document handling method and device |
CN111090620A (en) * | 2019-12-06 | 2020-05-01 | 浪潮电子信息产业股份有限公司 | File storage method, device, equipment and readable storage medium |
CN111258502A (en) * | 2020-01-13 | 2020-06-09 | 深信服科技股份有限公司 | Data deleting method, device, equipment and computer readable storage medium |
WO2020140622A1 (en) * | 2019-01-04 | 2020-07-09 | 平安科技(深圳)有限公司 | Distributed storage system, storage node device and data duplicate deletion method |
CN112099725A (en) * | 2019-06-17 | 2020-12-18 | 华为技术有限公司 | Data processing method and device and computer readable storage medium |
CN113467721A (en) * | 2021-07-22 | 2021-10-01 | 杭州海康威视数字技术股份有限公司 | Data deleting system, method and device |
CN113721836A (en) * | 2021-06-15 | 2021-11-30 | 荣耀终端有限公司 | Data deduplication method and device |
CN113885785A (en) * | 2021-06-15 | 2022-01-04 | 荣耀终端有限公司 | Data deduplication method and device |
CN114442961A (en) * | 2022-02-07 | 2022-05-06 | 苏州浪潮智能科技有限公司 | Data processing method and device, computer equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101908073A (en) * | 2010-08-13 | 2010-12-08 | 清华大学 | Method for deleting duplicated data in file system in real time |
CN101917396A (en) * | 2010-06-25 | 2010-12-15 | 清华大学 | Real-time repetition removal and transmission method for data in network file system |
CN102629247A (en) * | 2011-12-31 | 2012-08-08 | 成都市华为赛门铁克科技有限公司 | Method, device and system for data processing |
CN103154950A (en) * | 2012-05-04 | 2013-06-12 | 华为技术有限公司 | Repeated data deleting method and device |
CN106406759A (en) * | 2016-09-13 | 2017-02-15 | 郑州云海信息技术有限公司 | Data storage method and device |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101917396A (en) * | 2010-06-25 | 2010-12-15 | 清华大学 | Real-time repetition removal and transmission method for data in network file system |
CN101908073A (en) * | 2010-08-13 | 2010-12-08 | 清华大学 | Method for deleting duplicated data in file system in real time |
CN102629247A (en) * | 2011-12-31 | 2012-08-08 | 成都市华为赛门铁克科技有限公司 | Method, device and system for data processing |
CN103154950A (en) * | 2012-05-04 | 2013-06-12 | 华为技术有限公司 | Repeated data deleting method and device |
CN106406759A (en) * | 2016-09-13 | 2017-02-15 | 郑州云海信息技术有限公司 | Data storage method and device |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197159A (en) * | 2017-12-11 | 2018-06-22 | 厦门集微科技有限公司 | Data reading and writing method and device based on distributed file system |
CN108197159B (en) * | 2017-12-11 | 2020-07-10 | 厦门集微科技有限公司 | Data reading and writing method and device based on distributed file system |
CN108009025A (en) * | 2017-12-13 | 2018-05-08 | 北京小米移动软件有限公司 | Data storage method and device |
CN109086172A (en) * | 2018-09-21 | 2018-12-25 | 郑州云海信息技术有限公司 | A kind of method and relevant apparatus of data processing |
WO2020140622A1 (en) * | 2019-01-04 | 2020-07-09 | 平安科技(深圳)有限公司 | Distributed storage system, storage node device and data duplicate deletion method |
CN112099725A (en) * | 2019-06-17 | 2020-12-18 | 华为技术有限公司 | Data processing method and device and computer readable storage medium |
US11797204B2 (en) | 2019-06-17 | 2023-10-24 | Huawei Technologies Co., Ltd. | Data compression processing method and apparatus, and computer-readable storage medium |
CN110399340A (en) * | 2019-06-28 | 2019-11-01 | 苏州浪潮智能科技有限公司 | A kind of document handling method and device |
CN111090620A (en) * | 2019-12-06 | 2020-05-01 | 浪潮电子信息产业股份有限公司 | File storage method, device, equipment and readable storage medium |
WO2021109587A1 (en) * | 2019-12-06 | 2021-06-10 | 浪潮电子信息产业股份有限公司 | File storage method and apparatus, and device and readable storage medium |
CN111090620B (en) * | 2019-12-06 | 2022-04-22 | 浪潮电子信息产业股份有限公司 | File storage method, device, equipment and readable storage medium |
CN111258502A (en) * | 2020-01-13 | 2020-06-09 | 深信服科技股份有限公司 | Data deleting method, device, equipment and computer readable storage medium |
CN113721836A (en) * | 2021-06-15 | 2021-11-30 | 荣耀终端有限公司 | Data deduplication method and device |
CN113885785A (en) * | 2021-06-15 | 2022-01-04 | 荣耀终端有限公司 | Data deduplication method and device |
CN113885785B (en) * | 2021-06-15 | 2022-07-26 | 荣耀终端有限公司 | Data deduplication method and device |
CN113467721A (en) * | 2021-07-22 | 2021-10-01 | 杭州海康威视数字技术股份有限公司 | Data deleting system, method and device |
CN114442961A (en) * | 2022-02-07 | 2022-05-06 | 苏州浪潮智能科技有限公司 | Data processing method and device, computer equipment and storage medium |
CN114442961B (en) * | 2022-02-07 | 2023-08-08 | 苏州浪潮智能科技有限公司 | Data processing method, device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107229420B (en) | 2020-05-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107229420A (en) | Data storage method, reading method, deleting method and data operating system | |
US10776396B2 (en) | Computer implemented method for dynamic sharding | |
CN102246137B (en) | Delta compression after the deletion of identity copy | |
US7117294B1 (en) | Method and system for archiving and compacting data in a data storage array | |
CN102693302B (en) | Quick file comparison method, system and client side | |
US8719237B2 (en) | Method and apparatus for deleting duplicate data | |
CN107220005A (en) | A kind of data manipulation method and system | |
CN102968498A (en) | Method and device for processing data | |
CN103959256A (en) | Fingerprint-based data deduplication | |
US20180004786A1 (en) | Incremental bloom filter rebuild for b+ trees under multi-version concurrency control | |
US10862672B2 (en) | Witness blocks in blockchain applications | |
CN105009067A (en) | Managing operations on stored data units | |
CN108090125B (en) | Non-query type repeated data deleting method and device | |
US11620065B2 (en) | Variable length deduplication of stored data | |
CN112612576B (en) | Virtual machine backup method and device, electronic equipment and storage medium | |
CN104036187A (en) | Method and system for determining computer virus types | |
CN105493080B (en) | The method and apparatus of data de-duplication based on context-aware | |
CN104246718A (en) | Segment combining for deduplication | |
CN113767378A (en) | File system metadata deduplication | |
US10579586B1 (en) | Distributed hash object archive system | |
KR20140050999A (en) | Device and method of data compression and computer-readable recording medium thereof | |
CN109947776B (en) | Data compression and decompression method and device | |
US10949088B1 (en) | Method or an apparatus for having perfect deduplication, adapted for saving space in a deduplication file system | |
CN115543918A (en) | File snapshot method, system, electronic equipment and storage medium | |
CN104866535A (en) | Compression method and device of number segment records |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20200428. Address after: 215100 No. 1 Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province. Applicant after: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd. Address before: 450018 Henan province Zheng Dong New District of Zhengzhou City Xinyi Road No. 278 16 floor room 1601. Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | |