CN107229420A - Data storage method, reading method, deletion method and data operation system - Google Patents


Info

Publication number
CN107229420A
Authority
CN
China
Prior art keywords
data block
target
hash value
data
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710392823.7A
Other languages
Chinese (zh)
Other versions
CN107229420B (en)
Inventor
王利朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710392823.7A
Publication of CN107229420A
Application granted
Publication of CN107229420B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data storage method, comprising: determining a first data block to be stored and a first identifier of the first data block; judging whether the mapping between hash values and object identifiers includes the first identifier; if it does, processing the first data block with the write operation method determined for it to obtain a target data block; if it does not, taking the first data block as the target data block; calculating a target hash value of the target data block; if the target hash value is found in the mapping between hash values and object identifiers, not storing the target data block, locating the original data block corresponding to the target hash value, and incrementing the reference count in the original data block by one; and if the target hash value is not found, storing the mapping between the target hash value and the first identifier, compressing the target data block, and storing the compressed target data block. The application saves storage space on the storage medium.

Description

Data storage method, reading method, deletion method and data operation system
Technical field
The application relates to the field of computer technology, and in particular to a data storage method, a data reading method, a data deletion method, and a data operation system.
Background
Ceph is an open-source, distributed, unified storage system. Its design goal is to build a high-performance, highly scalable, and highly available storage system on inexpensive storage media, providing file storage, block storage, and object storage through a single unified system.
However, when a Ceph-based storage medium stores data, users may store the same data multiple times, so multiple copies of identical data end up on the storage medium, which wastes its storage space.
Summary of the invention
In view of this, the application provides a data storage method, a data reading method, a data deletion method, and a data operation system that save storage space on the storage medium. The technical solutions are as follows:
According to one aspect of the application, a data storage method is provided, comprising:
determining a first data block to be stored and a first identifier of the first data block;
judging whether the stored mapping between hash values and object identifiers includes the first identifier, where the object identifiers in the mapping are the identifiers of stored data blocks;
if it does, determining the type of write operation to perform on the first data block according to the data block length of the first data block and the data block length of the existing second data block corresponding to the first identifier, and processing the first data block with the write operation method corresponding to that type to obtain a target data block;
if it does not, taking the first data block as the target data block;
calculating a target hash value of the target data block;
if the target hash value is found in the mapping between hash values and object identifiers, not storing the target data block, locating the original data block corresponding to the target hash value according to the target hash value, and incrementing the reference count in the original data block by one;
if the target hash value is not found in the mapping between hash values and object identifiers, adding the mapping between the target hash value and the first identifier to it, compressing the target data block, storing the compressed target data block, and incrementing the reference count of the target data block by one.
Preferably, determining the type of write operation to perform on the first data block according to the data block length of the first data block and the data block length of the existing second data block corresponding to the first identifier comprises:
when the data block length of the first data block is equal to the data block length of the existing second data block corresponding to the first identifier, determining that the write operation type performed on the first data block is a rewrite;
when the data block length of the first data block is less than the data block length of the existing second data block corresponding to the first identifier, determining that the write operation type performed on the first data block is a modify-write.
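The length comparison above can be sketched in a few lines of Python. This is only an illustration: the names `WriteType` and `classify_write` are hypothetical, and the text does not say what happens when the new block is longer than the existing one, so the sketch returns `None` in that case rather than guessing.

```python
from enum import Enum


class WriteType(Enum):
    REWRITE = "rewrite"            # new block replaces the stored block entirely
    MODIFY_WRITE = "modify-write"  # new block changes only part of the stored block


def classify_write(new_len, existing_len):
    """Pick the write operation type from the two data block lengths."""
    if new_len == existing_len:
        return WriteType.REWRITE
    if new_len < existing_len:
        return WriteType.MODIFY_WRITE
    # Assumption: the source text leaves the "longer than existing" case open.
    return None
```

For example, `classify_write(512, 4096)` selects the modify-write path, while two equal lengths select the rewrite path.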
Preferably, when the write operation type performed on the first data block is determined to be a rewrite, processing the first data block with the write operation method corresponding to the write operation type to obtain the target data block comprises:
deleting, from the mapping between hash values and object identifiers, the mapping between the first identifier and the hash value of the second data block, and decrementing the reference count in the second data block by one;
taking the first data block as the target data block.
Preferably, when the write operation type performed on the first data block is determined to be a modify-write, processing the first data block with the write operation method corresponding to the write operation type to obtain the target data block comprises:
deleting, from the mapping between hash values and object identifiers, the mapping between the first identifier and the hash value of the second data block, and decrementing the reference count in the second data block by one;
decompressing the second data block to obtain the second data content of the second data block;
merging the second data content with the first data content of the first data block, and taking the data block obtained after merging as the target data block.
Preferably, the method further comprises:
deleting the second data block when the reference count in the second data block is 0.
According to another aspect of the application, a data reading method is also provided, comprising:
determining a target identifier of the target data block to be read;
obtaining, from the stored mapping between hash values and object identifiers, the target hash value corresponding to the target identifier;
obtaining the target data block based on the target hash value;
decompressing the target data block and reading the data of the target data block.
Preferably, after determining the target identifier of the target data block to be read, the method further comprises:
judging whether the mapping between hash values and object identifiers includes the target identifier, where the object identifiers in the mapping are the identifiers of stored data blocks;
if it does, performing the step of obtaining, from the stored mapping between hash values and object identifiers, the target hash value corresponding to the target identifier;
if it does not, returning an error message indicating that the target identifier does not exist in the storage device.
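The reading steps above can be illustrated with a minimal Python sketch. It assumes two plain dictionaries as stand-ins (one for the mapping from object identifier to hash value, one for the compressed blocks keyed by hash value) and assumes zlib as the compression algorithm; none of these choices, nor the names `read_block` and `UnknownIdentifierError`, come from the source.

```python
import zlib


class UnknownIdentifierError(KeyError):
    """Raised when the target identifier is not in the mapping (hypothetical name)."""


def read_block(target_id, oid_to_hash, blocks_by_hash):
    """Return the decompressed content of the block named by target_id.

    oid_to_hash:    object identifier -> hash value (stand-in for the stored mapping)
    blocks_by_hash: hash value -> compressed block bytes (stand-in for the OSDs)
    """
    # Check that the identifier is known; otherwise report an error.
    if target_id not in oid_to_hash:
        raise UnknownIdentifierError(target_id)
    # Identifier -> hash value -> compressed block -> original data.
    target_hash = oid_to_hash[target_id]
    compressed = blocks_by_hash[target_hash]
    return zlib.decompress(compressed)
```

Because several identifiers can map to the same hash value, two different identifiers may return the same content from a single stored block.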
According to yet another aspect of the application, a data deletion method is also provided, comprising:
determining a target identifier of the target data block to be deleted;
obtaining, from the stored mapping between hash values and object identifiers, the target hash value corresponding to the target identifier;
deleting, from the stored mapping between hash values and object identifiers, the record that maps the target identifier to the target hash value;
obtaining the target data block based on the target hash value, and decrementing the reference count in the target data block by one.
Preferably, the method further comprises:
deleting the target data block when the reference count in the target data block is 0.
Preferably, after determining the target identifier of the target data block to be deleted, the method further comprises:
judging whether the mapping between hash values and object identifiers includes the target identifier, where the object identifiers in the mapping are the identifiers of stored data blocks;
if it does, performing the step of obtaining, from the stored mapping between hash values and object identifiers, the target hash value corresponding to the target identifier;
if it does not, returning an error message indicating that the target identifier does not exist in the storage device.
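A minimal Python sketch of the deletion steps follows. The dictionaries and the `refcount` field are illustrative stand-ins for the stored mapping and for the reference count kept in each block's header; the function name `delete_block` is hypothetical.

```python
def delete_block(target_id, oid_to_hash, blocks_by_hash):
    """Drop one reference to a block and free the block when none remain.

    oid_to_hash:    object identifier -> hash value
    blocks_by_hash: hash value -> {"refcount": int, "payload": bytes}
    Returns True if the underlying block itself was deleted.
    """
    if target_id not in oid_to_hash:
        raise KeyError(f"identifier {target_id!r} does not exist in the storage device")
    # Remove the identifier -> hash record, keeping the hash for the lookup below.
    target_hash = oid_to_hash.pop(target_id)
    block = blocks_by_hash[target_hash]
    block["refcount"] -= 1
    # Only a block that nothing references any more is physically removed.
    if block["refcount"] == 0:
        del blocks_by_hash[target_hash]
        return True
    return False
```

With two identifiers sharing one block, the first deletion only lowers the count; the second one removes the block.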
According to yet another aspect of the application, a data operation system is also provided, comprising:
a terminal, configured to send a file to a distributed device;
the distributed device, configured to perform a pooling operation on the file to obtain multiple data blocks, where the header of each data block includes metadata information that contains the reference count information of the data block, and configured to set up a mapping between hash values and object identifiers and use it to perform data storage, data reading, and data deletion operations.
When storing data, if the mapping between hash values and object identifiers already includes the target hash value of the target data block, the application does not store the target data block again but increments the reference count in the original data block corresponding to the target hash value by one. If the mapping does not include the target hash value, the application stores the mapping between the target hash value and the first identifier, compresses the target data block, and stores the compressed target data block. Identical data is therefore never stored more than once: the storage space holds only one copy, which saves storage space on the storage medium. In addition, because the application stores compressed data blocks, each block needs less space than under the prior-art approach of storing data directly, which saves further storage space.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the application or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below are merely embodiments of the application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a structural diagram of a data operation system provided by an embodiment of the application;
Fig. 2 is a diagram of the format of the data blocks stored at the OSD end in the application;
Fig. 3 is a flow chart of a data storage method provided by an embodiment of the application;
Fig. 4 is a flow chart of a data reading method provided by an embodiment of the application;
Fig. 5 is a flow chart of a data deletion method provided by an embodiment of the application.
Detailed description of the embodiments
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the application.
Explanation of terms:
Distributed storage: storing data dispersed across multiple data storage servers.
PG: Placement Group, a virtual concept in the distributed device.
OSD: Object-based Storage Device.
Hash value: in this embodiment, the data value obtained by performing a hash operation on an object.
To help those skilled in the art understand the application scenario of the application, a data operation system is provided. Referring to Fig. 1, it comprises a terminal 100 and a distributed device 200.
The terminal 100 is configured to send a file to the distributed device 200.
The distributed device 200 is configured to perform a pooling operation on the file to obtain multiple data blocks (oid), each of which has an object identifier (oid_id). The header of each data block includes metadata information that contains the reference count information of the data block. The distributed device 200 also maintains a mapping between hash values and object identifiers and uses it to perform data storage, data deletion, and data reading operations.
In the application, a data block is deduplicated and then compressed. Because compressed data blocks differ in length, the data blocks stored at the OSD end are not of fixed length. Fig. 2 shows the format in which data blocks are stored at the OSD end; the size of each data block is not fixed. In particular, the application adds metadata information to the header of each data block, including the block's reference count, the hash algorithm used, and the compression algorithm used; the header is followed by the actual data of the block after compression.
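As a rough illustration of such a block format, the following Python sketch packs a small header in front of the compressed payload. The exact layout in Fig. 2 is not given in the text, so the field widths, the numeric algorithm codes, and the choice of zlib are all assumptions made for the example.

```python
import struct
import zlib

# Assumed header layout: reference count (uint32), hash algorithm id (uint8),
# compression algorithm id (uint8), little-endian, no padding.
HEADER = struct.Struct("<IBB")
HASH_SHA256 = 1  # hypothetical algorithm codes
COMP_ZLIB = 1


def pack_block(data, refcount=1):
    """Build the stored form of a block: metadata header + compressed data."""
    return HEADER.pack(refcount, HASH_SHA256, COMP_ZLIB) + zlib.compress(data)


def unpack_block(raw):
    """Split a stored block into (refcount, hash algorithm id, original data)."""
    refcount, hash_algo, comp_algo = HEADER.unpack_from(raw)
    if comp_algo != COMP_ZLIB:
        raise ValueError(f"unsupported compression algorithm id {comp_algo}")
    return refcount, hash_algo, zlib.decompress(raw[HEADER.size:])
```

Because the payload length depends on how well the data compresses, blocks produced this way vary in size, which matches the variable-length storage described above.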
The reference count indicates how many times identical data has been stored, that is, how many times the data block would otherwise have been stored redundantly.
The distributed device 200 stores the mapping between hash values and object identifiers. A hash value is the data value obtained by performing a hash operation on a data block and uniquely represents that data block. An object identifier is the identifier that denotes a data block in the distributed device. After a data block is stored to an OSD, the mapping between its object identifier and its hash value is added to the mapping between hash values and object identifiers.
Specifically, the distributed device 200 can store the mapping between hash values and object identifiers in a fingerprint database. The fingerprint database is implemented as a distributed KV (Key-Value) database that stores the mapping between data block hash values and data block identifiers (oid). In a distributed implementation, to keep a single point of failure from causing loss of fingerprint data, a Redis-based distributed database is set up on every client and every storage node to hold the KV database. As a reliable distributed database, Redis keeps the data fingerprints consistent across clients with relatively high synchronization efficiency, and when the fingerprint data of a single storage node fails, the fingerprint data can be recovered when the environment restarts.
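The mapping is consulted in two directions: by object identifier (is this identifier already stored?) and by hash value (is this content already stored?). A minimal in-memory sketch of such an index is shown below. It is not the patent's Redis-backed fingerprint database; the class `FingerprintIndex` and its method names are hypothetical, and plain dictionaries stand in for the distributed KV store.

```python
class FingerprintIndex:
    """In-memory stand-in for the fingerprint database (illustrative only)."""

    def __init__(self):
        self._hash_by_oid = {}   # object identifier -> hash value
        self._oids_by_hash = {}  # hash value -> set of object identifiers

    def put(self, oid, hash_value):
        """Record that `oid` refers to the block with `hash_value`."""
        self.remove(oid)  # drop any previous record for this identifier
        self._hash_by_oid[oid] = hash_value
        self._oids_by_hash.setdefault(hash_value, set()).add(oid)

    def remove(self, oid):
        """Delete the record for `oid`; return its old hash value or None."""
        hash_value = self._hash_by_oid.pop(oid, None)
        if hash_value is not None:
            oids = self._oids_by_hash[hash_value]
            oids.discard(oid)
            if not oids:
                del self._oids_by_hash[hash_value]
        return hash_value

    def hash_for(self, oid):
        """Look up by identifier."""
        return self._hash_by_oid.get(oid)

    def has_hash(self, hash_value):
        """Look up by content hash."""
        return hash_value in self._oids_by_hash
```

Keeping both directions in step inside `put` and `remove` avoids a full scan of the table when checking whether a hash value is already known.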
The application builds a distributed deduplication and compression storage method for Ceph. While the original data stream is being issued, the object identifier (oid_id) of each data block is intercepted and redirected, and the subsequent deduplication, compression, and storage operations are then performed. The input objects of these operations are Ceph data chunks (hereinafter, data blocks): deduplication is performed on data blocks, compression is applied to the data blocks that need to be stored, and the result is issued to the back-end storage (hard disk).
The data storage method, data reading method, and data deletion method provided by the application are implemented in the Rados layer and are transparent to the upper-layer RBD, RGW, and file system, so upper-layer code does not need to change.
Referring to Fig. 3, which shows a flow chart of the data storage method provided by the application, the method comprises:
Step 101: determine a first data block to be stored and a first identifier of the first data block.
Step 102: judge whether the stored mapping between hash values and object identifiers includes the first identifier. If it does, perform step 103; if it does not, perform step 104. The object identifiers in the mapping are the identifiers of stored data blocks.
The distributed device 200 stores the mapping between hash values and object identifiers, which may specifically be a table. The table can be searched for the first identifier. If the first identifier is present, a data block with the first identifier has already been stored at the bottom layer.
Step 103: according to the data block length of the first data block and the data block length of the existing second data block corresponding to the first identifier, determine the type of write operation to perform on the first data block, and process the first data block with the write operation method corresponding to that type to obtain a target data block.
The write operation types in the application include rewrite and modify-write.
A rewrite means that the data block to be written (the first data block) already exists in the storage system and its data block length is equal to that of the existing data block, so the data block to be written overwrites the existing data block in its entirety.
A modify-write means that the data block to be written (the first data block) already exists in the storage system and its data block length covers only part of the data block length of the existing data block.
Thus, when the data block length of the first data block is equal to the data block length of the existing second data block corresponding to the first identifier, the write operation type performed on the first data block is determined to be a rewrite; when the data block length of the first data block is less than the data block length of the existing second data block corresponding to the first identifier, the write operation type is determined to be a modify-write.
Specifically, when the write operation type is determined to be a rewrite, the application deletes, from the mapping between hash values and object identifiers, the mapping between the first identifier and the hash value of the second data block, and decrements the reference count in the second data block by one. The newly written first data block is then taken as the target data block.
When the write operation type is determined to be a modify-write, the application deletes, from the mapping between hash values and object identifiers, the mapping between the first identifier and the hash value of the second data block, and decrements the reference count in the second data block by one. At the same time, the second data block is fetched from the bottom layer and decompressed to obtain its second data content. The second data content is then merged with the first data content of the first data block, and the data block obtained after merging is taken as the target data block.
The merge process depends on how the first data content differs from the second data content. For example, if the first data content replaces part of the second data content, the first data content replaces the corresponding part of the second data content. If the first data content adds content to the second data content, the first data content is added to the second data content. Data merging is a mature existing technique and is not described further here.
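One simple way to picture the merge is as an overlay of the new content onto the decompressed existing content at some byte offset. The sketch below assumes that the write request carries such an offset, which the text does not state; the function name `merge_modify_write` is hypothetical.

```python
def merge_modify_write(existing, new, offset):
    """Overlay `new` onto `existing` starting at byte `offset`.

    Bytes covered by `new` are replaced; if `new` runs past the end of
    `existing`, the result grows (the "add content" case in the text).
    """
    if offset < 0 or offset > len(existing):
        raise ValueError("offset must lie within the existing content or at its end")
    merged = bytearray(existing)
    # Slice assignment replaces the overlapped bytes and appends any remainder.
    merged[offset:offset + len(new)] = new
    return bytes(merged)
```

For instance, `merge_modify_write(b"abcdef", b"XY", 2)` replaces the two middle bytes, while an offset equal to the length of the existing content appends the new content.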
In particular, when the reference count in the second data block is 0, the application deletes the second data block directly.
Step 104: take the first data block as the target data block.
If the mapping between hash values and object identifiers does not include the first identifier, the first data block is taken directly as the target data block.
Step 105: calculate the target hash value of the target data block.
The application calculates the target hash value of the target data block with a preset hash algorithm. Information about the hash algorithm used for the target data block can be stored in the metadata information in the header of the target data block.
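The text does not name the preset hash algorithm. As an illustration only, the sketch below uses SHA-256 from Python's standard `hashlib` module and keeps the algorithm name as a parameter so that it could be recorded in the block header; the function name `block_hash` is hypothetical.

```python
import hashlib


def block_hash(data, algorithm="sha256"):
    """Return the hex digest of a data block under the given algorithm.

    SHA-256 is an assumed default; the source only says "preset hash algorithm".
    """
    return hashlib.new(algorithm, data).hexdigest()
```

Two blocks with identical content always produce the same digest, which is what allows the later lookup to detect duplicates.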
Step 106: if the target hash value is found in the mapping between hash values and object identifiers, do not store the target data block; locate the original data block corresponding to the target hash value according to the target hash value, and increment the reference count in the original data block by one.
Preferably, the application can also store the mapping between the target hash value and the first identifier.
Step 107: if the target hash value is not found in the mapping between hash values and object identifiers, add the mapping between the target hash value and the first identifier to it, compress the target data block, store the compressed target data block, and increment the reference count in the target data block by one.
When storing the mapping between the target hash value and the first identifier, the application can synchronize it to the KV database.
To avoid duplicate storage, after calculating the target hash value of the target data block and before storing the target data block, the application first searches the mapping between hash values and object identifiers for the target hash value. If it is present, the target data block has already been stored in an OSD; if it is not, the target data block has not been stored in an OSD.
In the application, if the target hash value is found in the mapping between hash values and object identifiers, the target data block has been stored before. To avoid duplicate storage, the target data block is not stored again; instead, the original data block corresponding to the target hash value is located by the target hash value, and the reference count in the original data block is incremented by one to record that the target data block has been stored once more.
Preferably, if the mapping between hash values and object identifiers does not contain the mapping between the target hash value and the first identifier, that mapping is added, which stores the mapping between the target hash value and the first identifier.
If the target hash value is not found in the mapping between hash values and object identifiers, the target data block has not been stored in an OSD. The mapping between the target hash value and the first identifier is therefore stored, that is, added to the mapping between hash values and object identifiers; the target data block is compressed; the storage region (target OSD) for the target data block is determined from the target hash value; and the compressed target data block is stored in that region. The application also increments the reference count in the target data block by one. For a data block stored for the first time, the reference count after the increment is 1.
Note that the compression step in step 107 can also be performed right after step 105, that is, the target data block can be compressed as soon as its target hash value has been calculated. The application does not restrict this order of execution.
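Steps 105 to 107 can be sketched together as follows. The sketch makes several assumptions that are not in the source: SHA-256 as the hash algorithm, zlib as the compression algorithm, dictionaries as stand-ins for the mapping and the OSDs, and a simple modulo rule for choosing the target OSD (real Ceph placement uses CRUSH, which is not modelled here). The name `store_block` is hypothetical.

```python
import hashlib
import zlib


def store_block(first_id, data, oid_to_hash, blocks_by_hash, num_osds=4):
    """Store `data` under `first_id` with deduplication and compression.

    oid_to_hash:    object identifier -> hash value
    blocks_by_hash: hash value -> {"refcount", "osd", "payload"}
    Returns True if new data was physically stored, False if deduplicated.
    """
    # Step 105: hash the target data block.
    target_hash = hashlib.sha256(data).hexdigest()
    if target_hash in blocks_by_hash:
        # Step 106: content already stored; only count one more reference.
        blocks_by_hash[target_hash]["refcount"] += 1
        stored = False
    else:
        # Step 107: compress, pick a storage region from the hash, store once.
        blocks_by_hash[target_hash] = {
            "refcount": 1,
            "osd": int(target_hash, 16) % num_osds,  # assumed placement rule
            "payload": zlib.compress(data),
        }
        stored = True
    # Record the identifier -> hash relation in both cases.
    oid_to_hash[first_id] = target_hash
    return stored
```

Storing the same content under two identifiers leaves a single compressed payload with a reference count of 2, which is the space saving the method aims for.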
The above embodiments of the present application delete compression Realization of Storing again there is provided data, and system is deleted again using online, fixed The method that long, block level, source are deleted again, in writing data blocks, is deleted after being finished again, then is compressed processing to data block, And then it is sent to the storage of OSD ends.
Therefore the date storage method that application the application is provided, in data storage, if cryptographic Hash and object identity Include the target cryptographic Hash of target data block in corresponding relation, then no longer store target data block, but by target cryptographic Hash Reference count value in corresponding original data block adds one, and if not including mesh in the corresponding relation of cryptographic Hash and object identity The target cryptographic Hash of data block is marked, then stores the corresponding relation of the target cryptographic Hash and the first mark, and to target data block It is compressed processing, the target data block after storage compression processing.The application is in data storage procedure for many parts of identicals Data will not repeat to store, it is ensured that memory space only stores a identical data, saves the memory space of storage medium.And The application in data storage, storage be compression processing after data block, this is compared to the direct data storage of prior art Means, reduce the memory space needed for data block, save the memory space of storage medium.
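The storage flow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of the application: the class name `DedupStore`, the choice of SHA-256 as the hash function and zlib as the compressor, and the in-memory dictionaries standing in for the mapping table and the OSDs are all hypothetical. It also assumes the identifier has not been stored before, so the overwrite and modify-write cases are not handled.

```python
import hashlib
import zlib


class DedupStore:
    """Toy deduplicating, compressing block store (illustrative only)."""

    def __init__(self):
        # object identifier -> hash value (the "correspondence" in the text)
        self.id_to_hash = {}
        # hash value -> [compressed bytes, reference count]; stands in for the OSDs
        self.blocks = {}

    def store(self, object_id, data):
        # Assumption: object_id is new; overwrite/modify-write is out of scope.
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.blocks:
            # Identical content already stored: only bump the reference count.
            self.blocks[digest][1] += 1
        else:
            # First copy: compress, store, and set the reference count to 1.
            self.blocks[digest] = [zlib.compress(data), 1]
        # Record the correspondence between the hash value and the identifier.
        self.id_to_hash[object_id] = digest
        return digest
```

Storing the same content under two identifiers leaves a single compressed block whose reference count is 2, which is the space saving the paragraph above describes.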
Based on the above embodiments, and building on the data storage method shown in Fig. 3, the application also provides a data reading method which, as shown in Fig. 4, includes:
Step 201: determine the target identifier of the target data block to be read.
Step 202: judge whether the correspondence between hash values and object identifiers includes the target identifier. If it does, perform step 203; if it does not, perform step 206. The object identifiers in the correspondence are the identifiers of stored data blocks.
Step 203: obtain the target hash value corresponding to the target identifier from the stored correspondence between hash values and object identifiers.
Step 204: obtain the target data block based on the target hash value.
In the application, a PG identifier is calculated from the target hash value, an OSD identifier is calculated from the PG identifier, and the target data block is obtained from the OSD using the OSD identifier. Concrete implementations of step 204 are well established in the prior art and are not described further here.
Step 205: decompress the target data block and read its data.
Step 206: return an error message, where the error message indicates that the target identifier does not exist in the storage device.
If the correspondence between hash values and object identifiers does not include the target identifier, no target data block corresponding to the target identifier has been stored. An error message is therefore returned to the terminal to tell the user that the target identifier is wrong.
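Steps 201 to 206 can be sketched as follows. This is a minimal sketch under stated assumptions: the names `locate` and `read_block`, the constants `PG_COUNT` and `OSD_COUNT`, SHA-256 and zlib are all hypothetical, and the modulo arithmetic is only a stand-in for whatever placement algorithm maps a hash value to a PG and a PG to an OSD in a real system.

```python
import hashlib
import zlib

PG_COUNT = 8    # hypothetical number of placement groups
OSD_COUNT = 3   # hypothetical number of OSDs


def locate(digest):
    """Map a hash value to a PG, then the PG to an OSD (modulo stand-in)."""
    pg_id = int(digest, 16) % PG_COUNT
    return pg_id % OSD_COUNT


def read_block(target_id, id_to_hash, osds):
    # Step 202: is the target identifier in the correspondence?
    if target_id not in id_to_hash:
        # Step 206: report that the identifier does not exist.
        raise KeyError(f"identifier {target_id!r} does not exist in storage")
    digest = id_to_hash[target_id]            # step 203
    compressed = osds[locate(digest)][digest]  # step 204
    return zlib.decompress(compressed)         # step 205


# Usage: place one compressed block on its OSD, then read it back.
data = b"hello" * 10
digest = hashlib.sha256(data).hexdigest()
osds = [dict() for _ in range(OSD_COUNT)]
osds[locate(digest)][digest] = zlib.compress(data)
id_to_hash = {"obj-1": digest}
restored = read_block("obj-1", id_to_hash, osds)
```

Raising an exception is one way to model the error message of step 206; a real system would return it to the terminal instead.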
Based on the above embodiments, and building on the data storage method shown in Fig. 3, the application also provides a data deletion method which, as shown in Fig. 5, includes:
Step 301: determine the target identifier of the target data block to be deleted.
Step 302: judge whether the correspondence between hash values and object identifiers includes the target identifier. If it does, perform step 303; if it does not, perform step 306. The object identifiers in the correspondence are the identifiers of stored data blocks.
Step 303: obtain the target hash value corresponding to the target identifier from the correspondence between hash values and object identifiers.
Step 304: delete the record that associates the target identifier with the target hash value from the stored correspondence between hash values and object identifiers.
Step 305: obtain the target data block based on the target hash value, and decrement the reference count value in the target data block by one.
In particular, when the reference count value in the target data block reaches 0, every user has deleted the object corresponding to the target hash value, so the target data block can be deleted to release space on the storage medium.
Step 306: return an error message, where the error message indicates that the target identifier does not exist in the storage device.
If the correspondence between hash values and object identifiers does not include the target identifier, no target data block corresponding to the target identifier has been stored. An error message is therefore returned to the terminal to tell the user that the target identifier is wrong.
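The deletion steps can be sketched in the same spirit. This is an illustrative sketch only: the function name `delete_block`, the dictionary layout with a `"refs"` field, and the use of an exception for the error message are assumptions, not details taken from the application.

```python
def delete_block(target_id, id_to_hash, blocks):
    """Delete one identifier; free the block once nobody references it.

    id_to_hash: object identifier -> hash value
    blocks:     hash value -> {"data": compressed bytes, "refs": int}
    Returns True if the underlying data block was physically removed.
    """
    # Step 302: is the target identifier in the correspondence?
    if target_id not in id_to_hash:
        # Step 306: report that the identifier does not exist.
        raise KeyError(f"identifier {target_id!r} does not exist in storage")
    # Steps 303 and 304: look up the hash value and drop the record.
    digest = id_to_hash.pop(target_id)
    # Step 305: decrement the reference count held with the block.
    entry = blocks[digest]
    entry["refs"] -= 1
    if entry["refs"] == 0:
        # No identifier refers to this content any more: release the space.
        del blocks[digest]
        return True
    return False
```

With two identifiers sharing one block, the first deletion only lowers the count to 1 and the second one removes the block itself.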
If the functions described in the method of this embodiment are implemented as software functional units and sold or used as an independent product, they may be stored in a storage medium readable by a computing device. On this understanding, the part of the embodiments of the application that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and includes a number of instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the application. The application is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A data storage method, characterized by comprising:
determining a first data block to be stored and a first identifier of the first data block;
judging whether the stored correspondence between hash values and object identifiers includes the first identifier, wherein the object identifiers in the correspondence are the identifiers of stored data blocks;
if it is included, determining, according to the data block length of the first data block and the data block length of the existing second data block corresponding to the first identifier, the write operation type to be performed on the first data block, and processing the first data block with the write operation method corresponding to that write operation type to obtain a target data block;
if it is not included, determining the first data block to be the target data block;
calculating a target hash value of the target data block;
if the target hash value is found in the correspondence between hash values and object identifiers, not storing the target data block, finding the original data block corresponding to the target hash value according to the target hash value, and incrementing the reference count value in the original data block by one;
if the target hash value is not found in the correspondence between hash values and object identifiers, adding the correspondence between the target hash value and the first identifier to the correspondence between hash values and object identifiers, compressing the target data block, storing the compressed target data block, and incrementing the reference count value in the target data block by one.
2. The method according to claim 1, characterized in that determining the write operation type to be performed on the first data block according to the data block length of the first data block and the data block length of the existing second data block corresponding to the first identifier comprises:
when the data block length of the first data block is equal to the data block length of the existing second data block corresponding to the first identifier, determining that the write operation type to be performed on the first data block is an overwrite;
when the data block length of the first data block is less than the data block length of the existing second data block corresponding to the first identifier, determining that the write operation type to be performed on the first data block is a modify-write.
3. The method according to claim 2, characterized in that, when the write operation type to be performed on the first data block is determined to be an overwrite, processing the first data block with the write operation method corresponding to the write operation type to obtain the target data block comprises:
deleting the correspondence between the first identifier and the hash value of the second data block from the correspondence between hash values and object identifiers, and decrementing the reference count value in the second data block by one;
determining the first data block to be the target data block.
4. The method according to claim 2, characterized in that, when the write operation type to be performed on the first data block is determined to be a modify-write, processing the first data block with the write operation method corresponding to the write operation type to obtain the target data block comprises:
deleting the correspondence between the first identifier and the hash value of the second data block from the correspondence between hash values and object identifiers, and decrementing the reference count value in the second data block by one;
decompressing the second data block to obtain the second data content of the second data block;
merging the second data content with the first data content of the first data block, and determining the data block obtained after merging to be the target data block.
5. The method according to claim 3 or 4, characterized in that the method further comprises:
deleting the second data block when the reference count value in the second data block is 0.
6. A data reading method, characterized by comprising:
determining the target identifier of the target data block to be read;
obtaining the target hash value corresponding to the target identifier from the stored correspondence between hash values and object identifiers;
obtaining the target data block based on the target hash value;
decompressing the target data block and reading the data of the target data block.
7. The method according to claim 6, characterized in that, after determining the target identifier of the target data block to be read, the method further comprises:
judging whether the correspondence between hash values and object identifiers includes the target identifier, wherein the object identifiers in the correspondence are the identifiers of stored data blocks;
if it is included, performing the step of obtaining the target hash value corresponding to the target identifier from the stored correspondence between hash values and object identifiers;
if it is not included, returning an error message, wherein the error message indicates that the target identifier does not exist in the storage device.
8. A data deletion method, characterized by comprising:
determining the target identifier of the target data block to be deleted;
obtaining the target hash value corresponding to the target identifier from the stored correspondence between hash values and object identifiers;
deleting the record that associates the target identifier with the target hash value from the stored correspondence between hash values and object identifiers;
obtaining the target data block based on the target hash value, and decrementing the reference count value in the target data block by one.
9. The method according to claim 8, characterized in that the method further comprises:
deleting the target data block when the reference count value in the target data block is 0.
10. The method according to claim 8 or 9, characterized in that, after determining the target identifier of the target data block to be deleted, the method further comprises:
judging whether the correspondence between hash values and object identifiers includes the target identifier, wherein the object identifiers in the correspondence are the identifiers of stored data blocks;
if it is included, performing the step of obtaining the target hash value corresponding to the target identifier from the stored correspondence between hash values and object identifiers;
if it is not included, returning an error message, wherein the error message indicates that the target identifier does not exist in the storage device.
11. A data operating system, characterized by comprising:
a terminal, configured to send a file to a distributed device;
a distributed device, configured to perform a pooling operation on the file to obtain multiple data blocks, wherein the header of each data block includes metadata information and the metadata information includes reference count information of the data block; to set up the correspondence between hash values and object identifiers; and to use the correspondence between hash values and object identifiers to perform data storage operations, data read operations and data deletion operations.
CN201710392823.7A 2017-05-27 2017-05-27 Data storage method, reading method, deleting method and data operating system Active CN107229420B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710392823.7A CN107229420B (en) 2017-05-27 2017-05-27 Data storage method, reading method, deleting method and data operating system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710392823.7A CN107229420B (en) 2017-05-27 2017-05-27 Data storage method, reading method, deleting method and data operating system

Publications (2)

Publication Number Publication Date
CN107229420A CN107229420A (en) 2017-10-03
CN107229420B CN107229420B (en) 2020-05-26

Family

ID=59933455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710392823.7A Active CN107229420B (en) 2017-05-27 2017-05-27 Data storage method, reading method, deleting method and data operating system

Country Status (1)

Country Link
CN (1) CN107229420B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009025A (en) * 2017-12-13 2018-05-08 北京小米移动软件有限公司 Date storage method and device
CN108197159A (en) * 2017-12-11 2018-06-22 厦门集微科技有限公司 Digital independent, wiring method and device based on distributed file system
CN109086172A (en) * 2018-09-21 2018-12-25 郑州云海信息技术有限公司 A kind of method and relevant apparatus of data processing
CN110399340A (en) * 2019-06-28 2019-11-01 苏州浪潮智能科技有限公司 A kind of document handling method and device
CN111090620A (en) * 2019-12-06 2020-05-01 浪潮电子信息产业股份有限公司 File storage method, device, equipment and readable storage medium
CN111258502A (en) * 2020-01-13 2020-06-09 深信服科技股份有限公司 Data deleting method, device, equipment and computer readable storage medium
WO2020140622A1 (en) * 2019-01-04 2020-07-09 平安科技(深圳)有限公司 Distributed storage system, storage node device and data duplicate deletion method
CN112099725A (en) * 2019-06-17 2020-12-18 华为技术有限公司 Data processing method and device and computer readable storage medium
CN113467721A (en) * 2021-07-22 2021-10-01 杭州海康威视数字技术股份有限公司 Data deleting system, method and device
CN113721836A (en) * 2021-06-15 2021-11-30 荣耀终端有限公司 Data deduplication method and device
CN113885785A (en) * 2021-06-15 2022-01-04 荣耀终端有限公司 Data deduplication method and device
CN114442961A (en) * 2022-02-07 2022-05-06 苏州浪潮智能科技有限公司 Data processing method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908073A (en) * 2010-08-13 2010-12-08 清华大学 Method for deleting duplicated data in file system in real time
CN101917396A (en) * 2010-06-25 2010-12-15 清华大学 Real-time repetition removal and transmission method for data in network file system
CN102629247A (en) * 2011-12-31 2012-08-08 成都市华为赛门铁克科技有限公司 Method, device and system for data processing
CN103154950A (en) * 2012-05-04 2013-06-12 华为技术有限公司 Repeated data deleting method and device
CN106406759A (en) * 2016-09-13 2017-02-15 郑州云海信息技术有限公司 Data storage method and device

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197159A (en) * 2017-12-11 2018-06-22 厦门集微科技有限公司 Digital independent, wiring method and device based on distributed file system
CN108197159B (en) * 2017-12-11 2020-07-10 厦门集微科技有限公司 Data reading and writing method and device based on distributed file system
CN108009025A (en) * 2017-12-13 2018-05-08 北京小米移动软件有限公司 Date storage method and device
CN109086172A (en) * 2018-09-21 2018-12-25 郑州云海信息技术有限公司 A kind of method and relevant apparatus of data processing
WO2020140622A1 (en) * 2019-01-04 2020-07-09 平安科技(深圳)有限公司 Distributed storage system, storage node device and data duplicate deletion method
CN112099725A (en) * 2019-06-17 2020-12-18 华为技术有限公司 Data processing method and device and computer readable storage medium
US11797204B2 (en) 2019-06-17 2023-10-24 Huawei Technologies Co., Ltd. Data compression processing method and apparatus, and computer-readable storage medium
CN110399340A (en) * 2019-06-28 2019-11-01 苏州浪潮智能科技有限公司 A kind of document handling method and device
CN111090620A (en) * 2019-12-06 2020-05-01 浪潮电子信息产业股份有限公司 File storage method, device, equipment and readable storage medium
WO2021109587A1 (en) * 2019-12-06 2021-06-10 浪潮电子信息产业股份有限公司 File storage method and apparatus, and device and readable storage medium
CN111090620B (en) * 2019-12-06 2022-04-22 浪潮电子信息产业股份有限公司 File storage method, device, equipment and readable storage medium
CN111258502A (en) * 2020-01-13 2020-06-09 深信服科技股份有限公司 Data deleting method, device, equipment and computer readable storage medium
CN113721836A (en) * 2021-06-15 2021-11-30 荣耀终端有限公司 Data deduplication method and device
CN113885785A (en) * 2021-06-15 2022-01-04 荣耀终端有限公司 Data deduplication method and device
CN113885785B (en) * 2021-06-15 2022-07-26 荣耀终端有限公司 Data deduplication method and device
CN113467721A (en) * 2021-07-22 2021-10-01 杭州海康威视数字技术股份有限公司 Data deleting system, method and device
CN114442961A (en) * 2022-02-07 2022-05-06 苏州浪潮智能科技有限公司 Data processing method and device, computer equipment and storage medium
CN114442961B (en) * 2022-02-07 2023-08-08 苏州浪潮智能科技有限公司 Data processing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN107229420B (en) 2020-05-26

Similar Documents

Publication Publication Date Title
CN107229420A (en) Data storage method, reading method, deleting method and data operating system
US10776396B2 (en) Computer implemented method for dynamic sharding
CN102246137B (en) Delta compression after the deletion of identity copy
US7117294B1 (en) Method and system for archiving and compacting data in a data storage array
CN102693302B (en) Quick file comparison method, system and client side
US8719237B2 (en) Method and apparatus for deleting duplicate data
CN107220005A (en) A kind of data manipulation method and system
CN102968498A (en) Method and device for processing data
CN103959256A (en) Fingerprint-based data deduplication
US20180004786A1 (en) Incremental bloom filter rebuild for b+ trees under multi-version concurrency control
US10862672B2 (en) Witness blocks in blockchain applications
CN105009067A (en) Managing operations on stored data units
CN108090125B (en) Non-query type repeated data deleting method and device
US11620065B2 (en) Variable length deduplication of stored data
CN112612576B (en) Virtual machine backup method and device, electronic equipment and storage medium
CN104036187A (en) Method and system for determining computer virus types
CN105493080B (en) The method and apparatus of data de-duplication based on context-aware
CN104246718A (en) Segment combining for deduplication
CN113767378A (en) File system metadata deduplication
US10579586B1 (en) Distributed hash object archive system
KR20140050999A (en) Device and method of data compression and computer-readable recording medium thereof
CN109947776B (en) Data compression and decompression method and device
US10949088B1 (en) Method or an apparatus for having perfect deduplication, adapted for saving space in a deduplication file system
CN115543918A (en) File snapshot method, system, electronic equipment and storage medium
CN104866535A (en) Compression method and device of number segment records

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200428

Address after: 215100 No. 1 Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province

Applicant after: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 450018 Henan province Zheng Dong New District of Zhengzhou City Xinyi Road No. 278 16 floor room 1601

Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant