CN112817962B - Data storage method and device based on object storage and computer equipment - Google Patents


Info

Publication number
CN112817962B
Authority
CN
China
Legal status
Active
Application number
CN202110280863.9A
Other languages
Chinese (zh)
Other versions
CN112817962A
Inventor
娄永杰
马立珂
王贤达
王子骏
Current Assignee
Guangzhou Dingjia Computer Technology Co ltd
Original Assignee
Guangzhou Dingjia Computer Technology Co ltd
Application filed by Guangzhou Dingjia Computer Technology Co ltd
Priority to CN202110280863.9A
Publication of CN112817962A
Application granted
Publication of CN112817962B
Current legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2228: Indexing structures

Abstract

The application relates to an object-storage-based data storage method and device, computer equipment, and a storage medium. After a data file and an index file are obtained, the data to be stored is divided into at least one data block according to a preset blocking strategy, and the data fingerprint of each data block is sent to a deduplication server. The deduplication server queries whether the data fingerprint already exists and returns the query result to the client. If it does not exist, the client stores the corresponding data block in the data file and sends the block's storage location to the deduplication server; if it does exist, the client writes the data fingerprint into the index file. At least one of the data file and the index file is then sent to the object storage for storage. Whereas the traditional method deduplicates and stores data on a local storage device, this method uses the data fingerprint to decide which type of file the data is stored in and keeps the data in object storage, whose high reliability and elastic scalability improve data storage performance.

Description

Data storage method and device based on object storage and computer equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data storage method and apparatus based on object storage, a computer device, and a storage medium.
Background
A database is a collection of stored data and typically holds a variety of important data, much of which is duplicated. When there is too much duplicate data, storage space is used inefficiently, so duplicate data generally needs to be removed (deduplicated). At present, deduplication is usually performed directly on a local storage device such as a local hard disk. However, this approach is limited by the size of the hard disk, which results in low data storage performance.
Existing data storage methods therefore suffer from low storage performance.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data storage method, apparatus, computer device and storage medium based on object storage, which can improve data storage performance.
An object-storage-based data storage method, applied to a client, comprises the following steps:
in response to a data storage request for data to be stored, sending a data file request to a deduplication server, where the data file request instructs the deduplication server to generate a corresponding file to be stored for the data to be stored, the data comprises at least one data block, and the file to be stored has at least one file type;
dividing the data to be stored into at least one data block according to a preset blocking strategy;
for each data block, obtaining the data fingerprint corresponding to the data block and sending the data fingerprint to the deduplication server, which queries whether the data fingerprint exists and returns a query result;
storing the data block and/or the data fingerprint in the file to be stored of the corresponding file type according to the query result;
and sending the file to be stored to an object storage for storage.
In one embodiment, the file types of the file to be stored include a data file to be stored and an index file to be stored; the data file to be stored holds the data to be stored, and the index file to be stored holds the data fingerprints corresponding to the data to be stored;
storing the data block and/or the data fingerprint in the file to be stored of the corresponding file type according to the query result includes:
if the query result is negative, storing the data block corresponding to the data fingerprint in the data file, writing the data fingerprint into the index file, and sending the storage location of the data block within the data file to the deduplication server; otherwise, writing only the data fingerprint into the index file.
In one embodiment, after the data block corresponding to the data fingerprint is stored in the data file, the method further includes:
if the remaining storage space of the data file is smaller than the size of the next data block in the data to be stored, sending the data file to the object storage for storage;
sending the storage location and a new data file request to the deduplication server, which stores the storage location in the metadata corresponding to the data file and sends a new data file to be stored to the client.
In one embodiment, the method further includes:
in response to a data read instruction, obtaining the index file corresponding to the data read instruction from the object storage;
obtaining at least one data fingerprint from the index file and sending the at least one data fingerprint to the deduplication server, which looks up and returns the location of the data block corresponding to each data fingerprint;
and obtaining the corresponding data blocks from the object storage according to the locations returned by the deduplication server.
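The read path above can be sketched as follows. This is a minimal illustration, not the patented implementation: the hypothetical `restore` function assumes the index file is an ordered list of fingerprints, that the deduplication server's answer is available as a `locations` dictionary mapping each fingerprint to `(data object, offset, length)`, and that a plain dictionary stands in for the object store.

```python
def restore(index_file: list[str],
            locations: dict[str, tuple[str, int, int]],
            object_store: dict[str, bytes]) -> bytes:
    """Rebuild stored data from its index file (an ordered list of fingerprints)."""
    out = bytearray()
    for fp in index_file:
        # The deduplication server's answer: which data object, at what offset, how long.
        obj, offset, length = locations[fp]
        # A real object store would use a ranged GET here instead of slicing a whole object.
        out += object_store[obj][offset:offset + length]
    return bytes(out)


store = {"data/0001": b"AAAABBBB"}
locs = {"fp-a": ("data/0001", 0, 4), "fp-b": ("data/0001", 4, 4)}
print(restore(["fp-a", "fp-b", "fp-a"], locs, store))  # b'AAAABBBBAAAA'
```

Because duplicate blocks share one stored copy, the same fingerprint can appear several times in an index file and is simply read again each time.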
In one embodiment, the method further includes:
sending a data modification instruction to the deduplication server and obtaining the index file to be modified from the object storage, where the deduplication server generates and returns a data file to be modified according to the data modification instruction, and the data modification instruction includes a modification position;
obtaining, from the index file to be modified, the data fingerprint to be modified that corresponds to the modification position, and sending it to the deduplication server, which looks up and returns the location of the corresponding data block to be modified;
obtaining the data block to be modified from the object storage according to its location, and modifying it at the modification position to obtain a modified data block;
generating a modified data fingerprint from the modified data block;
querying the index file to be modified with the modified data fingerprint; if the modified data fingerprint is not referenced by any data block, storing it in the index file to be modified to obtain a modified index file, and sending it to the deduplication server, which queries whether the modified data fingerprint exists and returns the query result to the client;
if the query result is negative, writing the modified data block into the data file to be modified to obtain a modified data file;
and storing the modified index file and the modified data file in the object storage.
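A rough sketch of the modification steps, assuming SHA-256 fingerprints, a dictionary standing in for the object store, and a hypothetical `modify_block` function that overwrites bytes inside one block. It follows a copy-on-write pattern: the old block stays where it is (other index files may still reference it), and the modified block is appended to the data file being modified only if its fingerprint is new. Adjusting the old fingerprint's reference count is left out because the steps above do not spell it out.

```python
import hashlib


def sha(block: bytes) -> str:
    # SHA-256 as the fingerprint algorithm is an assumption.
    return hashlib.sha256(block).hexdigest()


def modify_block(index_file: list[str], position: int, patch_offset: int, patch: bytes,
                 locations: dict[str, tuple[str, int, int]],
                 object_store: dict[str, bytes], new_object: str) -> list[str]:
    """Copy-on-write update of one block; returns the modified index file."""
    obj, off, length = locations[index_file[position]]
    if patch_offset + len(patch) > length:
        raise ValueError("patch must fit inside the block")
    block = bytearray(object_store[obj][off:off + length])
    block[patch_offset:patch_offset + len(patch)] = patch  # apply the change in memory
    new_fp = sha(bytes(block))
    if new_fp not in locations:
        # Query result "no": append the modified block to the data file being modified.
        data = object_store.get(new_object, b"")
        locations[new_fp] = (new_object, len(data), len(block))
        object_store[new_object] = data + bytes(block)
    changed = list(index_file)  # the original index file is left untouched
    changed[position] = new_fp
    return changed


store = {"data/0001": b"AAAABBBB"}
locs = {sha(b"AAAA"): ("data/0001", 0, 4), sha(b"BBBB"): ("data/0001", 4, 4)}
new_index = modify_block([sha(b"AAAA"), sha(b"BBBB")], 1, 0, b"C", locs, store, "data/0002")
print(store["data/0002"])  # b'CBBB'
```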
An object-storage-based data storage method, applied to a deduplication server, comprises the following steps:
in response to a data file request sent by a client, generating a corresponding data file to be stored and index file to be stored;
obtaining the data fingerprint sent by the client for the at least one data block, and querying whether the data fingerprint exists in a fingerprint database;
if it exists, returning the query result to the client and updating the reference count of the data fingerprint in the fingerprint database;
if it does not exist, returning the query result to the client, obtaining from the client the storage location of the corresponding data block within the data file, and storing the storage location and the data fingerprint in the fingerprint database.
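The server-side query and bookkeeping can be sketched with an in-memory dictionary in place of the fingerprint database. The class and method names (`FingerprintDB`, `query`, `record_location`) are hypothetical, and counting one reference on every hit is an assumption about how the reference count is updated.

```python
from dataclasses import dataclass


@dataclass
class Record:
    data_object: str  # data object (data file) holding the block
    offset: int       # block offset inside that object
    refs: int = 1     # reference count


class FingerprintDB:
    """In-memory stand-in for the deduplication server's fingerprint database."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = {}

    def query(self, fp: str) -> bool:
        """Return True if fp is known; a hit adds one reference to the stored block."""
        rec = self.records.get(fp)
        if rec is None:
            return False
        rec.refs += 1
        return True

    def record_location(self, fp: str, data_object: str, offset: int) -> None:
        """Called after a miss, once the client reports where it wrote the block."""
        self.records[fp] = Record(data_object, offset)


db = FingerprintDB()
first = db.query("fp-1")    # miss: the client writes the block to its data file
db.record_location("fp-1", "data/0001", 0)
second = db.query("fp-1")   # hit: only the index file receives the fingerprint
print(first, second, db.records["fp-1"].refs)  # False True 2
```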
In one embodiment, the method further includes:
obtaining a delete instruction sent by the client, where the delete instruction includes storage file information of the data to be deleted;
obtaining the index file corresponding to the data to be deleted from the object storage, and obtaining from it the data fingerprints to be deleted that correspond to the data blocks to be deleted;
removing the reference relation between each data fingerprint to be deleted and its data block to be deleted, and deleting the index file corresponding to the data to be deleted;
the method further includes:
if a data block to be deleted no longer has a reference relation with any data fingerprint, obtaining the target data file containing that data block, and decrementing the count of valid data blocks in the target data file by one;
the method further includes:
if the count of valid data blocks in the target data file reaches zero, deleting the target data file's information from its corresponding metadata.
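The deletion steps can be sketched as below, under simplifying assumptions: index objects are kept as already-parsed fingerprint lists in a dictionary standing in for the object store, and the reference counts, the block-to-data-file mapping, and the per-data-file valid block counts are plain dictionaries. The `delete_data` function and all parameter names are hypothetical.

```python
def delete_data(index_key: str, object_store: dict[str, list[str]],
                refs: dict[str, int], block_object: dict[str, str],
                valid_blocks: dict[str, int]) -> list[str]:
    """Delete one stored dataset; return data files whose metadata entry was dropped."""
    emptied: list[str] = []
    for fp in object_store[index_key]:   # index object: ordered list of fingerprints
        refs[fp] -= 1                    # remove one reference relation
        if refs[fp] == 0:                # the block is no longer referenced at all
            obj = block_object[fp]
            valid_blocks[obj] -= 1       # one fewer valid block in that data file
            if valid_blocks[obj] == 0:
                del valid_blocks[obj]    # drop the data file's metadata entry
                emptied.append(obj)
    del object_store[index_key]          # finally delete the index object itself
    return emptied


store = {"index/a": ["f1", "f2", "f1"], "index/b": ["f2"]}
refs = {"f1": 2, "f2": 2}
where = {"f1": "data/0001", "f2": "data/0001"}
valid = {"data/0001": 2}
print(delete_data("index/a", store, refs, where, valid))  # []
print(delete_data("index/b", store, refs, where, valid))  # ['data/0001']
```

Dropping the metadata entry does not by itself delete the data object from the object store; the steps above only specify the metadata update.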
In one embodiment, the method further includes:
obtaining a data file to be reclaimed whose redundant data proportion is greater than a preset value, where the data file to be reclaimed includes at least one data block, and the redundant data proportion is the proportion of data blocks in the data file that have no reference relation with any data fingerprint to all data blocks in the data file;
obtaining the data fingerprint corresponding to each data block, and determining the reference count of the data fingerprint, where the reference count represents the number of data blocks associated with the data fingerprint;
if the reference count is not zero, determining that the data block is a valid data block, writing it into a new data file, updating its data fingerprint according to its storage location in the new data file, and obtaining an updated data file from the at least one valid data block;
if the reference count is zero, determining that the data block is a redundant data block, deleting its data fingerprint from the corresponding index file, and obtaining a reclaimed data file from the at least one redundant data block;
and storing the updated data file in the object storage and deleting the reclaimed data file.
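A compaction sketch of the reclamation steps, assuming a hypothetical `reclaim` function, an illustrative threshold of 0.5 for the "preset value", a `layout` list giving the fingerprints of the old data file's blocks in order, and dictionaries standing in for the object store, the fingerprint locations, and the reference counts.

```python
def reclaim(old: str, new: str, object_store: dict[str, bytes], layout: list[str],
            locations: dict[str, tuple[str, int, int]], refs: dict[str, int],
            threshold: float = 0.5) -> bool:
    """Rewrite data file `old` into `new` if its redundant-block share exceeds threshold."""
    if not layout:
        return False
    dead = [fp for fp in layout if refs.get(fp, 0) == 0]
    if len(dead) / len(layout) <= threshold:
        return False                        # not enough garbage to justify a rewrite
    src = object_store[old]
    out = bytearray()
    for fp in layout:                       # fingerprints of old's blocks, in order
        _, off, length = locations[fp]
        if refs.get(fp, 0) > 0:
            # Valid block: copy into the new data file and repoint its fingerprint.
            locations[fp] = (new, len(out), length)
            out += src[off:off + length]
        else:
            # Redundant block: drop its fingerprint record.
            del locations[fp]
            refs.pop(fp, None)
    object_store[new] = bytes(out)          # store the updated data file
    del object_store[old]                   # delete the reclaimed data file
    return True


store = {"data/0001": b"AAAABBBBCCCC"}
locs = {"fa": ("data/0001", 0, 4), "fb": ("data/0001", 4, 4), "fc": ("data/0001", 8, 4)}
refs = {"fa": 0, "fb": 1, "fc": 0}
print(reclaim("data/0001", "data/0002", store, ["fa", "fb", "fc"], locs, refs))  # True
print(store, locs)  # {'data/0002': b'BBBB'} {'fb': ('data/0002', 0, 4)}
```

Checking the ratio before copying anything keeps a data file with only a little garbage from being rewritten on every pass.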
An object-storage-based data storage device, applied to a client, comprises:
a sending module, configured to send a data file request to the deduplication server in response to a data storage request for data to be stored, where the data file request instructs the deduplication server to generate a corresponding data file to be stored and index file to be stored for the data to be stored; the data comprises at least one data block; the data file to be stored holds the data to be stored, and the index file to be stored holds the data fingerprints corresponding to the data to be stored;
a dividing module, configured to divide the data to be stored into at least one data block according to a preset blocking strategy;
an acquisition module, configured to obtain the data fingerprint corresponding to each data block and send it to the deduplication server, which queries whether the data fingerprint exists and returns a query result;
a judging module, configured to store the data block corresponding to the data fingerprint in the data file and send the block's storage location within the data file to the deduplication server if the query result is negative, and otherwise to write the data fingerprint into the index file;
and a storage module, configured to send the data file and/or the index file to an object storage for storage.
An object-storage-based data storage device, applied to a deduplication server, comprises:
a generating module, configured to generate a corresponding data file to be stored and index file to be stored in response to a data file request sent by a client;
an acquisition module, configured to obtain the data fingerprint sent by the client for the at least one data block and query whether the data fingerprint exists in a fingerprint database;
a first determining module, configured to return the query result to the client and update the reference count of the data fingerprint in the fingerprint database if the data fingerprint exists,
and, if it does not exist, to return the query result to the client, obtain from the client the storage location of the corresponding data block within the data file, and store the storage location and the data fingerprint in the fingerprint database.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
With the above object-storage-based data storage method, device, computer equipment, and storage medium, when a data storage request is received, the client obtains the data file to be stored and the index file to be stored that the deduplication server generates for the data to be stored, divides the data into at least one data block according to a preset blocking strategy, and sends the data fingerprint of each data block to the deduplication server. The deduplication server queries whether the data fingerprint exists and returns the query result to the client. If the result is negative, the client stores the corresponding data block in the data file and sends its storage location to the deduplication server; if it is positive, the client writes the data fingerprint into the index file. At least one of the data file and the index file is then sent to the object storage for storage. Whereas the traditional approach deduplicates and stores data on a local storage device, this scheme uses the data fingerprint to determine the storage file type for the data to be stored and stores the data in object storage.
Drawings
FIG. 1 is a diagram of an application environment of a data storage method based on object storage in one embodiment;
FIG. 2 is a block diagram of a deduplication server in one embodiment;
FIG. 3 is a diagram illustrating an exemplary structure of an object store;
FIG. 4 is a schematic flow chart diagram illustrating a method for storing data based on object storage according to an embodiment;
FIG. 5 is a flowchart illustrating the step of reading data according to one embodiment;
FIG. 6 is a schematic flow chart diagram illustrating the step of modifying data in one embodiment;
FIG. 7 is a schematic flow chart diagram illustrating a method for storing data based on object storage in one embodiment;
FIG. 8 is a flow diagram that illustrates the step of deleting data in one embodiment;
FIG. 9 is a flowchart illustrating the step of reclaiming redundant data in one embodiment;
FIG. 10 is a flowchart illustrating a data storage method based on object storage according to another embodiment;
FIG. 11 is a flow chart illustrating a data storage method based on object storage in yet another embodiment;
FIG. 12 is a flowchart illustrating a data storage method based on object storage in accordance with still another embodiment;
FIG. 13 is a block diagram of an object-storage-based data storage device in one embodiment;
FIG. 14 is a block diagram of an object-storage-based data storage device in another embodiment;
FIG. 15 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The object-storage-based data storage method provided by the application can be applied in the environment shown in FIG. 1. The client 102, the deduplication server 104, and the object store 106 communicate with one another over a network, for example a cloud service, using communication protocols including but not limited to TCP and HTTP, and the client 102 and the deduplication server 104 operate on objects in the object store 106 through the standard API provided by the object store 106. When the client 102 receives a data storage request for data to be stored, it sends a data file request to the deduplication server 104. On receiving this request, the deduplication server 104 generates the files to be stored for that data, namely a data file to be stored and an index file to be stored, and sends them to the client 102. The client 102 divides the data to be stored into at least one data block, obtains the data fingerprint of each block, and sends the fingerprint to the deduplication server 104 for a query. Depending on the query result, the client 102 stores the data block and its data fingerprint, or only the data fingerprint, in the file to be stored of the corresponding file type, and finally sends the file to be stored to the object store 106 for storage.
The client 102 works with the deduplication server 104 to deduplicate the data to be stored, and the deduplicated data is stored in the object store 106. The client 102 initiates data access; the deduplication server 104 manages fingerprints and data objects and provides fingerprint query and matching services to the client 102; and the object store 106 stores the actual data and the index data.
In one embodiment, the deduplication server 104 may be structured as shown in FIG. 2, a schematic diagram of the deduplication server in one embodiment, and may include an index service, data object management, a reclamation service, and fingerprint/metadata storage. In one embodiment, the object store 106 may be structured as shown in FIG. 3, a schematic diagram of the object store in one embodiment, which holds data objects (data), each formed from a data file, and index objects (index), each formed from an index file. The client 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, or tablet computer, and the deduplication server 104 and the object store 106 may each be implemented by an independent server or by a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 4, an object-storage-based data storage method is provided. It is described here as applied to the client in FIG. 1 and includes the following steps:
step S202, in response to a data storage request for data to be stored, sending a data file request to the deduplication server 104; the data file request is used for indicating the deduplication server 104 to generate a corresponding file to be stored according to the data to be stored; the data comprises at least one data block; the file to be stored comprises at least one file type.
The data storage request is triggered when data needs to be stored, either by a user or by the system according to some rule. On receiving the data storage request, the client 102 obtains the data to be stored and sends a data file request to the deduplication server 104. The data file request may include information about the data to be stored, such as its name, and the deduplication server 104 generates the corresponding file to be stored. Specifically, when the client 102 needs to store data or files, it applies to the deduplication server 104 for a data object name, that is, the name of a writable object, and the deduplication server 104 ensures that each data object name is assigned to only one client 102. The file to be stored has at least one file type; for example, it may include a data file to be stored and an index file to be stored. The data to be stored is organized as data blocks and includes at least one data block. The data file to be stored holds the data to be stored, and the index file to be stored holds the data fingerprints corresponding to the data to be stored. A data fingerprint is the fingerprint of a data block and is associated with information such as the block's storage location and reference count. After receiving the file to be stored, the client 102 may open it locally, for example opening both the data file and the index file, or only the index file, to buffer the data blocks and index data that will be uploaded to the object store 106.
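Since each writable data object name must go to exactly one client, the allocation can be sketched as a counter guarded by a lock. The `DataObjectAllocator` class, its zero-padded naming scheme, and the `data/` prefix are illustrative assumptions rather than the server's actual design.

```python
import itertools
import threading


class DataObjectAllocator:
    """Hypothetical allocator: hands out each writable data object name to one client only."""

    def __init__(self, prefix: str = "data/") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._lock = threading.Lock()
        self.owner: dict[str, str] = {}  # data object name -> client id

    def allocate(self, client_id: str) -> str:
        # The lock makes "next number" and "record owner" one atomic step.
        with self._lock:
            name = f"{self._prefix}{next(self._counter):016d}"
            self.owner[name] = client_id
            return name


allocator = DataObjectAllocator()
a = allocator.allocate("client-1")
b = allocator.allocate("client-2")
print(a, b)  # data/0000000000000001 data/0000000000000002
```

A monotonically increasing counter never reissues a name, which is the only property the text requires; a real server would also need to persist the counter across restarts.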
As shown in FIG. 2, the deduplication server 104 may include several modules, such as an index service module, a data object management module, a reclamation service module, and a fingerprint/metadata storage module. The index service module provides fingerprint query, matching, and submission services to the client 102. The data object management module generates available data files and allocates them to the client 102 for writing the data to be stored. The reclamation service module reclaims redundant data blocks and data files. The fingerprint/metadata storage module stores data fingerprints and other metadata. A data fingerprint record includes the fingerprint of a data block, the block's storage location (the data object in the object store 106 that holds the block and the block's offset within that object), the block's reference count, and so on; other metadata includes information needed for data object management, such as the number of data blocks in a data file and the number of valid data blocks.
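The records kept by the fingerprint/metadata storage module might be modeled as below. The field names, the extra `length` field, and the `redundant_ratio` helper (anticipating the redundant data proportion used for reclamation) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FingerprintRecord:
    """One entry in the fingerprint/metadata store (field names are illustrative)."""
    fingerprint: str    # e.g. hex digest of the data block
    data_object: str    # data object in the object store that holds the block
    offset: int         # byte offset of the block inside that data object
    length: int         # block size in bytes (assumed field)
    ref_count: int = 1  # how many index entries reference this block


@dataclass
class DataObjectMeta:
    """Per-data-object metadata kept by the deduplication server."""
    name: str
    total_blocks: int = 0
    valid_blocks: int = 0

    @property
    def redundant_ratio(self) -> float:
        # Share of blocks no longer referenced by any fingerprint.
        if self.total_blocks == 0:
            return 0.0
        return (self.total_blocks - self.valid_blocks) / self.total_blocks


meta = DataObjectMeta("data/0001", total_blocks=4, valid_blocks=1)
print(meta.redundant_ratio)  # 0.75
```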
Step S204, dividing the data to be stored into at least one data block according to a preset blocking strategy.
The data to be stored is the data that needs to be stored in the object store 106 and includes at least one data block, which the client 102 must extract before storing it. For example, the client 102 may divide the raw data into at least one data block according to the preset blocking strategy and then store the data block by block.
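The text leaves the blocking strategy open. One simple option is fixed-size blocking, sketched below with a hypothetical `split_fixed` function and an assumed 4 KiB default. Content-defined chunking (choosing block boundaries from a rolling hash of the data) is a common alternative that keeps more blocks identical after insertions, but it is not shown here.

```python
from typing import Iterator


def split_fixed(data: bytes, block_size: int = 4096) -> Iterator[bytes]:
    """Fixed-size blocking: yield consecutive slices of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


blocks = list(split_fixed(b"a" * 10_000, block_size=4096))
print([len(b) for b in blocks])  # [4096, 4096, 1808]
```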
Step S206, for each data block, obtaining the data fingerprint corresponding to the data block and sending it to the deduplication server 104, which queries whether the data fingerprint exists and returns a query result.
The data fingerprint is the fingerprint of the data to be stored, for example of each data block in it. The client 102 obtains the data fingerprint of each data block extracted from the data to be stored, for example by computing it with a fingerprint algorithm, and sends it to the deduplication server 104. On receiving the data fingerprint, the deduplication server 104 checks whether it already exists; for example, its index service module queries the fingerprint/metadata storage module and returns the result (exists or does not exist) to the client 102. The deduplication server 104 may also temporarily hold each data fingerprint the client 102 sends.
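A minimal sketch of fingerprinting and the exists/not-exists query. SHA-256 is an assumed choice of fingerprint algorithm, and the `IndexService` class, with its `staged` list modeling the temporary holding of fingerprints, is hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    # SHA-256 is an assumed choice of "fingerprint algorithm".
    return hashlib.sha256(block).hexdigest()


class IndexService:
    """Hypothetical index service: answers exists/not-exists and stages what it was asked."""

    def __init__(self, known: set[str]) -> None:
        self.known = known           # fingerprints already in the fingerprint/metadata store
        self.staged: list[str] = []  # fingerprints held temporarily for this client session

    def query(self, fp: str) -> bool:
        self.staged.append(fp)
        return fp in self.known


svc = IndexService(known={fingerprint(b"AAAA")})
results = [svc.query(fingerprint(b)) for b in (b"AAAA", b"BBBB")]
print(results)  # [True, False]
```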
Step S208, storing the data block and/or the data fingerprint in the file to be stored of the corresponding file type according to the query result.
The query result indicates whether the deduplication server 104 found the data fingerprint of the data block. The client 102 stores the data block and the data fingerprint in the corresponding file to be stored according to this result. In one embodiment, this step includes: if the query result is negative, storing the data block corresponding to the data fingerprint in the data file, writing the data fingerprint into the index file, and sending the storage location of the data block within the data file to the deduplication server 104; otherwise, writing the data fingerprint into the index file. In other words, when the fingerprint does not exist, the client 102 writes the data block into the data file, records the block's location within the data file, writes the data fingerprint into the index file, and sends the location to the deduplication server 104. When the fingerprint already exists, the client 102 only writes the data fingerprint information into the index file. In either case it then moves on to the next data block, until all data blocks in the data to be stored have been processed.
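The branch logic of step S208 can be sketched end to end with a toy in-memory server. Everything here is hypothetical (`DedupServer`, `store_blocks`, SHA-256 fingerprints, a single unbounded data file); the point is that a new block goes to the data file and its location is reported, while every fingerprint, new or not, goes to the index file.

```python
import hashlib


class DedupServer:
    """Toy deduplication server: fingerprint -> (data file name, offset)."""

    def __init__(self) -> None:
        self.locations: dict[str, tuple[str, int]] = {}

    def exists(self, fp: str) -> bool:
        return fp in self.locations

    def record_location(self, fp: str, data_file: str, offset: int) -> None:
        self.locations[fp] = (data_file, offset)


def store_blocks(blocks: list[bytes], server: DedupServer,
                 data_file_name: str) -> tuple[bytes, list[str]]:
    """Write unseen blocks to the data file and every fingerprint to the index file."""
    data_file = bytearray()     # local buffer for the data file to be stored
    index_file: list[str] = []  # ordered fingerprints describing the original data
    for block in blocks:
        fp = hashlib.sha256(block).hexdigest()
        if not server.exists(fp):
            # Query result "no": append the block and report where it landed.
            offset = len(data_file)
            data_file += block
            server.record_location(fp, data_file_name, offset)
        # In both branches the fingerprint is written to the index file.
        index_file.append(fp)
    return bytes(data_file), index_file


server = DedupServer()
data, index = store_blocks([b"AAAA", b"BBBB", b"AAAA"], server, "data/0001")
print(data, len(index))  # b'AAAABBBB' 3
```

Because the location is recorded immediately, a block repeated within the same data (the second `b"AAAA"`) is already deduplicated.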
The capacity of a data file may be limited. When the data file reaches a preset size, for example when its remaining space is insufficient for the next data block, the client 102 uploads the data file and obtains a new one. In one embodiment, after storing the data block corresponding to the data fingerprint in the data file, the method further includes: if the remaining storage space of the data file is smaller than the size of the next data block in the data to be stored, sending the data file to the object store 106 for storage; and sending the storage location and a new data file request to the deduplication server 104, which stores the storage location in the metadata corresponding to the data file and sends a new data file to be stored to the client. In this embodiment, the client 102 uploads the filled data file to a specific location in the object store 106, creating a data object, and sends information about the data file to the deduplication server 104. The deduplication server 104 records the data object's information in the metadata corresponding to the data file and may temporarily hold the fingerprint list of the data object, which contains multiple data fingerprints. In response to the data file request from the client 102, the deduplication server 104 also generates a new data file to be stored and sends it to the client 102, which then continues writing data blocks into the new data file.
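The rollover rule can be sketched with a fixed-capacity buffer. The `RollingDataWriter` class is hypothetical: `new_name` stands in for the new data file request to the deduplication server, a dictionary stands in for the object store, and the space check runs just before writing the next block, which has the same effect as checking right after the previous one.

```python
import itertools
from typing import Callable


class RollingDataWriter:
    """Fills a fixed-capacity data file; uploads it and opens a new one when a block won't fit."""

    def __init__(self, capacity: int, object_store: dict[str, bytes],
                 new_name: Callable[[], str]) -> None:
        self.capacity = capacity
        self.object_store = object_store  # dict standing in for the object store
        self.new_name = new_name          # stands in for the "new data file request"
        self.name = new_name()
        self.buf = bytearray()

    def append(self, block: bytes) -> tuple[str, int]:
        """Write one block and return its (data file name, offset)."""
        if len(block) > self.capacity:
            raise ValueError("block larger than data file capacity")
        if self.capacity - len(self.buf) < len(block):
            self.flush()                  # remaining space too small: upload this data file
            self.name = self.new_name()   # then continue in a fresh one
        offset = len(self.buf)
        self.buf += block
        return self.name, offset

    def flush(self) -> None:
        """Upload the current buffer as a data object (no-op if empty)."""
        if self.buf:
            self.object_store[self.name] = bytes(self.buf)
            self.buf = bytearray()


counter = itertools.count(1)
store: dict[str, bytes] = {}
writer = RollingDataWriter(8, store, lambda: f"data/{next(counter):04d}")
locations = [writer.append(b"x" * 3) for _ in range(4)]
writer.flush()
print(locations)  # [('data/0001', 0), ('data/0001', 3), ('data/0002', 0), ('data/0002', 3)]
```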
Step S210, sending the file to be stored to the object storage 106 for storage.
The file to be stored is a file into which the client 102 writes the data blocks and data fingerprints of the data to be stored; it may include a data file and an index file. The client 102 sends the file to be stored to the object storage 106 for storage. For example, after all the data to be stored has been processed, the client 102 closes the last data file, uploads it to the object storage 106, and sends the related information of the corresponding data object to the deduplication server 104; the client 102 also closes the index file and uploads it to a specific location in the object storage 106, so that both the data file and the index file are stored in the object storage 106. After receiving the related information of the last data file, the deduplication server 104 records it in the metadata corresponding to the data file. It then processes the temporarily stored fingerprint list: for each fingerprint that already exists, it updates the reference number of the fingerprint in the fingerprint/metadata storage module; for each fingerprint that does not exist, it inserts a new fingerprint record into the fingerprint/metadata storage module.
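A minimal sketch of this final bookkeeping step, assuming the fingerprint/metadata storage module is a dictionary and that reference numbers count every occurrence of a fingerprint in an index file (an assumption consistent with the per-fingerprint decrement described later for deletion). The function name is hypothetical.

```python
def commit_fingerprints(fingerprint_db: dict, pending: list, new_locations: dict) -> dict:
    """Apply a temporarily stored fingerprint list to the fingerprint database.

    fingerprint_db: fingerprint -> {"location": (data object, offset, length), "refs": int}
    pending: fingerprints of the stored file, in index order (may repeat)
    new_locations: locations reported by the client for blocks it actually wrote
    """
    for fp in pending:
        record = fingerprint_db.get(fp)
        if record is not None:
            # Existing fingerprint: one more reference to the same stored block.
            record["refs"] += 1
        else:
            # New fingerprint: insert a record pointing at the uploaded block.
            fingerprint_db[fp] = {"location": new_locations[fp], "refs": 1}
    return fingerprint_db
```

If a new fingerprint occurs twice in the same file, its first occurrence inserts the record and the second is counted as an existing fingerprint, so the record ends with two references.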
In addition, as shown in FIG. 3, the object storage 106 stores the deduplicated data and its index data, and its storage structure may include data objects (data) and index objects (index). When the client 102 writes data, each generated data object is stored under the "data/" prefix, with object names allocated centrally by the deduplication server 104; the index data corresponding to the data to be stored is stored as an index object under the "index/" prefix, and the name of the index object may be the name of the data to be stored, may be generated automatically according to a certain rule, or may be allocated by the deduplication server 104. It should be noted that FIG. 3 shows only one structure for storing the actual data and the index data in the object storage 106; other storage structures may be adopted as needed.
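The two-prefix layout of FIG. 3 can be illustrated with a few key-naming helpers. The zero-padded sequence number and the helper names are hypothetical; the text only states that data object names are allocated by the deduplication server and that index object names may reuse the name of the stored data.

```python
def data_object_key(sequence: int) -> str:
    # Data object names are allocated by the deduplication server;
    # a zero-padded sequence number is only an assumed naming rule.
    return f"data/{sequence:012d}"


def index_object_key(source_path: str) -> str:
    # One option from the text: name the index object after the stored data.
    return "index/" + source_path.lstrip("/")


def keys_under(prefix: str, keys) -> list:
    # Listing by prefix separates data objects from index objects.
    return sorted(k for k in keys if k.startswith(prefix))
```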
In the above data storage method based on object storage, when a data storage request is received, the client acquires a data file to be stored and an index file to be stored that the deduplication server generates for the data to be stored, divides the data to be stored into at least one data block according to a preset blocking strategy, and sends the data fingerprint corresponding to each data block to the deduplication server. The deduplication server queries whether the data fingerprint exists and returns the query result to the client. If the query result is negative, the client stores the data block corresponding to the data fingerprint into the data file and sends its storage position to the deduplication server; in either case, the client writes the data fingerprint into the index file and sends at least one of the data file and the index file to the object storage for storage. Compared with the traditional approach of deduplicating and storing data on local storage devices, this scheme uses the data fingerprint to determine the type of file in which the data to be stored is kept, and stores the data in object storage.
In one embodiment, the method further comprises: in response to a data reading instruction, acquiring the index file corresponding to the data reading instruction from the object storage; acquiring at least one data fingerprint from the index file and sending the at least one data fingerprint to the deduplication server, where the deduplication server is configured to query and return the position of the data block corresponding to each data fingerprint; and acquiring the corresponding data blocks from the object storage according to the positions returned by the deduplication server.
In this embodiment, in addition to storing data in the object storage 106, the client 102 may read the stored data. FIG. 5 is a flowchart illustrating the data reading step in one embodiment. The data reading instruction may be triggered by a user to read data from the object storage 106 and may include related information of the data to be read, such as its index information. On receiving the instruction, the client 102 obtains the corresponding index file from the object storage 106, parses the index file to obtain the data fingerprints stored in it, and sends at least one of these data fingerprints to the deduplication server 104. The deduplication server 104 queries the location of the data block corresponding to each received fingerprint and returns the locations. The client 102 may send the data fingerprints one at a time as it reads them from the index file, or read all of them first and send them in a single query. After obtaining the locations of the data to be read, the client 102 reads the data blocks at those locations from the object storage 106, thereby completing the read.
Specifically, when the client 102 needs to read a file, it downloads the index file of that file from the object storage 106 according to the data reading instruction and parses the index data to obtain a list of data block fingerprints. Using these fingerprints, the client 102 queries the deduplication server 104 for the actual storage location in the object storage 106 of each data block used by the file, where the storage location includes the name of the data object and the position of the data block within that data object. The client 102 then reads the data blocks from the object storage 106 according to these storage locations.
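The read path can be sketched as follows, assuming the index object has already been parsed into an ordered fingerprint list. locate and get_range are hypothetical callbacks for the deduplication server lookup and a ranged read from the object storage; they are not a real client API.

```python
from typing import Callable, Dict, List, Tuple

Location = Tuple[str, int, int]  # (data object name, offset in object, length)


def read_file(index_fps: List[str],
              locate: Callable[[List[str]], Dict[str, Location]],
              get_range: Callable[[str, int, int], bytes]) -> bytes:
    """Reassemble a file from its index fingerprints.

    locate: batch lookup standing in for the deduplication server query
    get_range: ranged read standing in for the object storage
    """
    # One batched query for all distinct fingerprints (the text also allows one-by-one queries).
    locations = locate(sorted(set(index_fps)))
    out = bytearray()
    for fp in index_fps:  # index order gives file order
        name, offset, length = locations[fp]
        out.extend(get_range(name, offset, length))
    return bytes(out)
```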
In addition, when the client 102 needs to read data starting from an arbitrary position, it may use the read offset to skip, in the index file, the data blocks that lie entirely before the offset, and start reading from the data block that contains the first requested byte.
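Skipping blocks before a read offset amounts to a binary search over the cumulative block sizes. A minimal sketch, assuming the index file gives the length of every block in file order (the text states that the index records the relative position of each block in the original file); the function name is hypothetical.

```python
import bisect
from itertools import accumulate
from typing import List, Tuple


def blocks_for_range(block_lengths: List[int], offset: int, size: int) -> Tuple[int, int, int]:
    """Return (first, stop, skip): read blocks first..stop-1 and drop `skip` bytes of the first."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    total = sum(block_lengths)
    if size <= 0 or offset >= total:
        return (len(block_lengths), len(block_lengths), 0)
    # Start offset of each block within the original file.
    starts = [0] + list(accumulate(block_lengths))[:-1]
    first = bisect.bisect_right(starts, offset) - 1   # block containing `offset`
    end = min(offset + size, total)
    stop = bisect.bisect_left(starts, end)            # first block starting at or after `end`
    return (first, stop, offset - starts[first])
```

For three 4-byte blocks, reading 4 bytes at offset 5 touches blocks 1 and 2 and discards the first byte of block 1.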
With the present embodiment, the client 102 can store data into the object store 106 by using the data file and the index file, and can read data from the object store 106 by using the index file, thereby improving the performance of data storage and reading.
In one embodiment, the method further comprises: sending a data change instruction to the deduplication server 104 and acquiring the index file to be changed from the object storage 106, where the deduplication server 104 is configured to generate a data file to be changed according to the data change instruction and return it, and the data change instruction includes a change position; acquiring, from the index file to be changed, the data fingerprint to be changed corresponding to the change position, and sending it to the deduplication server 104, where the deduplication server 104 is configured to query and return the position of the corresponding data block to be changed; acquiring the data block to be changed from the object storage 106 according to that position, and modifying it at the change position to obtain a changed data block; generating a changed data fingerprint from the changed data block; querying the index file to be changed with the changed data fingerprint and, if the changed data fingerprint is not yet referenced by any data block, storing it in the index file to be changed to obtain a changed index file and sending it to the deduplication server 104, where the deduplication server 104 is configured to query whether the changed data fingerprint already exists and return the query result to the client; if the query result is negative, writing the changed data block into the data file to be changed to obtain a changed data file; and storing the changed index file and the changed data file in the object storage 106.
In this embodiment, besides writing data into data files and data fingerprints into index files and storing them in the object storage 106, the client 102 may also change data that is already stored. FIG. 6 is a flowchart illustrating the data change step in one embodiment. The client 102 may receive a data change instruction that includes the change position of the data to be changed; it sends the data change instruction to the deduplication server 104 and obtains the index file to be changed from the object storage 106. The deduplication server 104 generates a data file to be changed according to the data change instruction and returns it to the client 102. Specifically, when the client 102 starts rewriting a file, it applies to the deduplication server 104 for a data file, receives the data file returned by the deduplication server 104, and opens it locally for writing. This data file is used to store the changed data.
The client 102 also obtains the index file to be changed and extracts from it the data fingerprint to be changed. For example, the client 102 downloads the index object from the object storage 106, parses the index data, and, according to the current change position, obtains the fingerprint information of the data block to be changed. It then uses this fingerprint to query the deduplication server 104 for related information of the data block, such as its location in the object storage 106; the deduplication server 104 looks up the location of the data block to be changed based on the data fingerprint sent by the client 102 and returns it to the client 102.
After obtaining the location of the data block to be changed from the deduplication server 104, the client 102 fetches that data block from the object storage 106 and modifies it at the change position, for example by rewriting data starting from the change position, to obtain the changed data block.
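Rewriting a block at the change position and recomputing its fingerprint can be sketched as below. The function name and the choice of SHA-256 are assumptions; the text only refers to the fingerprint algorithm described earlier.

```python
import hashlib


def patch_block(block: bytes, pos: int, new_data: bytes):
    """Overwrite `block` starting at `pos`; return (changed block, its fingerprint).

    In this sketch, writing past the end of the block extends it.
    """
    if not 0 <= pos <= len(block):
        raise ValueError("change position lies outside the block")
    changed = block[:pos] + new_data + block[pos + len(new_data):]
    # Assumption: SHA-256 as the fingerprint algorithm.
    return changed, hashlib.sha256(changed).hexdigest()
```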
The client 102 then stores the changed data. It generates a changed data fingerprint from the changed data block; for example, when the rewrite reaches the end of the data block being processed, or when the rewrite is complete, the client 102 calculates the fingerprint of the rewritten data block using the fingerprint algorithm described above. The index file records the data blocks used by the file, the fingerprint list of those data blocks, and the relative positions of the data blocks in the original file, while the fingerprint database in the deduplication server 104 records the fingerprint, actual storage location, and reference information of each data block. When a file is deleted, the client 102 goes through the fingerprint list recorded in the index file and decrements the reference number of each data block in the fingerprint database by 1. When the reference number of a data block reaches 0, no file uses that data block any longer and it may be deleted from its data file to release physical storage space; once all data blocks referenced by the file have been processed in this way, the index data corresponding to the file may be deleted. After obtaining the fingerprint of the changed data block, the client 102 searches the index file to determine whether this fingerprint is already referenced by any data block. If it is not, the client 102 stores the changed data fingerprint in the index file to be changed, that is, it stores the new data fingerprint in the new index file, thereby obtaining the changed index file. The client 102 also sends the changed data fingerprint to the deduplication server 104, which temporarily stores it, queries whether the fingerprint already exists, and returns the query result to the client 102.
When the query result received from the deduplication server 104 indicates that the fingerprint does not exist, the client 102 writes the changed data block into the data file to be changed, that is, stores the rewritten data block in the new data file, thereby obtaining the changed data file, and then stores the changed data file and the changed index file in the object storage 106, completing the rewrite of the data. Because the size of a data file is limited, when the data file reaches the preset size, the client 102 uploads it to the object storage 106 and sends its related information to the deduplication server 104; if more data remains to be rewritten, the deduplication server 104 sends a new data file to be changed to the client 102, so that the client 102 can store subsequent rewritten data blocks in it.
In addition, the client 102 updates the data fingerprint information of each changed data block at the corresponding position in the changed index file. When the change is complete, the client 102 sends the changed index file to the object storage 106 for storage and submits the data fingerprints that are no longer referenced to the deduplication server 104, which receives this information and releases those data fingerprints.
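Updating the changed index file and collecting the fingerprints to submit for release might look like this. commit_change is a hypothetical helper, and counting references per index entry is an assumption carried over from the deletion description, not something the text fixes for the change flow.

```python
from typing import List, Tuple


def commit_change(index_fps: List[str], block_no: int, new_fp: str) -> Tuple[List[str], List[str]]:
    """Replace one index entry with the changed block's fingerprint.

    Returns (changed index, fingerprints whose reference the server should release).
    Assumes reference numbers are counted per index entry, so the replaced entry
    gives up exactly one reference to its old fingerprint.
    """
    old_fp = index_fps[block_no]
    changed = list(index_fps)  # leave the original index untouched
    changed[block_no] = new_fp
    released = [] if old_fp == new_fp else [old_fp]
    return changed, released
```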
Through this embodiment, the client 102 can obtain the data to be changed from the object storage 106, store the changed actual data in a data file and the changed fingerprint information in an index file, and store both files in the object storage 106.
In one embodiment, FIG. 7 is a flowchart illustrating a data storage method based on object storage. The method is described using its application to the deduplication server in FIG. 1 as an example and comprises the following steps:
Step S302, in response to the client 102 sending a data file request, generating a corresponding data file to be stored and an index file to be stored.
Step S304, acquiring a data fingerprint generated for at least one data block sent by the client 102, and querying the fingerprint database for whether the data fingerprint exists.
Step S306, if yes, returning the query result to the client 102, and updating the reference number of the data fingerprint in the fingerprint database according to the data fingerprint.
Step S308, if not, returning a query result to the client 102, acquiring the storage position of the data block corresponding to the data fingerprint in the data file sent by the client 102, and storing the storage position and the data fingerprint in the fingerprint database.
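Steps S304 to S308 can be sketched as a small server-side class. This assumes the immediate reference update of step S306; the FIG. 10 embodiment instead defers updates until the last data file is committed. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Location = Tuple[str, int, int]  # (data object name, offset, length)


class FingerprintDB:
    """Hypothetical in-memory fingerprint database on the deduplication server."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def query(self, fp: str) -> bool:
        """Steps S304/S306: report whether fp exists; if so, add one reference."""
        record = self.records.get(fp)
        if record is None:
            return False
        record["refs"] += 1
        return True

    def record_new(self, fp: str, location: Location) -> None:
        """Step S308: store the location reported by the client with one reference."""
        self.records[fp] = {"location": location, "refs": 1}

    def locate(self, fp: str) -> Optional[Location]:
        record = self.records.get(fp)
        return record["location"] if record else None
```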
For details of the data storage method based on object storage in this embodiment, reference may be made to the above description of the method as applied to the client 102; details are not repeated here.
In one embodiment, the method further comprises: acquiring a deletion instruction sent by the client 102, the deletion instruction including the stored file information of the data to be deleted; acquiring the index file corresponding to the data to be deleted from the object storage 106, and acquiring from the index file the data fingerprint to be deleted corresponding to each data block to be deleted; and removing the reference relationship between the data fingerprint to be deleted and the data block to be deleted, and deleting the index file corresponding to the data to be deleted.
In this embodiment, FIG. 8 is a flowchart illustrating the data deletion step in one embodiment. The client 102 may delete stored data through the deduplication server 104. When the client 102 needs to delete data, it sends the deduplication server 104 a deletion instruction containing the stored file information of the data to be deleted, requesting deletion of the file. After receiving the deletion instruction, the deduplication server 104 obtains the index file corresponding to the data to be deleted from the object storage 106 and parses it to obtain the data fingerprints of the data blocks to be deleted, for example the list of data block fingerprints referenced by the file to be deleted. The deduplication server 104 then removes the reference relationship between each of these data fingerprints and its data block, for example by decrementing by 1 the reference number of each data block to be deleted in the fingerprint database. Because a fingerprint's reference number may be greater than one, with each reference representing one use of the corresponding data block, decrementing the reference number removes only the deleted file's use of that block. After the deduplication server 104 has processed all the data blocks to be deleted, it deletes the index object corresponding to the data to be deleted from the object storage 106.
In addition, in one embodiment, if a data block to be deleted no longer has a reference relationship with any data fingerprint, that is, its reference number has dropped to zero, the target data file containing that data block is obtained, and the count of valid data blocks in the target data file is decremented by one. In this embodiment, a data fingerprint may be referenced multiple times. When deleting data, if the deduplication server 104 finds that the reference number of a data block is not 0 after the decrement, it continues with the next data block; when it finds that the reference number is 0, the data block is no longer referenced and the number of valid data blocks in its data file has therefore decreased, so the deduplication server 104 obtains the target data file corresponding to the data block and decrements its valid data block count by 1.
In addition, in one embodiment, the method further comprises: if the count of valid data blocks in the target data file is zero, deleting the information of the target data file from the corresponding metadata. In this embodiment, a valid data block count of 0 means that the target data file no longer holds any valid data block; the deduplication server 104 may then delete the data object corresponding to the target data file from the object storage 106 and remove the information record of the target data file from the data file information in the metadata.
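The deletion flow of FIG. 8, including the valid-block count, can be sketched as follows. The dictionaries and the delete_object callback are hypothetical stand-ins for the fingerprint database, the metadata and the object storage. Records whose reference number reaches zero are kept, since the reclamation pass described in this document relies on them; purging records that point into a deleted data object is an added safeguard in this sketch, not something the text specifies.

```python
from typing import Callable, Dict, List


def delete_file(index_key: str, index_fps: List[str], fingerprint_db: Dict[str, dict],
                valid_counts: Dict[str, int], delete_object: Callable[[str], None]) -> None:
    """Release one file's references, then drop emptied data objects and the index object.

    fingerprint_db: fp -> {"location": (data object, offset, length), "refs": int}
    valid_counts: data object -> number of its blocks that are still referenced
    delete_object: stands in for deleting an object from the object storage
    """
    for fp in index_fps:
        record = fingerprint_db[fp]
        record["refs"] -= 1
        if record["refs"] > 0:
            continue  # still used by another file or another position
        data_object = record["location"][0]
        valid_counts[data_object] -= 1
        if valid_counts[data_object] == 0:
            # No valid block left: delete the data object and its metadata record.
            delete_object(data_object)
            del valid_counts[data_object]
            # Safeguard: purge fingerprint records that pointed into the deleted object.
            for stale in [k for k, r in fingerprint_db.items() if r["location"][0] == data_object]:
                del fingerprint_db[stale]
    # All blocks processed: remove the file's index object.
    delete_object(index_key)
```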
Through the above embodiment, the deduplication server 104 can obtain the information about the data to be deleted from the object storage 106 and delete the data through the index file and the data files, thereby improving the efficiency of data deletion.
In one embodiment, the method further comprises: acquiring a data file to be reclaimed whose redundant data ratio is greater than a preset value, where the data file to be reclaimed includes at least one data block and the redundant data ratio is the ratio of the data blocks in the data file that have no reference relationship with any data fingerprint to all data blocks in the data file; acquiring the data fingerprint corresponding to each data block and determining, according to the data fingerprint, its reference number, which represents the number of data blocks associated with the data fingerprint; if the reference number is not zero, determining that the data block is a valid data block, writing it into a new data file, and updating the data fingerprint corresponding to the valid data block according to its storage position in the new data file, so that an updated data file is obtained from the at least one valid data block; if the reference number is zero, determining that the data block is a redundant data block and deleting the data fingerprint corresponding to the redundant data block from the index file corresponding to it, so that the at least one redundant data block constitutes the reclaimed data file; and storing the updated data file in the object storage 106 and deleting the reclaimed data file.
In this embodiment, FIG. 9 is a flowchart illustrating the step of reclaiming redundant data in one embodiment. Because objects in the object storage 106 cannot be modified in place once created, individual data blocks cannot be released from a data file as they can on a block device; the space of a data file can be reclaimed only by deleting the whole data file, which is possible only after none of its data blocks is referenced any longer. As a result, a large number of redundant data blocks may go unreclaimed for a long time. To reclaim the storage space occupied by redundant data blocks in a timely manner and improve storage utilization, the deduplication server 104 can reclaim redundant data blocks. It takes, as data files to be reclaimed, the data files whose redundant data ratio is greater than a preset value; for example, it may obtain from the metadata a list of data files whose redundant data ratio exceeds the preset value. A data file to be reclaimed includes at least one data block, and its redundant data ratio is the proportion of data blocks with no reference relationship to any data fingerprint among all data blocks in the data file.
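Choosing candidates by redundant data ratio is a simple filter over per-file block counts. A minimal sketch, assuming the metadata keeps a total and a valid block count for each data file (the function name and data shape are hypothetical):

```python
from typing import Dict, List, Tuple


def files_to_reclaim(data_files: Dict[str, Tuple[int, int]], threshold: float) -> List[str]:
    """Select data files whose redundant data ratio exceeds `threshold`.

    data_files: data file name -> (total blocks, valid blocks), as kept in the metadata.
    Redundant ratio = blocks with no reference / all blocks in the file.
    """
    selected = []
    for name, (total, valid) in data_files.items():
        if total > 0 and (total - valid) / total > threshold:
            selected.append(name)
    return sorted(selected)
```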
For each data block in the data file to be reclaimed, the deduplication server 104 obtains the corresponding data fingerprint and determines its reference number. For example, the deduplication server 104 obtains the list of data block information in the data file and queries the fingerprint database with the data fingerprint of each data block to determine whether the block is redundant, that is, whether its reference number is 0. The reference number represents the number of data blocks associated with the data fingerprint.
If the reference number is not 0, the deduplication server 104 determines that the data block is a valid data block, writes it into a new data file that will form a new data object, and updates the record of the data fingerprint in the fingerprint database with the block's storage location in the new data file. The new data file contains at least one valid data block; the deduplication server 104 thereby obtains an updated data file and uploads it to the object storage 106 for storage.
If the reference number is 0, the deduplication server 104 determines that the data block is a redundant data block and deletes the data fingerprint corresponding to it from the corresponding index file, for example by deleting the relevant record of the data block from the fingerprint database. There may be one or more redundant data blocks; together they make up the reclaimed data file, which the deduplication server 104 then deletes.
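The reclamation pass over one selected data file can be sketched as follows, assuming the fingerprint database maps each fingerprint to its location and reference number. compact and its parameters are hypothetical; uploading the new data file and deleting the old object are left to the caller.

```python
from typing import Dict, List, Tuple


def compact(old_object: str, blocks: List[Tuple[str, bytes]],
            fingerprint_db: Dict[str, dict], new_object: str) -> bytes:
    """Copy still-referenced blocks of `old_object` into a new data file.

    blocks: (fingerprint, block bytes) for every block stored in old_object.
    Valid blocks are repointed at new_object; records of redundant blocks are dropped.
    Returns the new data file body; the caller uploads it and then deletes old_object.
    """
    new_body = bytearray()
    for fp, data in blocks:
        record = fingerprint_db.get(fp)
        if record is None or record["location"][0] != old_object:
            continue  # no record points at this copy, so nothing to keep
        if record["refs"] > 0:
            # Valid block: copy it and update its location in the fingerprint database.
            record["location"] = (new_object, len(new_body), len(data))
            new_body.extend(data)
        else:
            del fingerprint_db[fp]  # redundant block: remove its record
    return bytes(new_body)
```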
Through this embodiment, the deduplication server 104 can remove redundant data blocks from the object storage 106, reducing wasted storage space and thereby improving data storage performance.
In one embodiment, FIG. 10 is a flowchart illustrating a data storage method based on object storage in another embodiment. The embodiment includes the following process:
When the client 102 needs to store data, it sends a notification to the deduplication server 104 so that the deduplication server 104 generates a data object name. The deduplication server 104 allocates the data object name to the client 102, and the client 102 locally opens the data file to be stored and the index file to be stored, which are generated and sent by the deduplication server 104. The client 102 acquires data blocks to be stored from the original data according to a certain strategy, calculates the fingerprint of each data block with a fingerprint algorithm, and sends the data fingerprints to the deduplication server 104, which temporarily stores them, queries whether they exist, and returns the query result to the client 102. If the query result indicates that the fingerprint exists, the client 102 writes the data fingerprint information corresponding to the data block into the index file and continues with the next data block; if it does not exist, the client 102 writes the data block into the data file and records the location of the data block in the data file.
When the size of the data file reaches a preset value, the client 102 uploads the data file to a designated location in the object storage 106, creating a data object, and submits the data file information to the deduplication server 104, which records the data object information in the metadata and temporarily stores the fingerprint list contained in the data object. The deduplication server 104 then allocates a new data object name to the client 102, and the client 102 opens a new data file locally for writing, writes the data fingerprint information of the data block into the index file, and continues with the next data block. After all data has been processed, the client 102 closes the last data file, uploads it to the object storage 106, and submits the data object information to the fingerprint database in the deduplication server 104. Meanwhile, the client 102 closes the index file and uploads it to the designated location in the object storage 106. The deduplication server 104 records the information of the last data file in the metadata and processes the temporarily stored fingerprint list as follows: for an existing fingerprint, it updates the reference number of the data fingerprint in its fingerprint database; for a fingerprint that does not exist, it inserts a new fingerprint record into the fingerprint database.
With this embodiment, the client 102 and the deduplication server 104 use data fingerprints to determine the type of file in which the data to be stored is kept and store the data in object storage, taking advantage of the high reliability and elastic scalability of object storage to improve storage performance.
In one embodiment, FIG. 11 is a flowchart illustrating a data storage method based on object storage in another embodiment. The method in this embodiment may be applied when the deduplication server 104 has some storage, bandwidth, and processing capacity, and includes the following process:
When the client 102 needs to store data, it sends a notification to the deduplication server 104 so that the deduplication server 104 generates a data object name. The deduplication server 104 allocates a data object name to the client 102 and opens a corresponding index file to be stored on the deduplication server 104 to cache the index data of the file. After receiving the data object name assigned by the deduplication server, the client 102 locally opens a data file to cache the data blocks to be stored in the object storage 106. The client 102 obtains data blocks from the data to be stored according to a certain blocking strategy, calculates the data fingerprint of each data block with a set fingerprint algorithm, and submits the data fingerprint to the deduplication server 104. The deduplication server 104 writes the data fingerprints submitted by the client 102 into the index file in order, adds them to the temporary fingerprint list, queries whether they exist in the fingerprint database, and returns the query result to the client 102.
After receiving the query result, the client 102 acts according to it: if the submitted data fingerprint already exists in the system, the client 102 continues with the next data block to be stored; if it does not exist, the client 102 writes the data block into the data file and records the storage location of the data block in the data file. When the size of the data file reaches a preset size, the client 102 uploads the data file to a set location in the object storage 106 and submits the data file information to the fingerprint database in the deduplication server 104.
The fingerprint database may store the data file information in the metadata and update the storage locations of the fingerprints in the temporary fingerprint list. Meanwhile, the deduplication server 104 may assign a new data object name to the client 102 and generate a new data file, which the client 102 may open locally for writing. After all data blocks have been processed in this way, the client 102 closes the last data file, uploads it to the object storage 106, and submits its data file information to the deduplication server 104. The deduplication server 104 records the information of the last data file in the metadata, makes a final update to the temporary fingerprint list, closes the index file, uploads the data file and the index file to the object storage 106, and updates or inserts the temporarily stored fingerprint list into the fingerprint database.
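The client-side flow of fig. 11 can be sketched as follows. This is a minimal, non-authoritative Python sketch under assumptions the text does not state: fixed-size blocks, SHA-256 as the fingerprint algorithm, a dict standing in for the object storage 106, and a hypothetical in-memory `ToyDedupServer` whose `pending` set plays the role of the temporary fingerprint list, so a repeated block is recognized even before its data file has been uploaded. Object names such as `data-0001` and `index` are illustrative only.

```python
import hashlib


class ToyDedupServer:
    """Hypothetical in-memory stand-in for deduplication server 104."""

    def __init__(self):
        self.fingerprint_db = {}  # fingerprint -> (data object name, offset, length)
        self.pending = set()      # temporary fingerprint list: blocks not yet uploaded
        self.index = []           # index file: every fingerprint, in data order
        self._next_id = 0

    def new_object_name(self):
        self._next_id += 1
        return f"data-{self._next_id:04d}"

    def query(self, fp):
        """Append fp to the index file; return True if the block is already known."""
        self.index.append(fp)
        if fp in self.fingerprint_db or fp in self.pending:
            return True
        self.pending.add(fp)  # the client must now write this block
        return False

    def commit(self, object_name, locations):
        """Record block locations after the client uploads a data file."""
        for fp, (offset, length) in locations.items():
            self.fingerprint_db[fp] = (object_name, offset, length)
            self.pending.discard(fp)


def client_store(data, server, object_store, block_size=4, file_capacity=8):
    """Deduplicate `data` block by block, rolling over to a new data file when full."""
    name, buf, locations = server.new_object_name(), bytearray(), {}

    def flush():
        object_store[name] = bytes(buf)  # upload the current data file
        server.commit(name, locations)   # submit the data file information

    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fp = hashlib.sha256(block).hexdigest()
        if server.query(fp):
            continue                     # duplicate: only the index entry is kept
        if buf and len(buf) + len(block) > file_capacity:
            flush()                      # not enough space left: upload, open a new file
            name, buf, locations = server.new_object_name(), bytearray(), {}
        locations[fp] = (len(buf), len(block))
        buf += block
    if buf:
        flush()                          # upload the last data file
    object_store["index"] = "\n".join(server.index).encode()
    return server.index
```

With input `b"AAAABBBBAAAACCCC"`, the repeated `AAAA` block produces a fourth index entry but no new bytes, and the 8-byte capacity forces `CCCC` into a second data file. The tiny sizes are chosen only to make the rollover visible.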
With this embodiment, the client 102 and the deduplication server 104 use the data fingerprint to determine which type of file the data to be stored is written to, and store the data in object storage. Because object storage offers high reliability and elastic scalability, the storage performance of the data storage is improved.
In one embodiment, as shown in fig. 12, fig. 12 is a schematic flowchart of a data storage method based on object storage in yet another embodiment. The data storage method based on object storage in this embodiment may be applied to a scenario when the deduplication server 104 has sufficient storage, bandwidth, and processing performance. The method comprises the following steps:
When the client 102 needs to store data, it may send a notification signal to the deduplication server 104 so that the deduplication server 104 generates a data object name. The deduplication server 104 allocates the data object name to the client 102 and opens a corresponding data file to be stored and index file to be stored on the deduplication server 104. The client 102 obtains data blocks from the data to be stored according to a given blocking strategy, calculates their data fingerprints with a set fingerprint algorithm, and submits the data fingerprints to the deduplication server 104. The deduplication server 104 queries the fingerprint database for each data fingerprint: if the fingerprint already exists, it writes the fingerprint information into the index file; if not, it requests the client 102 to upload the data block. The client 102 determines from the query result whether the data block needs to be uploaded. If not, the client 102 continues with the data fingerprint of the next data block; if so, the client 102 sends the data block to the deduplication server 104. The deduplication server 104 computes the data fingerprint of the received data block and compares it with the data fingerprint submitted by the client 102; if the two are consistent, it writes the data block into the data file and the fingerprint into the index file. When the size of the data file reaches a preset value, the deduplication server 104 uploads the data file to the object storage 106, records the relevant information of the data file in the metadata, and allocates and opens a new data file to be stored for subsequent data blocks.
When all the data blocks have been processed, the client 102 sends a processing completion instruction to the deduplication server 104. After receiving the instruction, the deduplication server 104 closes the last data file and uploads it to the object storage 106, closes the index file and uploads it to the object storage 106, and updates or inserts the fingerprint list into the fingerprint database.
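For the fig. 12 variant, in which the deduplication server 104 owns the data file, a minimal sketch might look like this. It assumes fixed-size blocks, SHA-256 fingerprints, and a dict as the object storage; `ServerSideWriter` and `client_send` are hypothetical names. For brevity it records each fingerprint's location immediately rather than batching the fingerprint list into the database at the end. The check in `upload_block` mirrors the comparison of the recomputed fingerprint with the one the client submitted.

```python
import hashlib


class ServerSideWriter:
    """Hypothetical sketch of the fig. 12 variant: the server owns the data file."""

    def __init__(self, object_store, file_capacity=8):
        self.object_store = object_store
        self.file_capacity = file_capacity
        self.fingerprint_db = {}  # fingerprint -> (data object name, offset, length)
        self.index = []           # index file entries, in data order
        self._next_id = 0
        self._open_new_file()

    def _open_new_file(self):
        self._next_id += 1
        self.name, self.buf = f"data-{self._next_id:04d}", bytearray()

    def _flush(self):
        self.object_store[self.name] = bytes(self.buf)  # upload the full data file
        self._open_new_file()

    def submit_fingerprint(self, fp):
        """Return True if the client must upload the block for fp."""
        if fp in self.fingerprint_db:
            self.index.append(fp)  # known block: index entry only
            return False
        return True

    def upload_block(self, fp, block):
        # Reject a block whose recomputed fingerprint differs from the submitted one.
        if hashlib.sha256(block).hexdigest() != fp:
            raise ValueError("uploaded block does not match its fingerprint")
        if self.buf and len(self.buf) + len(block) > self.file_capacity:
            self._flush()
        self.fingerprint_db[fp] = (self.name, len(self.buf), len(block))
        self.buf += block
        self.index.append(fp)

    def finish(self):
        """Handle the processing completion instruction."""
        if self.buf:
            self._flush()
        self.object_store["index"] = "\n".join(self.index).encode()


def client_send(data, server, block_size=4):
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fp = hashlib.sha256(block).hexdigest()
        if server.submit_fingerprint(fp):
            server.upload_block(fp, block)
    server.finish()
```

Compared with the client-side variant, the client here never buffers a data file. It only ships the blocks the server asks for, which is why this variant suits a server with ample bandwidth and storage.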
With this embodiment, the client 102 and the deduplication server 104 use the data fingerprint to determine which type of file the data to be stored is written to, and store the data in object storage. Because object storage offers high reliability and elastic scalability, the storage performance of the data storage is improved.
It should be understood that although the steps in the flowcharts of figs. 4-12 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in figs. 4-12 may include multiple sub-steps or stages. These are not necessarily completed at the same moment but may be performed at different times, and they are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 13, there is provided an object storage based data storage apparatus, including: a sending module 500, a dividing module 502, an obtaining module 504, a judging module 506 and a storing module 508, wherein:
A sending module 500, configured to send a data file request to the deduplication server 104 in response to a data storage request for data to be stored. The data file request instructs the deduplication server 104 to generate a corresponding file to be stored according to the data to be stored; the data to be stored comprises at least one data block; and the file to be stored comprises at least one file type.
The dividing module 502 is configured to divide data to be stored into at least one data block according to a preset block dividing policy.
An obtaining module 504, configured to obtain, for each data block, a data fingerprint corresponding to the data block, and send the data fingerprint to the deduplication server 104; the deduplication server 104 is used for querying whether a data fingerprint exists and returning a query result.
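The text leaves the preset blocking strategy open. One common choice, shown here purely as an illustration that the patent does not specify, is content-defined chunking with a gear rolling hash. A boundary is cut where the low bits of the hash are zero, so an insertion near the start of the data tends to move only nearby boundaries rather than every later block. The minimum size, mask, maximum size, and seed below are arbitrary assumptions, and SHA-256 is assumed as the fingerprint algorithm.

```python
import hashlib
import random

# Hypothetical gear table: 256 pseudo-random 32-bit values (fixed seed so chunking is reproducible).
_rng = random.Random(2021)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data, min_size=2048, mask=(1 << 13) - 1, max_size=65536):
    """Split bytes at content-defined boundaries using a gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF  # bytes older than 32 steps shift out
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fingerprints(chunks):
    """One SHA-256 data fingerprint per data block."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]
```

Fixed-size blocks, as used in the other sketches here, are simpler to implement. The trade-off is that a single inserted byte shifts every later block and defeats most deduplication against the earlier version.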
A judging module 506, configured to store the data block and/or the data fingerprint in the file to be stored of the corresponding file type according to the query result.
The storage module 508 is configured to send the file to be stored to the object storage 106 for storage.
In one embodiment, the judging module 506 is specifically configured to: if the query result is negative, store the data block corresponding to the data fingerprint in the data file, write the data fingerprint into the index file, and send the storage location of the data block in the data file to the deduplication server 104; otherwise, write only the data fingerprint into the index file.
In one embodiment, the apparatus further comprises a newly-added module, configured to: if the remaining storage space of the data file is smaller than the data size of a data block in the data to be stored, send the data file to the object storage 106 for storage; and send the storage location and a new data file request to the deduplication server 104, where the deduplication server 104 stores the storage location in the metadata corresponding to the data file and sends a new data file to be stored to the client 102.
In one embodiment, the apparatus further comprises a reading module, configured to: in response to a data reading instruction, acquire the index file corresponding to the data reading instruction from the object storage 106; acquire at least one data fingerprint from the index file and send it to the deduplication server 104, which queries and returns the location of the data block corresponding to each data fingerprint; and acquire the corresponding data blocks from the object storage 106 according to the locations returned by the deduplication server 104.
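The read path of the reading module can be sketched roughly as follows. `read_data` and its `locate` callback (standing in for the deduplication server lookup) are hypothetical. The dict object store and SHA-256 are assumptions, and re-checking each block against its fingerprint is an optional extra rather than a step stated in the text.

```python
import hashlib


def read_data(index_fps, locate, object_store):
    """Rebuild stored data from the fingerprints listed in its index file.

    `locate` stands in for the deduplication server and returns
    (data object name, offset, length) for a fingerprint.
    """
    out = bytearray()
    for fp in index_fps:
        name, offset, length = locate(fp)
        # Many object stores can serve this slice as a ranged read of one object.
        block = object_store[name][offset:offset + length]
        if hashlib.sha256(block).hexdigest() != fp:
            raise ValueError(f"block for fingerprint {fp[:12]} failed verification")
        out += block
    return bytes(out)
```

Because the index file keeps every fingerprint in data order, including duplicates, concatenating the located blocks reproduces the original byte stream.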
In one embodiment, the apparatus further comprises a rewriting module, configured to: send a data change instruction to the deduplication server 104 and acquire the index file to be changed from the object storage 106, where the deduplication server 104 generates and returns a data file to be changed according to the data change instruction, and the data change instruction comprises a change position; acquire the data fingerprint to be changed that corresponds to the change position in the index file to be changed, and send it to the deduplication server 104, which queries and returns the location of the corresponding data block to be changed; acquire the data block to be changed from the object storage 106 according to that location, and modify it at the change position to obtain a changed data block; generate a corresponding changed data fingerprint from the changed data block; query the index file to be changed with the changed data fingerprint and, if the changed data fingerprint is not referenced by any data block, store it in the index file to be changed to obtain a changed index file, and send the changed data fingerprint to the deduplication server 104, which queries whether the changed data fingerprint already exists and returns the query result to the client; if the query result is negative, write the changed data block into the data file to be changed to obtain a changed data file; and store the changed index file and the changed data file in the object storage 106.
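The change flow amounts to copy-on-write at block granularity: the old block is left in place because other index files may still reference it, and only a changed block whose fingerprint is new gets written. The sketch below is a simplified illustration. It uses a hypothetical `change_block` helper and a hypothetical object name `new_name` for the data file to be changed, and it omits reference counting and the check against other entries of the same index file.

```python
import hashlib


def sha(block):
    return hashlib.sha256(block).hexdigest()


def change_block(index, block_no, offset, patch, fingerprint_db, object_store, new_name):
    """Copy-on-write change of one block; returns a changed copy of the index.

    fingerprint_db maps fingerprint -> (data object name, offset, length).
    The old block is never overwritten, because other index files may share it.
    """
    name, start, length = fingerprint_db[index[block_no]]
    block = bytearray(object_store[name][start:start + length])
    block[offset:offset + len(patch)] = patch  # apply the change at the change position
    changed_block = bytes(block)
    new_fp = sha(changed_block)
    if new_fp not in fingerprint_db:           # query result negative: store the block
        data = object_store.get(new_name, b"")
        fingerprint_db[new_fp] = (new_name, len(data), len(changed_block))
        object_store[new_name] = data + changed_block
    changed_index = list(index)
    changed_index[block_no] = new_fp
    return changed_index
```

If the edit happens to produce content that already exists elsewhere, only the index entry changes and no new bytes are written.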
In one embodiment, as shown in fig. 14, there is provided an object storage based data storage apparatus, including: a generating module 600, an obtaining module 602, a first determining module 604, and a second determining module 606, wherein:
the generating module 600 is configured to generate a corresponding data file to be stored and an index file to be stored in response to a data file request sent by the client 102.
An obtaining module 602, configured to obtain a data fingerprint generated for at least one data chunk sent by the client 102, and query the fingerprint database for whether the data fingerprint exists.
The first determining module 604 is configured to, if the data fingerprint exists, return the query result to the client 102 and update the reference count of the data fingerprint in the fingerprint database.
The second determining module 606 is configured to, if the data fingerprint does not exist, return the query result to the client 102, obtain the storage location, in the data file, of the data block corresponding to the data fingerprint as sent by the client 102, and store the storage location and the data fingerprint in the fingerprint database.
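The two determining modules can be illustrated with a small reference-counted fingerprint table. This is a sketch under assumptions: `FingerprintDB` and `FingerprintEntry` are hypothetical names, the table is in memory, and a miss reserves an entry whose location is filled in once the client reports where it wrote the block.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Location = Tuple[str, int, int]  # (data object name, offset, length)


@dataclass
class FingerprintEntry:
    location: Optional[Location] = None  # filled in once the client reports it
    refs: int = 1


class FingerprintDB:
    """Hypothetical in-memory fingerprint database with reference counts."""

    def __init__(self) -> None:
        self.entries: Dict[str, FingerprintEntry] = {}

    def query(self, fp: str) -> bool:
        entry = self.entries.get(fp)
        if entry is not None:                  # first determining module: one more reference
            entry.refs += 1
            return True
        self.entries[fp] = FingerprintEntry()  # second determining module: await location
        return False

    def set_location(self, fp: str, location: Location) -> None:
        self.entries[fp].location = location
```

Reserving the entry on a miss means a second query for the same fingerprint, arriving before the location is reported, is still counted as a duplicate.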
In one embodiment, the apparatus further comprises a deletion module, configured to: obtain a deletion instruction sent by the client 102, the deletion instruction comprising storage file information of the data to be deleted; acquire the index file corresponding to the data to be deleted from the object storage 106, and acquire from it the data fingerprints to be deleted that correspond to the data blocks to be deleted; and delete the reference relationship of each data fingerprint to be deleted to its data block to be deleted, and delete the index file corresponding to the data to be deleted.
In one embodiment, the apparatus further comprises a first detection module, configured to, if a data block to be deleted no longer has a reference relationship with the data fingerprint to be deleted, acquire the target data file corresponding to the data block to be deleted and decrement the count of valid data blocks in the target data file by one.
In one embodiment, the apparatus further comprises a second detection module, configured to delete the information of the target data file from the metadata corresponding to the target data file if the count of valid data blocks in the target data file is zero.
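The deletion, first detection, and second detection modules together amount to reference-count bookkeeping, sketched below with plain dicts. `delete_stored_file` and all four mappings are hypothetical. Deleting the index file object itself is left out, and the data file's bytes stay in object storage until they are reclaimed.

```python
def delete_stored_file(index_fps, refs, block_file, valid_counts, metadata):
    """Release one stored file's references (hypothetical bookkeeping).

    refs:         fingerprint -> reference count
    block_file:   fingerprint -> data file holding the block
    valid_counts: data file -> number of still-referenced blocks
    metadata:     data file -> its metadata record
    """
    for fp in index_fps:
        refs[fp] -= 1
        if refs[fp] > 0:
            continue                      # block still used by another stored file
        del refs[fp]
        data_file = block_file.pop(fp)
        valid_counts[data_file] -= 1      # first detection module
        if valid_counts[data_file] == 0:  # second detection module
            del valid_counts[data_file]
            metadata.pop(data_file, None)
```

A data file whose valid count falls to zero can be dropped outright. One that is only partly dead is a candidate for the recovery process described next.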
In one embodiment, the apparatus further comprises a recovery module, configured to: acquire a data file to be recovered whose redundant data ratio is greater than a preset value, where the data file to be recovered comprises at least one data block and the redundant data ratio is the proportion of data blocks in the data file that have no reference relationship with any data fingerprint to all data blocks in the data file; acquire the data fingerprint corresponding to each data block and determine the reference count of that data fingerprint, the reference count representing the number of data blocks associated with the data fingerprint; if the reference count is not zero, determine that the data block is a valid data block, write it into a new data file, and update its data fingerprint according to its storage location in the new data file, the valid data blocks together forming an updated data file; if the reference count is zero, determine that the data block is a redundant data block and delete its data fingerprint from the corresponding index file, the redundant data blocks together forming the recovered data file; and store the updated data file in the object storage 106 and delete the recovered data file.
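The recovery step is essentially compaction: once the share of unreferenced blocks exceeds a threshold, the valid blocks are copied into a fresh data file and the old one is discarded. A minimal sketch follows, assuming a hypothetical `compact` helper, a threshold of 0.5, and in-memory inputs. Uploading the new file, updating fingerprint locations, and deleting the old object are left to the caller.

```python
def compact(blocks, refs, threshold=0.5):
    """Rewrite a data file keeping only referenced blocks (hypothetical sketch).

    blocks: list of (fingerprint, bytes) in file order
    refs:   fingerprint -> reference count
    Returns None if the redundant ratio does not exceed `threshold`, otherwise
    (new file bytes, fingerprint -> (offset, length) in the new file, dead fingerprints).
    """
    dead = [fp for fp, _ in blocks if refs.get(fp, 0) == 0]
    if not blocks or len(dead) / len(blocks) <= threshold:
        return None
    new_data, new_locations = bytearray(), {}
    for fp, block in blocks:
        if refs.get(fp, 0) > 0:  # valid block: copy it forward
            new_locations[fp] = (len(new_data), len(block))
            new_data += block
    return bytes(new_data), new_locations, dead
```

The threshold trades rewrite cost against wasted space: a low value reclaims space sooner but copies valid blocks more often.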
For specific limitations on the data storage apparatus based on object storage, reference may be made to the limitations on the data storage method based on object storage above, which are not repeated here. Each module in the apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a client; its internal structure may be as shown in fig. 15. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication may be implemented through Wi-Fi, an operator network, NFC (near field communication), or other technologies. When executed by the processor, the computer program implements a data storage method based on object storage. The display screen of the computer device may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen; a key, trackball, or touchpad arranged on the housing of the computer device; or an external keyboard, touchpad, mouse, or the like.
Those skilled in the art will appreciate that the structure shown in fig. 15 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, which includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the data storage method based on object storage when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the object storage-based data storage method described above.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A data storage method based on object storage is applied to a client, and the method comprises the following steps:
sending a data file request to a deduplication server in response to a data storage request for data to be stored, wherein the data file request instructs the deduplication server to generate a corresponding file to be stored according to the data to be stored; the file to be stored comprises a data object name corresponding to the client; the data to be stored comprises at least one data block; the file to be stored comprises at least one file type; the file types of the file to be stored comprise a data file to be stored and an index file to be stored; the data file to be stored is used for storing the data to be stored; and the index file to be stored is used for storing data fingerprints corresponding to the data to be stored;
dividing the data to be stored into at least one data block according to a preset blocking strategy;
for each data block, acquiring a data fingerprint corresponding to the data block and sending the data fingerprint to the deduplication server, wherein the deduplication server is used for querying whether the data fingerprint exists and returning a query result;
storing, according to the query result, the data block and/or the data fingerprint in the file to be stored of the corresponding file type, comprising: if the query result is negative, storing the data block corresponding to the data fingerprint in the data file, writing the data fingerprint into the index file, and sending the storage position of the data block in the data file to the deduplication server; otherwise, writing the data fingerprint into the index file;
further comprising: if the remaining storage space of the data file is smaller than the data size of a data block in the data to be stored, sending the data file to the object storage for storage;
sending the storage location and a new data file request to the deduplication server, wherein the deduplication server is used for storing the storage location in metadata corresponding to the data file and sending a new data file to be stored to the client;
and sending the file to be stored to an object storage for storage.
2. The method of claim 1, further comprising:
in response to a data reading instruction, acquiring an index file corresponding to the data reading instruction from the object storage;
acquiring at least one data fingerprint according to the index file, and sending the at least one data fingerprint to the deduplication server, wherein the deduplication server is used for querying and returning the position of the data block corresponding to each of the at least one data fingerprint;
and acquiring the corresponding data block from the object storage according to the position returned by the deduplication server.
3. The method of claim 1, further comprising:
sending a data change instruction to the deduplication server and acquiring an index file to be changed from the object storage, wherein the deduplication server is used for generating and returning a data file to be changed according to the data change instruction, and the data change instruction comprises a change position;
acquiring the data fingerprint to be changed that corresponds to the change position in the index file to be changed, and sending the data fingerprint to be changed to the deduplication server, wherein the deduplication server is used for querying and returning the position of the corresponding data block to be changed;
acquiring the corresponding data block to be changed from the object storage according to its position, and modifying the data block to be changed at the change position to obtain a changed data block;
generating a corresponding changed data fingerprint from the changed data block;
querying the index file to be changed with the changed data fingerprint; if the changed data fingerprint is not referenced by any data block, storing the changed data fingerprint in the index file to be changed to obtain a changed index file, and sending the changed data fingerprint to the deduplication server, wherein the deduplication server is used for querying whether the changed data fingerprint exists and returning a query result to the client;
if the query result is negative, writing the changed data block into the data file to be changed to obtain a changed data file;
and storing the changed index file and the changed data file to the object storage.
4. A data storage method based on object storage is applied to a deduplication server, and comprises the following steps:
in response to a data file request sent by a client, acquiring a data object name corresponding to the client and generating a corresponding data file to be stored and index file to be stored, wherein the data file to be stored is used for storing the data to be stored and the index file to be stored is used for storing data fingerprints corresponding to the data to be stored;
acquiring a data fingerprint that is sent by the client and generated for at least one data block, and querying whether the data fingerprint exists in a fingerprint database;
if yes, returning a query result to the client and updating the reference count of the data fingerprint in the fingerprint database;
if not, returning a query result to the client, acquiring the storage position, in the data file, of the data block corresponding to the data fingerprint as sent by the client, and storing the storage position and the data fingerprint in the fingerprint database; wherein the client is used for, according to the query result, storing the data block corresponding to the data fingerprint in the data file, writing the data fingerprint into the index file, and sending the storage position of the data block in the data file to the deduplication server if the query result is negative, and otherwise writing the data fingerprint into the index file; and for sending the data file to the object storage for storage if the remaining storage space of the data file is smaller than the data size of a data block in the data to be stored;
further comprising:
receiving the storage position and a new data file request sent by the client, storing the storage position in metadata corresponding to the data file, and sending a new data file to be stored to the client.
5. The method of claim 4, further comprising:
acquiring a deletion instruction sent by the client, the deletion instruction comprising storage file information of data to be deleted;
acquiring an index file corresponding to the data to be deleted from an object storage, and acquiring from the index file the data fingerprints to be deleted that correspond to the data blocks to be deleted in the data to be deleted;
deleting the reference relationship of each data fingerprint to be deleted to the corresponding data block to be deleted, and deleting the index file corresponding to the data to be deleted;
further comprising:
if any data block to be deleted no longer has a reference relationship with the data fingerprint to be deleted, acquiring a target data file corresponding to the data block to be deleted, and decrementing the count of valid data blocks in the target data file by one;
further comprising:
if the count of valid data blocks in the target data file is zero, deleting the information of the target data file from the metadata corresponding to the target data file.
6. The method of claim 5, further comprising:
acquiring a data file to be recovered whose redundant data ratio is greater than a preset value, wherein the data file to be recovered comprises at least one data block, and the redundant data ratio represents the proportion of data blocks in the data file that have no reference relationship with any data fingerprint to all data blocks in the data file;
acquiring the data fingerprint corresponding to each data block, and determining the reference count corresponding to the data fingerprint, the reference count representing the number of data blocks associated with the data fingerprint;
if the reference count is not zero, determining that the data block is a valid data block, writing the valid data block into a new data file, updating the data fingerprint corresponding to the valid data block according to the storage position of the valid data block in the new data file, and obtaining an updated data file from the at least one valid data block;
if the reference count is zero, determining that the data block is a redundant data block, deleting the data fingerprint corresponding to the redundant data block from the index file corresponding to the redundant data block, and obtaining a recovered data file from the at least one redundant data block;
and storing the updated data file in the object storage, and deleting the recovered data file.
7. An object storage based data storage device, applied to a client, the device comprising:
a sending module, used for sending a data file request to a deduplication server in response to a data storage request for data to be stored, wherein the data file request instructs the deduplication server to generate a corresponding data file to be stored and index file to be stored according to the data to be stored; the file to be stored comprises a data object name corresponding to the client; the data to be stored comprises at least one data block; the file types of the file to be stored comprise a data file to be stored and an index file to be stored; the data file to be stored is used for storing the data to be stored; and the index file to be stored is used for storing data fingerprints corresponding to the data to be stored;
a dividing module, used for dividing the data to be stored into at least one data block according to a preset blocking strategy;
an acquisition module, used for acquiring a data fingerprint corresponding to each data block and sending the data fingerprint to the deduplication server, wherein the deduplication server is used for querying whether the data fingerprint exists and returning a query result;
a judging module, used for storing the data blocks and/or the data fingerprints in files to be stored of corresponding file types according to the query result; specifically, if the query result is negative, storing the data block corresponding to the data fingerprint in the data file and sending the storage position of the data block in the data file to the deduplication server, and otherwise writing the data fingerprint into the index file;
a newly added module, used for sending the data file to the object storage for storage if the remaining storage space of the data file is smaller than the data size of a data block in the data to be stored, and for sending the storage location and a new data file request to the deduplication server, wherein the deduplication server is used for storing the storage location in metadata corresponding to the data file and sending a new data file to be stored to the client;
and a storage module, used for sending the data file and/or the index file to an object storage for storage.
8. An object storage based data storage device, applied to a deduplication server, the device comprising:
the generating module is used for responding to a data file request sent by a client, acquiring a data object name corresponding to the client and generating a corresponding data file to be stored and an index file to be stored; the data file to be stored is used for storing the data to be stored; the index file to be stored is used for storing data fingerprints corresponding to the data to be stored;
the acquisition module is used for acquiring a data fingerprint which is sent by the client and generated aiming at least one data block, and inquiring whether the data fingerprint exists in a fingerprint database;
the first determining module is used for returning a query result to the client if the data fingerprint exists, and updating the reference number of the data fingerprint in a fingerprint database according to the data fingerprint;
a second determining module, configured to: if the data fingerprint does not exist, return a query result to the client, obtain the storage location, in the data file, of the data block corresponding to the data fingerprint sent by the client, and store the storage location and the data fingerprint in the fingerprint database; wherein the client is used for, according to the query result, storing the data block corresponding to the data fingerprint into the data file, writing the data fingerprint into the index file, and sending the storage location of the data block in the data file to the deduplication server if the data fingerprint does not exist, or otherwise writing the data fingerprint into the index file; and, if the remaining storage space of the data file is smaller than the data size of the data block in the data to be stored, sending the data file to the object storage for storage;
and is further used for receiving the storage location and a new data file request sent by the client, storing the storage location into the metadata corresponding to the data file, and sending a new data file to be stored to the client.
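The server-side modules of claim 8 can be sketched as a small in-memory fingerprint database. This is an illustrative assumption, not the patented implementation: the method names (`open_files`, `query`, `add`, `rollover`), the `FingerprintEntry` record, and the file-naming scheme are hypothetical, and the mapping of methods to claimed modules is only one possible reading. A newly added fingerprint starts with a reference count of 1, and each duplicate hit increments it, as the first determining module describes.

```python
import itertools
from dataclasses import dataclass


@dataclass
class FingerprintEntry:
    location: tuple  # (data file name, offset) reported by the client
    refs: int = 1    # how many index-file entries point at this block


class FingerprintDB:
    """Hypothetical in-memory fingerprint database on the deduplication server."""

    def __init__(self):
        self.entries = {}        # fingerprint -> FingerprintEntry
        self.file_metadata = {}  # data file name -> object-storage location
        self._ids = itertools.count(1)

    def new_data_file(self, object_name):
        return f"{object_name}/data-{next(self._ids)}"

    def open_files(self, object_name):
        """Generating module: a data file and an index file for one data object."""
        return self.new_data_file(object_name), f"{object_name}/index"

    def query(self, fp):
        """Acquisition and first determining modules: True means duplicate."""
        entry = self.entries.get(fp)
        if entry is None:
            return False
        entry.refs += 1  # a duplicate adds one more reference to the block
        return True

    def add(self, fp, location):
        """Second determining module: record where a new block was written."""
        self.entries[fp] = FingerprintEntry(location)

    def rollover(self, object_name, full_file, object_location):
        """Store the uploaded file's location in its metadata; issue a new file."""
        self.file_metadata[full_file] = object_location
        return self.new_data_file(object_name)
```

Because `query` and `add` are separate calls, two clients writing the same new block at the same time could both see a miss; a multi-client server would need a lock or an atomic insert-if-absent around these two steps.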
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN202110280863.9A 2021-03-16 2021-03-16 Data storage method and device based on object storage and computer equipment Active CN112817962B

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110280863.9A (CN112817962B) | 2021-03-16 | 2021-03-16 | Data storage method and device based on object storage and computer equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110280863.9A (CN112817962B) | 2021-03-16 | 2021-03-16 | Data storage method and device based on object storage and computer equipment

Publications (2)

Publication Number | Publication Date
CN112817962A | 2021-05-18
CN112817962B | 2022-02-18

Family ID: 75863296

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110280863.9A (Active, CN112817962B) | 2021-03-16 | 2021-03-16 | Data storage method and device based on object storage and computer equipment

Country Status (1)

Country | Link
CN (1) | CN112817962B

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115129664A * | 2022-09-01 | 2022-09-30 | Hunan Xingtian Electronic Technology Co., Ltd. | Data recording device, data file management method and apparatus

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114442931A * | 2021-12-23 | 2022-05-06 | Tianyi Cloud Technology Co., Ltd. | Data deduplication method and system, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106843773A * | 2017-02-16 | 2017-06-13 | Tianjin Shusheng Cloud Technology Co., Ltd. | Storage method and distributed storage system
CN109445703A * | 2018-10-26 | 2019-03-08 | Huanghuai University | Delta compression storage component based on block-level data deduplication

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8180891B1 * | 2008-11-26 | 2012-05-15 | Free Stream Media Corp. | Discovery, access control, and communication with networked services from within a security sandbox
US9639274B2 * | 2015-04-14 | 2017-05-02 | Commvault Systems, Inc. | Efficient deduplication database validation
CN111522502B * | 2019-02-01 | 2022-04-29 | Alibaba Group Holding Ltd. | Data deduplication method and device, electronic equipment and computer-readable storage medium
CN111966649B * | 2020-10-21 | 2021-01-01 | National University of Defense Technology | Lightweight online file storage method and device with efficient deduplication

Also Published As

Publication number | Publication date
CN112817962A | 2021-05-18

Similar Documents

Publication | Title
US10635359B2 | Managing cache compression in data storage systems
US11531482B2 | Data deduplication method and apparatus
US9965394B2 | Selective compression in data storage systems
US9317218B1 | Memory efficient sanitization of a deduplicated storage system using a perfect hash function
US8751763B1 | Low-overhead deduplication within a block-based data storage
US9430164B1 | Memory efficient sanitization of a deduplicated storage system
CN105190573A | Reduced redundancy in stored data
CN112817962B | Data storage method and device based on object storage and computer equipment
CN111125033B | Space recycling method and system based on full flash memory array
CN108415986B | Data processing method, device, system, medium and computing equipment
CN111198856B | File management method, device, computer equipment and storage medium
US11409766B2 | Container reclamation using probabilistic data structures
CN103197987A | Data backup method, data recovery method and cloud storage system
CN113094372A | Data access method, data access control device and data access system
US11093453B1 | System and method for asynchronous cleaning of data objects on cloud partition in a file system with deduplication
WO2020215580A1 | Distributed global data deduplication method and device
CN111158606B | Storage method, storage device, computer equipment and storage medium
CN114936010B | Data processing method, device, equipment and medium
US10423533B1 | Filtered data cache eviction
CN110389706B | Fingerprint recovery method and storage system
CN117149724B | Method and system for deleting repeated data of cloud environment system
US20230342293A1 | Method and system for in-memory metadata reduction in cloud storage system
US10740015B2 | Optimized management of file system metadata within solid state storage devices (SSDs)
CN116756137A | Method, system and equipment for deleting large-scale data object storage
CN117743324A | Data storage method, device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Data storage method, device and computer equipment based on object storage

Effective date of registration: 2022-10-21

Granted publication date: 2022-02-18

Pledgee: Industrial Bank Co., Ltd. Guangzhou Development Zone Sub-branch

Pledgor: Guangzhou Dingjia Computer Technology Co., Ltd.

Registration number: Y2022980018838