CN115309336A - Data writing method, cache information updating method and related device

Info

Publication number: CN115309336A
Application number: CN202210939151.8A
Authority: CN
Inventors: 陈涛
Original assignee: Chongqing Unisinsight Technology Co Ltd
Current assignee: Chongqing Unisinsight Technology Co Ltd
Priority date: 2022-08-05
Filing date: 2022-08-05
Publication date: 2022-11-08

Abstract

The invention provides a data writing method, a cache information updating method and a related device, and relates to the field of distributed storage. The data writing method comprises the following steps: dividing data to be stored into a plurality of data fragments according to a preset erasure ratio, and determining a storage block corresponding to each data fragment, wherein the storage block is a data block used for storing the data fragment; determining a target storage node where each storage block is located at the current moment according to a plurality of pieces of cache information of a local cache, wherein the plurality of pieces of cache information are updated based on the recovery progress of the recovered data blocks reported by the plurality of storage nodes, the recovered data blocks reported by each storage node all meet a preset reporting condition, and the preset reporting condition is sent to the plurality of storage nodes by a client at regular intervals; and sending each data fragment to the target storage node where the storage block corresponding to each data fragment is located to write data, so that the data block recovery process does not affect the data fragment writing process and the writing time delay is reduced.

Description

Data writing method, cache information updating method and related device

Technical Field

The invention relates to the field of distributed storage, in particular to a data writing method, a cache information updating method and a related device.

Background

In a distributed storage system, storage information is located through a metadata server, a plurality of data servers serve as storage nodes, and a client provides an external access interface for data read/write operations. For a data write operation, the client divides the data to be stored into N + M data fragments according to an erasure ratio and writes each data fragment into a data block on a different storage node. When the number of successfully written data fragments is greater than N, the client confirms that the whole data to be stored has been written successfully; for the data fragments that failed to be written, the storage nodes recover them through erasure calculation.

Because multiple devices work cooperatively, data consistency becomes a major challenge; for example, a client may write a data fragment into a data block while a storage node is recovering that data block. At present, when data writing and recovery are triggered simultaneously, the main approach is for the client to suspend the write operation of the current data fragment and execute it only after the data block recovery is completed, which greatly increases the writing time delay.

Disclosure of Invention

In order to overcome the defects of the prior art, embodiments of the present invention provide a data writing method, a cache information updating method, and a related apparatus, which prevent the data block recovery process from affecting the data fragment writing process and reduce the writing time delay.

The technical scheme of the embodiment of the invention can be realized as follows:

In a first aspect, an embodiment of the present invention provides a data writing method, which is applied to a client in a distributed storage system, where the distributed storage system further includes a plurality of storage nodes, the client is in communication connection with the plurality of storage nodes, and each storage node has at least one data block created thereon, and the method includes:

dividing data to be stored into a plurality of data fragments according to a preset erasure ratio, and determining a storage block corresponding to each data fragment, wherein the storage block is used for storing the data fragments;

determining a target storage node where each storage block is located at the current moment according to a plurality of pieces of cache information of local cache;

the cache information is updated based on the recovery progress of the recovered data blocks reported by the storage nodes, the recovered data blocks reported by each storage node meet a preset reporting condition, and the preset reporting condition is sent to the storage nodes by the client at regular time;

and sending each data fragment to a target storage node where a storage block corresponding to each data fragment is located, so as to write data.

Optionally, the pieces of cache information include pieces of first information cached in a local cache region, each piece of the first information records an identifier and a state of one data block and a storage node where the data block is located, and the distributed storage system further includes a management node, where the management node is in communication connection with the client;

the step of determining the target storage node where each storage block is located at the current moment according to the plurality of pieces of cache information of the local cache comprises:

for each storage block, if target first information in which the identifier of the storage block is recorded exists in the plurality of pieces of first information and the state of the data block recorded in the target first information is a recovery state or a normal state, taking a storage node where the data block recorded in the target first information is located as a target storage node where the storage block is located at the current moment;

if the target first information recorded with the identification of the storage block exists in the first information and the state of the data block recorded in the target first information is an abnormal state, sending a query request to the management node to acquire a target storage node where the storage block is located at the current moment.

Optionally, the pieces of cache information further include a plurality of pieces of second information cached in a local cache linked list, each piece of the second information includes an identifier of one data block and a storage node where the data block is located, the plurality of pieces of second information are updated based on the recovery progress of the recovered data block reported by the plurality of storage nodes, and the plurality of pieces of first information are updated based on the update condition of the plurality of pieces of second information;

the step of determining the target storage node where each storage block is located at the current time according to the plurality of pieces of cache information of the local cache further includes:

if the target first information recorded with the identification of the storage block does not exist in the first information, judging whether target second information recorded with the identification of the storage block exists in the second information;

if the target second information exists, taking the storage node where the data block recorded by the target second information is located as the target storage node where the storage block is located at the current moment;

and if the target second information does not exist, sending a query request to the management node to acquire a target storage node where the storage block is located at the current moment.
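The lookup order described above (first information, then second information, then the management node) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `BlockState`, `FirstInfo`, `resolve_target_node` and the `query_management_node` callback are hypothetical, and the local cache linked list is modeled as a plain dictionary for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class BlockState(Enum):
    NORMAL = "normal"
    RECOVERING = "recovering"
    ABNORMAL = "abnormal"


@dataclass
class FirstInfo:
    """One piece of first information: block identifier, state and node."""
    block_id: str
    state: BlockState
    node: str


def resolve_target_node(
    block_id: str,
    first_info: Dict[str, FirstInfo],
    second_info: Dict[str, str],
    query_management_node: Callable[[str], str],
) -> str:
    """Return the storage node where the storage block is located right now."""
    entry = first_info.get(block_id)
    if entry is not None:
        # Normal or recovery state: the cached node can be used directly.
        if entry.state in (BlockState.NORMAL, BlockState.RECOVERING):
            return entry.node
        # Abnormal state: the cached node is not trusted, ask the management node.
        return query_management_node(block_id)
    # No first information: fall back to the list of blocks under recovery.
    node = second_info.get(block_id)
    if node is not None:
        return node
    # Neither cache knows the block: ask the management node.
    return query_management_node(block_id)
```

In this sketch the management node is only contacted when the cached state is abnormal or when neither cache holds the block, which matches the intent of keeping the common write path local to the client.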

Optionally, the method further comprises:

receiving a response message returned by each target storage node, wherein the response message represents whether the target storage node successfully writes the data fragments into the storage blocks corresponding to the data fragments;

for each response message indicating a successful write, if the state of the storage block into which the data fragment was written on the target storage node that returned the response message is a recovery state, excluding the response message from the erasure count;

if the state of the storage block into which the data fragment was written on the target storage node that returned the response message is a normal state, including the response message in the erasure count;

after it has been determined, for each response message indicating a successful write, whether to include the response message in the erasure count, judging whether the data to be stored is successfully stored according to the erasure count value;

if the erasure count value meets a preset condition, judging that the data to be stored is successfully stored;

and if the erasure count value does not meet the preset condition, returning to the step of determining the target storage node where each storage block is located at the current moment according to the plurality of pieces of cache information of the local cache until the data to be stored is successfully stored.
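The counting rule above can be illustrated with a short sketch. The text does not spell out the preset condition here, so the threshold `required` and the `>=` comparison are assumptions, as are the names `WriteResponse`, `erasure_count`, `write_succeeded` and `store_with_retry`. The source repeats the procedure until the data is stored; the `max_rounds` bound is added only so the illustration terminates.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class WriteResponse:
    node: str
    block_id: str
    success: bool      # whether the node wrote the fragment successfully
    block_state: str   # "normal" or "recovering" (state of the written block)


def erasure_count(responses: Iterable[WriteResponse]) -> int:
    """Count successful writes whose storage block is in the normal state."""
    return sum(1 for r in responses if r.success and r.block_state == "normal")


def write_succeeded(responses: Iterable[WriteResponse], required: int) -> bool:
    """Assumed preset condition: the erasure count reaches `required`."""
    return erasure_count(responses) >= required


def store_with_retry(
    send_round: Callable[[], List[WriteResponse]],
    required: int,
    max_rounds: int = 3,
) -> bool:
    """Re-resolve target nodes and resend (via send_round) until stored."""
    for _ in range(max_rounds):
        if write_succeeded(send_round(), required):
            return True
    return False
```

Successful writes to a block that is still being recovered are not counted, so a write is only confirmed on the strength of blocks that are in the normal state.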

In a second aspect, an embodiment of the present invention provides a cache information updating method, which is applied to a client in a distributed storage system, where the distributed storage system further includes a plurality of storage nodes, each storage node has at least one data block created thereon, and the client is in communication connection with the plurality of storage nodes, where the method includes:

sending preset reporting conditions to the plurality of storage nodes at regular time, so that the plurality of storage nodes report the recovery progress of the recovered data blocks meeting the preset reporting conditions to the client;

updating a plurality of pieces of cache information cached locally according to the recovery progress of the recovered data block received each time;

the pieces of cache information are used to determine, when the client executes the data writing method according to any one of the foregoing embodiments, a target storage node where a storage block corresponding to each data fragment of the data to be stored is located at the current moment, where the storage blocks are data blocks used for storing the data fragments.

Optionally, the pieces of cache information include a plurality of pieces of first information cached in the local cache region and a plurality of pieces of second information cached in the local cache linked list, and each piece of the second information includes an identifier of one data block and a storage node where the data block is located;

the step of updating the plurality of pieces of cache information cached locally according to the recovery progress of the recovered data block received each time includes:

for the recovery progress of the recovered data block received each time, if the recovery progress is in recovery, adding, to the local cache linked list, second information recording the identifier of the recovered data block and the storage node where the recovered data block is located;

if the recovery progress is recovery success or recovery failure, deleting the second information recorded with the identifier of the recovered data block in the local cache linked list;

and updating the plurality of pieces of first information according to the updating conditions of the plurality of pieces of second information.

Optionally, each piece of first information records an identifier and a state of one data block and a storage node where the data block is located;

the step of updating the plurality of pieces of first information according to the updating condition of the plurality of pieces of second information comprises:

for second information newly added to the local cache linked list, if reference first information exists in the local cache region, the reference first information being first information whose recorded data block identifier is the same as the data block identifier recorded by the second information, updating the storage node where the data block recorded by the reference first information is located to the storage node where the data block recorded by the second information is located, and setting the state of the data block recorded by the reference first information to be a recovery state;

for the newly deleted second information in the local cache linked list, if the recovery progress of the data block recorded by the second information is successful and the reference first information exists in the local cache region, setting the state of the data block recorded by the reference first information as a normal state;

and if the recovery progress of the data block recorded by the second information is recovery failure and the reference first information exists in the local cache region, setting the state of the data block recorded by the reference first information as an abnormal state.
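The update rules of the second aspect can be sketched as a single handler that processes one reported recovery progress at a time. This is an illustrative sketch under stated assumptions: `ClientCache`, `on_progress`, and the progress strings `"recovering"`, `"success"` and `"failure"` are hypothetical names, and an insertion-ordered dictionary stands in for the local cache linked list.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FirstInfo:
    state: str  # "normal", "recovering" or "abnormal"
    node: str   # storage node where the data block is located


class ClientCache:
    def __init__(self) -> None:
        # Local cache region: block identifier -> first information.
        self.first_info: Dict[str, FirstInfo] = {}
        # Stand-in for the local cache linked list: block identifier -> node.
        self.second_info: Dict[str, str] = {}

    def on_progress(self, block_id: str, node: str, progress: str) -> None:
        """Apply one recovery progress report from a storage node."""
        ref = self.first_info.get(block_id)  # reference first information, if any
        if progress == "recovering":
            # Record the block under recovery and where it is being rebuilt.
            self.second_info[block_id] = node
            if ref is not None:
                ref.node = node
                ref.state = "recovering"
        elif progress in ("success", "failure"):
            # Recovery has ended either way: drop the second information.
            self.second_info.pop(block_id, None)
            if ref is not None:
                ref.state = "normal" if progress == "success" else "abnormal"
        else:
            raise ValueError(f"unknown recovery progress: {progress!r}")
```

A block with no first information only appears in the second information while it is being recovered, which is why the lookup in the first aspect falls back to the second information before querying the management node.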

In a third aspect, an embodiment of the present invention provides a data writing apparatus, which is applied to a client in a distributed storage system, where the distributed storage system further includes a plurality of storage nodes, the client is communicatively connected to the plurality of storage nodes, and each storage node has at least one data block created thereon, where the apparatus includes:

a splitting module, configured to split data to be stored into a plurality of data fragments according to a preset erasure ratio and determine a storage block corresponding to each data fragment, wherein the storage blocks are data blocks used for storing the data fragments;

the determining module is used for determining a target storage node where each storage block is located at the current moment according to a plurality of pieces of cache information of local cache;

and the first sending module is used for sending each data fragment to a target storage node where the storage block corresponding to each data fragment is located so as to write data.

In a fourth aspect, an embodiment of the present invention provides a cache information updating apparatus, which is applied to a client in a distributed storage system, where the distributed storage system further includes a plurality of storage nodes, each storage node has at least one data block created thereon, and the client is in communication connection with the plurality of storage nodes, where the apparatus includes:

a second sending module, configured to send a preset reporting condition to the multiple storage nodes at regular time, so that the multiple storage nodes report recovery progress of a recovered data block that meets the preset reporting condition to the client;

an updating module, configured to update a plurality of pieces of cache information of the local cache according to the recovery progress of the recovered data block received each time.

In a fifth aspect, an embodiment of the present invention provides a client, including a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, implements the data writing method according to the first aspect and/or the cache information updating method according to the second aspect.

In a sixth aspect, an embodiment of the present invention provides a distributed storage system, which includes a management node, a plurality of storage nodes, and the client according to the fifth aspect.

In a seventh aspect, the present invention provides a computer-readable storage medium, which stores a computer program, wherein the computer program, when executed by a processor, implements the data writing method according to the first aspect and/or the cache information updating method according to the second aspect.

Compared with the prior art, according to the data writing method, the cache information updating method and the related device provided by the embodiment of the invention, firstly, the data to be stored is split into the plurality of data fragments according to the preset erasure correction ratio, and the storage block corresponding to each data fragment is determined, wherein the storage block is a data block for storing the data fragment; then, determining a target storage node where each storage block is located at the current moment according to a plurality of pieces of cache information of local cache, wherein the plurality of pieces of cache information are updated based on the recovery progress of the recovered data blocks reported by the plurality of storage nodes, the recovered data blocks reported by each storage node all meet preset reporting conditions, and the preset reporting conditions are sent to the plurality of storage nodes by a client at regular time; and finally, sending each data fragment to a target storage node where the storage block corresponding to each data fragment is located, so as to write data. In the embodiment of the invention, the client determines the storage node of each data block for storing the data fragment at the current moment through the plurality of pieces of cache information regularly updated by the local cache, and then sends the data fragment to the storage node of the data block at the current moment to write data, so that the data fragment writing process is prevented from being influenced by the data block recovery process, and the writing time delay is reduced.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting the scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.

Fig. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a conventional data distributed storage principle provided by an embodiment of the present invention;

Fig. 3 is a first flowchart illustrating a data writing method according to an embodiment of the present invention;

Fig. 4 is a second flowchart illustrating a data writing method according to an embodiment of the present invention;

Fig. 5 is a third flowchart illustrating a data writing method according to an embodiment of the present invention;

Fig. 6 is a first flowchart illustrating a cache information updating method according to an embodiment of the present invention;

Fig. 7 is a second flowchart illustrating a cache information updating method according to an embodiment of the present invention;

Fig. 8 is a specific example of a process of writing data and updating cache information according to an embodiment of the present invention;

Fig. 9 is a block diagram of functional units of a data writing apparatus according to an embodiment of the present invention;

Fig. 10 is a block diagram of functional units of a cache information updating apparatus according to an embodiment of the present invention;

Fig. 11 is a block diagram illustrating a structure of a client according to an embodiment of the present invention.

Reference numerals: 100-data writing apparatus; 101-splitting module; 102-determining module; 103-first sending module; 104-judging module; 200-cache information updating apparatus; 201-second sending module; 202-updating module; 300-client; 310-memory; 320-processor.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.

It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.

Referring to fig. 1, the distributed storage system includes a client, a plurality of storage nodes, and a management node. The client is in communication connection with the management node, the client is in communication connection with the plurality of storage nodes, the management node is in communication connection with the plurality of storage nodes, and each storage node is provided with at least one data block for storing the data fragments.

As shown in fig. 2, the client may interact with an upper layer application or an external device, receive data to be stored of the upper layer application or the external device, split the received data to be stored into a plurality of data fragments according to a preset erasure correction ratio, and distribute each data fragment to data blocks on different storage nodes for storage by sending a write data request to the storage nodes. The client side can also send the relevant information of the data block which fails to store the data fragment to the management node for recording.

The client may be a server, a Personal Computer (PC), a notebook Computer, or the like, or may be one or more program modules on one device, or a virtual machine or a container running on one device, or may be a cluster formed by a plurality of devices, for example, may be a general term for a plurality of program modules distributed on a plurality of devices.

The storage node may receive a write data request from the client and store the data fragment carried in the request into the corresponding data block. The storage node can also receive a data block recovery request issued by the management node, and recover the data block for which storing the data fragment failed. The storage node may be a server, a PC, a notebook computer, or the like. The storage nodes may be physical storage nodes or logical storage nodes obtained by dividing the physical storage nodes.

The management node may receive the related information of the data blocks with failed storage sent by the client, and recover each data block with failed storage according to the related information. The management node may be a server, a PC, a notebook computer, or the like, the management node may also be one or more program modules on one device, or a virtual machine or a container running on one device, and the management node may also be a cluster formed by multiple devices, for example, may be a collective term for multiple program modules distributed on multiple devices.

For the situation in which data fragment writing and data block recovery are triggered simultaneously, there are mainly two existing processing methods. In the first, the client suspends the write operation of the current data fragment and executes it after the data block is recovered, which greatly increases the writing time delay. In the second, the client does not perform the write operation on the current data fragment and instead waits for the next data recovery process to recover the data fragment; this requires an additional recovery operation, which increases the system load.

In order to avoid the influence of the data block recovery process on the writing process of the data fragments, an embodiment of the present invention provides a data writing method, an execution subject of which is a client in fig. 1, which will be described in detail below.

Referring to fig. 3, the data writing method according to the embodiment of the present invention includes steps S101 to S103.

S101, dividing data to be stored into a plurality of data fragments according to a preset erasure ratio, and determining a storage block corresponding to each data fragment.

The storage block is a data block for storing the data fragment.

A distributed storage system usually uses Erasure Code (EC) technology to implement data storage; that is, the data to be stored is divided into n data fragments, and erasure coding is then performed on the n data fragments to obtain m redundant fragments used for data recovery.

It can be understood that the ratio of the number n of the data fragments to the number m of the redundant fragments is an erasure correction ratio, and the plurality of data fragments obtained by splitting the data to be stored according to the preset erasure correction ratio n: m include n data fragments and m redundant fragments.
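As a simplified illustration of splitting data according to an erasure ratio n:m, the sketch below handles only the special case m = 1 with a single XOR parity fragment. A production system would typically use a general erasure code such as Reed-Solomon for arbitrary m; XOR parity is swapped in here only because it is short and self-contained. The function names `split_with_parity` and `recover_missing` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def _xor_columns(fragments: List[bytes]) -> bytes:
    """XOR equally sized fragments together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*fragments))


def split_with_parity(data: bytes, n: int) -> List[bytes]:
    """Split data into n equal-size data fragments plus 1 XOR parity fragment."""
    if n < 1:
        raise ValueError("n must be at least 1")
    frag_len = max(1, -(-len(data) // n))          # ceiling division
    padded = data.ljust(frag_len * n, b"\x00")     # zero-pad the tail
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(n)]
    return fragments + [_xor_columns(fragments)]


def recover_missing(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing fragment (marked None) from the others."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing fragment")
    present = [f for f in fragments if f is not None]
    result = list(fragments)
    result[missing[0]] = _xor_columns(present)
    return result
```

With n = 4 this yields five fragments, each of which would be written to its own storage block; losing any one of them still allows the original data to be rebuilt.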

S102, determining a target storage node where each storage block is located at the current moment according to the plurality of pieces of cache information of the local cache.

The plurality of pieces of cache information are updated based on the recovery progress of the recovered data blocks reported by the plurality of storage nodes, the recovered data blocks reported by each storage node all meet preset reporting conditions, and the preset reporting conditions are sent to the plurality of storage nodes by the client at regular time.

When a data block is recovered, the storage node where the data block is located may change. For example, data block 1 is initially created on storage node 1, and the management node records storage node 1 as the storage node where data block 1 is located. If storing a data fragment in data block 1 fails, the management node selects storage node 4 to recover data block 1; that is, storage node 4 creates a new data block 1 and obtains the stored data fragments from other storage nodes to recover it. After storage node 4 successfully recovers data block 1, it sends the recovery status of data block 1 to the management node. The management node then changes the storage node where data block 1 is located to storage node 4 and sends a deletion instruction to storage node 1 to delete the original data block 1.

If the client needs to write a data fragment into data block 1 while storage node 4 is recovering data block 1, the storage node that the client obtains from the management node for data block 1 is still storage node 1, not storage node 4.

The client caches a plurality of pieces of cache information locally, each piece of cache information records a storage node where a data block is located, and the plurality of pieces of cache information are updated at regular time according to the recovery progress of the recovered data block reported by all the storage nodes, that is, at any time, the storage node where the data block recorded by each piece of cache information is located is the storage node where the data block is located at the current time.

It can be understood that the target storage node is the storage node where the data block that needs to store a data fragment is located at the current moment. For example, if data block 1 was initially created on storage node 1 and is being recovered on storage node 4 at the current moment, the storage node recorded for data block 1 in the locally cached cache information is storage node 4. If data block 2 was initially created on storage node 2 and, up to the current moment, the management node has not designated another storage node to recover data block 2, the storage node recorded for data block 2 in the locally cached cache information is storage node 2. If the storage block corresponding to data fragment 1 is data block 1 and the storage block corresponding to data fragment 2 is data block 2, then, according to the locally cached cache information, the target storage node where the storage block corresponding to data fragment 1 is located is storage node 4, and the target storage node where the storage block corresponding to data fragment 2 is located is storage node 2.
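The example with data blocks 1 and 2 can be written out as a small mapping exercise. The dictionary layout and the helper name `target_nodes` are illustrative assumptions; only the block and node assignments come from the text.

```python
from typing import Dict

# Locally cached location of each data block at the current moment.
cached_location: Dict[str, str] = {
    "data_block_1": "storage_node_4",  # being recovered on node 4
    "data_block_2": "storage_node_2",  # never moved from node 2
}

# Storage block chosen for each data fragment.
fragment_to_block: Dict[str, str] = {
    "data_fragment_1": "data_block_1",
    "data_fragment_2": "data_block_2",
}


def target_nodes(
    fragment_to_block: Dict[str, str], cached_location: Dict[str, str]
) -> Dict[str, str]:
    """Map each data fragment to the node currently holding its storage block."""
    return {
        fragment: cached_location[block]
        for fragment, block in fragment_to_block.items()
    }
```

Here data fragment 1 resolves to storage node 4 and data fragment 2 to storage node 2, matching the example.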

S103, sending each data fragment to a target storage node where a storage block corresponding to each data fragment is located, so as to write data.

After determining the target storage node where the storage block corresponding to each data fragment is located, sending a data writing request containing the corresponding data fragment to each target storage node, so that each target storage node writes the data fragment in the data writing request into the storage block corresponding to the data fragment.

Step S102 will be described in detail below.

Referring to fig. 4, step S102 includes sub-steps S102-1 to S102-2 based on fig. 3.

In the embodiment of the invention, the processes of the steps S102-1 to S102-2 are executed once for each storage block to determine the target storage node of each storage block at the current moment.

S102-1, if the target first information recorded with the identification of the storage block exists in the first information and the state of the data block recorded in the target first information is a recovery state or a normal state, the storage node where the data block recorded in the target first information is located is used as the target storage node where the storage block is located at the current moment.

The cache information comprises a plurality of pieces of first information cached in the local cache region, and each piece of first information records the identification and the state of a data block and the storage node where the data block is located.

Optionally, because the space of the local cache region is limited, each piece of first information in the local cache region records the identifier and the state of a data block on which a data fragment write operation has been performed, and the storage node where that data block is located.

For any storage block, if target first information recording the identifier of the storage block exists in the local cache region, a data fragment write operation has been performed on the storage block before the current moment. If the current state of the storage block is determined from the target first information to be a normal state or a recovery state, data fragments can continue to be written into the data block; that is, the storage node where the data block recorded in the target first information is located is the target storage node where the storage block is located.

S102-2, if the target first information recorded with the identification of the storage block exists in the first information and the state of the data block recorded in the target first information is an abnormal state, sending a query request to the management node to acquire the target storage node where the storage block is located at the current time.

For any storage block, even though target first information recording the identifier of the storage block exists in the local cache region (that is, the storage block has undergone a data fragment write operation before the current time), if the current state of the storage block is determined to be an abnormal state according to the target first information, the storage node recorded in the target first information may no longer be the target storage node where the storage block is located. In this case, the target storage node where the storage block is located at the current time needs to be acquired from the management node.

In addition, the plurality of pieces of first information may contain no target first information recording the identifier of the storage block. For this case, referring again to fig. 4, the flow of steps S102-3 to S102-5 is also executed once for each storage block.

S102-3, if the target first information recorded with the identification of the storage block does not exist in the plurality of pieces of first information, judging whether the target second information recorded with the identification of the storage block exists in the plurality of pieces of second information.

The plurality of pieces of cache information further comprise a plurality of pieces of second information cached in the local cache linked list, each piece of second information comprises an identifier of one data block and a storage node where the data block is located, the plurality of pieces of second information are updated based on the recovery progress of the recovered data block reported by the plurality of storage nodes, and the plurality of pieces of first information are updated based on the updating condition of the plurality of pieces of second information.

In the embodiment of the present invention, the local cache linked list is used to record the identification of the data blocks being restored and the storage node where the data blocks are located, including the data blocks that have been subjected to the data fragmentation write operation and the data blocks that have not been subjected to the data fragmentation write operation.

It can be understood that, for the recovery progress of the recovered data block reported by the multiple storage nodes, if the recovered data block has been subjected to the data fragmentation write operation, after the relevant second information in the local cache chain table is updated according to the recovery progress, the relevant first information in the local cache region is also updated. If the data block to be restored has not been subjected to the data fragment writing operation, only the related second information in the local cache linked list needs to be updated according to the restoration progress.

For any storage block, if the target first information recorded with the identifier of the storage block does not exist in the local cache region, it means that the storage block has not undergone a data fragment write operation before the current time. At this time, it is necessary to determine whether the storage block is being restored by determining whether the target second information in which the identifier of the storage block is recorded exists in the plurality of pieces of second information.

S102-4, if the target second information exists, taking the storage node where the data block recorded in the target second information is located as the target storage node where the storage block is located at the current moment.

When the target second information recorded with the identification of the storage block exists, the storage block is being restored, and the storage node where the data block recorded in the target second information is located is the target storage node where the storage block is located.

S102-5, if the target second information does not exist, sending a query request to the management node to acquire the target storage node where the storage block is located at the current moment.

When there is no target second information in which the identification of the storage block is recorded, it means that the storage block is not being restored. At this time, the storage node recorded for the storage block in the management node is correct, and the target storage node where the storage block is located is obtained by sending a query request for the storage block to the management node.

The flow of steps S102-1 to S102-5 is executed in the same way for each storage block, and the storage node where each storage block is currently located is thereby determined.
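The lookup order of steps S102-1 to S102-5 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the names `BlockState`, `FirstInfo`, `resolve_target_node` and `query_management_node` are hypothetical, and the local cache linked list is modelled as a plain dictionary for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class BlockState(Enum):
    NORMAL = "normal"
    RECOVERING = "recovering"
    ABNORMAL = "abnormal"


@dataclass
class FirstInfo:
    """Hypothetical entry of the local cache region."""
    block_id: str
    state: BlockState
    node: str


def resolve_target_node(
    block_id: str,
    first_infos: Dict[str, FirstInfo],
    second_infos: Dict[str, str],
    query_management_node: Callable[[str], str],
) -> str:
    """Return the storage node that currently holds block_id."""
    first = first_infos.get(block_id)
    if first is not None:
        if first.state in (BlockState.NORMAL, BlockState.RECOVERING):
            return first.node                     # S102-1: cached node is usable
        return query_management_node(block_id)    # S102-2: abnormal, ask management node
    if block_id in second_infos:                  # S102-3: is the block being recovered?
        return second_infos[block_id]             # S102-4: use the recovering node
    return query_management_node(block_id)        # S102-5: management node record is correct


# Assumed sample data, not taken from the embodiment.
first_infos = {
    "BLK-1": FirstInfo("BLK-1", BlockState.RECOVERING, "DN-4"),
    "BLK-2": FirstInfo("BLK-2", BlockState.ABNORMAL, "DN-2"),
}
second_infos = {"BLK-1": "DN-4", "BLK-9": "DN-3"}
management_view = {"BLK-2": "DN-5", "BLK-7": "DN-1"}

targets = {
    block: resolve_target_node(
        block, first_infos, second_infos, management_view.__getitem__
    )
    for block in ("BLK-1", "BLK-2", "BLK-9", "BLK-7")
}
```

In this sketch only BLK-2 (abnormal state) and BLK-7 (absent from both caches) cause a query to the management node; the other two blocks are resolved locally.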

According to the erasure strategy n + m, when the number of successfully written data fragments is greater than n, that is, when the erasure count value is greater than n, the data to be stored can be regarded as successfully stored. However, the storage blocks corresponding to some of the data fragments may be in the recovery state with an unknown recovery result, which can affect the judgment of whether the data to be stored has been successfully written as a whole, and can in turn affect the consistency of the data.

Therefore, referring to fig. 5 in addition to fig. 3, after step S103, the data writing method further includes steps S104 to S108.

S104, receiving a response message returned by each target storage node.

The response message indicates whether the target storage node successfully wrote the data fragment into the storage block corresponding to the data fragment.

It will be appreciated that a response message indicating a write failure is never included in the erasure count. For each response message indicating a successful write, the following steps S105 to S106 are executed to determine whether that response message is included in the erasure count.

S105, if the state of the storage block written with the data fragment on the target storage node returning the response message is a recovery state, not including the response message in the erasure count.

For each response message representing successful write, whether the response message is included in the erasure count is determined according to the state of the storage block of the write data fragment on the target storage node returning the response message. And if the state of the storage block successfully written into the data fragment is a normal state, incorporating the corresponding response message into the erasure count.

S106, if the state of the storage block written with the data fragment on the target storage node returning the response message is a normal state, including the response message in the erasure count.

For the case that the storage block successfully written with the data fragment is in the recovery state, the recovery result is unknown. If the storage block subsequently fails to be recovered, the data fragment successfully written into the storage block is deleted together with the storage block. Therefore, in order to ensure the consistency of the data, the corresponding response message cannot be included in the erasure count.

S107, after it has been determined, for each response message indicating a successful write, whether the response message is included in the erasure count, judging whether the data to be stored is successfully stored according to the erasure count value.

After the above operation is performed on each response message representing successful write, a final erasure count value is obtained, which can be used to determine whether the data to be stored is successfully stored.

S108, if the erasure count value meets a preset condition, determining that the data to be stored is successfully stored.

The preset condition is that the erasure count value is greater than n (where the preset erasure ratio is n:m). If the erasure count value meets the preset condition, it is determined that the data to be stored is successfully stored.

If the erasure count value does not meet the preset condition, returning to the step of determining the target storage node where each storage block is located at the current time according to the plurality of pieces of cache information of the local cache (i.e., step S102) until the data to be stored is successfully stored.
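The counting rule of steps S104 to S108 can be illustrated with the short sketch below. The names `WriteResponse`, `erasure_count` and `is_stored` are hypothetical. Note one assumption: the description states the preset condition as "greater than n", whereas the worked example later in the description accepts a count of 2 with n = 2; this sketch follows the worked example and uses `>=`.

```python
from typing import Iterable, NamedTuple


class WriteResponse(NamedTuple):
    """Hypothetical response message from a target storage node."""
    node: str
    success: bool
    block_state: str  # "normal" or "recovering"


def erasure_count(responses: Iterable[WriteResponse]) -> int:
    """Count only successful writes into blocks in the normal state (S105/S106)."""
    return sum(1 for r in responses if r.success and r.block_state == "normal")


def is_stored(responses: Iterable[WriteResponse], n: int) -> bool:
    """S107/S108. The comparison `>=` is an assumption taken from the worked example."""
    return erasure_count(responses) >= n


# Responses modelled on the first and second writes of the 2:1 example.
first_write = [
    WriteResponse("DN-1", False, "normal"),
    WriteResponse("DN-2", True, "normal"),
    WriteResponse("DN-3", True, "normal"),
]
second_write = [
    WriteResponse("DN-4", True, "recovering"),  # success, but block is recovering
    WriteResponse("DN-2", False, "normal"),
    WriteResponse("DN-3", True, "normal"),
]

first_ok = is_stored(first_write, n=2)
second_ok = is_stored(second_write, n=2)
```

When `is_stored` returns false, the client would return to step S102 and retry, as described above.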

When the client executes the data writing method, the plurality of pieces of cache information are used for determining the target storage node where the storage block corresponding to each data fragment of the data to be stored is located at the current moment, and need to be updated continuously. In view of this, an embodiment of the present invention further provides a cache information updating method, an execution main body of which is the client in fig. 1, which will be described in detail below.

Referring to fig. 6, the cache information updating method includes steps S201 to S202.

S201, sending a preset reporting condition to a plurality of storage nodes at regular time, so that the plurality of storage nodes report the recovery progress of the recovered data block meeting the preset reporting condition to the client.

The preset reporting condition means that the state change time of the data block is greater than the incremental ID, the incremental ID is the time when the client receives the recovery progress of the recovered data block last time, the client issues instructions to all the storage nodes at regular time, and the issued instructions contain the incremental ID.

As a possible implementation, the client may send messages to all storage nodes every second.

It is understood that when the state change time of a data block is greater than the incremental ID, the data block may be a newly created data block under recovery, a data block that has just been successfully recovered, or a data block that has just failed to be recovered. When the state change time of a data block is less than the incremental ID, the data block has already been reported and its state has not changed again up to the current moment, so it does not need to be reported again. If the incremental ID is 0, the recovery progress of all the recovered data blocks needs to be reported.
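On the storage node side, the preset reporting condition amounts to a simple filter. The sketch below is illustrative only: `RecoveredBlock` and `blocks_to_report` are hypothetical names, and timestamps are assumed to be positive integers so that an incremental ID of 0 selects every block.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveredBlock:
    """Hypothetical record kept by a storage node for a block under recovery."""
    block_id: str
    progress: str            # "recovering", "success" or "failure"
    state_change_time: int   # assumed positive timestamp


def blocks_to_report(
    blocks: List[RecoveredBlock], incremental_id: int
) -> List[RecoveredBlock]:
    """Keep only blocks whose state changed after the client's last received report."""
    return [b for b in blocks if b.state_change_time > incremental_id]


blocks = [
    RecoveredBlock("BLK-1", "recovering", 120),
    RecoveredBlock("BLK-2", "success", 95),
    RecoveredBlock("BLK-3", "failure", 101),
]

changed = [b.block_id for b in blocks_to_report(blocks, incremental_id=100)]
everything = [b.block_id for b in blocks_to_report(blocks, incremental_id=0)]
```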

S202, updating a plurality of pieces of cache information cached locally according to the recovery progress of the recovered data block received each time.

Step S202 will be described in detail below.

The plurality of pieces of cache information of the local cache comprise a plurality of pieces of first information cached in the local cache region and a plurality of pieces of second information cached in the local cache linked list, the plurality of pieces of second information are updated based on the recovery progress of the recovered data block reported by the plurality of storage nodes, and the plurality of pieces of first information are updated based on the update condition of the plurality of pieces of second information.

Accordingly, as shown in fig. 7, the flow of steps S202-1 to S202-2 is executed once for each received recovery progress of a recovered data block to update the plurality of pieces of second information, and then step S202-3 is executed to update the plurality of pieces of first information.

S202-1, if the recovery progress is in recovery, adding, to the local cache linked list, second information recording the identification of the recovered data block and the storage node where it is located.

Each piece of second information in the local cache linked list is used for recording the identification of a data block being restored and the storage node where the data block is located.

It can be understood that, when the recovery progress of the recovered data block received by the client is in recovery, the recovery process of the recovered data block has not yet ended, and therefore second information recording the identifier of the data block and the storage node where the data block is located needs to be added to the local cache linked list.

S202-2, if the recovery progress is recovery success or recovery failure, deleting the second information recorded with the identification of the recovered data block in the local cache linked list.

When the recovery progress of the recovered data block received by the client is recovery success or recovery failure, the recovery process of the recovered data block has ended, and the second information recording the identifier of the data block and the storage node where it is located needs to be deleted from the local cache linked list.

S202-3, updating the plurality of pieces of first information according to the updating condition of the plurality of pieces of second information.

Alternatively, the implementation process of step S202-3 may be as follows:

for second information newly added to the local cache linked list, if reference first information exists in the local cache region, the reference first information being first information whose recorded data block identifier is the same as the identifier of the data block recorded by the second information, updating the storage node where the data block recorded by the reference first information is located to the storage node where the data block recorded by the second information is located, and setting the state of the data block recorded by the reference first information to a recovery state;

for second information newly deleted from the local cache linked list, if the recovery progress of the data block recorded by the second information is recovery success and the reference first information exists in the local cache region, setting the state of the data block recorded by the reference first information to a normal state;

and if the recovery progress of the data block recorded by the second information is recovery failure and the reference first information exists in the local cache region, setting the state of the data block recorded by the reference first information to an abnormal state.
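Steps S202-1 to S202-3 can be sketched as a single update function. This is a hedged illustration: `apply_progress` is a hypothetical name, both caches are modelled as dictionaries, and the failure branch (setting the abnormal state) follows the worked example of this description, in which a failed recovery of BLK-1 moves its first information to the abnormal state.

```python
from typing import Dict


def apply_progress(
    block_id: str,
    node: str,
    progress: str,
    first_infos: Dict[str, Dict[str, str]],
    second_infos: Dict[str, str],
) -> None:
    """Update second information first, then the matching first information."""
    reference = first_infos.get(block_id)  # reference first information, if any
    if progress == "recovering":
        second_infos[block_id] = node              # S202-1: add second information
        if reference is not None:                  # S202-3: follow the new node
            reference["node"] = node
            reference["state"] = "recovering"
    elif progress in ("success", "failure"):
        second_infos.pop(block_id, None)           # S202-2: recovery has ended
        if reference is not None:                  # S202-3: final state
            reference["state"] = "normal" if progress == "success" else "abnormal"
    else:
        raise ValueError(f"unknown recovery progress: {progress}")


# Replay of the BLK-1 history from the worked example (DN-1 -> DN-4).
first_infos = {"BLK-1": {"node": "DN-1", "state": "normal"}}
second_infos: Dict[str, str] = {}
history = []
for progress in ("recovering", "failure", "recovering", "success"):
    apply_progress("BLK-1", "DN-4", progress, first_infos, second_infos)
    history.append((first_infos["BLK-1"]["state"], "BLK-1" in second_infos))
```

A block that has never been written has no first information, so only the linked list changes for it.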

It should be noted that a storage node may be powered off or restarted for various reasons, in which case all data blocks on the restarted storage node may be abnormal and data fragments cannot be written directly into them. Therefore, when each storage node reports the recovery progress of the recovered data block, it also reports its process ID, and the client judges whether the storage node has been powered off or restarted according to whether the process ID has changed.

For any storage node, if the process ID reported at the current time is the same as the process ID reported at the previous time, the relevant operation of step S202 is executed directly according to the reported recovery progress of the recovered data block, so as to update the plurality of pieces of second information in the local cache linked list and the plurality of pieces of first information in the local cache area.

If the process ID reported at the current moment is different from the process ID reported at the previous time, it can be determined that the storage node has been powered off or restarted. In this case, all second information related to the storage node in the local cache linked list is deleted, and the state of the data block in all first information related to the storage node in the local cache region is set to an abnormal state.

And for the recovery progress of the currently reported recovered data block, the relevant operation of step S202 is also executed, so as to update the plurality of pieces of second information in the local cache chain table and the plurality of pieces of first information in the local cache region.
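The restart check based on the process ID can be sketched as below. The function name `handle_report` and the dictionary layout are hypothetical, and the sketch assumes that the very first report from a node (when no previous process ID is known) is not treated as a restart, which the description does not specify.

```python
from typing import Dict


def handle_report(
    node: str,
    process_id: int,
    last_process_ids: Dict[str, int],
    first_infos: Dict[str, Dict[str, str]],
    second_infos: Dict[str, str],
) -> bool:
    """Invalidate cached entries of a node whose process ID changed.

    Returns True if a power-off or restart was detected.
    """
    previous = last_process_ids.get(node)
    # Assumption: no previous ID means first contact, not a restart.
    restarted = previous is not None and previous != process_id
    if restarted:
        # Delete all second information related to this storage node.
        for block_id in [b for b, n in second_infos.items() if n == node]:
            del second_infos[block_id]
        # Mark all first information related to this storage node as abnormal.
        for info in first_infos.values():
            if info["node"] == node:
                info["state"] = "abnormal"
    last_process_ids[node] = process_id
    return restarted


last_process_ids = {"DN-4": 100}
first_infos = {
    "BLK-1": {"node": "DN-4", "state": "recovering"},
    "BLK-2": {"node": "DN-2", "state": "normal"},
}
second_infos = {"BLK-1": "DN-4"}

same_process = handle_report("DN-4", 100, last_process_ids, first_infos, second_infos)
new_process = handle_report("DN-4", 101, last_process_ids, first_infos, second_infos)
```

After this invalidation, the recovery progress carried by the same report would still be applied through the normal update path of step S202.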

To explain the technical solutions provided by the above method embodiments in more detail, the embodiments of the present invention are further described by using the following specific examples.

As shown in fig. 8, a 4-node environment with an erasure ratio of n:m = 2:1 is created, and a LUN is mapped via iSCSI.

Data block BLK-1 is created on storage node DN-1, data block BLK-2 is created on storage node DN-2, and data block BLK-3 is created on storage node DN-3.

When data is written for the first time, no data fragments have yet been written to BLK-1, BLK-2 and BLK-3, so the local cache region contains no first information recording the identifiers, states and storage nodes of BLK-1, BLK-2 and BLK-3. In addition, BLK-1, BLK-2 and BLK-3 are not being recovered, so the local cache linked list contains no second information recording their identifiers and storage nodes.

The client learns from the management node that BLK-1 is on DN-1, BLK-2 is on DN-2 and BLK-3 is on DN-3, and generates three pieces of first information in the local cache region, which respectively record the identifier, the state and the storage node of BLK-1, BLK-2 and BLK-3. The states of BLK-1, BLK-2 and BLK-3 are all normal states.

The client splits the data to be stored into data fragments data-1 and data-2, generates redundant fragment data-3, sends the data-1 to DN-1 for writing BLK-1, sends the data-2 to DN-2 for writing BLK-2, and sends the data-3 to DN-3 for writing BLK-3.
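The description does not state which erasure code produces the redundant fragment. Purely as an illustration for the 2:1 ratio, the sketch below assumes a simple XOR parity, which is a stand-in technique and not necessarily the one used by the embodiment; `split_2_plus_1` and `rebuild` are hypothetical names.

```python
from typing import Tuple


def split_2_plus_1(data: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split data into two equal-length fragments plus one XOR parity fragment.

    Assumption: XOR parity stands in for the unspecified erasure code.
    The second fragment is zero-padded when len(data) is odd.
    """
    half = (len(data) + 1) // 2
    data_1 = data[:half]
    data_2 = data[half:].ljust(half, b"\x00")
    data_3 = bytes(a ^ b for a, b in zip(data_1, data_2))
    return data_1, data_2, data_3


def rebuild(surviving: bytes, parity: bytes) -> bytes:
    """Recover the missing data fragment from the other fragment and the parity."""
    return bytes(a ^ b for a, b in zip(surviving, parity))


data_1, data_2, data_3 = split_2_plus_1(b"storage!")
recovered_data_1 = rebuild(data_2, data_3)
```

With this scheme any two of the three fragments suffice to reconstruct the original data, which is why two counted writes are enough in the example.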

The client acquires that the data-1 is failed to write into the BLK-1 according to the response message returned by the DN-1, acquires that the data-2 is successfully written into the BLK-2 according to the response message returned by the DN-2, and acquires that the data-3 is successfully written into the BLK-3 according to the response message returned by the DN-3.

Based on the response messages returned by DN-2 and DN-3 that indicate successful writing, the client obtains an erasure count value of 2 >= n and judges that the data to be stored is successfully written. The client reports the failure to write data-1 into BLK-1 to the management node, and the management node selects storage node DN-4 to recover BLK-1.

DN-4 reports the recovery progress of BLK-1 to the client. The client adds, to the local cache linked list, second information recording the identifier of BLK-1 and the storage node where it is located. Then, in the first information related to BLK-1 in the local cache region, the client changes the storage node where BLK-1 is located from DN-1 to DN-4 and changes the state of BLK-1 from the normal state to the recovery state.

When data is written for the second time, the client learns from the local cache region that BLK-1 is on DN-4, BLK-2 is on DN-2 and BLK-3 is on DN-3.

The client splits the data to be stored into data fragments data-1 and data-2, generates redundant fragment data-3, sends the data-1 to DN-4 for writing BLK-1, sends the data-2 to DN-2 for writing BLK-2, and sends the data-3 to DN-3 for writing BLK-3.

The client acquires that the data-1 is successfully written into the BLK-1 according to the response message returned by the DN-4, acquires that the data-2 is unsuccessfully written into the BLK-2 according to the response message returned by the DN-2, and acquires that the data-3 is successfully written into the BLK-3 according to the response message returned by the DN-3.

Based on the response messages returned by DN-4 and DN-3 that indicate successful writing, the client obtains an erasure count value of 1 < n, because BLK-1 on DN-4 is in the recovery state and its response message is not counted. The client judges that the data to be stored has failed to be written, and then writes the data again.

BLK-1 fails to recover on DN-4, and DN-4 reports the recovery progress of BLK-1 to the client. The client deletes the second information related to BLK-1 from the local cache linked list, and then changes the state of BLK-1 recorded in the first information related to BLK-1 in the local cache region from the recovery state to the abnormal state.

The management node selects DN-4 again to recover BLK-1, and DN-4 reports the recovery progress of BLK-1 to the client. The client adds, to the local cache linked list, second information recording the identifier of BLK-1 and the storage node where BLK-1 is located, and then changes the state of BLK-1 recorded in the first information related to BLK-1 in the local cache region from the abnormal state to the recovery state.

BLK-1 is successfully recovered on DN-4, and DN-4 reports the recovery progress of BLK-1 to the client. The client deletes the second information related to BLK-1 from the local cache linked list, and then changes the state of BLK-1 recorded in the first information related to BLK-1 in the local cache region from the recovery state to the normal state.

When data is written for the third time, the client learns from the local cache region that BLK-1 is on DN-4, BLK-2 is on DN-2 and BLK-3 is on DN-3, and that all three are in the normal state.

The client splits the data to be stored into data fragments data-1 and data-2, generates redundant fragment data-3, sends the data-1 to DN-4 to write BLK-1, sends the data-2 to DN-2 to write BLK-2, and sends the data-3 to DN-3 to write BLK-3.

The client acquires that the data-1 is successfully written into the BLK-1 according to the response message returned by the DN-4, acquires that the data-2 is successfully written into the BLK-2 according to the response message returned by the DN-2, and acquires that the data-3 is successfully written into the BLK-3 according to the response message returned by the DN-3.

Based on the response messages returned by DN-4, DN-2 and DN-3 that indicate successful writing, the client obtains an erasure count value of 3 > n and judges that the data to be stored is successfully written.

Compared with the prior art, the embodiment of the invention has the following effects:

(1) Determining a storage node where each data block for storing the data fragments is located at the current moment through a plurality of pieces of cache information which is updated at regular time and cached locally, and sending the data fragments to the storage node where the data blocks are located at the current moment to write data, so that the data fragment writing process is prevented from being influenced by the data block recovery process, and the writing time delay is reduced;

(2) The data block recovery process and the data fragment writing process can be carried out simultaneously without triggering an additional recovery flow;

(3) The updating mechanism of the plurality of pieces of cache information of the local cache ensures that, during data block recovery or data fragment writing, the correctness and consistency of the data are not affected by conditions such as network abnormality or power failure or restart of a storage node.

In order to execute the corresponding steps in the above method embodiments and various possible embodiments, an implementation manner of the data writing apparatus 100 and an implementation manner of the cache information updating apparatus 200 are respectively given below.

Referring to fig. 9, the data writing apparatus 100 applied to the client in fig. 1 may include a splitting module 101, a determining module 102, a first sending module 103, and a judging module 104.

The splitting module 101 is configured to split data to be stored into a plurality of data fragments according to a preset erasure ratio, and determine a storage block corresponding to each data fragment, where the storage block is a data block for storing the data fragments.

The determining module 102 is configured to determine, according to multiple pieces of cache information of the local cache, a target storage node where each storage block is located at a current moment; the plurality of pieces of cache information are updated based on the recovery progress of the recovered data blocks reported by the plurality of storage nodes, the recovered data blocks reported by each storage node all meet preset reporting conditions, and the preset reporting conditions are sent to the plurality of storage nodes by the client at regular time.

The first sending module 103 is configured to send each data fragment to a target storage node where a storage block corresponding to each data fragment is located, so as to write data.

Optionally, the plurality of pieces of cache information include a plurality of pieces of first information cached in the local cache region, each piece of first information records an identifier and a state of one data block and a storage node where the data block is located, the distributed storage system further includes a management node, and the management node is in communication connection with the client; the determining module 102 is specifically configured to, for each storage block, if target first information in which an identifier of the storage block is recorded exists in the first information and a state of a data block recorded in the target first information is a recovery state or a normal state, use a storage node where the data block recorded in the target first information is located as a target storage node where the storage block is located at the current time; if the target first information recorded with the identification of the storage block exists in the first information and the state of the data block recorded in the target first information is an abnormal state, sending a query request to the management node to acquire the target storage node where the storage block is located at the current moment.

Optionally, the plurality of pieces of cache information further include a plurality of pieces of second information cached in the local cache linked list, each piece of second information includes an identifier of one data block and a storage node where the data block is located, the plurality of pieces of second information are updated based on the recovery progress of the recovered data block reported by the plurality of storage nodes, and the plurality of pieces of first information are updated based on the update condition of the plurality of pieces of second information; the determining module 102 is further specifically configured to determine whether a target second information recorded with the identifier of the storage block exists in the plurality of pieces of second information if the target first information recorded with the identifier of the storage block does not exist in the plurality of pieces of first information; if the target second information exists, taking the storage node where the data block recorded by the target second information is located as the target storage node where the storage block is located at the current moment; and if the target second information does not exist, sending a query request to the management node to acquire the target storage node where the storage block is located at the current moment.

The judging module 104 is configured to receive a response message returned by each target storage node, where the response message indicates whether the target storage node successfully writes the data fragment into the storage block corresponding to the data fragment; for each response message representing successful write, if the state of the storage block written with the data fragment on the target storage node returning the response message is a recovery state, not including the response message in erasure count; if the state of the storage block written with the data fragment on the target storage node returning the response message is a normal state, the response message is included in erasure count; after each response message representing successful writing is judged whether to be included in erasure correction calculation, judging whether the data to be stored is successfully stored according to an erasure correction counting value; if the erasure count value meets the preset condition, judging that the data to be stored is successfully stored; and if the erasure count value does not meet the preset condition, returning to the step of determining the target storage node where each storage block is located at the current moment according to the plurality of pieces of cache information of the local cache until the data to be stored is successfully stored.

Referring to fig. 10, the cache information updating apparatus 200 is applied to the client in fig. 1, and may include a second sending module 201 and an updating module 202.

A second sending module 201, configured to send a preset reporting condition to the multiple storage nodes at regular time, so that the multiple storage nodes report the recovery progress of the recovered data block meeting the preset reporting condition to the client.

The updating module 202 is configured to update the plurality of pieces of cache information cached locally according to the recovery progress of the recovered data block received each time. The cache information is used for determining a target storage node where a storage block corresponding to each data fragment of the data to be stored at the current moment is located when the client executes the data writing method, and the storage blocks are data blocks for storing the data fragments.

Optionally, the plurality of pieces of cache information include a plurality of pieces of first information cached in the local cache region and a plurality of pieces of second information cached in the local cache linked list, each piece of second information includes an identifier of one data block and a storage node where the data block is located, the update module 202 is specifically configured to, for a recovery progress of the recovered data block received each time, add, in the local cache linked list, the identifier of the recovered data block and the second information of the storage node where the recovered data block is located if the recovery progress is in recovery; if the recovery progress is recovery success or recovery failure, deleting the second information recorded with the identifier of the recovered data block in the local cache linked list; and updating the plurality of pieces of first information according to the updating condition of the plurality of pieces of second information.

Optionally, when updating the plurality of pieces of first information according to the update condition of the plurality of pieces of second information, the updating module 202 is specifically configured to: for second information newly added to the local cache linked list, if reference first information exists in the local cache region, the reference first information being first information whose recorded data block identifier is the same as the identifier of the data block recorded by the second information, update the storage node where the data block recorded by the reference first information is located to the storage node where the data block recorded by the second information is located, and set the state of the data block recorded by the reference first information to the recovery state; for second information newly deleted from the local cache linked list, if the recovery progress of the data block recorded by the second information is recovery success and the reference first information exists in the local cache region, set the state of the data block recorded by the reference first information to a normal state; and if the recovery progress of the data block recorded by the second information is recovery failure and the reference first information exists in the local cache region, set the state of the data block recorded by the reference first information to an abnormal state.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the data writing apparatus 100 and the cache information updating apparatus 200 described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Further, an embodiment of the present invention also provides a client 300, which may be the client in Fig. 1. Referring to Fig. 11, which is a schematic structural block diagram of the client 300 according to an embodiment of the present invention, the client 300 may include a memory 310 and a processor 320.

The processor 320 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits for controlling the execution of a program that implements the data writing method and/or the cache information updating method provided by the above method embodiments.

The memory 310 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 310 may be self-contained and coupled to the processor 320 via a communication bus. The memory 310 may also be integrated with the processor 320. The memory 310 is used to store, among other things, machine-executable instructions for performing aspects of the present application. The processor 320 is configured to execute the machine-executable instructions stored in the memory 310 to implement the method embodiments described above.

Embodiments of the present invention further provide a computer-readable storage medium containing a computer program, where the computer program can be used to execute the related operations of the data writing method and/or the cache information updating method provided in the foregoing method embodiments.

In summary, in the data writing method, the cache information updating method, and the related apparatus provided in the embodiments of the present invention, the data to be stored is first divided into a plurality of data fragments according to a preset erasure correction ratio, and a storage block corresponding to each data fragment is determined, where a storage block is a data block used for storing a data fragment. Then, the target storage node where each storage block is located at the current moment is determined according to a plurality of pieces of locally cached cache information, where the pieces of cache information are updated based on the recovery progress of recovered data blocks reported by the plurality of storage nodes, the recovered data blocks reported by each storage node all meet a preset reporting condition, and the preset reporting condition is sent to the plurality of storage nodes by the client at regular intervals. Finally, each data fragment is sent to the target storage node where its corresponding storage block is located, so as to write the data. In the embodiments of the present invention, the client uses the locally cached, regularly updated cache information to determine the storage node where each data block used for storing a data fragment is currently located, and then sends the data fragment to that storage node for writing, so that the data fragment writing process is not affected by the data block recovery process and the write latency is reduced.
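The target-node lookup at the centre of this summary might look like the following sketch. It is illustrative only: the function `resolve_targets`, the tuple layout of the cache, and the `query_management_node` callback are hypothetical. Querying the management node when the cached state is abnormal follows claim 2; doing the same when the block is absent from the cache is an assumption made here so that the example is complete. Erasure coding and the network write itself are omitted.

```python
from typing import Callable, Dict, Iterable, Tuple


def resolve_targets(
    block_ids: Iterable[str],
    cache: Dict[str, Tuple[str, str]],
    query_management_node: Callable[[str], str],
) -> Dict[str, str]:
    """Map each storage block id to the storage node it should be written to.

    `cache` maps block id -> (storage node, state), where state is one of
    "normal", "recovery", or "abnormal".
    """
    targets: Dict[str, str] = {}
    for block_id in block_ids:
        cached = cache.get(block_id)
        if cached is None or cached[1] == "abnormal":
            # Abnormal state: ask the management node (per claim 2).
            # Cache miss: same fallback, which is an assumption of this sketch.
            targets[block_id] = query_management_node(block_id)
        else:
            # Normal or recovery state: trust the locally cached location.
            targets[block_id] = cached[0]
    return targets
```

Because blocks in the normal or recovery state are resolved from the local cache, the management node is contacted only in the exceptional cases, which is where the reduction in write latency would come from.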

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A data writing method, applied to a client in a distributed storage system, where the distributed storage system further includes a plurality of storage nodes, the client is communicatively connected to the plurality of storage nodes, and each storage node has at least one data block created thereon, and the method includes:

2. The method according to claim 1, wherein the pieces of cache information include pieces of first information cached in a local cache area, each piece of the first information records an identification and a status of one of the data blocks and a storage node where the data block is located, and the distributed storage system further includes a management node, and the management node is in communication connection with the client;

if the target first information recorded with the identification of the storage block exists in the first information and the state of the data block recorded in the target first information is an abnormal state, sending a query request to the management node to acquire a target storage node where the storage block is located at the current moment.

3. The method according to claim 2, wherein the pieces of cache information further include pieces of second information cached in a local cache linked list, each piece of the second information includes an identifier of the data block and a storage node where the data block is located, the pieces of second information are updated based on the recovery progress of the recovered data block reported by the plurality of storage nodes, and the pieces of first information are updated based on the update condition of the pieces of second information;

4. The method of claim 1, wherein the method further comprises:

for each response message indicating a successful write, if the state of the storage block into which the data fragment is written on the target storage node returning the response message is a recovery state, not including the response message in the erasure correction count;

if the state of the storage block into which the data fragment is written on the target storage node returning the response message is a normal state, including the response message in the erasure correction count;

5. A cache information updating method, applied to a client in a distributed storage system, wherein the distributed storage system further comprises a plurality of storage nodes, at least one data block is created on each storage node, and the client is in communication connection with the plurality of storage nodes, the method comprising the following steps:

the pieces of cache information are used for determining, when the client executes the data writing method according to any one of claims 1 to 4, a target storage node where a storage block corresponding to each data fragment of the data to be stored is located at the current moment, where the storage blocks are data blocks used for storing the data fragments.

6. The method of claim 5, wherein the pieces of cache information include a plurality of pieces of first information cached in a local cache region and a plurality of pieces of second information cached in a local cache linked list, each piece of the second information includes an identifier of a data block and a storage node where the data block is located;

7. The method of claim 6, wherein each piece of the first information records an identification and a status of the data block and a storage node where the data block is located;

the step of updating the plurality of pieces of first information according to the updating condition of the plurality of pieces of second information comprises the following steps:

8. A data writing apparatus, applied to a client in a distributed storage system, the distributed storage system further including a plurality of storage nodes, the client being communicatively connected to the plurality of storage nodes, each storage node having at least one data block created thereon, the apparatus comprising:

9. A cache information updating apparatus, applied to a client in a distributed storage system, where the distributed storage system further includes a plurality of storage nodes, each of the storage nodes has at least one data block created thereon, and the client is communicatively connected to the plurality of storage nodes, and the apparatus includes:

10. A client, characterized by comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements a data writing method according to any one of claims 1 to 4 and/or a cache information updating method according to any one of claims 5 to 7.

11. A distributed storage system comprising a management node, a plurality of storage nodes, and the client of claim 10.

12. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the data writing method according to any one of claims 1 to 4, and/or the cache information updating method according to any one of claims 5 to 7.