WO2023241350A1

WO2023241350A1 - Data processing method and device, data access end, and storage medium

Info

Publication number: WO2023241350A1
Application number: PCT/CN2023/097128
Authority: WO
Inventors: 赵旭东; 易曌平
Original assignee: 重庆紫光华山智安科技有限公司
Priority date: 2022-06-17
Filing date: 2023-05-30
Publication date: 2023-12-21
Also published as: CN114968668A

Abstract

A data processing method and device, a data access end, and a storage medium provided in embodiments of the present application, relating to the field of distributed storage. The method comprises: first receiving a data write request (S101), the data write request comprising data to be stored; then processing said data to obtain a plurality of data blocks (S102), a serial number being distributed to each data block; and next, for each data block, determining a target node from a plurality of storage nodes according to the serial number of each data block, and sending the data block and a data version number of said data to the target node, such that the target node stores the data block and the data version number into a first space and a second space of the target node, respectively (S103), wherein the serial number of each data block and the serial number of the target node satisfy a preset mapping relationship, and the first space is located in front of the second space, such that the situation that the data version numbers in the storage nodes are consistent, but the storage of the data block fails is avoided, the data access end can recover in time the data block which fails to be stored, and the data consistency is ensured.

Description

Data processing method, device, data access terminal and storage medium

Cross-references to related applications

This application claims priority to the Chinese patent application with application number 202210692395.0, titled "Data processing method, device, data access terminal and storage medium" and submitted to the State Intellectual Property Office of China on June 17, 2022, the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the field of distributed storage, specifically, to a data processing method, device, data access terminal and storage medium.

Background technique

In the era of big data, with the explosive growth of massive data, distributed storage is increasingly used. Distributed storage means that the data access end divides the data to be stored into multiple data strips, encodes the data strips with an erasure coding algorithm to obtain redundant verification data, and then stores the data strips and the verification data on multiple storage nodes. When some data fails to be stored, it can be restored from the data that was stored successfully.

In existing distributed storage technology, bitmaps are generally used to record the storage locations of data strips, and data strips are read and written by looking up those locations. Bitmaps are usually recorded in metadata and stored together with the data strips in each storage node. When any storage node is powered off abnormally, the metadata may be stored successfully at that node while storage of the data strip fails. In this case, because the data version number in this node's metadata is consistent with that of the other nodes, the data access end will not restore the data strips that failed to be stored at the node, and data consistency cannot be guaranteed.

Contents of the invention

In order to overcome the deficiencies of the existing technology, embodiments of the present application provide a data processing method, device, data access terminal and storage medium, which can avoid the situation in which the data version numbers in the storage nodes are consistent but storage of a data strip has failed, so that the data access end can promptly recover the data strips that failed to be stored and data consistency is ensured.

The embodiment of this application can be implemented as follows:

In a first aspect, this application provides a data processing method, which is applied to the data access terminal in a distributed storage system. The distributed storage system also includes a plurality of storage nodes, and each storage node is provided with a number. Each storage node is communicatively connected with the data access terminal, and the method includes:

Receive a data write request, the data write request includes data to be stored;

Process the data to be stored to obtain multiple data blocks, each of which is assigned a sequence number;

For each data block, determine the target node from the multiple storage nodes according to the sequence number of the data block, and send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number into the first space and the second space of the target node respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.

In an optional implementation, before processing the data to be stored to obtain multiple data blocks, the method further includes:

If there is one data write request, use the current timestamp as the data version number of the data to be stored;

If there are multiple data write requests, perform multiple auto-increment operations on the current timestamp and, according to the reception time of each data write request, use the result of each auto-increment operation as the data version number of the data to be stored in one of the data write requests.

In an optional implementation, the step of processing the data to be stored to obtain multiple data blocks includes:

Divide the data to be stored into multiple data strips according to a preset length;

Every preset number of the data strips is formed into one original data block, to obtain multiple original data blocks;

Erasure coding is performed on the plurality of original data blocks to obtain a plurality of verification data blocks. The plurality of data blocks include the plurality of original data blocks and the plurality of verification data blocks.

In optional implementations, the method further includes:

Receive a data read request, the data read request includes the writing node sequence of the data to be read;

According to the writing node sequence, read the target data version number from the second space of each storage node;

If all the target data version numbers are consistent, read the target data block from the first space of each storage node according to the writing node sequence;

The data to be read is generated according to the preset mapping relationship and all the target data blocks in response to the data read request.

In optional implementations, the method further includes:

If there are inconsistent target data version numbers, the multiple storage nodes are divided into normal nodes and abnormal nodes according to each target data version number, where the target data version numbers corresponding to all normal nodes are consistent, and the target data version number corresponding to each abnormal node is inconsistent with the target data version number corresponding to the normal nodes;

According to the writing node sequence, read the target data block from the first space of each normal node;

According to the target data block corresponding to each normal node, restore the target data block corresponding to each abnormal node;

In an optional implementation, the step of generating the data to be read according to the preset mapping relationship and all the target data blocks includes:

Allocate a sequence number to each target data block according to the preset mapping relationship and the number of each storage node;

Sort all the target data blocks according to the sequence number of each target data block to obtain the data to be read.
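The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the preset mapping is taken to be the identity (data node number i holds the block with sequence number i), and the function name `assemble` is hypothetical rather than part of the application.

```python
from typing import Dict


def assemble(blocks_by_node: Dict[int, bytes]) -> bytes:
    """Rebuild the data to be read from the target data blocks.

    Assumption: identity mapping, so the node number doubles as the
    sequence number of the block read from that node.
    """
    # Step 1: allocate a sequence number to each block from its node number.
    numbered = [(node_no, block) for node_no, block in blocks_by_node.items()]
    # Step 2: sort by sequence number and concatenate.
    return b"".join(block for _, block in sorted(numbered))
```

For example, blocks read from data nodes 2, 1 and 3 in that order are joined as block 1, block 2, block 3.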

In optional implementations, the method further includes:

Determine the target area in the first space of each abnormal node according to the writing node order and the preset size;

For each abnormal node, use the target data block corresponding to the abnormal node to cover the content of the target area of the abnormal node.

In a second aspect, this application provides a data processing device applied to a data access terminal in a distributed storage system. The distributed storage system also includes a plurality of storage nodes, and each storage node is provided with a number. Each storage node is communicatively connected with the data access terminal, and the device includes:

A receiving module, configured to receive a data write request, where the data write request includes data to be stored;

A processing module, used to process the data to be stored to obtain multiple data blocks, each of which is assigned a sequence number;

A sending module, configured to determine, for each data block, a target node from the plurality of storage nodes according to the sequence number of the data block, and to send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.

In a third aspect, the present application provides a data access terminal, including a memory and a processor. The memory stores a computer program. When the processor executes the computer program, the data processing method described in any one of the preceding embodiments is implemented.

In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the data processing method as described in any one of the preceding embodiments is implemented.

Compared with the existing technology, embodiments of the present application provide a data processing method, device, data access terminal and storage medium. First, a data write request is received, the data write request including data to be stored; then, the data to be stored is processed to obtain multiple data blocks, each of which is assigned a sequence number; next, for each data block, the target node is determined from multiple storage nodes according to the sequence number of the data block, and the data block is sent to the target node together with the data version number of the data to be stored, so that the target node stores the data block and the data version number into the first space and the second space of the target node respectively, where the sequence number of the data block and the number of the target node satisfy the preset mapping relationship, and the first space is located before the second space. Because each data block is sent to the target node together with the data version number of the data to be stored, the target node stores the data block and the data version number in its first space and second space respectively, and the first space of the target node is located before the second space, the situation in which the data version numbers in the storage nodes are consistent but storage of a data block has failed is avoided, so that the data access end can promptly recover the data blocks that failed to be stored and data consistency is ensured.

Description of the drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly introduced below. It should be understood that the following drawings only show some embodiments of the present application and therefore should not be regarded as limiting its scope. For those of ordinary skill in the art, other relevant drawings can be obtained based on these drawings without creative effort.

Figure 1 is a schematic structural diagram of a distributed storage system provided by an embodiment of the present application;

Figure 2 is a schematic diagram of the distributed storage process provided by the embodiment of the present application;

Figure 3 is a schematic flow chart of the data processing method provided by the embodiment of the present application;

Figure 4 is a schematic diagram of the data write request response process provided by the embodiment of the present application;

Figure 5 is a schematic flow chart of the implementation of step S102 provided by the embodiment of the present application;

Figure 6 is another schematic flow chart of the data processing method provided by the embodiment of the present application;

Figure 7 is a schematic diagram of the data read request response process provided by the embodiment of the present application;

Figure 8 is another schematic diagram of the data read request response process provided by the embodiment of the present application;

Figure 9 is an example of the data write request response process provided by the embodiment of the present application;

Figure 10 is an example of the data read request response process provided by the embodiment of the present application;

Figure 11 is another example of the data read request response process provided by the embodiment of the present application;

Figure 12 is a schematic structural block diagram of a data access terminal provided by an embodiment of the present application;

Figure 13 is a functional unit block diagram of the data processing device provided by the embodiment of the present application.

Reference numerals: 300-data access terminal; 310-memory; 320-processor; 400-data processing device; 401-receiving module; 402-processing module; 403-sending module.

Detailed description

In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are some of the embodiments of this application, not all of them. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations.

Accordingly, the following detailed description of the embodiments of the application provided in the appended drawings is not intended to limit the scope of the claimed application, but rather to represent selected embodiments of the application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.

It should be noted that similar reference numerals and letters represent similar items in the following figures, therefore, once an item is defined in one figure, it does not need further definition and explanation in subsequent figures.

In addition, if the terms "first", "second", etc. appear, they are only used to distinguish the description and cannot be understood as indicating or implying relative importance.

It should be noted that, as long as there is no conflict, the features in the embodiments of the present application can be combined with each other.

In the era of big data, with the explosive growth of massive data, distributed storage is increasingly used. Distributed storage requires the storage system to have high availability and high performance. However, when traditional file systems are applied to distributed storage, most of them have problems such as poor read and write performance and low reliability. At the same time, when the number of nodes is large, it also poses a great challenge to ensuring data consistency.

Current industry approaches include the following. One approach allocates location information to data for reading and writing, but this causes data fragmentation, and random reading and writing of data leads to poor storage system performance. Another stores the metadata of the data in multiple storage nodes as replicas, and verifies and repairs the metadata to ensure data consistency; its disadvantages are that the replicas occupy extra space and that verification and repair are time-consuming and consume system performance. A third determines, when reading and writing data, whether the data version numbers of the copies of the data are consistent and, if they are not, selects the most complete copy as the replacement; however, when a node goes offline or a data abnormality occurs, this approach cannot accurately determine whether the data version numbers of the copies are consistent, which leads to data inconsistency. Finally, erasure coding and multi-copy techniques can be optimized to improve distributed storage performance, but when the amount of data is large, algorithm optimization alone cannot yield a larger performance gain.

In other words, the existing distributed storage technology has not yet solved the problem of how to ensure data consistency while improving the read and write performance of the distributed storage system.

In view of this, embodiments of the present application provide a data processing method, which will be introduced in detail below.

Please refer to Figure 1. Figure 1 is a schematic structural diagram of a distributed storage system provided by an embodiment of the present application. The distributed storage system includes a data access terminal and multiple storage nodes. The data access terminal is communicatively connected with each storage node.

The data access end can interact with upper-layer applications or external hosts and receive data write requests sent by them. As shown in Figure 2, the data access end divides the data into n original data blocks and m verification data blocks, generates a data version number for each original data block or verification data block, and finally writes each original data block or verification data block, together with the corresponding data version number, to the storage nodes at the same time. When a data read request is received, the data access end reads the data version numbers of the data blocks from the storage nodes and compares them. If the data version numbers are consistent, the data blocks are read in response to the data read request. If the data version numbers are inconsistent, the data access end restores the original data blocks and the verification data blocks through erasure calculation and then reads them. The data access end can be a server, a personal computer (hereinafter referred to as PC), a laptop, etc. The data access end can also be one or more program modules on a device, or a virtual machine, container or client running on a device. It can also be a cluster composed of multiple devices; for example, it can be a collective name for multiple program modules distributed on multiple devices.

The storage node can store original data blocks and/or verification data blocks from the data access terminal. The storage node can be a server, PC, laptop, etc. The storage node can be a physical storage node or a logical storage node obtained by dividing the physical storage node.

Please refer to Figure 3. Figure 3 shows a flow of the data processing method provided by the embodiment of the present application. The data processing method includes steps S101 to S103, and the execution subject is the data access terminal in Figure 1.

S101, receive a data write request.

The data write request includes data to be stored, and the data write request may be sent to the data access terminal by an upper-layer application or an external host.

S102, process the data to be stored and obtain multiple data blocks.

After receiving the data write request, the data access end divides the data to be stored, of length L, into n original data blocks and then generates m verification data blocks through erasure coding. Each of the n original data blocks is assigned a sequence number in the range [1, n], and each of the m verification data blocks is also assigned a sequence number, in the range [1, m] (see Figure 4).

S103. For each data block, determine the target node from multiple storage nodes according to the sequence number of the data block, and send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node respectively.

The sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space. As shown in Figure 4, storage nodes are divided into data nodes and check nodes, which are used to store original data blocks and check data blocks respectively. The total number of data nodes is n and the total number of check nodes is m. Each data node is assigned a number in the range [1, n], and each check node is also assigned a number, in the range [1, m].

The preset mapping relationship includes the correspondence between the sequence numbers of the n original data blocks and the numbers of the n data nodes, and the correspondence between the sequence numbers of the m check data blocks and the numbers of the m check nodes. For each original data block or check data block, the target node is determined from the n data nodes or the m check nodes according to its sequence number and the preset mapping relationship, and the original data block or check data block is then sent to the target node together with the data version number of the data to be stored, which is generated based on the current timestamp. For example, the target node of original data block 1 is data node 1, and the target node of verification data block m is check node m (see Figure 4).

For each data node or check node, its disk space is divided into a first space and a second space, and the first space is located before the second space. When the data block and the data version number are written, the data block must be written to disk before the data version number. This avoids the situation in which the data version number is stored successfully but storage of the data block fails, so that the data access end can promptly restore the data blocks that failed to be stored and data consistency is ensured.
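The dispatch described above can be sketched as a small planning function. It is a minimal sketch that assumes the identity mapping from the example (block i goes to node i); the function name `plan_writes` and the node labels such as `data-node-1` are hypothetical.

```python
from typing import List, Tuple


def plan_writes(original: List[bytes], check: List[bytes],
                version: int) -> List[Tuple[str, bytes, int]]:
    """Return (target node, data block, data version number) for each block.

    Assumption: the preset mapping is the identity, so original block i is
    sent to data node i and check block j is sent to check node j.
    """
    plan = []
    # Sequence numbers start at 1, matching the ranges [1, n] and [1, m].
    for seq, block in enumerate(original, start=1):
        plan.append((f"data-node-{seq}", block, version))
    for seq, block in enumerate(check, start=1):
        plan.append((f"check-node-{seq}", block, version))
    return plan
```

Every entry carries the same data version number, because the version belongs to the data to be stored as a whole rather than to an individual block.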
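The write ordering can be illustrated with an in-memory model of one storage node. This is a sketch under assumptions, not the application's implementation: the class `StorageNode`, the `crash_after` switch used to simulate an abnormal power-off, the 8-byte big-endian version encoding, and the convention that version 0 means "not yet written" are all introduced here for illustration.

```python
import struct

VERSION_SIZE = 8  # assumed: each version number is stored as 8 bytes


class PowerLoss(Exception):
    """Simulated abnormal power-off (hypothetical, for illustration)."""


class StorageNode:
    def __init__(self, slots: int, block_size: int) -> None:
        self.block_size = block_size
        self.first_space = bytearray(slots * block_size)     # data blocks
        self.second_space = bytearray(slots * VERSION_SIZE)  # version numbers

    def store(self, order: int, block: bytes, version: int,
              crash_after: str = "") -> None:
        if len(block) != self.block_size:
            raise ValueError("block has the wrong size")
        # 1. The data block is written first.
        start = (order - 1) * self.block_size
        self.first_space[start:start + self.block_size] = block
        if crash_after == "block":
            raise PowerLoss("power lost before the version number was written")
        # 2. Only then is the version number written.
        start = (order - 1) * VERSION_SIZE
        self.second_space[start:start + VERSION_SIZE] = struct.pack(">Q", version)

    def version_at(self, order: int) -> int:
        start = (order - 1) * VERSION_SIZE
        return struct.unpack(">Q", bytes(self.second_space[start:start + VERSION_SIZE]))[0]
```

In this model a crash can leave a block without its version number, but never a version number without its block, so a matching version number implies that the block was written. On a real disk the same ordering would additionally require the block write to be flushed before the version write is issued.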

The beneficial effect of the above method provided by the embodiment of the present application is as follows: each data block is sent to the target node together with the data version number of the data to be stored, the target node stores the data block and the data version number in its first space and second space respectively, and the first space in the target node is located before the second space. This avoids the situation in which the data version numbers in the storage nodes are consistent but storage of a data block has failed, so that the data access end can promptly restore the data blocks that failed to be stored and data consistency is ensured.

In the existing method, the data version number is generated by each storage node. When a storage node is powered off abnormally, besides the case in which the data version number is stored successfully but storage of the data block fails, there can also be the case in which the data version number is stored incorrectly but the data block is stored successfully. The latter causes the data access end to recover data blocks that were in fact stored successfully, which consumes system performance. In this regard, the embodiment of the present application also provides an implementation, carried out before step S102, in which the data access terminal generates the data version number. It is introduced in detail below.

In this embodiment of the present application, when the data access terminal generates the data version number of the data to be stored, the following two situations may occur.

Case 1: If there is one data write request, use the current timestamp as the data version number of the data to be stored.

Among them, when the data access terminal receives one data write request, the system timestamp of the current distributed storage system can be directly used as the data version number of the data to be stored in the data write request.

Case 2: If there are multiple data write requests, perform multiple auto-increment operations on the current timestamp and, according to the reception time of each data write request, use the result of each auto-increment operation as the data version number of the data to be stored in one data write request.

When the data access end receives multiple data write requests, the current system timestamp of the distributed storage system is used as the initial value for multiple auto-increment operations. The number of auto-increment operations equals the total number of data write requests. According to the reception time of each data write request, the result of each auto-increment operation is used as the data version number of the data to be stored in one data write request, thereby obtaining the data version number of the data to be stored in every data write request.

The above-mentioned data version number generation method also provided by the embodiment of the present application can realize unified management of data version numbers, and ensure the consistency of data through the data version number.
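The two cases can be sketched as one function. The sketch assumes an increment step of 1 and a nanosecond system timestamp, neither of which is specified in the text; the name `assign_versions` and the `now` parameter (which makes the example reproducible) are hypothetical.

```python
import time
from typing import List, Optional


def assign_versions(request_count: int, now: Optional[int] = None) -> List[int]:
    """Return one data version number per write request, in reception order.

    Assumptions: the timestamp is in nanoseconds and each auto-increment
    operation adds 1.
    """
    if request_count < 1:
        raise ValueError("at least one data write request is required")
    base = time.time_ns() if now is None else now
    if request_count == 1:
        # Case 1: the current timestamp is used directly.
        return [base]
    # Case 2: one auto-increment per request, starting from the timestamp.
    return [base + i for i in range(1, request_count + 1)]
```

With `now=1000`, a single request receives version 1000, while three requests receive 1001, 1002 and 1003 in order of reception.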

Step S102 is introduced in detail below.

Please refer to Figure 5. Figure 5 shows a flow of implementation of step S102 provided by the embodiment of the present application. Step S102 includes sub-steps S102-1 to S102-3.

S102-1: Divide the data to be stored into multiple data strips according to the preset length.

The length of each data strip is a preset length. As shown in Figure 4, the preset length is x, and the data to be stored with length L is divided into L/x data strips.

S102-2: Combine each preset number of data strips into an original data block to obtain multiple original data blocks.

The preset number is determined by the preset length, the number of data nodes, and the length of the data to be stored. As shown in Figure 4, the length of the data to be stored is L, the preset length is x, and the number of data nodes is n, so the preset number is L/(nx). Understandably, following the cutting order of the data strips, every L/(nx) data strips form one original data block, and a total of n original data blocks are obtained.

S102-3: Perform erasure coding on multiple original data blocks to obtain multiple verification data blocks.

As shown in Figure 4, an erasure coding operation is performed on the n original data blocks to obtain m verification data blocks. Understandably, each verification data block includes L/(nx) data strips, and the length of each data strip is x.

As shown in Figure 4, each storage node allocates a fixed and continuous first space and second space for the data blocks and the data version numbers of the data to be stored, respectively. This changes the way data in a data block is read and written from random reading and writing to sequential reading and writing, which improves the reading and writing performance of the distributed storage system.
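Sub-steps S102-1 to S102-3 can be sketched as follows. Two assumptions are made loudly here: L is taken to be an exact multiple of n times x (the text does not discuss padding), and a single XOR parity block (m = 1) stands in for the erasure code, which the text does not name. The function names are hypothetical.

```python
from typing import List


def split_into_blocks(data: bytes, x: int, n: int) -> List[bytes]:
    """S102-1 and S102-2: cut data into strips of length x, then group them."""
    length = len(data)
    if length == 0 or length % (n * x) != 0:
        raise ValueError("assumed: L is a non-zero multiple of n * x")
    strips = [data[i:i + x] for i in range(0, length, x)]  # L/x strips
    per_block = length // (n * x)                          # the preset number
    return [b"".join(strips[i * per_block:(i + 1) * per_block])
            for i in range(n)]


def xor_parity(blocks: List[bytes]) -> bytes:
    """S102-3 stand-in: one XOR parity block instead of a real erasure code."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)
```

With L = 24, x = 2 and n = 3, the data is cut into 12 strips and every 4 strips form one 8-byte original data block. A production system would typically replace `xor_parity` with an erasure code that supports m greater than 1.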

After introducing the process of the data access terminal processing data write requests, the following will introduce in detail the process of the data access terminal processing data read requests.

Please refer to FIG. 6 , which shows another flow of the data processing method provided by the embodiment of the present application. The data processing method includes steps S201 to S207.

S201, receive a data read request.

The data read request includes the writing node order of the data to be read. The data to be read consists of the original data blocks that the data access end stored in multiple data nodes when processing a data write request. The writing node order refers to the position order of those original data blocks in the first space of the storage node.

As shown in Figure 4, the data access terminal processes data write request 1, data write request 2, ..., data write request k in order of reception time. The first space of data node 1 is written, in sequence, with original data block 1 corresponding to data write request 1, original data block 1 corresponding to data write request 2, ..., and original data block 1 corresponding to data write request k. Similarly, the second space of data node 1 is written, in sequence, with the data version number of the data to be stored in data write request 1, the data version number of the data to be stored in data write request 2, ..., and the data version number of the data to be stored in data write request k. Likewise, the first space of data node n is written, in sequence, with original data block n corresponding to data write request 1, original data block n corresponding to data write request 2, ..., and original data block n corresponding to data write request k, and the second space of data node n is written, in sequence, with the data version numbers of the data to be stored in data write request 1, data write request 2, ..., and data write request k. If the writing node order of the data to be read is 2, the data to be read is composed of the second original data block in each of data node 1 to data node n.

S202: Read the target data version number from the second space of each storage node according to the order of writing nodes.

Wherein, multiple data version numbers are stored in the second space of each storage node, and the size of the space occupied by each data version number is the same. For each storage node, the target area in the second space is determined according to the order of writing nodes and the space occupied by the data version number, and the content read from the target area is used as the target data version number.

For example, if the order of writing nodes is 4 and the space occupied by each data version number is 8B, then the target area in the second space of each storage node is calculated to run from the 24th B to the 32nd B, and the content read from the target area in the second space of each storage node is used as the target data version number corresponding to that storage node.

S203, if all target data version numbers are consistent, read the target data block from the first space of each storage node according to the order of writing nodes.

Among them, when the target data version numbers read from the second space of each storage node are consistent, it means that the data blocks used to form the data to be read at each storage node are successfully stored. Similarly, the first empty of each storage node Multiple data blocks are stored in the space, and each data block occupies the same size of space. For each storage node, the target area in the first space is determined according to the writing node order and the space occupied by the data block. The content read from the target area is used as the target data block, and then all target data blocks are combined to obtain the target data block. Read data.

For example, if the writing node order is 4 and the space occupied by each data block is 128K, then the target area within the first space of each storage node is calculated to be the 384th K to the 512th K, and the content read from the target area in the first space of each storage node is used as the target data block corresponding to that storage node.
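The same calculation applied to the first space can be sketched as below. The helper names are hypothetical, and the toy buffer uses 4-byte blocks only to keep the example small.

```python
BLOCK_SIZE = 128 * 1024  # bytes per data block, as in the example above


def block_area(write_order: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Return the half-open byte range [start, end) of the target area in the first space."""
    if write_order < 1:
        raise ValueError("writing node order is 1-based")
    start = (write_order - 1) * block_size
    return start, start + block_size


def read_block(first_space: bytes, write_order: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read the target data block from a first-space buffer."""
    start, end = block_area(write_order, block_size)
    return first_space[start:end]


start, end = block_area(4)
print(start // 1024, end // 1024)  # 384 512

# Toy first space with four 4-byte blocks (illustrative size only).
toy_first_space = b"aaaabbbbccccdddd"
print(read_block(toy_first_space, 4, block_size=4))  # b'dddd'
```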

S204, if there are inconsistent target data version numbers, divide the multiple storage nodes into normal nodes and abnormal nodes according to each target data version number.

Among them, when the target data version numbers read from the second space of each storage node are not all consistent, it means that the data blocks used to form the data to be read failed to be stored at some storage nodes. At this time, the multiple storage nodes are divided into two categories, normal nodes and abnormal nodes, based on each target data version number. The data blocks used to form the data to be read were successfully stored at the normal nodes and failed to be stored at the abnormal nodes. It is therefore understandable that the target data version numbers corresponding to all normal nodes are consistent, while the target data version number corresponding to each abnormal node is inconsistent with the target data version number corresponding to all normal nodes.
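A minimal sketch of this classification is given below. The application does not state how the reference version number is chosen; the sketch assumes, purely for illustration, that the version number held by the largest number of nodes identifies the normal nodes. The function name and the sample values are hypothetical.

```python
from collections import Counter


def classify_nodes(versions: dict[int, int]) -> tuple[list[int], list[int]]:
    """Split node numbers into (normal, abnormal) by their target data version numbers."""
    # Assumption: the most common version number is the reference version.
    reference, _ = Counter(versions.values()).most_common(1)[0]
    normal = sorted(n for n, v in versions.items() if v == reference)
    abnormal = sorted(n for n, v in versions.items() if v != reference)
    return normal, abnormal


# Node number -> target data version number read from its second space.
versions = {1: 164961834, 2: 164961700, 3: 164961834}
normal, abnormal = classify_nodes(versions)
print(normal, abnormal)  # [1, 3] [2]
```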

S205: Read the target data block from the first space of each normal node according to the order of writing nodes.

Wherein, as in step S203 above, multiple data blocks are stored in the first space of each normal node, and the size of the space occupied by each data block is the same. For each normal node, the target area in the first space is determined according to the writing node order and the space occupied by the data block, and the content read from the target area is used as the target data block.

S206: According to the target data block corresponding to each normal node, restore the target data block corresponding to each abnormal node.

Among them, by performing erasure calculation on the target data blocks corresponding to all normal nodes, the target data blocks corresponding to each abnormal node are restored.
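The application does not name a specific erasure code. As one hedged illustration, the sketch below assumes the simplest case of a single parity block computed by bytewise XOR, in which one missing block equals the XOR of all surviving blocks. Codes with more than one check block (for example Reed-Solomon) need a different calculation.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Assumption: single-parity XOR code, so parity = d1 XOR d2.
d1 = b"\x01\x02\x03\x04"
d2 = b"\x10\x20\x30\x40"
parity = xor_blocks([d1, d2])

# d2 was stored on an abnormal node; recover it from the normal nodes' blocks.
recovered = xor_blocks([d1, parity])
print(recovered == d2)  # True
```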

S207: Generate data to be read according to the preset mapping relationship and all target data blocks to respond to the data read request.

Among them, as shown in Figure 7, when the target data version numbers read from the second space of each storage node are consistent, the target data blocks read from the first space of all data nodes can be used to generate the data to be read. As shown in Figure 8, when the target data version numbers read from the second space of each storage node are not all consistent, the target data blocks corresponding to all data nodes (which may be normal nodes or abnormal nodes) can be used to generate the data to be read.

The implementation process of step S207 is as follows:

First, assign a sequence number to each target data block based on the preset mapping relationship and the number of each storage node;

Then, sort all the target data blocks according to the sequence number of each target data block to obtain the data to be read.
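The two steps above can be sketched as follows. The preset mapping relationship is not spelled out in this passage, so the sketch takes it as an explicit dictionary from node number to sequence number; the identity mapping used in the usage lines is an assumption.

```python
def assemble(blocks_by_node: dict[int, bytes], node_to_seq: dict[int, int]) -> bytes:
    """Assign a sequence number to each target data block, sort, and concatenate."""
    numbered = [(node_to_seq[node], block) for node, block in blocks_by_node.items()]
    numbered.sort(key=lambda pair: pair[0])
    return b"".join(block for _, block in numbered)


# Assumption: data node n holds the block with sequence number n.
mapping = {1: 1, 2: 2}
blocks = {2: b"world", 1: b"hello "}  # arrival order does not matter
print(assemble(blocks, mapping))  # b'hello world'
```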

Understandably, since the data blocks used to compose the data to be read failed to be stored at the abnormal nodes, the target data block corresponding to each abnormal node obtained by the recovery processing needs to be rewritten in order to facilitate subsequent data reading. The detailed implementation process is as follows:

First, determine the target area in the first space of each abnormal node according to the writing node order and preset size;

Then, for each abnormal node, the target data block corresponding to the abnormal node is used to cover the content of the target area of the abnormal node.
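A minimal sketch of this rewrite step, assuming the first space is modeled as an in-memory `bytearray` and the preset size equals the block size. The function name and the toy 4-byte blocks are illustrative only.

```python
def repair_block(first_space: bytearray, write_order: int, block: bytes, block_size: int) -> None:
    """Overwrite the target area of an abnormal node with the recovered block."""
    if len(block) != block_size:
        raise ValueError("recovered block must match the preset size")
    start = (write_order - 1) * block_size
    end = start + block_size
    if write_order < 1 or end > len(first_space):
        raise ValueError("target area lies outside the first space")
    first_space[start:end] = block  # same-length slice assignment, size unchanged


# Toy first space: the third block failed to store and holds stale bytes.
space = bytearray(b"aaaabbbb????dddd")
repair_block(space, 3, b"cccc", block_size=4)
print(bytes(space))  # b'aaaabbbbccccdddd'
```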

In order to introduce the aforementioned data processing method more clearly, the embodiment of this application assumes, for illustration, that the number of storage nodes in the distributed storage system is 3 (2 data nodes, 1 check node) and that the erasure ratio is 2:1.

As shown in Figure 9, the data access end receives the data write request sent by the upper-layer application or external host, and splits the data to be stored in the data write request, with a total length of 256K, into 2 original data blocks and 1 check data block according to the erasure ratio. Each original data block or check data block includes 32 data strips, and the length of each data strip is 4K. According to the current timestamp of the distributed storage system, the data version number of the data to be stored is 164961834.
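The sizes in this example can be checked with a short sketch: 256K of data divided into 4K strips gives 64 strips, and grouping 32 strips per block gives 2 original data blocks of 128K each. The sketch assumes consecutive strips are grouped into a block and that the check data block is a bytewise XOR of the original blocks; neither detail is specified by the application.

```python
from functools import reduce
from operator import xor

STRIP_LEN = 4 * 1024    # length of each data strip
STRIPS_PER_BLOCK = 32   # data strips per data block


def split_into_blocks(data: bytes) -> list[bytes]:
    """Divide data into strips, then group consecutive strips into original data blocks."""
    if len(data) % (STRIP_LEN * STRIPS_PER_BLOCK):
        raise ValueError("this sketch assumes no padding is needed")
    strips = [data[i:i + STRIP_LEN] for i in range(0, len(data), STRIP_LEN)]
    return [b"".join(strips[i:i + STRIPS_PER_BLOCK])
            for i in range(0, len(strips), STRIPS_PER_BLOCK)]


def xor_check_block(blocks: list[bytes]) -> bytes:
    # Assumption: single XOR parity stands in for the erasure code.
    return bytes(reduce(xor, column) for column in zip(*blocks))


data = bytes(range(256)) * 1024  # 256K of sample data
original_blocks = split_into_blocks(data)
check_block = xor_check_block(original_blocks)
print(len(original_blocks), len(original_blocks[0]) // 1024, len(check_block) // 1024)  # 2 128 128
```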

According to the preset mapping relationship, the original data block with serial number 1 and the data version number are written into the first space and the second space, respectively, of the data node numbered 1; the original data block with serial number 2 and the data version number are written into the first space and the second space, respectively, of the data node numbered 2; and the check data block with serial number 1 and the data version number are written into the first space and the second space, respectively, of the check node numbered 1, in response to the data write request.
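The write path can be illustrated with a toy in-memory node. Several points are assumptions made only for this sketch: the preset mapping is taken to be the identity (sequence number n goes to node number n), the block is written before the version number, and a failed block write leaves the version slot untouched. The `StorageNode` class and the `fail_block` flag are hypothetical.

```python
class StorageNode:
    """Toy storage node with a first space (blocks) and a second space (version numbers)."""

    def __init__(self, slots: int, block_size: int, version_size: int = 8):
        self.block_size = block_size
        self.version_size = version_size
        self.first_space = bytearray(slots * block_size)
        self.second_space = bytearray(slots * version_size)

    def store(self, write_order: int, block: bytes, version: int,
              fail_block: bool = False) -> bool:
        # Assumption: the block is written first, the version number only afterwards.
        if fail_block or len(block) != self.block_size:
            return False  # the version slot keeps its old content
        b = (write_order - 1) * self.block_size
        self.first_space[b:b + self.block_size] = block
        v = (write_order - 1) * self.version_size
        self.second_space[v:v + self.version_size] = version.to_bytes(self.version_size, "big")
        return True

    def version_at(self, write_order: int) -> int:
        v = (write_order - 1) * self.version_size
        return int.from_bytes(self.second_space[v:v + self.version_size], "big")


def target_node(seq: int, nodes: dict[int, StorageNode]) -> StorageNode:
    # Assumption: identity mapping between block sequence number and node number.
    return nodes[seq]


nodes = {1: StorageNode(4, 4), 2: StorageNode(4, 4)}
version = 164961834
ok1 = target_node(1, nodes).store(1, b"AAAA", version)
ok2 = target_node(2, nodes).store(1, b"BBBB", version, fail_block=True)  # simulated failure
print(nodes[1].version_at(1), nodes[2].version_at(1))  # 164961834 0
```

Under these assumptions, a node whose block write failed never records the new version number, so a later read sees mismatched version numbers and can trigger recovery.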

Each data node consists of multiple first spaces of 64MB and second spaces of 128KB. Multiple data write requests continuously store multiple original data blocks and the corresponding data version numbers on the disk of the data node. Similarly, each check node is also composed of multiple first spaces of 64MB and second spaces of 128KB. Multiple data write requests continuously store multiple check data blocks and the corresponding data version numbers on the disk of the check node.

The data access end receives a data read request sent by an upper-layer application or an external host. According to the writing node order of the data to be read in the data read request, the target data version numbers are read from the second spaces of the 2 data nodes and the 1 check node and compared.

As shown in Figure 10, if the target data version numbers read from the second spaces of the 2 data nodes and the 1 check node are consistent and the erasure ratio is met, then the target data blocks are read from the first spaces of the 2 data nodes according to the writing node order of the data to be read, and all the target data blocks are combined according to the preset mapping relationship and the numbers of the data nodes to obtain the data to be read in response to the data read request.

As shown in Figure 11, if the target data version number read from the second space of data node 1 is inconsistent with the target data version number read from the second space of data node 2 but consistent with the target data version number read from the second space of check node 1, the erasure ratio is still met. In this case, the data access end first reads the target data blocks from the first spaces of data node 1 and check node 1 according to the writing node order of the data to be read, and recovers the target data block corresponding to data node 2 through erasure calculation. After completing the recovery process, the data access end combines the target data blocks corresponding to data node 1 and data node 2 into the data to be read according to the preset mapping relationship and the numbers of the data nodes, in response to the data read request.
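The read path of this 2:1 example can be sketched end to end. The node names (`data1`, `data2`, `check1`), the majority rule for picking the reference version number, and the use of XOR as the erasure calculation are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    return bytes(reduce(xor, column) for column in zip(*blocks))


def read_2_plus_1(versions: dict[str, int], blocks: dict[str, bytes]) -> bytes:
    """Compare version numbers, recover a missing data block if needed, and combine."""
    # Assumption: the most common version number marks the normal nodes.
    reference, _ = Counter(versions.values()).most_common(1)[0]
    normal = [name for name, v in versions.items() if v == reference]
    if len(normal) < 2:
        raise RuntimeError("fewer than 2 consistent nodes: the 2:1 erasure ratio is not met")
    target = {name: blocks[name] for name in normal}  # read from normal nodes only
    for name in ("data1", "data2"):
        if name not in target:
            # Assumption: XOR of the two surviving blocks yields the missing one.
            target[name] = xor_blocks(list(target.values()))
    return target["data1"] + target["data2"]


d1, d2 = b"ab", b"cd"
check = xor_blocks([d1, d2])
versions = {"data1": 7, "data2": 6, "check1": 7}               # data node 2 is stale
blocks = {"data1": d1, "data2": b"\x00\x00", "check1": check}  # its block is unusable
print(read_2_plus_1(versions, blocks))  # b'abcd'
```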

Furthermore, this embodiment of the present application also provides a schematic structural block diagram of the data access terminal 300. Please refer to FIG. 12. The data access terminal 300 may include a memory 310 and a processor 320.

Among them, the processor 320 can be a general central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), or one or more integrated circuits for controlling the execution of the program of the data processing method provided by the above method embodiments.

The memory 310 can be ROM or other types of static storage devices that can store static information and instructions, RAM or other types of dynamic storage devices that can store information and instructions, or it can be an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc storage (including compressed optical discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, without limitation. The memory 310 may exist independently and be connected to the processor 320 through a communication bus. The memory 310 may also be integrated with the processor 320. Among them, the memory 310 is used to store machine-executable instructions for executing the solution of the present application. The processor 320 is configured to execute the machine-executable instructions stored in the memory 310 to implement the above method embodiments.

Embodiments of the present application also provide a computer-readable storage medium containing a computer program. When executed, the computer program can be used to perform relevant operations in the data processing method provided by the above-mentioned method embodiments.

Please refer to FIG. 13, which is a functional unit block diagram of the data processing device 400 provided by an embodiment of the present application. The data processing device 400 is applied to the data access terminal 300 and may include a receiving module 401, a processing module 402, and a sending module 403. Among them, the receiving module 401, the processing module 402, and the sending module 403 can all be stored in the memory or a computer-readable storage medium in the form of software. It should be noted that the basic principles and technical effects of the data processing device 400 provided in the embodiments of the present application are the same as those of the above-mentioned embodiments. For brevity, for matters not mentioned in this embodiment, refer to the corresponding content of the above-mentioned embodiments.

The receiving module 401 is used to receive a data write request, where the data write request includes data to be stored.

The processing module 402 is used to process the data to be stored to obtain multiple data blocks, each of which is assigned a sequence number.

The sending module 403 is used to determine, for each data block, the target node from multiple storage nodes according to the sequence number of the data block, and to send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.

In one implementation, the processing module 402 is also configured to use the current timestamp as the data version number of the data to be stored if there is one data write request; and, if there are multiple data write requests, to perform multiple auto-increment operations on the current timestamp and, according to the reception time of each data write request, use the result of each auto-increment operation as the data version number of the data to be stored in one data write request.

In one implementation, the processing module 402 is specifically configured to divide the data to be stored into multiple data strips according to a preset length; to form each preset number of data strips into an original data block to obtain multiple original data blocks; and to perform erasure coding on the multiple original data blocks to obtain multiple verification data blocks. The multiple data blocks include the multiple original data blocks and the multiple verification data blocks.

In one implementation, the receiving module 401 is also configured to receive a data read request, which includes the writing node sequence of the data to be read; the processing module 402 is also configured to read the target data version number from the second space of each storage node according to the writing node sequence; if all target data version numbers are consistent, to read the target data block from the first space of each storage node according to the writing node sequence; and to generate the data to be read according to the preset mapping relationship and all the target data blocks, in response to the data read request.

In one implementation, the processing module 402 is also configured to divide multiple storage nodes into normal nodes and abnormal nodes according to each target data version number if there are inconsistent target data version numbers, wherein the target data version numbers corresponding to all normal nodes are consistent, and the target data version number corresponding to each abnormal node is inconsistent with the target data version number corresponding to all normal nodes; to read the target data block from the first space of each normal node according to the writing node sequence; to restore the target data block corresponding to each abnormal node according to the target data block corresponding to each normal node; and to generate the data to be read according to the preset mapping relationship and all target data blocks, in response to the data read request.

In one implementation, the processing module 402 is specifically configured to assign a sequence number to each target data block according to the preset mapping relationship and the number of each storage node, and to sort all target data blocks according to the sequence number of each target data block to obtain the data to be read.

In one implementation, the processing module 402 is also used to determine the target area in the first space of each abnormal node according to the writing node order and the preset size; and, for each abnormal node, to use the target data block corresponding to the abnormal node to cover the content of the target area of the abnormal node.

Embodiments of the present application provide a data processing method, device, data access terminal and storage medium. First, a data write request is received, and the data write request includes data to be stored; then, the data to be stored is processed to obtain multiple data blocks, each of which is assigned a serial number; then, for each data block, the target node is determined from multiple storage nodes according to the serial number of the data block, and the data block and the data version number of the data to be stored are sent to the target node, so that the target node stores the data block and the data version number into the first space and the second space of the target node respectively, where the serial number of the data block and the number of the target node satisfy the preset mapping relationship, and the first space is located before the second space. Since the embodiment of the present application sends each data block to the target node together with the data version number of the data to be stored, the target node stores the data block and the data version number in its first space and second space respectively, and the first space of the target node is located before the second space, the situation where the data version numbers in each storage node are consistent but the data block storage has failed is avoided, so that the data access end can promptly recover the data blocks that failed to be stored and data consistency is ensured.

The above are only specific implementation modes of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be covered by the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

A data processing method, characterized in that it is applied to a data access terminal in a distributed storage system, the distributed storage system further includes a plurality of storage nodes, each of the storage nodes is provided with a number, each of the storage nodes is communicatively connected to the data access terminal, and the method includes:

Receive a data write request, the data write request includes data to be stored;

Process the data to be stored to obtain multiple data blocks, each of which is assigned a sequence number;

For each data block, determine a target node from the plurality of storage nodes according to the sequence number of the data block, and send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number into the first space and the second space of the target node respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
The method of claim 1, wherein before processing the data to be stored to obtain a plurality of data blocks, the method further includes:

If there is one data write request, use the current timestamp as the data version number of the data to be stored;

If there are multiple data write requests, multiple auto-increment operations are performed on the current timestamp, and, according to the reception time of each data write request, the result of each auto-increment operation is used as the data version number of the data to be stored in one of the data write requests.
The method of claim 1, wherein the step of processing the data to be stored to obtain a plurality of data blocks includes:

Divide the data to be stored into multiple data strips according to a preset length;

Each preset number of the data strips is formed into an original data block to obtain multiple original data blocks;

Erasure coding is performed on the plurality of original data blocks to obtain a plurality of verification data blocks. The plurality of data blocks include the plurality of original data blocks and the plurality of verification data blocks.
The method of claim 1, further comprising:

Receive a data read request, the data read request includes the writing node sequence of the data to be read;

According to the writing node sequence, read the target data version number from the second space of each storage node;

If all the target data version numbers are consistent, read the target data block from the first space of each storage node according to the writing node sequence;

The data to be read is generated according to the preset mapping relationship and all the target data blocks in response to the data read request.
The method of claim 4, further comprising:

If there are inconsistent target data version numbers, the multiple storage nodes are divided into normal nodes and abnormal nodes according to each target data version number, where the target data version numbers corresponding to all normal nodes are consistent, and the target data version number corresponding to each abnormal node is inconsistent with the target data version number corresponding to all normal nodes;

According to the writing node sequence, read the target data block from the first space of each normal node;

According to the target data block corresponding to each normal node, restore the target data block corresponding to each abnormal node;

The data to be read is generated according to the preset mapping relationship and all the target data blocks in response to the data read request.
The method of claim 4 or 5, wherein the step of generating the data to be read according to the preset mapping relationship and all the target data blocks includes:

Allocate a sequence number to each target data block according to the preset mapping relationship and the number of each storage node;

Sort all the target data blocks according to the sequence number of each target data block to obtain the data to be read.
The method of claim 5, further comprising:

Determine the target area in the first space of each abnormal node according to the writing node order and the preset size;

For each abnormal node, use the target data block corresponding to the abnormal node to cover the content of the target area of the abnormal node.
A data processing device, characterized in that it is applied to a data access terminal in a distributed storage system, the distributed storage system further includes a plurality of storage nodes, each of the storage nodes is provided with a number, each of the storage nodes is communicatively connected to the data access terminal, and the device includes:

A receiving module, configured to receive a data write request, where the data write request includes data to be stored;

A processing module, used to process the data to be stored to obtain multiple data blocks, each of which is assigned a sequence number;

A sending module, configured to determine, for each data block, a target node from the plurality of storage nodes according to the sequence number of the data block, and to send the data block and the data version number of the data to be stored to the target node, so that the target node stores the data block and the data version number in the first space and the second space of the target node respectively, where the sequence number of the data block and the number of the target node satisfy a preset mapping relationship, and the first space is located before the second space.
A data access terminal, characterized in that it includes a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, the data processing method according to any one of claims 1-7 is implemented.
A computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, the data processing method according to any one of claims 1-7 is implemented.