CN116149576A - Method and system for reconstructing a redundant array of disks (RAID) for serverless computing - Google Patents

Method and system for reconstructing a redundant array of disks for serverless computing

Info

Publication number
CN116149576A
CN116149576A (application CN202310426602.2A)
Authority
CN
China
Prior art keywords
data
client
request
disk
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310426602.2A
Other languages
Chinese (zh)
Other versions
CN116149576B (en)
Inventor
Jin Xin (金鑫)
Liu Xuanzhe (刘譞哲)
Shu Junyi (舒俊宜)
Ma Yun (马郓)
Huang Gang (黄罡)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority: CN202310426602.2A
Publication of CN116149576A
Application granted; publication of CN116149576B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0689Disk arrays, e.g. RAID, JBOD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a method and system for reconstructing a redundant array of disks (RAID) for serverless computing. The data reconstruction operation is completed through a small amount of communication among the relevant storage servers over a tree-shaped communication topology, so the client's communication overhead is offloaded to the storage-server side. This reduces the extent to which the client's network-card bandwidth limits RAID data-reconstruction performance. In addition, the nodes that perform data reduction are selected according to available bandwidth, which avoids network congestion and further improves RAID data-reconstruction performance.

Description

Method and system for reconstructing a redundant array of disks for serverless computing
Technical Field
The application relates to the technical field of cloud computing, and in particular to a method and a system for reconstructing a redundant array of disks (RAID) for serverless computing.
Background
When one or more disks fail, a redundant array of disks must first reconstruct the lost data before it can be read. Existing research on optimizing RAID data-reconstruction performance mainly targets single-machine RAID and optimizes for the characteristics of the storage medium; the systems designed in these works assume that the controller and the disks communicate over a bus, and do not consider the problems that a distributed RAID running on disaggregated storage must solve.
Compared with a RAID built from local storage (i.e., the local disks of a single storage server), the data-reconstruction performance of a RAID built from disaggregated storage is severely limited by network-card bandwidth and similar factors, which in the degraded case sharply constrains the speed at which serverless computing can read data.
Disclosure of Invention
The embodiments of the application aim to provide a method and a system for reconstructing a redundant array of disks for serverless computing that optimize RAID data-reconstruction performance.
To solve the above technical problems, in a first aspect, an embodiment of the application provides a method for reconstructing a redundant array of disks for serverless computing, the method comprising:
a client receives a target read request and determines the target data that the request needs to read;
the client determines the disks corresponding to the target data from a target RAID, where the target RAID is composed of disks belonging to a plurality of storage servers, and the client and each storage server, as well as every two storage servers, communicate over point-to-point network connections;
the client determines whether any of these disks has failed;
if a failed disk exists, the client selects the first storage servers and the second storage servers, according to the available bandwidth of the storage servers, from among the storage servers holding the remaining (non-failed) disks of the target RAID;
the client generates, based on the first and second storage servers, a tree-shaped communication topology with one of the second storage servers as the root node;
the client sends a first request to each first storage server and each second storage server according to the tree-shaped communication topology;
each first storage server that receives the first request reads the data indicated by the request from its own disk and sends it to the second storage server indicated by the request;
each second storage server other than the root node that receives the first request reduces the data it receives and sends the reduced data to the second storage server indicated by the request;
and the second storage server acting as the root node, upon receiving the first request, reconstructs the target data block corresponding to the failed disk according to the first request and the received data.
In a second aspect, an embodiment of the application further provides a system for reconstructing a redundant array of disks for serverless computing. The system includes a client and a plurality of storage servers, where the client and each storage server, as well as every two storage servers, communicate over point-to-point network connections:
the client is configured to receive a target read request and determine the target data the request needs to read;
the client is further configured to determine the disks corresponding to the target data from a target RAID, where the target RAID is composed of disks belonging to the plurality of storage servers;
the client is further configured to determine whether any of these disks has failed;
the client is further configured to select, if a failed disk exists and according to the available bandwidth of the storage servers, the first storage servers and the second storage servers from among the storage servers holding the remaining (non-failed) disks of the target RAID;
the client is further configured to generate, based on the first and second storage servers, a tree-shaped communication topology with one of the second storage servers as the root node;
the client is further configured to send a first request to each first storage server and each second storage server according to the tree-shaped communication topology;
each first storage server that receives the first request is configured to read the data indicated by the request from its own disk and send it to the second storage server indicated by the request;
each second storage server other than the root node that receives the first request is configured to reduce the data it receives and send the reduced data to the second storage server indicated by the request;
and the second storage server acting as the root node is configured, upon receiving the first request, to reconstruct the target data block corresponding to the failed disk according to the first request and the received data.
In a third aspect, an embodiment of the application further provides an electronic device including a memory, a processor, and a computer program stored in the memory; the processor executes the computer program to implement the method for reconstructing a redundant array of disks for serverless computing according to the first aspect.
In a fourth aspect, embodiments of the application further provide a computer-readable storage medium storing a computer program/instructions that, when executed by a processor, implement the method according to the first aspect.
In a fifth aspect, embodiments of the application further provide a computer program product including a computer program/instructions that, when executed by a processor, implement the method according to the first aspect.
With the above technical scheme, the data-reconstruction operation is completed through a small amount of communication among the relevant storage servers over a tree-shaped communication topology, and the client's communication overhead is offloaded to the storage-server side. This reduces the extent to which the client's network-card bandwidth limits RAID data-reconstruction performance; moreover, the nodes that perform data reduction are selected according to available bandwidth, which avoids network congestion and further improves RAID data-reconstruction performance.
Drawings
For a clearer description of the embodiments of the application, the drawings needed in the embodiments are briefly introduced below. The drawings described here show only some embodiments of the application; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of a method for reconstructing a redundant array of disks (RAID) for serverless computing according to an embodiment of the application;
FIG. 2 is a schematic diagram of a data-reconstruction system for a disaggregated RAID for serverless computing according to an embodiment of the application;
FIG. 3 is a schematic diagram of the data-reconstruction process of a disaggregated RAID for serverless computing according to an embodiment of the application;
FIG. 4 is a schematic structural diagram of a system for reconstructing a RAID for serverless computing according to an embodiment of the application;
FIG. 5 is a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the application are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the application; all other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of the application.
The terms "comprising" and "having", and any variations thereof, in the description, claims, and drawings of the application are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those steps or elements, but may include other steps or elements not expressly listed.
A redundant array of disks is a key technology for building reliable, high-performance storage systems on servers and personal computers; mainstream operating systems such as Windows and Linux let users combine several disks into a disk array on demand. When a single disk cannot meet a user's requirements, the user can choose among array types to build RAIDs of different capacities, bandwidths, and redundancy levels as the underlying block device of storage systems such as databases and file systems, providing better support for the read/write requests of upper-layer applications.
Meanwhile, serverless computing has become an important trend in data centers, especially cloud data centers. A conventional server must contain the central processing unit, graphics processor, hard-disk storage, solid-state storage, and so on that computation requires. As data-center network bandwidth keeps increasing and latency keeps decreasing, compute and storage can be decoupled over the network, so that different types of resources can be scaled independently and used as pooled resources. Serverless computing can flexibly schedule and use compute resources, and a disaggregated storage architecture provides it with flexible underlying storage.
Combining a RAID with disaggregated storage brings additional benefits to serverless computing. When all disks are located on the same storage server, the RAID can guarantee availability when one or two disks fail, but if the storage server itself fails, the RAID becomes unavailable regardless. Using multiple disaggregated storage components located on different storage servers effectively overcomes this problem.
However, when one or more disks fail, the RAID must reconstruct data as it is read. Compared with a RAID built from local storage, a RAID built from disaggregated storage has very limited data-reconstruction performance, which in the degraded case sharply constrains the speed at which serverless computing can read data. A RAID controller running on the central processing unit typically accesses local disks over the motherboard bus, which has very high bandwidth, whereas access to disaggregated storage must go through a network card and the data-center network. Whereas a normal read consumes a constant amount of bandwidth, the bandwidth consumed by data reconstruction is proportional to the number of disks. For example, when a RAID containing 9 data disks performs data reconstruction, it consumes 9 times the bandwidth of a normal read, placing a much larger burden on the network card and causing network congestion and rising latency.
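The bandwidth amplification described above amounts to a one-line calculation. The following sketch (the function name is ours, not the patent's) reproduces the 9-disk example from the text:

```python
def rebuild_amplification(num_data_disks: int) -> int:
    """Traffic multiplier of a degraded read versus a normal read.

    To rebuild one lost block, one surviving block must be fetched from
    every other data disk, so reconstruction traffic grows with the disk
    count while a normal read's traffic stays constant.
    """
    return num_data_disks

# The example in the text: a RAID with 9 data disks consumes 9x the
# bandwidth of a normal read during reconstruction.
print(rebuild_amplification(9))  # prints 9
```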
Existing research on optimizing RAID data-reconstruction performance mainly targets single-machine RAID and optimizes for the characteristics of the storage medium, without considering the problems that a distributed RAID running on disaggregated storage must solve. The systems designed in these works assume that the controller and the disks communicate over a bus whose bandwidth never becomes a bottleneck for data transfer. In practical application scenarios, network-card bandwidth is still very limited, so reducing the bandwidth the RAID occupies (especially the client's network-card bandwidth) is critical to improving performance.
To address the problems in the related art, the application provides a tree-shaped network topology for a disaggregated RAID for serverless computing that offloads the client's communication overhead to the storage servers, together with a bandwidth-aware data-reconstruction technique that selects the nodes performing data reduction so as to avoid network congestion.
The method for reconstructing a redundant array of disks for serverless computing provided by the embodiments of the application is described in detail below through embodiments and application scenarios, with reference to the drawings.
In a first aspect, referring to FIG. 1, a flowchart of a method for reconstructing a RAID for serverless computing according to an embodiment of the application is shown; the method may include the following steps.
step S101: the client receives a target reading request and determines target data required to be read by the target reading request.
In particular implementations, a client receives a target read request from a user, the target read request describing a storage location of data to be read in a target redundant array of disks, and the client determines the target data to be read according to the storage location.
It may be understood that the target data may be all data in the storage location, or may be part of the data in the storage location, where, for all data read into the storage location, the client may correspondingly perform one or more rounds of target data reading operations with respect to the target read request.
Step S102: the client determines the disks corresponding to the target data from the target RAID.
The target RAID is composed of disks belonging to a plurality of storage servers, and the client and each storage server, as well as every two storage servers, communicate over point-to-point network connections.
In a specific implementation, the client parses the target read request to determine the disks holding the target data. For example, the client may obtain from the request the data interval in which the target data is stored in the target RAID, and then determine the member disks covered by that interval as the disks corresponding to the target data.
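The interval-to-disk mapping in this step depends on the RAID layout, which the text does not specify. As an illustrative sketch, assume data is striped round-robin in fixed-size units across the members (function and parameter names are ours; real layouts also rotate parity):

```python
def disks_for_range(offset: int, length: int,
                    stripe_unit: int, num_disks: int) -> set[int]:
    """Member disks touched by a byte range under a simple striped layout.

    Data is assumed to be striped round-robin in stripe_unit-sized
    chunks across num_disks members.
    """
    first = offset // stripe_unit                 # first stripe unit touched
    last = (offset + length - 1) // stripe_unit   # last stripe unit touched
    if last - first + 1 >= num_disks:
        return set(range(num_disks))              # wide read touches every disk
    return {u % num_disks for u in range(first, last + 1)}

# A 256 KiB read at offset 0 on a 4-disk array with 64 KiB units
# spans four stripe units, i.e. all four member disks.
```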
Step S103: the client determines whether any of these disks has failed.
It should be understood that a failed disk in this application means, at minimum, a disk that cannot normally serve read operations for the target data.
Step S104: if a failed disk exists among these disks, the client selects the first storage servers and the second storage servers, according to the available bandwidth of the storage servers, from among the storage servers holding the remaining (non-failed) disks of the target RAID.
It should be understood that when the client receives the target read request and its execution involves a failed disk, the client triggers data reconstruction to recover the data that needs to be read from the failed disk (i.e., the target data block). If the request does not involve a failed disk, the client performs a normal data read.
In a specific implementation, the client selects the first storage servers and the second storage servers from among the storage servers holding healthy disks, according to their available bandwidth, to perform the data-reconstruction operations.
For example, the client may exclude storage servers with very little available bandwidth from data reconstruction entirely (i.e., use them as neither first nor second storage servers), select storage servers with relatively large available bandwidth as second storage servers, which must receive data from other servers, and/or select storage servers with relatively small available bandwidth as first storage servers, which only send data.
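The selection policy just described can be sketched as follows. The bandwidth threshold and reducer count are illustrative assumptions, not values from the patent:

```python
def assign_roles(bandwidth: dict[str, float],
                 min_bw: float, reducer_count: int):
    """Split surviving storage servers into senders ("first") and
    reducers ("second") by available bandwidth.

    Servers below min_bw are excluded entirely; the reducer_count
    servers with the most headroom become second servers (they must
    receive data from others); the rest become first servers and only
    send. Thresholds are illustrative.
    """
    eligible = sorted((s for s in bandwidth if bandwidth[s] >= min_bw),
                      key=lambda s: bandwidth[s], reverse=True)
    second = eligible[:reducer_count]   # high bandwidth: receive + reduce
    first = eligible[reducer_count:]    # lower bandwidth: send only
    return first, second

# Server "d" is too congested to participate; "a" and "c" become reducers.
first, second = assign_roles({"a": 10.0, "b": 1.0, "c": 7.0, "d": 0.1},
                             min_bw=0.5, reducer_count=2)
```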
Step S105: the client generates, based on the first and second storage servers, a tree-shaped communication topology with one of the second storage servers as the root node.
In a specific implementation, the client may use the determined first and second storage servers as nodes and, based on the point-to-point network connections between the client and the storage servers and among the storage servers, randomly generate (with preset probabilities) a tree-shaped communication topology that takes one of the second storage servers as the root node. The topology describes the data-forwarding path from every node to the root node.
Step S106: the client sends a first request to each first storage server and each second storage server according to the tree-shaped communication topology.
In a specific implementation, through the first requests the client directs the first and second storage servers to transfer data along the forwarding paths described by the tree-shaped communication topology, so that all data involved in reconstruction is gathered at the root node.
Step S107: each first storage server that receives the first request reads the data indicated by the request from its own disk and sends it to the second storage server indicated by the request.
In a specific implementation, the client may determine from the tree-shaped communication topology the forwarding node (i.e., a second storage server) for each first storage server's data, and then generate a first request carrying the data interval that the first storage server must read and the information of the node to which the data must be forwarded.
That second storage server then reduces the data it receives as directed by its own first request and sends the reduced data to the next second storage server, or to the root node, on the corresponding forwarding path.
Step S108: each second storage server other than the root node that receives the first request reduces the data it receives and sends the reduced data to the second storage server indicated by the request.
In a specific implementation, the client may determine from the tree-shaped communication topology which nodes' data (e.g., from first storage servers or other second storage servers) each second storage server must reduce and forward, and the next node to which the result must be forwarded. The client then generates a first request carrying the data interval the second storage server must read, the information about the data it must reduce, and the next node to forward to.
The second storage server then reduces the received data as directed by the first request: it performs the reduction either over the data read from its own disk together with the data from all the nodes it must receive, or over only the data from those nodes, and sends the reduced data to the next node (a second storage server or the root node) on the corresponding forwarding path. The reduction operation combines multiple data blocks into a single block while preserving the information the root needs for reconstruction.
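The text leaves the reduction operation abstract; for parity-based RAID the natural instance is bytewise XOR, which collapses many blocks of traffic into one without losing the information the root needs. A minimal sketch, assuming equal-sized blocks:

```python
def reduce_blocks(blocks: list[bytes]) -> bytes:
    """Reduce several equal-sized data blocks into one by bytewise XOR.

    A non-root reducer XORs its own disk's data with everything it
    received, then forwards the single reduced block to its parent in
    the tree, shrinking N blocks of traffic into 1.
    """
    out = bytearray(blocks[0])
    for blk in blocks[1:]:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)
```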
Step S109: the second storage server acting as the root node, upon receiving the first request, reconstructs the target data block corresponding to the failed disk according to the first request and the received data.
It should be understood that gathering all reconstruction-related data at the root node through the tree-shaped communication topology lets the root node perform the reconstruction, which avoids the communication overhead the client would incur if it read the data needed for reconstruction from each involved storage server itself. Part of the client's communication overhead is thus offloaded to the storage-server side, effectively reducing the client's network-card bandwidth usage during reconstruction. Moreover, when multiple second storage servers are selected, each of them reduces the data from other storage servers before forwarding it, further easing the network-card load on the root node.
With the above technical scheme, the data-reconstruction operation is completed through a small amount of communication among the relevant storage servers over a tree-shaped communication topology, and the client's communication overhead is offloaded to the storage-server side. This reduces the extent to which the client's network-card bandwidth limits RAID data-reconstruction performance; moreover, the nodes that perform data reduction are selected according to available bandwidth, which avoids network congestion and further improves RAID data-reconstruction performance.
The above technical scheme is further described below with reference to FIG. 2. As shown in FIG. 2, the application provides a data-reconstruction system for a disaggregated RAID for serverless computing, consisting mainly of a client and the storage servers associated with the target RAID. The modules contained in the client and the storage servers, and their functions, are described below.
Common modules
In this embodiment, the client and each storage server contain at least three common modules: a data-reconstruction protocol extension module, a network communication module, and a data-reconstruction module.
For the data-reconstruction protocol extension module, the application extends the NVMe over Fabrics (NVMe-oF) protocol with data-reconstruction operations, adding several opcodes and fields applicable to the target RAID for instructions sent by the client to the storage servers and between storage servers. On the instruction side, a first instruction (AlsoRead) is added to tell a storage server to perform both reconstruction and a normal read (i.e., to perform both data-reconstruction and data-read operations), and a second instruction (NoRead) is added to tell a storage server to perform reconstruction only (i.e., to perform only data-reconstruction operations). On the field side, an interval-length (fwd-length) field and an interval-offset (fwd-offset) field are added to describe the data interval to be forwarded; a next-dest field is added to indicate the next node to forward to (a storage server, or a target device such as the client); and a wait-num field is added to indicate the number of pieces of data to be reduced.
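A minimal sketch of such an extended request, using the field names from the text (fwd-length, fwd-offset, next-dest, wait-num) and the AlsoRead/NoRead instructions; the opcode values and wire encoding are hypothetical, not taken from the NVMe-oF specification:

```python
from dataclasses import dataclass

# Hypothetical opcode values for the two new instructions.
ALSO_READ = 0xC1   # perform reconstruction and a normal read
NO_READ = 0xC2     # perform reconstruction only

@dataclass
class RebuildRequest:
    """One 'first request' as described in the text (encoding assumed)."""
    opcode: int      # ALSO_READ or NO_READ
    fwd_offset: int  # offset of the data interval to forward
    fwd_length: int  # length of the data interval to forward
    next_dest: str   # next node to forward to (server or client)
    wait_num: int    # how many incoming pieces of data to reduce first

# A reducer that must wait for 2 blocks, XOR them, and forward the
# result to "server-2" without serving a normal read:
req = RebuildRequest(NO_READ, fwd_offset=0, fwd_length=4096,
                     next_dest="server-2", wait_num=2)
```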
Through the data-reconstruction protocol extension module, the client and the storage servers can generate and parse reconstruction-related requests according to the NVMe-oF protocol plus the new instructions and fields, so that they can communicate and operate under a common protocol. The use of the next-dest and wait-num fields is detailed below in the description of the tree-topology generation module, and the use of the data-interval fields and the first and second instructions is detailed below in the description of the multipath data-transmission module; they are not repeated here.
And the network communication module is used for establishing point-to-point network communication connection between the client and the storage server through the network card and between the storage server and the storage server, and sending messages and transmitting data through the connection.
Optionally, the network communication module establishes a point-to-point network communication connection between the client and each storage server, and between every two storage servers, through a network card supporting remote direct memory access (RDMA, Remote Direct Memory Access). In this embodiment, the application adopts RDMA networking, which reduces the consumption of the central processor during data transmission and increases the transmission rate.
Illustratively, when the system starts, the storage server side is started first. During storage server startup, the network communication module establishes a reliable connection between every two storage servers. When a client starts, the network communication module establishes a reliable connection between the client and each storage server. While the redundant array of disks operates, the network communication module is mainly responsible for sending messages and transmitting data. After a message is assembled according to the agreed protocol, the network communication module sends it to the target server via two-sided communication. When the client transmits data to a storage server, the storage server actively pulls the data via one-sided communication. When a storage server transmits data to the client, the storage server actively pushes the data to the client via one-sided communication. In this way, resource usage by the client is minimized.
And the data reconstruction module is used for accelerating, through the x86 instruction set of the central processor, the data reconstruction process of recovering complete data from partial data. Specifically, the data reconstruction module directly invokes special instructions of the computer's central processor from a user-space software library, making full use of the features of the x86 instruction set so that the relevant operations gain substantially in throughput and latency, thereby accelerating the exclusive-OR operations, Galois field multiplications, and other computations required for data reconstruction.
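As a minimal illustration of the recovery arithmetic (a plain-Python sketch; a real implementation would use SIMD-accelerated x86 instructions, and multi-failure codes additionally need Galois field multiplication), a single lost block in an XOR-parity stripe can be recovered from the surviving blocks:

```python
def xor_blocks(blocks):
    """Reduce a list of equal-length byte blocks with bytewise XOR."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


# RAID-5-style single-failure recovery: the lost data block equals the
# XOR of the remaining data blocks and the parity block.
d1 = bytes([1, 2, 3, 4])
d2 = bytes([5, 6, 7, 8])
parity = xor_blocks([d1, d2])         # written at encoding time
recovered = xor_blocks([d1, parity])  # d2 is lost; rebuild it from survivors
```

Because XOR is associative and commutative, partial reductions computed at different tree nodes can be combined in any order, which is what allows the reduction to be distributed over the tree topology.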
(II) client
In this embodiment, the client also contains at least three unique modules: the system comprises a user reading request response module, a data reconstruction strategy planning module and a tree topology generating module.
Unique module 1: and the user reading request response module is used for receiving a target reading request from a user and judging whether the data reconstruction is needed to be entered according to whether the disk fails or not.
In this embodiment, after receiving a read request from a user, the user read request response module first divides the requested data segment according to the organization of the redundant array of disks (such as data striping or data mirroring) to obtain the data blocks to be read, and determines whether any disk is in a failure state; if a disk is in a failure state, the user read request response module triggers the data reconstruction policy planning module to perform a data reconstruction operation.
As a possible implementation, in the case that the target read request corresponds to a plurality of stripes in the target redundant array of disks, the user read request response module may determine the data segment that the target read request needs to read (i.e., all data the request must read), and then divide the data segment into data sub-segments, where one data sub-segment corresponds to one stripe in the target redundant array of disks, stripes being logically delimited at the same offset on the different disks of the array. The user read request response module then takes each data sub-segment in turn as the target data and executes the data-reading-related operations and data-reconstruction-related operations provided by the application, thereby realizing the striped data reading and data reconstruction process.
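The per-stripe division can be sketched as follows (a hedged illustration; the stripe size and the (offset, length) tuple representation of sub-segments are assumptions, not taken from the patent):

```python
def split_into_substripes(offset, length, stripe_size):
    """Split a requested byte range into sub-segments, none of which
    crosses a stripe boundary; stripes are delimited logically at the
    same offset on every disk of the array."""
    segments = []
    pos, end = offset, offset + length
    while pos < end:
        stripe_end = (pos // stripe_size + 1) * stripe_size  # next boundary
        seg_end = min(stripe_end, end)
        segments.append((pos, seg_end - pos))  # (offset, length) pair
        pos = seg_end
    return segments
```

Each returned (offset, length) pair then becomes the target data of one read/reconstruction round.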
It will be appreciated that the user read request response module provides a unified interface to the target redundant array of disks, i.e., data is read from the array at the granularity of data blocks. When it receives a read request, the user read request response module determines whether the target redundant array of disks is in a failure state; if so, it submits a request (the target read request, or a read request for each piece of target data) to the data reconstruction policy planning module and begins polling whether these requests have been processed. After all back-end requests have been processed, the user read request response module informs the user, by way of an asynchronous notification, that the target read request has been processed successfully.
Unique module 2: and the data reconstruction strategy planning module is used for determining a data reconstruction strategy according to the request reading interval and the states of the magnetic disks. Specifically, the data reconstruction policy planning module selects an appropriate data reconstruction policy according to the number of failed nodes and the condition of the nodes requesting coverage, where the reconstruction policy at least includes: determining which storage servers participate in data reconstruction according to the number of fault nodes, and selecting a node (namely a second storage server) for data reduction according to the available bandwidth of each storage server; and determining which storage servers need to participate in normal data reading according to the node condition of the request coverage.
In particular implementations, the data reconstruction policy planning module is triggered only when a read request is received and a failed disk exists. The data reconstruction policy planning module first determines whether the currently received read request (the target read request, or a read request for certain target data) involves a failed disk. For example, if the disk on which any data block to be read by the request resides is a failed disk, the read request can be determined to involve a failed disk.
If the read request does not involve a failed disk, normal read logic is executed, i.e., the storage servers that the read request pertains to perform a normal data read. If the read request does involve a failed disk, the data reconstruction policy planning module determines, according to the number of failed disks involved (i.e., the number of failed nodes), which storage servers participate in data reconstruction, indicating that the storage servers involved in the read request need to perform data reconstruction to complete the data read.
For example, the partial data with the smallest data quantity, or involving the fewest storage servers, can be selected to participate in data reconstruction; each storage server holding that partial data is determined to be a storage server participating in data reconstruction, and the storage server with the highest available bandwidth is selected as the data reduction server (i.e., the second storage server) according to the available bandwidth of each storage server. All other participating storage servers (i.e., the first storage servers) are then controlled to send the results read from their own disks to the data reduction server, so that the data reduction server can complete the data reconstruction. It can be understood that the data reconstruction policy is turned into an actual tree communication topology by the tree topology generation module, and that the storage servers associated with the target redundant array of disks may send their own available bandwidth to the client at a set period.
As a possible implementation, the data reconstruction policy planning module may determine, from the storage servers holding the remaining disks of the target redundant array other than the failed disk, the third storage servers whose available bandwidth exceeds a set value, and then determine the second storage servers from among those third storage servers (for example, selecting one or more third storage servers that hold data required to reconstruct the current data), while the storage servers holding the other data required for the reconstruction serve as first storage servers. A tree communication topology is then established by the tree topology generation module from the selected first and second storage servers. This realizes a bandwidth-aware data reconstruction policy planning algorithm that balances the network load of the storage servers and avoids network congestion.
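A minimal sketch of this bandwidth-aware selection (the threshold semantics and the fallback to the best-connected survivor are assumptions for illustration, not the patent's exact algorithm):

```python
def plan_reduction(servers, bandwidth, threshold):
    """Pick the reduction (second) server and the forwarding (first)
    servers among the surviving servers that hold data needed for the
    rebuild, preferring candidates whose last reported available
    bandwidth exceeds the set threshold."""
    candidates = [s for s in servers if bandwidth[s] > threshold]
    if not candidates:
        # Assumed fallback: no server clears the threshold, so consider
        # all survivors rather than failing the rebuild outright.
        candidates = servers
    reducer = max(candidates, key=lambda s: bandwidth[s])
    first_servers = [s for s in servers if s != reducer]
    return reducer, first_servers
```

Because the servers report their available bandwidth periodically, the client can rerun this selection per read request and route each reduction to a currently well-connected node.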
Unique module 3: the tree topology generation module is used for generating a tree communication topology structure according to the data reconstruction strategy determined by the data reconstruction strategy planning module, packaging the tree communication topology structure in a structured mode, and calling the data reconstruction protocol expansion module to generate a corresponding request.
In implementation, the tree topology generation module can randomly generate a tree communication topology that satisfies the data reconstruction policy determined by the data reconstruction policy planning module, for example by setting the random range of the number of child nodes with a pre-planned probability and generating the topology with a random tree generation algorithm based on breadth-first search. It will be appreciated that the root node and the internal (child) nodes of the tree communication topology are second storage servers, while the leaf nodes are first storage servers. For example, in the case where the data reconstruction policy planning module selects only one second storage server, the tree communication topology consists of one root node and the leaf nodes directly connected to it.
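One way such a tree could be generated (a sketch under assumptions: incremental random attachment is used here in place of the patent's breadth-first-search algorithm, whose exact procedure is not spelled out in the text):

```python
import random


def build_tree(reducers, leaves, seed=None):
    """Random reconstruction tree: the first reduction (second) server
    is the root, every other reduction server attaches under a reduction
    server already placed, and every forwarding (first) server becomes a
    leaf under some reduction server.
    Returns {node: parent}; the root's parent is None (the client)."""
    rng = random.Random(seed)
    root, rest = reducers[0], list(reducers[1:])
    parent = {root: None}
    placed = [root]          # only reduction servers may have children
    for r in rest:
        parent[r] = rng.choice(placed)
        placed.append(r)
    for leaf in leaves:
        parent[leaf] = rng.choice(placed)
    return parent
```

The invariant matching the text is preserved: leaves (first storage servers) never acquire children, and every internal node is a second storage server capable of reducing.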
After the tree topology generation module generates the tree communication topology, the next node (internal node or root node) to which any leaf node or internal node must send its read result or reduction result can be determined. It will be appreciated that the next node for the root node's reduction result (i.e., the reconstructed data) is the client. The tree topology generation module then calls the data reconstruction protocol expansion module, encapsulates the information of the next node in the next-dest field, and sends a first request carrying this encapsulated information to the corresponding node, so as to control that node to send its reduction result, or the result read from its own disk, to the corresponding next node.
For any internal node or the root node, the tree topology generation module can determine the number of nodes from which it must receive data, and thus the amount of data that node needs to reduce. The tree topology generation module then calls the data reconstruction protocol expansion module, encapsulates the required reduction amount in the wait-num field, and sends a first request carrying this encapsulated amount to the corresponding node, so as to control that node to produce a reduction result over the expected amount of data.
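Given a tree expressed as a {node: parent} map, the per-node next-dest and wait-num values follow directly (an illustrative sketch; the field names mirror the protocol extension, while the dictionary representation is an assumption):

```python
def derive_fields(parent):
    """For each node, next_dest is its parent (None standing in for the
    client, which receives the root's reconstructed data), and wait_num
    is the number of children whose data the node must receive before it
    can reduce (0 for leaf nodes, which only read and forward)."""
    wait_num = {node: 0 for node in parent}
    for node, p in parent.items():
        if p is not None:
            wait_num[p] += 1
    return {node: {"next_dest": p, "wait_num": wait_num[node]}
            for node, p in parent.items()}
```

The client would encapsulate each node's two values into that node's first request before dispatching the requests over the tree.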
(III) storage server side
In this embodiment, the storage server further includes at least two unique modules: the system comprises a disk reading module and a multipath data transmission module.
Unique module 1: and the disk reading module is used for directly accessing the disk drive, sending a reading request to the disk and polling the state of the disk reading request, and simultaneously carrying out necessary data caching.
In an implementation, the disk reading module may reserve a section of memory as a buffer for storing the read data when executing the read request. After the read request is executed, the disk read module recovers the memory and prepares for the next execution of the read request.
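The reserve-and-reuse buffering described above can be sketched as follows (an illustrative assumption; the actual module would pin this memory for RDMA and drive the disk through a user-space driver rather than copy Python bytes):

```python
class ReadBuffer:
    """A fixed buffer reserved once and reused across read requests,
    so the data path performs no per-request allocation."""

    def __init__(self, size: int):
        self._buf = bytearray(size)

    def fill(self, data: bytes) -> memoryview:
        # Stand-in for the disk read landing in the reserved memory.
        n = len(data)
        self._buf[:n] = data
        return memoryview(self._buf)[:n]

    def reset(self) -> None:
        # "Recover the memory" so the buffer is ready for the next read.
        self._buf[:] = bytes(len(self._buf))
```

Returning a memoryview rather than a copy mirrors the zero-copy intent: downstream senders read directly out of the reserved region.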
Unique module 2: and the multipath data transmission module is used for transmitting the normally read data and the reconstructed data in different paths, so that network overhead is reduced.
In this embodiment, it is considered that a user's read request typically involves both a failed disk and normal disks. That is, the same storage server may need to read two portions of data: one used to reconstruct the failed disk's data, and one for normal reading. These two portions often overlap, so when the storage server reads the corresponding data from its disk, it can perform the read once for a single request; accordingly, the client also needs to send only one request (i.e., the first request) to control the storage server's disk read operation. When the storage server returns data, however, since normally read data can be used directly without reduction, the multipath data transmission module must both transmit that portion of data directly to the client to complete the normal read, and transmit the relevant portion to the next node to participate in data reconstruction.
The present application introduces a first instruction and a second instruction to trigger the multi-path data transmission module to transmit the data (i.e. the read result or the reduced result) of the storage server in different paths.
In the implementation, the client sends a first request carrying a first instruction (i.e. AlsoRead instruction) to a first storage server and a second storage server where each disk corresponding to the target data is located according to the tree communication topological structure.
The first storage server or the second storage server (e.g., server 2 shown in fig. 3) that receives the first instruction transmits the data (e.g., data D2 shown in fig. 3) indicated by the first request read from its disk to the next node (e.g., reduction node shown in fig. 3) through the multi-path data transmission module, and simultaneously transmits the read result to the client through the multi-path data transmission module. The server 2 performs the data reconstruction-related operation and the data reading-related operation at the same time.
It should be noted that, when the data used for reconstructing the failed disk's data and the data used for normal reading are the same, the client may generate, through the data reconstruction protocol expansion module, a first request carrying the first instruction and the information of the data interval of that data, where the interval information may be encapsulated in the fwd-length and fwd-offset fields. When the data used for reconstructing the failed disk's data and the data used for normal reading differ, the client may generate, through the data reconstruction protocol expansion module, a first request carrying the first instruction together with the data interval of the reconstruction data and the data interval of the normally read data; alternatively, it may generate, through the data reconstruction protocol expansion module, a first request carrying the second instruction (i.e., the NoRead instruction) and the data interval of the reconstruction data, generate a normal-read request through the NVMe-oF protocol, and send the two requests to the corresponding storage server through the network communication module.
Meanwhile, the client also sends first requests carrying the second instruction to the remaining first and second storage servers (such as the server 1 and the reduction node shown in fig. 3). The server 1 is a first storage server; after receiving the second instruction, it sends the data indicated by the first request and read from its own disk (such as data D1 shown in fig. 3) to the reduction node only, through the multipath data transmission module. The reduction node is the second storage server serving as the root node; after receiving the second instruction, it performs a reduction operation (such as an exclusive-OR operation) on the received data D1 and D2 together with the data P indicated by the first request and read from its own disk, obtaining the reconstructed target data block D3 (that is, the data to be read from the failed disk of the server 3 shown in fig. 3), and returns the target data block D3 only to the client through the multipath data transmission module. The server 1 and the reduction node thus perform only data-reconstruction-related operations.
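The dispatch a storage server performs on receiving the two instructions can be sketched as follows (a hedged illustration; send() stands in for the RDMA one-sided transfer, and the string destinations are assumptions):

```python
OP_ALSO_READ, OP_NO_READ = "AlsoRead", "NoRead"


def multipath_send(opcode, read_data, next_dest, send):
    """After a single disk read, forward the data along the required
    paths: the reconstruction path always receives it; under AlsoRead
    the same data additionally goes straight to the client, since
    normally read data needs no reduction."""
    send(next_dest, read_data)
    if opcode == OP_ALSO_READ:
        send("client", read_data)
```

With AlsoRead (as for server 2's D2 in fig. 3) the data travels on both paths; with NoRead (as for server 1's D1) it feeds only the reduction node.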
After storing all the data (i.e. data D2 and data D3) to be read into its own read buffer, the client (or the client controller) can return the result of successful execution of the read request to the user.
Based on the above embodiment, the application proposes a tree-shaped network topology for redundant-array-of-disks data reconstruction, which distributes the client's network communication overhead, and proposes a bandwidth-aware data reconstruction policy planning algorithm, which balances the network load of the storage servers, thereby optimizing the data reconstruction performance of a disaggregated redundant array of disks.
In a second aspect, an embodiment of the present application provides a server-oriented non-aware computing redundant array of disks reconstruction system, where the system includes a client and a plurality of storage servers (servers 1 to n shown in fig. 4), and the client and each storage server, as well as every two storage servers, communicate through point-to-point network communication connections, where:
the client is used for receiving a target reading request and determining target data required to be read by the target reading request;
the client is further configured to determine each disk corresponding to the target data from a target redundant array of disks, where the target redundant array of disks is formed by disks of a plurality of storage servers;
the client is further used for judging whether a fault disk exists in each disk;
the client is further configured to determine, according to the available bandwidths of the plurality of storage servers, each first storage server and each second storage server from the storage servers where the remaining disks other than the failed disk in the target redundant array of disks are located, when a failed disk exists among the respective disks;
the client is further configured to generate a tree-like communication topology structure with any one of the second storage servers as a root node according to the first storage servers and the second storage servers;
the client is further configured to send a first request to each first storage server and each second storage server according to the tree communication topology structure;
the first storage servers receiving the first request are used for reading data indicated by the first request from the disk of the first storage servers and sending the data to the second storage server indicated by the first request;
the second storage servers except the root node in the second storage servers receive the first request and are used for reducing received data and sending the reduced data to the second storage server indicated by the first request;
And the second storage server which receives the first request and serves as the root node is used for reconstructing a target data block corresponding to the fault disk in the target data according to the first request and the received data.
According to this technical solution, the data reconstruction operation is completed with a small amount of communication among the relevant storage servers over the tree communication topology, and the client's communication overhead can be shared by the storage servers, which reduces the extent to which the client's network card bandwidth limits the data reconstruction performance of the redundant array of disks; moreover, selecting the data reduction nodes according to available bandwidth avoids network congestion and further optimizes the data reconstruction performance of the redundant array of disks.
Optionally, the information carried by the first request includes at least one of:
a data interval;
the amount of data to be reduced;
target equipment for transmitting data;
a first instruction for instructing the storage server to perform a data reconstruction-related operation and a data reading-related operation, or a second instruction for instructing the storage server to perform a data reconstruction-related operation.
Optionally, the data interval is described by an interval length field and an interval offset field carried by the first request.
Optionally, the client is further configured to send, according to the tree communication topology structure, a first request carrying the first instruction to a first storage server and a second storage server where each disk corresponding to the target data is located, and send, respectively, a first request carrying the second instruction to the remaining storage servers in each first storage server and each second storage server;
the second storage server serving as the root node is further configured to send the reconstructed target data block to the client;
the first storage server or the second storage server receiving the first instruction is further configured to perform the following steps:
and sending the data indicated by the first request, read from its own disk, to the client.
Optionally, in the case that the target read request corresponds to a plurality of stripes in the target redundant array of disks, the client is further configured to receive the target read request and determine the data segment to be read by the target read request; divide the data segment into data sub-segments, where one data sub-segment corresponds to one stripe in the target redundant array of disks; and determine any data sub-segment as the target data.
Optionally, the client is further configured to determine, from storage servers where remaining disks except the failed disk in the target redundant array of inexpensive disks are located, each third storage server having an available bandwidth greater than a set value;
the client is further configured to determine, from the third storage servers, the second storage servers.
Optionally, the plurality of storage servers are further configured to send the available bandwidth of the storage servers to the client according to a set period.
Optionally, the client and each storage server, as well as every two storage servers, establish point-to-point network communication connections through network cards supporting remote direct memory access.
Optionally, the second storage server reconstructs the target data block through an X86 instruction set of a central processor configured itself.
The embodiment of the application also provides an electronic device; referring to fig. 5, fig. 5 is a schematic diagram of the electronic device according to the embodiment of the application. As shown in fig. 5, the electronic device 100 includes a memory 110 and a processor 120 connected through a bus; the memory 110 stores a computer program that can run on the processor 120 to implement the steps of the server-oriented non-aware computing disk redundant array reconstruction method disclosed in the embodiments of the application.
The embodiment of the application also provides a computer readable storage medium, on which a computer program/instruction is stored, which when executed by a processor, implements the server-oriented non-aware computing disk redundant array reconstruction method disclosed in the embodiment of the application.
The embodiment of the application also provides a computer program product, which comprises a computer program/instruction, wherein the computer program/instruction realizes the server-oriented unaware computing disk redundant array reconstruction method disclosed by the embodiment of the application when being executed by a processor.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, systems, devices, storage media, and program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The above description is made in detail on a method and a system for reconstructing a disk redundant array of server-oriented non-aware computing provided in the present application, and specific examples are applied to illustrate the principles and embodiments of the present application, where the description of the above examples is only used to help understand the method and core idea of the present application; meanwhile, as those skilled in the art will have modifications in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.

Claims (10)

1. A server-oriented non-perception computing disk redundant array reconstruction method is characterized by comprising the following steps:
the method comprises the steps that a client receives a target reading request and determines target data required to be read by the target reading request;
the client determines each disk corresponding to the target data from a target disk redundant array, wherein the target disk redundant array is composed of disks of a plurality of storage servers, and the client communicates with each storage server and each two storage servers through point-to-point network communication connection;
The client judges whether a fault disk exists in each disk or not;
under the condition that a fault disk exists in each disk, the client determines each first storage server and each second storage server from the storage servers of the rest disks except the fault disk in the target disk redundant array according to the available bandwidths of the plurality of storage servers;
the client generates a tree-shaped communication topological structure taking any second storage server as a root node according to the first storage servers and the second storage servers;
the client side respectively sends first requests to the first storage servers and the second storage servers according to the tree-shaped communication topological structure;
the first storage servers receiving the first request read data indicated by the first request from the disk of each first storage server, and send the data to the second storage server indicated by the first request;
the second storage servers except the root node in the second storage servers receive the first request, reduce the received data and send the reduced data to the second storage server indicated by the first request;
And the second storage server which receives the first request and serves as the root node rebuilds a target data block corresponding to the fault disk in the target data according to the first request and the received data.
2. The method of claim 1, wherein the information carried by the first request comprises at least one of:
a data interval;
the amount of data to be reduced;
target equipment for transmitting data;
a first instruction for instructing the storage server to perform a data reconstruction-related operation and a data reading-related operation, or a second instruction for instructing the storage server to perform a data reconstruction-related operation.
3. The method of claim 2, wherein the client sending the first requests to the first storage servers and the second storage servers respectively according to the tree communication topology comprises:
the client sends, according to the tree communication topology, a first request carrying the first instruction to the first storage servers and second storage servers where the disks corresponding to the target data are located, and sends a first request carrying the second instruction to each of the remaining first storage servers and second storage servers;
the method further comprises:
the second storage server serving as the root node sends the reconstructed target data block to the client;
the first storage server or second storage server that receives the first instruction further performs the following step:
sending, to the client, the data indicated by the first request that was read from its own disk.
4. The method of claim 2, wherein the data interval is described by an interval length field and an interval offset field carried by the first request.
5. The method of claim 1, wherein, in the case where the target read request corresponds to a plurality of stripes in the target redundant array of disks, the client receiving a target read request and determining the target data to be read by the target read request comprises:
the client receives the target read request and determines the data segment to be read by the target read request;
the client divides the data segment into data sub-segments, each data sub-segment corresponding to one stripe in the target redundant array of disks;
the client determines any one of the data sub-segments as the target data.
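The stripe-aligned splitting described in claim 5 can be sketched as follows. Byte addressing and a fixed stripe size are both assumptions for illustration; the patent does not fix the addressing granularity.

```python
def split_into_substripes(offset: int, length: int, stripe_size: int) -> list[tuple[int, int]]:
    """Split a read of `length` bytes starting at `offset` into sub-segments,
    each confined to a single stripe (stripe boundaries fall on multiples of
    `stripe_size`). Returns (sub_offset, sub_length) pairs."""
    subs = []
    end = offset + length
    while offset < end:
        stripe_end = (offset // stripe_size + 1) * stripe_size  # next stripe boundary
        sub_len = min(end, stripe_end) - offset
        subs.append((offset, sub_len))
        offset += sub_len
    return subs
```

Each returned pair then becomes one candidate "target data" sub-segment, so failed-disk handling proceeds one stripe at a time.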
6. The method of claim 1, wherein the second storage servers are determined by:
the client determines, from the storage servers where the remaining disks in the target redundant array of disks other than the failed disk are located, third storage servers whose available bandwidth is greater than a set value;
the client determines the second storage servers from the third storage servers.
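A minimal sketch of this two-stage selection. Taking the k highest-bandwidth candidates in the second stage is an assumption for illustration; the claim only says the second storage servers are chosen from the third storage servers and leaves the final rule open.

```python
def select_second_servers(bandwidths: dict[str, float], threshold: float, k: int) -> list[str]:
    """Stage 1: filter surviving servers whose available bandwidth exceeds
    `threshold` (the 'third' storage servers).
    Stage 2 (assumed): keep the k highest-bandwidth of those as second servers."""
    third = {s: bw for s, bw in bandwidths.items() if bw > threshold}
    return sorted(third, key=third.get, reverse=True)[:k]
```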
7. The method according to any one of claims 1-6, further comprising:
the plurality of storage servers send their available bandwidths to the client at a set period.
8. The method of any of claims 1-6, wherein point-to-point network communication connections are established between the client and each storage server, and between every two storage servers, through network cards supporting remote direct memory access.
9. The method of any of claims 1-6, wherein the second storage server reconstructs the target data block using the X86 instruction set of its own central processing unit.
10. A redundant array of disks reconstruction system oriented to serverless (server-unaware) computing, comprising a client and a plurality of storage servers, wherein the client and each storage server communicate via point-to-point network communication connections, and wherein:
the client is configured to receive a target read request and determine the target data to be read by the target read request;
the client is further configured to determine the disks corresponding to the target data from a target redundant array of disks, where the target redundant array of disks is formed by the disks of the plurality of storage servers;
the client is further configured to determine whether a failed disk exists among the disks;
the client is further configured to determine, when a failed disk exists among the disks and according to the available bandwidths of the plurality of storage servers, first storage servers and second storage servers from the storage servers where the remaining disks in the target redundant array of disks other than the failed disk are located;
the client is further configured to generate, according to the first storage servers and the second storage servers, a tree communication topology with any one of the second storage servers as its root node;
the client is further configured to send first requests to the first storage servers and the second storage servers respectively according to the tree communication topology;
each first storage server that receives a first request is configured to read the data indicated by the first request from its own disk and send the data to the second storage server indicated by the first request;
each second storage server other than the root node that receives a first request is configured to reduce the received data and send the reduced data to the second storage server indicated by the first request;
and the second storage server serving as the root node, upon receiving its first request, is configured to reconstruct the target data block corresponding to the failed disk in the target data according to the first request and the received data.
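One hypothetical way to realize the tree communication topology of the system claim: first storage servers act as leaves (they only read and forward data), second storage servers act as interior nodes (they reduce), and one second storage server is the root. The binary-tree shape over reducers and the round-robin attachment of leaves below are illustrative assumptions, not choices specified by the patent.

```python
def build_tree(first_servers: list[str], second_servers: list[str]) -> dict[str, str]:
    """Return a child -> parent map describing the tree topology.
    The first second-storage server is the root (absent from the map)."""
    parent: dict[str, str] = {}
    # arrange the remaining second servers as a binary tree under the root
    for i, s in enumerate(second_servers[1:], start=1):
        parent[s] = second_servers[(i - 1) // 2]
    # attach each first server (leaf) to a reducer, round-robin
    for i, s in enumerate(first_servers):
        parent[s] = second_servers[i % len(second_servers)]
    return parent
```

The client would send each server a first request naming `parent[server]` as the target device for its (reduced) data, so all data flows converge on the root.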
CN202310426602.2A 2023-04-20 2023-04-20 Method and system for reconstructing disk redundant array oriented to server non-perception calculation Active CN116149576B (en)


Publications (2)

Publication Number Publication Date
CN116149576A true CN116149576A (en) 2023-05-23
CN116149576B CN116149576B (en) 2023-07-25

Family

ID=86351014


Country Status (1)

Country Link
CN (1) CN116149576B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101971168A (en) * 2008-04-17 2011-02-09 美国日本电气实验室公司 Dynamically quantifying and improving the reliability of distributed data storage systems
CN102314322A (en) * 2011-07-01 2012-01-11 杭州华三通信技术有限公司 Data processing method and device based on RAID (redundant array of independent disks)
CN106484324A (en) * 2016-09-13 2017-03-08 郑州云海信息技术有限公司 Method, system and RAID that a kind of RAID rebuilds
CN106528342A (en) * 2016-11-11 2017-03-22 安徽维德工业自动化有限公司 Disk array fault tolerance apparatus with cloud server backup function
US20190171363A1 (en) * 2017-12-01 2019-06-06 International Business Machines Corporation Reducing a rate at which data is mirrored from a primary server to a secondary server
CN111095217A (en) * 2017-11-13 2020-05-01 清华大学 Data storage system based on RAID mechanism with global resource sharing
US20210089399A1 (en) * 2019-09-20 2021-03-25 DigitalOcean, LLC Protocol for improving rebuild times of redundant array of independent disks
CN113574509A (en) * 2019-03-28 2021-10-29 国际商业机器公司 Reducing reconstruction time in a computing storage environment
CN114237985A (en) * 2021-12-22 2022-03-25 中国人民解放军国防科技大学 Method for repairing failed memory block in erasure code memory system and related device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tang Yingjie: "An Efficient Failure Reconstruction Method Based on In-Network Computing in Erasure-Coded Storage Systems", Journal of Computer Research and Development, pages 767 - 777 *


Similar Documents

Publication Publication Date Title
US6839752B1 (en) Group data sharing during membership change in clustered computer system
US7076553B2 (en) Method and apparatus for real-time parallel delivery of segments of a large payload file
US9350682B1 (en) Compute instance migrations across availability zones of a provider network
CN102523234B (en) A kind of application server cluster implementation method and system
KR101544480B1 (en) Distribution storage system having plural proxy servers, distributive management method thereof, and computer-readable recording medium
JP4068473B2 (en) Storage device, assignment range determination method and program
US9917884B2 (en) File transmission method, apparatus, and distributed cluster file system
KR101480867B1 (en) System and method for accelerating mapreduce operation
CN111277629A (en) High-availability-based web high-concurrency system and method
JP2019522293A (en) Acceleration resource processing method and apparatus
US10740198B2 (en) Parallel partial repair of storage
US20160112509A1 (en) Multicast transport
CN111158949A (en) Configuration method, switching method and device of disaster recovery architecture, equipment and storage medium
US20160110272A1 (en) Multicast transport configuration
Tang et al. Latency-aware task scheduling in software-defined edge and cloud computing with erasure-coded storage systems
US7627650B2 (en) Short-cut response for distributed services
CN109840051B (en) Data storage method and device of storage system
CN108462737B (en) Batch processing and pipeline-based hierarchical data consistency protocol optimization method
CN114035969A (en) Method, system and equipment for realizing distributed block storage multi-path ISCSI lock
CN116149576B (en) Method and system for reconstructing disk redundant array oriented to server non-perception calculation
US20150372895A1 (en) Proactive Change of Communication Models
KR102119456B1 (en) Distributed Broker Coordinator System and Method in a Distributed Cloud Environment
CN110651262B (en) Hierarchical distributed storage system and techniques for edge computing systems
WO2023098048A1 (en) Method and apparatus for expanding erasure code storage system
CN116055499A (en) Method, equipment and medium for intelligently scheduling cluster tasks based on redis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant