CN116048878A - Business service recovery method, device and computer equipment - Google Patents

Info

Publication number
CN116048878A
Authority
CN
China
Prior art keywords
node
target
logical
written
data
Prior art date
Legal status
Pending
Application number
CN202211733663.5A
Other languages
Chinese (zh)
Inventor
刘立黎
韩勇
吴瑞强
陈超
韩晓薇
Current Assignee
Tianjin Zhongke Shuguang Storage Technology Co ltd
Original Assignee
Tianjin Zhongke Shuguang Storage Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Tianjin Zhongke Shuguang Storage Technology Co., Ltd.
Priority to CN202211733663.5A
Publication of CN116048878A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 - Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 - Saving, restoring, recovering or retrying
    • G06F 11/1446 - Point-in-time backing up or restoration of persistent data
    • G06F 11/1448 - Management of the data involved in backup or backup restore
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 - Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 - Saving, restoring, recovering or retrying
    • G06F 11/1446 - Point-in-time backing up or restoration of persistent data
    • G06F 11/1458 - Management of the backup or restore process
    • G06F 11/1469 - Backup restoration techniques

Abstract

The application relates to a business service recovery method, apparatus, computer device, storage medium, and computer program product. The method comprises the following steps. Upon receiving a service recovery instruction, sent by the management node, for a target logical node on a failed physical node, a physical node creates a new logical node with the same service attributes as the target logical node. With the logical volume as the unit of recovery, it reads the pending write data of each logical volume contained in the target logical node and sends that data to a storage pool. Once the pending write data of any target logical volume corresponding to the target logical node has been completely written, the new logical node begins receiving data write requests for that logical volume, completing the business service for it. With this method, business service recovery efficiency can be improved.

Description

Business service recovery method, device and computer equipment
Technical Field
The present invention relates to the field of distributed storage technologies, and in particular, to a business service recovery method, apparatus, and computer device.
Background
Large distributed storage systems contain many physical nodes; at least one logical node is deployed on each physical node, and each logical node carries many services. When the physical node hosting a logical node fails, service can resume only after all data on that logical node has been recovered, which causes a period of service interruption.
Specifically, after the logical nodes deployed on the failed physical node migrate to other healthy physical nodes, the data that had not yet been flushed to disk at the time of the failure (i.e., all pending write data in the cache) must first be sent to the storage pool before the logical nodes can resume accepting write requests.
This recovery approach is slow, so the service interruption is long and recovery efficiency is poor.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a business service recovery method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve business service recovery efficiency.
In a first aspect, the present application provides a business service recovery method. The method is applied to a physical node in a distributed system, the distributed system further comprising a management node and other physical nodes; at least one logical node is deployed on the physical node, and each logical node corresponds to at least one logical volume. The method comprises the following steps:
upon receiving a service recovery instruction, sent by the management node, for a target logical node on the failed physical node, creating a new logical node with the same service attributes as the target logical node;
reading, with the logical volume as the unit, the pending write data of each logical volume contained in the target logical node, and sending the pending write data to a storage pool;
and, upon determining that the pending write data of any target logical volume corresponding to the target logical node has been completely written, receiving, through the new logical node, data write requests for that target logical volume, thereby completing the business service for that logical volume.
Dividing each logical node into logical volumes and recovering data volume by volume means that a logical volume whose data has been recovered resumes service at once, rather than all service waiting until the data of every logical volume is recovered; the service interruption time is therefore reduced.
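The per-volume flow described above can be sketched as follows. This is an illustrative Python sketch, not code from the patent; the `FakePool` class, the `VolumeRecovery` class, and the data layout are all assumed for illustration.

```python
from collections import defaultdict

class FakePool:
    """Stand-in for the back-end storage pool (assumed interface)."""
    def __init__(self):
        self.disk = defaultdict(list)
    def write(self, vol_id, data):
        self.disk[vol_id].append(data)

class VolumeRecovery:
    """Recover a failed logical node volume by volume: each logical volume
    resumes service as soon as its own cached data is flushed, without
    waiting for its sibling volumes."""
    def __init__(self, cached_writes, pool):
        # cached_writes: {volume_id: [pending data, ...]} synchronized
        # from the failed physical node, in cache order
        self.cached = cached_writes
        self.pool = pool
        self.serving = set()          # volumes that are back in service

    def recover(self):
        for vol_id, pending in self.cached.items():
            for data in pending:      # flush in the cached order
                self.pool.write(vol_id, data)
            self.serving.add(vol_id)  # this volume resumes service now
        return self.serving
```

Note that volume A is marked as serving as soon as its own pending data is flushed, even while volume B is still being recovered; this is the source of the claimed reduction in interruption time.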
In one embodiment, the creating, upon receipt of the service recovery instruction sent by the management node for a target logical node on the failed physical node, of a new logical node with the same service attributes as the target logical node includes:
upon receiving the service recovery instruction, sent by the management node, for the target logical node on the failed physical node, determining resource configuration information and connection configuration information of the target logical node;
and constructing a new logical node, and configuring the new logical node according to the resource configuration information and the connection configuration information.
After receiving the service recovery instruction, the physical node thus creates a logical node with the same attributes as the target logical node to take over the services of the target logical node.
In one embodiment, the reading, with the logical volume as the unit, of the pending write data of each logical volume contained in the target logical node, and the sending of the pending write data to a storage pool, includes:
traversing all logical volumes contained in the target logical node, reading the cached pending write data corresponding to each logical volume in turn according to a preset order, and sending the pending write data to the storage pool.
Reading the pending data in the order in which it was cached prevents the data corruption that replaying writes in the wrong order would cause.
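The ordered replay can be sketched as below. This is an assumed implementation detail: the sequence number attached to each cache entry is not specified in the patent and is introduced here only as one way of recording the original cache order.

```python
def flush_in_cache_order(entries, send):
    """Replay cached pending writes in their original cache order.

    entries: iterable of (sequence_number, volume_id, data) tuples; the
    sequence number (an assumed bookkeeping field) records the order in
    which each piece of data was cached.
    send: callable that forwards one piece of data to the storage pool.
    """
    for _seq, vol_id, data in sorted(entries, key=lambda e: e[0]):
        send(vol_id, data)
```

If two writes to the same block arrive out of order during replay, the later write must win; sorting by cache order guarantees this.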
In one embodiment, the method further comprises:
after receiving a data write request through a logical node, determining the data to be written carried by the request and the logical volume identifier of the logical volume to be written;
caching, through the logical node, the data to be written into the cache space corresponding to the logical volume identifier, and synchronizing the data to be written and the logical volume identifier to the other physical nodes;
and reading, through the logical node, the data to be written that is cached in the cache space corresponding to the logical volume identifier, sending it to the storage pool, and synchronizing the fact that it has been sent to the other physical nodes.
Because the physical node synchronizes every piece of data it stores to other physical nodes, those nodes can, after the physical node fails, recover the services it carried from the synchronized data.
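The write path of this embodiment can be sketched as follows. This is an illustrative sketch under stated assumptions: the `PeerMirror` class and the `sync_write`/`sync_flushed` method names are invented here to model the synchronization to other physical nodes; the patent does not name these interfaces.

```python
class PeerMirror:
    """Assumed peer-side replica of a logical node's per-volume cache."""
    def __init__(self):
        self.mirror = {}
    def sync_write(self, vol_id, data):
        self.mirror.setdefault(vol_id, []).append(data)
    def sync_flushed(self, vol_id, data):
        self.mirror[vol_id].remove(data)   # peer drops the flushed copy

class LogicalNode:
    """Write path: cache per logical volume, mirror each entry to a peer
    physical node, then flush to the pool and report flush progress."""
    def __init__(self, peer, pool):
        self.cache = {}                    # volume_id -> pending writes
        self.peer = peer
        self.pool = pool                   # pool modeled as a list of (vol_id, data)

    def handle_write(self, vol_id, data):
        self.cache.setdefault(vol_id, []).append(data)  # 1. cache per volume
        self.peer.sync_write(vol_id, data)              # 2. synchronize entry

    def flush(self, vol_id):
        for data in self.cache.pop(vol_id, []):
            self.pool.append((vol_id, data))            # 3. drop to disk
            self.peer.sync_flushed(vol_id, data)        # 4. tell the peer
```

After a crash, whatever remains in `peer.mirror` is exactly the pending data the failed node had cached but not yet flushed, which is what the recovery flow replays.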
In a second aspect, the present application also provides a distributed system. The system comprises a management node and a plurality of physical nodes; at least one logical node is deployed on each physical node, and each logical node corresponds to at least one logical volume, wherein:
the management node is configured to, when any physical node fails, determine a target physical node that has backed up the data cached by the failed physical node;
the target physical node is configured to, upon receiving a service recovery instruction, sent by the management node, for a target logical node deployed on the failed physical node, create a new logical node with the same service attributes as the target logical node; read, with the logical volume as the unit, the synchronized pending write data of each logical volume of the target logical node and send it to a storage pool; and, upon determining that the pending write data of any target logical volume corresponding to the target logical node has been completely written, receive, through the new logical node, data write requests for that target logical volume and complete the business service for it;
the failed physical node is configured to, before its failure, determine, after receiving a data write request through a logical node, the data to be written carried by the request and the logical volume identifier of the logical volume to be written; cache, through the logical node, the data into the cache space corresponding to the logical volume identifier and synchronize the data and the identifier to the target physical node; and read, through the logical node, the data cached in that cache space, send it to the storage pool, and synchronize the fact that it has been sent to the target physical node.
In other words, the target physical node is the node that performs the recovery, and the failed physical node is the node whose services are recovered: before the failure, the failed physical node synchronizes every piece of data it stores to the target physical node, so that after the failure the target physical node can recover the failed node's services from the synchronized data.
In one example, the management node is further configured to:
acquiring the bandwidth and the read/write volume configured by the user for each logical volume;
determining the read/write pressure of each physical node according to the correspondence between physical nodes and logical nodes and the correspondence between logical nodes and logical volumes;
when the read/write pressure across the physical nodes is uneven, determining the overloaded physical nodes whose pressure exceeds an upper pressure limit and the idle physical nodes whose pressure falls below a lower pressure limit;
and redistributing at least one logical volume corresponding to an overloaded physical node to a logical node deployed on an idle physical node.
The management node thus load-balances the logical volumes according to the pressure each physical node bears, so that the logical volume resources of the distributed system are used effectively.
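The balancing step can be sketched as a single pass over the nodes. This is a minimal sketch, assuming per-volume load values and one move per overloaded node; the patent does not specify the exact selection policy, so picking the lightest volume and the least-loaded idle node are illustrative choices.

```python
def node_pressure(node_volumes, vol_load):
    """Read/write pressure of each physical node: the sum of the load
    configured for the logical volumes it hosts."""
    return {n: sum(vol_load[v] for v in vols)
            for n, vols in node_volumes.items()}

def rebalance(node_volumes, vol_load, upper, lower):
    """One balancing pass: for each node above the upper pressure limit,
    move its lightest logical volume to the least-loaded node that is
    below the lower limit. Returns the list of moves performed."""
    pressure = node_pressure(node_volumes, vol_load)
    moves = []
    for src in [n for n, p in pressure.items() if p > upper]:
        idle = [n for n, p in pressure.items() if p < lower]
        if not idle or not node_volumes[src]:
            continue
        dst = min(idle, key=lambda n: pressure[n])
        vol = min(node_volumes[src], key=lambda v: vol_load[v])
        node_volumes[src].remove(vol)
        node_volumes[dst].append(vol)
        pressure[src] -= vol_load[vol]
        pressure[dst] += vol_load[vol]
        moves.append((vol, src, dst))
    return moves
```

A production balancer would iterate until pressures converge and would account for migration cost; this sketch only shows the pressure computation and the direction of the moves.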
In a third aspect, the present application also provides a business service recovery apparatus. The apparatus is applied to a physical node in a distributed system, the distributed system further comprising a management node and other physical nodes; at least one logical node is deployed on the physical node, and each logical node corresponds to at least one logical volume. The apparatus comprises:
a receiving module, configured to create, upon receiving a service recovery instruction sent by the management node for a target logical node on the failed physical node, a new logical node with the same service attributes as the target logical node;
a sending module, configured to read, with the logical volume as the unit, the pending write data of each logical volume contained in the target logical node, and to send the pending write data to a storage pool;
and a recovery module, configured to receive, through the new logical node, data write requests for a target logical volume upon determining that the pending write data of that target logical volume corresponding to the target logical node has been completely written, thereby completing the business service for that logical volume.
In one embodiment, the receiving module includes:
an information determining unit, configured to determine, upon receipt of the service recovery instruction sent by the management node for the target logical node on the failed physical node, resource configuration information and connection configuration information of the target logical node;
and a construction unit, configured to construct a new logical node and to configure the new logical node according to the resource configuration information and the connection configuration information.
In one embodiment, the sending module is specifically configured to:
traversing all logical volumes contained in the target logical node, reading the cached pending write data corresponding to each logical volume in turn according to a preset order, and sending the pending write data to the storage pool.
In one embodiment, the apparatus further includes:
a logical volume determining module, configured to determine, after a data write request is received through the logical node, the data to be written carried by the request and the logical volume identifier of the logical volume to be written;
a caching module, configured to cache, through the logical node, the data to be written into the cache space corresponding to the logical volume identifier, and to synchronize the data to be written and the logical volume identifier to the other physical nodes;
and a writing module, configured to read, through the logical node, the data to be written cached in the cache space corresponding to the logical volume identifier, to send it to the storage pool, and to synchronize the fact that it has been sent to the other physical nodes.
In a fourth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor that, when executing the computer program, implements the following steps:
upon receiving a service recovery instruction, sent by the management node, for a target logical node on the failed physical node, creating a new logical node with the same service attributes as the target logical node;
reading, with the logical volume as the unit, the pending write data of each logical volume contained in the target logical node, and sending the pending write data to a storage pool;
and, upon determining that the pending write data of any target logical volume corresponding to the target logical node has been completely written, receiving, through the new logical node, data write requests for that target logical volume, thereby completing the business service for that logical volume.
In a fifth aspect, the present application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the following steps:
upon receiving a service recovery instruction, sent by the management node, for a target logical node on the failed physical node, creating a new logical node with the same service attributes as the target logical node;
reading, with the logical volume as the unit, the pending write data of each logical volume contained in the target logical node, and sending the pending write data to a storage pool;
and, upon determining that the pending write data of any target logical volume corresponding to the target logical node has been completely written, receiving, through the new logical node, data write requests for that target logical volume, thereby completing the business service for that logical volume.
In a sixth aspect, the present application also provides a computer program product comprising a computer program that, when executed by a processor, implements the following steps:
upon receiving a service recovery instruction, sent by the management node, for a target logical node on the failed physical node, creating a new logical node with the same service attributes as the target logical node;
reading, with the logical volume as the unit, the pending write data of each logical volume contained in the target logical node, and sending the pending write data to a storage pool;
and, upon determining that the pending write data of any target logical volume corresponding to the target logical node has been completely written, receiving, through the new logical node, data write requests for that target logical volume, thereby completing the business service for that logical volume.
With the business service recovery method, apparatus, computer device, storage medium, and computer program product described above, logical nodes are divided into logical volumes and data is recovered volume by volume, so that a logical volume resumes service as soon as its own data is recovered rather than waiting for every logical volume, which shortens the service interruption and improves recovery efficiency.
Drawings
FIG. 1 is an application environment diagram of a business service restoration method in one embodiment;
FIG. 2 is a flow diagram of a business service restoration method in one embodiment;
FIG. 3A, FIG. 3B, FIG. 3C, and FIG. 3D are schematic flowcharts of a business service recovery method in another embodiment;
FIG. 4 is a block diagram of a business service recovery apparatus in one embodiment;
FIG. 5 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In the field of distributed storage, a distributed storage system generally serves as the front end: it receives and processes write requests and then sends them to a back-end storage pool to complete the disk-drop operation (writing the data to disk).
In the related art, a distributed storage system generally comprises a plurality of physical nodes, each carrying at least one Logical Node (LNODE), a storage unit virtualized from, and hosted on, the physical node. When a physical node fails, for example becomes inaccessible because of a network fault or power loss, the services carried by the failed node must be recovered. One approach recovers services at the granularity of the physical node: another target physical node is selected in the distributed storage system to take over all services of the failed node, and those services can only be recovered once the target node has received all of the failed node's data.
Another approach in the related art recovers services at the granularity of the logical node: one or more physical nodes are selected in the distributed storage system to take over the services of each logical node on the failed physical node. For example, if three logical nodes A, B, and C run on failed physical node X, physical node Y may take over the services of all of A, B, and C; or Y may take over the services of A while physical node Z takes over those of B and C.
In either case, after the logical nodes deployed on the failed physical node migrate to other healthy physical nodes, the data that had not been flushed to disk at the time of the failure (i.e., all pending write data in the cache) must first be sent to the storage pool before the logical nodes can resume receiving write requests and the services of the failed node are recovered. Whether recovery is performed at the granularity of the physical node or of the logical node, its efficiency is therefore not high.
Based on the above, the present application provides a business service recovery method applied to a physical node in a distributed system, the distributed system further comprising a management node and other physical nodes; at least one logical node is deployed on each physical node, and each logical node corresponds to at least one logical volume. Upon receiving a service recovery instruction, sent by the management node, for a target logical node on the failed physical node, the physical node creates a new logical node with the same service attributes as the target logical node. With the logical volume as the unit, it reads the pending write data of each logical volume contained in the target logical node and sends that data to a storage pool. Upon determining that the pending write data of any target logical volume corresponding to the target logical node has been completely written, it receives, through the new logical node, data write requests for that target logical volume and completes the business service for it.
In this method, logical nodes are divided into logical volumes and data is recovered volume by volume; because a logical volume resumes service as soon as its own data is recovered, rather than waiting until the data of all logical volumes has been recovered, the service interruption time is reduced.
It should be noted that the physical node to which this method applies is a block device: it stores information in fixed-size blocks, each with its own address, so pending data is cached, written, and read in units of blocks. During recovery, therefore, neither the physical node nor the logical node can map the cached data back to individual services; only after the synchronized data of the failed physical node has been flushed to disk can new pending write data be received and cached.
The distributed system may be a distributed storage system, a distributed file system, a distributed block storage system, or the like, provided it comprises a plurality of physical nodes each carrying at least one logical node.
The business service recovery method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1. Each physical node communicates with the management node over a network, and the physical nodes can also communicate with one another over the network. At least one logical node is deployed on each physical node, and each logical node corresponds to at least one logical volume.
In one embodiment, as shown in FIG. 2, a business service recovery method is provided, applied to a physical node in the distributed system shown in FIG. 1, and comprising the following steps:
step 201, under the condition that a service recovery instruction sent by the management node for a target logical node in the failed physical node is received, creating a new logical node with the same service attribute as the target logical node.
The service attribute refers to an attribute corresponding to a service for processing, for example, a storage for which the corresponding service is transaction data or a storage for which the corresponding service is user operation data. The processing logic of the data writing request corresponding to different service attributes is different, the data carried by the data writing request is different, or the position of the data to be written in the data request is different, and the like.
Specifically, the management node periodically communicates with each physical node, if a response of the physical node is not received after a preset duration is reached when the management node communicates with a certain physical node, the physical node is determined to be a failed physical node, a target physical node synchronizing data stored by the failed physical node is determined, each logical node on the failed physical node and a physical node receiving each logical node are determined according to the load condition of each target physical node, and then a service recovery instruction for the target logical node in the failed physical node is sent to the target physical node. For example, three logical nodes are running on the failed physical node X, the logical node A, B, C, and the management node determines that the physical node Y, Z synchronizes all data of the failed physical node, and can accept the service accepted by the logical node A, B, C by the physical node Y, where the management node sends a service restoration instruction for the logical node A, B, C in the failed physical node X to the physical node Y; the management node may send a service restoration instruction for the logical node a in the failed physical node X to the physical node Y, and send a service restoration instruction for the logical node B, C in the failed physical node X to the physical node Z.
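The failure detection and takeover planning just described can be sketched as follows. This is a minimal sketch under stated assumptions: the timeout value, the timestamp mechanism, and the simple count-based load metric are all illustrative details not specified by the patent.

```python
class ManagementNode:
    """Detect failed physical nodes by heartbeat timeout and plan which
    target node takes over each logical node of a failed node."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}                  # physical node -> last heartbeat time

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Nodes whose last response is older than the preset duration."""
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]

    def plan_takeover(self, logical_nodes, targets, load):
        """Assign each logical node of a failed physical node to the
        currently least-loaded target node holding its synchronized data."""
        plan = {}
        for ln in logical_nodes:
            dst = min(targets, key=lambda t: load[t])
            plan[ln] = dst
            load[dst] += 1                   # account for the new assignment
        return plan
```

With two equally loaded targets Y and Z, the three logical nodes A, B, and C of the example above are spread across both, matching the patent's alternative where Y takes over A and Z takes over B and C (the tie-breaking order here is an artifact of the sketch).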
Upon receiving the service recovery instruction sent by the management node for the target logical node on the failed physical node, the physical node determines the service attributes of the target logical node and then, according to the creation requirements of those service attributes, creates a new logical node with the same service attributes.
Step 203: read, with the logical volume as the unit, the pending write data of each logical volume contained in the target logical node, and send the pending write data to a storage pool.
The pending write data of a logical volume is the cached data that is destined for, but has not yet been written to, the disk storage space corresponding to that logical volume.
Specifically, a logical node caches pending data per logical volume, so the other physical nodes also cache the synchronized pending data of that logical node per logical volume. The physical node therefore reads the pending write data of each logical volume contained in the target logical node, volume by volume, and sends it to the storage pool. For example, if the target logical node contains three logical volumes A, B, and C, the node first reads the pending data stored for logical volume A and sends it to the storage pool, then reads the pending data stored for logical volume B and sends it to the storage pool, and finally does the same for logical volume C.
Step 205: upon determining that the pending write data of any target logical volume corresponding to the target logical node has been completely written, receive, through the new logical node, data write requests for that target logical volume, completing the business service for it.
Specifically, the physical node can determine that the pending write data of a target logical volume has been completely written when, for example, the cache space corresponding to the volume is empty, or when it obtains a completion response from the thread handling that volume. At that point, the data that the failed physical node had not flushed to disk for that target logical volume has completed its disk-drop processing, so the physical node can receive data write requests for the volume through the new logical node and complete the business service for it.
In this embodiment, logical nodes are divided into logical volumes and data is recovered volume by volume; because a logical volume whose data has been recovered resumes service immediately, instead of all service waiting until the data of every logical volume is recovered, the service interruption time is reduced.
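The per-volume write gate of step 205 can be sketched as below. This is an illustrative sketch; using an empty pending list as the "write completed" signal mirrors the empty-cache-space check mentioned above, and the class and method names are assumptions.

```python
class RecoveringVolume:
    """A recovered logical volume accepts new write requests only once its
    cache of pending data is empty; cache emptiness is the completion signal."""

    def __init__(self, pending):
        self.pending = list(pending)   # data synchronized from the failed node
        self.accepted = []             # writes accepted after recovery

    def flush_next(self, pool):
        """Send the oldest pending entry to the storage pool."""
        if self.pending:
            pool.append(self.pending.pop(0))

    def write(self, data):
        if self.pending:               # recovery for this volume not yet finished
            raise RuntimeError("volume still recovering")
        self.accepted.append(data)
```

Each volume carries its own gate, so one volume can start accepting writes while a sibling volume on the same logical node is still flushing.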
In one embodiment, step 201 specifically includes:
step 201A, determining resource configuration information and connection configuration information of a target logical node in the failed physical node under the condition that a service recovery instruction sent by the management node is received for the target logical node.
Wherein the resources and connection configurations required by the different logical nodes are different,
specifically, after determining a target physical node for accepting a target logical node in the failed physical nodes, the management node queries the pre-stored resource configuration information and connection configuration information of the target logical node according to the logical node identification of the target logical node. And the management node sends the resource configuration information and the connection configuration information of the target logical node to the target physical node at the same time of sending the service recovery instruction aiming at the target logical node in the fault physical node to the target physical node.
Step 201B, a new logical node is constructed, and the new logical node is configured according to the resource configuration information and the connection configuration information.
Specifically, after the physical node receives the service recovery instruction sent by the management node for a target logical node in the failed physical node and determines the resource configuration information and connection configuration information of that target logical node, it constructs a new logical node. It then allocates resources such as memory, CPU, storage space, and bandwidth to the new logical node according to the resource configuration information, completing the resource configuration of the new logical node, and configures connections to the clients, the management node, and so on according to the connection configuration information, completing the configuration processing of the new logical node.
In this embodiment, after receiving the service recovery instruction, the physical node creates a logical node with the same attributes as the target logical node to take over the service of the target logical node.
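Steps 201A/201B can be sketched as follows. This is a minimal illustration; the field names (`memory_mb`, `cpu_cores`, `bandwidth_mbps`, `peers`) are hypothetical stand-ins for the resource and connection configuration items the patent mentions.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalNode:
    node_id: str
    memory_mb: int = 0
    cpu_cores: int = 0
    bandwidth_mbps: int = 0
    connections: list = field(default_factory=list)

def build_replacement_node(node_id, resource_cfg, connection_cfg):
    """Construct a new logical node and apply the stored configuration of
    the failed one (steps 201A/201B). Field names are illustrative."""
    node = LogicalNode(node_id)
    # resource configuration: memory, CPU, bandwidth, etc.
    node.memory_mb = resource_cfg["memory_mb"]
    node.cpu_cores = resource_cfg["cpu_cores"]
    node.bandwidth_mbps = resource_cfg["bandwidth_mbps"]
    # connection configuration: re-establish links to clients and the management node
    node.connections = list(connection_cfg["peers"])
    return node
```

Because the configuration is queried from the management node's pre-stored records rather than from the failed node, the replacement can be built even when the failed node is completely unreachable.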
In one embodiment, step 203 specifically includes:
traversing each logical volume contained in the target logical node, reading the cached target data to be written corresponding to each logical volume in turn according to a preset order, and sending the target data to be written to the storage pool.
Specifically, the physical node determines each logical volume identifier corresponding to the target logical node according to the node identifier of the target logical node, traverses each logical volume contained in the target logical node, reads the cached target data to be written of each logical volume in turn according to a preset order, and sends the read data to the storage pool. For each logical volume, the target data to be written in the cache space corresponding to that volume is read in the order in which it was cached, and the read data is sent to the storage pool.
In one embodiment, after determining each logical volume identifier corresponding to the target logical node according to the node identifier of the target logical node, the physical node starts a number of service recovery threads based on the number of logical volumes corresponding to the target logical node. Each service recovery thread reads the target data to be written of one logical volume and sends the read data to the storage pool; after a service recovery thread has sent all the target data to be written of its logical volume to the storage pool, it feeds back to the physical node that the logical volume has completed writing.
In this embodiment, the data to be written is read in the order in which it was cached, preventing data errors caused by writing the data in the wrong order.
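The per-volume recovery threads of this embodiment can be sketched as below. The function names and queue-based completion feedback are assumptions; what the sketch preserves is the patent's two constraints: one thread per logical volume, and the cache order maintained within each volume.

```python
import threading
import queue

def recover_volume(volume_id, pending, storage_pool, done):
    """One service recovery thread: replay one volume's cached writes."""
    for item in pending:           # cache order preserved within a volume
        storage_pool.put((volume_id, item))
    done.put(volume_id)            # feed back completion to the physical node

def recover_all(cached_by_volume):
    """Start one recovery thread per logical volume, as in the embodiment."""
    storage_pool, done = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=recover_volume,
                                args=(v, pending, storage_pool, done))
               for v, pending in cached_by_volume.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    finished = {done.get() for _ in threads}
    return finished, list(storage_pool.queue)
```

Writes from different volumes may interleave in the storage pool, which is harmless; only the per-volume order matters for correctness, and each thread preserves it.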
In one embodiment, the method further comprises:
step 207, after receiving the data writing request through the logic node, determining the data to be written carried by the data writing request and the logic volume identifier of the logic volume to be written.
Specifically, the physical node receives a data write request through the logical node, parses the request to obtain the data to be written that it carries, and then determines the logical volume identifier of the logical volume to be written according to a preset configuration, such as a correspondence between the data type of the request and a logical volume, or a correspondence between the network address of the request and a logical volume.
Step 209, caching the data to be written into a cache space corresponding to the logical volume identifier through the logical node, and synchronizing the data to be written and the logical volume identifier to other physical nodes.
Specifically, the logical nodes may allocate different cache spaces for different logical volumes in advance. The physical node caches the data to be written into a cache space corresponding to the determined logical volume identifier through the logical node, and synchronizes the data to be written and the logical volume identifier to other physical nodes while caching so that the other physical nodes backup the target data to be written of each logical volume cached by the physical node.
Step 211, reading target data to be written cached in the cache space corresponding to the logical volume identifier through the logical node, sending the target data to be written to the storage pool, and synchronizing the sent target data to be written to other physical nodes.
Specifically, the physical node reads target data to be written cached in the cache space corresponding to the logical volume identifier through the logical node, then sends the target data to be written to the storage pool, and synchronizes the sent target data to be written to other physical nodes, so that the other physical nodes delete the sent target data to be written in the synchronized data, and the synchronized data among the physical nodes are kept consistent.
Step 209 and step 211 may be executed by two independent threads: in step 209, the physical node caches each piece of data to be written into the cache space corresponding to its logical volume identifier through one thread, while in step 211 it reads the cached target data to be written from the cache space corresponding to each logical volume identifier through another thread, or through multiple threads.
In this embodiment, each time a physical node stores a piece of data, it synchronizes that data to the other physical nodes, so that after the physical node fails, the other physical nodes can restore the services it was handling according to the synchronized data.
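The normal-operation write path of steps 207-211 can be sketched as follows. The class and its in-memory dictionaries are illustrative stand-ins (the patent does not prescribe data structures): a write is cached per volume and mirrored to a peer node; after the flush to the storage pool, the peer is told to drop the mirrored copy so both sides stay consistent.

```python
class WritePath:
    """Illustrative sketch of steps 207-211: cache per volume, mirror to a
    peer physical node, flush to the storage pool, then have the peer delete
    what was flushed. Names and structures are assumptions."""

    def __init__(self, peer_cache):
        self.cache = {}          # volume_id -> list of pending writes
        self.peer = peer_cache   # backup copy held by another physical node

    def handle_write(self, volume_id, data):
        # step 209: cache locally and synchronize to the peer
        self.cache.setdefault(volume_id, []).append(data)
        self.peer.setdefault(volume_id, []).append(data)

    def flush(self, volume_id, storage_pool):
        # step 211: write through to the storage pool, then have the peer
        # delete the flushed entries so its copy matches the local cache
        for data in self.cache.pop(volume_id, []):
            storage_pool.append((volume_id, data))
            self.peer[volume_id].remove(data)
```

The invariant the sketch maintains is that the peer's copy always equals the set of writes not yet on disk, which is precisely what makes the peer able to take over recovery after a failure.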
In one embodiment, the present application also provides a distributed system, wherein the system includes a management node and a plurality of physical nodes, the physical nodes having at least one logical node deployed, the logical nodes corresponding to at least one logical volume.
When any physical node fails, the management node determines the target physical node on which the data cached by the failed physical node is backed up.
Specifically, the management node stores the backup relationships between the physical nodes. When it determines that a physical node has failed, it identifies, among the surviving physical nodes, the target physical node that backs up the data cached by the failed physical node; it then determines, according to the load of each surviving physical node and the logical nodes deployed on the failed physical node, the target physical node that will take over each logical node of the failed physical node, and sends to that target physical node a service recovery instruction for the corresponding target logical node deployed on the failed physical node.
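The management node's failover planning can be sketched as below. The least-loaded-survivor heuristic is an illustrative assumption; the patent only states that assignment follows the load conditions of the surviving nodes.

```python
def plan_failover(failed, backup_of, node_load, logical_nodes_of):
    """Illustrative sketch: look up which surviving node holds the backup
    of the failed node's cache, then assign each of the failed node's
    logical nodes to the least-loaded surviving node."""
    target_backup = backup_of[failed]        # node holding the synced cache
    survivors = {n: l for n, l in node_load.items() if n != failed}
    assignments = {}
    for lnode in logical_nodes_of[failed]:
        target = min(survivors, key=survivors.get)  # least-loaded survivor
        assignments[lnode] = target
        survivors[target] += 1               # account for the added load
    return target_backup, assignments
```

Note that the node holding the backup and the node taking over a given logical node need not be the same; the sketch returns both so the recovery instruction can tell the take-over node where to fetch the synchronized cache from.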
Upon receiving the service recovery instruction sent by the management node for a target logical node deployed on the failed physical node, each target physical node creates a new logical node with the same service attributes as the target logical node, then reads, volume by volume, the data to be written of each logical volume synchronized from the target logical node and sends it to the storage pool. Once the data to be written of any target logical volume corresponding to the target logical node has finished writing, the target physical node receives data write requests for that logical volume through the new logical node and resumes the business processing service for it.
Before a failure occurs, after receiving a data write request through a logical node, the physical node determines the data to be written carried by the request and the logical volume identifier of the logical volume to be written; it caches the data to be written into the cache space corresponding to that logical volume identifier through the logical node and synchronizes the data and the identifier to the target physical node; and it reads the cached data to be written from that cache space through the logical node, sends it to the storage pool, and synchronizes the sent data to the target physical node.
In this embodiment, the target physical node is the node that performs service recovery and the failed physical node is the node whose services need recovery. Before the failure, each time the failed physical node stores a piece of data, it synchronizes that data to the target physical node; after the failure, the target physical node performs service recovery according to the data synchronized by the failed physical node.
In one embodiment, the management node of the above system is further configured to:
the management node obtains the bandwidth and read-write volume configured by the user for each logical volume, and then determines the read-write pressure of each physical node according to the correspondence between physical nodes and logical nodes and the correspondence between logical nodes and logical volumes. When the read-write pressure across the physical nodes is uneven, the management node determines the overloaded physical nodes whose read-write pressure exceeds the upper pressure limit and the idle physical nodes whose read-write pressure is below the lower pressure limit, and distributes at least one logical volume corresponding to an overloaded physical node to a logical node deployed on an idle physical node.
Each logical volume in the distributed storage system is preconfigured by the user, including its bandwidth and read-write volume. Specifically, the management node reads the user's configuration for each logical volume to obtain the bandwidth and read-write volume of each volume.
In one embodiment, when the read-write pressure across the physical nodes is uneven, the management node determines the overloaded physical nodes whose read-write pressure exceeds the upper pressure limit and the idle physical nodes whose read-write pressure is below the lower pressure limit, and then sends migration instructions for a target logical volume of an overloaded physical node to the idle physical node and the overloaded physical node respectively. After receiving the migration instruction, the overloaded physical node stops receiving data write requests for the target logical volume and sends the cached target data to be written of that volume to the storage pool, completing the disk-flush operation for the target logical volume. After receiving the migration instruction sent by the management node, the idle physical node constructs a logical node corresponding to the target logical volume and begins receiving data write requests for it.
In this embodiment, the management node load-balances the logical volumes according to the pressure that the volumes corresponding to each physical node generate, so that the logical volume resources of the distributed system are effectively utilized.
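The load-balancing check can be sketched as follows. The thresholds, the summation of per-volume configured load, and the single-volume move are illustrative simplifications of the embodiment; the pairing of one overloaded node with one idle node is an assumption.

```python
def rebalance(volume_cfg, volume_to_lnode, lnode_to_pnode, upper, lower):
    """Illustrative sketch: sum each volume's configured bandwidth/IO onto
    its physical node, then pair volumes on overloaded nodes with idle
    nodes for migration."""
    pressure = {}
    for vol, load in volume_cfg.items():
        pnode = lnode_to_pnode[volume_to_lnode[vol]]
        pressure[pnode] = pressure.get(pnode, 0) + load
    overloaded = [n for n, p in pressure.items() if p > upper]
    idle = [n for n, p in pressure.items() if p < lower]
    moves = []
    for hot, cold in zip(overloaded, idle):
        # pick one volume on the hot node to migrate to the cold node
        vol = next(v for v in volume_cfg
                   if lnode_to_pnode[volume_to_lnode[v]] == hot)
        moves.append((vol, hot, cold))
    return pressure, moves
```

Each returned move would then be realized by the migration instructions of the previous embodiment: the hot node flushes and releases the volume while the cold node builds a logical node for it.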
Figs. 3A-3D are schematic flow diagrams illustrating a business service recovery method according to an embodiment of the present application. Assume that the distributed storage system includes physical nodes A, B, and C; logical nodes 1, 2, and 3; and logical volumes 1, 2, 3, and 4. The client in the figures represents all clients that send requests, shown as a whole. Fig. 3A is a schematic deployment diagram of each physical node, logical node, and logical volume before physical node A fails; Fig. 3B shows that, after physical node A fails, the service corresponding to logical volume 1 is restored on physical node B; Fig. 3C shows that the service corresponding to logical volume 2 is also restored on physical node B, i.e. all services handled by physical node A are restored on physical node B; Fig. 3D is a schematic deployment diagram of each physical node, logical node, and logical volume after the distributed storage system performs load balancing once physical node B has taken over all services of physical node A.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with at least some of the other steps or sub-steps.
Based on the same inventive concept, the embodiment of the application also provides a service recovery device for implementing the service recovery method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of one or more service restoration devices provided below may be referred to the limitation of the service restoration method hereinabove, and will not be described herein.
In one embodiment, as shown in fig. 4, there is provided a business service recovery apparatus applied to a physical node in a distributed system, the distributed system further including a management node and other physical nodes, the physical node being deployed with at least one logical node, the logical node corresponding to at least one logical volume, the apparatus comprising:
a receiving module 401, configured to create a new logical node with the same service attributes as a target logical node in the failed physical node when a service recovery instruction sent by the management node for the target logical node is received;
a sending module 403, configured to read target data to be written of each logical volume included in the target logical node by taking the logical volume as a unit, and send the target data to be written to a storage pool;
and the recovery module 405 is configured to receive, by the new logical node, a data write request for the target logical volume, and complete a service processing service for the target logical volume, when it is determined that writing of target data to be written of any target logical volume corresponding to the target logical node is completed.
In one embodiment, the receiving module includes:
an information determining unit 401A (not shown in the figure), configured to determine the resource configuration information and connection configuration information of a target logical node in the failed physical node when a service recovery instruction sent by the management node for the target logical node is received;
a construction unit 401B (not shown in the figure), configured to construct a new logical node and perform configuration processing on the new logical node according to the resource configuration information and the connection configuration information.
In one embodiment, the sending module 403 is specifically configured to:
traversing each logical volume contained in the target logical node, reading the cached target data to be written corresponding to each logical volume in turn according to a preset order, and sending the target data to be written to the storage pool.
In one embodiment, the apparatus further includes:
a logical volume determining module 407 (not shown in the figure) configured to determine, after receiving a data writing request through a logical node, data to be written carried by the data writing request and a logical volume identifier of a logical volume to be written;
a buffer module 409 (not shown in the figure) configured to buffer, by a logical node, the data to be written into a buffer space corresponding to the logical volume identifier, and synchronize the data to be written and the logical volume identifier to other physical nodes;
a writing module 411 (not shown in the figure), configured to read, through a logical node, the data to be written cached in the cache space corresponding to the logical volume identifier, send the data to be written to a storage pool, and synchronize the sent data to the other physical nodes.
The above-mentioned various modules in the service restoration device may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a physical node in a distributed storage system, the internal structure of which may be as shown in FIG. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data to be written. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a business service restoration method.
It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, there is also provided a computer device including a memory and a processor, the memory storing a computer program, the processor implementing the steps of the above-described embodiments of the business service restoration method when executing the computer program.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the above-described embodiments of the business service restoration method.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the above-described embodiments of the business service restoration method.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-volatile computer-readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and so on.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above embodiments represent only a few implementations of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the application. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A business service restoration method, wherein the method is applied to physical nodes in a distributed system, the distributed system further comprising a management node and other physical nodes, the physical nodes being deployed with at least one logical node, the logical node corresponding to at least one logical volume, the method comprising:
under the condition that a service recovery instruction which is sent by the management node and aims at a target logic node in the fault physical node is received, creating a new logic node with the same service attribute as the target logic node;
reading target data to be written of each logical volume contained in the target logical node by taking the logical volume as a unit, and sending the target data to be written to a storage pool;
and when it is determined that the target data to be written of any target logical volume corresponding to the target logical node has finished writing, receiving, through the new logical node, a data write request for the target logical volume, and completing the business processing service for the target logical volume.
2. The method according to claim 1, wherein the creating a new logical node with the same service attribute as the target logical node in the failed physical node upon receiving the service restoration instruction sent by the management node for the target logical node comprises:
under the condition that a service recovery instruction which is sent by the management node and aims at a target logic node in the fault physical node is received, determining resource configuration information and connection configuration information of the target logic node;
and constructing a new logic node, and carrying out configuration processing on the new logic node according to the resource configuration information and the connection configuration information.
3. The method according to claim 1, wherein reading, in units of logical volumes, target data to be written of each logical volume included in the target logical node, and sending the target data to be written to a storage pool, includes:
traversing each logical volume contained in the target logical node, reading the cached target data to be written corresponding to each logical volume in turn according to a preset order, and sending the target data to be written to a storage pool.
4. The method according to claim 1, wherein the method further comprises:
after receiving a data writing request through a logic node, determining data to be written carried by the data writing request and a logic volume identifier of a logic volume to be written;
caching the data to be written into a cache space corresponding to the logical volume identifier through a logical node, and synchronizing the data to be written and the logical volume identifier to other physical nodes;
and reading data to be written cached in a cache space corresponding to the logical volume identifier through a logical node, sending the data to be written to a storage pool, and synchronizing the sent data to be written to other physical nodes.
5. A distributed system comprising a management node and a plurality of physical nodes, the physical nodes having at least one logical node deployed, the logical nodes corresponding to at least one logical volume, wherein:
The management node is used for determining a target physical node backed up with the data cached by the failed physical node under the condition that any physical node fails;
the target physical node is used for creating a new logical node with the same service attribute as the target logical node under the condition that a service recovery instruction of the target logical node deployed for the fault physical node, which is sent by the management node, is received; reading data to be written of each logic volume synchronized with the target logic node by taking the logic volume as a unit, and sending the data to be written to a storage pool; under the condition that the data to be written of any target logical volume corresponding to the target logical node is confirmed to be written, receiving a data writing request aiming at the target logical volume through the new logical node, and completing business processing service aiming at the target logical volume;
the fault physical node is used for determining data to be written carried by the data writing request and a logic volume identifier of a logic volume to be written after the data writing request is received through the logic node; caching the data to be written into a cache space corresponding to the logical volume identifier through a logical node, and synchronizing the data to be written and the logical volume identifier to the target physical node; and reading data to be written cached in a cache space corresponding to the logical volume identifier through a logical node, sending the data to be written to a storage pool, and synchronizing the sent data to be written to the target physical node.
6. The system of claim 5, wherein the management node is further configured to:
acquiring the bandwidth and the read-write quantity configured by a user for each logical volume;
determining the read-write pressure of each physical node by the management node according to the corresponding relation between the physical node and the logical node and the corresponding relation between the logical node and the logical volume;
under the condition that the read-write pressure of each physical node is not uniform, an overload physical node with the read-write pressure larger than the upper pressure limit and an idle physical node with the read-write pressure smaller than the lower pressure limit are determined;
and distributing at least one logical volume corresponding to the overloaded physical node to a logical node deployed on the idle physical node.
7. A business service restoration device, wherein the device is applied to a physical node in a distributed system, the distributed system further comprises a management node and other physical nodes, the physical node is deployed with at least one logical node, the logical node corresponds to at least one logical volume, and the device comprises:
the receiving module is used for creating a new logic node with the same service attribute as the target logic node under the condition of receiving a service recovery instruction which is sent by the management node and is aimed at the target logic node in the fault physical node;
the sending module is used for reading target data to be written of each logical volume contained in the target logical node by taking the logical volume as a unit, and sending the target data to be written to a storage pool;
and the recovery module is used for receiving, through the new logical node, a data write request for the target logical volume when it is determined that the target data to be written of any target logical volume corresponding to the target logical node has finished writing, and completing the business processing service for the target logical volume.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 4 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any of claims 1 to 4.
CN202211733663.5A 2022-12-30 2022-12-30 Business service recovery method, device and computer equipment Pending CN116048878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211733663.5A CN116048878A (en) 2022-12-30 2022-12-30 Business service recovery method, device and computer equipment

Publications (1)

Publication Number Publication Date
CN116048878A true CN116048878A (en) 2023-05-02

Family

ID=86130723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211733663.5A Pending CN116048878A (en) 2022-12-30 2022-12-30 Business service recovery method, device and computer equipment

Country Status (1)

Country Link
CN (1) CN116048878A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501264A (en) * 2023-06-25 2023-07-28 苏州浪潮智能科技有限公司 Data storage method, device, system, equipment and readable storage medium
CN116501264B (en) * 2023-06-25 2023-09-15 苏州浪潮智能科技有限公司 Data storage method, device, system, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN107807794B (en) Data storage method and device
CN106407040B (en) A kind of duplicating remote data method and system
US8521685B1 (en) Background movement of data between nodes in a storage cluster
CN107798130B (en) Method for storing snapshot in distributed mode
US7778960B1 (en) Background movement of data between nodes in a storage cluster
JP2019101703A (en) Storage system and control software arrangement method
US20060047926A1 (en) Managing multiple snapshot copies of data
CN106776130B (en) Log recovery method, storage device and storage node
US20070174673A1 (en) Storage system and data restoration method thereof
US20080140963A1 (en) Methods and systems for storage system generation and use of differential block lists using copy-on-write snapshots
CN108509462B (en) Method and device for synchronizing activity transaction table
CN107729536B (en) Data storage method and device
CN110096220B (en) Distributed storage system, data processing method and storage node
US9984139B1 (en) Publish session framework for datastore operation records
CN115599747B (en) Metadata synchronization method, system and equipment of distributed storage system
US11449402B2 (en) Handling of offline storage disk
CN115686932B (en) Backup set file recovery method and device and computer equipment
US20190347165A1 (en) Apparatus and method for recovering distributed file system
CN111291062B (en) Data synchronous writing method and device, computer equipment and storage medium
CN111309245A (en) Layered storage writing method and device, reading method and device and system
US8015375B1 (en) Methods, systems, and computer program products for parallel processing and saving tracking information for multiple write requests in a data replication environment including multiple storage devices
CN116048878A (en) Business service recovery method, device and computer equipment
CN115686881A (en) Data processing method and device and computer equipment
US9933953B1 (en) Managing copy sessions in a data storage system to control resource consumption
CN107943615B (en) Data processing method and system based on distributed cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination