WO2024113687A1 - Data recovery method and related apparatus

Info

Publication number: WO2024113687A1
Application number: PCT/CN2023/093965
Authority: WO
Inventors: 李飞龙; 许永良; 孙明刚
Original assignee: 苏州元脑智能科技有限公司
Priority date: 2022-11-29
Filing date: 2023-05-12
Publication date: 2024-06-06
Also published as: CN115599589A; CN115599589B

Abstract

A data recovery method, relating to the technical field of storage, and comprising: scanning a cache node, and, according to data flushing information recorded by the cache node, determining a stripe of which the flushing of data has not been completed when an accidental power failure occurs, the cache node being located in a non-volatile random access memory (S101); reading data recovery information from check elements of a check data table, the check data table being located in the non-volatile random access memory (S102); and, according to the data recovery information, recovering data of the stripe of which the flushing of data has not been completed (S103). The method can accelerate the recovery of data and improve the data writing performance of a RAID card. Also disclosed are a data recovery apparatus and device, and a computer-readable storage medium, which all have the technical effects.

Description

A data recovery method and related device

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese patent application No. 202211507999.X, entitled “A data recovery method and related devices”, filed with the China Patent Office on November 29, 2022, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of storage technology, and in particular to a data recovery method; and also to a data recovery device, equipment and a non-volatile readable storage medium.

Background

With the development of science and technology, storage technology is improving rapidly, and RAID (Redundant Array of Independent Disks) is one of its key technologies. The industry currently uses both soft RAID and hard RAID. Soft RAID uses software to manage the stripes and blocks in a RAID array, whereas hard RAID (i.e., a RAID card) uses hardware to manage data. In hard RAID, an unexpected power failure is prone to cause loss of write data. To address this, RAID cards currently use a log-based scheme that accesses the file system. Every time a host I/O write request is received, the timestamp of the write request and other key information are written to a log file. When the data of the write request has been safely written to disk, the corresponding entry in the log file is cleared. This solution relies on logging support from the underlying file system, and the log file in the file system must be read and written before and after the write data is written. If an unexpected power failure occurs during data writing, recovering the lost data requires reading the log entry corresponding to the written data from the file system and restoring the data based on the timestamp and other key information. This solution leads to high overhead on the RAID card and degrades both the speed of data recovery and the RAID data writing performance.

Summary of the invention

The purpose of the present application is to provide a data recovery method that can accelerate data recovery and improve the data writing performance of the RAID card. Another purpose of the present application is to provide a data recovery device, equipment and non-volatile readable storage medium, all of which have the above technical effects.

To solve the above technical problems, the present application provides a data recovery method, comprising:

Scan the cache node, and determine the stripe whose data has not been flushed when the power is unexpectedly lost according to the data flushing information recorded by the cache node; the cache node is located in the non-volatile random access memory;

Reading data recovery information from a check element of a check data table; the check data table is located in a non-volatile random access memory;

Restore the data of the stripe for which data flushing has not been completed according to the data recovery information.

Optionally, determining, according to the data flushing information recorded by the cache node, the stripes for which data flushing was not completed when the power was unexpectedly lost includes:

The stripes for which data flushing is not completed when power is lost unexpectedly are determined according to the bits in the bitmap metadata associated with the control field in the cache node.

Optionally, according to the bit in the bitmap metadata associated with the control field in the cache node, determining the stripes for which data flushing is not completed when power is lost unexpectedly includes:

If the value of the bit corresponding to the stripe is the second value, it is determined that the stripe had not completed data flushing when the power was unexpectedly lost;

If the value of the bit corresponding to the stripe is the first value, it is determined that the stripe had completed data flushing when the power was unexpectedly lost.

Optionally, one bit in the bitmap metadata corresponds to one stripe.

Optionally, before determining the stripe for which data flushing is not completed when power is lost unexpectedly according to the bit in the bitmap metadata associated with the control field in the cache node, the method further includes:

According to the value assigned to the control field, determine whether there are stripes whose data has not been flushed when the power is unexpectedly lost;

If it exists, the stripes for which data flushing was not completed during the unexpected power failure are determined according to the bits in the bitmap metadata associated with the control field in the cache node.

Optionally, judging whether there is a stripe for which data flushing is not completed when power is off unexpectedly according to the value assigned to the control field includes:

If the control field is assigned a value of true, there are no stripes whose data is not flushed when power is lost unexpectedly;

If the control field is assigned a value of false, there are stripes that did not complete the data flushing when the power was unexpectedly lost.

Optionally, the method further includes:

Applying for a cache node and recording the data flushing information in the cache node.

Optionally, the method further includes:

Applying for a check element and recording the data recovery information in the check element.

Optionally, recording the data flushing information in the cache node includes:

If all the data blocks and check blocks of the stripe have been flushed to the disk, the bit position corresponding to the stripe in the bitmap metadata associated with the control field in the cache node is set to the first value;

If the data blocks and the check blocks of the stripe are not all flushed to the disk, the bit position corresponding to the stripe in the bitmap metadata is set to the second value.

Optionally, recording the data flushing information in the cache node further includes:

If the data blocks and check blocks of each stripe have been flushed to disk, the control field is set to true;

If the data blocks and check blocks of at least one stripe are not all flushed to disk, the control field is assigned a value of false.

Optionally, the cache node also includes a front pointer field and a back pointer field; the front pointer field points to the cache node applied for the previous host I/O write request, and the back pointer field points to the cache node applied for the next host I/O write request.

Optionally, also include:

The calculation result of the check block of the stripe is recorded in the check element.

Optionally, the calculation result of the check block of the stripe recorded in the check element includes:

If the check block of the stripe has been calculated, the corresponding bit position in the stripe field in the check element is set to a first preset value;

If the check block of the stripe is not calculated, the corresponding bit position in the stripe field is set to the second preset value.

Optionally, one bit in the stripe field corresponds to one stripe.

Optionally, the calculation result of the check block of the stripe recorded in the check element also includes:

If the check blocks of all stripes have been calculated, the check field in the check element is set to true;

If not all of the check blocks of the stripes have been calculated, the check field is assigned a value of false.

Optionally, also include:

Write the stripe data blocks into the cache field of the cache node.

Optionally, also include:

Determine whether data recovery is successful;

If the data is recovered successfully, the cache node and the check element are released.

In order to solve the above technical problems, the present application also provides a data recovery device, comprising:

A determination module is used to scan cache nodes and determine the stripes for which data flushing has not been completed when power is unexpectedly lost according to the data flushing information recorded by the cache nodes; the cache nodes are located in a non-volatile random access memory;

A reading module, used for reading data recovery information from a check element of a check data table; the check data table is located in a non-volatile random access memory;

The recovery module is used to recover the data of the stripe for which the data is not flushed according to the data recovery information.

In order to solve the above technical problems, the present application also provides a data recovery device, including:

Memory for storing computer programs;

A processor is used to implement the steps of any of the above data recovery methods when executing a computer program.

In order to solve the above technical problems, the present application also provides a non-volatile readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of any of the above data recovery methods are implemented.

The data recovery method provided in the present application includes: scanning cache nodes, and determining stripes for which data flushing has not been completed during an unexpected power outage based on data flushing information recorded by the cache nodes; the cache nodes are located in a non-volatile random access memory; reading data recovery information from a check element of a check data table; the check data table is located in a non-volatile random access memory; and recovering data of the stripes for which data flushing has not been completed based on the data recovery information.

It can be seen that the data recovery method provided by the present application maintains cache nodes and a check data table in a non-volatile random access memory, records the data flushing information of the stripes through the cache nodes during the data writing process, and records the data recovery information through the check elements in the check data table. After recovering from an unexpected power failure, the information recorded in the non-volatile random access memory can be read to complete the data recovery. The file system does not need to be accessed during the entire data writing process or during data recovery, thereby accelerating the recovery of data lost in the RAID card due to an unexpected power failure and improving the RAID card data writing performance.

The data recovery device, equipment and non-volatile readable storage medium provided in this application all have the above-mentioned technical effects.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for describing the prior art and the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application. Those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a flow chart of a data recovery method provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of a RAID card according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a cache node provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a check data table provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of dividing a host I/O write request into stripes provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a specific data recovery solution provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of a data recovery device provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of a data recovery device provided in an embodiment of the present application.

Detailed Description

The core of this application is to provide a data recovery method that can accelerate data recovery and improve the RAID card data writing performance. Another core of this application is to provide a data recovery device, equipment and non-volatile readable storage medium, all of which have the above technical effects.

In order to make the purpose, technical solution and advantages of the embodiments of the present application clearer, the technical solution in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of this application.

Please refer to FIG. 1, which is a flow chart of a data recovery method provided by an embodiment of the present application. As shown in FIG. 1, the method includes:

S101: Scan cache nodes, and determine the stripes for which data flushing has not been completed when power is unexpectedly lost according to data flushing information recorded in the cache nodes; the cache nodes are located in a non-volatile random access memory;

In conjunction with the architecture diagram of the RAID card shown in FIG2 , this embodiment maintains a check data table and multiple cache nodes in the NVRAM (Non-Volatile Random Access Memory) of the RAID card. The check data table includes multiple check elements. The cache node is a structure used to identify a host I/O write request. One cache node identifies a host I/O write request. The cache node is used to record data flush information. The check data table manages the check elements in a one-dimensional linear table metadata organization method. The check elements in the check data table can be used to record data recovery information. The data recovery information includes the timestamp of the host I/O write request and other key information. Different from the traditional technical solution of storing the timestamp of the host I/O write request and other key information in the file log of the file system, this embodiment stores the timestamp of the host I/O request and other key information in the check elements in the NVRAM. The specific content of the key information is not described in detail in this embodiment, and the content of the key information recorded in the file log in the traditional solution can be referred to. In other words, the improvement of this embodiment lies in the storage location of the timestamp and key information, rather than the specific content of the key information.

The host system in Figure 2 can be a workstation, a personal computer, a mobile phone, a laptop computer, a server, etc. Multiple host systems share storage resources in the storage system through a network, which can be a wide area network (WAN) connection, a storage area network (SAN), a local area network (LAN) connection, or a wireless Wi-Fi connection. The network can include one or more wirelessly connected networks, and storage systems composed of multiple RAID cards can be connected to form a large storage environment through Fibre Channel over Ethernet (FCoE), Fibre Channel, iSCSI, etc.

The firmware layer of the RAID card includes drivers, the RAID card kernel, and a file system. The driver in the firmware layer parses commands issued by the host system, such as the write data commands and other commands in host I/O requests. The file system is the one used by the log-based scheme described above to guard against write data loss during an unexpected power failure. The system monitor monitors various abnormal events in the RAID card. The RAID card controller includes a cache, NVRAM, an I/O processor, a hard disk controller, and a hard disk connector.

After power is restored following the unexpected power failure, the RAID card scans the cache nodes and determines the stripes for which data flushing was not completed at the time of the unexpected power failure based on the data flushing information recorded by the cache nodes.

In some embodiments, determining, according to the data flushing information recorded by the cache node, the stripes for which data flushing has not been completed during the unexpected power failure includes:

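Determining, according to the bits in the bitmap metadata associated with the control field in the cache node, the stripes for which data flushing was not completed when the power was unexpectedly lost.

As shown in FIG. 3, the control field is the field named control in FIG. 3, and it can be of bool type. The control field is associated with bitmap metadata, which is a data structure for managing stripes. The value of each bit in the bitmap metadata reflects the data flushing status of a stripe.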

Wherein, determining the stripe for which data flushing is not completed when power is unexpectedly lost according to the bit in the bitmap metadata associated with the control field in the cache node may include:

One bit in the bitmap metadata corresponds to one stripe.

In this embodiment, each bit of the bitmap metadata corresponds to a stripe. The value of each bit reflects the flushing status of the data blocks and check blocks of the corresponding stripe. If the data blocks and check blocks of the stripe have been flushed to the disk, the value of the bit corresponding to this stripe in the bitmap metadata is the first value. If the data blocks and check blocks of the stripe are not all flushed to the disk, the value of the bit corresponding to this stripe in the bitmap metadata is the second value. The first value is different from the second value. By identifying the value of the bit, it can be determined whether the corresponding stripe has not completed the data flushing when an unexpected power failure occurs. Data blocks refer to the disk partition that stores valid data sent by the host. The check block is obtained by the XOR operation of the data blocks in the stripe.

The first value may be 1, and the second value may be 0. If the data blocks and check blocks of the stripe have all been flushed to the disk, the bit position corresponding to the stripe in the bitmap metadata is 1. If the data blocks and check blocks of the stripe have not all been flushed to the disk, the bit position corresponding to the stripe in the bitmap metadata is 0.
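The bit test described above can be sketched as follows. This is only an illustration: it assumes the bitmap metadata is held as an integer in which bit i stands for stripe i, and the function name unflushed_stripes and the parameter num_stripes are hypothetical rather than taken from this application.

```python
def unflushed_stripes(bitmap: int, num_stripes: int) -> list[int]:
    """Return the indices of stripes whose bit is 0 (not fully flushed)."""
    return [i for i in range(num_stripes) if not (bitmap >> i) & 1]


# Stripes 0 and 2 are flushed (bit = 1); stripes 1 and 3 are not (bit = 0).
print(unflushed_stripes(0b0101, 4))
```

With the first value taken as 1 and the second value as 0, a stripe is reported exactly when its bit still holds the second value.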

In some embodiments, before determining, according to the bits in the bitmap metadata associated with the control field in the cache node, the stripes for which data flushing was not completed when the power was unexpectedly lost, the method further includes:

Judging, according to the value assigned to the control field, whether there is any stripe for which data flushing was not completed when the power was unexpectedly lost;

If such a stripe exists, the stripes for which data flushing was not completed during the unexpected power failure are determined according to the bits in the bitmap metadata associated with the control field in the cache node.

This embodiment reflects the data flushing status of all stripes through the value assigned to the control field. Before determining which stripes have not completed data flushing, first determine, based on the value assigned to the control field, whether any such stripe exists. If one exists, further determine the specific stripes based on the bits of the bitmap metadata. If none exists, there is no need to examine the bits of the bitmap metadata.

Wherein, judging whether there is a stripe for which data is not flushed when power is unexpectedly lost according to the value assigned to the control field may include:

If all data blocks and check blocks of each stripe have been flushed to disk, the control field is assigned a value of true. If at least one stripe's data blocks and check blocks have not been fully flushed to disk, the control field is assigned a value of false.
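The relationship between the control field and the bitmap metadata can be illustrated with the following minimal sketch. It assumes a Python dataclass stands in for the cache node; the names CacheNode, mark_flushed, pending and num_stripes are hypothetical, and only the control field and the bitmap metadata come from this embodiment.

```python
from dataclasses import dataclass


@dataclass
class CacheNode:
    num_stripes: int
    bitmap: int = 0        # bit i becomes 1 once stripe i is fully flushed
    control: bool = False  # True only when every stripe is flushed

    def mark_flushed(self, stripe: int) -> None:
        """Record that the data and check blocks of one stripe are on disk."""
        self.bitmap |= 1 << stripe
        all_ones = (1 << self.num_stripes) - 1
        self.control = self.bitmap == all_ones

    def pending(self) -> list[int]:
        """Stripes still awaiting flushing; the control field is checked first."""
        if self.control:  # fast path: no need to inspect the bitmap
            return []
        return [i for i in range(self.num_stripes)
                if not (self.bitmap >> i) & 1]


node = CacheNode(num_stripes=3)
node.mark_flushed(0)
print(node.control, node.pending())
```

Checking the single control field first means the per-stripe bits are examined only for cache nodes that actually have unfinished stripes.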

S102: reading data recovery information from a check element of a check data table; the check data table is located in a non-volatile random access memory;

Referring to Figure 4, the data recovery information may be stored in the data field of a check element, that is, the field named data in Figure 4.

S103: According to the data recovery information, the data of the stripe for which the data is not flushed is recovered.

After determining the specific stripe where the data has not been flushed, the RAID card starts the data recovery process in combination with the read data recovery information to recover the data of the stripe where the data has not been flushed. The specific implementation process of the data recovery process is not described in detail in this application, and the description of the data recovery process in the traditional technical solution can be referred to.

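Optionally, in some embodiments, the method further includes:

Applying for a cache node and recording the data flushing information in the cache node; applying for a check element and recording the data recovery information in the check element.

After receiving the host I/O write request sent by the host, the driver of the RAID card parses the host I/O write request, applies for the check element corresponding to this host I/O write request from the check data table, and applies for the corresponding cache node.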
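Applying for and releasing check elements in a one-dimensional linear table can be pictured as follows. This is a rough sketch under stated assumptions: the class name CheckDataTable, the methods apply and release, the capacity parameter and the dictionary layout of an element are all hypothetical, and a first-free-slot search is merely one possible allocation policy.

```python
from typing import Optional


class CheckDataTable:
    """Fixed-size linear table of check elements (capacity is hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.slots: list[Optional[dict]] = [None] * capacity

    def apply(self, timestamp: float, key_info: dict) -> int:
        """Claim the first free slot and record the recovery information."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = {"timestamp": timestamp, "data": key_info}
                return index
        raise RuntimeError("check data table is full")

    def release(self, index: int) -> None:
        """Return a check element to the table once it is no longer needed."""
        self.slots[index] = None


table = CheckDataTable(capacity=2)
first = table.apply(timestamp=1.0, key_info={})
table.release(first)
```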

Wherein, recording the data flushing information in the cache node may include:

For example, if the data blocks and check blocks of the stripe have all been flushed to disk, the bit position corresponding to the stripe in the bitmap metadata is set to 1. If the data blocks and check blocks of the stripe have not all been flushed to disk, the bit position corresponding to the stripe in the bitmap metadata is set to 0.

In addition, recording the data flushing information in the cache node may further include:

For example, if the bits corresponding to each stripe in the bitmap metadata are all 1, the control field is assigned a value of true. If the bits corresponding to each stripe in the bitmap metadata are not all 1, the control field is kept false.

Optionally, in some embodiments, the cache node also includes a front pointer field and a back pointer field; the front pointer field points to the cache node applied for the previous host I/O write request, and the back pointer field points to the cache node applied for the next host I/O write request.

This embodiment adopts a bidirectional linked list metadata organization method to manage cache nodes. Referring to Figure 3, the pre_pointer field shown in Figure 3 is the front pointer field, and the next_pointer field shown in Figure 3 is the back pointer field. The front pointer field points to the cache node applied for the previous host I/O write request, and the back pointer field points to the cache node applied for the next host I/O write request. The cache nodes corresponding to multiple host I/O write requests can be connected together through the front pointer field and the back pointer field in the cache node.
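A bidirectional linked list of cache nodes can be sketched as below. The field names pre_pointer and next_pointer follow Figure 3; the class names, the request_id attribute and the append and remove methods are hypothetical and serve only to illustrate the organization.

```python
from typing import Iterator, Optional


class CacheNode:
    def __init__(self, request_id: int) -> None:
        self.request_id = request_id
        self.pre_pointer: Optional["CacheNode"] = None   # previous request's node
        self.next_pointer: Optional["CacheNode"] = None  # next request's node


class CacheNodeList:
    def __init__(self) -> None:
        self.head: Optional[CacheNode] = None
        self.tail: Optional[CacheNode] = None

    def append(self, node: CacheNode) -> None:
        """Link the node for the newest host I/O write request at the tail."""
        node.pre_pointer, node.next_pointer = self.tail, None
        if self.tail is None:
            self.head = node
        else:
            self.tail.next_pointer = node
        self.tail = node

    def remove(self, node: CacheNode) -> None:
        """Unlink a node, e.g. before returning it to a free list."""
        if node.pre_pointer is None:
            self.head = node.next_pointer
        else:
            node.pre_pointer.next_pointer = node.next_pointer
        if node.next_pointer is None:
            self.tail = node.pre_pointer
        else:
            node.next_pointer.pre_pointer = node.pre_pointer
        node.pre_pointer = node.next_pointer = None

    def __iter__(self) -> Iterator[CacheNode]:
        current = self.head
        while current is not None:
            yield current
            current = current.next_pointer
```

Because every node carries both pointers, a node can be unlinked without walking the list, and a scan simply follows next_pointer from the head.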

Optionally, in some embodiments, it further includes:

Write the stripe data blocks into the cache field of the cache node.

Referring to Figure 3, the cache_ptr in Figure 3 is the cache field. The data blocks of each stripe are written into the cache_ptr field in the cache node, pointing to a specific area in the cache.

Optionally, in some embodiments, it further includes:

The calculation result of the check block of the stripe is recorded in the check element. The calculation result reflects whether the check block of the stripe has been obtained by performing an XOR operation on the data blocks of the stripe.

The calculation result of the check block of the stripe recorded in the check element may include:

One bit in the stripe field corresponds to one stripe.

As shown in FIG. 4, the stripe field is the stripe[32] field shown in FIG. 4. The stripe[32] field can be an array of int type. Each int value comprises multiple bits (typically 32), and one bit corresponds to one stripe. If the check block of a stripe has been calculated, the corresponding bit is set to the first preset value. If the check block of a stripe has not been calculated, the corresponding bit is set to the second preset value. The first preset value is different from the second preset value.

For example, the first preset value is 1, and the second preset value is 0. If the check block of the stripe has been calculated, the corresponding bit is set to 1. If the check block of the stripe has not been calculated, the corresponding bit remains 0.
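The per-stripe bits of the stripe field can be manipulated as in the sketch below. The array length of 32 follows the stripe[32] field; the element width BITS_PER_WORD = 32 and the function names are assumptions made for illustration.

```python
BITS_PER_WORD = 32  # assumption: bit width of one array element
NUM_WORDS = 32      # matches the length of the stripe[32] field


def new_stripe_field() -> list[int]:
    """All bits start at 0: no check block has been calculated yet."""
    return [0] * NUM_WORDS


def mark_parity_done(field: list[int], stripe: int) -> None:
    """Set the bit of a stripe to 1 once its check block is calculated."""
    word, bit = divmod(stripe, BITS_PER_WORD)
    field[word] |= 1 << bit


def parity_done(field: list[int], stripe: int) -> bool:
    """Report whether the check block of a stripe has been calculated."""
    word, bit = divmod(stripe, BITS_PER_WORD)
    return bool((field[word] >> bit) & 1)


field = new_stripe_field()
mark_parity_done(field, 33)
print(parity_done(field, 33), parity_done(field, 32))
```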

Optionally, recording the calculation result of the check block of the stripe in the check element may further include:

Referring to Figure 4, the check field is the parity_ok field shown in Figure 4, which can be of bool type. When the check blocks of all the stripes obtained by splitting the write data have been calculated, the parity_ok field is assigned true. Otherwise, the check field remains false. The host I/O write request can be divided into stripes as shown in Figure 5. For example, the data blocks of stripe1 are strip5, strip6, strip7 and strip8, and its check block is parity2.
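Obtaining a check block by XOR over the data blocks of a stripe can be sketched as follows. The byte values are made up for illustration, the function name xor_parity is hypothetical, and the sketch assumes that all data blocks of a stripe have the same length.

```python
from functools import reduce
from operator import xor


def xor_parity(strips: list[bytes]) -> bytes:
    """XOR equal-length data blocks byte by byte to obtain the check block."""
    return bytes(reduce(xor, column) for column in zip(*strips))


# Hypothetical contents for strip5..strip8 of stripe1 in Figure 5.
strip5, strip6 = b"\x01\x02", b"\x10\x20"
strip7, strip8 = b"\x0f\x0f", b"\xff\x00"
parity2 = xor_parity([strip5, strip6, strip7, strip8])

# A general property of XOR: any single block equals the XOR of the others.
rebuilt_strip5 = xor_parity([parity2, strip6, strip7, strip8])
print(rebuilt_strip5 == strip5)
```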

Optionally, in some embodiments, it further includes:

Determining whether the data recovery is successful;

If the data recovery is successful, releasing the cache node and the check element.

In conjunction with steps S1 to S12 shown in FIG6 , a specific implementation is described below:

S1: The host sends a write data request to the RAID card, and the driver of the RAID card receives and parses the write data request.

S2: According to the parsed command parameters, apply for the check element corresponding to the write data request from the check data table, apply for a cache node, and add the cache node to the bidirectional linked list that manages the cache nodes.

S3: The main control thread of the RAID card controller divides the write data into stripes. The check blocks of the stripes have not yet been calculated at this point. Therefore, the bits corresponding to these stripes in the stripe[32] field of the check element are set to 0, and the parity_ok field of the check element is assigned false.

S4: Write the data blocks of each stripe into the cache_ptr field of the cache node.

S5: The RAID card sends a data write completion signal to the host (i.e., responds to the host immediately).

S6: A worker thread in the thread pool performs an XOR operation on the data blocks of the stripe temporarily stored in the cache to obtain the check block of the stripe, and then writes the check block to the cache. Since the check block of the stripe has now been calculated, the bits corresponding to the stripes in the stripe[32] field of the check element are set to 1, and the parity_ok field of the check element is assigned true. Since the data blocks and check blocks of the stripe have not yet been flushed to the physical disk, all bits in the bitmap metadata associated with the control field in the cache node are set to 0, and the control field is assigned false.

S7: A worker thread in the thread pool flushes the data blocks and check blocks temporarily stored in the cache to the physical disk. Since the data blocks and check blocks have now been flushed to the physical disk, all bits in the bitmap metadata associated with the control field in the cache node are set to 1, and the control field is assigned true.

S8: Determine whether the parity_ok field in the check element is true, and determine whether the control field in the cache node is true. If the parity_ok field is not true or the control field is not true, return to S6.

S9: Release the check element to the check data table, and release the cache node from the bidirectional linked list to the global free cache node linked list.

S10: If an unexpected power failure occurs at some moment, then after power is restored the RAID card controller scans the cache nodes managed by the bidirectional linked list. If a bit in the bitmap metadata associated with the control field in a cache node is 0, the data blocks and check blocks of the corresponding stripe temporarily stored in the cache had not been completely flushed to the disk. The data that had not been flushed to the physical disk is the data lost due to the unexpected power failure.

S11: Read the timestamp and key information stored in the data field of the corresponding check element and start the data recovery process.

S12: Determine whether the data recovery is successful. If the data recovery is successful, jump to S9 to release the related resources. If the data recovery fails, the process ends.
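The bookkeeping in the flow above, from calculating the check blocks through the scan after a power failure, can be simulated with the short sketch below. It is a simplified model under stated assumptions: plain Python objects stand in for structures kept in NVRAM, the bitmap is updated stripe by stripe so that a power failure part-way through flushing can be modeled, and every name other than parity_ok, control and the bitmap and stripe bits is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckElement:
    data: dict              # timestamp and other key information
    stripe_bits: int = 0    # bit i = 1 once the check block of stripe i exists
    parity_ok: bool = False


@dataclass
class CacheNode:
    num_stripes: int
    element: CheckElement
    bitmap: int = 0         # bit i = 1 once stripe i is on the physical disk
    control: bool = False


def compute_parity(node: CacheNode) -> None:
    """Check blocks calculated; nothing has been flushed to disk yet."""
    node.element.stripe_bits = (1 << node.num_stripes) - 1
    node.element.parity_ok = True
    node.bitmap, node.control = 0, False


def flush_stripe(node: CacheNode, stripe: int) -> None:
    """One stripe reaches the disk; control turns true after the last one."""
    node.bitmap |= 1 << stripe
    node.control = node.bitmap == (1 << node.num_stripes) - 1


def scan_after_power_loss(nodes: list[CacheNode]) -> list[tuple[list[int], dict]]:
    """Return (unflushed stripes, recovery information) per affected node."""
    found = []
    for node in nodes:
        if node.control:
            continue
        pending = [i for i in range(node.num_stripes)
                   if not (node.bitmap >> i) & 1]
        found.append((pending, node.element.data))
    return found


node = CacheNode(4, CheckElement({"timestamp": 1700000000}))
compute_parity(node)
flush_stripe(node, 0)
flush_stripe(node, 1)
# Power is lost here, before stripes 2 and 3 are flushed.
print(scan_after_power_loss([node]))
```

The scan reads only the recorded bits and the data field of the check element, which reflects the point that no file system access is involved.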

In summary, the data recovery method provided by the present application maintains cache nodes and a check data table in a non-volatile random access memory, records the data flushing information of the stripes through the cache nodes during the data writing process, and records the data recovery information through the check elements in the check data table. After recovering from an unexpected power outage, the information recorded in the non-volatile random access memory can be read to complete the data recovery. The file system does not need to be accessed during the entire data writing process or during data recovery, thereby accelerating the recovery of data lost in the RAID card due to an unexpected power outage and improving the RAID card data writing performance.

The present application also provides a data recovery device, and the device described below can be referred to in correspondence with the method described above. Please refer to Figure 7, which is a schematic diagram of a data recovery device provided in an embodiment of the present application. As shown in Figure 7, the device includes:

The determination module 10 is used to scan the cache nodes and determine the stripes whose data flushing is not completed when the power is unexpectedly lost according to the data flushing information recorded by the cache nodes; the cache nodes are located in the non-volatile random access memory;

A reading module 20, used to read data recovery information from a check element of a check data table; the check data table is located in a non-volatile random access memory;

The recovery module 30 is used to recover the data of the stripe for which data flushing has not been completed according to the data recovery information.

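Based on some of the above embodiments, as a specific implementation manner, the determination module 10 is specifically used to determine the stripes for which data flushing was not completed when power was unexpectedly lost according to the bits in the bitmap metadata associated with the control field in the cache node.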

Based on some of the above embodiments, as a specific implementation, the determination module 10 includes:

A first determining unit, configured to determine that the stripe has not completed data flushing when power is unexpectedly lost if the value of the bit corresponding to the stripe is a first value;

The second determining unit is configured to determine that the stripe has completed data flushing when power is unexpectedly lost if the value of the bit corresponding to the stripe is a second value.

Based on some of the above embodiments, as a specific implementation method, one bit in the bitmap metadata corresponds to one stripe.

Based on some of the above embodiments, as a specific implementation method, it also includes:

A first judgment module is used to judge whether there is a stripe for which data is not flushed when power is off unexpectedly according to the value assigned to the control field;

If it exists, the determination module determines the stripe for which data flushing is not completed when power is lost unexpectedly according to the bit in the bitmap metadata associated with the control field in the cache node.

Based on some of the above embodiments, as a specific implementation manner, the first judgment module includes:

A first determining unit, configured to determine that if the control field is set to true, there is no stripe for which data is not flushed when power is unexpectedly lost;

The second determining unit is used to determine that if the value assigned to the control field is false, there is a stripe whose data is not flushed when the power is unexpectedly lost.
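The relationship between the control field and the bitmap metadata can be illustrated with a short sketch. The function names `has_unflushed_stripe` and `stripes_to_recover` are hypothetical, and the sketch assumes that a bit value of 0 marks a stripe that was not completely flushed. The point it shows is that the bitmap only needs to be examined when the control field is false.

```python
from typing import List


def has_unflushed_stripe(control_field: bool) -> bool:
    """True control field: every stripe was flushed; false: at least one was not."""
    return not control_field


def stripes_to_recover(control_field: bool, bitmap: int, stripe_count: int) -> List[int]:
    # Fast path: with a true control field the bitmap is not examined at all.
    if not has_unflushed_stripe(control_field):
        return []
    # Slow path: collect the stripes whose bit is 0 (assumed "not flushed" value).
    return [i for i in range(stripe_count) if (bitmap >> i) & 1 == 0]


print(stripes_to_recover(True, 0b111, 3))   # []
print(stripes_to_recover(False, 0b101, 3))  # [1]
```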

The first recording module is used to apply for a cache node and record the data flushing information in the cache node.

The second recording module is used to apply for a check element and record the data recovery information in the check element.

Based on some of the above embodiments, as a specific implementation, the first recording module includes:

A first setting unit, configured to set a bit position corresponding to the stripe in the bitmap metadata associated with the control field in the cache node to a first value if all the data blocks and the check blocks of the stripe have been flushed to the disk;

The second setting unit is used to set the bit position corresponding to the stripe in the bitmap metadata to a second value if the data blocks and the check blocks of the stripe are not all flushed to the disk.

The first assignment module is used to assign a control field to be true if the data blocks and the check blocks of each stripe have been flushed to the disk;

The second assignment module is used to assign a value of false to the control field if all data blocks and check blocks of at least one stripe are not flushed to the disk.
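The recording side of the setting units and assignment modules above can be sketched as follows. The class `FlushRecord` and its method are hypothetical names. The sketch assumes that the value written for a completely flushed stripe is 1 and the value written otherwise is 0, and it derives the control field from the bitmap after every update.

```python
class FlushRecord:
    """Hypothetical flush bookkeeping held by one cache node."""

    def __init__(self, stripe_count: int) -> None:
        self.stripe_count = stripe_count
        self.bitmap = 0              # every stripe starts as not flushed
        self.control_field = False   # false: at least one stripe is not flushed

    def set_stripe_state(self, stripe: int, fully_flushed: bool) -> None:
        if not 0 <= stripe < self.stripe_count:
            raise IndexError("stripe index out of range")
        if fully_flushed:
            self.bitmap |= 1 << stripe       # data and check blocks are on disk
        else:
            self.bitmap &= ~(1 << stripe)    # something is still only in cache
        # The control field is true only when every stripe bit is set.
        full_mask = (1 << self.stripe_count) - 1
        self.control_field = self.bitmap == full_mask


record = FlushRecord(stripe_count=3)
record.set_stripe_state(0, True)
record.set_stripe_state(1, True)
print(record.control_field)  # False, stripe 2 is still pending
record.set_stripe_state(2, True)
print(record.control_field)  # True, every stripe is flushed
```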

Based on some of the above embodiments, as a specific implementation method, the cache node also includes a front pointer field and a back pointer field; the front pointer field points to the cache node applied for the previous host I/O write request, and the back pointer field points to the cache node applied for the next host I/O write request.
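The front and back pointer fields form a bidirectional linked list ordered by host I/O write request. A minimal sketch, assuming hypothetical names `Node`, `CacheList`, `append` and `release`, is given below. It only illustrates the pointer handling and does not model the cache field or the bitmap metadata.

```python
class Node:
    def __init__(self, request_id: int) -> None:
        self.request_id = request_id
        self.front = None  # cache node applied for the previous write request
        self.back = None   # cache node applied for the next write request


class CacheList:
    def __init__(self) -> None:
        self.head = None
        self.tail = None

    def append(self, request_id: int) -> Node:
        """Apply for a node for a new write request and link it at the tail."""
        node = Node(request_id)
        if self.tail is None:
            self.head = node
        else:
            node.front = self.tail
            self.tail.back = node
        self.tail = node
        return node

    def release(self, node: Node) -> None:
        """Unlink a node, repairing the pointers of its neighbours."""
        if node.front is None:
            self.head = node.back
        else:
            node.front.back = node.back
        if node.back is None:
            self.tail = node.front
        else:
            node.back.front = node.front
        node.front = node.back = None

    def request_ids(self) -> list:
        ids, node = [], self.head
        while node is not None:
            ids.append(node.request_id)
            node = node.back
        return ids


cache = CacheList()
first, second, third = cache.append(1), cache.append(2), cache.append(3)
cache.release(second)
print(cache.request_ids())  # [1, 3]
```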

The third recording module is used to record the calculation result of the check block of the stripe in the check element.

Based on some of the above embodiments, as a specific implementation, the third recording module includes:

A first recording unit, configured to set a corresponding bit position in a stripe field in a check element to a first preset value if a check block of the stripe has been calculated;

The second recording unit is used to set the corresponding bit position in the stripe field to a second preset value if the check block of the stripe is not calculated.

Based on some of the above embodiments, as a specific implementation method, one bit in the stripe field corresponds to one stripe.

A third assignment module is used to assign a value of true to a check field in a check element if the check blocks of all stripes have been calculated;

The fourth assignment module is used to assign a value of false to the check field if the check blocks of the stripes have not all been calculated.
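The stripe field and check field of a check element can be sketched in the same spirit. `CheckElement` and `record_check_block` are hypothetical names, and the sketch assumes that the first preset value is 1 (check block calculated) and the second preset value is 0 (not calculated), with bit i of the stripe field corresponding to stripe i.

```python
from dataclasses import dataclass


@dataclass
class CheckElement:
    """Hypothetical check element of the check data table in non-volatile RAM."""
    stripe_count: int
    stripe_field: int = 0      # bit i == 1: check block of stripe i calculated (assumed)
    check_field: bool = False  # true only once every check block is calculated

    def record_check_block(self, stripe: int, calculated: bool) -> None:
        if not 0 <= stripe < self.stripe_count:
            raise IndexError("stripe index out of range")
        mask = 1 << stripe
        if calculated:
            self.stripe_field |= mask
        else:
            self.stripe_field &= ~mask
        self.check_field = self.stripe_field == (1 << self.stripe_count) - 1


element = CheckElement(stripe_count=2)
element.record_check_block(0, True)
print(element.check_field)  # False, one check block is still missing
element.record_check_block(1, True)
print(element.check_field)  # True, all check blocks are calculated
```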

The write module is used to write the stripe data blocks into the cache field of the cache node.

The second judgment module is used to judge whether the data recovery is successful;

The release module is used to release the cache node and the check element if the data recovery is successful.

The data recovery device provided by the present application maintains cache nodes and a check data table in a non-volatile random access memory. During data writing, the cache nodes record the data flushing information of the stripes, and the check elements in the check data table record the data recovery information. After power is restored following an unexpected power failure, the information recorded in the non-volatile random access memory can be read to complete the data recovery. The file system need not be accessed either during data writing or during data recovery, which accelerates the recovery of data lost in the RAID card due to an unexpected power failure and improves the data writing performance of the RAID card.

The present application also provides a data recovery device. As shown in Figure 8, the device includes a memory 1 and a processor 2.

Memory 1, used for storing computer programs;

Processor 2 is used to execute the computer program to implement the following steps:

Scan cache nodes, and determine the stripes for which data flushing has not been completed during unexpected power failure based on data flushing information recorded in the cache nodes; the cache nodes are located in a non-volatile random access memory; read data recovery information from a check element of a check data table; the check data table is located in the non-volatile random access memory; and recover data of the stripes for which data flushing has not been completed based on the data recovery information.
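The three steps executed by the processor can be sketched end to end as below. This is an illustration under stated assumptions rather than the claimed implementation: cache nodes and check elements are modelled as plain dictionaries, a check element is looked up by the identifier of its cache node, a bit value of 0 marks an unflushed stripe, and the actual stripe recovery is delegated to a caller-supplied function `replay`. All of these names, and the sample timestamp and key information, are hypothetical.

```python
from typing import Callable, Dict, List


def recover_after_power_loss(
    cache_nodes: List[dict],
    check_table: Dict[int, dict],
    replay: Callable[[int, int, dict], bool],
) -> List[int]:
    """Return the ids of cache nodes that were recovered and released."""
    released = []
    for node in cache_nodes:
        # Step 1: bits equal to 0 mark stripes whose flush did not complete.
        pending = [i for i in range(node["stripe_count"])
                   if (node["bitmap"] >> i) & 1 == 0]
        if not pending:
            continue
        # Step 2: read the recovery information from the matching check element.
        info = check_table[node["id"]]
        # Step 3: recover every pending stripe using that information.
        if all(replay(node["id"], stripe, info) for stripe in pending):
            check_table.pop(node["id"])   # release the check element
            released.append(node["id"])   # release the cache node
    return released


nodes = [{"id": 7, "bitmap": 0b01, "stripe_count": 2},
         {"id": 8, "bitmap": 0b11, "stripe_count": 2}]
table = {7: {"timestamp": 1700000000, "key_info": "example"},
         8: {"timestamp": 1700000001, "key_info": "example"}}
replayed = []


def replay(node_id: int, stripe: int, info: dict) -> bool:
    replayed.append((node_id, stripe, info["timestamp"]))
    return True


print(recover_after_power_loss(nodes, table, replay))  # [7]
```

Resources are released only when every pending stripe of a node is recovered, which mirrors the rule that the cache node and check element are released if the data recovery is successful.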

For an introduction to the equipment provided in this application, please refer to the above method embodiments, and this application will not go into details here.

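The present application also provides a non-volatile readable storage medium, on which a computer program is stored. When the computer program is executed by a processor, the following steps can be implemented: scanning cache nodes, and determining, according to data flushing information recorded by the cache nodes, the stripes for which data flushing had not been completed when power was unexpectedly lost, the cache nodes being located in a non-volatile random access memory; reading data recovery information from a check element of a check data table, the check data table being located in the non-volatile random access memory; and recovering, according to the data recovery information, the data of the stripes for which data flushing had not been completed.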

The non-volatile readable storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code.

For an introduction to the non-volatile readable storage medium provided in this application, please refer to the above method embodiment, and this application will not go into details here.

The various embodiments in the specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other. For the devices, equipment, and non-volatile readable storage media disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the description is relatively simple, and the relevant parts can be referred to the method part description.

Those skilled in the art will further appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered to be beyond the scope of this application.

The steps of the methods or algorithms described in the embodiments disclosed herein may be implemented directly in hardware, in software modules executed by a processor, or in a combination of the two. The software modules may be placed in random access memory (RAM), internal memory, read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

The data recovery method, device, equipment and non-volatile readable storage medium provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present application. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the present application.

Claims

A data recovery method, comprising:

Scanning cache nodes, and determining stripes for which data flushing has not been completed when power is unexpectedly lost according to data flushing information recorded by the cache nodes; the cache nodes are located in a non-volatile random access memory;

Reading data recovery information from a check element of a check data table; the check data table is located in the non-volatile random access memory;

The data of the stripe for which data flushing is not completed is restored according to the data recovery information.
The data recovery method according to claim 1, characterized in that the step of determining, based on the data flushing information recorded by the cache node, the stripes for which data flushing has not been completed during an unexpected power outage comprises:

The stripes for which data flushing is not completed when power is unexpectedly lost are determined according to the bits in the bitmap metadata associated with the control field in the cache node.
The data recovery method according to claim 2, characterized in that, according to the bits in the bitmap metadata associated with the control field in the cache node, determining the stripes for which data flushing is not completed when power is lost unexpectedly comprises:

If the value of the bit corresponding to the stripe is the first value, it is determined that the stripe has not completed data flushing when the power is unexpectedly lost;

If the value of the bit corresponding to the stripe is the second value, it is determined that the stripe has completed data flushing when the power is unexpectedly lost.
The data recovery method according to claim 3, characterized in that one bit in the bitmap metadata corresponds to one stripe.
The data recovery method according to claim 1, characterized in that before determining the stripe for which data flushing is not completed during an unexpected power failure according to the bit in the bitmap metadata associated with the control field in the cache node, the method further comprises:

According to the value assigned to the control field, determining whether there is a stripe for which data is not flushed when power is unexpectedly lost;

If so, the stripes for which data flushing is not completed during the unexpected power failure are determined according to the bits in the bitmap metadata associated with the control field in the cache node.
The data recovery method according to claim 5, characterized in that judging whether there is a stripe for which data is not flushed when power is off unexpectedly according to the value assigned to the control field comprises:

If the control field is set to true, there is no stripe whose data is not flushed when power is unexpectedly lost;

If the value assigned to the control field is false, there are stripes whose data is not flushed when power is unexpectedly lost.
The data recovery method according to claim 1, further comprising:

Applying for the cache node, and recording the data flushing information in the cache node.
The data recovery method according to claim 1, further comprising:

Applying for the check element, and recording the data recovery information in the check element.
The data recovery method according to claim 7, characterized in that recording the data flush information in the cache node comprises:

If all the data blocks and check blocks of the stripe have been flushed to the disk, the bit position corresponding to the stripe in the bitmap metadata associated with the control field in the cache node is set to a first value;

If the data blocks and the check blocks of the stripe are not all flushed to the disk, the bit position corresponding to the stripe in the bitmap metadata is set to a second value.
The data recovery method according to claim 9, characterized in that recording the data flush information in the cache node further comprises:

If the data blocks and the check blocks of each stripe have been flushed to the disk, the control field is set to true;

If the data blocks and the check blocks of at least one of the stripes are not all flushed to the disk, the control field is assigned a value of false.
The data recovery method according to claim 1, characterized in that the cache node also includes a front pointer field and a back pointer field; the front pointer field points to the cache node applied for the previous host I/O write request, and the back pointer field points to the cache node applied for the next host I/O write request.
The data recovery method according to claim 1, further comprising:

The calculation result of the check block of the stripe is recorded in the check element.
The data recovery method according to claim 12, wherein recording the calculation result of the check block of the stripe in the check element comprises:

If the check block of the stripe has been calculated, the corresponding bit position in the stripe field in the check element is set to a first preset value;

If the check block of the stripe is not calculated, the corresponding bit position in the stripe field is set to a second preset value.
The data recovery method according to claim 13, characterized in that one bit in the stripe field corresponds to one stripe.
The data recovery method according to claim 12, characterized in that recording the calculation result of the check block of the stripe in the check element further comprises:

If the check blocks of all the stripes have been calculated, assigning a value of true to the check field in the check element;

If the check blocks of the stripes have not all been calculated, the check field is assigned a value of false.
The data recovery method according to claim 1, further comprising:

The data blocks of the stripe are written into the cache field of the cache node.
The data recovery method according to claim 1, further comprising:

Determine whether data recovery is successful;

If the data recovery is successful, the cache node and the check element are released.
A data recovery device, comprising:

A determination module, configured to scan cache nodes and determine, based on data flushing information recorded by the cache nodes, stripes for which data flushing has not been completed when power is unexpectedly lost; the cache nodes are located in a non-volatile random access memory;

A reading module, used for reading data recovery information from a check element of a check data table; the check data table is located in the non-volatile random access memory;

The recovery module is used to recover the data of the stripe for which data flushing has not been completed according to the data recovery information.
A data recovery device, comprising:

Memory for storing computer programs;

A processor, configured to implement the steps of the data recovery method according to any one of claims 1 to 17 when executing the computer program.
A non-volatile readable storage medium, characterized in that a computer program is stored on the non-volatile readable storage medium, and when the computer program is executed by a processor, the steps of the data recovery method according to any one of claims 1 to 17 are implemented.