CN114153800A

CN114153800A - Data consistency maintenance method, device and equipment

Info

Publication number: CN114153800A
Application number: CN202111439780.6A
Authority: CN
Inventors: Xu Tao (徐涛); Luo Xin (罗心); Jiang Wenlong (江文龙); Zhou Mingwei (周明伟)
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2022-03-08

Abstract

The invention provides a method, a device and equipment for maintaining data consistency. The method comprises the following steps: receiving the metadata of the zones reported in full by a storage node, and acquiring the metadata of the ZGs in the ZG cache; according to the ZG to which each zone belongs, looking up whether that ZG in the ZG metadata contains the zone, and determining each first zone that is not found; according to the zones included in the at least one ZG, checking whether each of those zones can be found in the zone metadata, and determining each second zone that is not found; and updating the ZG metadata so as to add the first zone to the first ZG in the ZG cache and delete the second zone from its ZG. This ensures real-time consistency and correctness of the data on the management node and the data nodes.

Description

Data consistency maintenance method, device and equipment

Technical Field

The invention relates to the technical field of distributed object storage, and in particular to a method, a device and equipment for maintaining data consistency.

Background

SMR (shingled magnetic recording) is a leading next-generation magnetic disk technology in which adjacent tracks partially overlap in sequence, like shingles. This increases the storage density per unit of storage medium and reduces the storage cost. Because of this physical characteristic, an SMR disk reads in the same way as an ordinary HDD (hard disk drive) mechanical hard disk, but its write behavior differs greatly: it supports neither random writes nor in-place update writes, because writing one track overwrites the overlapping adjacent track. An SMR disk supports only sequential writes from head to tail.

An SMR disk divides its tracks into a number of bands, i.e., regions made up of consecutive tracks that are written consecutively; each band is a basic unit that must be written sequentially. A band is the physical concept of an SMR disk, and the corresponding logical concept is called a zone; the size of one zone is 256 MB.

Because SMR disks have an undeniable price advantage, it is highly challenging to write data by managing the zones on an SMR disk, and to build and maintain on the management node the metadata of all zone information in the whole system, together with load balancing, space reuse, zone recycling, and so on.

Disclosure of Invention

The invention provides a data consistency maintenance method which is used for solving the problem of data inconsistency.

In a first aspect, the present invention provides a data consistency maintenance method, including:

receiving the metadata of the zones reported in full by a storage node, and acquiring the metadata of the ZGs in a ZG cache, wherein the metadata of a zone includes the ZG to which the zone belongs, and the metadata of the ZGs records the zones included in at least one ZG;

according to the ZG to which each zone belongs, looking up whether the corresponding ZG in the metadata of the ZGs includes that zone, and determining a first zone that is not found;

according to the zones included in the at least one ZG, checking whether each included zone can be found in the metadata of the zones, and determining a second zone that is not found;

updating the metadata of the ZGs so as to add the first zone to a first ZG in the ZG cache and delete the second zone from its ZG, the first ZG being the ZG to which the first zone belongs, as determined from the metadata.

In one possible implementation, the method further includes:

in response to a data write instruction, selecting as the target ZG, according to the metadata of the ZGs, a ZG whose number of zones equals the number of object slices of an object;

instructing the storage node where each target zone in the target ZG is located to write the corresponding object slice, receiving the metadata of the target zone reported by each storage node after it writes its object slice into the target zone, and updating the metadata of the ZG according to the reported metadata of the target zone; or

in response to a data delete instruction, determining, according to the metadata of the ZGs, the target ZG in which the object to be deleted is located;

and instructing the storage node where each target zone in the target ZG is located to delete the corresponding object slice, receiving the metadata of the target zone reported after the storage node deletes the object slice from the target zone, and updating the metadata of the ZG according to the reported metadata of the target zone.
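The write and delete paths above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the nested-dict layout of the ZG metadata, the `free` and `objects` fields, and the functions `select_target_zg`, `plan_write`, `find_zg_of_object` and `apply_zone_report` are all hypothetical, and requiring free space in every zone during selection is an added assumption.

```python
def select_target_zg(zg_meta, slice_count, slice_size):
    """Return the id of a ZG whose zone count equals the object's slice count.

    Hypothetical layout:
        {zg_id: {"zones": {zone_id: {"node": node_id, "free": bytes}},
                 "objects": set_of_object_ids}}
    Requiring room for one slice in every zone is an assumption.
    """
    for zg_id in sorted(zg_meta):
        zones = zg_meta[zg_id]["zones"]
        if len(zones) == slice_count and all(
            z["free"] >= slice_size for z in zones.values()
        ):
            return zg_id
    return None


def plan_write(zg_meta, zg_id):
    """Map each target zone to the storage node instructed to write (or delete) its slice."""
    return {zone_id: z["node"] for zone_id, z in zg_meta[zg_id]["zones"].items()}


def find_zg_of_object(zg_meta, object_id):
    """Delete path: locate the target ZG holding the object (hypothetical 'objects' field)."""
    for zg_id, zg in zg_meta.items():
        if object_id in zg["objects"]:
            return zg_id
    return None


def apply_zone_report(zg_meta, zg_id, zone_id, reported):
    """Merge the target-zone metadata a storage node reports after its write or delete."""
    zg_meta[zg_id]["zones"][zone_id].update(reported)
```

The ZG is chosen purely from cached metadata, and the cache is changed only when the storage node reports back, so the storage node's report remains the source of truth for each zone.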

In a possible implementation, the metadata of the zone further includes zone data storage information, and the metadata of the ZG also includes zone data storage information. After the metadata of the zones reported in full by the storage node is received and the metadata of the ZGs in the ZG cache is acquired, the method further includes:

when a zone is found in its ZG in the metadata of the ZGs according to the ZG to which it belongs, or when a zone included in a ZG is found in the metadata of the zones, determining whether the storage information of the zone data for the found zone is consistent between the metadata of the zone and the metadata of the ZG;

and when they are determined to be inconsistent, updating the metadata of the ZG so that the storage information of that zone's data in the metadata of the ZG is replaced with the storage information of that zone's data in the metadata of the zone.

In one possible implementation, the method further includes:

when the metadata of the ZG is updated, recording the operation performed to update it, generating operation log information, and storing the operation log information in a log file;

and when it is determined that the amount of data in the log file has reached a set threshold or that a set period has elapsed, serializing and compressing the log file and storing it as a metadata image file, where, during the serialized compression, the corresponding ZG metadata is generated from the operation logs recorded in the log file.

In one possible implementation, the method further includes:

in response to a ZG metadata recovery instruction, when only a metadata image file exists, parsing the metadata image file and recovering the data using the ZG metadata stored in it;

when only a new log file that has not yet been stored as a metadata image file exists, parsing the new log file and generating the corresponding ZG metadata from the operation logs stored in it;

when it is determined that both a metadata image file and a new log file exist, parsing the metadata image file and recovering the data using the ZG metadata stored in it to obtain the current ZG metadata, then parsing the new log file and updating the current ZG metadata according to the operation logs stored in it.
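The three recovery cases above reduce to one rule: start from the image if it exists, otherwise start from empty metadata, then roll any new log forward. A minimal sketch follows. The on-disk formats are assumptions, not taken from the source: the image is modelled as zlib-compressed JSON, and each log entry as an already-parsed `[action, zg_id, zone_id, info]` list. All function names are hypothetical.

```python
import copy
import json
import zlib


def parse_image(blob):
    """Parse a metadata image; assumed here to be zlib-compressed JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def replay(zg_meta, ops):
    """Apply operation logs of the assumed form [action, zg_id, zone_id, info]."""
    for action, zg_id, zone_id, info in ops:
        if action in ("add", "update"):
            zg_meta.setdefault(zg_id, {})[zone_id] = copy.deepcopy(info)
        elif action == "delete":
            zg_meta.get(zg_id, {}).pop(zone_id, None)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return zg_meta


def recover_zg_metadata(image_blob=None, new_log_ops=None):
    """Rebuild in-memory ZG metadata according to which persistent files exist."""
    if image_blob is not None and new_log_ops:
        # Image and new log both exist: restore from the image, then roll the log forward.
        return replay(parse_image(image_blob), new_log_ops)
    if image_blob is not None:
        # Only the image exists.
        return parse_image(image_blob)
    if new_log_ops:
        # Only a new log exists: generate ZG metadata purely from its operations.
        return replay({}, new_log_ops)
    return {}
```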

In a possible implementation, when the operation performed to update the metadata of the ZG is recorded, operation log information is generated and stored in a log file, the method further includes:

serializing and compressing the log file and sending it to a standby management node. After parsing the log file, the standby management node generates the ZG metadata in memory from the resulting operation logs if it does not yet store any ZG metadata, or updates its stored ZG metadata according to those operation logs if it does.
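The host-to-standby log shipping just described can be sketched as follows. The source says only "serialized compression", so the JSON-plus-zlib encoding, the `[action, zg_id, zone_id, info]` entry shape, and the names `pack_log` and `StandbyManagementNode` are assumptions made for illustration.

```python
import json
import zlib


def pack_log(ops):
    """Serialize (assumed: JSON) and compress operation logs before sending them to the standby."""
    return zlib.compress(json.dumps(ops).encode("utf-8"))


class StandbyManagementNode:
    """Hypothetical standby that mirrors the host's ZG metadata from shipped logs."""

    def __init__(self):
        self.zg_meta = None  # None means no ZG metadata is stored yet

    def receive(self, blob):
        ops = json.loads(zlib.decompress(blob).decode("utf-8"))
        if self.zg_meta is None:
            # No ZG metadata yet: generate it in memory from the operation logs.
            self.zg_meta = {}
        # In both cases, the same loop applies the logs to the metadata held in memory.
        for action, zg_id, zone_id, info in ops:
            if action in ("add", "update"):
                self.zg_meta.setdefault(zg_id, {})[zone_id] = info
            elif action == "delete":
                self.zg_meta.get(zg_id, {}).pop(zone_id, None)
```

Shipping operation logs, rather than full snapshots, keeps each transfer proportional to the number of recent changes.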

In one possible implementation, when the metadata of the ZG is updated to add the first zone to the first ZG in the ZG cache, the method further includes:

generating association information that records the correspondence between the first zone and its storage node, and storing the association information in the metadata of the ZG.

In one possible implementation, the method further includes:

when it is determined that a storage node has been deleted, deleting from the metadata of the ZG, according to the identifier of the storage node, the metadata of the zones that belong to that storage node.

In one possible implementation, the method further includes:

when it is determined that a storage node is offline, searching the metadata of the ZG, according to the identifier of the storage node, for the metadata of the zones that match the identifier, and setting the zone state in the metadata of each found zone to offline.
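The zone-to-storage-node association above is what lets node deletion and node offline events touch only the affected zones. The sketch below illustrates it with a reverse index from storage node to `(zg_id, zone_id)` pairs kept beside the ZG metadata. The class name, the index structure, and the `"online"`/`"offline"` state strings are hypothetical.

```python
from collections import defaultdict


class ZgCache:
    """Hypothetical ZG cache that also records which storage node hosts each zone."""

    def __init__(self):
        self.zg_meta = {}                   # zg_id -> {zone_id: {"node": ..., "state": ...}}
        self.node_zones = defaultdict(set)  # node_id -> {(zg_id, zone_id), ...}

    def add_zone(self, zg_id, zone_id, node_id):
        self.zg_meta.setdefault(zg_id, {})[zone_id] = {"node": node_id, "state": "online"}
        # Record the zone/storage-node association alongside the ZG metadata.
        self.node_zones[node_id].add((zg_id, zone_id))

    def on_node_deleted(self, node_id):
        # Remove only the zones indexed under this node; no full-table scan.
        for zg_id, zone_id in self.node_zones.pop(node_id, set()):
            self.zg_meta.get(zg_id, {}).pop(zone_id, None)

    def on_node_offline(self, node_id):
        # Mark the node's zones offline but keep their metadata.
        for zg_id, zone_id in self.node_zones.get(node_id, set()):
            self.zg_meta[zg_id][zone_id]["state"] = "offline"
```

Both handlers run in time proportional to the number of zones on the affected node, not the total number of ZGs.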

In a second aspect, the present invention provides a data consistency maintenance device, including:

a metadata acquisition module, configured to receive the metadata of the zones reported in full by the storage nodes and to acquire the metadata of the ZGs in the ZG cache, wherein the metadata of a zone includes the ZG to which the zone belongs, and the metadata of the ZGs records the zones included in at least one ZG;

a first zone obtaining module, configured to look up, according to the ZG to which each zone belongs, whether the corresponding ZG in the metadata of the ZGs includes that zone, and to determine a first zone that is not found;

a second zone obtaining module, configured to check, according to the zones included in the at least one ZG, whether each included zone can be found in the metadata of the zones, and to determine a second zone that is not found;

and a metadata updating module, configured to update the metadata of the ZG so as to add the first zone to a first ZG in the ZG cache and delete the second zone from its ZG, wherein the first ZG is the ZG to which the first zone belongs, as determined from the metadata.

In a third aspect, the present invention provides a data consistency maintenance device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to perform the steps of the data consistency maintenance method of the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, perform the steps of the above data consistency maintenance method.

The data consistency maintenance method provided by the invention has the following beneficial effects:

The zones to be deleted and added are determined by bidirectionally comparing the metadata of the zones reported in full by the storage nodes with the metadata in the management node's ZG cache, and the ZG states corresponding to these two types of zones are reset in a targeted way. This avoids correcting ZG states by full-table scanning, and is far more efficient than the prior-art reset process, which loops over all entries. Meanwhile, the bidirectional comparison overcomes the defect that the ZG cache data built by the V1 management node depends heavily on storage node reports, prevents zone dirty data from appearing in the management node's ZG cache, and ultimately keeps the storage nodes and the management node's ZG cache metadata consistent and correct in real time.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a block diagram of the system modules according to an embodiment of the present invention;

Fig. 2 is a flowchart of a data consistency maintenance method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the bidirectional comparison of zone metadata between the management node and a storage node according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the management node recovering the ZG cache from persistent files according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of resetting ZG metadata when a storage node is abnormal according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a data consistency maintenance apparatus according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a data consistency maintenance device according to an embodiment of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.

Some of the terms used herein are explained below:

1. The terms "first", "second", and the like in the description, the claims and the drawings are used to distinguish similar elements, not necessarily to describe a particular sequence or chronological order. Terms so used are interchangeable where appropriate, so that the embodiments of the invention described herein can operate in orders other than those illustrated or described herein.

2. The term "SMR hard disk" in the embodiments of the invention refers to a high-capacity, cost-effective hard disk whose storage density is increased by reducing the spacing between tracks. It supports sequential reads and writes, deletion and random reads, but not random writes or random deletion: the service layer must write data sequentially and delete an entire zone's data at a time, using the zone as the unit. A commonly used zone length today is 256 megabytes (MB). SMR hard disks offer large storage capacity and outperform CMR disks in sequential-write scenarios.

3. The term "distributed storage system" in the embodiments of the present invention refers to a storage approach in which data is distributed across multiple independent devices. In a distributed storage system, a file is divided into objects, and each object is divided into data blocks. This approach improves the reliability, availability and access efficiency of the system and is easy to scale.

4. The term "ZG" in the embodiments of the present invention is an abbreviation of ZoneGroup, a group composed of multiple zones. The embodiments apply to a distributed storage system that stores and manages data in ZG form. When file data is written, a file may be divided into multiple objects, and each object into M + N object slices. When an object write instruction is received and the M + N object slices are written into M + N zones, those M + N zones are stored and managed as one ZG (ZoneGroup, a set of zones). One ZG therefore includes M + N zones, each mapped to a specific zone region of a fixed SMR disk on a different storage node.

5. The term "data consistency" in the embodiments of the present invention refers to the consistency of zone states on SMR disks. It has two meanings: first, metadata consistency among the management nodes of the distributed object storage system cluster (the host management node and the standby management node); second, consistency of zone metadata (i.e., zone attribute information, the write pointer position in the zone, etc.) between the management node and the data nodes (DataNode, DN) in the system.
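To make the terms above concrete, here is a minimal in-memory model of a zone and a ZG. It is a sketch under stated assumptions: the class and field names are hypothetical, and it enforces only the rules stated in the terms, namely sequential appends at the write pointer, whole-zone deletion, a 256 MB zone length, and M + N zones placed on different storage nodes.

```python
from dataclasses import dataclass, field
from typing import List

ZONE_SIZE = 256 * 1024 * 1024  # 256 MB, the zone length cited above


@dataclass
class Zone:
    """Hypothetical model of one SMR zone."""
    zone_id: str
    node_id: str            # storage node hosting the zone
    write_pointer: int = 0  # next writable byte offset inside the zone

    def append(self, length: int) -> int:
        """Sequential write: new data may only be placed at the write pointer."""
        if self.write_pointer + length > ZONE_SIZE:
            raise ValueError("zone full: SMR zones cannot be overwritten in place")
        offset = self.write_pointer
        self.write_pointer += length
        return offset

    def reset(self) -> None:
        """Deletion is whole-zone: the write pointer returns to the start."""
        self.write_pointer = 0


@dataclass
class ZoneGroup:
    """Hypothetical ZG: M + N zones, each on a different storage node."""
    zg_id: str
    m: int
    n: int
    zones: List[Zone] = field(default_factory=list)

    def add_zone(self, zone: Zone) -> None:
        if len(self.zones) >= self.m + self.n:
            raise ValueError("ZG already holds M + N zones")
        if any(z.node_id == zone.node_id for z in self.zones):
            raise ValueError("each zone in a ZG must be on a different storage node")
        self.zones.append(zone)
```

For example, two appends of 100 and 50 bytes to a fresh `Zone` return offsets 0 and 100. The zone cannot be partially rewritten; its space is reclaimed only through `reset()`.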

The application scenarios described in the embodiments of the present invention are intended to illustrate the technical solutions more clearly and do not limit them. A person skilled in the art will appreciate that, as new application scenarios emerge, the technical solutions provided in the embodiments are equally applicable to similar technical problems.

The main aim of the embodiments of the invention is to provide a load-balancing technique for managing SMR disk zone space in a distributed object storage system. Data is written into zone space according to a certain policy that takes the write characteristics of zones into account, so that zone space is used as fully as possible, and write management of a group of zones is realized through the abstract logical concept of a ZG (ZoneGroup). In line with the characteristics of distributed object storage, the management node manages the metadata of all ZGs in abstract form. When the SDK client performs a service write, the ZG information is carried along the whole request signaling path to the storage node, which binds the zone to the ZG. In this way, data writing and space management for distributed object storage on bare SMR disks are achieved.

In the embodiments of the invention, the zone metadata of each storage node is cached, the ZG metadata is persisted, and ZG metadata is compared bidirectionally, so that ZG state resets in certain abnormal scenarios can be handled quickly and the consistency and correctness of zone data across the whole system are maintained.

The data consistency maintenance method provided by the embodiments of the invention is applied to the distributed storage system shown in Fig. 1. When an SDK client requests space to write file data, it sends an instruction to acquire write space for the file data to the host management node through the signaling flow. According to the instruction, the host management node obtains from the corresponding storage nodes information about zones that can accommodate the file data and feeds it back to the SDK client, which then writes the file data into those zones. The host management node also passes this information to the standby management node, so that if the host management node becomes abnormal, the standby management node can promptly take over its functions.

In the V1 version of the distributed object storage system designed to support SMR disks, the management node uniformly caches, manages and schedules all zone information in the cluster in units of ZGs. However, the ZG data is cached only in the memory of the host management node and is not persisted to any storage medium. After the management node restarts due to a fault, recovering the ZG data depends heavily on storage node reports. Storage nodes do not report all their zone information at once after the management node restarts, since that would increase the management node's burden, and because of policies such as batched, time-shared scheduling, the zone reports of some storage nodes may not arrive within a short time after the management node recovers. As a result, some ZGs cannot be reused during that period.

Meanwhile, because the management node does not cache the association between each storage node and its zones, when a storage node becomes abnormal (deleted or offline), the management node can only reset states by scanning and traversing all ZG metadata in a full table. It cannot quickly reset the affected ZG states from a cached mapping between storage nodes and zone metadata, which greatly increases the time taken for one round of ZG reset.

Finally, when a storage node reports all its zone metadata, the prior-art ZG metadata in the management node's ZG cache depends heavily on that report, and no bidirectional full comparison of zone metadata between the management node and the data node is performed. As a result, outdated zone metadata (zone dirty data) cannot be deleted quickly, and the real-time accuracy of the management node's ZG data cannot be guaranteed.

To solve these problems, an embodiment of the present invention provides a data consistency maintenance method in which the management node records the zone metadata of each storage node separately, in units of storage nodes, and persists the ZG metadata. When a storage node fails abnormally, the ZG states corresponding to the zones on that node can be reset quickly, ensuring the real-time correctness of ZG states. When the management node fails abnormally, the ZG cache metadata can be recovered quickly from the persistent metadata files, which solves the ZG reuse failures caused by storage nodes' zone reports being incomplete within a short time. In addition, using the ZG cache information persistently recorded by the management node, a bidirectional full comparison of zone metadata is performed whenever a storage node reports all its zones, and zone metadata in the management node is added or deleted in real time, ensuring that the management node's ZG metadata cache is consistent with, and accurately reflects, the actual data on the storage nodes.

Example 1

As shown in fig. 2, an embodiment of the present invention provides a data consistency maintenance method, which includes the following steps:

Step 201, receiving the metadata of the zones reported in full by a storage node, and acquiring the metadata of the ZGs in the ZG cache, where the metadata of a zone includes the ZG to which the zone belongs, and the metadata of the ZGs records the zones included in at least one ZG;

The full zone metadata report mentioned above refers to the metadata of all used zones on the storage node, which the storage node reports periodically. Besides the ZG to which each zone belongs, it includes the storage information of each zone, such as the specific position of the zone on the storage node, the write pointer position in the zone, the remaining space in the zone, and zone attribute information.

Besides the metadata of each zone in the ZG, the ZG metadata includes some attribute information of the ZG, such as the remaining space in the ZG.

Step 202, according to the ZG to which each zone belongs, looking up whether the corresponding ZG in the ZG metadata includes that zone, and determining a first zone that is not found;

In implementation, for each zone in the fully reported zone metadata, the zone is looked up in the ZG metadata according to the ZG identifier recorded for the zone. If it is not found, the zone exists on the storage node but not on the management node, so its metadata must be added to the management node's ZG metadata.

In implementation, once a first zone that is not found is determined, it is added to a queue of zones to be added; after the comparison between the zone metadata reported by the storage node and the management node's ZG metadata is complete, the additions are applied to the ZG metadata.

Step 203, according to the zones included in the at least one ZG, checking whether each included zone can be found in the metadata of the zones, and determining a second zone that is not found;

In implementation, after the zones to be added have been determined, each zone identifier included in each ZG on the management node is looked up in the fully reported zone metadata. If it is not found, the zone exists on the management node but not on the storage node, so it must be deleted from the management node accordingly.

In implementation, once a second zone that is not found is determined, it is added to a queue of zones to be deleted; after the comparison between the zone metadata reported by the storage node and the management node's ZG metadata is complete, the deletions are applied to the ZG metadata.

Step 204, updating metadata of the ZG to add the first zone to a first ZG in a ZG cache and delete the second zone from a corresponding ZG, wherein the first ZG is a ZG to which the first zone belongs and is determined according to the metadata;

As shown in Fig. 3, after the comparison between the zone metadata reported by the storage node and the management node's ZG metadata is complete, zones are added to or deleted from the ZG cache according to the zones recorded in the to-be-added queue and the to-be-deleted queue.
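Steps 202 to 204 can be sketched as two passes that only fill queues, followed by one pass that applies them. This is a minimal sketch with hypothetical names and layouts. The report is `{zone_id: {"zg": zg_id, ...storage info}}` and the cache is `{zg_id: {zone_id: {"node": node_id, ...}}}`. One assumption goes beyond the text: the reverse pass is limited to zones the cache attributes to the reporting node, because a single node's report cannot vouch for zones hosted elsewhere.

```python
def bidirectional_compare(report, zg_meta, node_id):
    """Return (to_add, to_delete) queues of (zg_id, zone_id) pairs."""
    to_add, to_delete = [], []
    # Step 202: every reported zone must appear in the ZG it claims to belong to.
    for zone_id, info in report.items():
        if zone_id not in zg_meta.get(info["zg"], {}):
            to_add.append((info["zg"], zone_id))
    # Step 203: every cached zone on this node must appear in the report under the same ZG.
    for zg_id, zones in zg_meta.items():
        for zone_id, meta in zones.items():
            reported = report.get(zone_id)
            if meta["node"] == node_id and (reported is None or reported["zg"] != zg_id):
                to_delete.append((zg_id, zone_id))
    return to_add, to_delete


def apply_queues(report, zg_meta, node_id, to_add, to_delete):
    """Step 204: apply both queues only after the comparison has finished."""
    for zg_id, zone_id in to_add:
        entry = {k: v for k, v in report[zone_id].items() if k != "zg"}
        entry["node"] = node_id  # keep the zone/storage-node association
        zg_meta.setdefault(zg_id, {})[zone_id] = entry
    for zg_id, zone_id in to_delete:
        zg_meta[zg_id].pop(zone_id, None)
```

Deferring the mutations until both passes finish keeps the cache unchanged while it is being iterated, which mirrors the queue-then-apply order of the steps.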

In this method, the metadata of the zones reported in full by the storage nodes is compared bidirectionally with the metadata in the management node's ZG cache to determine the zones to be deleted and added, and the ZG states corresponding to these two types of zones are reset in a targeted way. This avoids correcting ZG states by full-table scanning, and is far more efficient than the prior-art reset process, which loops over all entries. Meanwhile, the bidirectional comparison overcomes the defect that the ZG cache data built by the V1 management node depends heavily on storage node reports, prevents zone dirty data from appearing in the management node's ZG cache, and ultimately keeps the storage nodes and the management node's ZG cache metadata consistent and correct in real time.

In the bidirectional comparison, besides the zones to be added and deleted, the zones to be updated must also be determined. These are zones whose data storage information differs between the storage node's full zone report and the management node's ZG metadata (for example, the write pointer positions differ). The ZG metadata is then updated according to the data storage information of those zones in the reported zone metadata.

As an optional implementation, where the metadata of the zone further includes zone data storage information, after the metadata of the zones reported in full by the storage node is received and the metadata of the ZGs in the ZG cache is acquired, the method further includes:

In implementation, besides periodically reporting all its zone metadata, the storage node also reports zone metadata in real time. When object slice data is written into, or deleted from, a target zone on the node, the storage node reports the metadata of that target zone to the management node once the write or deletion completes. According to the ZG to which each reported target zone belongs, the management node looks up the metadata of the corresponding ZG in the ZG cache and adds or deletes the target zone's metadata in it accordingly.
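The update check just described can be sketched as follows. The compared fields (`write_pointer`, `free`) are hypothetical stand-ins for the zone data storage information, and the storage node's report is treated as authoritative. Zones missing from the cache are skipped here because the "to add" direction of the comparison handles them.

```python
def find_zones_to_update(report, zg_meta, fields=("write_pointer", "free")):
    """Return (zg_id, zone_id) pairs whose storage info differs between report and cache."""
    to_update = []
    for zone_id, info in report.items():
        cached = zg_meta.get(info["zg"], {}).get(zone_id)
        if cached is None:
            continue  # not cached yet: handled by the "to add" direction instead
        if any(cached.get(f) != info.get(f) for f in fields):
            to_update.append((info["zg"], zone_id))
    return to_update


def apply_updates(report, zg_meta, to_update, fields=("write_pointer", "free")):
    """Overwrite the cached storage info with the values the storage node reported."""
    for zg_id, zone_id in to_update:
        for f in fields:
            if f in report[zone_id]:
                zg_meta[zg_id][zone_id][f] = report[zone_id][f]
```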

As an optional implementation, the method further comprises:

When the management node restarts after an abnormal fault, the ZG cache in memory is lost and recovery would rely solely on storage node reports. However, to avoid a reporting storm from the storage nodes that would increase the management node's processing load, the storage nodes report zone metadata in batches and at staggered times, so the management node cannot quickly rebuild a complete image of its pre-fault metadata.

The management node keeps two types of metadata persistence files: a log file (editlog) and a metadata image file (image). For ZG metadata, each time the management node adds or deletes zone metadata, it writes an operation log to the log file. To prevent the log file from growing too large through long-term appends, the management node periodically serializes, compresses and stores the log file as a metadata image file, or does so when the amount of data in the log file reaches a set threshold.

In implementation, as shown in Fig. 4, the management node persists ZG metadata in units of storage nodes. For every operation that adds, deletes or modifies zone metadata in a ZG, the management node writes the corresponding operation log to the log file as a persistent record, and serializes, compresses and stores it as a metadata image file either periodically or when the amount of data in the log file reaches a set threshold. This keeps the in-memory ZG metadata consistent with the ZG metadata recorded in the persistent files, and ensures that the ZG metadata recovered into memory from those files is correct and accurate when the management node fails.
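The editlog/image pair just described can be sketched as follows. Following the claim above, the new image is generated by replaying the pending operation logs onto the previous image. This is a minimal sketch under several assumptions. In-memory lists and bytes stand in for the two files. JSON plus zlib stands in for "serialized compression". The class name, the default threshold and period, and the `[action, zg_id, zone_id, info]` log shape are all hypothetical.

```python
import json
import time
import zlib


def apply_op(zg_meta, op):
    """Apply one operation log entry of the assumed form [action, zg_id, zone_id, info]."""
    action, zg_id, zone_id, info = op
    if action in ("add", "update"):
        zg_meta.setdefault(zg_id, {})[zone_id] = info
    elif action == "delete":
        zg_meta.get(zg_id, {}).pop(zone_id, None)
    else:
        raise ValueError(f"unknown action: {action!r}")


class ZgPersistence:
    """Hypothetical editlog + image store with threshold- and period-triggered compaction."""

    def __init__(self, threshold_bytes=1 << 20, period_s=3600.0, clock=time.monotonic):
        self.threshold_bytes = threshold_bytes
        self.period_s = period_s
        self.clock = clock
        self.last_checkpoint = clock()
        self.edit_log = []  # serialized operation logs not yet folded into the image
        self.log_bytes = 0
        self.image = None   # compressed metadata image (bytes), or None if none exists

    def record(self, op):
        """Append one operation log; compact when the size threshold or the period is reached."""
        line = json.dumps(op)
        self.edit_log.append(line)
        self.log_bytes += len(line)
        if (self.log_bytes >= self.threshold_bytes
                or self.clock() - self.last_checkpoint >= self.period_s):
            self.checkpoint()

    def checkpoint(self):
        """Generate ZG metadata from the previous image plus the log, then store a new image."""
        zg_meta = json.loads(zlib.decompress(self.image)) if self.image else {}
        for line in self.edit_log:
            apply_op(zg_meta, json.loads(line))
        self.image = zlib.compress(json.dumps(zg_meta).encode("utf-8"))
        self.edit_log.clear()
        self.log_bytes = 0
        self.last_checkpoint = self.clock()
```

The injectable `clock` parameter is a design choice that lets the period trigger be tested without waiting.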

As an optional implementation, the method further comprises:

In addition, the standby management node must be able to take over the functions of the host management node quickly after the host fails, so the data in the standby management node must be kept consistent with the data in the host management node in real time.

As an optional implementation, when the operations performed to update the metadata of the ZG are recorded as operation log information and stored in a log file, the method further includes:

Building on this persistent storage, after the management node restarts due to a failure, it responds to a ZG metadata recovery instruction by reading the ZG metadata from the persistent files and restoring the ZG metadata in memory. This ensures that the data of the management node and the data of the storage nodes remain consistent after the failure restart.
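Recovery on restart can be illustrated with a similarly hedged sketch: deserialize the last image, then replay the log entries written after it. The function name recover_zg_metadata and the on-disk formats (a JSON image of the form {node: {zg: [zone, ...]}} and a JSON-lines editlog of add/delete records) are assumptions made for illustration only; the source does not specify a file format.

```python
import json
import os
from collections import defaultdict


def recover_zg_metadata(directory):
    """Hypothetical sketch: rebuild ZG metadata (node id -> ZG id -> set of
    zone ids) from an image file plus the edit log written after it."""
    state = defaultdict(lambda: defaultdict(set))
    image_path = os.path.join(directory, "image")
    log_path = os.path.join(directory, "editlog")

    # 1. Deserialize the image: the compacted state at the last checkpoint.
    if os.path.exists(image_path):
        with open(image_path, encoding="utf-8") as f:
            for node, zgs in json.load(f).items():
                for zg, zones in zgs.items():
                    state[node][zg].update(zones)

    # 2. Replay the operations logged after that checkpoint, in order.
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    # A torn final record (crash mid-write) ends the replay.
                    break
                zones = state[entry["node"]][entry["zg"]]
                if entry["op"] == "add":
                    zones.add(entry["zone"])
                elif entry["op"] == "delete":
                    zones.discard(entry["zone"])
    return state
```

Because replay applies operations in log order on top of the image, the restored state matches the in-memory state at the moment of the last durable log write, without waiting for storage nodes to re-report their zones.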

As an optional implementation, the method further comprises:

The in-memory ZG metadata of the management node is persisted to metadata files through serialization, and when the system restarts it loads these files and deserializes them to restore its ZG metadata. This solves problems of the V1 version, in which ZG data was not persisted. In V1, rebuilding the management node's data after a failure depended strongly on the periodic full zone reports from the storage nodes. Until every storage node holding zones of a given ZG had finished its report, that ZG could not be reused, even though it was actually reusable in its real state, and in extreme scenarios this caused service interruption.

In actual project testing, when the management node restarts after a failure, it quickly restores the ZG metadata state from before the system abnormality using the persisted metadata files (the metadata image file and the log file). Once the image file and the operation log are loaded, the metadata of all ZGs in the system is determined, so ZGs no longer become temporarily non-reusable because zone reports are incomplete.

In the prior art, when a storage node fails abnormally (is deleted or goes offline), the management node can reset the state in the ZG metadata only by traversing the whole table with a full scan, which is slow and inefficient. To solve this problem, the embodiment of the present invention stores, in the ZG metadata of the management node, the association between each storage node and its zones, so that the management node can manage the zone metadata of each storage node precisely.

Once the management node caches the association between each storage node and its zones, when a storage node later fails or goes offline, the state of the zones in the corresponding ZGs can be reset quickly from the cached association information, which maintains the correctness and consistency of the ZG state on the management node.

As an optional implementation, when the metadata of the ZG is updated so as to add the first zone to the first ZG in the ZG cache, the method further includes:

As shown in fig. 5, when a storage node becomes abnormal, the ZGs to which its zones belong are determined quickly from the identifier of the storage node and the association between the storage node and its zones, and in the metadata of those ZGs the corresponding zone metadata is either deleted or has its state changed.
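A minimal sketch of this targeted reset, assuming the management node keeps a hypothetical reverse index from storage-node identifier to the (ZG, zone) pairs hosted on that node. The class ZGCache, its fields, and the "offline" state label are illustrative names not taken from the source; the delete flag selects between the two options described above (deleting the zone metadata or changing its state).

```python
from collections import defaultdict


class ZGCache:
    """Hypothetical sketch: ZG cache plus a reverse index from storage node
    to the (ZG id, zone id) pairs hosted on that node."""

    def __init__(self):
        self.zg_zones = defaultdict(dict)   # ZG id -> {zone id: zone state}
        self.node_index = defaultdict(set)  # node id -> {(ZG id, zone id)}

    def add_zone(self, node, zg, zone, state="normal"):
        self.zg_zones[zg][zone] = state
        self.node_index[node].add((zg, zone))

    def handle_node_failure(self, node, delete=True):
        """Reset only the ZGs holding this node's zones; no full-table scan.

        delete=True removes the zone metadata; delete=False marks it with an
        assumed "offline" state and keeps the index for a later cleanup.
        Returns the set of affected ZG ids."""
        if delete:
            entries = self.node_index.pop(node, set())
        else:
            entries = self.node_index.get(node, set())
        affected = set()
        for zg, zone in entries:
            zones = self.zg_zones.get(zg)
            if zones is None or zone not in zones:
                continue  # already gone, e.g. removed by a later full report
            if delete:
                del zones[zone]
            else:
                zones[zone] = "offline"
            affected.add(zg)
        return affected
```

The cost of a reset is proportional to the number of zones on the failed node rather than to the size of the whole ZG cache, which is the point of caching the association.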

As an optional implementation, the method further comprises:

In actual project testing with 300 storage nodes of 36 disks each (14 TB disks, about 70,000 zones per disk), the number of entries in the system's ZG cache is estimated to reach hundreds of millions after long-term operation. A reset by full-table scan over hundreds of millions of entries takes minutes or even hours. Once the management node caches the association between each storage node and its zones, however, its cached ZG metadata is updated in real time by the storage nodes' periodic full/incremental reports and real-time reports, which keeps the zone data cached in the management node authentic and valid. When a storage node fails, only the metadata of the ZGs containing that node's zones is reset, in a targeted manner, so a full-table scan of a ZG cache holding hundreds of millions of entries is avoided and system performance improves.
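The scale argument can be checked with a short worked estimate. Assuming the test configuration means 36 disks per storage node and about 70,000 zones per 14 TB disk, the total number of zones and the number touched when a single node fails are:

```latex
\begin{aligned}
N_{\text{total}} &= 300 \times 36 \times 7\times10^{4} = 7.56\times10^{8}\ \text{zones} \\
N_{\text{node}}  &= 36 \times 7\times10^{4} = 2.52\times10^{6}\ \text{zones} \\
\frac{N_{\text{node}}}{N_{\text{total}}} &= \frac{1}{300}
\end{aligned}
```

Under these assumptions, a targeted reset for one failed node touches roughly 1/300 of the zone entries that a full-table scan would visit.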

Example 2

The data consistency maintenance method according to an embodiment of the present invention has been described above; an apparatus for performing data consistency maintenance is described below.

Referring to fig. 6, an embodiment of the present invention provides a data consistency maintenance apparatus, applied to a host management node, including:

a metadata obtaining module 601, configured to receive the zone metadata reported in full by a storage node and acquire the ZG metadata in the ZG cache, where the zone metadata includes the ZG to which each zone belongs, and the ZG metadata records the zones included in at least one ZG;

a first zone obtaining module 602, configured to search the ZG metadata, for each zone, to check whether the ZG to which the zone belongs contains that zone, and to determine a first zone, namely a zone that is not found;

a second zone obtaining module 603, configured to check, for each zone included in the at least one ZG, whether that zone can be found in the zone metadata, and to determine a second zone, namely a zone that is not found;

a metadata updating module 604, configured to update the ZG metadata so as to add the first zone to a first ZG in the ZG cache and delete the second zone from the ZG that contains it, where the first ZG is the ZG to which the first zone belongs, as determined from the zone metadata.
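The interplay of modules 602 to 604 amounts to a two-way set difference between a storage node's full report and the cached ZG metadata. The Python sketch below is a hedged illustration: the function name reconcile_full_report and the dictionary layouts are assumptions, and restricting the second check to zones that the cache attributes to the reporting node is an interpretation (without it, zones reported by other storage nodes would be wrongly deleted).

```python
def reconcile_full_report(node, reported, zg_cache):
    """Hypothetical sketch of full-report reconciliation on the management node.

    node:     id of the storage node that sent the full report
    reported: {zone id: ZG id} zone metadata from that full report
    zg_cache: {ZG id: {zone id: storage node id}} cached ZG metadata, updated in place
    Returns (first_zones, second_zones).
    """
    # First zones: reported zones that the ZG they belong to does not contain.
    first_zones = {zone for zone, zg in reported.items()
                   if zone not in zg_cache.get(zg, {})}

    # Second zones: zones the cache attributes to this node that the report does
    # not confirm. A zone reported under a different ZG also counts as not found
    # in its old ZG, so a moved zone is removed from there.
    second_zones = {(zg, zone)
                    for zg, zones in zg_cache.items()
                    for zone, owner in zones.items()
                    if owner == node and reported.get(zone) != zg}

    # Update the ZG metadata: add each first zone to its ZG (the "first ZG"),
    # then delete each second zone from the ZG that held it.
    for zone in first_zones:
        zg_cache.setdefault(reported[zone], {})[zone] = node
    for zg, zone in second_zones:
        del zg_cache[zg][zone]
    return first_zones, second_zones
```

For example, with a cache of {"zg1": {"z1": "n1", "z2": "n1", "z9": "n2"}, "zg2": {"z3": "n1"}} and a report from n1 of {"z1": "zg1", "z3": "zg3", "z4": "zg2"}, z2 is dropped from zg1, z3 moves from zg2 to zg3, z4 joins zg2, and z9, owned by another node, is left untouched.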

Optionally, the metadata updating module is further configured to:

Optionally, the zone metadata and the ZG metadata each further include zone data storage information, and after receiving the zone metadata reported in full by the storage nodes and acquiring the ZG metadata in the ZG cache, the metadata obtaining module is further configured to:

Optionally, the apparatus further includes a metadata persistence module, where the metadata persistence module is configured to:

Optionally, the apparatus further includes a metadata recovery module, where the metadata recovery module is configured to:

Optionally, the metadata persistence module is configured to record the operations executed to update the ZG metadata as operation log information and store it in a log file, and when doing so is further configured to:

Optionally, when the metadata updating module updates the ZG metadata so as to add the first zone to the first ZG in the ZG cache, it is further configured to:

Optionally, the metadata updating module is further configured to:

Example 3

Having described the data consistency maintenance method and apparatus according to an exemplary embodiment of the present invention, an apparatus according to another exemplary embodiment of the present invention is described next.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."

In some possible embodiments, an apparatus according to the present invention may comprise at least one processor and at least one memory. The memory stores program code that, when executed by the processor, causes the processor to perform the steps of the data consistency maintenance method according to the various exemplary embodiments of the present invention described above in this specification. For example, the processor may perform the following steps:

An apparatus 700 according to this embodiment of the invention is described below with reference to fig. 7. The device 700 shown in fig. 7 is only an example and should not bring any limitation to the function and scope of use of the embodiments of the present invention.

As shown in fig. 7, the device 700 is embodied in the form of a general purpose device. The components of device 700 may include, but are not limited to: at least one processor 701, at least one memory 702, and a bus 703 that connects the various system components (including the memory 702 and the processor 701).

Bus 703 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, or a processor or local bus using any of a variety of bus architectures.

The memory 702 can include readable media in the form of volatile memory, such as Random Access Memory (RAM) 7021 and/or cache memory 7022, and can further include Read Only Memory (ROM) 7023.

Memory 702 may also include a program/utility 7025 having a set (at least one) of program modules 7024, such program modules 7024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Device 700 can also communicate with one or more external devices 704 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with device 700, and/or with any devices (e.g., router, modem, etc.) that enable device 700 to communicate with one or more other devices. Such communication may occur via input/output (I/O) interfaces 705. Also, the device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 706. As shown, the network adapter 706 communicates with the other modules for the device 700 over a bus 703. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the device 700, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Optionally, the processor is further configured to:

Optionally, the zone metadata and the ZG metadata each further include zone data storage information, and after receiving the zone metadata reported in full by the storage node and acquiring the ZG metadata in the ZG cache, the processor is further configured to:

Optionally, the processor is further configured to:

Optionally, when recording the operations executed to update the ZG metadata as operation log information and storing it in a log file, the processor is further configured to:

Optionally, when updating the ZG metadata so as to add the first zone to the first ZG in the ZG cache, the processor is further configured to:

Optionally, the processor is further configured to:

In some possible embodiments, the aspects of a data consistency maintenance method provided by the present invention can also be implemented in the form of a program product, which includes program code for causing a computer device to perform the steps of a data consistency maintenance method according to various exemplary embodiments of the present invention described above in this specification when the program product runs on the computer device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The program product of embodiments of the present invention may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a device. However, the program product of the present invention is not limited in this regard; in the present document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device or server. In the case of remote devices, the remote devices may be connected to the user device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to external devices (e.g., through the internet using an internet service provider).

It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided among and embodied by a plurality of units.

Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and block diagrams, and combinations of flows and blocks in the flow diagrams and block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for maintaining data consistency, the method comprising:

2. The method of claim 1, further comprising:

3. The method according to claim 1, wherein the metadata of the zone further includes zone data storage information, and the metadata of the ZG further includes zone data storage information, and after receiving the metadata of the zone reported by the storage node in full and acquiring the metadata of the ZG in the ZG cache, the method further includes:

4. The method of any one of claims 1 to 3, further comprising:

5. The method of claim 4, further comprising:

6. The method of claim 4, wherein recording operations performed to update the metadata of the ZG, generating operation log information, and storing the operation log information in a log file, further comprises:

7. The method of claim 1, wherein updating the metadata for the ZG to add the first zone to the first ZG in the ZG cache further comprises:

8. The method of claim 7, further comprising:

9. The method of claim 7, further comprising:

10. A data consistency maintenance apparatus, characterized in that the apparatus comprises:

11. A data consistency maintenance device comprising a memory, a processor and a computer program stored on said memory and executable on said processor, wherein said processor implements the method of any of claims 1 to 9 when executing said computer program.

12. A computer storage medium on which a computer program is stored, which program, when being executed by a processor, carries out the steps of any one of claims 1 to 9.