WO2021082465A1 - Method for ensuring data consistency and related device - Google Patents

Method for ensuring data consistency and related device Download PDF

Info

Publication number
WO2021082465A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
metadata
identifier
cluster
Prior art date
Application number
PCT/CN2020/096005
Other languages
French (fr)
Chinese (zh)
Inventor
孟俊才
徐鹏
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021082465A1 publication Critical patent/WO2021082465A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating
    • G06F16/2365: Ensuring data consistency and integrity
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the invention relates to the technical field of computer distributed storage systems, in particular to a method and related equipment for ensuring data consistency.
  • the Raft protocol is a distributed consensus protocol in which a single elected leader directs the entire cluster. In the entire cluster, only the master node can process requests sent by clients; other nodes must forward any request they receive to the master node for processing.
  • the Raft protocol strongly relies on the master node to ensure cluster data consistency.
  • distributed locks have a very wide range of usage scenarios. For example, in a distributed system, when different devices access a shared resource, the system often needs a distributed lock to support the mutual exclusion of access to the shared resource to ensure consistency, that is, only one node can hold the lock.
  • the lock preemption can guarantee the uniqueness of the master node.
  • the embodiment of the invention discloses a method and related equipment for ensuring data consistency, which can ensure data consistency and avoid data conflicts without loss of database performance.
  • the present application provides a method for ensuring data consistency, including: a first node receives an upgrade message sent by a node management server, where the node management server is used to manage a node cluster and the node cluster includes the first node; the first node updates tenure management data, where the tenure management data includes a root metadata identifier and a tenure identifier, the root metadata identifier is used to determine root metadata, the root metadata is used to manage metadata corresponding to the node cluster, and the tenure identifier is used to indicate that the first node is upgraded to the master node of the node cluster; and the first node sets the data corresponding to the node cluster to read-only mode, where the data includes the root metadata.
  • the first node is upgraded to the master node after receiving the upgrade message sent by the node management server, updates the tenure management data, and sets the data corresponding to the node cluster to read-only mode, ensuring that only one node can write data at any given time, thereby guaranteeing data consistency and avoiding data conflicts.
  • because the entire process does not require negotiation with other nodes, the performance of the system is preserved.
  • the first node updates the tenure identifier while reading the root metadata identifier.
  • the first node guarantees that reading the root metadata identifier and updating the tenure identifier are atomic, which prevents other nodes, such as the original master node, from concurrently modifying the root metadata identifier during this process and thereby causing data conflicts and data inconsistency.
  • the node cluster further includes a second node, where the second node is used to read and write the data corresponding to the node cluster and to update the root metadata identifier; after the first node updates the tenure management data, the root metadata identifier is locked and the second node is prohibited from updating it.
  • before the first node updates the tenure management data, the second node (for example, the original master node) can read and write the data corresponding to the node cluster and can update the root metadata identifier; after the first node updates the tenure management data, the root metadata identifier is locked. Although the second node can continue to read and write data, it is no longer allowed to modify the root metadata identifier. This ensures that in the subsequent process, at any one time only one node is able to write data, which guarantees data consistency.
  • the data corresponding to the node cluster includes root metadata, metadata, and user data
  • the metadata is used to manage the user data
  • the user data is data written to the node cluster; after the first node sets the root metadata to read-only mode, it sets the metadata to read-only mode, and finally sets the user data to read-only mode.
  • the first node proceeds layer by layer, setting the root metadata, metadata, and user data to read-only mode in turn, to ensure the consistency of the data corresponding to the node cluster and to ensure that the first node can accurately find all user data written to the node cluster.
  • the first node updates the root metadata identifier and writes data to the node cluster.
  • after the first node is upgraded to become the new master node, it can write user data to the node cluster and can manage the written user data by updating the root metadata identifier.
  • the present application provides a first node, including: a receiving module, configured to receive an upgrade message sent by a node management server, where the node management server is used to manage a node cluster, and the node cluster includes the first node.
  • an update module, configured to update tenure management data, where the tenure management data includes a root metadata identifier and a tenure identifier, the root metadata identifier is used to determine root metadata, the root metadata is used to manage metadata corresponding to the node cluster, and the tenure identifier is used to indicate that the first node is upgraded to the master node of the node cluster; and a processing module, configured to set the data corresponding to the node cluster to read-only mode, where the data includes the root metadata.
  • the update module is further configured to read the root metadata identifier and update the tenure identifier at the same time.
  • the node cluster further includes a second node, where the second node is used to read and write the data corresponding to the node cluster and to update the root metadata identifier; after the update module updates the tenure management data, the second node is prohibited from updating the root metadata identifier.
  • the data corresponding to the node cluster includes root metadata, metadata, and user data
  • the metadata is used to manage the user data
  • the User data is data written to the node cluster
  • the processing module is further configured to set the root metadata to read-only mode first, then set the metadata to read-only mode, and finally set the user data to read-only mode.
  • the processing module is further configured to update the root metadata identifier and write data to the node cluster.
  • the present application provides a computing device, where the computing device includes a processor and a memory, the memory is configured to store program code, and the processor is configured to call the program code in the memory to execute the method of the first aspect above or of any one of the implementations of the first aspect.
  • the present application provides a computer-readable storage medium that stores a computer program.
  • when the computer program is executed by a processor, it can implement the process of the method provided by the first aspect above or by any one of the implementations of the first aspect.
  • the present application provides a computer program product that includes instructions; when the computer program is executed by a computer, the computer can execute the process of the method provided by the first aspect above or by any one of the implementations of the first aspect.
  • FIG. 1 is a schematic diagram of node state switching provided by an embodiment of the present application
  • Figure 2 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for ensuring data consistency provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of a data storage relationship provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a first node provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • Cloud database is a stable, reliable, and elastically scalable online database service. This database is deployed in a virtual computing environment and managed through a unified management system, which can effectively reduce maintenance costs.
  • the computing and storage of the cloud database are separated.
  • the computing layer database does not store user data, but is only responsible for computing.
  • the storage layer uses a new storage system as a shared storage system.
  • Atomic write means that all write operations in an inseparable transaction must end or roll back together, and are indivisible.
  • Traditional storage systems manage data in units of blocks (for example, a block is 512 bytes).
  • without atomic writes, a given write may partially succeed and partially fail; or, in a concurrent scenario, the data written by two threads may overwrite each other. For example, if one thread needs to write 123 and another thread needs to write 456, concurrency may result in 126 being written.
  • the function of an atomic write is to guarantee that a write either succeeds entirely or fails entirely, so the above situations can be avoided.
  • Append only storage is a file system customized for new storage hardware such as solid state drive (SSD).
  • this file system provides basic operations such as append, delete, and seal; modification operations are not supported.
  • the seal operation is peculiar to append-only storage systems: it sets a file to a read-only state in which nothing more can be appended.
  • data is divided into user data and metadata.
  • user data is the data that users actually need to read and write; it is stored in fixed-size folders, and each folder corresponds to a unique identification (ID).
  • metadata is the system data used to describe and manage the characteristics of a file, such as access permissions, the file owner, and the distribution information of the file's data blocks.
  • in a cluster file system, the distribution information includes the location of the file on a disk and the location of that disk in the cluster. A user who needs to operate on a file must first obtain its metadata in order to locate the file and obtain its content or related attributes.
  • in order to ensure the uniqueness of data writing, a master node can be selected using a distributed consistency protocol, and the master node then performs the data writes; this ensures that only one node can write data at any time, which guarantees data consistency.
  • each node has three states: a standby state, an election state, and a master-node state. All nodes are in the standby state at startup. If no message from the master node (such as heartbeat information) is received within a preset time (for example, within two heartbeat cycles), a state switch occurs and the standby state is switched to the election state. When a node is in the election state, it first votes for itself and then solicits votes from other nodes, that is, it requests the other nodes to vote for it so that it becomes the master node. If a node obtains votes from more than half of the total number of nodes in the cluster, that node becomes the new master node, and the other nodes switch to the standby state.
  • the node in the master node state is the master node of the entire cluster, and all operations such as adding, modifying, and deleting system data can only be completed through the master node.
  • the uniqueness of the master node can also be guaranteed through lock preemption.
  • for example, when a ZooKeeper cluster is used for distributed lock management, each node applies to ZooKeeper for a distributed lock.
  • ZooKeeper can grant the lock to exactly one node on a first-come, first-served basis or according to weight allocation, and the node that finally obtains the lock can write data.
  • acquiring a distributed lock takes a relatively long time and occupies considerable bandwidth; in actual use, negotiation among multiple nodes is often required, and the performance of the entire system suffers severe losses.
  • this application provides a method and related equipment for ensuring data consistency, which can ensure data consistency and avoid data conflicts without losing database performance.
  • the node may be a container, a virtual machine, a physical machine, etc., which is not limited in this application.
  • the distributed storage system 200 includes a node management server 210, a node cluster 220, and a storage device 230.
  • the node cluster 220 includes a master node 221, a standby node 222, a standby node 223, and a standby node 224. It should be understood that the node cluster 220 may also include more or fewer nodes; four nodes are used here as an example.
  • the storage device 230 includes a tenure management data storage unit 2310 and other data storage units 2320.
  • the tenure management data storage unit 2310 is used to store the tenure identifier of the master node and the root metadata identifier; the other data storage unit 2320 is used to store root metadata, metadata, and user data.
  • the node management server 210 is used to monitor the nodes in the node cluster 220. When it is monitored that the master node 221 is abnormal, a backup node is selected, for example, the backup node 222 is selected to be upgraded to a new master node.
  • the master node 221 works in read and write mode, that is, the master node 221 can read data from the storage device 230 and can also write data to the storage device 230.
  • the standby node 222, the standby node 223, and the standby node 224 work in read-only mode, that is, they can only read the data in the storage device 230 and cannot write data.
  • after the standby node 222 is upgraded to the new master node, it can update the tenure identifier of the master node in the tenure management data storage unit 2310 and can write data to the other data storage unit 2320.
  • after the standby node 222 updates the tenure identifier, the original master node 221 can determine that a new master node now exists; the original master node 221 can no longer write data, and it can terminate itself or switch its working mode to read-only mode.
  • the storage device 230 stores the tenure identifier, which can ensure that the original master node can recognize that a new master node is generated.
  • the original master node can avoid writing data at the same time as the new master node by terminating itself or switching its working mode, so as to avoid causing data conflicts and to ensure data consistency.
  • the method for ensuring data consistency includes but is not limited to the following steps:
  • S301 The first node receives the upgrade message sent by the node management server.
  • the first node may specifically be a virtual machine or a container, etc., running in a physical machine.
  • Multiple nodes form a cluster.
  • in the normal working state, there is only one master node in the cluster and the other nodes are standby nodes.
  • the master node can write data, and the standby nodes cannot write data, which ensures data consistency.
  • a cluster composed of multiple nodes can be deployed in a cloud environment, specifically on one or more computing devices in the cloud environment (such as a central server); it can also be deployed in an edge environment, specifically on one or more computing devices (edge computing devices) in the edge environment, and an edge computing device may be a server.
  • the cloud environment refers to the central computing device cluster owned by the cloud service provider that is used to provide computing, storage, and communication resources; the edge environment refers to the edge computing device cluster that is geographically far from the central cloud environment and is used to provide computing, storage, and communication resources.
  • the first node may be any standby node in the node cluster, for example the standby node 222 described above. During operation, if the first node receives an upgrade message sent by the node management server, this indicates that the current master node may have failed or be abnormal, and the first node needs to be upgraded to the new master node.
  • S302 The first node updates the tenure management data.
  • the first node needs to update the tenure management data after determining to upgrade to become the new master node.
  • the tenure management data includes a root metadata identifier and a tenure identifier.
  • the root metadata identifier is used to determine root metadata. That is, the root metadata identifier can be used to determine the specific location where the data is stored in the storage device.
  • the tenure identifier is used to indicate that the first node has been upgraded to become the new master node; that is, the tenure identifier changes whenever the master node changes, and whenever the node management server determines a new master node, the newly determined master node updates the tenure identifier.
  • for example, before the node management server determines a new master node, the tenure identifier is 5, meaning that the cluster has produced 5 master nodes; after the node management server determines the new master node, the new master node updates the tenure identifier to 6, indicating that it is the sixth master node produced by the cluster.
  • the data written by users is eventually written into files of a fixed size, and each file is assigned a unique identifier. With the increase of written data, more files are needed. In order to manage these files, some specific data need to be used. These specific data are called metadata, and the metadata includes the identification of these files. Each metadata also has a unique identifier. Similarly, in order to facilitate the management of metadata, a root file needs to be used. The root file is also called root metadata. The root metadata includes all metadata identifiers.
  • exemplarily, referring to FIG. 4, all user data is stored in different files, such as file 1.1, file 1.2, and so on.
  • one metadata object manages multiple files; for example, metadata 1 manages file 1.1, file 1.2, ..., file 1.N, and metadata 2 manages file 2.1, file 2.2, ..., file 2.N. All metadata is managed by the root metadata, there is only one root metadata object, and the root metadata corresponds to one identifier.
  • that identifier and the tenure identifier belong to the tenure management data and are stored together in the tenure management data unit.
  • in this way, the original master node can recognize that a new master node has been determined and avoid writing data again, so data conflicts are avoided and data consistency is ensured.
  • the first node updates the tenure identifier while reading the root metadata identifier.
  • the first node must ensure that reading the root metadata identifier and updating the tenure identifier occur at the same time, that is, that the operation is atomic; otherwise, the operation is abandoned and retried.
  • the original master node may have failed due to network fluctuations or other reasons, and the node management server has then elected the first node as the new master node. The original master node may return to normal after a while, but it cannot perceive that a new master node has appeared. At this point there may be concurrency, that is, the original master node may continue to write data and update the root metadata identifier. If the first node does not read the root metadata identifier and update the tenure identifier at the same time, there is a window between these two operations; for example, if the first node reads the root metadata identifier first and then updates the tenure identifier, data inconsistency may result.
  • if, while the first node is reading the root metadata identifier, the original master node modifies the root metadata identifier, then the root metadata identifier read by the first node and the root metadata identifier actually stored in the storage device will be inconsistent; since the first node relies on the root metadata identifier to read and write data, this will eventually lead to data loss or data inconsistency. If the first node reads the root metadata identifier and updates the tenure identifier at the same time, then, because the root metadata identifier is bound to the tenure identifier, the original master node must, when modifying the root metadata identifier, ensure that the tenure identifier it writes matches the current tenure identifier; otherwise the modification will not succeed.
  • as a result, the original master node will not be able to successfully modify the root metadata identifier; from the failed modification, it can determine that there is a new master node, and it will then stop modifying the root metadata identifier, avoiding data conflicts and ensuring data consistency.
  • S303: The first node sets the root metadata to read-only mode.
  • the root metadata is set to a read-only mode, that is, the root metadata will not be allowed to be modified.
  • when the original master node does not perceive that there is a new master node, it still writes data to the storage device. As shown in FIG. 4 above, the written data is stored in a fixed-size file; if the file's storage space can no longer support continued storage, a new file is created for storage. At this point the metadata needs to be modified, and relevant information such as the identifier of the newly created file must be added to the metadata. Similarly, when the storage space of the metadata is also full, a new metadata object needs to be created, and relevant information such as the identifier of the newly created metadata must be added to the root metadata. Because the first node has set the root metadata to read-only mode, the original master node will not be able to successfully modify the relevant information in the root metadata; at this point the original master node can determine that there is a new master node, and it will abandon this operation, stop writing data to the storage device, and use methods such as terminating itself to avoid data conflicts and ensure data consistency.
  • S304 The first node sets all metadata to a read-only mode.
  • the first node sets all the metadata to the read-only mode, that is, all the metadata is not allowed to be modified.
  • S305 The first node sets all user data to a read-only mode.
  • the first node sets all files storing user data to read-only mode, that is, all files are no longer allowed to write data.
  • if the original master node needs to write data, it needs to write the data into the corresponding file; because the first node has set all files to read-only mode, the original master node cannot write data to the storage device, that is, the write fails.
  • from the failed write, the original master node can confirm that there is a new master node; it will abandon this operation, stop writing data to the storage device, and use methods such as terminating itself to avoid data conflicts and ensure data consistency.
  • the first node proceeds in a hierarchical manner, setting the root metadata, metadata, and user data to read-only mode in turn, which avoids data loss, ensures data consistency, and ensures that all data that has been written to the storage device can be accurately found (a sketch of this layered sealing appears at the end of this section).
  • S306 The first node updates the root metadata identifier and writes data to the storage device.
  • the first node starts to perform the functions of the master node after setting all files storing user data to read-only mode. If the first node needs to write user data, then, because it has set all files to read-only mode, it needs to create a new file to store the written data; because the file is newly created, new metadata is also needed, and the root metadata and the root metadata identifier then need to be updated. Other nodes can only read the data in the storage device and cannot write data, which ensures data consistency.
  • steps S301 to S306 involved in the foregoing method embodiments are only schematic descriptions and summaries, and should not constitute specific limitations. The involved steps can be added, reduced, or combined as needed.
  • FIG. 5 is a schematic structural diagram of a first node provided by an embodiment of the present application.
  • the first node may be the first node in the method embodiment described in FIG. 3, and may execute the method and steps in the method embodiment described in FIG. 3 where the first node is the execution subject.
  • the first node 500 includes a receiving module 510, an updating module 520, and a processing module 530.
  • the receiving module 510 is configured to receive an upgrade message sent by a node management server, where the node management server is used to manage a node cluster, and the node cluster includes the first node;
  • the update module 520 is configured to update tenure management data, the tenure management data includes root metadata identification and tenure identification, the root metadata identification is used to determine root metadata, and the root metadata is used to manage the node cluster Corresponding metadata, where the tenure identifier is used to characterize that the first node is upgraded to the master node of the node cluster;
  • the processing module 530 is configured to set the data corresponding to the node cluster to a read-only mode, and the data includes the root metadata.
  • the update module 520 is further configured to read the root metadata identifier and update the tenure identifier at the same time.
  • the node cluster further includes a second node, where the second node is used to read and write the data corresponding to the node cluster and to update the root metadata identifier; after the update module 520 updates the tenure management data, the second node is prohibited from updating the root metadata identifier.
  • the data corresponding to the node cluster includes root metadata, metadata, and user data
  • the metadata is used to manage the user data
  • the user data is data written to the node cluster
  • the processing module 530 is further configured to set the metadata to the read-only mode after setting the root metadata to the read-only mode, and finally set the user data to the read-only mode.
  • processing module 530 is further configured to update the root metadata identifier and write data to the node cluster.
  • the receiving module 510 in the embodiment of the present application may be implemented by a transceiver or transceiver-related circuit components
  • the update module 520 and the processing module 530 may be implemented by a processor or processor-related circuit components.
  • each module in the first node may be added, reduced, or combined as needed.
  • the operation and/or function of each module in the first node is to realize the corresponding process of the method described in FIG. 3 above, and is not repeated here for brevity.
  • FIG. 6 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • the computing device 600 includes a processor 610, a communication interface 620, and a memory 630, and the processor 610, the communication interface 620, and the memory 630 are connected to each other through an internal bus 640.
  • the computing device 600 may be a computing device in cloud computing or a computing device in an edge environment.
  • the processor 610 may be composed of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a general array logic (generic array logic, GAL), or any combination thereof.
  • the bus 640 may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus 640 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 6, but it does not mean that there is only one bus or one type of bus.
  • the memory 630 may include a volatile memory, such as a random access memory (RAM); the memory 630 may also include a non-volatile memory, such as a read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 630 may also include a combination of the above types.
  • the memory 630 may be used to store programs and data, so that the processor 610 can call the program code stored in the memory 630 to implement the aforementioned method for ensuring data consistency.
  • the program code may be used to implement the functional module of the first node shown in FIG. 5, or used to implement the method steps in the method embodiment shown in FIG. 3 with the first node as the execution subject.
  • the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it can implement any part of the method described in the above method embodiments. Or all steps.
  • the embodiment of the present invention also provides a computer program, which includes instructions, when the computer program is executed by a computer, the computer can execute part or all of the steps of any method for ensuring data consistency.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only illustrative; for example, the division of the above-mentioned units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
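The layered sealing described in steps S303 to S305 above can be pictured with a short sketch. This is a minimal, illustrative Python sketch, not the patent's implementation; the class and function names are assumptions, and a simple `sealed` flag stands in for the storage system's seal operation.

```python
class Sealable:
    """Minimal stand-in for a storage object (root metadata, a metadata object,
    or a user-data file) that supports the seal operation of append-only storage."""

    def __init__(self, name):
        self.name = name
        self.sealed = False

    def seal(self):
        self.sealed = True            # read-only from now on

    def write(self, data) -> bool:
        return not self.sealed        # any write fails once the object is sealed


def take_over_as_master(root_metadata, all_metadata, all_user_files):
    # S303: seal the root metadata first, so a lagging old master can no
    # longer register new metadata objects.
    root_metadata.seal()
    # S304: seal every metadata object, so no new user-data files can be added.
    for md in all_metadata:
        md.seal()
    # S305: seal every user-data file, so the old master's writes fail outright
    # and it can recognise that a new master exists.
    for f in all_user_files:
        f.seal()
    # S306 (not shown): the new master creates fresh files for its own writes
    # and publishes them by updating the root metadata identifier.
```

Sealing from the root downward means the old master hits a failure at whichever layer it touches first, which is why the order root metadata, then metadata, then user data is used.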

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for ensuring data consistency and a related device. The method comprises: a first node receiving an upgrade message sent by a node management server, the node management server being used to manage a node cluster, and the node cluster comprising the first node; the first node updating tenure management data, wherein the tenure management data comprises a root metadata identifier and a tenure identifier, the root metadata identifier is used to determine root metadata, the root metadata is used to manage the metadata corresponding to the node cluster, and the tenure identifier is used to signify that the first node is upgraded to a master node of the node cluster; and the first node setting the data corresponding to the node cluster to a read-only mode, the data comprising root metadata. The described method can ensure data consistency and prevent data conflicts.

Description

A method and related device for ensuring data consistency
Technical Field
The invention relates to the technical field of computer distributed storage systems, and in particular to a method and related device for ensuring data consistency.
Background Art
At present, in distributed database applications, in order to ensure the uniqueness of data writing, it must be guaranteed that only one node in the entire database cluster can write data at the same time; that is, at any moment, only one master node may exist in the database cluster.
The Raft protocol is a distributed consensus protocol in which a single elected leader directs the entire cluster. In the whole cluster, only the master node can process requests sent by clients; other nodes must forward any request they receive to the master node for processing. The Raft protocol strongly relies on the master node to ensure cluster data consistency.
In a cloud computing environment, distributed locks have a very wide range of usage scenarios. For example, in a distributed system, when different devices access a shared resource, the system often needs a distributed lock to make access to the shared resource mutually exclusive and thereby ensure consistency; that is, only one node can hold the lock, and lock preemption guarantees the uniqueness of the master node.
However, both the Raft protocol and distributed locks have certain drawbacks. When the Raft protocol is used to elect a master node, the number of surviving nodes in the cluster must be greater than half of the total number of nodes, otherwise no master node can be elected. With distributed locks, in order to strictly guarantee data consistency (that is, that write operations do not conflict), a lock must be acquired for every write, and database performance suffers severe losses. Therefore, how to ensure data consistency without loss of database performance is an urgent problem to be solved.
Summary of the Invention
The embodiments of the invention disclose a method and related device for ensuring data consistency, which can ensure data consistency and avoid data conflicts without loss of database performance.
In a first aspect, the present application provides a method for ensuring data consistency, including: a first node receives an upgrade message sent by a node management server, where the node management server is used to manage a node cluster and the node cluster includes the first node; the first node updates tenure management data, where the tenure management data includes a root metadata identifier and a tenure identifier, the root metadata identifier is used to determine root metadata, the root metadata is used to manage metadata corresponding to the node cluster, and the tenure identifier is used to indicate that the first node is upgraded to the master node of the node cluster; and the first node sets the data corresponding to the node cluster to read-only mode, where the data includes the root metadata.
In the solution provided by this application, the first node is upgraded to the master node after receiving the upgrade message sent by the node management server, updates the tenure management data, and sets the data corresponding to the node cluster to read-only mode, ensuring that only one node can write data at any given time, thereby guaranteeing data consistency and avoiding data conflicts. In addition, because the entire process does not require negotiation with other nodes, system performance is preserved.
With reference to the first aspect, in a possible implementation of the first aspect, the first node updates the tenure identifier while reading the root metadata identifier.
In this embodiment of the application, the first node guarantees that reading the root metadata identifier and updating the tenure identifier are atomic, which prevents other nodes, such as the original master node, from concurrently modifying the root metadata identifier during this process and thereby causing data conflicts and data inconsistency.
With reference to the first aspect, in a possible implementation of the first aspect, the node cluster further includes a second node, where the second node is used to read and write the data corresponding to the node cluster and to update the root metadata identifier; after the first node updates the tenure management data, the root metadata identifier is locked and the second node is prohibited from updating it.
In this embodiment of the application, before the first node updates the tenure management data, the second node (for example, the original master node) can read and write the data corresponding to the node cluster and can update the root metadata identifier; after the first node updates the tenure management data, the root metadata identifier is locked. Although the second node can continue to read and write data, it is no longer allowed to modify the root metadata identifier. This ensures that in the subsequent process, at any one time only one node is able to write data, which guarantees data consistency.
With reference to the first aspect, in a possible implementation of the first aspect, the data corresponding to the node cluster includes root metadata, metadata, and user data, where the metadata is used to manage the user data and the user data is data written to the node cluster; after the first node sets the root metadata to read-only mode, it sets the metadata to read-only mode, and finally sets the user data to read-only mode.
In this embodiment of the application, the first node proceeds layer by layer, setting the root metadata, metadata, and user data to read-only mode in turn, which ensures the consistency of the data corresponding to the node cluster and ensures that the first node can accurately find all user data written to the node cluster.
With reference to the first aspect, in a possible implementation of the first aspect, the first node updates the root metadata identifier and writes data to the node cluster.
In this embodiment of the application, after the first node is upgraded to become the new master node, it can write user data to the node cluster and can manage the written user data by updating the root metadata identifier.
In a second aspect, the present application provides a first node, including: a receiving module, configured to receive an upgrade message sent by a node management server, where the node management server is used to manage a node cluster and the node cluster includes the first node; an update module, configured to update tenure management data, where the tenure management data includes a root metadata identifier and a tenure identifier, the root metadata identifier is used to determine root metadata, the root metadata is used to manage metadata corresponding to the node cluster, and the tenure identifier is used to indicate that the first node is upgraded to the master node of the node cluster; and a processing module, configured to set the data corresponding to the node cluster to read-only mode, where the data includes the root metadata.
With reference to the second aspect, in a possible implementation of the second aspect, the update module is further configured to update the tenure identifier while reading the root metadata identifier.
With reference to the second aspect, in a possible implementation of the second aspect, the node cluster further includes a second node, where the second node is used to read and write the data corresponding to the node cluster and to update the root metadata identifier; after the update module updates the tenure management data, the second node is prohibited from updating the root metadata identifier.
With reference to the second aspect, in a possible implementation of the second aspect, the data corresponding to the node cluster includes root metadata, metadata, and user data, where the metadata is used to manage the user data and the user data is data written to the node cluster; the processing module is further configured to set the root metadata to read-only mode first, then set the metadata to read-only mode, and finally set the user data to read-only mode.
With reference to the second aspect, in a possible implementation of the second aspect, the processing module is further configured to update the root metadata identifier and write data to the node cluster.
In a third aspect, the present application provides a computing device, where the computing device includes a processor and a memory, the memory is configured to store program code, and the processor is configured to call the program code in the memory to execute the method of the first aspect above or of any one of the implementations of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, it can implement the process of the method provided by the first aspect above or by any one of the implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product that includes instructions; when the computer program is executed by a computer, the computer can execute the process of the method provided by the first aspect above or by any one of the implementations of the first aspect.
Description of the Drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the following briefly introduces the drawings used in the description of the embodiments. Obviously, the drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative work.
FIG. 1 is a schematic diagram of node state switching according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method for ensuring data consistency according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data storage relationship according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a first node according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application rather than all of them.
First, some of the terms and related technologies involved in this application are explained with reference to the accompanying drawings, to facilitate understanding by those skilled in the art.
A cloud database is a stable, reliable, and elastically scalable online database service. Such a database is deployed in a virtual computing environment and managed through a unified management system, which can effectively reduce maintenance costs. The computing and storage of a cloud database are separated: the computing-layer database does not store user data and is only responsible for computing, while the storage layer uses a new type of storage system as a shared storage system.
An atomic write means that all write operations in an indivisible transaction must complete or roll back together. Traditional storage systems manage data in units of blocks (for example, a block of 512 bytes). Without atomic writes, a given write may partially succeed and partially fail; or, in a concurrent scenario, the data written by two threads may overwrite each other. For example, if one thread needs to write 123 and another thread needs to write 456, concurrency may result in 126 being written. The function of an atomic write is to guarantee that a write either succeeds entirely or fails entirely, so these situations are avoided.
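A minimal sketch of this all-or-nothing guarantee, assuming a toy in-memory block store; the class and method names are illustrative, not an API from the patent.

```python
import threading

class AtomicBlockStore:
    """Toy block store whose multi-block writes are all-or-nothing with
    respect to concurrent writers (a lock stands in for real atomicity)."""

    def __init__(self, num_blocks, block_size=512):
        self.blocks = [bytes(block_size)] * num_blocks
        self._lock = threading.Lock()

    def atomic_write(self, start, data_blocks):
        with self._lock:                       # only one writer at a time
            if start < 0 or start + len(data_blocks) > len(self.blocks):
                return False                   # nothing is written on failure
            for i, blk in enumerate(data_blocks):
                self.blocks[start + i] = blk   # all blocks are applied together
            return True
```

With this guarantee, the 123/456 interleaving in the example above cannot produce 126: one write completes fully before the other begins.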
Append-only storage is a file system customized for new storage hardware such as solid-state drives (SSD). This file system provides basic operations such as append, delete, and seal; modification operations are not supported. The seal operation is peculiar to append-only storage systems: it sets a file to a read-only state in which nothing more can be appended.
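The append and seal operations can be pictured with the following sketch of an append-only file; the names are assumptions and this is not the actual storage interface.

```python
class AppendOnlyFile:
    """Illustrative append-only file: data can be appended but never modified
    in place, and a sealed file is read-only and cannot grow."""

    def __init__(self, file_id):
        self.file_id = file_id
        self.chunks = []
        self.sealed = False

    def append(self, data: bytes) -> bool:
        if self.sealed:
            return False          # appends are rejected after seal
        self.chunks.append(data)
        return True

    def seal(self):
        self.sealed = True        # the seal operation: read-only from now on

    def read(self) -> bytes:
        return b"".join(self.chunks)
```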
In a distributed storage system, data is divided into user data and metadata. User data is the data that users actually need to read and write; it is stored in fixed-size folders, and each folder corresponds to a unique identification (ID). Metadata is the system data used to describe and manage the characteristics of a file, such as access permissions, the file owner, and the distribution information of the file's data blocks; in a cluster file system, the distribution information includes the location of the file on a disk and the location of that disk in the cluster. A user who needs to operate on a file must first obtain its metadata in order to locate the file and obtain its content or related attributes.
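The relationship between a user-data file and the metadata that describes it might be modelled as below; the field names are illustrative assumptions, not the patent's on-disk format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FileMetadata:
    file_id: str          # unique ID of the fixed-size user-data file
    owner: str            # file owner
    permissions: str      # access permissions
    block_locations: List[str] = field(default_factory=list)  # disk and cluster placement

def locate_file(metadata_index: Dict[str, FileMetadata], file_id: str) -> FileMetadata:
    # A reader must obtain the metadata first in order to locate the file
    # and then fetch its content or related attributes.
    return metadata_index[file_id]
```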
Generally, in a distributed scenario, in order to ensure the uniqueness of data writing, a master node can be elected using a distributed consistency protocol, and the master node then performs the data writes; this ensures that only one node can write data at any time, which guarantees data consistency.
Referring to FIG. 1, each node has three states: a standby state, an election state, and a master-node state. All nodes are in the standby state at startup. If no message from the master node (such as heartbeat information) is received within a preset time (for example, within two heartbeat cycles), a state switch occurs and the node switches from the standby state to the election state. When a node is in the election state, it first votes for itself and then solicits votes from other nodes, that is, it requests the other nodes to vote for it so that it becomes the master node. If a node obtains votes from more than half of the total number of nodes in the cluster, that node becomes the new master node, and the other nodes switch to the standby state. The node in the master-node state is the master node of the entire cluster, and all operations on system data such as additions, modifications, and deletions can only be completed through the master node.
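A compact sketch of this three-state switching (standby, election, master); the thresholds and vote counting are simplified assumptions.

```python
import enum

class State(enum.Enum):
    STANDBY = "standby"
    ELECTION = "election"
    MASTER = "master"

class ConsensusNode:
    def __init__(self, node_id: str, cluster_size: int):
        self.node_id = node_id
        self.cluster_size = cluster_size
        self.state = State.STANDBY            # every node starts in standby

    def on_heartbeat_timeout(self):
        # No message from the master within the preset time: start an election.
        self.state = State.ELECTION

    def on_election_result(self, votes_from_others: int):
        votes = 1 + votes_from_others         # the node first votes for itself
        if votes > self.cluster_size // 2:    # strictly more than half the nodes
            self.state = State.MASTER
        else:
            self.state = State.STANDBY
```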
It can be understood that, under the algorithm logic of such a distributed consistency protocol, it is guaranteed that there is only one master node at any time, that is, data consistency is guaranteed. However, when the number of surviving nodes in the cluster is less than or equal to half of the total number of nodes, a new master node can no longer be elected.
In addition, in a distributed scenario, the uniqueness of the master can also be guaranteed through lock preemption. For example, when a ZooKeeper cluster is used for distributed lock management, each node applies to ZooKeeper for a distributed lock; ZooKeeper can grant the lock to exactly one node on a first-come, first-served basis or according to weight allocation, and the node that finally obtains the lock can write data.
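As a concrete illustration of lock preemption, the sketch below uses the kazoo ZooKeeper client's lock recipe; the host address, lock path, and the `write_data_as_master` routine are illustrative assumptions, not part of the patent.

```python
from kazoo.client import KazooClient

def run_as_lock_holder(write_data_as_master):
    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()
    # Every node contends for the same lock path; ZooKeeper grants it to one
    # holder at a time, and only the holder is allowed to write data.
    lock = zk.Lock("/cluster/master-lock", identifier="node-1")
    try:
        with lock:                    # blocks until this node holds the lock
            write_data_as_master()    # hypothetical write routine of the holder
    finally:
        zk.stop()
```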
It should be understood that acquiring a distributed lock takes a relatively long time and occupies considerable bandwidth. In actual use, negotiation among multiple nodes is often required, and the performance of the entire system suffers severe losses.
In order to solve the above problems, this application provides a method and related device for ensuring data consistency, which can ensure data consistency and avoid data conflicts without losing database performance. A node may be a container, a virtual machine, a physical machine, or the like, which is not limited in this application.
Referring to FIG. 2, FIG. 2 shows a possible application scenario of an embodiment of the present application. In this application scenario, a distributed storage system 200 includes a node management server 210, a node cluster 220, and a storage device 230. The node cluster 220 includes a master node 221, a standby node 222, a standby node 223, and a standby node 224. It should be understood that the node cluster 220 may also include more or fewer nodes; four nodes are used here as an example. The storage device 230 includes a tenure management data storage unit 2310 and another data storage unit 2320. The tenure management data storage unit 2310 is used to store the tenure identifier of the master node and the root metadata identifier; the other data storage unit 2320 is used to store root metadata, metadata, and user data. The node management server 210 is used to monitor the nodes in the node cluster 220; when it detects that the master node 221 is abnormal, it selects a standby node, for example the standby node 222, to be upgraded to the new master node. The master node 221 works in read and write mode, that is, it can read data from the storage device 230 and can also write data to it; the standby node 222, the standby node 223, and the standby node 224 work in read-only mode, that is, they can only read data from the storage device 230 and cannot write data. In particular, after the standby node 222 is upgraded to the new master node, it can update the tenure identifier of the master node in the tenure management data storage unit 2310 and can write data to the other data storage unit 2320. After the standby node 222 updates the tenure identifier, the original master node 221 can determine that a new master node now exists; the original master node 221 can no longer write data, and it can terminate itself or switch its working mode to read-only mode.
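The split of the storage device 230 into a tenure management data storage unit and a unit for the remaining data can be pictured as follows; the class and field names are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class TenureRecord:
    """What the tenure management data storage unit 2310 holds: the master's
    tenure identifier bound to the root metadata identifier."""
    tenure_id: int            # increases each time a new master is promoted
    root_metadata_id: str     # locates the current root metadata in storage

@dataclass
class StorageDevice:
    """Illustrative storage device 230: the tenure unit plus the unit that
    stores root metadata, metadata, and user data."""
    tenure_unit: TenureRecord = field(default_factory=lambda: TenureRecord(0, "root-0"))
    other_data_unit: dict = field(default_factory=dict)
```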
It is easy to understand that, by having the node management server 210 manage the node cluster 220, a new master node can be selected directly when the master node 221 fails, without using a distributed lock to determine the new master node, which improves system performance. In addition, the storage device 230 stores the tenure identifier, which ensures that the original master node can recognize that a new master node has been produced; the original master node can then avoid writing data concurrently with the new master node by terminating itself or switching its working mode, thereby avoiding data conflicts and ensuring data consistency.
With reference to the application scenario shown in Fig. 2, the method for ensuring data consistency provided by an embodiment of this application is described below in conjunction with Fig. 3. As shown in Fig. 3, the method includes, but is not limited to, the following steps:
S301: The first node receives an upgrade message sent by the node management server.
Specifically, the first node may be a virtual machine or a container running on a physical machine. Multiple nodes form a cluster, such as the node cluster 220 described above. In normal operation there is only one master node in the cluster and the other nodes are standby nodes; the master node can write data, while the standby nodes cannot, so as to ensure data consistency. A cluster of multiple nodes may be deployed in a cloud environment, specifically on one or more computing devices in the cloud environment (for example, central servers); it may also be deployed in an edge environment, specifically on one or more computing devices (edge computing devices) in the edge environment, where an edge computing device may be a server. The cloud environment refers to a central cluster of computing devices owned by a cloud service provider and used to provide computing, storage, and communication resources; the edge environment refers to a cluster of edge computing devices that is geographically far from the central cloud environment and is used to provide computing, storage, and communication resources.
Further, the first node may be any standby node in the node cluster, for example the standby node 222 described above. If, during operation, the first node receives an upgrade message sent by the node management server, this indicates that the current master node may be faulty or abnormal, and the first node needs to be upgraded to the new master node.
S302: The first node updates the tenure management data.
Specifically, after determining that it is to be upgraded to the new master node, the first node needs to update the tenure management data. The tenure management data includes a root metadata identifier and a tenure identifier. The root metadata identifier is used to determine the root metadata, that is, with the root metadata identifier the specific location where data is stored in the storage device can be determined. The tenure identifier indicates that the first node has been upgraded to the new master node, that is, the tenure identifier changes whenever the master node changes: each time the node management server determines a new master node, the newly determined master node updates the tenure identifier. For example, before the node management server determines a new master node, the tenure identifier is 5, meaning that the cluster has produced five master nodes; after the node management server determines the new master node, the new master node updates the tenure identifier to 6, indicating that it is the sixth master node produced by the cluster.
It should be noted that, in a distributed storage system, the data written by users is ultimately written into files of a fixed size, and each file is assigned a unique identifier. As more data is written, more files are needed. To manage these files, specific data called metadata is used; the metadata includes the identifiers of these files. Each piece of metadata also has a unique identifier. Likewise, to facilitate the management of the metadata, a root file is used; this root file is called the root metadata, and it includes the identifiers of all the metadata.
For example, referring to Fig. 4, all user data is stored in different files, such as file 1.1, file 1.2, and so on. One piece of metadata manages multiple files: for example, metadata 1 manages files 1.1, 1.2, ..., 1.N, and metadata 2 manages files 2.1, 2.2, ..., 2.N. All the metadata is managed by the root metadata, of which there is only one. The root metadata also corresponds to an identifier; this identifier and the tenure identifier belong to the tenure management data and are stored together in the tenure management data unit.
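As a purely illustrative sketch of this hierarchy (the identifier formats such as "root-7", "meta-1", and "file-1.2" are assumptions), locating a piece of user data starting from the root metadata identifier can be expressed as:

```python
# Illustrative only: the storage device is modeled as plain dictionaries.
storage = {
    "root_metadata": {"root-7": ["meta-1", "meta-2"]},        # root metadata -> metadata ids
    "metadata": {"meta-1": ["file-1.1", "file-1.2"],          # each metadata -> file ids
                 "meta-2": ["file-2.1"]},
    "files": {"file-1.1": b"...", "file-1.2": b"...", "file-2.1": b"..."},
}

def resolve_file(storage, root_metadata_id, file_id):
    """Walk root metadata -> metadata -> file, mirroring the Fig. 4 hierarchy."""
    for meta_id in storage["root_metadata"][root_metadata_id]:
        if file_id in storage["metadata"][meta_id]:
            return storage["files"][file_id]
    raise KeyError(f"{file_id} is not reachable from root metadata {root_metadata_id}")

print(resolve_file(storage, "root-7", "file-1.2"))
```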
It can be seen that, by updating the tenure identifier, the first node enables the original master node to recognize and determine that a new master node has been produced and to avoid writing data again, thereby avoiding data conflicts and ensuring data consistency.
In a possible implementation, the first node updates the tenure identifier at the same time as it reads the root metadata identifier.
Specifically, the first node must ensure that reading the root metadata identifier and updating the tenure identifier happen at the same time, that is, the operation is atomic; otherwise, it abandons the operation and retries.
It is easy to understand that the node management server may have judged the original master node to be faulty because of network fluctuations or similar reasons and re-elected the first node as the new master node, while the original master node may recover after a while without being able to perceive that a new master node has appeared. At this point concurrency may occur, that is, the original master node may continue to write data and update the root metadata identifier. If the first node does not read the root metadata identifier and update the tenure identifier at the same time, a time gap exists between the two operations; for example, if the first node first reads the root metadata identifier and only then updates the tenure identifier, data inconsistency may result.
For example, if the original master node modifies the root metadata identifier at the same time as the first node reads it, the root metadata identifier read by the first node will be inconsistent with the root metadata identifier actually stored in the storage device; since the first node relies on the root metadata identifier to read and write data, this eventually leads to data loss or data inconsistency. If, however, the first node reads the root metadata identifier and modifies the tenure identifier in a single atomic step, then, because the root metadata identifier is bound to the tenure identifier, the original master node can modify the root metadata identifier only if the stored tenure identifier still matches the tenure it wrote; otherwise the modification will not succeed. In this case the original master node cannot successfully modify the root metadata identifier, from which it can determine that a new master node already exists; it then stops modifying the root metadata identifier, thereby avoiding data conflicts and ensuring data consistency.
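A minimal sketch of this mechanism is given below, assuming that the storage device offers atomic conditional operations on the tenure management data; the class and method names are illustrative assumptions, not an actual storage API. A thread lock stands in for the storage-side atomicity.

```python
import threading

class TenureManagementUnit:
    """Sketch of the tenure management data storage unit (names assumed)."""

    def __init__(self, term=5, root_metadata_id="root-7"):
        self._lock = threading.Lock()   # stands in for storage-side atomicity
        self.term = term
        self.root_metadata_id = root_metadata_id

    def read_root_id_and_bump_term(self, new_term):
        # Step S302: the new master reads the root metadata identifier and
        # updates the tenure identifier in one atomic operation.
        with self._lock:
            if new_term <= self.term:   # a node with a newer term got there first
                return None
            self.term = new_term
            return self.root_metadata_id

    def update_root_id_if_term_matches(self, caller_term, new_root_id):
        # The root metadata identifier is bound to the tenure identifier: a writer
        # may change it only while the stored term is still the term it wrote.
        with self._lock:
            if self.term != caller_term:
                return False            # a new master exists; the caller must stop writing
            self.root_metadata_id = new_root_id
            return True

unit = TenureManagementUnit(term=5, root_metadata_id="root-7")
root_id = unit.read_root_id_and_bump_term(new_term=6)          # new master reads "root-7", term becomes 6
assert not unit.update_root_id_if_term_matches(5, "root-8")     # old master (term 5) is rejected
```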
S303: The first node sets the root metadata to read-only mode.
Specifically, after updating the tenure management data, the first node sets the root metadata to read-only mode, that is, the root metadata is no longer allowed to be modified.
It should be understood that the original master node, unaware that a new master node already exists, may still write data to the storage device. As described above with respect to Fig. 4, the written data is stored in a fixed-size file. If that file can no longer hold additional data, a new file is created; the metadata must then be modified to add the identifier of the newly created file and other related information. Similarly, when the storage space of the metadata is also full, a new piece of metadata must be created, and the identifier of that new metadata and other related information must be added to the root metadata. Because the first node has already set the root metadata to read-only mode, the original master node cannot successfully modify the relevant information in the root metadata. At this point the original master node can determine that a new master node already exists; it abandons the operation, stops writing data to the storage device, and avoids data conflicts, for example by terminating itself, so as to ensure data consistency.
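The following sketch, with assumed names and an assumed limit of two files per metadata entry, illustrates how the original master node's attempt to register a newly created file is stopped once the corresponding level has been set to read-only; it is not an implementation of the embodiments.

```python
class ReadOnlyError(Exception):
    pass

def register_new_file(root_metadata, metadata, read_only_flags, meta_id, new_file_id,
                      max_files_per_meta=2):
    """Old master's write path: adding a file may cascade up to the root metadata."""
    if read_only_flags.get("metadata"):
        raise ReadOnlyError("metadata is read-only: a new master already exists")
    if len(metadata[meta_id]) < max_files_per_meta:
        metadata[meta_id].append(new_file_id)      # room left: only this metadata changes
        return
    # The metadata is full: a new metadata entry must be created and added to the
    # root metadata, which the new master has already frozen in step S303.
    if read_only_flags.get("root_metadata"):
        raise ReadOnlyError("root metadata is read-only: a new master already exists")
    new_meta_id = f"meta-{len(root_metadata) + 1}"
    metadata[new_meta_id] = [new_file_id]
    root_metadata.append(new_meta_id)

try:
    register_new_file(["meta-1"], {"meta-1": ["file-1.1", "file-1.2"]},
                      {"root_metadata": True}, "meta-1", "file-1.3")
except ReadOnlyError as reason:
    print("old master abandons the write and demotes itself:", reason)
```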
S304: The first node sets all the metadata to read-only mode.
Specifically, after setting the root metadata to read-only mode, the first node sets all the metadata to read-only mode, that is, none of the metadata is allowed to be modified.
It is easy to understand that, if the original master node writes data without being aware of the new master node and the written data requires the metadata to be modified, the modification fails because the first node has already set all the metadata to read-only mode. The original master node can then confirm that a new master node already exists, abandon the operation, stop writing data to the storage device, and avoid data conflicts, for example by terminating itself, so as to ensure data consistency.
S305: The first node sets all the user data to read-only mode.
Specifically, after setting all the metadata to read-only mode, the first node sets all the files storing user data to read-only mode, that is, data can no longer be written into any of the files.
As can be seen from the descriptions of S303 and S304 above, if the original master node needs to write data, it must write the data into the corresponding file; because the first node has set all the files to read-only mode, the original master node cannot write data to the storage device, that is, the write fails. At this point the original master node can confirm that a new master node already exists; it abandons the operation, stops writing data to the storage device, and avoids data conflicts, for example by terminating itself, so as to ensure data consistency.
It can be understood that the first node applies the settings level by level, setting the root metadata, the metadata, and the user data to read-only mode in turn. This avoids data loss, ensures data consistency, and ensures that all data already written to the storage device can be located accurately.
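A compact sketch of this top-down freeze (steps S303 to S305), with assumed flag names, is given below; the point of the ordering is that any concurrent write by the original master node is caught at whichever level it needs to touch next.

```python
def freeze_for_upgrade(read_only_flags, metadata_ids, file_ids):
    """Set the layers read-only from the top of the Fig. 4 hierarchy downwards."""
    read_only_flags["root_metadata"] = True                # S303: root metadata first
    for meta_id in metadata_ids:                           # S304: then every metadata
        read_only_flags[("metadata", meta_id)] = True
    for file_id in file_ids:                               # S305: finally every user-data file
        read_only_flags[("file", file_id)] = True
    return read_only_flags

flags = freeze_for_upgrade({}, ["meta-1", "meta-2"], ["file-1.1", "file-1.2", "file-2.1"])
```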
S306: The first node updates the root metadata identifier and writes data to the storage device.
Specifically, after setting all the files storing user data to read-only mode, the first node begins to perform the functions of the master node. If the first node needs to write user data, then, because it has set all the existing files to read-only mode, it must create a new file to store the written data; because a new file is created, the metadata must be updated, which in turn requires updating the root metadata and the root metadata identifier. The other nodes can only read the data in the storage device and cannot write data, which ensures data consistency.
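For illustration only, the new master node's first write after the freeze can be sketched as follows; the dictionary-based storage model and the identifier formats are assumptions, and the final check stands in for the term-bound conditional update described in S302.

```python
def first_write_as_new_master(storage, tenure, payload, my_term):
    """New master's write path: new file, new metadata entry, new root metadata identifier."""
    new_file_id = f"file-term{my_term}-1"
    storage["files"][new_file_id] = bytearray(payload)            # fresh fixed-size file
    new_meta_id = f"meta-term{my_term}-1"
    storage["metadata"][new_meta_id] = [new_file_id]              # fresh metadata entry
    old_root_id = tenure["root_metadata_id"]
    new_root_id = f"root-term{my_term}"
    storage["root_metadata"][new_root_id] = (
        storage["root_metadata"][old_root_id] + [new_meta_id]     # new root metadata version
    )
    if tenure["term"] != my_term:                                 # identifier is bound to the term
        raise RuntimeError("a newer master exists; abandon the write")
    tenure["root_metadata_id"] = new_root_id                      # publish the new identifier
    return new_root_id

storage = {"files": {}, "metadata": {}, "root_metadata": {"root-7": []}}
tenure = {"term": 6, "root_metadata_id": "root-7"}
print(first_write_as_new_master(storage, tenure, b"hello", my_term=6))   # -> "root-term6"
```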
It should be understood that steps S301 to S306 in the above method embodiment are only a schematic summary and should not be construed as a specific limitation; the steps involved may be added, removed, or combined as needed.
The methods of the embodiments of this application have been described in detail above. To facilitate better implementation of the above solutions of the embodiments of this application, related devices for implementing the above solutions are provided below.
Referring to Fig. 5, Fig. 5 is a schematic structural diagram of a first node provided by an embodiment of this application. The first node may be the first node in the method embodiment described in Fig. 3 and can perform the methods and steps, in the data-consistency method embodiment of Fig. 3, in which the first node is the executing entity. As shown in Fig. 5, the first node 500 includes a receiving module 510, an updating module 520, and a processing module 530, where:
the receiving module 510 is configured to receive an upgrade message sent by a node management server, where the node management server is used to manage a node cluster, and the node cluster includes the first node;
the updating module 520 is configured to update tenure management data, where the tenure management data includes a root metadata identifier and a tenure identifier, the root metadata identifier is used to determine the root metadata, the root metadata is used to manage the metadata corresponding to the node cluster, and the tenure identifier indicates that the first node has been upgraded to the master node of the node cluster;
the processing module 530 is configured to set the data corresponding to the node cluster to read-only mode, where the data includes the root metadata.
In an embodiment, the updating module 520 is further configured to update the tenure identifier at the same time as it reads the root metadata identifier.
In an embodiment, the node cluster further includes a second node, and the second node is used to read and write the data corresponding to the node cluster and to update the root metadata identifier; after the updating module 520 updates the tenure management data, the root metadata identifier is prohibited from being updated by the second node.
In an embodiment, the data corresponding to the node cluster includes the root metadata, the metadata, and user data, where the metadata is used to manage the user data and the user data is data written to the node cluster; the processing module 530 is further configured to set the metadata to read-only mode after setting the root metadata to read-only mode, and finally to set the user data to read-only mode.
In an embodiment, the processing module 530 is further configured to update the root metadata identifier and write data to the node cluster.
It can be understood that the receiving module 510 in the embodiments of this application may be implemented by a transceiver or transceiver-related circuit components, and the updating module 520 and the processing module 530 may be implemented by a processor or processor-related circuit components.
It should be noted that the above structure of the first node is only an example and should not be construed as a specific limitation; the modules in the first node may be added, removed, or combined as needed. In addition, the operations and/or functions of the modules in the first node are intended to implement the corresponding flows of the method described in Fig. 3 and, for brevity, are not described again here.
Referring to Fig. 6, Fig. 6 is a schematic structural diagram of a computing device provided by an embodiment of this application. As shown in Fig. 6, the computing device 600 includes a processor 610, a communication interface 620, and a memory 630, which are connected to one another through an internal bus 640. It should be understood that the computing device 600 may be a computing device in a cloud environment or a computing device in an edge environment.
The processor 610 may consist of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The bus 640 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 640 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Fig. 6, but this does not mean that there is only one bus or only one type of bus.
The memory 630 may include volatile memory, such as random access memory (RAM); the memory 630 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 630 may also include a combination of the above types. The memory 630 may be used to store programs and data, so that the processor 610 can call the program code stored in the memory 630 to implement the above method for ensuring data consistency. The program code may be used to implement the functional modules of the first node shown in Fig. 5, or to implement the method steps, in the method embodiment shown in Fig. 3, in which the first node is the executing entity.
This application also provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program; when the computer program is executed by a processor, some or all of the steps recorded in any one of the above method embodiments can be implemented.
An embodiment of the present invention also provides a computer program, where the computer program includes instructions; when the computer program is executed by a computer, the computer can execute some or all of the steps of any method for ensuring data consistency.
In the above embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should know that this application is not limited by the described order of actions, because according to this application some steps may be performed in another order or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; for example, the division of the above units is only a division of logical functions, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

Claims (13)

1. A method for ensuring data consistency, characterized in that the method comprises:
receiving, by a first node, an upgrade message sent by a node management server, wherein the node management server is used to manage a node cluster, and the node cluster comprises the first node;
updating, by the first node, tenure management data, wherein the tenure management data comprises a root metadata identifier and a tenure identifier, the root metadata identifier is used to determine root metadata, the root metadata is used to manage metadata corresponding to the node cluster, and the tenure identifier is used to indicate that the first node has been upgraded to a master node of the node cluster;
setting, by the first node, data corresponding to the node cluster to a read-only mode, wherein the data comprises the root metadata.
2. The method according to claim 1, characterized in that the updating, by the first node, of the tenure management data comprises:
updating, by the first node, the tenure identifier at the same time as reading the root metadata identifier.
3. The method according to claim 1 or 2, characterized in that the node cluster further comprises a second node, and the second node is used to read and write the data corresponding to the node cluster and to update the root metadata identifier; and after the first node updates the tenure management data:
the root metadata identifier is locked, and the root metadata identifier is prohibited from being updated by the second node.
4. The method according to any one of claims 1-3, characterized in that the data corresponding to the node cluster comprises the root metadata, the metadata, and user data, the metadata is used to manage the user data, and the user data is data written to the node cluster; and the setting, by the first node, of the data corresponding to the node cluster to the read-only mode comprises:
setting, by the first node, the metadata to the read-only mode after setting the root metadata to the read-only mode, and finally setting the user data to the read-only mode.
5. The method according to any one of claims 1-4, characterized in that, after the first node sets the data corresponding to the node cluster to the read-only mode, the method further comprises:
updating, by the first node, the root metadata identifier and writing user data to the node cluster.
6. A first node, characterized in that it comprises:
a receiving module, configured to receive an upgrade message sent by a node management server, wherein the node management server is used to manage a node cluster, and the node cluster comprises the first node;
an updating module, configured to update tenure management data, wherein the tenure management data comprises a root metadata identifier and a tenure identifier, the root metadata identifier is used to determine root metadata, the root metadata is used to manage metadata corresponding to the node cluster, and the tenure identifier is used to indicate that the first node has been upgraded to a master node of the node cluster;
a processing module, configured to set data corresponding to the node cluster to a read-only mode, wherein the data comprises the root metadata.
7. The first node according to claim 6, characterized in that the updating module is further configured to update the tenure identifier at the same time as reading the root metadata identifier.
8. The first node according to claim 6 or 7, characterized in that the node cluster further comprises a second node, and the second node is used to read and write the data corresponding to the node cluster and to update the root metadata identifier; and after the updating module updates the tenure management data, the root metadata identifier is prohibited from being updated by the second node.
9. The first node according to any one of claims 6-8, characterized in that the data corresponding to the node cluster comprises the root metadata, the metadata, and user data, the metadata is used to manage the user data, and the user data is data written to the node cluster;
the processing module is further configured to set the metadata to the read-only mode after setting the root metadata to the read-only mode, and finally to set the user data to the read-only mode.
10. The first node according to any one of claims 6-9, characterized in that
the processing module is further configured to update the root metadata identifier and write data to the node cluster.
11. A computing device, characterized in that the computing device comprises at least one storage unit and at least one processor, the at least one storage unit is configured to store at least one instruction, and the at least one processor executes the at least one instruction to implement the method according to any one of claims 1-5.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 1-5 is implemented.
13. A computer program product, characterized in that the computer program product comprises instructions, and when the instructions are executed by a computer, the method according to any one of claims 1-5 is implemented.
PCT/CN2020/096005 2019-10-31 2020-06-14 Method for ensuring data consistency and related device WO2021082465A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911057345.XA CN112749178A (en) 2019-10-31 2019-10-31 Method for ensuring data consistency and related equipment
CN201911057345.X 2019-10-31

Publications (1)

Publication Number Publication Date
WO2021082465A1 true WO2021082465A1 (en) 2021-05-06

Family

ID=75645771

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/096005 WO2021082465A1 (en) 2019-10-31 2020-06-14 Method for ensuring data consistency and related device

Country Status (2)

Country Link
CN (1) CN112749178A (en)
WO (1) WO2021082465A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282334A (en) * 2021-06-07 2021-08-20 深圳华锐金融技术股份有限公司 Method and device for recovering software defects, computer equipment and storage medium
CN113326251B (en) * 2021-06-25 2024-02-23 深信服科技股份有限公司 Data management method, system, device and storage medium
CN113448649B (en) * 2021-07-06 2023-07-14 聚好看科技股份有限公司 Redis-based home page data loading server and method
CN114844799A (en) * 2022-05-27 2022-08-02 深信服科技股份有限公司 Cluster management method and device, host equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170272100A1 (en) * 2016-03-15 2017-09-21 Cloud Crowding Corp. Distributed Storage System Data Management And Security
CN109729129A (en) * 2017-10-31 2019-05-07 华为技术有限公司 Configuration modification method, storage cluster and the computer system of storage cluster
CN110096237A (en) * 2019-04-30 2019-08-06 北京百度网讯科技有限公司 Replica processes method and node, storage system, server, readable medium
CN110377577A (en) * 2018-04-11 2019-10-25 北京嘀嘀无限科技发展有限公司 Method of data synchronization, device, system and computer readable storage medium

Also Published As

Publication number Publication date
CN112749178A (en) 2021-05-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20883028

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20883028

Country of ref document: EP

Kind code of ref document: A1