WO2024040902A1 - Data access method, distributed database system and computing device cluster

Data access method, distributed database system and computing device cluster

Info

Publication number
WO2024040902A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
transaction
snapshot point
nodes
Application number
PCT/CN2023/079068
Other languages
English (en)
French (fr)
Inventor
Xu Yiliang (徐宜良)
Original Assignee
Huawei Cloud Computing Technologies Co., Ltd.
Application filed by Huawei Cloud Computing Technologies Co., Ltd.
Publication of WO2024040902A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/23 Updating
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • This application relates to the field of database technology, and in particular to a data access method, a distributed database system and a computing device cluster.
  • Read-write separation is often used to improve the read performance of distributed database systems. For example, read-write separation is achieved by setting up a primary node and a standby node for each data shard in a distributed database system: the primary node is responsible for write operations and the standby node is responsible for read operations.
  • However, if data spans data shard A and data shard B, and the standby node for shard A has already synchronized an update while the standby node for shard B has not, a read will obtain the updated data from the standby node corresponding to data shard A but the pre-update data from the standby node corresponding to data shard B, causing the data obtained not to meet the consistency principle.
  • the embodiments of this application provide a data access method, a distributed database system and a computing device cluster, which can ensure that the data obtained from the standby node meets the consistency principle in a read-write separation scenario.
  • the technical solution is as follows:
  • a data access method is provided, which is applied to a distributed database system.
  • the system includes a coordination node and multiple groups of data nodes.
  • Each group of data nodes includes a master node and a backup node.
  • the method includes:
  • the coordination node determines a first snapshot point from the transaction snapshot points reported by multiple standby nodes in the multiple groups of data nodes, where the first snapshot point indicates the transaction with the earliest submission order among the latest transactions on the multiple standby nodes that have completed data synchronization with the master node;
  • the coordination node sends the data access request and the first snapshot point to the multiple backup nodes;
  • based on the data access request and the first snapshot point, the multiple backup nodes send the data fragments of the target version of the data to the coordination node, where the target version is the version corresponding to the first snapshot point.
  • the transaction snapshot point is the transaction submission snapshot point obtained when the transaction submission is completed.
  • the transaction identifier of each transaction is globally unique, the transaction snapshot point of each transaction is globally unique, and the transaction snapshot point can indicate the order in which transactions are submitted.
  • Taking the transaction snapshot point as a transaction commit sequence number (CSN) as an example, if the CSNs of transaction A, transaction B, and transaction C are 111, 113, and 112 respectively, the submission order of these three transactions is transaction A, transaction C, and transaction B.
  • In the above method, the coordination node responds to the data access request for the data, determines the first snapshot point from the transaction snapshot points reported by multiple standby nodes in the multiple groups of data nodes, and sends the first snapshot point together with the data access request to the multiple backup nodes, so that the multiple backup nodes return data fragments of the target version of the data based on the first snapshot point.
  • Since the first snapshot point can indicate the transaction with the earliest submission order among the latest transactions on multiple standby nodes that have completed data synchronization with the primary node, the first snapshot point can be used as a globally consistent read snapshot point, ensuring that multiple standby nodes return the same version of data shards, thus meeting the data consistency principle.
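  • As a concrete illustration, the following is a minimal sketch (not taken from the patent text; the names target_list and involved, and the use of plain integers as CSNs, are assumptions) of how a coordination node might derive the first snapshot point as the minimum CSN reported by the standby nodes involved in a request:

```python
# Hypothetical sketch: the coordination node keeps the latest CSN reported by
# each standby node and picks the smallest one among the nodes involved in a
# read, i.e. the earliest-committed of their latest synchronized transactions.
target_list = {"standby1": 115, "standby2": 116, "standby3": 117}

def first_snapshot_point(reported: dict, involved: list) -> int:
    # The minimum reported CSN is a version every involved standby has reached.
    return min(reported[node] for node in involved)

print(first_snapshot_point(target_list, ["standby1", "standby2"]))  # -> 115
```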
  • the multiple backup nodes send the data fragments of the target version of the data to the coordination node based on the data access request and the first snapshot point, including:
  • based on the data access request and the first snapshot point, if the transaction snapshot point of the latest transaction on the first backup node that has completed data synchronization with the primary node is greater than or equal to the first snapshot point, the first backup node sends the data fragments of the target version of the data on the first backup node to the coordination node, where the first backup node is any backup node among the plurality of backup nodes.
  • the above process is the process in which the standby node determines data visibility based on the first snapshot point. In this way, the standby node can return the version of data shards corresponding to the first snapshot point, satisfying the data consistency principle.
  • the method further includes:
  • based on the data access request and the first snapshot point, if the transaction snapshot point of the latest transaction on the first backup node that has completed data synchronization with the primary node is less than the first snapshot point, the first backup node sends the data access request and the first snapshot point to the primary node corresponding to the first backup node;
  • the master node corresponding to the first backup node sends the data fragments of the target version of the data to the coordination node based on the data access request and the first snapshot point.
  • that is, when the target version of the data fragment is not yet available on a backup node, the primary node corresponding to the backup node returns the execution results to the coordination node, thus ensuring that the client can receive complete data access results.
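  • The decision a standby node makes can be sketched as follows (a hypothetical illustration, not the patent's implementation; read_at and forward_to_primary are assumed helper stubs):

```python
# Hypothetical sketch: a standby node either serves the read at the first
# snapshot point or forwards it to its primary when it has not yet replayed
# up to that point.
def handle_request(standby_max_csn: int, first_snapshot: int,
                   read_at, forward_to_primary):
    if standby_max_csn >= first_snapshot:
        # Synchronized far enough: the version at the snapshot is visible here.
        return read_at(first_snapshot)
    # Lagging behind the snapshot: let the corresponding primary node answer.
    return forward_to_primary(first_snapshot)

# Example with stub helpers:
result = handle_request(
    120, 119,
    read_at=lambda csn: f"shard@CSN{csn} from standby",
    forward_to_primary=lambda csn: f"shard@CSN{csn} from primary")
print(result)  # shard@CSN119 from standby
```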
  • the method further includes:
  • the multiple backup nodes send the transaction snapshot points of the latest transactions on the multiple backup nodes that have completed data synchronization with the primary node to the coordination node every first interval;
  • the coordination node updates the transaction snapshot points reported by the multiple standby nodes based on the received transaction snapshot points.
  • each standby node in the distributed database system maintains a maximum transaction snapshot point internally and reports it to the coordination node regularly, so that the coordination node knows the latest transaction on each standby node in the system that has completed data synchronization with the primary node, which provides the basis for ensuring the data consistency of each data shard when subsequently reading data based on data access requests.
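  • A minimal sketch of the periodic report (hypothetical; get_max_csn and send are assumed stubs, and the loop is bounded to a few rounds only so the demo terminates):

```python
# Hypothetical sketch: every `interval` seconds a standby node sends its
# current maximum replayed CSN to the coordination node.
import time

def report_loop(node_id: str, get_max_csn, send,
                interval: float = 5.0, rounds: int = 3):
    for _ in range(rounds):
        send(node_id, get_max_csn())  # report the latest synchronized transaction
        time.sleep(interval)

csns = iter([115, 118, 121])
report_loop("standby1",
            get_max_csn=lambda: next(csns),
            send=lambda n, c: print(f"{n} reports CSN {c}"),
            interval=0.01)
```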
  • the transaction snapshot point is any of the following: a transaction commit sequence number (CSN) or a transaction commit timestamp.
  • the coordination node determines the first snapshot point from transaction snapshot points reported by multiple standby nodes in the multiple groups of data nodes, including:
  • the coordination node determines the first snapshot point from a target list, where the target list is used to store the node identifiers of the multiple standby nodes and the transaction snapshot points reported by the multiple standby nodes.
  • the transaction snapshot points reported by the multiple standby nodes indicate the latest transactions on the multiple standby nodes that have completed data synchronization with the primary node.
  • the coordination node stores the transaction snapshot points reported by each standby node in the form of a list, which facilitates query and improves data access efficiency.
  • the method further includes:
  • in response to a data cleaning request sent by a first master node, the coordination node determines a second snapshot point from the transaction snapshot points reported by the plurality of backup nodes, where the second snapshot point indicates the transaction with the earliest submission order among the latest transactions on the plurality of backup nodes that have completed data synchronization with the master node.
  • the first master node is any one of the multiple master nodes;
  • the coordination node sends the second snapshot point to the first master node
  • the first master node cleans the data fragments of the historical version of the data on the first master node, and the historical version is the version before the second snapshot point.
  • Since the second snapshot point can indicate the transaction with the earliest submission order among the latest transactions on multiple standby nodes that have completed data synchronization with the primary node, the second snapshot point can be used as a snapshot point for globally consistent cleanup, ensuring the data consistency of each data shard when the master node performs data cleaning.
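  • A minimal sketch of the cleanup rule (hypothetical; it assumes versions are kept per shard as (CSN, payload) pairs, and it additionally retains the newest version at or below the cleanup point because that version is still visible to a read at the second snapshot point — the patent text itself only states that versions before the second snapshot point are cleaned):

```python
# Hypothetical sketch: drop historical versions older than the second snapshot
# point, keeping everything at or above it plus the newest version below it.
def clean_history(versions, second_snapshot):
    at_or_below = [v for v in versions if v[0] <= second_snapshot]
    keep_base = max(at_or_below) if at_or_below else None
    return sorted(v for v in versions
                  if v[0] >= second_snapshot or v == keep_base)

shard = [(110, "v1"), (115, "v2"), (119, "v3")]
print(clean_history(shard, 118))  # [(115, 'v2'), (119, 'v3')]; (110, 'v1') is cleaned
```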
  • In a possible implementation, the system further includes a management node, and the method further includes:
  • the multiple backup nodes send the transaction snapshot points of the latest transactions on the multiple backup nodes that have completed data synchronization with the primary node to the management node every second interval;
  • the coordination node determines the first snapshot point from the transaction snapshot points reported by multiple standby nodes in the multiple groups of data nodes, including:
  • in response to the data access request, the coordination node sends a snapshot point acquisition request to the management node to obtain the first snapshot point.
  • each standby node in the distributed database system maintains a maximum transaction snapshot point internally and reports it to the management node regularly, so that the management node knows the latest transaction on each standby node in the system that has completed data synchronization with the primary node. When the coordination node reads data based on a data access request, it can obtain the corresponding snapshot point by sending a snapshot point acquisition request to the management node, which provides a basis for ensuring the data consistency of each data shard.
  • In a possible implementation, the system further includes a management node, and the method further includes:
  • when the submission of a target transaction is completed, the first master node sends a transaction submission request for the target transaction to the management node;
  • the management node responds to the transaction submission request, generates a transaction snapshot point of the target transaction, and sends the transaction snapshot point of the target transaction to the first master node;
  • the backup node corresponding to the first master node performs log playback on the target transaction and completes data synchronization with the first master node.
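  • The management node's side of this commit flow can be sketched as follows (a hypothetical illustration; the class name, the starting CSN value, and the use of a lock for thread safety are assumptions beyond the patent text):

```python
# Hypothetical sketch of the management node (GTM) at commit time: each commit
# request is answered with a globally unique, monotonically increasing CSN,
# so CSN order is commit order.
import threading

class GlobalTransactionManager:
    def __init__(self, start: int = 111):
        self._next = start
        self._lock = threading.Lock()

    def commit(self, txn_id: str) -> int:
        with self._lock:
            csn = self._next
            self._next += 1
            return csn  # the transaction snapshot point of txn_id

gtm = GlobalTransactionManager()
print(gtm.commit("txA"), gtm.commit("txB"))  # 111 112
```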
  • In a second aspect, embodiments of the present application provide a distributed database system, which includes a coordination node and multiple groups of data nodes, each group of data nodes including a master node and a backup node;
  • the coordination node is configured to, in response to a data access request for data, determine a first snapshot point from transaction snapshot points reported by multiple backup nodes in the plurality of groups of data nodes, where the first snapshot point indicates the transaction with the earliest submission order among the latest transactions on the multiple backup nodes that have completed data synchronization with the master node;
  • the coordination node is also used to send the data access request and the first snapshot point to the multiple backup nodes;
  • the multiple backup nodes are configured to send data fragments of a target version of the data to the coordination node based on the data access request and the first snapshot point, and the target version is a version corresponding to the first snapshot point.
  • the first backup node is configured to, based on the data access request and the first snapshot point, send the data fragments of the target version of the data on the first backup node to the coordination node when the transaction snapshot point of the latest transaction that has completed data synchronization with the primary node on the first backup node is greater than or equal to the first snapshot point, where the first backup node is any backup node among the plurality of backup nodes.
  • the first standby node is further configured to, based on the data access request and the first snapshot point, send the data access request and the first snapshot point to the primary node corresponding to the first backup node when the transaction snapshot point of the latest transaction that has completed data synchronization with the primary node on the first standby node is smaller than the first snapshot point;
  • the primary node corresponding to the first backup node is used to send the data fragments of the target version of the data to the coordination node based on the data access request and the first snapshot point.
  • the multiple backup nodes are configured to send to the coordination node the transaction snapshot points of the latest transactions on the multiple backup nodes that have completed data synchronization with the primary node every first interval;
  • the coordination node is used to update the transaction snapshot points reported by the multiple standby nodes based on the received transaction snapshot points.
  • the transaction snapshot point is any of the following: a transaction commit sequence number (CSN) or a transaction commit timestamp.
  • the coordination node is configured to determine the first snapshot point from a target list in response to the data access request.
  • the target list is used to store the node identifiers of the multiple backup nodes and the transaction snapshot points reported by the multiple backup nodes.
  • the transaction snapshot points reported by the multiple standby nodes indicate the latest transactions on the multiple standby nodes that have completed data synchronization with the primary node.
  • the coordination node is also configured to respond to the data cleaning request sent by the first master node and determine a second snapshot point from the transaction snapshot points reported by the multiple backup nodes.
  • the second snapshot point indicates the transaction with the earliest submission order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary node, and the first primary node is any one of the multiple primary nodes;
  • the coordination node is also used to send the second snapshot point to the first master node
  • the first master node is configured to clean the data fragments of the historical version of the data on the first master node based on the second snapshot point, and the historical version is the version before the second snapshot point.
  • the system also includes a management node
  • the multiple backup nodes are also used to send transaction snapshot points of the latest transactions on the multiple backup nodes that have completed data synchronization with the primary node to the management node every second interval;
  • the coordination node is configured to respond to the data access request and send a snapshot point acquisition request to the management node to obtain the first snapshot point.
  • the system also includes a management node
  • the first master node is used to send a transaction submission request for the target transaction to the management node when the target transaction submission is completed;
  • the management node is configured to respond to the transaction submission request, generate a transaction snapshot point of the target transaction, and send the transaction snapshot point of the target transaction to the first master node;
  • the standby node corresponding to the first master node is used to perform log playback of the target transaction and complete data synchronization with the first master node.
  • Embodiments of the present application further provide a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the data access method provided by the aforementioned first aspect or any possible implementation of the first aspect.
  • Embodiments of the present application further provide a data access device applied to a distributed database system.
  • the system includes a coordination node and multiple groups of data nodes.
  • Each group of data nodes includes a master node and a backup node.
  • the device includes at least one functional module, and the at least one functional module is used to perform the functions of the coordination node involved in the aforementioned second aspect or any possible implementation manner of the second aspect.
  • Embodiments of the present application further provide a data access device applied to a distributed database system.
  • the system includes a coordination node and multiple groups of data nodes.
  • Each group of data nodes includes a master node and a backup node.
  • the device includes at least one functional module, and the at least one functional module is used to perform the functions of the backup node involved in the aforementioned second aspect or any possible implementation manner of the second aspect.
  • Embodiments of the present application further provide a computer program product containing instructions; when the computer program product runs on a cluster of computing devices, the cluster of computing devices is caused to perform the data access method provided by the aforementioned first aspect or any possible implementation of the first aspect.
  • the computer program product can be a software installation package. When it is necessary to realize the functions of the aforementioned computing device cluster, the computer program product can be downloaded and executed on the computing device cluster.
  • Embodiments of the present application further provide a computer-readable storage medium including computer program instructions; when the instructions are executed by a computing device cluster, the computing device cluster executes the data access method provided by the aforementioned first aspect or any possible implementation of the first aspect.
  • the storage medium includes but is not limited to volatile memory, such as random access memory, and non-volatile memory, such as flash memory, hard disk drive (HDD), and solid state drive (SSD).
  • Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • Figure 2 is a schematic architectural diagram of a distributed database system provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of the hardware structure of a computing device provided by an embodiment of the present application.
  • Figure 4 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of a connection method of a computing device cluster provided by an embodiment of the present application.
  • Figure 6 is a schematic flowchart of a data access method provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of a data node regularly reporting transaction snapshot points provided by an embodiment of the present application.
  • Figure 8 is a schematic flowchart of a data access method provided by an embodiment of the present application.
  • Figure 9 is a schematic flow chart of a data cleaning method provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a data access device provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a data access device provided by an embodiment of the present application.
  • A database is an electronic filing cabinet, that is, a place where electronic files are stored; users can add, query, update, and delete the data in these electronic files.
  • In other words, the so-called "database" is a collection of data that is stored together in a certain way, can be shared by multiple users, has as little redundancy as possible, and is independent of the application.
  • a transaction is a logical unit in the process of executing operations in a database system. It consists of a limited sequence of database operations and is the smallest execution unit of database system operations.
  • Data shard refers to the smallest logical unit for data management in a distributed database system.
  • In some embodiments, a data shard has multiple replicas, and the sharding information of the data shard is stored in the system. The sharding information includes the data range of the data shard and the node information of each of the multiple replicas, etc., which is not limited here.
  • a virtual machine refers to a complete computer system with complete hardware system functions simulated by software and running in a completely isolated environment. Everything that can be done in a server can be done in a virtual machine.
  • Each virtual machine has an independent hard disk and operating system, and users can operate the virtual machine just as they would use a physical server.
  • Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. As shown in Figure 1, the implementation environment includes a terminal 101 and a server 102. The terminal 101 is directly or indirectly connected to the server 102 through a wireless network or a wired network.
  • the terminal 101 may be at least one of a smart phone, a desktop computer, an augmented reality terminal, a tablet computer, an e-book reader, and a laptop computer.
  • Terminal 101 is capable of installing and running applications.
  • the application can be a client application, a browser application, etc., and is not limited to this.
  • the application is a web browsing client, a social networking client or an audio and video client, etc. Taking the application as a web browsing client as an example, the user can browse various web data through the web browsing client.
  • the server 102 is an independent physical server, or a server cluster or distributed file system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery network (CDN), and big data and artificial intelligence platforms.
  • the server 102 is used to run a distributed database system and provide background services for applications running on the terminal 101. For example, taking the web browsing client as an example, the terminal 101 responds to the user's browsing operation for the target web page, triggering the web browsing client to send a data access request for the target web page to the distributed database system running on the server 102.
  • In response to the data access request, the distributed database system reads the web page data of the target web page and returns the data access result to the web browsing client.
  • the server 102 runs the distributed database system through a virtual machine, or the server 102 runs the distributed database system through a container engine, which is not limited.
  • the terminal 101 can generally refer to one of multiple terminals, or a collection of multiple terminals; the server 102 can be a computing device cluster, a virtual machine or a container engine, etc.
  • The embodiments of this application do not limit the number or type of devices in the implementation environment.
  • the wireless network or wired network described above uses standard communication technologies and/or protocols.
  • Networks include but are not limited to a data center network, a storage area network (SAN), a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired, or wireless network, a private network, or any combination of virtual private networks.
  • technologies and/or formats including hyper text markup language (HTML), extensible markup language (XML), etc. are used to represent data exchanged over the network.
  • In some embodiments, conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec) can also be used to encrypt all or part of the links.
  • customized and/or dedicated data communication technologies can also be used in place of or in addition to the data communication technologies described above.
  • FIG. 2 is a schematic architectural diagram of a distributed database system provided by an embodiment of the present application.
  • FIG. 2 is only an exemplary structural diagram showing a distributed database system. This application does not limit the division of various parts in the distributed database system.
  • the distributed database system 200 includes a coordinator node (CN) 201, multiple groups of data nodes (DN) 202, and a management node 203.
  • Each group of data nodes 202 includes a master node and at least one standby node. For any group of data nodes 202, the group is used to maintain a data shard: the master node in the group is used to execute write transactions for the data shard, and the standby node in the group is used to execute read transactions for the data shard.
  • There is a communication connection between the coordination node 201 and the terminal, and a client runs on the terminal. The coordination node 201 is used to receive data processing requests for data sent by the client (such as data access requests, data update requests, etc.), send the requests to the corresponding data nodes 202 for execution, and, after receiving the execution results fed back by the data nodes 202, return the corresponding data processing results (such as data access results, data update results, etc.) to the client.
  • The multiple groups of data nodes 202 are used to store client data, receive data processing requests from the coordination node 201, execute corresponding transactions (such as read transactions corresponding to data access, write transactions corresponding to data updates, etc.), and return execution results to the coordination node 201.
  • the management node 203 is used to generate and maintain globally unique information such as transaction identifiers and transaction snapshot points.
  • the management node 203 is also called a global transaction manager (global transaction manager, GTM), which is not limited.
  • GTM global transaction manager
  • For the write process: the coordination node 201 receives the data update request sent by the client, applies to the management node 203 to open a distributed write transaction, receives the transaction identifier issued by the management node 203, determines the data shards corresponding to the data, and sends the distributed write transaction to the corresponding master nodes (DN1 master and DN2 master) for execution according to the routing sharding rules. After the master nodes complete the execution, the execution results are sent to the coordination node 201, so that the coordination node 201 returns the data update result to the client.
  • After the distributed write transaction is submitted, the coordination node 201 applies to the management node 203 for a transaction snapshot point of the distributed write transaction, and feeds back the transaction snapshot point to the corresponding master nodes.
  • the standby nodes (DN1 standby and DN2 standby) perform log playback of the distributed write transaction and complete data synchronization with the primary node.
  • For the read process: the coordination node 201 receives the data access request sent by the client, determines the data shards corresponding to the data, and checks whether the standby nodes (DN1 standby and DN2 standby) where the data shards are located satisfy the read-write separation conditions. If the conditions are met, the data access request is sent to the corresponding standby nodes for execution according to the routing sharding rules. After the standby nodes complete the execution, the execution results are sent to the coordination node 201, so that the coordination node 201 returns the data access result to the client. This process will be introduced in detail in subsequent method embodiments and will not be described again here.
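  • The patent does not specify the routing sharding rule itself; a minimal sketch of one common choice (deterministic hashing of the key to a shard, with entirely hypothetical node-group names) is:

```python
# Hypothetical sketch of a routing sharding rule: map a key deterministically
# to a shard, then look up the node group that maintains that shard.
import zlib

node_groups = {0: ("DN1 master", "DN1 standby"), 1: ("DN2 master", "DN2 standby")}

def route(key: str, shard_count: int = 2) -> int:
    return zlib.crc32(key.encode()) % shard_count

shard = route("user:42")
print(f"key 'user:42' -> shard {shard} -> nodes {node_groups[shard]}")
```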
  • both the coordination node and the data node in the above distributed database system can be implemented by software or can be implemented by hardware.
  • the implementation of the coordination node is introduced next.
  • the implementation of data nodes can refer to the implementation of coordination nodes.
  • a coordination node may include code running on a computing instance.
  • the computing instance may be at least one of a physical host (computing device), a virtual machine, a container, and other computing devices. Further, the above computing device may be one or more.
  • In some embodiments, a coordination node can include code running on multiple hosts/VMs/containers. It should be noted that the multiple hosts/VMs/containers used to run the code can be distributed in the same region or in different regions, and can be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. Usually, a region can include multiple AZs.
  • the multiple hosts/VMs/containers used to run the code can be distributed in the same virtual private cloud (VPC), or across multiple VPCs.
  • VPC virtual private cloud
  • Cross-region communication between two VPCs in the same region or between VPCs in different regions requires a communication gateway in each VPC, and the interconnection between VPCs is realized through the communication gateway.
  • a node may include at least one computing device, such as a server.
  • the coordination node can also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • ASIC application-specific integrated circuit
  • PLD programmable logic device
  • the above-mentioned PLD can be a complex programmable logical device (CPLD), a field-programmable gate array (field-programmable gate array, FPGA), a general array logic (generic array logic, GAL), or any combination thereof.
  • CPLD complex programmable logical device
  • FPGA field-programmable gate array
  • GAL general array logic
  • Multiple computing devices included in the coordination node can be distributed in the same region or in different regions. Multiple computing devices included in the coordination node can be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the coordination node can be distributed in the same VPC or in multiple VPCs.
  • the multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • FIG. 3 is a schematic diagram of the hardware structure of a computing device provided by an embodiment of the present application.
  • computing device 300 includes: bus 302 , processor 304 , memory 306 , and communication interface 308 .
  • The processor 304, the memory 306, and the communication interface 308 communicate through the bus 302. It should be understood that this application does not limit the number of processors and memories in the computing device 300.
  • the bus 302 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • PCI peripheral component interconnect
  • EISA extended industry standard architecture
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 3, but it does not mean that there is only one bus or one type of bus.
  • Bus 302 may include a path that carries information between various components of computing device 300 (e.g., memory 306, processor 304, communication interface 308).
  • the processor 304 may include a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP). any one or more of them.
  • CPU central processing unit
  • GPU graphics processing unit
  • MP microprocessor
  • DSP digital signal processor
  • Memory 306 may include volatile memory, such as random access memory (RAM).
  • RAM random access memory
  • The memory 306 may also include non-volatile memory, such as read-only memory (ROM), flash memory, hard disk drive (HDD), or solid state drive (SSD).
  • ROM read-only memory
  • HDD hard disk drive
  • SSD solid state drive
  • the memory 306 stores executable program code, and the processor 304 executes the executable program code to respectively implement the functions of the aforementioned coordination node and data node, thereby implementing the following data access method. That is, the memory 306 stores instructions for executing the data access method.
  • The communication interface 308 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 300 and other devices or communication networks.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • Figure 4 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application. As shown in FIG. 4 , the computing device cluster includes at least one computing device 300 . The same instructions for performing data access methods may be stored in the memory 306 of one or more computing devices 300 in a cluster of computing devices.
  • the memory 306 of one or more computing devices 300 in the computing device cluster may also store part of the instructions for executing the data access method respectively.
  • a combination of one or more computing devices 300 may collectively execute instructions for performing a data access method.
  • the memory 306 in different computing devices 300 in the computing device cluster can store different instructions, respectively used to execute part of the functions of the distributed database system. That is, instructions stored in memory 306 in different computing devices 300 may implement the functions of one or more of the coordination nodes and the data nodes.
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network can be a wide area network or a local area network, etc.
  • Figure 5 is a schematic diagram of a connection method of a computing device cluster provided by an embodiment of the present application. As shown in Figure 5, two computing devices 300 are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device.
  • The connection method between computing devices in the cluster shown in Figure 5 takes into account that the data access method provided by this application involves different types of nodes, so the memories of different computing devices store instructions for executing the functions of different nodes. For example, memory 306 in one computing device 300 stores instructions for performing the functions of the coordination node, while memory 306 in another computing device 300 stores instructions for performing the functions of the data node.
  • An embodiment of the present application also provides a computer program product containing instructions.
  • the computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium.
  • the computer program product when executed on a cluster of computing devices, causes the cluster of computing devices to execute the data access method.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium can be any available medium that a computing device can store, or a data storage device, such as a data center, containing one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state drive), etc.
  • the computer-readable storage medium includes instructions that instruct a cluster of computing devices to perform a data access method.
  • FIG 6 is a schematic flowchart of a data access method provided by an embodiment of the present application. As shown in Figure 6, this data access method is applied to the above-mentioned distributed database system. The following takes the interaction between nodes in the distributed database system as an example to introduce this data access method. Illustratively, the method includes the following steps 601 to 606.
  • Multiple backup nodes in multiple groups of data nodes send transaction snapshot points of the latest transactions on the multiple backup nodes that have completed data synchronization with the primary node to the coordination node at intervals of a first period of time.
  • the distributed database system stores data based on data sharding.
  • a data shard is maintained by a group of data nodes.
  • the group of data nodes includes a master node and at least one backup node.
  • The master node is used to execute write transactions, and the standby node is used to execute read transactions.
  • the transaction snapshot point is the transaction submission snapshot point obtained by the master node through the management node when the transaction submission is completed.
  • The transaction identifier of each transaction is globally unique, the transaction snapshot point of each transaction is globally unique, and the transaction snapshot point can indicate the order in which transactions are committed. In other words, there is a mapping relationship between transactions, transaction identifiers, and transaction snapshot points, and each transaction corresponds to a unique transaction identifier and transaction snapshot point.
  • the transaction snapshot point is a transaction commit sequence number (CSN) or a transaction commit timestamp, etc., which is not limited.
  • CSN transaction commit sequence number
  • CSN is a global self-increasing integer.
  • the order of transaction submission can be determined based on the size of CSN. The smaller the CSN, the earlier the transaction submission order. For example, if the CSNs of transaction A, transaction B, and transaction C are 111, 113, and 112 respectively, it indicates that the submission order of these three transactions is transaction A, transaction C, and transaction B.
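  • The commit order in this example can be recovered directly from the CSNs, as in the following tiny sketch (illustrative only):

```python
# Hypothetical sketch: sorting transactions by CSN yields their commit order.
csns = {"transaction A": 111, "transaction B": 113, "transaction C": 112}
order = sorted(csns, key=csns.get)
print(order)  # ['transaction A', 'transaction C', 'transaction B']
```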
  • Taking the first master node (any master node among the multiple groups of data nodes) executing a target transaction as an example:
  • the first master node sends a transaction submission request for the target transaction to the management node
  • the management node responds to the transaction submission request, generates a transaction snapshot point of the target transaction, and sends the transaction snapshot point of the target transaction to the first master node
  • the backup node corresponding to the first master node performs log playback of the target transaction and completes data synchronization with the first master node.
  • the backup node corresponding to the first master node obtains the transaction snapshot point of the target transaction during the log playback process of the target transaction.
  • the backup nodes of multiple groups of data nodes complete data synchronization with the primary node through log playback, that is, obtain the transaction snapshot point of the submitted transaction.
  • Every first interval, each standby node sends the transaction snapshot point of the latest transaction on the standby node that has completed data synchronization with the primary node to the coordination node.
  • the first duration is a preset duration, which can be set according to requirements. For example, the first duration is 5 seconds, which is not limited. This process means that, taking the transaction snapshot point as CSN as an example, each standby node in the distributed database system maintains a maximum transaction snapshot point internally and reports it to the coordination node regularly.
  • The multiple backup nodes can report transaction snapshot points to the coordination node synchronously every first interval (for example, backup node 1 and backup node 2 report at the same time), or they can report asynchronously (for example, backup node 1 and backup node 2 report in sequence); the embodiments of this application do not limit this.
  • the coordination node updates the transaction snapshot points reported by the multiple backup nodes based on the received transaction snapshot points.
  • the coordination node stores a target list, which is used to store the node identifiers of the multiple standby nodes and the transaction snapshot points reported by the multiple standby nodes. It should be noted that the coordination node can also store transaction snapshot points reported by multiple standby nodes in other forms, and is not limited to list form, which is not limited in the embodiments of this application.
  • Each time the coordination node receives a transaction snapshot point reported by a standby node, it updates the transaction snapshot point reported by that standby node in the target list based on the node identifier of the standby node. For example, taking the transaction snapshot point as CSN, suppose the target list stores standby node 1 (115), standby node 2 (116), and standby node 3 (117). If the coordination node receives the transaction snapshot point 118 sent by standby node 1, it updates the target list to standby node 1 (118), standby node 2 (116), and standby node 3 (117). In this way, the coordination node can promptly update the target list based on the received transaction snapshot points, ensuring that the target list stays up to date.
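  • A minimal sketch of this update, mirroring the example above (hypothetical; the monotonicity check that never moves a node's CSN backwards on a stale or reordered report is an added assumption beyond the patent text):

```python
# Hypothetical sketch of the target-list update on receiving a report.
target_list = {"standby1": 115, "standby2": 116, "standby3": 117}

def on_report(node_id: str, csn: int) -> None:
    # Only move forward; a stale or duplicate report never lowers the CSN.
    if csn > target_list.get(node_id, -1):
        target_list[node_id] = csn

on_report("standby1", 118)
print(target_list)  # {'standby1': 118, 'standby2': 116, 'standby3': 117}
```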
  • the coordinating node updates the target list based on multiple transaction snapshot points received within a specified time period.
  • For example, the specified time period is 10 seconds, which is not limited here.
  • a certain backup node may report multiple transaction snapshot points within a specified time period.
  • the coordination node updates the target list based on the last transaction snapshot point reported by the backup node.
  • the coordinating node updates the target list when the number of received multiple transaction snapshot points reaches a specified number, for example, the specified number is 10.
  • The specified number of transaction snapshot points received by the coordination node may also include multiple transaction snapshot points reported by a certain standby node. Based on this, the coordination node updates the target list based on the last transaction snapshot point reported by that standby node.
  • each standby node in the distributed database system maintains a maximum transaction snapshot point, also called a transaction strong consistency snapshot point, which can ensure that the execution of transactions does not affect the data consistency of the system.
  • In response to a data access request for data, the coordination node determines the first snapshot point from the transaction snapshot points reported by multiple standby nodes in the multiple groups of data nodes, where the first snapshot point indicates the transaction with the earliest submission order among the latest transactions on the multiple standby nodes that have completed data synchronization with the master node.
  • The data access request is sent by the client. In response to the data access request, the coordination node determines, based on the multiple data shards corresponding to the data, the multiple groups of data nodes where the multiple data shards are located, and, if the standby nodes in the multiple groups of data nodes satisfy the read-write separation conditions, determines the first snapshot point from the transaction snapshot points reported by the multiple standby nodes in the multiple groups of data nodes.
  • The read-write separation condition indicates that the data update time difference between the standby node and the primary node in any group of data nodes is less than or equal to a target duration; that is, the latest version of the data on the standby node cannot lag too far behind the latest version on the primary node, otherwise business needs will not be met. For example, the target duration is 5 seconds, which is not limited here.
  • In some embodiments, the coordination node stores a target list; accordingly, the coordination node determines the first snapshot point from the target list in response to the data access request.
  • Taking the transaction snapshot point as CSN as an example, in response to the data access request, the coordination node determines, based on the multiple data shards corresponding to the data, the multiple groups of data nodes where the multiple data shards are located, and, based on the node identifiers of the multiple standby nodes in the multiple groups of data nodes, determines the minimum CSN from the CSNs reported by the multiple standby nodes in the target list. The minimum CSN corresponds to the transaction with the earliest submission order and is used as the first snapshot point.
  • the coordination node sends the data access request and the first snapshot point to multiple standby nodes.
  • the coordination node sends the data access request and the first snapshot point to multiple backup nodes based on routing fragmentation rules.
  • Based on the data access request and the first snapshot point, the multiple backup nodes send the data fragments of the target version of the data to the coordination node.
  • the target version refers to the version corresponding to the first snapshot point.
  • For example, assuming that the transaction snapshot point is CSN and the first snapshot point is 119, the target version of the data refers to the version of the data after the transaction with CSN 119 is committed.
  • For any one of the multiple backup nodes (called the first backup node): based on the data access request and the first snapshot point, if the transaction snapshot point of the latest transaction on the first backup node that has completed data synchronization with the primary node is greater than or equal to the first snapshot point, the first backup node sends the data fragments of the target version of the data on the first backup node to the coordination node. This process is also a process of determining data visibility based on the first snapshot point.
  • In this way, each standby node sends the same version of data fragments on its respective node to the coordination node, ensuring the data consistency of each data fragment. For example, assuming that the transaction snapshot point is CSN, the first snapshot point is 119, and the transaction snapshot point of the latest transaction that has completed data synchronization with the primary node on the first standby node is 120: since 120 is greater than 119, the data fragments of the version corresponding to the transaction with CSN 119 on the first standby node are sent to the coordination node.
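  • The version selection in this example can be sketched as follows (hypothetical; versions are assumed to be stored as (CSN, payload) pairs):

```python
# Hypothetical sketch of the visibility determination on a standby node:
# the version returned is the newest one committed at or before the first
# snapshot point.
def visible_version(versions, snapshot):
    eligible = [v for v in versions if v[0] <= snapshot]
    return max(eligible)[1] if eligible else None

shard_versions = [(110, "v1"), (119, "v2"), (120, "v3")]
print(visible_version(shard_versions, 119))  # 'v2', the version at CSN 119
```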
  • In some embodiments, based on the data access request and the first snapshot point, if the transaction snapshot point of the latest transaction that has completed data synchronization with the primary node on the first backup node is smaller than the first snapshot point, the first backup node sends the data access request and the first snapshot point to the primary node corresponding to the first backup node, and the primary node corresponding to the first backup node sends the data fragments of the target version of the data to the coordination node based on the data access request and the first snapshot point. That is, when the target version of the data shard is not available on a backup node, the primary node corresponding to the backup node returns the execution results to the coordination node, thus ensuring that the client can receive complete data access results.
  • the coordination node returns the data access result for the data based on the multiple received data fragments.
  • the coordination node summarizes the multiple data fragments based on the fragmentation information of the multiple data fragments, obtains the data access result of the data, and returns the data access result to the client.
  • The above description takes the case where the backup nodes in the distributed database system report transaction snapshot points to the coordination node as an example. In other embodiments, the backup nodes can also report transaction snapshot points to the management node. In this case, when the coordination node reads data based on a data access request, it can obtain the corresponding snapshot point by sending a snapshot point acquisition request to the management node.
  • multiple backup nodes send transaction snapshot points of the latest transactions on the multiple backup nodes that have completed data synchronization with the primary node to the management node every second interval.
  • the second duration is a preset duration, which can be set according to requirements. For example, the second duration is 5 seconds, which is not limited.
  • The management node updates the transaction snapshot points reported by the multiple backup nodes based on the received transaction snapshot points. That is, the management node can also store the transaction snapshot points reported by the multiple backup nodes in the form of a list.
  • After the coordination node receives the data access request sent by the client, the coordination node, in response to the data access request, sends a snapshot point acquisition request to the management node.
  • the management node responds to the snapshot point acquisition request, determines the first snapshot point from the transaction snapshot points reported by multiple standby nodes in the multiple groups of data nodes, and sends the first snapshot point to the coordination node. This process is the same as the above-mentioned step 603 and will not be described again here. In this way, the management node maintains transaction snapshot points reported by multiple standby nodes, which can save the computing resources of the coordination node and release the storage space of the coordination node.
  • Above, a data access method provided by the embodiments of the present application is introduced, which is applied to a distributed database system. The coordination node, in response to a data access request for data, determines the first snapshot point from the transaction snapshot points reported by multiple backup nodes in multiple groups of data nodes, and sends the first snapshot point together with the data access request to the multiple backup nodes, so that the multiple backup nodes return data shards of the target version of the data based on the first snapshot point. Since the first snapshot point can indicate the transaction with the earliest submission order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary node, the first snapshot point can be used as a globally consistent read snapshot point, ensuring that multiple standby nodes return the same version of data shards, thus meeting the data consistency principle.
  • In addition, when performing data cleaning, the primary node can ensure the data consistency of each data shard based on the transaction snapshot points reported by multiple standby nodes. This process is introduced below, including the following steps:
  • Step A In response to the data cleaning request sent by the first master node, the coordination node determines a second snapshot point from the transaction snapshot points reported by the multiple backup nodes.
  • The second snapshot point indicates the transaction with the earliest submission order among the latest transactions on the multiple backup nodes that have completed data synchronization with the master node.
  • the first master node is any one of the plurality of master nodes.
  • the coordination node stores a target list, and the coordination node determines the second snapshot point from the target list in response to the data cleaning request.
  • Taking the transaction snapshot point as CSN as an example, in response to the data cleaning request, the coordination node determines, based on the multiple data shards corresponding to the data, the multiple groups of data nodes where the multiple data shards are located, and, based on the node identifiers of the multiple standby nodes in the multiple groups of data nodes, determines the minimum CSN from the CSNs reported by the multiple standby nodes in the target list. The minimum CSN corresponds to the transaction with the earliest submission order and is used as the second snapshot point.
  • Step B The coordinating node sends the second snapshot point to the first master node.
  • Step C The first master node cleans the data fragments of the historical version of the data on the first master node based on the second snapshot point.
  • the historical version refers to the version before the second snapshot point. For example, assuming that the transaction snapshot point is CSN, if the second snapshot point is 118, the historical version of the data refers to the version of the data before the transaction with CSN 118 was executed. It should be understood that the first master node will save this data cleaning process in the log, and the backup node corresponding to the first master node can also complete data synchronization with the master node through log playback, which will not be described again here.
  • In this way, the coordination node, in response to the data cleaning request of the first master node, determines the second snapshot point from the transaction snapshot points reported by multiple backup nodes and sends the second snapshot point to the first master node, enabling the first master node to clean the historical versions of the data based on the second snapshot point. Since the second snapshot point can indicate the transaction with the earliest submission order among the latest transactions on multiple standby nodes that have completed data synchronization with the primary node, the second snapshot point can be used as a snapshot point for globally consistent cleanup, ensuring the data consistency of each data shard when the master node performs data cleaning.
  • the transaction snapshot point of the latest transaction submitted on the primary node in the group of data nodes is greater than the transaction snapshot point of the latest transaction on the standby node in the group of data nodes that has completed data synchronization with the primary node. point, because the data of the standby node cannot be cleaned while it is being read, so the minimum value among the transaction snapshot points reported by the standby node is used as the cleanup point, which can ensure the data consistency of each data shard during data cleansing.
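A minimal sketch of this cleanup rule on the primary node follows, assuming a shard keeps an ascending list of multi-version entries keyed by CSN: every version older than the newest version still visible at the cleanup point can be dropped, because the cleanup point is the minimum CSN any standby node will ever be asked to read at. The Version and PruneHistory names are illustrative assumptions, not identifiers from the embodiments.

```go
package primary

// Version is one multi-version copy of a shard entry; CSN is the commit
// sequence number of the transaction that wrote it.
type Version struct {
	CSN   uint64
	Value []byte
}

// PruneHistory drops every version that predates the newest version still
// visible at cleanupPoint (the second snapshot point handed down by the
// coordination node). Because cleanupPoint is the minimum CSN reported by
// the standby nodes, no consistent read will ever be asked to run at an
// older snapshot, so those versions are unreachable. versions must be
// sorted by ascending CSN; the retained suffix is returned.
func PruneHistory(versions []Version, cleanupPoint uint64) []Version {
	keepFrom := 0
	for i, v := range versions {
		if v.CSN <= cleanupPoint {
			// v is the newest version visible at cleanupPoint so far;
			// everything before it can no longer be read.
			keepFrom = i
		} else {
			// Sorted ascending, so all remaining versions are newer.
			break
		}
	}
	return versions[keepFrom:]
}
```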
The method flows introduced in the above embodiments are illustrated below with reference to FIG. 7 to FIG. 9, taking a CSN as the transaction snapshot point.
FIG. 7 is a schematic diagram of data nodes periodically reporting transaction snapshot points according to an embodiment of this application. As shown in FIG. 7, in the distributed database system provided by the embodiments of this application, each standby node in the multiple groups of data nodes internally maintains a maximum transaction snapshot point (that is, a CSN) and reports it to the coordination node periodically. After receiving the CSNs reported by the standby nodes, the coordination node internally maintains a target list, where the target list stores the node identifiers of all standby nodes and the CSNs reported by all standby nodes, and the CSN reported by a standby node is the CSN of the latest transaction on that standby node that has completed data synchronization with the primary node.
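The reporting loop of FIG. 7 can be sketched as a periodic task on each standby node. The Reporter type below and its fields are assumptions of this illustration, with the RPC to the coordination node left as a stub.

```go
package standby

import (
	"context"
	"time"
)

// Reporter periodically pushes this standby node's maximum replayed CSN to
// the coordination node, mirroring the reporting loop of FIG. 7. Interval
// corresponds to the "first interval" and must be positive.
type Reporter struct {
	NodeID   string
	Interval time.Duration
	MaxCSN   func() uint64                         // CSN of the latest replayed transaction
	Send     func(nodeID string, csn uint64) error // RPC stub to the coordination node
}

// Run blocks until ctx is cancelled, reporting once per interval.
func (r *Reporter) Run(ctx context.Context) {
	ticker := time.NewTicker(r.Interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Best effort: a missed report only makes the coordination
			// node's view slightly stale, which is safe because the
			// minimum CSN it computes can only lag, never run ahead.
			_ = r.Send(r.NodeID, r.MaxCSN())
		}
	}
}
```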
FIG. 8 is a schematic flowchart of a data access method according to an embodiment of this application. As shown in FIG. 8, the data access method includes the following steps:
Step 1: The client initiates a data access request for data, that is, sends the data access request to the coordination node.
Step 2: In response to the data access request, the coordination node determines whether read-write separation can be used; if so, it determines the first snapshot point from the internally maintained target list, where the first snapshot point is the minimum CSN in the target list.
Step 3: The coordination node sends the data access request and the first snapshot point to the multiple standby nodes.
Step 4: Each standby node receives the data access request and the first snapshot point, performs a data visibility determination based on the first snapshot point, and reads the data shard of the target version of the data.
Step 5: The standby node sends the data shard to the coordination node.
Step 6: The coordination node aggregates the data shards and returns them to the client.
In this process, because the first snapshot point indicates the transaction with the earliest commit order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes, the first snapshot point can serve as a snapshot point for globally consistent reads, ensuring that the multiple standby nodes return data shards of the same version and thereby satisfying the data consistency principle.
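The visibility determination of step 4 can be sketched as follows, again in Go and with hypothetical names (Shard, HandleRead): the standby node answers only if its replay position has reached the first snapshot point, and otherwise signals that the request should be forwarded to its primary node, matching the fallback described in the embodiments (see the translated claims 2 and 3 below).

```go
package standby

import "errors"

// Shard is a hypothetical multi-version view of one data shard.
type Shard interface {
	// ReadAt returns the newest version whose CSN is <= snapshot.
	ReadAt(snapshot uint64) ([]byte, error)
	// MaxReplayedCSN is the CSN of the latest transaction replayed
	// from the primary node.
	MaxReplayedCSN() uint64
}

// ErrNotCaughtUp signals that this standby has not yet replayed up to the
// requested snapshot; the caller forwards the request to the primary node.
var ErrNotCaughtUp = errors.New("standby behind the first snapshot point")

// HandleRead performs the visibility determination of step 4: the standby
// answers only if it has synchronized at least up to firstSnapshot;
// otherwise the read is escalated to the primary node.
func HandleRead(s Shard, firstSnapshot uint64) ([]byte, error) {
	if s.MaxReplayedCSN() < firstSnapshot {
		return nil, ErrNotCaughtUp
	}
	return s.ReadAt(firstSnapshot)
}
```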
FIG. 9 is a schematic flowchart of a data cleanup method according to an embodiment of this application. As shown in FIG. 9, the data cleanup method includes the following steps:
Step 1: The primary node sends a data cleanup request to the coordination node.
Step 2: In response to the data cleanup request, the coordination node determines the second snapshot point from the internally maintained target list, where the second snapshot point is the minimum CSN in the target list.
Step 3: The coordination node sends the second snapshot point to the primary node.
Step 4: Based on the second snapshot point, the primary node cleans up the data shards of historical versions of the data on the primary node.
In this process, because the second snapshot point indicates the transaction with the earliest commit order among the latest transactions committed on the multiple primary nodes, the second snapshot point can serve as a snapshot point for globally consistent cleanup, ensuring the data consistency of each data shard while the primary node performs data cleanup.
FIG. 10 is a schematic structural diagram of a data access apparatus according to an embodiment of this application. The data access apparatus can implement, through software, hardware, or a combination of both, some or all of the functions of the coordination node in the aforementioned distributed database system. The data access apparatus provided by this embodiment of this application is applied to a distributed database system, where the system includes a coordination node and multiple groups of data nodes, each group of data nodes including a primary node and a standby node, and the apparatus can implement the steps performed by the coordination node in the above method embodiments. As shown in FIG. 10, the data access apparatus includes a determining module 1001 and a sending module 1002.
The determining module 1001 is configured to determine, in response to a data access request for data, a first snapshot point from the transaction snapshot points reported by the multiple standby nodes in the multiple groups of data nodes, where the first snapshot point indicates the transaction with the earliest commit order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes.
The sending module 1002 is configured to send the data access request and the first snapshot point to the multiple standby nodes, so that the multiple standby nodes send, based on the data access request and the first snapshot point, data shards of the target version of the data, where the target version is the version corresponding to the first snapshot point.
In some embodiments, the apparatus further includes an updating module configured to: receive the transaction snapshot points of the latest transactions that have completed data synchronization with the primary nodes, sent by the multiple standby nodes at every first interval, and update, based on the received transaction snapshot points, the transaction snapshot points reported by the multiple standby nodes.
In some embodiments, the transaction snapshot point is any one of the following:
a transaction commit sequence number;
a transaction commit timestamp.
In some embodiments, the determining module 1001 is configured to determine, in response to the data access request, the first snapshot point from a target list, where the target list is used to store the node identifiers of the multiple standby nodes and the transaction snapshot points reported by the multiple standby nodes, and the transaction snapshot points reported by the multiple standby nodes indicate the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes.
In some embodiments, the determining module 1001 is further configured to determine, in response to a data cleanup request sent by the first primary node, a second snapshot point from the transaction snapshot points reported by the multiple standby nodes, where the second snapshot point indicates the transaction with the earliest commit order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes, and the first primary node is any one of the multiple primary nodes.
The sending module 1002 is further configured to send the second snapshot point to the first primary node, so that the first primary node cleans up, based on the second snapshot point, the data shards of historical versions of the data on the first primary node, where the historical versions are versions earlier than the second snapshot point.
It should be noted that, when the data access apparatus provided in the above embodiment processes data, the division into the above functional modules is merely used as an example for description. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the data access apparatus provided in the above embodiment and the data access method embodiments belong to the same concept; for its specific implementation process, refer to the method embodiments, which are not repeated here.
FIG. 11 is a schematic structural diagram of a data access apparatus according to an embodiment of this application. The data access apparatus can implement, through software, hardware, or a combination of both, some or all of the functions of a standby node in the aforementioned distributed database system. The data access apparatus provided by this embodiment of this application is applied to a distributed database system, where the system includes a coordination node and multiple groups of data nodes, each group of data nodes including a primary node and a standby node, and the apparatus can implement the steps performed by any standby node in the above method embodiments. As shown in FIG. 11, the data access apparatus includes a receiving module 1101 and a sending module 1102.
The receiving module 1101 is configured to receive a data access request for data and a first snapshot point sent by the coordination node, where the first snapshot point is determined by the coordination node, in response to the data access request, from the transaction snapshot points reported by the multiple standby nodes in the multiple groups of data nodes, and the first snapshot point indicates the transaction with the earliest commit order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes.
The sending module 1102 is configured to send, based on the data access request and the first snapshot point, the data shard of the target version of the data to the coordination node, where the target version is the version corresponding to the first snapshot point.
In some embodiments, the sending module 1102 is configured to send, based on the data access request and the first snapshot point, the data shard of the target version of the data to the coordination node when the transaction snapshot point of the latest transaction on the standby node that has completed data synchronization with the primary node is greater than or equal to the first snapshot point.
In some embodiments, the apparatus further includes a reporting module configured to send, at every first interval, the transaction snapshot point of the latest transaction on the standby node that has completed data synchronization with the primary node to the coordination node.
In some embodiments, the transaction snapshot point is any one of the following:
a transaction commit sequence number;
a transaction commit timestamp.
In some embodiments, the system further includes a management node, and the sending module 1102 is further configured to send, at every second interval, the transaction snapshot point of the latest transaction on the standby node that has completed data synchronization with the primary node to the management node.
It should be noted that, when the data access apparatus provided in the above embodiment processes data, the division into the above functional modules is merely used as an example for description. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the data access apparatus provided in the above embodiment and the data access method embodiments belong to the same concept; for its specific implementation process, refer to the method embodiments, which are not repeated here.
In this application, the terms "first", "second", and similar words are used to distinguish identical or similar items whose effects and functions are basically the same. It should be understood that there is no logical or temporal dependency among "first", "second", and "nth", and that these terms limit neither quantity nor execution order. It should also be understood that, although the following description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms; the terms are only used to distinguish one element from another. For example, without departing from the scope of the various described examples, a first standby node may be referred to as a second standby node, and similarly, a second standby node may be referred to as a first standby node. Both the first standby node and the second standby node may be standby nodes and, in some cases, may be separate and distinct standby nodes.
In this application, the term "at least one" means one or more, and the term "multiple" means two or more; for example, multiple standby nodes means two or more standby nodes.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any equivalent modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented in whole or in part in the form of program structure information. The program structure information includes one or more program instructions. When the program instructions are loaded and executed on a computing device, the processes or functions according to the embodiments of this application are generated in whole or in part.
A person of ordinary skill in the art can understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of this application rather than to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the protection scope of the technical solutions of the embodiments of this application.

Abstract

A data access method, a distributed database system, and a computing device cluster, relating to the field of database technology. The data access method is applied to a distributed database system that includes a coordination node (201) and multiple groups of data nodes (202). The coordination node (201), in response to a data access request for data, determines a first snapshot point from the transaction snapshot points reported by multiple standby nodes in the multiple groups of data nodes (202), and sends the first snapshot point and the data access request to the multiple standby nodes, so that the multiple standby nodes return data shards of the target version of the data based on the first snapshot point. In this process, because the first snapshot point indicates the transaction with the earliest commit order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes, the first snapshot point can serve as a snapshot point for globally consistent reads, ensuring that the multiple standby nodes return data shards of the same version and thereby satisfying the data consistency principle.

Description

Data access method, distributed database system, and computing device cluster
This application claims priority to Chinese Patent Application No. 202211009112.4, filed on August 22, 2022 and entitled "Data access method, distributed database system, and computing device cluster", which is incorporated herein by reference in its entirety.

Claims (21)

  1. A data access method, applied to a distributed database system, wherein the system comprises a coordination node and multiple groups of data nodes, each group of data nodes comprising a primary node and a standby node, and the method comprises:
    the coordination node determining, in response to a data access request for data, a first snapshot point from transaction snapshot points reported by multiple standby nodes in the multiple groups of data nodes, the first snapshot point indicating the transaction with the earliest commit order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes;
    the coordination node sending the data access request and the first snapshot point to the multiple standby nodes; and
    the multiple standby nodes sending, based on the data access request and the first snapshot point, data shards of a target version of the data to the coordination node, the target version being the version corresponding to the first snapshot point.
  2. The method according to claim 1, wherein the multiple standby nodes sending, based on the data access request and the first snapshot point, the data shards of the target version of the data to the coordination node comprises:
    a first standby node sending, based on the data access request and the first snapshot point, the data shard of the target version of the data on the first standby node to the coordination node when the transaction snapshot point of the latest transaction on the first standby node that has completed data synchronization with the primary node is greater than or equal to the first snapshot point, the first standby node being any one of the multiple standby nodes.
  3. The method according to claim 2, wherein the method further comprises:
    the first standby node sending, based on the data access request and the first snapshot point, the data access request and the first snapshot point to the primary node corresponding to the first standby node when the transaction snapshot point of the latest transaction on the first standby node that has completed data synchronization with the primary node is less than the first snapshot point; and
    the primary node corresponding to the first standby node sending, based on the data access request and the first snapshot point, the data shard of the target version of the data to the coordination node.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    the multiple standby nodes sending, at every first interval, the transaction snapshot points of the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes to the coordination node; and
    the coordination node updating, based on the received transaction snapshot points, the transaction snapshot points reported by the multiple standby nodes.
  5. The method according to any one of claims 1 to 4, wherein the transaction snapshot point is any one of the following:
    a transaction commit sequence number;
    a transaction commit timestamp.
  6. The method according to any one of claims 1 to 5, wherein the coordination node determining, in response to the data access request for the data, the first snapshot point from the transaction snapshot points reported by the multiple standby nodes in the multiple groups of data nodes comprises:
    the coordination node determining, in response to the data access request, the first snapshot point from a target list, the target list being used to store node identifiers of the multiple standby nodes and the transaction snapshot points reported by the multiple standby nodes, wherein the transaction snapshot points reported by the multiple standby nodes indicate the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes.
  7. The method according to any one of claims 1 to 6, wherein the method further comprises:
    the coordination node determining, in response to a data cleanup request sent by a first primary node, a second snapshot point from the transaction snapshot points reported by the multiple standby nodes, the second snapshot point indicating the transaction with the earliest commit order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes, the first primary node being any one of the multiple primary nodes;
    the coordination node sending the second snapshot point to the first primary node; and
    the first primary node cleaning up, based on the second snapshot point, data shards of historical versions of the data on the first primary node, the historical versions being versions earlier than the second snapshot point.
  8. The method according to any one of claims 1 to 7, wherein the system further comprises a management node, and the method further comprises:
    the multiple standby nodes sending, at every second interval, the transaction snapshot points of the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes to the management node; and
    the coordination node determining, in response to the data access request for the data, the first snapshot point from the transaction snapshot points reported by the multiple standby nodes in the multiple groups of data nodes comprises:
    the coordination node sending, in response to the data access request, a snapshot point acquisition request to the management node to acquire the first snapshot point.
  9. The method according to any one of claims 1 to 8, wherein the system further comprises a management node, and the method further comprises:
    a first primary node sending, when a target transaction is committed, a transaction commit request for the target transaction to the management node;
    the management node generating, in response to the transaction commit request, a transaction snapshot point of the target transaction, and sending the transaction snapshot point of the target transaction to the first primary node; and
    the standby node corresponding to the first primary node performing log replay for the target transaction to complete data synchronization with the first primary node.
  10. A distributed database system, wherein the system comprises a coordination node and multiple groups of data nodes, each group of data nodes comprising a primary node and a standby node;
    the coordination node is configured to determine, in response to a data access request for data, a first snapshot point from transaction snapshot points reported by multiple standby nodes in the multiple groups of data nodes, the first snapshot point indicating the transaction with the earliest commit order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes;
    the coordination node is further configured to send the data access request and the first snapshot point to the multiple standby nodes; and
    the multiple standby nodes are configured to send, based on the data access request and the first snapshot point, data shards of a target version of the data to the coordination node, the target version being the version corresponding to the first snapshot point.
  11. The system according to claim 10, wherein
    a first standby node is configured to send, based on the data access request and the first snapshot point, the data shard of the target version of the data on the first standby node to the coordination node when the snapshot point of the latest transaction on the first standby node that has completed data synchronization with the primary node is greater than or equal to the first snapshot point, the first standby node being any one of the multiple standby nodes.
  12. The system according to claim 11, wherein
    the first standby node is further configured to send, based on the data access request and the first snapshot point, the data access request and the first snapshot point to the primary node corresponding to the first standby node when the transaction snapshot point of the latest transaction on the first standby node that has completed data synchronization with the primary node is less than the first snapshot point; and
    the primary node corresponding to the first standby node is configured to send, based on the data access request and the first snapshot point, the data shard of the target version of the data to the coordination node.
  13. The system according to any one of claims 10 to 12, wherein
    the multiple standby nodes are configured to send, at every first interval, the transaction snapshot points of the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes to the coordination node; and
    the coordination node is configured to update, based on the received transaction snapshot points, the transaction snapshot points reported by the multiple standby nodes.
  14. The system according to any one of claims 10 to 13, wherein the transaction snapshot point is any one of the following:
    a transaction commit sequence number;
    a transaction commit timestamp.
  15. The system according to any one of claims 10 to 14, wherein
    the coordination node is configured to determine, in response to the data access request, the first snapshot point from a target list, the target list being used to store node identifiers of the multiple standby nodes and the transaction snapshot points reported by the multiple standby nodes, wherein the transaction snapshot points reported by the multiple standby nodes indicate the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes.
  16. The system according to any one of claims 10 to 15, wherein
    the coordination node is further configured to determine, in response to a data cleanup request sent by a first primary node, a second snapshot point from the transaction snapshot points reported by the multiple standby nodes, the second snapshot point indicating the transaction with the earliest commit order among the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes, the first primary node being any one of the multiple primary nodes;
    the coordination node is further configured to send the second snapshot point to the first primary node; and
    the first primary node is configured to clean up, based on the second snapshot point, data shards of historical versions of the data on the first primary node, the historical versions being versions earlier than the second snapshot point.
  17. The system according to any one of claims 10 to 16, wherein the system further comprises a management node;
    the multiple standby nodes are further configured to send, at every second interval, the transaction snapshot points of the latest transactions on the multiple standby nodes that have completed data synchronization with the primary nodes to the management node; and
    the coordination node is configured to send, in response to the data access request, a snapshot point acquisition request to the management node to acquire the first snapshot point.
  18. The system according to any one of claims 10 to 17, wherein the system further comprises a management node;
    a first primary node is configured to send, when a target transaction is committed, a transaction commit request for the target transaction to the management node;
    the management node is configured to generate, in response to the transaction commit request, a transaction snapshot point of the target transaction, and send the transaction snapshot point of the target transaction to the first primary node; and
    the standby node corresponding to the first primary node is configured to perform log replay for the target transaction to complete data synchronization with the first primary node.
  19. A computing device cluster, comprising at least one computing device, each computing device comprising a processor and a memory, wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the data access method according to any one of claims 1 to 9.
  20. A computer program product comprising instructions, wherein, when the instructions are run by a computing device cluster, the computing device cluster is caused to perform the data access method according to any one of claims 1 to 9.
  21. A computer-readable storage medium, comprising computer program instructions, wherein, when the computer program instructions are executed by a computing device cluster, the computing device cluster performs the data access method according to any one of claims 1 to 9.
PCT/CN2023/079068 2022-08-22 2023-03-01 Data access method, distributed database system and computing device cluster WO2024040902A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211009112.4 2022-08-22
CN202211009112.4A CN117668097A (zh) Data access method, distributed database system and computing device cluster

Publications (1)

Publication Number Publication Date
WO2024040902A1 true WO2024040902A1 (zh) 2024-02-29

Family

ID=90012263

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079068 WO2024040902A1 (zh) 2022-08-22 2023-03-01 Data access method, distributed database system and computing device cluster

Country Status (2)

Country Link
CN (1) CN117668097A (zh)
WO (1) WO2024040902A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180336258A1 (en) * 2017-05-22 2018-11-22 Sap Se Validating Query Results During Asynchronous Database Replication
CN110737719A (zh) * 2019-09-06 2020-01-31 深圳平安通信科技有限公司 数据同步方法、装置、设备及计算机可读存储介质
CN111338766A (zh) * 2020-03-12 2020-06-26 腾讯科技(深圳)有限公司 事务处理方法、装置、计算机设备及存储介质
CN113535656A (zh) * 2021-06-25 2021-10-22 中国人民大学 数据访问方法、装置、设备及存储介质
CN113987064A (zh) * 2021-09-23 2022-01-28 阿里云计算有限公司 数据处理方法、系统及设备


Also Published As

Publication number Publication date
CN117668097A (zh) 2024-03-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23855999

Country of ref document: EP

Kind code of ref document: A1