WO2022111188A1 - Transaction processing method, system, apparatus, device, storage medium, and program product - Google Patents

Transaction processing method, system, apparatus, device, storage medium, and program product

Info

Publication number
WO2022111188A1
WO2022111188A1 PCT/CN2021/126408 CN2021126408W WO2022111188A1 WO 2022111188 A1 WO2022111188 A1 WO 2022111188A1 CN 2021126408 W CN2021126408 W CN 2021126408W WO 2022111188 A1 WO2022111188 A1 WO 2022111188A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction
node device
data
life cycle
target
Prior art date
Application number
PCT/CN2021/126408
Other languages
English (en)
French (fr)
Inventor
李海翔
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2023517375A (published as JP2023541298A)
Priority to EP21896690.1A (published as EP4216061A4)
Publication of WO2022111188A1
Priority to US18/070,141 (published as US20230099664A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G06F16/2315 Optimistic concurrency control
    • G06F16/2322 Optimistic concurrency control using timestamps
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Definitions

  • the embodiments of the present application relate to the technical field of databases, and in particular, to a transaction processing method, system, apparatus, device, storage medium, and program product.
  • Embodiments of the present application provide a transaction processing method, system, apparatus, device, storage medium, and program product, which can be used to improve transaction processing efficiency.
  • an embodiment of the present application provides a transaction processing method. The method is applied to a transaction distribution device in a distributed database system, and the distributed database system further includes at least two node devices sharing the same storage system. The method includes:
  • a coordinating node device of the target transaction is determined in the at least two node devices, and the coordinating node device performs coordination processing on the target transaction.
  • a transaction processing method is also provided.
  • the method is applied to a coordinating node device, and the coordinating node device is a node device used for coordinating processing of a target transaction in at least two node devices sharing the same storage system.
  • the coordinating node device is determined according to the transaction allocation indicators corresponding to the at least two node devices respectively, and the method includes:
  • based on the transaction information of the target transaction, a data read request is sent to a data node device, where the data node device is a node device used to process the target transaction among the at least two node devices;
  • a processing instruction of the target transaction is determined, and the processing instruction is sent to the data node device, where the processing instruction is a commit instruction or a rollback instruction, and the data node device is used to execute the processing instruction.
  • a transaction processing method is also provided. The method is applied to a data node device, the data node device is a node device used to process a target transaction among at least two node devices sharing the same storage system, and the method includes:
  • a data read result is obtained and returned to the coordinating node device, where the coordinating node device is determined according to the transaction allocation indicators respectively corresponding to the at least two node devices;
  • in response to receiving the processing instruction of the target transaction sent by the coordinating node device, the processing instruction is executed, where the processing instruction is a commit instruction or a rollback instruction.
  • in another aspect, a transaction processing system is provided. The system includes a coordinating node device and a data node device. The coordinating node device is the node device, among at least two node devices sharing the same storage system, that is used for coordinating processing of a target transaction;
  • the coordinating node device is determined according to the transaction allocation indicators respectively corresponding to the at least two node devices, and the data node device is a node device, among the at least two node devices, that is used for participating in processing the target transaction;
  • the coordinating node device is used to obtain transaction information of the target transaction; based on the transaction information of the target transaction, send a data read request to the data node device;
  • the data node device is configured to obtain a data read result based on the data read request sent by the coordinating node device, and return the data read result to the coordinating node device;
  • the coordinating node device is further configured to send a transaction verification request and a local write set to the data node device in response to the data read result returned by the data node device meeting a transaction verification condition;
  • the data node device is further configured to obtain the verification result of the target transaction based on the transaction verification request and the local write set sent by the coordinating node device, and return the verification result of the target transaction to the coordinating node device;
  • the coordinating node device is further configured to determine a processing instruction of the target transaction based on the verification result of the target transaction returned by the data node device, and send the processing instruction to the data node device, the processing instruction being a commit instruction or a rollback instruction;
  • the data node device is further configured to execute the processing instruction in response to receiving the processing instruction of the target transaction sent by the coordinating node device.
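The message flow above (read, verify, then commit or roll back) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `DataNode`, `coordinate`, and the `fail_validation` hook are hypothetical names, the nodes are in-memory stand-ins, the "transaction verification condition" is replaced by a placeholder check that every read returned a value, and treating an unmet condition as a rollback is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataNode:
    """Hypothetical in-memory stand-in for a data node device."""
    store: Dict[str, int] = field(default_factory=dict)
    fail_validation: bool = False  # illustration hook, not from the source

    def read(self, keys: List[str]) -> Dict[str, object]:
        # Handle a data read request and return the data read result.
        return {k: self.store.get(k) for k in keys}

    def validate(self, local_write_set: Dict[str, int]) -> bool:
        # Handle a transaction verification request (placeholder logic).
        return not self.fail_validation

    def execute(self, instruction: str, local_write_set: Dict[str, int]) -> None:
        # Execute the processing instruction; a rollback discards the writes.
        if instruction == "commit":
            self.store.update(local_write_set)


def coordinate(data_nodes: List[DataNode],
               read_keys: List[str],
               local_write_sets: List[Dict[str, int]]) -> str:
    """Coordinating node device: read, verify, then send one instruction."""
    # 1. Send data read requests and collect the data read results.
    results = [node.read(read_keys) for node in data_nodes]
    # 2. Placeholder verification condition (assumption): every read succeeded.
    condition_met = all(v is not None for r in results for v in r.values())
    if condition_met:
        # 3. Send the verification request plus each node's local write set.
        verdicts = [n.validate(ws) for n, ws in zip(data_nodes, local_write_sets)]
        instruction = "commit" if all(verdicts) else "rollback"
    else:
        instruction = "rollback"  # assumption: unmet condition means rollback
    # 4. Send the processing instruction to every data node device.
    for node, ws in zip(data_nodes, local_write_sets):
        node.execute(instruction, ws)
    return instruction
```

In this sketch a single failed verification on any data node causes every node to receive a rollback instruction, so no local write set is applied.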
  • a transaction processing apparatus comprising:
  • a first determining unit, configured to, in response to the allocation request of the target transaction, determine transaction allocation indicators respectively corresponding to at least two node devices sharing the same storage system, where the transaction allocation indicator corresponding to one node device is used to indicate the matching degree of allocating a new transaction to that node device;
  • a second determining unit, configured to determine, among the at least two node devices, the coordinating node device of the target transaction based on the transaction allocation indicators respectively corresponding to the at least two node devices, where the coordinating node device coordinates processing of the target transaction.
  • a transaction processing device comprising:
  • an acquisition unit, configured to acquire transaction information of the target transaction;
  • a first sending unit, configured to send a data read request to a data node device based on the transaction information of the target transaction, where the data node device is a node device, among at least two node devices that share the same storage system, that is used to participate in processing the target transaction;
  • a second sending unit, configured to send a transaction verification request and a local write set to the data node device in response to the data read result returned by the data node device meeting the transaction verification condition;
  • a determining unit configured to determine a processing instruction of the target transaction based on the verification result of the target transaction returned by the data node device;
  • a third sending unit configured to send the processing instruction to the data node device, where the processing instruction is a commit instruction or a rollback instruction, and the data node device is configured to execute the processing instruction.
  • a transaction processing device comprising:
  • a first obtaining unit, configured to obtain a data read result based on a data read request sent by a coordinating node device, where the coordinating node device is the node device, among at least two node devices that share the same storage system, that is used for coordinating processing of the target transaction, and the coordinating node device is determined according to the transaction allocation indicators respectively corresponding to the at least two node devices;
  • a returning unit, configured to return the data read result to the coordinating node device;
  • a second obtaining unit configured to obtain the verification result of the target transaction based on the transaction verification request and the local write set sent by the coordinating node device;
  • the returning unit is further configured to return the verification result of the target transaction to the coordinating node device;
  • an execution unit, configured to execute the processing instruction in response to receiving the processing instruction of the target transaction sent by the coordinating node device, where the processing instruction is a commit instruction or a rollback instruction.
  • in another aspect, a computer device is provided. The computer device includes a processor and a memory, the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor to cause the computer device to implement any of the transaction processing methods described above.
  • a non-transitory computer-readable storage medium is also provided. At least one computer program is stored in the non-transitory computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to cause a computer to implement any one of the transaction processing methods described above.
  • a computer program product or computer program is also provided, comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes any of the transaction processing methods described above.
  • the coordinating node device for coordinating and processing the target transaction is determined according to the transaction allocation index corresponding to each node device, and the transaction allocation process does not need to consider the data items involved in the transaction, nor the distribution of the data items.
  • each node device can coordinate and process transactions as a decentralized device, so that transactions can be processed across nodes. This is conducive to improving transaction processing efficiency, the reliability of transaction processing is high, and the system performance of the database system is improved.
  • FIG. 1 is a schematic diagram of an implementation environment of a transaction processing method provided by an embodiment of the present application
  • FIG. 2 is a flowchart of a transaction processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a format of a transaction log provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a format of a transaction log provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a transaction processing apparatus provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a transaction processing apparatus provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a transaction processing apparatus provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the distributed database system involved in the embodiments of the present application is a distributed database system based on a shared storage (share-disk) architecture.
  • the distributed database system based on a shared storage architecture includes at least two node devices. The at least two node devices have their own local memory areas and directly access the same storage system through a network communication mechanism; that is, the at least two node devices share the same storage system.
  • for example, the at least two node devices share the same HDFS (Hadoop Distributed File System).
  • Multiple data tables may be stored in the storage system shared by at least two node devices, and each data table may be used to store one or more data items.
  • node devices in a distributed database system can be divided into two roles: coordinating node devices and data node devices.
  • coordinating node devices are mainly responsible for producing and distributing processing plans, and for coordinating distributed transactions;
  • the data node device is mainly responsible for receiving the processing plan sent by the coordinating node device, executing the corresponding transaction and returning the relevant data involved in the transaction to the coordinating node device.
  • in a distributed database system, the smallest operation execution unit is a transaction. According to whether a transaction needs to operate on data items on multiple data node devices, transactions can be divided into two types: distributed transactions and local transactions. Different transactions can adopt different execution processes to minimize network communication overhead and improve transaction processing efficiency.
  • a distributed transaction means that the transaction needs to perform read and write operations across multiple data node devices, that is, the transaction needs to operate on data items on multiple data node devices.
  • for example, if transaction T needs to operate on data items on data node devices RM1, RM2, and RM3, then transaction T is a distributed transaction.
  • a local transaction means that the transaction only needs to operate on data items on a single data node device. For example, if transaction T only needs to operate data items on RM1, the transaction T is a local transaction.
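The distinction between distributed and local transactions can be illustrated with a short sketch. The function name `classify_transaction` and the representation of a transaction as the list of data node devices holding its data items are assumptions made for illustration only.

```python
from typing import Iterable


def classify_transaction(touched_nodes: Iterable[str]) -> str:
    """Return 'distributed' if the data items live on more than one data
    node device, and 'local' if they all live on a single one."""
    nodes = set(touched_nodes)  # duplicates collapse to one device
    if not nodes:
        raise ValueError("a transaction must touch at least one data node device")
    return "distributed" if len(nodes) > 1 else "local"


# Transaction T touching RM1, RM2, and RM3 spans several data node devices.
print(classify_transaction(["RM1", "RM2", "RM3"]))
# Transaction T touching only RM1 (even for several data items) stays local.
print(classify_transaction(["RM1", "RM1"]))
```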
  • FIG. 1 is a schematic diagram of an implementation environment of a transaction processing method provided by an embodiment of the present application.
  • the embodiments of the present application can be applied to a distributed database system based on a share-disk framework, and the distributed database system can include a gateway server 101, a transaction distribution device 102, a distributed storage cluster 103, and a global timestamp generation cluster 104.
  • the distributed storage cluster 103 includes m (m is an integer not less than 2) node devices, and the m node devices share the same storage system.
  • the gateway server 101 is configured to receive external read and write requests, and distribute the read and write transactions corresponding to the read and write requests to the transaction distribution device 102 or the distributed storage cluster 103.
  • the user triggers the application client to generate read and write requests, and the application client calls the API (Application Programming Interface) provided by the distributed database system to send the read and write transactions corresponding to the read and write requests to the gateway server 101.
  • the gateway server 101 may be combined with any node device in the distributed storage cluster 103 on the same physical machine, that is, a certain node device may act as the gateway server 101.
  • the terminal where the application client is located can directly establish a communication connection with the transaction distribution device 102 and the distributed storage cluster 103 in the distributed database system. In this case, there may be no gateway server 101 in the distributed database system.
  • the transaction allocation device 102 is used to allocate the appropriate node device as the coordinating node device for the new transaction.
  • the transaction distribution device is in a distributed coordination system such as ZooKeeper.
  • the distributed coordination system may be used to manage at least one of the gateway server 101, the distributed storage cluster 103, and the global timestamp generation cluster 104.
  • the technician can access the distributed coordination system through a scheduler on the terminal, so as to control the distributed coordination system at the back-end based on the front-end scheduler, so as to realize the management of each cluster or server.
  • the technician can control ZooKeeper to delete a certain node device from the distributed storage cluster 103 through the scheduler, that is, make a certain node device fail.
  • the distributed storage cluster 103 may include a data node device and a coordination node device, each coordination node device may correspond to at least one data node device, and the division of the data node device and the coordination node device is for different transactions.
  • the initiating node device of the distributed transaction may be referred to as a coordinating node device, and other node devices involved in the distributed transaction are referred to as data node devices.
  • the number of data node devices or coordinating node devices may be one or more, and the embodiment of the present application does not specifically limit the number of data node devices or coordinating node devices in the distributed storage cluster 103.
  • XA: eXtended Architecture, the distributed transaction specification of the X/Open organization
  • 2PC: Two-Phase Commit
  • the coordinating node device is used to act as the coordinator in the 2PC algorithm, and the data node devices corresponding to the coordinating node device are used to act as participants in the 2PC algorithm.
  • each data node device or coordinating node device can be a stand-alone device or can adopt a master-standby structure (that is, a cluster with one master and multiple standbys). FIG. 1 illustrates the case in which each node device (data node device or coordinating node device) is a cluster of one master and two standbys.
  • each node device includes one host and two standby machines.
  • each host or standby machine is configured with an agent device. The agent device can be physically independent of the host or standby machine; of course, the agent device can also serve as an agent module on the host or standby machine.
  • node device 1 includes a main database and an agent device (main database+agent, referred to as main DB+agent), and also includes two standby databases and agent devices (standby database+agent, referred to as standby DB+ agent).
  • the primary database is the above-mentioned host, and the standby database is the above-mentioned standby machine.
  • the global timestamp generation cluster 104 is used to generate a global commit timestamp (Global Timestamp, Gts) of a distributed transaction.
  • the distributed transaction may refer to a transaction involving multiple data node devices.
  • a distributed read transaction may involve reading data stored on multiple data node devices
  • a distributed write transaction may involve writing data on multiple data node devices.
  • the global timestamp generation cluster 104 can be logically regarded as a single point, but in some embodiments, a service with higher availability can be provided through a one-master, three-slave architecture. Generating the global commit timestamp in the form of a cluster can prevent a single point of failure and also avoids the single-point bottleneck problem.
  • the global commit timestamp is a globally unique and monotonically increasing timestamp identifier in the distributed database system, which can be used to mark the order of the global commit of each transaction, so as to reflect the real-time ordering of transactions.
  • the global commit timestamp may use at least one of a physical clock, a logical clock, or a hybrid physical clock, and the embodiment of the present application does not specifically limit the type of the global commit timestamp.
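The two properties stated for the global commit timestamp, global uniqueness and monotonic increase, can be illustrated with a logical-clock sketch. The class `GlobalTimestampGenerator` and its `next_gts` method are hypothetical; the sketch models a single logical generator in one process and does not cover the clustered, highly available deployment or the physical and hybrid clock variants mentioned in the text.

```python
import itertools
import threading


class GlobalTimestampGenerator:
    """Hypothetical logical-clock generator for global commit timestamps."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        self._lock = threading.Lock()  # serialize concurrent callers

    def next_gts(self) -> int:
        # Each call returns a value strictly greater than every earlier one,
        # so timestamps are unique and reflect the order of issuance.
        with self._lock:
            return next(self._counter)


gts = GlobalTimestampGenerator()
first = gts.next_gts()
second = gts.next_gts()
print(first < second)
```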
  • the global timestamp generation cluster 104 may be physically independent, or may be merged with a distributed coordination system (e.g., ZooKeeper).
  • FIG. 1 is only an architecture diagram for lightweight transaction processing, and is an exemplary description of a distributed database system based on a share-disk architecture.
  • the distributed database system formed by the gateway server 101, the transaction distribution device 102, the distributed storage cluster 103, and the global timestamp generation cluster 104 can be regarded as a server that provides data services to user terminals.
  • the server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the above-mentioned user terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto.
  • the terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • an embodiment of the present application provides a transaction processing method. As shown in FIG. 2, the method provided by this embodiment of the present application includes the following steps 201 to 209.
  • in response to the allocation request of the target transaction, the transaction allocation device determines the transaction allocation index respectively corresponding to at least two node devices, where the transaction allocation index corresponding to one node device is used to indicate the matching degree of allocating a new transaction to that node device.
  • Both the transaction distribution device and the at least two node devices are in a distributed database system, and the at least two node devices share the same storage system.
  • the embodiments of the present application do not limit the specific structure of the distributed database system, as long as it includes a transaction allocation device and at least two node devices that share the same storage system.
  • the target transaction refers to a transaction to be processed, and the target transaction may be a distributed transaction or a local transaction, which is not limited in this embodiment of the present application.
  • the allocation request of the target transaction is used to instruct to allocate an appropriate node device for the target transaction as a coordinator node device, so that the allocated coordinator node device can coordinate and process the target transaction.
  • the allocation request of the target transaction is initiated by the terminal, and the allocation request of the target transaction initiated by the terminal is directly sent by the terminal to the transaction allocation device, or forwarded to the transaction allocation device by the gateway server, which is not limited in this embodiment of the present application.
  • the terminal may be any electronic device corresponding to the user, including but not limited to at least one of a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, or a smart watch, and the embodiment of this application does not specifically limit the type of the terminal.
  • an application client is installed on the terminal, and the application client can be any client that can provide data services.
  • the application client can be at least one of a payment application client, a takeaway application client, a taxi application client, or a social application client, and the embodiment of the present application does not specifically limit the type of the application client.
  • the at least two node devices refer to node devices in the distributed database system that can coordinate and process transactions as decentralized node devices, and each node device can be used to coordinate and process distributed transactions through decentralized algorithms.
  • the transaction allocation device After receiving the allocation request of the target transaction, the transaction allocation device needs to allocate an appropriate node device for the target transaction as a coordinating node device to ensure the efficiency of transaction processing.
  • the transaction allocating device first determines the transaction allocation indexes corresponding to at least two node devices respectively.
  • the transaction allocation index corresponding to a node device is used to indicate the matching degree of allocating a new transaction to the node device. The higher the matching degree of assigning a new transaction to a node device, the more suitable it is to assign a new transaction to the one node device.
  • the transaction allocation indicator is an indicator determined from the perspective of transactions and used to measure whether it is appropriate to allocate a new transaction to a node device.
  • the process of determining transaction allocation indicators corresponding to at least two node devices respectively includes the following steps 2011 and 2012.
  • Step 2011: Determine a transaction allocation mode, where the transaction allocation mode includes any one of allocation based on transaction busyness, allocation based on device busyness, and allocation based on hybrid busyness.
  • the transaction allocation mode is used to indicate the way of determining the transaction allocation index corresponding to the node device.
  • the transaction allocation mode is set by the developer and uploaded to the transaction allocation device. It should be noted that the transaction allocation mode adopted in different periods may be different. What is determined in this step 2011 is the transaction allocation mode that should be adopted when the allocation request of the target transaction is received.
  • the transaction allocation mode includes any one of allocation based on transaction busyness, allocation based on device busyness, and allocation based on mixed busyness.
  • the mode of allocating based on the transaction busyness refers to determining the transaction allocation index from the perspective of considering the transaction processing quantity of the node device, and the transaction processing quantity of the node device can reflect the transaction busyness of the node device.
  • the mode of allocation based on the busyness of the device refers to determining the transaction allocation index from the perspective of the device resource utilization rate of the node device, which can reflect the device busyness of the node device.
  • the mode of allocation based on mixed busyness refers to determining the transaction allocation index by comprehensively considering both the number of transactions processed by the node device and the device resource utilization of the node device. Together, these two quantities reflect the mixed busyness of the node device.
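The three transaction allocation modes differ in which node metrics feed the transaction allocation index. A minimal sketch of that mapping follows; the enum `AllocationMode`, the function `metrics_for`, and the metric names `transaction_count` and `resource_utilization` are hypothetical labels chosen for illustration, not identifiers from the application.

```python
from enum import Enum
from typing import Tuple


class AllocationMode(Enum):
    """Hypothetical labels for the three transaction allocation modes."""
    TRANSACTION_BUSYNESS = "transaction"
    DEVICE_BUSYNESS = "device"
    HYBRID_BUSYNESS = "hybrid"


def metrics_for(mode: AllocationMode) -> Tuple[str, ...]:
    """Return the node metrics considered when computing the index."""
    if mode is AllocationMode.TRANSACTION_BUSYNESS:
        # Only the number of transactions the node device processes.
        return ("transaction_count",)
    if mode is AllocationMode.DEVICE_BUSYNESS:
        # Only the device resource utilization of the node device.
        return ("resource_utilization",)
    # Hybrid busyness considers both quantities together.
    return ("transaction_count", "resource_utilization")


print(metrics_for(AllocationMode.HYBRID_BUSYNESS))
```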
  • Step 2012: Determine the transaction allocation indicators respectively corresponding to the at least two node devices according to the determination manner indicated by the transaction allocation mode.
  • Different transaction allocation modes indicate different determination manners. After the transaction allocation mode is determined, the transaction allocation indicators respectively corresponding to the at least two node devices are determined according to the determination manner indicated by the transaction allocation mode. Next, the manners of determining the transaction allocation index corresponding to a first node device among the at least two node devices under the different transaction allocation modes are introduced in turn.
  • the first node device is any one of the at least two node devices.
  • the transaction allocation mode is based on transaction busyness.
  • The method of determining the transaction allocation index corresponding to the first node device is: determining the transaction allocation index corresponding to the first node device based on the number of transactions processed by the first node device.
  • the number of transactions processed by the first node device refers to the number of transactions that the first node device needs to process within a unit time. It should be noted that the transaction to be processed here refers to the transaction that has been allocated to the first node device for processing. The greater the number of transactions that the first node device needs to process in a unit time, the less suitable it is to allocate new transactions to the first node device.
  • the transaction processing quantity of the first node device may be fed back to the transaction distribution device by the first node device, or may be determined by the transaction distribution device itself according to the transaction distribution, which is not limited in this embodiment of the present application .
  • the embodiment of the present application does not limit the expression form of the transaction allocation indicator.
  • the expression form of the transaction allocation indicator is a busy level or a numerical value.
  • When the expression form is a busy level, the manner of determining the transaction allocation indicator corresponding to the first node device based on the number of transactions processed by the first node device is: setting a different transaction processing quantity range for each busy level, and taking the busy level corresponding to the range in which the transaction processing quantity of the first node device falls as the busy level corresponding to the first node device.
  • For example, the busy levels include "busy", "partially busy", and "idle"; the transaction processing quantity range corresponding to "busy" is [10, +∞), the range corresponding to "partially busy" is [3, 10), and the range corresponding to "idle" is [0, 3).
  • If the transaction processing quantity of the first node device is 2, then "idle" is used as the transaction allocation indicator corresponding to the first node device. The closer the transaction allocation index corresponding to the first node device is to "idle", the higher the matching degree of allocating new transactions to the first node device.
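  • The range-based mapping in the example above can be sketched as follows. This is only an illustrative sketch: the ranges are the ones quoted in the example, while the names `BUSY_LEVEL_RANGES` and `busy_level` are hypothetical and do not come from the application.

```python
import math

# Half-open ranges [low, high) taken from the example in the text.
# The constant and function names are hypothetical.
BUSY_LEVEL_RANGES = [
    ("idle", 0, 3),
    ("partially busy", 3, 10),
    ("busy", 10, math.inf),
]


def busy_level(transaction_count: int) -> str:
    """Return the busy level whose range contains the transaction count."""
    if transaction_count < 0:
        raise ValueError("transaction count must be non-negative")
    for level, low, high in BUSY_LEVEL_RANGES:
        if low <= transaction_count < high:
            return level
    # Unreachable: the ranges above cover every non-negative count.
    raise AssertionError("no range matched")
```

  • With these ranges, a node device that has 2 allocated transactions maps to "idle", which matches the example in the text.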
  • When the expression form is a numerical value, the method for determining the transaction allocation indicator corresponding to the first node device based on the transaction processing quantity of the first node device is: numerically processing the transaction processing quantity of the first node device, and taking the resulting value as the transaction allocation indicator corresponding to the first node device.
  • The manner of numerically processing the transaction processing quantity is set according to experience, or flexibly adjusted according to application scenarios, which is not limited in this embodiment of the present application.
  • For example, the manner of numerically processing the transaction processing quantity is: calculating the product of the transaction processing quantity and a reference weight. In this way, the greater the transaction processing quantity, the greater the value obtained after numerical processing.
  • the transaction allocation mode is based on device busyness.
  • The method of determining the transaction allocation index corresponding to the first node device is: determining the transaction allocation index corresponding to the first node device based on the device resource utilization rate of the first node device.
  • The device resource utilization rate of the first node device refers to the proportion of the total device resources that has already been used by the first node device.
  • For example, the device resources refer to CPU (Central Processing Unit) resources.
  • the device resource utilization rate of the first node device may be monitored in real time by the first node device and fed back to the transaction allocation device, or may be obtained by the transaction allocation device itself, which is not limited in this embodiment of the present application.
  • For the method of determining the transaction allocation index corresponding to the first node device based on the device resource utilization rate of the first node device, refer to the method of determining the transaction allocation index corresponding to the first node device based on the number of transactions processed by the first node device; it is not repeated here.
  • The transaction allocation mode is allocation based on mixed busyness.
  • The method of determining the transaction allocation index corresponding to the first node device is: determining the transaction allocation index corresponding to the first node device based on the transaction processing quantity of the first node device, the device resource utilization rate of the first node device, the transaction processing quantity weight, the device resource utilization rate weight, and the weight adjustment parameter.
  • The transaction processing quantity weight and the device resource utilization rate weight are used to adjust the relative proportions of the two parameters, namely the transaction processing quantity and the device resource utilization rate, and can be obtained by actual measurement.
  • By default, the transaction processing quantity weight and the device resource utilization rate weight are both 1.
  • The weight adjustment parameter is the relative proportion factor between the device resource utilization rate and the transaction processing quantity. It is used to adjust the weight distribution between the two and can be obtained by measurement.
  • The default value of the weight adjustment parameter is 0.33.
  • p3 represents the weight of other factors, and p3 can be measured according to the types of the other factors.
  • The default value of p3 is 1. The smaller the transaction allocation index Q corresponding to the first node device, the higher the matching degree of allocating a new transaction to the first node device.
  • step 202 is executed.
  • the transaction allocation device determines the coordinating node device of the target transaction in the at least two node devices based on the transaction allocation indicators corresponding to the at least two node devices respectively, and the coordinating node device performs coordination processing on the target transaction.
  • the coordinating node device of the target transaction refers to a node device suitable for allocating a new transaction among at least two node devices.
  • the coordinating node device of the target transaction is used for coordinating processing of the target transaction, that is, the coordinating node device of the target transaction refers to the coordinator of the target transaction.
  • the process of coordinating and processing the target transaction refers to the process of initiating the target transaction in the distributed database system, and then organizing the data node devices of the target transaction to jointly process the target transaction.
  • the data node device of the target transaction refers to the node device used to process the target transaction among the at least two node devices, that is, the data node device of the target transaction refers to the participant of the target transaction.
  • the coordination node device and the data node device mentioned in the embodiments of this application are both for the target transaction.
  • the coordination node device or the data node device is not fixed. That is to say, the same node device may belong to the coordinator node device for some transactions, and belong to the data node device for other transactions.
  • The manner in which the coordinating node device of the target transaction is determined among the at least two node devices differs according to the expression form of the transaction allocation indicators. This is not limited in this embodiment of the present application, as long as it can be ensured that the coordinating node device is a node device that is currently suitable for being allocated new transactions.
  • For example, the representation form of the transaction allocation indicator is a busy level, and the busy levels are "busy", "partially busy" and "idle".
  • In this case, based on the transaction allocation indicators respectively corresponding to the at least two node devices, the method of determining the coordinating node device of the target transaction among the at least two node devices is: taking the node devices whose transaction allocation indicator is "idle" among the at least two node devices as candidate node devices, and selecting one of the candidate node devices as the coordinating node device of the target transaction.
  • That is, the node devices whose corresponding transaction allocation indicator is "idle" are first taken as candidate node devices, and the coordinating node device is then selected from among the candidate node devices.
  • If the transaction allocation indicators corresponding to the at least two node devices are all "busy", the determination of the coordinating node device of the target transaction is suspended. After waiting for a reference duration, the transaction allocation indicators corresponding to the at least two node devices are re-determined, and the coordinating node device of the target transaction is then determined again.
  • the reference duration is set based on experience. For example, the reference duration is the measured average duration of completing a transaction.
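  • The busy-level selection and the wait-and-retry behavior described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the function names are hypothetical, the random choice among candidates is an assumption (the text only says that one candidate is selected), falling back to "partially busy" node devices when none is "idle" is an assumption, and the bound `max_attempts` on the number of retries is also an assumption.

```python
import random
import time
from typing import Callable, Mapping, Optional


def pick_coordinator_by_level(levels: Mapping[str, str],
                              rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick a coordinating node from {node_id: busy_level}; None if none is suitable."""
    rng = rng or random.Random()
    # Prefer "idle" nodes. Trying "partially busy" next is an assumption of this sketch.
    for wanted in ("idle", "partially busy"):
        candidates = sorted(node for node, level in levels.items() if level == wanted)
        if candidates:
            # The selection rule among candidates is not specified; random is assumed.
            return rng.choice(candidates)
    return None  # every node is "busy" (or there are no nodes)


def allocate_with_retry(get_levels: Callable[[], Mapping[str, str]],
                        reference_duration_s: float,
                        max_attempts: int = 3,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Re-determine the indicators after the reference duration while all nodes are busy."""
    for attempt in range(max_attempts):
        node = pick_coordinator_by_level(get_levels())
        if node is not None:
            return node
        if attempt + 1 < max_attempts:
            sleep(reference_duration_s)  # wait, then re-determine the indicators
    return None
```

  • Passing `sleep` as a parameter keeps the retry loop testable without real waiting; a caller would normally leave it at its default.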
  • the representation form of the transaction allocation index is a numerical value, and the smaller the transaction allocation index corresponding to a node device, the higher the matching degree of allocating a new transaction to the node device.
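  • In this case, based on the transaction allocation indicators respectively corresponding to the at least two node devices, the method of determining the coordinating node device of the target transaction among the at least two node devices is: taking the node devices corresponding to the s smallest transaction allocation indexes (s is an integer not less than 1) among the at least two node devices as candidate node devices, and selecting one node device from the candidate node devices as the coordinating node device of the target transaction.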
  • the value of s is set according to experience, or flexibly adjusted according to the total number of at least two node devices, which is not limited in this embodiment of the present application, for example, the value of s is 1, or the value of s is 3, etc.
  • The above is only an exemplary description of the manner in which the coordinating node device of the target transaction is determined among the at least two node devices based on the transaction allocation indicators respectively corresponding to the at least two node devices, and the embodiment of the present application is not limited to this.
  • For example, when the representation form of the transaction allocation index is a numerical value, and a larger transaction allocation index corresponding to a node device indicates a higher matching degree of allocating a new transaction to that node device, the node devices corresponding to the t largest transaction allocation indexes (t is an integer not less than 1) among the at least two node devices are taken as candidate node devices, and one node device is selected from the candidate node devices as the coordinating node device of the target transaction.
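  • The numerical variants above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, `numeric_indicator` uses the "product of the transaction processing quantity and a reference weight" example given earlier, and choosing randomly among the candidates is an assumption because the text does not specify how one candidate is selected.

```python
import heapq
import random
from typing import Mapping, Optional


def numeric_indicator(transaction_count: int, reference_weight: float = 1.0) -> float:
    """Numerical indicator: transaction processing quantity times a reference weight."""
    return transaction_count * reference_weight


def pick_coordinator_by_value(indicators: Mapping[str, float],
                              k: int = 1,
                              smaller_is_better: bool = True,
                              rng: Optional[random.Random] = None) -> str:
    """Pick one node among the k best indicators (k plays the role of s or t)."""
    if k < 1:
        raise ValueError("k must be an integer not less than 1")
    if not indicators:
        raise ValueError("at least one node device is required")
    rng = rng or random.Random()
    select = heapq.nsmallest if smaller_is_better else heapq.nlargest
    candidates = select(k, indicators.items(), key=lambda item: item[1])
    node, _ = rng.choice(candidates)  # selection rule among candidates is assumed
    return node
```

  • With `k=1` the selection is deterministic: the node device with the single smallest (or largest) indicator becomes the coordinating node device.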
  • The coordinating node device of the target transaction determined based on the transaction allocation index is a node device, among the at least two node devices, that is suitable for being allocated a new transaction. The target transaction is then allocated to the coordinating node device, and the coordinating node device coordinates the processing of the target transaction, which helps ensure the processing efficiency of the target transaction.
  • In the related art, each node device serves a certain number of regions, and each node device maintains the distribution information of the data items in the regions it serves; the distribution information of a data item indicates the storage location of the data item.
  • The meta-information of the regions is maintained in the transaction allocation device.
  • The transaction allocation device determines, according to the maintained region meta-information, a node device that serves the region where the data items involved in the target transaction are located, and that node device then processes the target transaction independently. In this way, the processing efficiency of transactions is greatly limited, and true distributed transactions cannot be supported.
  • In the embodiment of the present application, the node device no longer serves certain fixed regions, the node device no longer maintains the distribution information of data items, and the transaction allocation device no longer maintains the meta-information of the regions.
  • The meta-information of the regions is distributed throughout the shared storage system of the distributed database system.
  • the transaction allocation device can allocate an appropriate node device as the coordinating node device for the target transaction based on the transaction allocation index, without considering the data items involved in the transaction and the distribution of the data items.
  • The node device can automatically obtain data from the shared storage system according to the requirements of the SQL (Structured Query Language) statements in the transaction. In this way, each node device can coordinate and process distributed transactions as a decentralized node device, so that the distributed database system has decentralized distributed transaction processing capabilities.
  • The method further includes: sending the device identification information of the coordinating node device to the terminal that initiated the allocation request. The terminal is used to send, according to the device identification information of the coordinating node device, the transaction information of the target transaction to the coordinating node device, and the coordinating node device coordinates and processes the target transaction based on the transaction information.
  • the device identification information of the coordinating node device is used to uniquely identify the coordinating node device.
  • the terminal can know the coordinating node device for coordinating and processing the target transaction.
  • the terminal sends the transaction information of the target transaction to the coordinating node device after learning the coordinating node device for coordinating and processing the target transaction according to the device identification information.
  • the transaction information of the target transaction is used to indicate the relevant processing operations of the target transaction, and exemplarily, the transaction information of the target transaction refers to an SQL statement.
  • The terminal directly sends the transaction information of the target transaction to the coordinating node device; or, the terminal sends the transaction information of the target transaction and the device identification information of the coordinating node device to the gateway server, and the gateway server forwards the transaction information of the target transaction to the coordinating node device.
  • the coordinating node device After receiving the transaction information of the target transaction, the coordinating node device coordinates and processes the target transaction based on the transaction information.
  • the coordinating node device can parse transaction information, such as SQL statements, generate a transaction execution plan, and then complete the processing of the target transaction by communicating with the relevant data node devices.
  • The method provided by the embodiment of the present application can realize decentralized transaction processing.
  • In decentralized transaction processing, multiple distributed transactions are coordinated and processed by multiple node devices respectively.
  • In the process of coordinating the target transaction, the coordinating node device establishes communication with other node devices and obtains the data information generated by the other node devices in the process of coordinating and processing other distributed transactions. It then performs data anomaly verification or serializability verification according to the acquired data information, to determine whether the target transaction conforms to transaction consistency and to ensure the correctness of transaction processing.
  • The coordinating node device buffers the data information transmitted from other node devices in a temporary data buffer, which is cleared when the target transaction ends.
  • each other transaction is coordinated and processed by the coordinating node device
  • each other transaction is coordinated and processed by a suitable coordinating node device allocated in real time by the transaction allocation device according to the transaction allocation index of the node device, which is not limited in the embodiment of the present application.
  • step 203 the coordinating node device acquires transaction information of the target transaction.
  • the transaction information of the target transaction may be directly sent to the coordinating node device by the creation terminal of the target transaction, or may be forwarded to the coordinating node device by the gateway server, which is not limited in this embodiment of the present application.
  • the transaction information of the target transaction refers to the SQL statement used to realize the target transaction.
  • the coordinating node device initializes the target transaction after acquiring the transaction information of the target transaction.
  • the phase in which the target transaction is initialized can be considered as the snapshot phase in which the transaction is established.
  • a global consistent snapshot point can be established to ensure global read consistency.
  • the coordinating node device may perform at least one of the following two initialization operations.
  • Initialization operation 1 The coordinating node device assigns a globally unique transaction identifier TID to the target transaction.
  • the transaction identifier TID is used to uniquely identify the target transaction.
  • Initialization operation 2 The coordinating node device records the initial state information of the target transaction in the first transaction state list.
  • The transaction state list maintained by the coordinating node device is referred to as the first transaction state list. The first transaction state list is a global state list used to record the global state of the target transaction under the decentralized framework.
  • the state information of the target transaction recorded in the first transaction state list includes, but is not limited to, the transaction identifier of the target transaction, the global transaction state of the target transaction, and the logical life cycle of the target transaction.
  • the logical life cycle consists of the lower bound of timestamp and the upper bound of timestamp.
  • The lower bound of the timestamp of the logical life cycle is called the start timestamp (Begintimestamp, Bts) of the target transaction, and the upper bound of the timestamp of the logical life cycle is called the end timestamp (Endtimestamp, Ets) of the target transaction. That is, the logical life cycle consists of the timestamp lower bound Bts and the timestamp upper bound Ets.
  • In the initial state information of the target transaction, the transaction identifier TID of the target transaction is the one assigned in initialization operation 1; the global transaction status (Status) of the target transaction is Grunning (globally running); and the logical life cycle of the target transaction is the first logical life cycle. The timestamp lower bound Bts of the first logical life cycle is a globally unique incremental timestamp value, and the timestamp upper bound Ets of the first logical life cycle is +∞.
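  • The two initialization operations and the initial state record can be sketched as follows. This is a simplified single-process sketch: the class and field names are hypothetical, and the two `itertools.count` counters are stand-ins (assumptions) for the globally unique TID allocator and for the global clock or local HLC described in the text.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class TransactionState:
    tid: int      # transaction identifier TID
    status: str   # global transaction status
    bts: float    # timestamp lower bound of the logical life cycle
    ets: float    # timestamp upper bound of the logical life cycle


class Coordinator:
    """Hypothetical coordinating node that keeps the first transaction state list."""

    def __init__(self) -> None:
        # Stand-ins: a real system would use a globally unique TID source and
        # a global clock or local HLC instead of in-process counters.
        self._tid_source = itertools.count(1)
        self._clock = itertools.count(1)
        self.first_transaction_state_list: Dict[int, TransactionState] = {}

    def begin(self) -> TransactionState:
        tid = next(self._tid_source)                 # initialization operation 1
        state = TransactionState(tid=tid, status="Grunning",
                                 bts=next(self._clock), ets=math.inf)
        self.first_transaction_state_list[tid] = state  # initialization operation 2
        return state
```

  • Every transaction therefore starts with the widest possible logical life cycle, [Bts, +∞), which later steps can narrow.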
  • the lower bound Bts of the timestamp and the upper bound Ets of the timestamp of the first logical life cycle are obtained in the following manner: for the isolation level above the serializable level, the coordinating node device obtains the timestamp value from the global clock ; For serializable levels and weaker isolation levels, the coordinating node device obtains the timestamp value from the local Hybrid Logical Clock (HLC).
  • The coordinating node device can also obtain the timestamp lower bound Bts and the timestamp upper bound Ets of the first logical life cycle by obtaining the timestamp value from the global clock. In an exemplary embodiment, obtaining the timestamp value from the local HLC is more efficient for the serializable level and weaker isolation levels.
  • the global clock refers to the clock generated by the global logical clock generator, which has the characteristic of monotonically increasing.
  • the global clock is provided by a global timestamp generation cluster in a distributed database system.
  • the global clock is provided by the storage system in the distributed database system through an API.
  • The global logical clock generator can assign values to the start timestamp Bts and the end timestamp Ets of a transaction, and can also assign values to the global LSN (Log Sequence Number) of the WAL (Write Ahead Logging).
  • the global clock is a logical concept that provides a uniform monotonically increasing value for the entire system; the physical form may be a global physical clock or a global logical clock.
  • For example, the global clock is a distributed, decentralized clock similar to Google's TrueTime mechanism; alternatively, the global clock is a clock provided by the active and standby systems of a cluster constructed with Paxos/Raft or the like; alternatively, the global clock is a clock provided by an algorithmic mechanism with a precise synchronization mechanism and a coordinated node exit mechanism.
  • In an exemplary embodiment, the Bts and the Ets of a transaction are each composed of 8 bytes.
  • The 8 bytes are divided into two parts. The first part is the value of the physical timestamp (that is, a Unix timestamp accurate to milliseconds), which identifies the global time (denoted gts); the second part is a monotonically increasing count within a given millisecond, which identifies the relative time within the global time (that is, the local time, denoted lts).
  • For example, the first 44 bits of the 8 bytes are the first part, which can represent a total of 2^44 unsigned integers, so in theory physical timestamps spanning a total of about 557.8 years can be represented.
  • the number of bits of the two parts may also be adjusted so that the ranges represented by the global time gts and the local time lts vary.
  • The Bts and the Ets of a transaction may also be composed of more than 8 bytes or fewer than 8 bytes.
  • For example, the composition of the Bts and the Ets is adjusted to 10 bytes, so that the range of the local time lts is increased to cope with a larger number of concurrent transactions.
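  • The 8-byte layout described above can be sketched as follows. This is an illustrative sketch: the function and constant names are hypothetical, and the 20-bit width of the local time lts is derived here as the remainder of 64 bits after the 44-bit global time gts, an arithmetic consequence of the stated layout rather than a figure quoted from the text.

```python
GTS_BITS = 44                 # physical timestamp in milliseconds (global time gts)
LTS_BITS = 64 - GTS_BITS      # remaining 20 bits: per-millisecond counter (local time lts)


def pack_timestamp(gts_ms: int, lts: int) -> int:
    """Pack gts and lts into one 64-bit integer, with gts in the high bits."""
    if not 0 <= gts_ms < (1 << GTS_BITS):
        raise ValueError("gts does not fit in 44 bits")
    if not 0 <= lts < (1 << LTS_BITS):
        raise ValueError("lts does not fit in the remaining bits")
    return (gts_ms << LTS_BITS) | lts


def unpack_timestamp(ts: int) -> tuple:
    """Split a packed timestamp back into (gts_ms, lts)."""
    return ts >> LTS_BITS, ts & ((1 << LTS_BITS) - 1)


# Number of years that 2^44 milliseconds can cover (365-day years).
YEARS_COVERED = (1 << GTS_BITS) / 1000 / 3600 / 24 / 365
```

  • Because gts occupies the high bits, comparing two packed values as plain integers orders them by global time first and by local time second, and `YEARS_COVERED` rounds to the 557.8 years mentioned in the text.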
  • Ti.bts and Tj.bts composed of two parts of global time gts and local time lts
  • the coordinating node device sends a data read request to the data node device based on the transaction information of the target transaction, and the data node device is a node device used to process the target transaction among the at least two node devices.
  • the execution phase of the target transaction is started, and the execution phase of the transaction can be regarded as the operational phase of transaction semantics.
  • The data node device is a node device, among the at least two node devices, that is used to process the target transaction, and the data node device can obtain the data items involved in the target transaction. That is, the data node device in the embodiment of the present application is a node device related to the target transaction.
  • the transaction information of the target transaction carries relevant information of the data to be read, and the coordinating node device can generate a data read request according to the transaction information of the target transaction, and then send the data read request to the data node device.
  • the data read request is represented as ReadRequestMessage (read request message), abbreviated as rrqm.
  • the data read request carries the first logical life cycle of the target transaction, the transaction identifier of the target transaction, and the read plan.
  • the first logical life cycle is represented by the lower bound of timestamp Bts and the upper bound of timestamp Ets.
  • the read plan refers to the data read plan corresponding to the target transaction, which is used to indicate the data items that need to be read.
  • The transaction identifier, the timestamp lower bound Bts, the timestamp upper bound Ets, and the read plan are recorded in four fields of rrqm, respectively.
  • the number of data node devices may be one or more, and the number of data node devices is not specifically limited in this embodiment of the present application. In the case where the number of data node devices is multiple, the read plans carried in the data read requests sent to different data node devices are different to indicate that different data items need to be read in different data node devices.
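  • The four-field read request and the per-node read plans can be sketched as follows. This is only an illustration: the field names, the representation of a read plan as a tuple of data item keys, and the helper `build_read_requests` are assumptions, since the text only names the message (ReadRequestMessage, rrqm) and its four fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReadRequestMessage:      # rrqm
    tid: int                   # transaction identifier of the target transaction
    bts: float                 # timestamp lower bound of the first logical life cycle
    ets: float                 # timestamp upper bound of the first logical life cycle
    read_plan: Tuple[str, ...]  # data items to read (representation assumed)


def build_read_requests(tid: int, bts: float, ets: float,
                        plan_by_node: Dict[str, List[str]]) -> Dict[str, ReadRequestMessage]:
    """Build one request per data node; only the read plan differs between nodes."""
    return {
        node: ReadRequestMessage(tid, bts, ets, tuple(items))
        for node, items in plan_by_node.items()
    }
```

  • Each data node device receives the same transaction identifier and logical life cycle, but a read plan listing only the data items it is asked to read.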
  • step 205 the data node device obtains the data read result based on the data read request sent by the coordination node device, and returns the data read result to the coordination node device.
  • the data node device After obtaining the data read request, the data node device obtains the data read result based on the data read request. In a possible implementation manner, for the case where the data read request carries the first logical life cycle of the target transaction, the data node device obtains the data read result based on the data read request, including the following steps 2051 to 2053 .
  • Step 2051 Based on the first logical life cycle, determine the visible version data of the data item to be read indicated by the data read request.
  • the data node device can determine the data items to be read by the target transaction based on the read plan carried in the data read request, and use the data items to be read by the target transaction as the data items to be read.
  • the visible version data of the data item to be read refers to a certain version of data that is visible to the target transaction among the versions of data corresponding to the data item to be read.
  • A data buffer is provided in the data node device. If the versions of data of the data item to be read exist in the data buffer, the data node device obtains the versions of data of the data item to be read directly from the data buffer; if they do not exist in the data buffer, the data node device obtains the versions of data of the data item to be read from the shared storage system.
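  • The buffer-first lookup above can be sketched as follows. This is a minimal sketch: the function name and the use of plain dictionaries for the data buffer and the shared storage system are assumptions, and copying the fetched versions into the buffer is also an assumption that the text does not state.

```python
from typing import Dict, List


def load_versions(key: str,
                  data_buffer: Dict[str, List[dict]],
                  shared_storage: Dict[str, List[dict]]) -> List[dict]:
    """Return all versions of a data item, preferring the local data buffer."""
    versions = data_buffer.get(key)
    if versions is None:
        # Not buffered locally: fall back to the shared storage system.
        versions = shared_storage[key]
        data_buffer[key] = versions  # assumption: keep a copy for later reads
    return versions
```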
  • the data node device after receiving the data read request, the data node device first checks whether the local transaction status list (Local TS) contains the status information of the target transaction.
  • the local transaction state list is a transaction state list maintained by the data node device, and the local transaction state list records the state information of each uncommitted transaction that the data node device participates in.
  • the data node device after receiving the data read request, the data node device checks whether the local transaction status list contains the status information of the target transaction according to the transaction identifier of the target transaction carried in the data read request, and the check result includes the following two kinds.
  • Check result 1: the local transaction status list does not contain the status information of the target transaction. In this case, the status information of the target transaction can be initialized in the local transaction status list; that is, a record related to the target transaction is inserted into the Local TS, and the values in the record are taken from the information of the target transaction carried by the data read request.
  • In this case, the method of determining the visible version data of the data item to be read indicated by the data read request is: determining the visible version data of the data item to be read relative to the first logical life cycle.
  • Check result 2: the local transaction status list contains the status information of the target transaction.
  • In this case, the method of determining the visible version data of the data item to be read indicated by the data read request is: determining an updated logical life cycle based on the first logical life cycle, and determining the visible version data of the data item to be read relative to the updated logical life cycle.
  • The implementation of determining the visible version data of the data item to be read relative to the first logical life cycle is similar to the implementation of determining the visible version data of the data item to be read relative to the updated logical life cycle. Determining the visible version data of the data item to be read relative to the first logical life cycle is taken as an example for description.
  • the validity of the first logical life cycle is checked to determine whether the first logical life cycle is valid.
  • a method of validating the first logical life cycle is to check whether the lower bound of the timestamp of the first logical life cycle is smaller than the upper bound of the timestamp of the first logical life cycle.
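  • If the lower bound of the timestamp of the first logical life cycle is not smaller than the upper bound of the timestamp of the first logical life cycle, the first logical life cycle is invalid, and the transaction state of the target transaction in the local transaction state list is updated from Running to Aborted (rolled back).
  • The data node device then returns a data read result carrying an Abort (rollback) message to the coordinating node device.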
  • the data reading result is represented as ReadReplyMessage (read feedback message), abbreviated as rrpm.
  • If the lower bound of the timestamp of the first logical life cycle is less than the upper bound of the timestamp of the first logical life cycle, the first logical life cycle is valid, and the operation of determining the visible version data of the data item to be read relative to the first logical life cycle is then performed.
  • The process of determining the visible version data of the data item to be read relative to the first logical life cycle is: in response to the creation timestamp of the latest version data of the data item to be read being smaller than the upper bound of the timestamp of the first logical life cycle, taking the latest version data as the visible version data; in response to the creation timestamp of the latest version data of the data item to be read being not less than the upper bound of the timestamp of the first logical life cycle, continuing to compare the creation timestamp of the previous version data of the data item to be read with the upper bound of the timestamp of the first logical life cycle, until the first version data whose creation timestamp is less than the upper bound of the timestamp of the first logical life cycle is found, and taking that version data as the visible version data.
  • the data node device first checks the latest version data of the data item x to be read. If the timestamp upper bound T.Ets of the logical life cycle is greater than the creation timestamp Wts of the latest version data, the latest version data is the visible version data relative to the logical life cycle.
  • otherwise, the latest version data is not the visible version data relative to the logical life cycle, and earlier versions are searched until the first version data x.v that satisfies T.Ets > Wts is found; the version data x.v is taken as the visible version data relative to the logical life cycle.
  • the visible version data x.v is stored in the read set of the target transaction.
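  • the visibility rule described above can be sketched as a walk over the version chain of a data item. The sketch below is illustrative only: the names Version, find_visible_version and the newest-first list layout are assumptions made for the example, not structures defined by this application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    wts: int    # creation timestamp (Wts) of this version
    value: str  # payload of this version


def find_visible_version(newest_first: List[Version], t_ets: int) -> Optional[Version]:
    """Return the newest version whose Wts is strictly below T.Ets, if any."""
    for version in newest_first:
        if version.wts < t_ets:
            return version
    return None  # no version is visible for this logical life cycle


# Versions of data item x, ordered from newest to oldest (assumed layout).
chain = [Version(30, "x3"), Version(20, "x2"), Version(10, "x1")]
visible = find_visible_version(chain, t_ets=25)
print(visible)
```

  • with T.Ets = 25 the newest version (Wts = 30) is skipped and the version with Wts = 20 is returned, since it is the first one that satisfies T.Ets > Wts.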
  • the read set here may be a local read set or a global read set.
  • here the local read set is taken as an example for illustration, which avoids the communication overhead caused by synchronizing the global read set.
  • the read set of a transaction records the visible version data of the data items that the transaction needs to read.
  • the read set of the distributed read transaction can be divided into a local read set and a global read set.
  • the local read set exists on the data node device, and the global read set exists on the coordinating node device.
  • the coordinator node device can periodically synchronize the global read set to each data node device, so that the data node device can also maintain the global read set of the transaction.
  • Step 2052 Determine the second logical life cycle of the target transaction based on the creation timestamp of the visible version data and the first logical life cycle.
  • after determining the visible version data, the data node device determines the second logical life cycle of the target transaction based on the creation timestamp of the visible version data and the first logical life cycle.
  • the implementation of step 2052 is as follows: the second logical life cycle of the target transaction is determined directly based on the creation timestamp of the visible version data and the first logical life cycle.
  • alternatively, the implementation of step 2052 is as follows: the second logical life cycle of the target transaction is determined based on the creation timestamp of the visible version data and an updated logical life cycle determined from the first logical life cycle.
  • the method for determining the second logical life cycle of the target transaction is: adjust the timestamp lower bound of the first logical life cycle so that it is greater than the creation timestamp of the visible version data x.v, that is, make T.Bts > x.v.Wts, to eliminate write-read anomalies; the adjusted logical life cycle is taken as the second logical life cycle.
  • the method of determining the second logical life cycle of the target transaction directly based on the creation timestamp of the visible version data and the first logical life cycle is: adjust the timestamp lower bound of the first logical life cycle so that it is greater than the creation timestamp of the visible version data x.v, that is, make T.Bts > x.v.Wts, to eliminate write-read anomalies; in response to the to-be-written transaction corresponding to the visible version data not being empty, adjust the timestamp upper bound of the first logical life cycle so that it is smaller than the timestamp lower bound of the logical life cycle of the to-be-written transaction corresponding to the visible version data, that is, make T.Ets < T0.Bts (T0 denotes the to-be-written transaction corresponding to the visible version data).
  • the to-be-written transaction WT corresponding to the visible version data is a transaction that is modifying the data item corresponding to the visible version data and has passed the verification.
  • the to-be-written transaction is recorded by recording the transaction identifier of the to-be-written transaction.
  • the transaction identifier of the target transaction is added to the active transaction set of the visible version data; the visible version data is added to the local read set of the target transaction.
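  • the adjustment that yields the second logical life cycle can be sketched as follows. This is a minimal sketch under stated assumptions: timestamps are taken to be integers, so "greater than" and "smaller than" are enforced with a step of 1, and the names LifeCycle and second_life_cycle are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LifeCycle:
    bts: int  # timestamp lower bound (Bts)
    ets: int  # timestamp upper bound (Ets)

    def is_valid(self) -> bool:
        # A logical life cycle is valid when its lower bound is below its upper bound.
        return self.bts < self.ets


def second_life_cycle(first: LifeCycle, visible_wts: int,
                      writer_bts: Optional[int] = None) -> LifeCycle:
    """Tighten the first logical life cycle after reading version x.v.

    visible_wts is x.v.Wts; writer_bts is T0.Bts of the to-be-written
    transaction WT, or None when WT is empty.
    """
    bts = max(first.bts, visible_wts + 1)  # enforce T.Bts > x.v.Wts
    ets = first.ets
    if writer_bts is not None:
        ets = min(ets, writer_bts - 1)     # enforce T.Ets < T0.Bts
    return LifeCycle(bts, ets)


print(second_life_cycle(LifeCycle(0, 100), visible_wts=20, writer_bts=50))
```

  • only the bounds are ever tightened (the lower bound never decreases and the upper bound never increases), so an interval that becomes empty signals that the read cannot be serialized and the transaction must be rolled back.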
  • the active transaction set (RTlist) is used to record the active transactions that have accessed the latest version of the data, and can also be called a read transaction list.
  • the active transaction set can be in the form of an array, a list, a queue, a stack, etc. The embodiments of this application do not specifically limit the form of the active transaction set. Each element in the RTlist may be a transaction identifier (TID) of a transaction that has read the above-mentioned latest version data.
  • Step 2053 Use the result carrying the second logical life cycle and the visible version data as the data reading result.
  • after determining the second logical life cycle and the visible version data, the data node device takes the result carrying the second logical life cycle and the visible version data as the data reading result, and returns it to the coordinating node device, so that the coordinating node device obtains the second logical life cycle and the visible version data.
  • the data reading result is represented as ReadReplyMessage (read feedback message), abbreviated as rrpm.
  • the rrpm carrying the second logical life cycle and the visible version data includes the Bts, Ets and Value fields: the Bts field and the Ets field record the timestamp lower bound and the timestamp upper bound of the second logical life cycle, respectively, and the Value field records the value of the visible version data.
  • In step 206, the coordinator node device sends a transaction verification request and a local write set to the data node device in response to the data read result returned by the data node device meeting the transaction verification condition.
  • the coordinator node device determines whether the data read result meets the transaction verification condition; when it does, the coordinator node device sends a transaction verification request and a local write set to the data node device, so that the data node device can verify the target transaction.
  • the maximum of the timestamp lower bound of the first logical life cycle and the timestamp lower bound of the second logical life cycle is taken as the timestamp lower bound of the third logical life cycle of the target transaction, and the minimum of the timestamp upper bound of the first logical life cycle and the timestamp upper bound of the second logical life cycle is taken as the timestamp upper bound of the third logical life cycle of the target transaction.
  • that is, T.Bts = max(T.Bts, rrpm.Bts) and T.Ets = min(T.Ets, rrpm.Ets), where T.Bts and T.Ets on the right-hand side are the timestamp lower bound and the timestamp upper bound of the logical life cycle of the target transaction before the update (i.e. the first logical life cycle), and rrpm.Bts and rrpm.Ets are the timestamp lower bound and the timestamp upper bound of the second logical life cycle carried by the data read result, respectively.
  • it is then checked whether T.Bts in the first transaction state list is less than T.Ets, that is, whether the timestamp lower bound of the third logical life cycle is less than the timestamp upper bound of the third logical life cycle, to determine whether the third logical life cycle is valid.
  • when the timestamp lower bound of the third logical life cycle is not less than its timestamp upper bound, the third logical life cycle is invalid; in this case, the data read result is considered not to meet the transaction verification condition, and the global rollback phase is entered. When the timestamp lower bound of the third logical life cycle is less than its timestamp upper bound, the third logical life cycle is valid; in this case, the data read result is considered to meet the transaction verification condition, and a transaction verification request carrying the third logical life cycle is sent to the data node device.
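  • the way the coordinating node device folds the returned data read results into the third logical life cycle can be sketched as an interval intersection. The function name third_life_cycle and the representation of each rrpm as a (Bts, Ets) pair are assumptions made for this example.

```python
from typing import Iterable, Tuple


def third_life_cycle(t_bts: int, t_ets: int,
                     replies: Iterable[Tuple[int, int]]) -> Tuple[int, int, bool]:
    """Intersect the coordinator's life cycle with each (rrpm.Bts, rrpm.Ets).

    Returns the updated bounds and whether the result is valid (Bts < Ets).
    """
    for rrpm_bts, rrpm_ets in replies:
        t_bts = max(t_bts, rrpm_bts)  # lower bound can only move up
        t_ets = min(t_ets, rrpm_ets)  # upper bound can only move down
    return t_bts, t_ets, t_bts < t_ets


# Two data node devices answered; the intersection is still non-empty.
print(third_life_cycle(0, 100, [(21, 49), (10, 80)]))
# The second reply pushes the upper bound below the lower bound: roll back.
print(third_life_cycle(0, 100, [(60, 90), (10, 50)]))
```

  • in the first call the result is valid and a transaction verification request carrying it would be sent; in the second call the result is invalid, which corresponds to entering the global rollback phase.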
  • when the coordinating node device decides to roll back the target transaction, it modifies the global transaction state of the target transaction in the first transaction state list to Gaborting (globally being rolled back), and notifies the relevant child nodes (that is, the data node devices) to perform a local rollback.
  • the coordinating node device modifies the global transaction state of the target transaction in the first transaction state list to Gvalidating (globally validating).
  • the transaction validation request is represented as ValidateRequestMessage (validation request message), abbreviated as vrm.
  • the Bts and Ets fields are included in the vrm. The Bts field and the Ets field respectively record the timestamp lower bound and the timestamp upper bound of the latest logical life cycle of the target transaction in the first transaction state list, that is, the timestamp lower bound and the timestamp upper bound of the third logical life cycle.
  • when the number of data node devices is multiple, each data node device returns a data read result; in this case, the data read result satisfying the transaction verification condition means that every data read result returned by the data node devices satisfies the transaction verification condition.
  • the third logical life cycle is a logical life cycle determined by comprehensively considering each data read result.
  • the transaction verification condition is considered satisfied after all required data has been read and the update written to local memory. That is, the coordinator node device sends a transaction verification request to the data node device in response to the third logical life cycle being valid and the global write set of the target transaction being stored in the local memory.
  • the global write set of the target transaction is generated by the terminal and transmitted to the coordinator node device, or is generated by the coordinator node device itself, which is not limited in this embodiment of the present application.
  • the write set of a transaction records the data items that the transaction needs to update. Similar to the read set, the write set of a transaction can be maintained with an in-memory linked list structure. It should be noted that the write set of a distributed write transaction can be divided into a local write set and a global write set: the local write set exists on the data node device, and the global write set exists on the coordinating node device. Of course, the coordinator node device can periodically synchronize the global write set to each data node device, so that the data node device can also maintain the global write set of the transaction.
  • after writing the global write set of the target transaction to its local memory, the coordinator node device determines the local write set of the data node device based on the global write set, and sends the transaction verification request together with the local write set to the data node device.
  • the local write set of the data node device refers to the write set that needs to be written by the data node device in the global write set of the target transaction.
  • assuming that n (n is an integer greater than 1) is the number of remote reads, a maximum of 2n communications are required, and the maximum communication volume can be expressed as n × (data read request message size + data read result message size).
  • the data read requests for multiple data items are packaged and sent together, so that the data is read in batches, which reduces the number of communications and improves data reading efficiency.
  • In step 207, the data node device obtains the verification result of the target transaction based on the transaction verification request and the local write set sent by the coordinating node device, and returns the verification result of the target transaction to the coordinating node device.
  • after receiving the transaction verification request and the local write set sent by the coordinating node device, the data node device verifies the validity of the target transaction to obtain the verification result of the target transaction. This stage is the transaction validity verification stage before the transaction is committed.
  • the verification process of the data node device is a local verification process, and the process of obtaining the verification result of the target transaction based on the transaction verification request and the local write set is the process of performing the local verification operation by the data node device.
  • the transaction verification request carries a third logical life cycle
  • the third logical life cycle is an effective logical life cycle determined by the coordinating node device based on the first logical life cycle and the second logical life cycle.
  • the third logical life cycle is the latest logical life cycle of the target transaction maintained before the coordinating node device sends the transaction verification request.
  • the logical life cycle of the target transaction maintained in the local transaction status list of the data node device is the second logical life cycle.
  • the logical life cycle of the target transaction maintained by the local transaction state list is called the fourth logical life cycle.
  • the data node device takes the maximum of the timestamp lower bound of the third logical life cycle and the timestamp lower bound of the second logical life cycle as the timestamp lower bound of the fourth logical life cycle of the target transaction, and takes the minimum of the timestamp upper bound of the third logical life cycle and the timestamp upper bound of the second logical life cycle as the timestamp upper bound of the fourth logical life cycle of the target transaction.
  • the fourth logical life cycle is obtained. It should be noted that what is updated here is the logical life cycle of the target transaction maintained in the local transaction state list of the data node device, and such an update can be used for transaction concurrent access control, that is, to ensure transaction consistency.
  • after the fourth logical life cycle is determined, whether it is valid is verified by checking whether the timestamp lower bound of the fourth logical life cycle is less than the timestamp upper bound of the fourth logical life cycle.
  • if the fourth logical life cycle is invalid, the local verification of the target transaction fails, and the data node device returns the verification result carrying the Abort message to the coordinating node device.
  • the Abort message is used to cause a global rollback.
  • the process of returning the verification result of the target transaction to the coordinating node device can be regarded as the process of sending the local verification feedback message lvm to the coordinating node device.
  • the fifth logical life cycle of the target transaction is determined based on the read transaction related information of each data item to be written in the local write set and the fourth logical life cycle.
  • the fifth logical life cycle refers to a logical life cycle obtained by updating in the process of performing read-write conflict verification on each to-be-written data item in the local write set.
  • the read transaction related information of a data item to be written includes at least one of the maximum read transaction timestamp of the data item to be written and the end timestamp of the target read transaction of the data item to be written.
  • the maximum read transaction timestamp (denoted Rts) of a data item to be written indicates the maximum value among the logical commit timestamps of the read transactions that have read the data item to be written. The target read transaction of a data item to be written is a read transaction, corresponding to the data item to be written, that has passed local verification or is in the commit phase, and the end timestamp of the target read transaction is the timestamp upper bound of the logical life cycle of the target read transaction.
  • a target read transaction of a data item to be written is a read transaction that has passed the local verification or is in the commit phase in the active transaction set corresponding to the data item to be written. By detecting the transaction status of each read transaction in the active transaction set corresponding to the one data item to be written, the target read transaction of the one data item to be written can be determined.
  • depending on what the read transaction related information includes, the process of determining the fifth logical life cycle of the target transaction also differs.
  • the read transaction related information of a data item to be written includes the maximum read transaction timestamp of the data item to be written.
  • the process of determining the fifth logical life cycle of the target transaction is: the fifth logical life cycle of the target transaction is determined based on the maximum read transaction timestamp of each data item to be written and the fourth logical life cycle, where the timestamp lower bound of the fifth logical life cycle is greater than the maximum value among the maximum read transaction timestamps of the data items to be written.
  • the fourth logical life cycle is the latest logical life cycle of the target transaction maintained in the local transaction state list of the data node device before the fifth logical life cycle is determined.
  • the method of determining the fifth logical life cycle of the target transaction is: the timestamp lower bound of the fourth logical life cycle is adjusted based on the maximum read transaction timestamp of each data item to be written, and the adjusted logical life cycle is taken as the fifth logical life cycle.
  • after receiving the local write set, the data node device first detects whether the to-be-written transaction WT of each to-be-written data item in the local write set is empty.
  • if the to-be-written transaction WT of a to-be-written data item is not empty, another transaction is modifying that data item and has entered the verification stage.
  • in this case, the target transaction needs to be rolled back to eliminate the write-write conflict, that is, the verification result carrying the Abort message is returned to the coordinating node device.
  • if the to-be-written transaction WT of every to-be-written data item is empty, the transaction identifier of the target transaction is assigned to the to-be-written transaction WT of each to-be-written data item, to indicate that the target transaction, which has entered the verification phase, needs to modify each to-be-written data item.
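  • the lock-free CAS (Compare and Swap) technique can be used to assign a value to the to-be-written transaction WT.
  • alternatively, the to-be-written transaction WT of the data item y to be written is first locked to prevent other concurrent transactions from concurrently modifying y, and the locked to-be-written transaction WT is then assigned a value.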
  • an advisory lock is imposed on the data item y to be written; the advisory lock indicates that modification operations on the to-be-written transaction WT of the data item y to be written are mutually exclusive.
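  • the write-write conflict check on the to-be-written transaction WT can be sketched as a compare-and-swap on each data item. Python has no hardware CAS primitive, so the sketch emulates one with threading.Lock; the names DataItem, cas_wt and claim_write_set are hypothetical, and releasing the items already claimed when a conflict is found is an assumption of this example rather than a step stated above.

```python
import threading
from typing import List, Optional


class DataItem:
    def __init__(self) -> None:
        self.wt: Optional[str] = None   # TID of the to-be-written transaction, None if empty
        self._guard = threading.Lock()  # emulates the atomicity of a CAS instruction

    def cas_wt(self, expected: Optional[str], new: Optional[str]) -> bool:
        """Atomically set WT to `new` only if it currently equals `expected`."""
        with self._guard:
            if self.wt == expected:
                self.wt = new
                return True
            return False


def claim_write_set(items: List[DataItem], tid: str) -> bool:
    """Try to record `tid` as WT of every item; False means a write-write conflict."""
    claimed: List[DataItem] = []
    for item in items:
        if item.cas_wt(None, tid):
            claimed.append(item)
        else:
            # Another transaction already entered verification on this item.
            for done in claimed:
                done.cas_wt(tid, None)  # assumed cleanup before reporting Abort
            return False
    return True


y, z, w = DataItem(), DataItem(), DataItem()
print(claim_write_set([y, z], "T1"))  # T1 claims y and z
print(claim_write_set([w, y], "T2"))  # T2 conflicts on y and releases w
```

  • a False return corresponds to returning the verification result carrying the Abort message to the coordinating node device.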
  • the read transaction related information of a data item to be written includes the end timestamp of the target read transaction of the data item to be written.
  • the process of determining the fifth logical life cycle of the target transaction is: the fifth logical life cycle of the target transaction is determined based on the end timestamp of the target read transaction of each data item to be written and the fourth logical life cycle, where the timestamp lower bound of the fifth logical life cycle is greater than the maximum value among the end timestamps of the target read transactions of the data items to be written.
  • the method of determining the fifth logical life cycle of the target transaction is: the timestamp lower bound of the fourth logical life cycle is adjusted based on the end timestamp of the target read transaction of each data item to be written, and the adjusted logical life cycle is taken as the fifth logical life cycle.
  • the adjustment can be expressed as T.Bts = max(T.Bts, T1.Ets + 1), where T.Bts in parentheses represents the timestamp lower bound of the fourth logical life cycle, T1.Ets represents the maximum value among the end timestamps of the target read transactions of the data items to be written, and the added value of 1 ensures that the timestamp lower bound of the fifth logical life cycle is greater than the maximum value among the end timestamps of the target read transactions of the data items to be written.
  • the number of target read transactions for a data item to be written may be one or more, and for a case where the number of target read transactions for a data item to be written is multiple, the above T1.Ets refers to all The maximum value among the end timestamps of all target read transactions for the data item to be written.
  • the occurrence of the write operation of the target transaction can be delayed until after the read operation of the target read transaction, so as to avoid read-write conflicts.
  • the read transaction related information of a data item to be written includes the maximum read transaction timestamp of the data item to be written and the end timestamp of the target read transaction of the data item to be written.
  • the process of determining the fifth logical life cycle of the target transaction is: the fourth logical life cycle is adjusted twice in succession based on the maximum read transaction timestamp of each data item to be written and the end timestamp of the target read transaction of each data item to be written, and the logical life cycle obtained after the two adjustments is taken as the fifth logical life cycle of the target transaction.
  • This embodiment of the present application does not limit the sequence of the two adjustments.
  • for example, the fourth logical life cycle is first adjusted based on the maximum read transaction timestamp of each data item to be written, and the logical life cycle obtained after this first adjustment is then adjusted based on the end timestamp of the target read transaction of each data item to be written.
  • the fourth logical life cycle can also be adjusted based on the end timestamp of the target read transaction of each data item to be written, and then based on the maximum read transaction timestamp of each data item to be written. The logical life cycle obtained after one adjustment is adjusted.
  • the logic life cycle obtained after two adjustments is used as the fifth logic life cycle
  • after the fifth logical life cycle is obtained, whether it is valid is determined by verifying whether the timestamp lower bound of the fifth logical life cycle is smaller than the timestamp upper bound of the fifth logical life cycle. In response to the fifth logical life cycle being valid, the verification result indicating that the verification passed is used as the verification result of the target transaction; in response to the fifth logical life cycle being invalid, the verification result indicating that the verification failed is used as the verification result of the target transaction.
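  • the two lower-bound adjustments and the final validity check can be sketched together. This is a hedged sketch: the function name fifth_life_cycle and its parameters are hypothetical, timestamps are assumed to be integers, and the "+ 1" step for the maximum read transaction timestamp mirrors the one described for T1.Ets and is an assumption for Rts.

```python
from typing import Iterable, Optional, Tuple


def fifth_life_cycle(bts: int, ets: int,
                     max_rts: Optional[int] = None,
                     target_read_ets: Iterable[int] = ()) -> Tuple[int, int, bool]:
    """Adjust the fourth logical life cycle (bts, ets) during write verification.

    max_rts is the largest Rts over all data items to be written; target_read_ets
    holds the end timestamps (Ets) of all target read transactions of those items.
    Returns the adjusted bounds and whether local verification passes.
    """
    if max_rts is not None:
        bts = max(bts, max_rts + 1)    # write after every committed read
    ends = list(target_read_ets)
    if ends:
        bts = max(bts, max(ends) + 1)  # write after every validated read (T1.Ets + 1)
    return bts, ets, bts < ets


print(fifth_life_cycle(21, 49, max_rts=30, target_read_ets=[25, 40]))
print(fifth_life_cycle(21, 49, max_rts=48, target_read_ets=[60]))
```

  • because both adjustments only take a maximum on the lower bound, the result does not depend on the order in which they are applied, which matches the statement that the sequence of the two adjustments is not limited.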
  • the local verification feedback message lvm of the data node device records the timestamp lower bound Bts and the timestamp upper bound Ets of the latest logical life cycle (that is, the fifth logical life cycle) obtained for the target transaction on the data node device.
  • the verification result used to indicate that the verification fails is the verification result carrying the Abort message.
  • the data node device after determining that the local verification of the target transaction is passed, creates a new version of data of the data item to be written according to the updated value of the data item to be written.
  • the created new version data is set with a first flag indicating that the new version data is not globally committed. The new version data with the first tag is not visible to the outside world.
  • when the active transaction set of a data item to be written includes not only the target read transaction but also a running read transaction, the logical life cycle of the running read transaction needs to be adjusted according to the fifth logical life cycle of the target transaction, so that the running read transaction cannot read the data newly written by the target transaction, which avoids read-write conflicts and ensures the correct execution of transactions.
  • a running read transaction refers to a transaction whose transaction status in the active transaction set is Running.
  • the logical life cycle of the running read transaction is adjusted as follows: the timestamp upper bound of the logical life cycle of the running read transaction is adjusted to be smaller than the timestamp lower bound of the fifth logical life cycle of the target transaction.
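  • the adjustment of running read transactions can be sketched as capping their upper bound. The sketch assumes integer timestamps and uses hypothetical names (cap_running_readers, the states and life_cycles dictionaries, and the state label "Validated"); only the state Running is taken from the description above.

```python
from typing import Dict, List, Tuple


def cap_running_readers(rtlist: List[str],
                        states: Dict[str, str],
                        life_cycles: Dict[str, Tuple[int, int]],
                        writer_bts: int) -> None:
    """For every Running reader in RTlist, force Ets < writer's fifth-life-cycle Bts."""
    for tid in rtlist:
        if states.get(tid) == "Running":
            bts, ets = life_cycles[tid]
            life_cycles[tid] = (bts, min(ets, writer_bts - 1))


states = {"T2": "Running", "T3": "Validated"}  # "Validated" is an assumed label
life_cycles = {"T2": (5, 90), "T3": (5, 38)}
cap_running_readers(["T2", "T3"], states, life_cycles, writer_bts=41)
print(life_cycles)
```

  • after the call, the running reader T2 ends before the writer's lower bound of 41, while T3, which is not in the Running state, is left unchanged.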
  • communication mainly occurs between the coordinating node device and the related data node devices.
  • the communication mainly includes the following two steps: the coordinating node device sends a transaction verification request and a local write set to each related data node device, and each related data node device feeds back the verification result to the coordinating node device. Therefore, in the verification phase of the target transaction, assuming that m (m is an integer not less than 1) is the number of data node devices related to the target transaction T, a maximum of 2m communications are required, and the maximum communication volume can be expressed as m × (transaction verification request message size + verification result message size) + global write set size.
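  • as a worked illustration of this bound, let the symbols s_vrm, s_lvm and s_gws (names assumed for this example) denote the transaction verification request message size, the verification result message size and the global write set size. For m = 3 related data node devices:

```latex
\[
N_{\text{verify}} \le 2m, \qquad
V_{\text{verify}} \le m \times \left(s_{\text{vrm}} + s_{\text{lvm}}\right) + s_{\text{gws}}
\]
\[
m = 3 \;\Longrightarrow\; N_{\text{verify}} \le 6, \qquad
V_{\text{verify}} \le 3\left(s_{\text{vrm}} + s_{\text{lvm}}\right) + s_{\text{gws}}
\]
```

  • the global write set size appears once rather than m times because each data node device receives only its own local write set, and the local write sets together make up the global write set.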
  • In step 208, the coordinating node device determines a processing instruction of the target transaction based on the verification result of the target transaction returned by the data node device, and sends the processing instruction to the data node device, where the processing instruction is a commit instruction or a rollback instruction.
  • after receiving the verification result returned by the data node device, the coordinating node device judges whether the target transaction can pass the global verification according to the received verification result, determines the processing instruction of the target transaction accordingly, and sends the processing instruction to the data node device.
  • the processing instruction is a commit instruction or a rollback instruction.
  • the number of data node devices is one or more, and when the number of data node devices is multiple, each data node device returns a verification result.
  • the process of determining the processing instruction of the target transaction is as follows: in response to a verification result indicating that the verification fails existing among the at least two verification results returned by the at least two data node devices, the rollback instruction is used as the processing instruction of the target transaction.
  • in response to the at least two verification results all indicating that the verification passed, the intersection of the logical life cycles carried by the at least two verification results is taken as the target logical life cycle; in response to the target logical life cycle being valid, the commit instruction is used as the processing instruction of the target transaction; in response to the target logical life cycle being invalid, the rollback instruction is used as the processing instruction of the target transaction.
  • the verification result used to indicate that the verification fails is the verification result carrying the Abort message. If a verification result does not carry the Abort message but carries a logical life cycle (that is, the fifth logical life cycle), the verification result indicates that the verification passed. That is to say, when the coordinating node device judges whether the target transaction can pass the global verification according to the received verification results, if at least one of the received verification results carries the Abort message (that is, an lvm whose IsAbort field equals 1), the target transaction has not passed all local verifications, the global verification of the target transaction fails, and the target transaction needs to be rolled back globally.
  • the rollback instruction is used as the processing instruction of the target transaction.
  • the coordinating node device updates the global transaction state of the target transaction in the first transaction state list to Gaborting (globally rolling back).
  • the coordinator node device sends a rollback instruction to the data node device to notify the data node device to perform a local rollback.
  • the processing instruction is sent by writing the commit/rollback message coarm.
  • the processing instruction is a rollback instruction
  • the coordinating node device calculates the intersection of the logical life cycles carried by the received verification results to obtain the target logical life cycle. If the timestamp lower bound of the target logical life cycle is not less than the timestamp upper bound of the target logical life cycle, the target logical life cycle is invalid, the global verification of the target transaction fails, the target transaction needs to be rolled back globally, and the coordinating node device uses the rollback instruction as the processing instruction of the target transaction.
  • the coordinating node device also updates the global transaction state of the target transaction in the first transaction state list to Gaborting (globally rolling back), and the coordinating node device sends a rollback instruction to the data node device to notify the data node device to perform local rollback .
  • when the target logical life cycle is valid, the coordinating node device selects a timestamp from the target logical life cycle at random and assigns it to the logical commit timestamp Cts of the target transaction; for example, the timestamp lower bound of the target logical life cycle is selected as the logical commit timestamp of the target transaction.
  • the coordinating node device uses the commit instruction as the processing instruction of the target transaction, and sends the commit instruction to the data node device to notify the data node device to commit the target transaction.
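  • the global decision made by the coordinating node device can be sketched as follows. The Lvm class and the decide function are hypothetical names for this example; the sketch assumes at least one verification result has been received and, following the example given above, picks the timestamp lower bound of the target logical life cycle as Cts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lvm:
    is_abort: bool  # mirrors the IsAbort field of the local verification feedback
    bts: int = 0    # lower bound of the fifth logical life cycle, if verification passed
    ets: int = 0    # upper bound of the fifth logical life cycle, if verification passed


def decide(results: List[Lvm]) -> Tuple[str, Optional[int]]:
    """Return ("commit", Cts) or ("rollback", None) from all local verification results."""
    if any(r.is_abort for r in results):
        return "rollback", None        # at least one local verification failed
    bts = max(r.bts for r in results)  # intersection of all returned life cycles
    ets = min(r.ets for r in results)
    if bts >= ets:
        return "rollback", None        # target logical life cycle is invalid
    return "commit", bts               # choose the lower bound as Cts


print(decide([Lvm(False, 41, 49), Lvm(False, 30, 45)]))
print(decide([Lvm(False, 41, 49), Lvm(True)]))
```

  • any timestamp inside the target logical life cycle would serve as Cts; the lower bound is used here only because it is the example named in the description.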
  • when the processing instruction is sent by means of the commit/rollback message coarm, the Cts and Gts fields in the coarm record the logical commit timestamp of the target transaction and the global commit timestamp of the target transaction, respectively.
  • In step 209, the data node device executes the processing instruction in response to receiving the processing instruction of the target transaction sent by the coordinating node device, where the processing instruction is a commit instruction or a rollback instruction.
  • after receiving the processing instruction, the data node device executes the processing instruction.
  • the stage in which the data node device executes the processing instruction is the final stage of transaction commit or rollback operation.
  • when the processing instruction is a commit instruction, it means that the global verification of the target transaction has passed and the commit phase is entered, that is, the updates made by the target transaction are persisted to the database and some follow-up cleanup work is done.
  • when the data node device receives the commit instruction sent by the coordinator node device, the following operations A to E may be performed.
  • Operation C: Clear the local read set and local write set of the target transaction.
  • the local transaction state list at this time is used to ensure transaction consistency and does not need to involve synchronization of the global transaction state.
  • Operation E: Return a commit-success ACK (Acknowledge Character) to the coordinating node device.
  • After the coordinating node device receives the commit-success ACKs returned by all the data node devices, it modifies the global transaction state of the target transaction in the first transaction state list to Gcommitted (global commit complete), and then sends a state information cleanup instruction to the data node devices, so that each data node device deletes the state information of the target transaction from its local transaction state list.
  • If the processing instruction is a rollback instruction, the data node device rolls back the target transaction locally and performs cleanup work.
  • The cleanup work includes: deleting the transaction identifier TID of the target transaction from the active transaction list RTlist of each data item x corresponding to the local read set of the target transaction; and cleaning up each data item corresponding to the local write set of the target transaction.
  • After the coordinating node device receives the rollback-complete ACKs returned by all the data node devices, it modifies the global transaction state of the target transaction in the first transaction state list to Gaborted (global rollback complete), and then sends a state information cleanup instruction to the data node devices, so that each data node device deletes the state information of the target transaction from its local transaction state list.
  • the coordinating node device sends state information cleaning instructions to the data node device in batches, so as to reduce the number of communications.
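  • The ACK handling and batched cleanup described above can be sketched as follows. This is a minimal in-memory sketch: the `Coordinator` class, its method names, and the dictionary-based state list are hypothetical, and only the state names Gaborting, Gcommitted and Gaborted come from the text.

```python
from typing import Dict, Iterable, List, Set


class Coordinator:
    """Hypothetical coordinator-side bookkeeping for the commit/rollback phase."""

    def __init__(self) -> None:
        self.global_state: Dict[str, str] = {}   # first transaction state list: TID -> state
        self.instruction: Dict[str, str] = {}    # TID -> "commit" or "rollback"
        self.waiting: Dict[str, Set[str]] = {}   # TID -> data nodes that have not ACKed yet
        self.cleanup_queue: List[str] = []       # TIDs awaiting a batched cleanup instruction

    def dispatch(self, tid: str, instruction: str, nodes: Iterable[str]) -> None:
        """Record the instruction sent to every related data node."""
        self.instruction[tid] = instruction
        self.waiting[tid] = set(nodes)
        if instruction == "rollback":
            self.global_state[tid] = "Gaborting"

    def on_ack(self, tid: str, node: str) -> None:
        """Handle one ACK; finalize the global state once all nodes have replied."""
        self.waiting[tid].discard(node)
        if not self.waiting[tid]:
            committed = self.instruction[tid] == "commit"
            self.global_state[tid] = "Gcommitted" if committed else "Gaborted"
            self.cleanup_queue.append(tid)

    def flush_cleanup(self) -> List[str]:
        """Return the TIDs for one batched state information cleanup instruction."""
        batch, self.cleanup_queue = self.cleanup_queue, []
        return batch
```

  • Queuing finished transactions and sending a single cleanup instruction per batch is one way to reduce the number of communications; the batching policy itself (size or time based) is left open here.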
  • In the commit/rollback phase, communication mainly occurs between the coordinating node device and the related data node devices, and mainly includes the following two steps: the coordinating node device sends a commit/rollback instruction to each related data node device; each related data node device sends a corresponding commit/rollback completion message (ACK) to the coordinating node device. Therefore, in the commit/rollback phase, at most 2m communications are performed, and the communication volume is m × (commit/rollback instruction message size + commit/rollback completion message size), where m (an integer not less than 1) is the number of data node devices related to the target transaction T.
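  • The communication bound above can be checked with a small calculation. The function name and the byte sizes in the example are hypothetical; only the formulas 2m and m × (instruction size + completion size) come from the text.

```python
from typing import Tuple


def commit_phase_cost(m: int, instruction_size: int, ack_size: int) -> Tuple[int, int]:
    """Return (maximum number of communications, communication volume)."""
    if m < 1:
        raise ValueError("m must be an integer not less than 1")
    max_messages = 2 * m                        # one instruction and one ACK per data node
    volume = m * (instruction_size + ack_size)  # total size of all messages
    return max_messages, volume
```

  • With m = 3 related data node devices, a 64-byte instruction message and a 16-byte ACK, this gives at most 6 communications and a volume of 240 bytes.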
  • the embodiments of the present application are introduced by taking the target transaction involving read and write operations as an example, and the embodiments of the present application are not limited to this.
  • the transaction processing method provided in the embodiment of the present application realizes the processing of the transaction, which is not repeated in the embodiment of the present application.
  • the transaction processing method mainly applies the OCC (Optimistic Concurrency Control) algorithm framework, combined with the DTA (Dynamic Timestamp Allocation) algorithm, to reduce network transmission time.
  • the DTA algorithm is a TO (Timestamp Ordering) algorithm in which the timestamp lower bound and timestamp upper bound of the logical life cycle of a transaction can be dynamically adjusted.
  • the methods provided by the embodiments of the present application are not affected by the data storage format.
  • the distributed database system in the embodiments of the present application supports both the key-value (KV) data storage format (for example, the data storage format in the HBase database system) and segment page data storage formats (for example, the data storage formats in the PostgreSQL and MySQL/InnoDB database systems).
  • a data buffer is established in the node device to buffer the data transmitted from the shared storage system, so as to speed up the next data acquisition; the buffering format remains the same as the underlying data storage format.
  • the data transferred from the shared storage system is buffered in the local data buffer and is not cleared when the transaction ends; it is cleared only when the local data buffer is full, when there is dirty data that needs to be flushed back to the shared storage system, or when the buffered data becomes invalid (e.g., the same data is modified on another node device).
  • each node device generates a transaction log (e.g., a WAL log), and an LSN (Log Sequence Number) value is requested from the shared storage system for the transaction log; the LSN value is globally unique and incremental.
  • transaction logs generated during transaction processing have different formats. Exemplarily, when the data storage format is the KV data storage format, the format of the transaction log is shown in FIG. 3 .
  • The regions (Regions) into which a large table maintained by the database system is divided share one log file.
  • the records of a single region are stored in the log in chronological order, but the records of multiple regions may not be completely in chronological order.
  • the minimum unit of each log consists of a log key (HLogKey) and a log edit (WALEdit).
  • HLogKey consists of sequenceid, timestamp, cluster ids, region name and table name, etc.
  • WALEdit consists of a series of key-value pairs (KeyValue); the update operations for all columns (that is, all KeyValues) of a row are contained in the same WALEdit object, mainly to achieve atomicity when writing multiple columns of a row.
  • sequenceid is an auto-incrementing sequence number at the storage level, which is relied upon for data recovery within a region and for the cleanup of expired logs.
  • sequenceid refers to the LSN value of a transaction log.
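  • The KV-format log unit described above can be sketched as plain data types. This is an illustrative model only, not HBase's actual classes: the Python field names, the `LogEntry` wrapper and the `region_entries_in_order` helper are hypothetical, and key-value pairs are simplified to byte tuples.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class HLogKey:
    sequence_id: int              # sequenceid, treated here as the LSN of the log unit
    timestamp: int
    cluster_ids: Tuple[str, ...]
    region_name: str
    table_name: str


@dataclass
class WALEdit:
    # All column updates of one row travel together so the row write is atomic.
    key_values: List[Tuple[bytes, bytes]] = field(default_factory=list)


@dataclass
class LogEntry:
    """Minimum log unit: one HLogKey plus one WALEdit."""
    key: HLogKey
    edit: WALEdit


def region_entries_in_order(entries: Iterable[LogEntry], region_name: str) -> List[LogEntry]:
    """Select one region's entries from a shared log and order them by sequenceid."""
    own = [e for e in entries if e.key.region_name == region_name]
    return sorted(own, key=lambda e: e.key.sequence_id)
```

  • Because several regions interleave their records in one shared log file, recovery of a single region would filter by region name and replay in sequenceid order, which is what the helper illustrates.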
  • each Region shares a log file.
  • the records of a single Region are stored in the log in chronological order, but the records of multiple Regions may not be completely in chronological order.
  • The minimum unit of each log is no longer composed of an HLogKey and a WALEdit, but of a log record (XLog Record).
  • An XLog Record consists of two parts: the first part is the header information, whose size is fixed (for example, 24 B (bytes)) and whose corresponding structure is XLogRecord; the second part is the log record data (XLog Record data).
  • XLog Record is divided into the following three categories according to the stored data content.
  • Category 1: Record for backup block (backup block record): stores the block of a full-write-page (full page write); this type of record is intended to solve the problem of partial page writes.
  • When a data page is modified for the first time after a checkpoint is completed, the entire page is written to the transaction log file when the change is recorded (the corresponding initialization parameter needs to be set; it is on by default).
  • Category 2: Record for tuple data block: used to store tuple changes in the page.
  • Category 3: Record for Checkpoint: when a checkpoint occurs, checkpoint information (including the Redo point) is recorded in the transaction log file.
  • XLog Record data is where the actual data is stored and consists of the following four parts.
  • each XLogRecordBlockHeader corresponds to one piece of block data. If the BKPBLOCK_HAS_IMAGE flag is set, the XLogRecordBlockHeader structure is followed by the XLogRecordBlockImageHeader structure; if the BKPBLOCK_HAS_HOLE and BKPIMAGE_IS_COMPRESSED flags are set, the XLogRecordBlockHeader structure is followed by the XLogRecordBlockCompressHeader structure; if the BKPBLOCK_SAME_REL flag is not set, the XLogRecordBlockHeader structure is followed by RelFileNode. Exemplarily, the XLogRecordBlockHeader structure may also be followed by BlockNumber (block number).
  • Part 3: block data: full-write-page data and tuple data.
  • For full-write-page data, if compression is enabled, the data is compressed before being stored; after compression, the metadata related to the page is stored in the XLogRecordBlockCompressHeader (log record block compression header).
  • Part 4: main data: records log data such as checkpoints.
  • an XLog Record is defined as follows:
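  • As an illustration of a fixed-size 24-byte record header, the sketch below packs and unpacks one with Python's `struct` module. The field list (total length, transaction id, pointer to the previous record, info flags, resource manager id, two padding bytes, CRC) is an assumption modelled on the XLogRecord header of PostgreSQL and is not taken from the definition in this application; the helper function names are hypothetical.

```python
import struct

# Assumed layout (little-endian, explicit padding):
#   xl_tot_len  uint32  total length of the record
#   xl_xid      uint32  transaction id
#   xl_prev     uint64  position of the previous record
#   xl_info     uint8   flag bits
#   xl_rmid     uint8   resource manager id
#   2 padding bytes
#   xl_crc      uint32  checksum of the record
XLOG_RECORD_HEADER = struct.Struct("<IIQBB2xI")


def pack_header(tot_len: int, xid: int, prev: int, info: int, rmid: int, crc: int) -> bytes:
    """Serialize the fixed-size header; the record data would follow it."""
    return XLOG_RECORD_HEADER.pack(tot_len, xid, prev, info, rmid, crc)


def unpack_header(raw: bytes) -> tuple:
    """Read the header from the first bytes of a raw record."""
    return XLOG_RECORD_HEADER.unpack(raw[:XLOG_RECORD_HEADER.size])
```

  • Under this assumed layout the header occupies 4 + 4 + 8 + 1 + 1 + 2 + 4 = 24 bytes, matching the example size given for the header information.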
  • the data storage format is the segment page data storage format
  • ES node devices
  • Class conflicts cause data overwrite problems.
  • the transaction processing mechanism allows transactions to execute concurrently and in parallel without data anomalies at the transaction level.
  • however, a choice has to be made between the transaction log flushed by ES-1 and the transaction log flushed by ES-2, which leads to the problem that modifications to the same physical page cannot coexist.
  • a segment page list is added to record the address of each page in this segment of the log (e.g., file number, tablespace number, and relative offset in the file) and the ID of the transaction performing the write operation on each page.
  • the verification device checks whether the pages in the lists submitted to the shared storage system by all concurrent transactions overlap. If so, concurrent transactions have written different data items on the same page (if they had written the same data item, the transaction conflict would already have been detected in the transaction verification stage and resolved through rollback).
  • the subject performing the above-mentioned page-level conflict verification is a verification device in a distributed database system.
  • the verification device may be located on the same physical machine as any node device, or may be an independent device, which is not limited in this embodiment of the present application.
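  • The page-level conflict verification above amounts to finding pages that appear in the page lists of more than one concurrent transaction. A minimal sketch follows; the `PageAddr` type, the function name and the input shape (transaction ID mapped to the pages it wrote) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class PageAddr(NamedTuple):
    """Address of a page as recorded in the segment page list."""
    file_no: int
    tablespace_no: int
    offset: int  # relative offset in the file


def find_page_conflicts(page_lists: Dict[str, Iterable[PageAddr]]) -> Dict[PageAddr, Set[str]]:
    """Return every page written by more than one transaction, with its writers."""
    writers: Dict[PageAddr, Set[str]] = defaultdict(set)
    for tid, pages in page_lists.items():
        for page in pages:
            writers[page].add(tid)
    return {page: tids for page, tids in writers.items() if len(tids) > 1}
```

  • An empty result means the concurrent transactions touched disjoint pages; a non-empty result identifies the pages whose modifications from different node devices need to be reconciled. How they are reconciled is outside this sketch.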
  • the distributed database system can support distributed transactions and achieve globally consistent multi-read, while maintaining performance through decentralized transaction processing technology. It provides good globally consistent multi-read and consistent multi-write capabilities with transactional properties.
  • a distributed database system based on a share-disk architecture such as the HBase database system under the well-known NoSQL (Non-relational SQL, generally refers to a non-relational database)
  • the coordinating node device for coordinating and processing the target transaction is determined according to the transaction allocation index corresponding to each node device, and the transaction allocation process does not need to consider the data items involved in the transaction, nor the distribution of the data items.
  • each node device can act as a decentralized device to coordinate and process transactions, so that transactions can be processed across nodes, which is conducive to improving transaction processing efficiency; the reliability of transaction processing is high, which is conducive to improving the system performance of the database system.
  • An embodiment of the present application provides a transaction processing system, where the transaction processing system includes a coordination node device and a data node device; the coordination node device is the node device used for coordinating processing of a target transaction among at least two node devices sharing the same storage system, the coordination node device is determined according to the transaction allocation indicators respectively corresponding to the at least two node devices, and the data node device is a node device, among the at least two node devices, used to participate in processing the target transaction;
  • the coordination node device is used to obtain transaction information of the target transaction; based on the transaction information of the target transaction, a data read request is sent to the data node device;
  • the data node device is used to obtain the data read result based on the data read request sent by the coordination node device, and return the data read result to the coordination node device;
  • the coordination node device is further configured to send a transaction verification request and a local write set to the data node device in response to the data read result returned by the data node device meeting the transaction verification condition;
  • the data node device is also used for obtaining the verification result of the target transaction based on the transaction verification request and the local write set sent by the coordinating node device, and returning the verification result of the target transaction to the coordinating node device;
  • the coordination node device is also used to determine the processing instruction of the target transaction based on the verification result of the target transaction returned by the data node device, and send the processing instruction to the data node device, where the processing instruction is a commit instruction or a rollback instruction;
  • the data node device is further configured to execute the processing command in response to receiving the processing command of the target transaction sent by the coordinating node device.
  • the data read result carries a second logical life cycle, which is determined by the data node device according to the first logical life cycle of the target transaction carried in the data read request; the first logical life cycle consists of a timestamp lower bound and a timestamp upper bound. The coordination node device is further configured to: take the maximum of the timestamp lower bound of the first logical life cycle and the timestamp lower bound of the second logical life cycle as the timestamp lower bound of the third logical life cycle of the target transaction; take the minimum of the timestamp upper bound of the first logical life cycle and the timestamp upper bound of the second logical life cycle as the timestamp upper bound of the third logical life cycle of the target transaction; and, in response to the third logical life cycle being valid, send a transaction verification request carrying the third logical life cycle to the data node device.
  • the third logical life cycle is valid to indicate that the lower bound of the times
  • the number of data node devices is at least two, and the coordination node device is further configured to: in response to any of the at least two verification results returned by the at least two data node devices indicating that the verification fails, use the rollback instruction as the processing instruction of the target transaction; in response to the at least two verification results returned by the at least two data node devices all indicating that the verification is passed, take the intersection of the logical life cycles carried by the at least two verification results as the target logical life cycle; in response to the target logical life cycle being valid, use the commit instruction as the processing instruction of the target transaction; and in response to the target logical life cycle being invalid, use the rollback instruction as the processing instruction of the target transaction.
  • the data read request carries the first logical life cycle of the target transaction, and the first logical life cycle consists of a timestamp lower bound and a timestamp upper bound. The data node device is configured to: determine, based on the first logical life cycle, the visible version data of the data item to be read indicated by the data read request; determine the second logical life cycle of the target transaction based on the creation timestamp of the visible version data and the first logical life cycle; and take a result carrying the second logical life cycle and the visible version data as the data read result.
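  • One possible reading of this step is sketched below. The text does not spell out the visibility rule or how the creation timestamp adjusts the life cycle, so both are assumptions here: the visible version is taken to be the newest version created before the upper bound of the first logical life cycle, and the lower bound is raised to just after that creation timestamp (integer timestamps assumed). The function name and tuple shapes are hypothetical.

```python
from typing import List, Optional, Tuple

LifeCycle = Tuple[int, int]   # (timestamp lower bound, timestamp upper bound)
Version = Tuple[int, str]     # (creation timestamp, value)


def read_visible(versions: List[Version], first: LifeCycle) -> Optional[Tuple[str, LifeCycle]]:
    """Return (visible value, second logical life cycle), or None if nothing is visible."""
    lower, upper = first
    # Assumed visibility rule: created strictly before the upper bound.
    candidates = [v for v in versions if v[0] < upper]
    if not candidates:
        return None
    created, value = max(candidates, key=lambda v: v[0])
    # Assumed adjustment: the reader must be ordered after the version it read.
    second = (max(lower, created + 1), upper)
    return value, second
```

  • For versions created at timestamps 2, 7 and 12 and a first logical life cycle of (0, 10), the version created at 7 is returned together with the second logical life cycle (8, 10).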
  • the transaction verification request carries a third logical life cycle of the target transaction, the third logical life cycle being a valid logical life cycle determined by the coordinating node device based on the first logical life cycle and the second logical life cycle. The data node device is further configured to: take the maximum of the timestamp lower bound of the third logical life cycle and the timestamp lower bound of the second logical life cycle as the timestamp lower bound of the fourth logical life cycle of the target transaction; take the minimum of the timestamp upper bound of the third logical life cycle and the timestamp upper bound of the second logical life cycle as the timestamp upper bound of the fourth logical life cycle of the target transaction; in response to the fourth logical life cycle being valid, determine the fifth logical life cycle of the target transaction based on the fourth logical life cycle and the read transaction related information of each data item to be written corresponding to the local write set; and, in response to the fifth logical life cycle being valid, use a verification result indicating that the verification is passed as the verification result of the target transaction.
  • the read transaction related information of a data item to be written includes the maximum read transaction timestamp of the data item to be written, which indicates the maximum among the logical commit timestamps of the read transactions that have read the data item to be written; the data node device is further configured to determine the fifth logical life cycle of the target transaction based on the maximum read transaction timestamp of each data item to be written and the fourth logical life cycle, where the timestamp lower bound of the fifth logical life cycle is greater than the maximum among the maximum read transaction timestamps of the data items to be written.
  • the read transaction related information of a data item to be written includes the end timestamp of a target read transaction of the data item to be written, where the target read transaction is a read transaction that has passed local verification or is in the commit phase, and the end timestamp of the target read transaction is the timestamp upper bound of the logical life cycle of the target read transaction; the data node device is further configured to determine the fifth logical life cycle of the target transaction based on the end timestamp of the target read transaction of each data item to be written and the fourth logical life cycle, where the timestamp lower bound of the fifth logical life cycle is greater than the maximum among the end timestamps of the target read transactions of the data items to be written.
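  • The local verification on the data node device can be sketched as two narrowing steps. This is a sketch under assumptions: timestamps are integers, so "greater than the maximum" is implemented as maximum + 1, and the function name and tuple shape are hypothetical. The same code covers both variants above, since `reader_bounds` may hold either the maximum read transaction timestamps or the end timestamps of the target read transactions of the data items to be written.

```python
from typing import Iterable, Optional, Tuple

LifeCycle = Tuple[int, int]  # (timestamp lower bound, timestamp upper bound)


def validate_on_data_node(third: LifeCycle, second: LifeCycle,
                          reader_bounds: Iterable[int]) -> Optional[LifeCycle]:
    """Return the fifth logical life cycle, or None if verification fails."""
    # Fourth life cycle: maximum of the lower bounds, minimum of the upper bounds.
    fourth = (max(third[0], second[0]), min(third[1], second[1]))
    if fourth[0] >= fourth[1]:
        return None
    # Fifth life cycle: the writer must be ordered after every relevant reader.
    floor = max(reader_bounds, default=None)
    lower = fourth[0] if floor is None else max(fourth[0], floor + 1)
    fifth = (lower, fourth[1])
    return fifth if fifth[0] < fifth[1] else None
```

  • With a third logical life cycle of (5, 20), a second logical life cycle of (8, 15) and reader bounds 9 and 12, the fourth logical life cycle is (8, 15) and the fifth is (13, 15), so verification passes; a reader bound of 14 would push the lower bound to 15 and make the fifth logical life cycle invalid.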
  • an embodiment of the present application provides a transaction processing apparatus, the apparatus includes:
  • the first determining unit 501 is configured to, in response to an allocation request of a target transaction, determine transaction allocation indicators respectively corresponding to at least two node devices sharing the same storage system, where the transaction allocation indicator corresponding to a node device indicates the matching degree of allocating a new transaction to that node device;
  • the second determining unit 502 is configured to determine the coordinating node device of the target transaction in the at least two node devices based on the transaction allocation indicators corresponding to the at least two node devices respectively, and the coordinating node device performs coordination processing on the target transaction.
  • the first determining unit 501 is configured to determine a transaction allocation mode, where the transaction allocation mode includes any one of allocation based on transaction busyness, allocation based on device busyness, and allocation based on mixed busyness; and to determine, in the manner indicated by the transaction allocation mode, the transaction allocation indicators respectively corresponding to the at least two node devices.
  • the transaction allocation mode includes allocation based on mixed busyness, and the first determining unit 501 is further configured to determine the transaction allocation indicator corresponding to a first node device based on the transaction processing quantity of the first node device, the device resource utilization rate of the first node device, the transaction processing quantity weight, the device resource utilization rate weight, and the weight adjustment parameter, where the first node device is any one of the at least two node devices.
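  • The exact formula for the mixed-busyness indicator is not given in this passage, so the following is only a sketch under loud assumptions: the indicator is modelled as a weighted sum, the weight adjustment parameter is assumed to rescale the utilization rate so that it is comparable to a transaction count, and the node with the lowest value is assumed to be the best match for a new transaction. All names and default values are hypothetical.

```python
from typing import Dict, Tuple


def mixed_busyness(tx_count: int, utilization: float,
                   w_tx: float, w_res: float, adjust: float) -> float:
    """Assumed weighted-sum indicator; a lower value means a less busy node."""
    return w_tx * tx_count + w_res * adjust * utilization


def pick_coordinator(nodes: Dict[str, Tuple[int, float]],
                     w_tx: float = 0.5, w_res: float = 0.5,
                     adjust: float = 100.0) -> str:
    """Choose the node with the smallest assumed busyness indicator.

    nodes maps a node name to (transaction processing quantity, resource utilization rate).
    """
    return min(nodes, key=lambda n: mixed_busyness(nodes[n][0], nodes[n][1],
                                                   w_tx, w_res, adjust))
```

  • With nodes a = (10, 0.9), b = (40, 0.2) and c = (30, 0.5) and the default weights, the indicators are roughly 50, 30 and 40, so node b would be selected as the coordinating node device under these assumptions.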
  • the device further includes:
  • the sending unit is configured to send the device identification information of the coordinating node device to the terminal that initiates the allocation request; the terminal is configured to send the transaction information of the target transaction to the coordinating node device according to the device identification information of the coordinating node device, and the coordinating node device is configured to coordinate processing of the target transaction based on the transaction information.
  • the distributed database system supports a key-value data storage format and a segment page data storage format.
  • the coordinating node device for coordinating and processing the target transaction is determined according to the transaction allocation index corresponding to each node device, and the transaction allocation process does not need to consider the data items involved in the transaction, nor the distribution of the data items.
  • each node device can act as a decentralized device to coordinate and process transactions, so that transactions can be processed across nodes, which is conducive to improving transaction processing efficiency; the reliability of transaction processing is high, which is conducive to improving the system performance of the database system.
  • an embodiment of the present application provides a transaction processing device, the device includes:
  • the first sending unit 602 is configured to send a data read request to a data node device based on the transaction information of the target transaction, where the data node device is a node device used to process the target transaction among at least two node devices sharing the same storage system ;
  • the second sending unit 603 is configured to send a transaction verification request and a local write set to the data node device in response to the data read result returned by the data node device meeting the transaction verification condition;
  • a determination unit 604 configured to determine the processing instruction of the target transaction based on the verification result of the target transaction returned by the data node device;
  • the third sending unit 605 is configured to send a processing instruction to the data node device, where the processing instruction is a commit instruction or a rollback instruction, and the data node device is configured to execute the processing instruction.
  • the data read result carries a second logical life cycle, which is determined by the data node device according to the first logical life cycle of the target transaction carried in the data read request; the first logical life cycle consists of a timestamp lower bound and a timestamp upper bound. The second sending unit 603 is configured to: take the maximum of the timestamp lower bound of the first logical life cycle and the timestamp lower bound of the second logical life cycle as the timestamp lower bound of the third logical life cycle of the target transaction; take the minimum of the timestamp upper bound of the first logical life cycle and the timestamp upper bound of the second logical life cycle as the timestamp upper bound of the third logical life cycle of the target transaction; and, in response to the third logical life cycle being valid, send a transaction verification request carrying the third logical life cycle to the data node device.
  • the third logical life cycle is valid to indicate that the lower bound of the timestamp
  • the number of data node devices is at least two, and the determining unit 604 is configured to: in response to any of the at least two verification results returned by the at least two data node devices indicating that the verification fails, use the rollback instruction as the processing instruction of the target transaction; in response to the at least two verification results returned by the at least two data node devices all indicating that the verification is passed, take the intersection of the logical life cycles carried by the at least two verification results as the target logical life cycle; in response to the target logical life cycle being valid, use the commit instruction as the processing instruction of the target transaction; and in response to the target logical life cycle being invalid, use the rollback instruction as the processing instruction of the target transaction.
  • the coordinating node device for coordinating and processing the target transaction is determined according to the transaction allocation index corresponding to each node device, and the transaction allocation process does not need to consider the data items involved in the transaction, nor the distribution of the data items.
  • each node device can act as a decentralized device to coordinate and process transactions, so that transactions can be processed across nodes, which is conducive to improving transaction processing efficiency; the reliability of transaction processing is high, which is conducive to improving the system performance of the database system.
  • an embodiment of the present application provides a transaction processing apparatus, the apparatus includes:
  • the first obtaining unit 701 is configured to obtain a data reading result based on a data reading request sent by a coordinating node device, where the coordinating node device is the node device used for coordinating processing of a target transaction among at least two node devices sharing the same storage system, and the coordinating node device is determined according to the transaction allocation indicators respectively corresponding to the at least two node devices;
  • the returning unit 702 is configured to return the data reading result to the coordinating node device;
  • the second obtaining unit 703 is configured to obtain the verification result of the target transaction based on the transaction verification request and the local write set sent by the coordinating node device;
  • the returning unit 702 is further configured to return the verification result of the target transaction to the coordinating node device;
  • the execution unit 704 is configured to execute the processing instruction in response to receiving the processing instruction of the target transaction sent by the coordinating node device, and the processing instruction is a commit instruction or a rollback instruction.
  • the data read request carries the first logical life cycle of the target transaction, and the first logical life cycle consists of a timestamp lower bound and a timestamp upper bound. The first obtaining unit 701 is configured to: determine, based on the first logical life cycle, the visible version data of the data item to be read indicated by the data read request; determine the second logical life cycle of the target transaction based on the creation timestamp of the visible version data and the first logical life cycle; and take a result carrying the second logical life cycle and the visible version data as the data read result.
  • the transaction verification request carries a third logical life cycle of the target transaction, the third logical life cycle being a valid logical life cycle determined by the coordinating node device based on the first logical life cycle and the second logical life cycle. The second obtaining unit 703 is configured to: take the maximum of the timestamp lower bound of the third logical life cycle and the timestamp lower bound of the second logical life cycle as the timestamp lower bound of the fourth logical life cycle of the target transaction; take the minimum of the timestamp upper bound of the third logical life cycle and the timestamp upper bound of the second logical life cycle as the timestamp upper bound of the fourth logical life cycle of the target transaction; in response to the fourth logical life cycle being valid, determine the fifth logical life cycle of the target transaction based on the fourth logical life cycle and the read transaction related information of each data item to be written corresponding to the local write set; and, in response to the fifth logical life cycle being valid, use a verification result indicating that the verification is passed as the verification result of the target transaction.
  • the read transaction related information of a data item to be written includes the maximum read transaction timestamp of the data item to be written, which indicates the maximum among the logical commit timestamps of the read transactions that have read the data item to be written; the second obtaining unit 703 is further configured to determine the fifth logical life cycle of the target transaction based on the maximum read transaction timestamp of each data item to be written and the fourth logical life cycle, where the timestamp lower bound of the fifth logical life cycle is greater than the maximum among the maximum read transaction timestamps of the data items to be written.
  • the read transaction related information of a data item to be written includes the end timestamp of a target read transaction of the data item to be written, where the target read transaction is a read transaction that has passed local verification or is in the commit phase, and the end timestamp of the target read transaction is the timestamp upper bound of the logical life cycle of the target read transaction; the second obtaining unit 703 is further configured to determine the fifth logical life cycle of the target transaction based on the end timestamp of the target read transaction of each data item to be written and the fourth logical life cycle, where the timestamp lower bound of the fifth logical life cycle is greater than the maximum among the end timestamps of the target read transactions of the data items to be written.
  • the coordinating node device for coordinating and processing the target transaction is determined according to the transaction allocation index corresponding to each node device, and the transaction allocation process does not need to consider the data items involved in the transaction, nor the distribution of the data items.
  • each node device can act as a decentralized device to coordinate and process transactions, so that transactions can be processed across nodes, which is conducive to improving transaction processing efficiency; the reliability of transaction processing is high, which is conducive to improving the system performance of the database system.
  • the computer device may vary greatly due to different configurations or performance, and may include one or more processors (Central Processing Units, CPU) 801 and one or more memories 802, where at least one computer program is stored in the one or more memories 802 and is loaded and executed by the one or more processors 801, so that the computer device implements the transaction processing methods provided by the foregoing method embodiments.
  • the computer device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may also include other components for implementing device functions, which are not described in detail here.
  • a non-transitory computer-readable storage medium is also provided, in which at least one computer program is stored; the at least one computer program is loaded and executed by a processor of a computer device, so that the computer implements any one of the above-mentioned transaction processing methods.
  • the above-mentioned non-transitory computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a computer program product or computer program comprising computer instructions stored in a computer readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform any one of the transaction processing methods described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

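A transaction processing method, system, apparatus, device, storage medium, and program product, belonging to the field of database technology. The method includes: in response to an allocation request for a target transaction, determining the transaction allocation indexes respectively corresponding to at least two node devices (201); and, based on those transaction allocation indexes, determining a coordinating node device of the target transaction among the at least two node devices, the coordinating node device coordinating the processing of the target transaction (202). In this way, each coordinating node device can coordinate transaction processing as a decentralized device, so that transactions can be processed across nodes, which helps improve transaction processing efficiency and reliability and thereby the performance of the database system.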

Description

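Transaction processing method, system, apparatus, device, storage medium, and program product
This application claims priority to Chinese Patent Application No. 202011362629.2, filed on November 27, 2020 and entitled "Transaction processing method, device, and computer-readable storage medium", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of this application relate to the field of database technology, and in particular to a transaction processing method, system, apparatus, device, storage medium, and program product.
Background
With the development of database technology, distributed database systems have become increasingly common in order to adapt to business scenarios such as big data and cloud computing. Among the various kinds of distributed database systems, those based on a shared storage (share-disk) architecture are a mainstream type.
Currently, in a distributed database system based on the share-disk architecture, transactions are allocated according to the distribution of data items: a transaction involving a given data item is allocated to the fixed node device that serves that data item, which processes it independently. This largely limits transaction processing efficiency, and the reliability of transaction processing is poor.
Summary
Embodiments of this application provide a transaction processing method, system, apparatus, device, storage medium, and program product, which can be used to improve transaction processing efficiency.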
一方面,本申请实施例提供了一种事务处理方法,所述方法应用于事务分配设备上,所述事务分配设备处于分布式数据库系统中,所述分布式数据库系统中还包括共享同一个存储系统的至少两个节点设备,所述方法包括:
响应于目标事务的分配请求,确定所述至少两个节点设备分别对应的事务分配指标,一个节点设备对应的事务分配指标用于指示为所述一个节点设备分配新事务的匹配度;
基于所述至少两个节点设备分别对应的事务分配指标,在所述至少两个节点设备中确定所述目标事务的协调节点设备,由所述协调节点设备对所述目标事务进行协调处理。
还提供了一种事务处理方法,所述方法应用于协调节点设备上,所述协调节点设备为共享同一个存储系统的至少两个节点设备中用于对目标事务进行协调处理的节点设备,所述协调节点设备根据所述至少两个节点设备分别对应的事务分配指标确定,所述方法包括:
获取所述目标事务的事务信息;
基于所述目标事务的事务信息,向数据节点设备发送数据读取请求,所述数据节点设备为所述至少两个节点设备中用于参与处理所述目标事务的节点设备;
响应于所述数据节点设备返回的数据读取结果满足事务验证条件,向所述数据节点设备发送事务验证请求和本地写集;
基于所述数据节点设备返回的所述目标事务的验证结果,确定所述目标事务的处理指令,向所述数据节点设备发送所述处理指令,所述处理指令为提交指令或者回滚指令,所述数据节点设备用于执行所述处理指令。
还提供了一种事务处理方法,所述方法应用于数据节点设备上,所述数据节点设备为共享同一个存储系统的至少两个节点设备中用于参与处理目标事务的节点设备,所述方法包括:
基于协调节点设备发送的数据读取请求,获取数据读取结果,将所述数据读取结果返回所述协调节点设备,所述协调节点设备根据所述至少两个节点设备分别对应的事务分配指标确定;
基于所述协调节点设备发送的事务验证请求和本地写集,获取所述目标事务的验证结果,将所述目标事务的验证结果返回所述协调节点设备;
响应于接收到所述协调节点设备发送的所述目标事务的处理指令,执行所述处理指令,所述处理指令为提交指令或者回滚指令。
另一方面,提供了一种事务处理系统,所述事务处理系统包括协调节点设备和数据节点设备,所述协调节点设备为共享同一个存储系统的至少两个节点设备中用于对目标事务进行协调处理的节点设备,所述协调节点设备根据所述至少两个节点设备分别对应的事务分配指标确定,所述数据节点设备为所述至少两个节点设备中用于参与处理所述目标事务的节点设备;
所述协调节点设备,用于获取所述目标事务的事务信息;基于所述目标事务的事务信息,向所述数据节点设备发送数据读取请求;
所述数据节点设备,用于基于所述协调节点设备发送的所述数据读取请求,获取数据读取结果,将所述数据读取结果返回所述协调节点设备;
所述协调节点设备,还用于响应于所述数据节点设备返回的所述数据读取结果满足事务验证条件,向所述数据节点设备发送事务验证请求和本地写集;
所述数据节点设备,还用于基于所述协调节点设备发送的所述事务验证请求和所述本地写集,获取所述目标事务的验证结果,将所述目标事务的验证结果返回所述协调节点设备;
所述协调节点设备,还用于基于所述数据节点设备返回的所述目标事务的验证结果,确定所述目标事务的处理指令,向所述数据节点设备发送所述处理指令,所述处理指令为提交指令或者回滚指令;
所述数据节点设备,还用于响应于接收到所述协调节点设备发送的所述目标事务的处理指令,执行所述处理指令。
另一方面,提供了一种事务处理装置,所述装置包括:
第一确定单元,用于响应于目标事务的分配请求,确定共享同一个存储系统的至少两个节点设备分别对应的事务分配指标,一个节点设备对应的事务分配指标用于指示为所述一个节点设备分配新事务的匹配度;
第二确定单元,用于基于所述至少两个节点设备分别对应的事务分配指标,在所述至少两个节点设备中确定所述目标事务的协调节点设备,由所述协调节点设备对所述目标事务进行协调处理。
还提供了一种事务处理装置,所述装置包括:
获取单元,用于获取目标事务的事务信息;
第一发送单元,用于基于所述目标事务的事务信息,向数据节点设备发送数据读取请求,所述数据节点设备为共享同一个存储系统的至少两个节点设备中用于参与处理所述目标事务的节点设备;
第二发送单元,用于响应于所述数据节点设备返回的数据读取结果满足事务验证条件,向所述数据节点设备发送事务验证请求和本地写集;
确定单元,用于基于所述数据节点设备返回的所述目标事务的验证结果,确定所述目标事务的处理指令;
第三发送单元,用于向所述数据节点设备发送所述处理指令,所述处理指令为提交指令或者回滚指令,所述数据节点设备用于执行所述处理指令。
还提供了一种事务处理装置,所述装置包括:
第一获取单元,用于基于协调节点设备发送的数据读取请求,获取数据读取结果,所述协调节点设备为共享同一个存储系统的至少两个节点设备中用于对目标事务进行协调处理的节点设备,所述协调节点设备根据所述至少两个节点设备分别对应的事务分配指标确定;
返回单元,用于将所述数据读取结果返回所述协调节点设备;
第二获取单元,用于基于所述协调节点设备发送的事务验证请求和本地写集,获取所述目标事务的验证结果;
所述返回单元,还用于将所述目标事务的验证结果返回所述协调节点设备;
执行单元,用于响应于接收到所述协调节点设备发送的所述目标事务的处理指令,执行所述处理指令,所述处理指令为提交指令或者回滚指令。
另一方面,提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条计算机程序,所述至少一条计算机程序由所述处理器加载并执行,以使所述计算机设备实现上述任一所述的事务处理方法。
另一方面,还提供了一种非临时性计算机可读存储介质,所述非临时性计算机可读存储介质中存储有至少一条计算机程序,所述至少一条计算机程序由处理器加载并执行,以使计算机实现上述任一所述的事务处理方法。
另一方面,还提供了一种计算机程序产品或计算机程序,所述计算机程序产品或计算机程序包括计算机指令,所述计算机指令存储在计算机可读存储介质中,计算机设备的处理器从所述计算机可读存储介质读取所述计算机指令,所述处理器执行所述计算机指令,使得所述计算机设备执行上述任一所述的事务处理方法。
在本申请实施例中,根据各个节点设备分别对应的事务分配指标确定用于协调处理目标事务的协调节点设备,事务的分配过程无需考虑事务涉及的数据项,也无需考虑数据项的分布情况。基于此种方式,每个节点设备均能够作为去中心化的设备协调处理事务,使得事务能够跨节点处理,有利于提高事务的处理效率,事务处理的可靠性较高,有利于提升数据库系统的系统性能。
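Brief Description of the Drawings
FIG. 1 is a schematic diagram of an implementation environment of a transaction processing method according to an embodiment of this application;
FIG. 2 is a flowchart of a transaction processing method according to an embodiment of this application;
FIG. 3 is a schematic diagram of the format of a transaction log according to an embodiment of this application;
FIG. 4 is a schematic diagram of the format of a transaction log according to an embodiment of this application;
FIG. 5 is a schematic diagram of a transaction processing apparatus according to an embodiment of this application;
FIG. 6 is a schematic diagram of a transaction processing apparatus according to an embodiment of this application;
FIG. 7 is a schematic diagram of a transaction processing apparatus according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the specification and claims of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of this application described here can be implemented in orders other than those illustrated or described here. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.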
在一些实施例中,本申请实施例所涉及的分布式数据库系统,是一种基于共享存储(share-disk)架构的分布式数据库系统,在基于共享存储架构的分布式数据库系统中包括至少两个节点设备,该至少两个节点设备具有自己本地的内存区域,通过网络通讯机制直接访问同一个存储系统,即至少两个节点设备共享同一个存储系统。例如,共享同一个HDFS(Hadoop Distributed File System,分布式文件系统)。至少两个节点设备共享的存储系统中可以存储有多个数据表,每个数据表可以用于存储一个或多个数据项。
从逻辑的角度出发,可以将分布式数据库系统中的节点设备划分为两种角色:协调节点设备和数据节点设备,其中,协调节点设备主要负责生产、分发处理计划,以及协调分布式事务,而数据节点设备则主要负责接收协调节点设备发来的处理计划,执行相应的事务并向协调节点设备返回事务涉及的相关数据。
在分布式数据库系统中,最小的操作执行单元为事务,依据事务是否需要对多个数据节点设备上的数据项进行操作,事务可以被划分为分布式事务和本地事务两种,针对这两种不同的事务,可以分别采取不同的执行流程,以尽量减少网络通信开销,提升事务处理效率。其中,分布式事务表示事务需要跨多个数据节点设备执行读写操作,也即事务需要对多个数据节点设备上的数据项进行操作,例如,事务T需要操作数据节点设备RM1、RM2、RM3上的数据项,那么该事务T为一个分布式事务。本地事务表示事务只需要对单个数据节点设备上的数据项进行操作,例如,事务T只需要操作RM1上的数据项,则该事务T为一个本地事务。
图1是本申请实施例提供的一种事务处理方法的实施环境的示意图。参见图1,本申请实施例可以应用于基于share-disk框架的分布式数据库系统中,该分布式数据库系统中可以包括网关服务器101、事务分配设备102、分布式存储集群103以及全局时间戳生成集群104。分布式存储集群103中包括m(m为不小于2的整数)个节点设备,该m个节点设备共享同一个存储系统。
网关服务器101用于接收外部的读写请求,并将读写请求对应的读写事务分发至事务分配设备102或者分布式存储集群103。比如,用户在登录终端上的应用客户端之后,触发应用客户端生成读写请求,调用分布式数据库系统提供的API(Application Programming Interface,应用程序编程接口)将该读写请求对应的读写事务发送至网关服务器101。
在一些实施例中,网关服务器101可以与分布式存储集群103中的任一个节点设备合并在同一个物理机上,也即是,让某个节点设备充当网关服务器101。
在一些实施例中,应用客户端所在的终端能够与分布式数据库系统中的事务分配设备102和分布式存储集群103直接建立通信连接,此种情况下,分布式数据库系统中可以不存在网关服务器101。
事务分配设备102用于为新事务分配合适的节点设备作为协调节点设备。在示例性实施例中,事务分配设备处于分布式协调系统(如ZooKeeper)中。分布式协调系统可以用于对网关服务器101、分布式存储集群103和全局时间戳生成集群104中的至少一项进行管理。可选地,技术人员可以通过终端上的调度器(scheduler)访问该分布式协调系统,从而基于前端的调度器来控制后端的分布式协调系统,实现对各个集群或服务器的管理。例如,技术人员可以通过调度器来控制ZooKeeper将某一个节点设备从分布式存储集群103中删除,也即是使得某一个节点设备失效。
分布式存储集群103可以包括数据节点设备和协调节点设备,每个协调节点设备可以对应于至少一个数据节点设备,数据节点设备与协调节点设备的划分是针对不同事务而言的。 以某一分布式事务为例,分布式事务的发起节点设备可以称为协调节点设备,分布式事务所涉及的其他节点设备称为数据节点设备。数据节点设备或协调节点设备的数量可以是一个或多个,本申请实施例不对分布式存储集群103中的数据节点设备或协调节点设备的数量进行具体限定。
由于本申请实施例所提供的分布式数据库系统中缺乏全局事务管理器,因此在该系统中可以采用XA(eXtended Architecture,X/Open组织分布式事务规范)/2PC(Two-Phase Commit,二阶段提交)技术来支持跨节点的事务(分布式事务),保证跨节点写操作时数据的原子性和一致性,此时,协调节点设备用于充当2PC算法中的协调者,而该协调节点设备所对应的各个数据节点设备用于充当2PC算法中的参与者。
每个数据节点设备或者协调节点设备可以是单机设备,也可以采用主备结构(也即是为一主多备集群),如图1所示,以节点设备(数据节点设备或协调节点设备)为一主两备集群为例进行示意,每个节点设备中包括一个主机和两个备机,可选地,每个主机或备机都对应配置有代理(agent)设备,代理设备可以与主机或备机是物理独立的,当然,代理设备还可以作为主机或备机上的一个代理模块。以节点设备1为例,节点设备1包括一个主数据库及代理设备(主database+agent,简称主DB+agent),此外还包括两个备数据库及代理设备(备database+agent,简称备DB+agent)。主数据库即为上述所述的主机,备数据库即为上述所述的备机。
全局时间戳生成集群104用于生成分布式事务的全局提交时间戳(Global Timestamp,Gts),该分布式事务可以是指涉及到多个数据节点设备的事务,例如分布式读事务可以涉及到对多个数据节点设备上存储的数据的读取,又例如,分布式写事务可以涉及到对多个数据节点设备上的数据写入。全局时间戳生成集群104在逻辑上可以视为一个单点,但在一些实施例中可以通过一主三从的架构来提供具有更高可用性的服务,采用集群的形式来实现该全局提交时间戳的生成,可以防止单点故障,也就规避了单点瓶颈问题。
可选地,全局提交时间戳是一个在分布式数据库系统中全局唯一且单调递增的时间戳标识,能够用于标志每个事务全局提交的顺序,以此来反映出事务之间在真实时间上的先后关系(事务的全序关系),全局提交时间戳可以采用物理时钟、逻辑时钟或者混合物理时钟中至少一项,本申请实施例不对全局提交时间戳的类型进行具体限定。
在一些实施例中,该全局时间戳生成集群104可以是物理独立的,也可以和分布式协调系统(例如ZooKeeper)合并到一起。
上述图1仅是提供了一种轻量级的事务处理的架构图,是一种基于share-disk架构的分布式数据库系统的示例性描述。在一些实施例中,上述网关服务器101、事务分配设备102、分布式存储集群103以及全局时间戳生成集群104所构成的分布式数据库系统,可以视为一种向用户终端提供数据服务的服务器,该服务器可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN(Content Delivery Network,内容分发网络)、以及大数据和人工智能平台等基础云计算服务的云服务器。可选地,上述用户终端可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表等,但并不局限于此。终端以及服务器可以通过有线或无线通信方式进行直接或间接地连接,本申请在此不做限制。
基于上述图1所示的实施环境,本申请实施例提供一种事务处理方法。如图2所示,本申请实施例提供的方法包括如下步骤201至步骤209。
在步骤201中,事务分配设备响应于目标事务的分配请求,确定至少两个节点设备分别对应的事务分配指标,一个节点设备对应的事务分配指标用于指示为该一个节点设备分配新事务的匹配度。
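The transaction allocation device and the at least two node devices are all in the distributed database system, and the at least two node devices share the same storage system. The embodiments of this application do not limit the specific structure of the distributed database system, as long as it includes the transaction allocation device and at least two node devices sharing the same storage system.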
目标事务是指待处理的事务,目标事务可以是分布式事务,也可以是本地事务,本申请实施例对此不加以限定。目标事务的分配请求用于指示为目标事务分配合适的节点设备作为协调节点设备,以由分配的协调节点设备协调处理该目标事务。
目标事务的分配请求由终端发起,终端发起的目标事务的分配请求由终端直接发送给事务分配设备,或者由网关服务器转发给事务分配设备,本申请实施例对此不加以限定。终端可以是用户所对应的任一电子设备,包括但不限于:智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱或者智能手表中至少一项,本申请实施例不对终端的类型进行具体限定。可选地,在终端上安装有应用客户端,该应用客户端可以是能够提供数据服务的任一客户端,例如,该应用客户端可以是支付应用客户端、外卖应用客户端、打车应用客户端或者社交应用客户端中至少一项,本申请实施例不对应用客户端的类型进行具体限定。
至少两个节点设备是指分布式数据库系统中能够作为去中心化的节点设备协调处理事务的节点设备,每个节点设备均能够用于通过去中心化的算法协调处理分布式事务。
事务分配设备在接收到目标事务的分配请求后,需要为目标事务分配合适的节点设备作为协调节点设备,以保证事务处理的效率。在为目标事务分配合适的节点设备作为协调节点设备的过程中,事务分配设备先确定至少两个节点设备分别对应的事务分配指标。一个节点设备对应的事务分配指标用于指示为该一个节点设备分配新事务的匹配度。为一个节点设备分配新事务的匹配度越高,说明越适合为该一个节点设备分配新事务。
事务分配指标是从事务的角度确定的用于衡量是否适合为某一节点设备分配新事务的指标。在一种可能实现方式中,确定至少两个节点设备分别对应的事务分配指标的过程包括以下步骤2011和步骤2012。
步骤2011:确定事务分配模式,事务分配模式包括基于事务繁忙程度进行分配、基于设备繁忙程度进行分配和基于混合繁忙程度进行分配中的任一种。
事务分配模式用于指示确定节点设备对应的事务分配指标的确定方式。在一些实施例中,事务分配模式由开发人员设置并上传至事务分配设备。需要说明的是,不同时期采用的事务分配模式可能不同。在此步骤2011中确定的是在接收到目标事务的分配请求时应该采用的事务分配模式。
事务分配模式包括基于事务繁忙程度进行分配、基于设备繁忙程度进行分配和基于混合繁忙程度进行分配中的任一种。其中,基于事务繁忙程度进行分配的模式是指从考虑节点设备的事务处理数量的角度确定事务分配指标,节点设备的事务处理数量能够反映出节点设备的事务繁忙程度。基于设备繁忙程度进行分配的模式是指从考虑节点设备的设备资源使用率的角度确定事务分配指标,节点设备的设备资源使用率能够反映出节点设备的设备繁忙程度。基于混合繁忙程度进行分配的模式是指从综合考虑节点设备的事务处理数量以及节点设备的设备资源使用率的角度确定事务分配指标,节点设备的事务处理数量以及节点设备的设备资源使用率能够反映出节点设备的混合繁忙程度。
步骤2012:根据事务分配模式指示的确定方式,确定至少两个节点设备分别对应的事务分配指标。
不同的事务分配模式指示的确定方式不同,在确定事务分配模式后,根据事务分配模式指示的确定方式,确定至少两个节点设备分别对应的事务分配指标。接下来分别介绍不同的事务分配模式下,确定至少两个节点设备中的第一节点设备对应的事务分配指标的方式。其中,第一节点设备为至少两个节点设备中的任意一个节点设备。
在一些实施例中,事务分配模式为基于事务繁忙程度进行分配。在此种情况下,根据事务分配模式指示的确定方式,确定第一节点设备对应的事务分配指标的方式为:基于该第一节点设备的事务处理数量,确定该第一节点设备对应的事务分配指标。
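The transaction processing count of the first node device is the number of transactions that the first node device needs to process per unit time. It should be noted that the transactions to be processed here are the transactions that have already been allocated to the first node device. The more transactions the first node device needs to process per unit time, the less suitable it is to allocate a new transaction to it. In an exemplary embodiment, the transaction processing count of the first node device may be reported by the first node device to the transaction allocation device, or may be determined by the transaction allocation device itself from its allocation history; the embodiments of this application do not limit this.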
本申请实施例对事务分配指标的表现形式不加以限定,示例性地,事务分配指标的表现形式为繁忙级别或数值。
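For example, when the transaction allocation index is expressed as a busy level, the index of the first node device is determined from its transaction processing count as follows: a different range of transaction processing counts is set for each busy level, and the busy level whose range contains the transaction processing count of the first node device is taken as the busy level of the first node device. For example, the busy levels are "busy", "partially busy", and "idle", with the ranges [10, +∞), [3, 10), and [0, 3) respectively; if the transaction processing count of the first node device is 2, "idle" is taken as its transaction allocation index. The closer the transaction allocation index of the first node device is to "idle", the higher the matching degree for allocating a new transaction to it.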
示例性地,当事务分配指标的表现形式为数值时,基于该第一节点设备的事务处理数量确定该第一节点设备对应的事务分配指标的方式为:对该第一节点设备的事务处理数量进行数值化处理,将数值化处理后得到的数值作为该第一节点设备对应的事务分配指标。对事务处理数量进行数值化处理的方式根据经验设置,或者根据应用场景灵活调整,本申请实施例对此不加以限定。示例性地,对事务处理数量进行数值化处理的方式为:计算事务处理数量和参考权重的乘积。在此种方式下,事务处理数量越多,数值化处理后得到的数值越大。第一节点设备对应的事务分配指标越小,说明为该第一节点设备分配新事务的匹配度越高。
在一些实施例中,事务分配模式为基于设备繁忙程度进行分配。在此种情况下,根据事务分配模式指示的确定方式,确定第一节点设备对应的事务分配指标的方式为:基于该第一节点设备的设备资源使用率确定该第一节点设备对应的事务分配指标。
第一节点设备的设备资源使用率是指第一节点设备已经使用的设备资源占用总设备资源的比率,示例性地,设备资源是指CPU(Central Processing Unit,中央处理器)资源。第一节点设备的设备资源使用率越高,说明越不适合为该第一节点设备分配新事务。需要说明的是,第一节点设备的设备资源使用率可以由该第一节点设备实时监控并反馈给事务分配设备,也可以由事务分配设备自行监控得到,本申请实施例对此不加以限定。
基于该第一节点设备的设备资源使用率确定该第一节点设备对应的事务分配指标的方式参见基于该第一节点设备的事务处理数量确定该第一节点设备对应的事务分配指标的方式,此处不再赘述。
在一些实施例中,事务分配模式为基于混合繁忙程度进行分配。在此种情况下,根据事务分配模式指示的确定方式,确定第一节点设备对应的事务分配指标的方式为:基于第一节点设备的事务处理数量、第一节点设备的设备资源使用率、事务处理数量权重、设备资源使用率权重以及权重调节参数,确定第一节点设备对应的事务分配指标。
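In an exemplary embodiment, the transaction processing count weight and the device resource usage weight are used to adjust the proportions of the two parameters (transaction processing count and device resource usage rate) and can be obtained by measurement; for example, both default to 1. The weight adjustment parameter is the relative proportion factor between the device resource usage rate and the transaction processing count, used to adjust how weight is distributed between them, and can also be obtained by measurement; for example, its default value is 0.33.
For example, with p1 denoting the device resource usage weight, p2 denoting the transaction processing count weight, and w denoting the weight adjustment parameter, the transaction allocation index Q of the first node device can be expressed as: Q = p1 × device resource usage rate + p2 × w × transaction processing count.
In some embodiments, in the mode of allocation based on mixed busyness, factors other than the transaction processing count and the device resource usage rate can also be considered, such as the number of long transactions among the transactions to be processed. In this case, the transaction allocation index Q of the first node device can be expressed as: Q = p1 × device resource usage rate + p2 × w × transaction processing count + p3 × other factor, where p3 denotes the weight of the other factor and can be measured according to the type of that factor; for example, p3 defaults to 1. The smaller the transaction allocation index Q of the first node device, the higher the matching degree for allocating a new transaction to it.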
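The mixed-busyness formula can be sketched in a few lines of Python. The function name `allocation_index`, the parameter names, and the choice to express the device resource usage rate as a fraction between 0 and 1 are assumptions made for illustration; only the formula and the default values (1, 1, 0.33, 1) come from the text above.

```python
def allocation_index(usage_rate, txn_count, p1=1.0, p2=1.0, w=0.33,
                     other=0.0, p3=1.0):
    """Q = p1 * usage_rate + p2 * w * txn_count + p3 * other.

    usage_rate: device resource usage rate (assumed here to be a fraction, 0..1)
    txn_count:  transactions the node must process per unit time
    other:      optional extra factor, e.g. number of long transactions
    A smaller Q means the node is a better match for a new transaction.
    """
    return p1 * usage_rate + p2 * w * txn_count + p3 * other
```

With the defaults, a node at 50% usage with 2 pending transactions gets Q = 0.5 + 0.33 × 2 = 1.16.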
需要说明的是,以上所述仅从第一节点设备的角度介绍了确定第一节点设备对应的事务分配指标的过程,根据上述方式能够确定分布式数据库系统中的至少两个节点设备分别对应的事务分配指标,进而执行步骤202。
在步骤202中,事务分配设备基于至少两个节点设备分别对应的事务分配指标,在至少两个节点设备中确定目标事务的协调节点设备,由协调节点设备对目标事务进行协调处理。
目标事务的协调节点设备是指至少两个节点设备中适合分配新事务的节点设备。目标事务的协调节点设备用于对目标事务进行协调处理,也就是说,目标事务的协调节点设备是指目标事务的协调者。示例性地,对目标事务进行协调处理的过程是指在分布式数据库系统中发起目标事务,然后组织目标事务的数据节点设备共同处理该目标事务的过程。目标事务的数据节点设备是指至少两个节点设备中用于参与处理目标事务的节点设备,也就是说,目标事务的数据节点设备是指目标事务的参与者。
需要说明的是,本申请实施例中提到的协调节点设备和数据节点设备均是针对目标事务而言的,对不同的事务而言,协调节点设备或者数据节点设备并非是固定不变的,也即是说,同一节点设备有可能对一些事务而言属于协调节点设备,对另一些事务而言属于数据节点设备。
基于至少两个节点设备分别对应的事务分配指标,在至少两个节点设备中确定目标事务的协调节点设备的方式根据事务分配指标的表现形式的不同有所不同,本申请实施例对此不加以限定,只要能够保证协调节点设备为当前适合分配新事务的节点设备即可。
在一些实施例中,事务分配指标的表现形式为繁忙级别,繁忙级别分别为“繁忙”、“部分繁忙”和“清闲”。此种情况下,基于至少两个节点设备分别对应的事务分配指标,在至少两个节点设备中确定目标事务的协调节点设备的方式为:将至少两个节点设备中对应的事务分配指标为“清闲”的节点设备作为备选节点设备,在备选节点设备中任选一个节点设备作为目标事务的协调节点设备。
示例性地,若不存在对应的事务分配指标为“清闲”的节点设备,则将至少两个节点设备中对应的事务分配指标为“部分繁忙”的节点设备作为备选节点设备,进而在备选节点设备中任选一个节点设备作为目标事务的协调节点设备。示例性地,若至少两个节点设备对应的事务分配指标均为“繁忙”,则暂停确定目标事务的协调节点设备,等待参考时长后重新确定至少两个节点设备分别对应的事务分配指标,进而重新确定目标事务的协调节点设备。参考时长根据经验设置,例如,参考时长为实测的完成一个事务的平均时长。
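The busy-level variant of coordinator selection can be sketched as follows. The thresholds 3 and 10 are the example ranges given earlier in the description; the names `busy_level` and `pick_coordinator`, the dictionary input, and returning `None` to signal "wait for the reference duration and retry" are assumptions of this sketch.

```python
import random

BUSY, PARTLY_BUSY, IDLE = "busy", "partially busy", "idle"

def busy_level(txn_count):
    """Map a transaction processing count to a busy level
    using the example ranges [0,3), [3,10), [10,+inf)."""
    if txn_count >= 10:
        return BUSY
    if txn_count >= 3:
        return PARTLY_BUSY
    return IDLE

def pick_coordinator(txn_counts, rng=random):
    """txn_counts maps node id -> transaction processing count.

    Prefer an idle node, fall back to a partially busy one, and
    return None when every node is busy (the caller then waits and retries).
    """
    levels = {node: busy_level(count) for node, count in txn_counts.items()}
    for wanted in (IDLE, PARTLY_BUSY):
        candidates = sorted(n for n, lvl in levels.items() if lvl == wanted)
        if candidates:
            return rng.choice(candidates)  # any candidate is acceptable
    return None
```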
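In some embodiments, the transaction allocation index is expressed as a numeric value, and the smaller the index of a node device, the higher the matching degree for allocating a new transaction to it. In this case, the coordinating node device of the target transaction is determined among the at least two node devices as follows: the node devices with the s smallest transaction allocation indexes (s is an integer not less than 1) are taken as candidate node devices, and any one of the candidate node devices is selected as the coordinating node device of the target transaction. The value of s is set based on experience or adjusted flexibly according to the total number of node devices, which is not limited in the embodiments of this application; for example, s is 1, or s is 3.
It should be noted that the above is only an exemplary description of how the coordinating node device of the target transaction is determined among the at least two node devices based on their transaction allocation indexes, and the embodiments of this application are not limited to it. For example, when the transaction allocation index is a numeric value and a larger index indicates a higher matching degree for allocating a new transaction to the node device, the node devices with the t largest transaction allocation indexes (t is an integer not less than 1) are taken as candidate node devices, and any one of the candidate node devices is selected as the coordinating node device of the target transaction.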
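For numeric indexes, selection reduces to picking any node among the s smallest (or t largest) values. The sketch below uses `heapq.nsmallest` and `heapq.nlargest` from the Python standard library; the function name `pick_by_index` and its parameters are hypothetical.

```python
import heapq
import random

def pick_by_index(indexes, k=1, smaller_is_better=True, rng=random):
    """indexes maps node id -> numeric transaction allocation index.

    Take the k best-matching nodes as candidates and pick any one of them.
    """
    if not indexes:
        raise ValueError("at least one node device is required")
    select = heapq.nsmallest if smaller_is_better else heapq.nlargest
    candidates = select(k, indexes, key=indexes.get)
    return rng.choice(candidates)
```

With k = 1 the choice is deterministic; a larger k spreads new transactions across several lightly loaded nodes instead of always hitting the single best one.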
基于事务分配指标确定出的目标事务的协调节点设备为至少两个节点设备中适合分配新事务的节点设备,进而将目标事务分配给该协调节点设备,由该协调节点设备对该目标事务进行协调处理,有利于保证目标事务的处理效率。
相关技术中,每个节点设备为一定数量的区域(region)服务,每个节点设备均维护该节点设备服务的区域中的数据项的分布信息,数据项的分布信息用于指示数据项的存储位置。此外,事务分配设备中维护有区域的元信息。在此种架构下,相关技术中,事务分配设备根据维护的区域的元信息,确定用于为目标事务涉及的数据项所在的区域服务的节点设备,进而由该节点设备独立处理目标事务。在此种方式下,极大程度上限制了事务的处理效率,无法支持真正的分布式事务,不具备良好的带有事务属性特征的全局一致性多读和一致性多写的能力。
本申请实施例中,节点设备不再为某些固定的区域服务,节点设备不再维护数据项的分布信息,事务分配设备也不再维护区域的元信息。示例性地,将区域的元信息分布在分布式数据库系统中的整个共享的存储系统中。基于此种改进,事务分配设备能够基于事务分配指标为目标事务分配合适的节点设备作为协调节点设备,无需考虑事务涉及的数据项,也无需考虑数据项的分布情况。节点设备能够根据事务中的SQL(Structured Query Language,结构化查询语言)语句的需求,自动从共享的存储系统中调入数据。在此种方式下,每个节点设备均能够作为去中心化的节点设备协调处理分布式事务,使得分布式数据库系统具有去中心化的分布式事务处理能力。
在一种可能实现方式中,在确定目标事务的协调节点设备之后,还包括:将协调节点设备的设备标识信息发送给发起该分配请求的终端,终端用于根据协调节点设备的设备标识信息,将目标事务的事务信息发送给协调节点设备,由协调节点设备基于事务信息对目标事务进行协调处理。
协调节点设备的设备标识信息用于唯一标识该协调节点设备,通过将协调节点设备的设备标识信息发送给发起分配请求的终端,能够使终端获知用于协调处理目标事务的协调节点设备。终端在根据设备标识信息获知用于协调处理目标事务的协调节点设备后,将目标事务的事务信息发送给协调节点设备。目标事务的事务信息用于指示目标事务的相关处理操作,示例性地,目标事务的事务信息是指SQL语句。
在一种可能实现方式中,终端直接将目标事务的事务信息发送给协调节点设备;或者,终端将目标事务的事务信息以及协调节点设备的设备标识信息发送给网关服务器,由网关服务器将目标事务的事务信息转发给协调节点设备。
协调节点设备在接收到目标事务的事务信息后,基于事务信息协调处理目标事务。协调节点设备能够解析事务信息,如,SQL语句,生成事务执行计划,然后通过与相关的数据节点设备进行通信完成目标事务的处理。
在示例性实施例中,由于目标事务的协调节点设备是根据事务分配指标确定的,不同的事务能够利用不同的节点设备进行协调处理,所以本申请实施例提供的方法能够实现去中心化的事务处理过程。在去中心化的事务处理过程中,多个分布式事务分别由多个节点设备进行协调处理。在目标事务的协调节点设备协调处理目标事务的过程中,若存在多个分布式事务,则协调节点设备与其他节点设备建立通讯,获取其他节点设备在协调处理其他分布式事务的过程中产生的数据信息,进而根据获取的数据信息做数据异常或可串行化的验证,以判断目标事务是否符合事务一致性,保证事务处理技术是正确的。在示例性实施例中,协调节点设备将其他节点设备传来的数据信息缓冲在临时数据缓冲区中,目标事务结束而被清理。
在一种可能实现方式中,对于发起目标事务的分配请求的终端而言,该终端在接入分布式数据库系统后,若产生了其他事务,每个其他事务均由该协调节点设备进行协调处理;或者,每个其他事务均由事务分配设备根据节点设备的事务分配指标实时分配的适合的协调节点设备进行协调处理,本申请实施例对此不加以限定。
在步骤203中,协调节点设备获取目标事务的事务信息。
目标事务的事务信息可以由目标事务的创建终端直接发送给协调节点设备,也可以由网关服务器转发给协调节点设备,本申请实施例对此不加以限定。示例性地,目标事务的事务信息是指用于实现目标事务的SQL语句。
在一种可能实现方式中,协调节点设备在获取目标事务的事务信息后,对目标事务进行初始化。对目标事务进行初始化的阶段可认为是建立事务的快照阶段。此阶段能够建立全局的一致性快照点,保障全局读一致性。
在一种可能实现方式中,在对目标事务进行初始化的过程中,协调节点设备可以执行下述两项初始化操作中的至少一项。
初始化操作一:协调节点设备为目标事务分配一个全局唯一的事务标识TID。
该事务标识TID用于唯一标识该目标事务。
初始化操作二:协调节点设备在第一事务状态列表中记录目标事务的初始状态信息。
本申请实施例将协调节点设备维护的事务状态列表称为第一事务状态列表,该第一事务状态列表为去中心框架下用于记录目标事务的全局状态的全局状态列表。
在示例性实施例中,第一事务状态列表中记录的目标事务的状态信息包括但不限于目标事务的事务标识、目标事务的全局事务状态和目标事务的逻辑生命周期。其中,逻辑生命周期由时间戳下界和时间戳上界构成。逻辑生命周期的时间戳下界称为目标事务的开始时间戳(Begintimestamp,Bts),逻辑生命周期的时间戳上界称为目标事务的结束时间戳(Endtimestamp,Ets),也就是说,逻辑生命周期由时间戳下界Bts和时间戳上界Ets构成。
在目标事务的初始状态信息中,目标事务的事务标识TID为初始化操作一中分配的,目标事务的全局事务状态Status为Grunning(全局正在运行),目标事务的逻辑生命周期为第一逻辑生命周期,该第一逻辑生命周期的时间戳下界Bts为全局的唯一递增时间戳值,第一逻辑生命周期的时间戳上界Ets为+∞。
在示例性实施例中,第一逻辑生命周期的时间戳下界Bts和时间戳上界Ets的获取方式为:对于可串行化级别之上的隔离级别,协调节点设备从全局时钟获取时间戳值;对于可串行化级别以及更弱的隔离级别而言,协调节点设备从本地的混合逻辑时钟(Hybrid Logical Clock,HLC)获取时间戳值。当然,在一些实施例中,对于可串行化级别以及更弱的隔离级别,协调节点设备也能够通过从全局时钟获取时间戳值的方式获取第一逻辑生命周期的时间戳下界Bts和时间戳上界Ets。在示例性实施例中,对于可串行化级别以及更弱的隔离级别而言,从本地的HLC获取时间戳值的效率较高。
全局时钟是指全局逻辑时钟生成器生成的时钟,具备单调递增的特性,形式上可以是walltime(墙上时间),也可以是自然数N等。示例性地,全局时钟由分布式数据库系统中的全局时间戳生成集群提供。示例性地,对于基于share-disk架构的分布式数据库系统而言,全局时钟由分布式数据库系统中的存储系统通过API的方式提供。示例性地,全局逻辑时钟生成器能够为事务的开始时间戳Bts、事务的结束时间戳Ets赋值,还能够为WAL(Write Ahead Logging,提前写日志)的全局LSN(Log Sequence Number日志序列号)赋值。
在一些实施例中,全局时钟是一个逻辑概念,为全系统提供统一的单调递增值;物理形态可以是一个全局的物理时钟,也可以是一个全局的逻辑时钟。全局时钟的实现形态,可以有多种方式。如,全局时钟是一个类似Google(谷歌)的“Truetime(一种时钟机制)”机制的分布式去中心化的时钟;或者,全局时钟是一个采取多个冗余节点(如,一致性协议(Paxos/Raft等)构造的集群)的主备系统统一提供的时钟;再或者,全局时钟是一个具有精准同步机制协同节点退出机制的一种算法机制提供的时钟。
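In an exemplary embodiment, the Bts and Ets of a transaction each consist of 8 bytes. The 8 bytes are divided into two parts. The first part is the value of a physical timestamp (that is, a Unix timestamp accurate to the millisecond) and identifies the global time (denoted gts); the second part is a monotonically increasing count within a given millisecond and identifies the relative time within the global time (that is, the local time, denoted lts). For example, the first 44 bits of the 8 bytes form the first part, which can represent 2^44 unsigned integers, so in theory it can represent physical timestamps spanning about 2^44 / (1000 × 60 × 60 × 24 × 365) ≈ 557.8 years; the last 20 bits of the 8 bytes form the second part, which gives 2^20 (about 1 million) counts per millisecond. In an exemplary embodiment, the number of bits in the two parts can also be adjusted to change the ranges represented by the global time gts and the local time lts.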
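It should be noted that, depending on actual needs, the Bts and Ets of a transaction may consist of more than or fewer than 8 bytes. For example, the Bts and Ets are extended to 10 bytes so that the local time lts has a larger range, to cope with a larger number of concurrent transactions.
For example, for two timestamps Ti.bts and Tj.bts that each consist of a global time gts and a local time lts, Ti.bts < Tj.bts if Ti.bts.gts < Tj.bts.gts, or if Ti.bts.gts = Tj.bts.gts and Ti.bts.lts < Tj.bts.lts.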
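A minimal sketch of the 44-bit/20-bit layout and the comparison rule follows. The helper names `pack`, `unpack`, and `earlier` are hypothetical; the bit widths are the example values from the description.

```python
GTS_BITS, LTS_BITS = 44, 20          # example split of the 8-byte timestamp
LTS_MASK = (1 << LTS_BITS) - 1

def pack(gts_ms, lts):
    """Combine global time (ms) and the per-millisecond counter into 64 bits."""
    if not 0 <= gts_ms < (1 << GTS_BITS):
        raise ValueError("gts does not fit in 44 bits")
    if not 0 <= lts <= LTS_MASK:
        raise ValueError("lts does not fit in 20 bits")
    return (gts_ms << LTS_BITS) | lts

def unpack(ts):
    """Split a packed timestamp back into (gts, lts)."""
    return ts >> LTS_BITS, ts & LTS_MASK

def earlier(a, b):
    """Compare gts first, then lts, as described in the text."""
    (ga, la), (gb, lb) = unpack(a), unpack(b)
    return ga < gb or (ga == gb and la < lb)
```

Because gts occupies the high-order bits, comparing the packed integers directly gives the same order as `earlier`, so a plain integer comparison is enough in practice.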
在步骤204中,协调节点设备基于目标事务的事务信息,向数据节点设备发送数据读取请求,数据节点设备为至少两个节点设备中用于参与处理目标事务的节点设备。
协调节点设备对目标事务进行初始化后,开始目标事务的执行阶段,事务的执行阶段可认为事务语义实现操作阶段。
数据节点设备为至少两个节点设备中用于参与处理目标事务的节点设备,数据节点设备能够获取目标事务涉及的数据项,即本申请实施例中的数据节点设备为目标事务相关的数据节点设备。目标事务的事务信息中携带需要读取的数据的相关信息,协调节点设备能够根据目标事务的事务信息,生成数据读取请求,然后向数据节点设备发送数据读取请求。
在一种可能实现方式中,数据读取请求表示为ReadRequestMessage(读请求消息),简称rrqm。
在一种可能实现方式中,数据读取请求携带目标事务的第一逻辑生命周期、目标事务的事务标识和读取计划。第一逻辑生命周期利用时间戳下界Bts和时间戳上界Ets表示。读取计划是指目标事务对应的数据读取计划,用于指示需要读取的数据项。在示例性实施例中,事务标识、时间戳下界Bts、时间戳上界Ets和读取计划分别记录在rrqm的四个字段中。
需要说明的是,数据节点设备的数量可以为一个或多个,本申请实施例中不对数据节点设备的数量进行具体限定。对于数据节点设备的数量为多个的情况,向不同的数据节点设备发送的数据读取请求中携带的读取计划不同,以指示需要在不同的数据节点设备中读取不同的数据项。
在步骤205中,数据节点设备基于协调节点设备发送的数据读取请求,获取数据读取结果,将数据读取结果返回协调节点设备。
数据节点设备在获取数据读取请求后,基于数据读取请求,获取数据读取结果。在一种可能实现方式中,对于数据读取请求携带目标事务的第一逻辑生命周期的情况,数据节点设备基于数据读取请求,获取数据读取结果的过程包括以下步骤2051至步骤2053。
步骤2051:基于第一逻辑生命周期,确定数据读取请求指示的待读取数据项的可见版本数据。
数据节点设备能够基于数据读取请求携带的读取计划,确定目标事务需要读取的数据项,将目标事务需要读取的数据项作为待读取数据项。待读取数据项的可见版本数据是指待读取数据项对应的各版本数据中相对于目标事务可见的某一版本数据。示例性地,数据节点设备中设置有数据缓冲区,若数据缓冲区中存在待读取数据项的各版本数据,则数据节点设备直接从数据缓冲区中获取待读取数据项的各版本数据;若数据缓冲区中不存在待读取数据项的各版本数据,则数据节点设备从共享的存储系统中获取待读取数据项的各版本数据。
在示例性实施例中,数据节点设备在接收到数据读取请求后,先检查本地事务状态列表(Local TS)中是否包含目标事务的状态信息。本地事务状态列表为数据节点设备维护的事务状态列表,本地事务状态列表中记录有该数据节点设备参与的各未提交事务的状态信息。在示例性实施例中,数据节点设备在接收到数据读取请求后,根据数据读取请求携带的目标事 务的事务标识,检查本地事务状态列表中是否包含目标事务的状态信息,检查结果包括以下两种。
检查结果1、本地事务状态列表中不包含目标事务的状态信息。
在此种情况下,可以在本地事务状态列表中初始化该目标事务的状态信息,即在Local TS中插入一条与目标事务相关的记录,该记录中的值分别为数据读取请求携带的目标事务的事务标识rrqm.TID、数据读取请求携带的目标事务的第一逻辑生命周期的时间戳下界rrqm.Bts、数据读取请求携带的目标事务的第一逻辑生命周期的时间戳上界rrqm.Ets以及数据读取请求指示的目标事务当前的事务状态rrqm.Running。
在此种情况下,基于第一逻辑生命周期,确定数据读取请求指示的待读取数据项的可见版本数据的方式为:确定待读取数据项相对于第一逻辑生命周期的可见版本数据。
检查结果2、本地事务状态列表中包含目标事务的状态信息。
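In this case, the target transaction has already accessed this data node device before the data read request was received, and the state information of the target transaction on this data node device can be updated as follows: the timestamp lower bound T.Bts of the target transaction's logical life cycle is updated to the maximum of the stored lower bound T.Bts and the lower bound rrqm.Bts carried in the data read request (that is, the lower bound of the first logical life cycle), i.e., T.Bts = max(T.Bts, rrqm.Bts); in addition, the timestamp upper bound T.Ets is updated to the minimum of the stored upper bound T.Ets and the upper bound rrqm.Ets carried in the data read request (that is, the upper bound of the first logical life cycle), i.e., T.Ets = min(T.Ets, rrqm.Ets). The logical life cycle formed by the updated lower bound and the updated upper bound is taken as the updated logical life cycle.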
在此种情况下,基于第一逻辑生命周期,确定数据读取请求指示的待读取数据项的可见版本数据的方式为:基于第一逻辑生命周期,确定更新后的逻辑生命周期;确定待读取数据项相对于更新后的逻辑生命周期的可见版本数据。
需要说明的是,确定待读取数据项相对于第一逻辑生命周期的可见版本数据的实现方式与确定待读取数据项相对于更新后的逻辑生命周期的可见版本数据的实现方式类似,在本申请实施例中,以确定待读取数据项相对于第一逻辑生命周期的可见版本数据为例进行说明。
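In a possible implementation, before the visible version of the data item to be read relative to the first logical life cycle is determined, the validity of the first logical life cycle is checked. For example, the check is whether the timestamp lower bound of the first logical life cycle is less than its timestamp upper bound. If the lower bound is not less than the upper bound, the first logical life cycle is invalid; in that case, the transaction state of the target transaction in the local transaction state list is updated from Running to Aborted (rolled back), and the data node device returns to the coordinating node device a data read result carrying an Abort message. The data read result is denoted ReadReplyMessage, rrpm for short. When the data read result carries an Abort message, the IsAbort field in rrpm equals 1, i.e., rrpm.IsAbort = 1.
If the timestamp lower bound of the first logical life cycle is less than its timestamp upper bound, the first logical life cycle is valid, and the operation of determining the visible version of the data item to be read relative to the first logical life cycle is performed.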
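The handling of a read request on the data node (insert or tighten the entry in the local transaction state list, then check validity) can be sketched as below. The `TxnState` class, the dictionary used for the local list, and the function name `on_read_request` are assumptions of this sketch; the max/min update rule and the `Bts < Ets` check come from the description.

```python
from dataclasses import dataclass

@dataclass
class TxnState:
    tid: int
    bts: float               # timestamp lower bound
    ets: float               # timestamp upper bound
    status: str = "Running"

def on_read_request(local_ts, tid, rrqm_bts, rrqm_ets):
    """Update the local transaction state list for a read request.

    Returns (state, is_abort); is_abort mirrors rrpm.IsAbort == 1.
    """
    state = local_ts.get(tid)
    if state is None:
        # first visit: initialise from the values carried by the request
        state = local_ts[tid] = TxnState(tid, rrqm_bts, rrqm_ets)
    else:
        # repeat visit: only ever narrow the life cycle
        state.bts = max(state.bts, rrqm_bts)
        state.ets = min(state.ets, rrqm_ets)
    if not state.bts < state.ets:
        state.status = "Aborted"
        return state, True
    return state, False
```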
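In a possible implementation, the visible version of the data item to be read relative to the first logical life cycle is determined as follows: if the creation timestamp of the latest version of the data item is less than the timestamp upper bound of the first logical life cycle, the latest version is taken as the visible version; otherwise, the previous version of the data item is compared with the timestamp upper bound of the first logical life cycle, and so on, until the first version whose creation timestamp is less than the timestamp upper bound of the first logical life cycle is found, and that version is taken as the visible version.
In other words, when determining the visible version of a data item x to be read relative to a given logical life cycle, the data node device starts from the latest version of x. If the timestamp upper bound T.Ets of the logical life cycle is greater than the creation timestamp Wts of the latest version, the latest version is the visible version relative to that logical life cycle. Otherwise, the latest version is not visible, and earlier versions are examined in turn until the first version x.v satisfying T.Ets > Wts is found; x.v is taken as the visible version relative to that logical life cycle.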
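The visibility rule is a newest-to-oldest scan that stops at the first version with `T.Ets > Wts`. In this sketch the `Version` class and the newest-first list are assumed representations of the version chain; returning `None` when no version qualifies is also an assumption, since the description does not cover that case.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    wts: int        # creation timestamp of this version
    value: object

def visible_version(versions_newest_first, ets):
    """Return the first version (scanning from newest) with ets > wts."""
    for version in versions_newest_first:
        if ets > version.wts:
            return version
    return None     # no version is visible to this life cycle
```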
在一种可能实现方式中,在确定出可见版本数据后,将该可见版本数据x.v存储到该目标事务的读集中。可选地,这里的读集可以是本地读集,也可以是全局读集,在本申请实施例中以该读集为本地读集为例进行说明,能够避免因同步全局读集而带来的通信开销。
在示例性实施例中,一个事务的读集中记录了该事务需要读取的数据项的可见版本数据。需要说明的是,对一个分布式读事务而言,该分布式读事务的读集可以划分为本地读集和全局读集,本地读集存在于数据节点设备上,而全局读集存在于协调节点设备上。当然,协调节点设备可以定期将全局读集同步至各个数据节点设备上,使得数据节点设备上也能够维护事务的全局读集。
步骤2052:基于可见版本数据的创建时间戳和第一逻辑生命周期,确定目标事务的第二逻辑生命周期。
在确定可见版本数据后,数据节点设备基于可见版本数据的创建时间戳和第一逻辑生命周期,确定目标事务的第二逻辑生命周期。
在一些实施例中,当可见版本数据是指相对于第一逻辑生命周期的可见版本数据时,该步骤2052的实现方式为:直接基于可见版本数据的创建时间戳和第一逻辑生命周期,确定目标事务的第二逻辑生命周期。当可见版本数据是指相对于根据第一逻辑生命周期确定的更新后的逻辑生命周期的可见版本数据时,该步骤2052的实现方式为:基于可见版本数据的创建时间戳和根据第一逻辑生命周期确定的更新后的逻辑生命周期,确定目标事务的第二逻辑生命周期。本申请实施例以直接基于可见版本数据的创建时间戳和第一逻辑生命周期,确定目标事务的第二逻辑生命周期为例进行说明。
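In a possible implementation, the second logical life cycle of the target transaction is determined directly from the creation timestamp of the visible version and the first logical life cycle as follows: the timestamp lower bound of the first logical life cycle is adjusted so that it is greater than the creation timestamp of the visible version x.v, i.e., T.Bts > x.v.Wts, to eliminate write-read anomalies; the adjusted logical life cycle is taken as the second logical life cycle.
In another possible implementation, for the case where the visible version is the latest version of the data item to be read, the second logical life cycle is determined as follows: the timestamp lower bound of the first logical life cycle is adjusted so that it is greater than the creation timestamp of the visible version x.v, i.e., T.Bts > x.v.Wts, to eliminate write-read anomalies; if the to-be-written transaction corresponding to the visible version is not empty, the timestamp upper bound of the first logical life cycle is adjusted so that it is less than the timestamp lower bound of the logical life cycle of that to-be-written transaction, i.e., T.Ets < T0.Bts (T0 denotes the to-be-written transaction corresponding to the visible version), to eliminate read-write conflicts; the adjusted logical life cycle is taken as the second logical life cycle.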
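One way to realise the two adjustments is shown below. The description only requires the strict inequalities T.Bts > x.v.Wts and T.Ets < T0.Bts; using `+ 1` and `- 1` to enforce them assumes integer timestamps and mirrors the `+ 1` and `- 1` steps the description uses in the validation phase. The function name and parameters are hypothetical.

```python
def second_lifecycle(bts, ets, visible_wts, is_latest=False, writer_bts=None):
    """Derive the second logical life cycle from the first one.

    visible_wts: creation timestamp of the visible version x.v
    writer_bts:  lower bound of the pending writer T0, or None if WT is empty
    """
    # read after the version was created (removes write-read anomalies)
    bts = max(bts, visible_wts + 1)
    # finish before a validated pending writer starts (removes read-write conflicts)
    if is_latest and writer_bts is not None:
        ets = min(ets, writer_bts - 1)
    return bts, ets
```

The result may have `bts >= ets`; the coordinator detects that when it merges the read reply.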
可见版本数据对应的待写事务WT为正在修改可见版本数据所对应的数据项,且通过验证的事务。示例性地,通过记录待写事务的事务标识来记录待写事务。在一些实施例中,对于可见版本数据为最新版本数据的情况,将目标事务的事务标识添加到可见版本数据的活跃事务集合中;将可见版本数据添加到目标事务的本地读集中。
活跃事务集合(RTlist)用于记录访问过该最新版本数据的活跃事务,也可以称为读事务列表,该活跃事务集合可以是数组的形式,也可以是列表、队列、堆栈等形式,本申请实施例不对活跃事务集合的形式进行具体限定,RTlist中每个元素可以是读取过上述最新版本数据的事务的事务标识(TID)。
步骤2053:将携带第二逻辑生命周期和可见版本数据的结果作为数据读取结果。
在确定第二逻辑生命周期和可见版本数据后,数据节点设备将携带第二逻辑生命周期和可见版本数据的结果作为数据读取结果,然后将携带第二逻辑生命周期和可见版本数据的数据读取结果返回协调节点设备,以使协调节点设备获取第二逻辑生命周期和可见版本数据。 示例性地,数据读取结果表示为ReadReplyMessage(读取反馈消息),简称rrpm。示例性地,携带第二逻辑生命周期和可见版本数据的rrpm中包括Bts、Ets和Value字段,其中,Bts字段和Ets字段分别记录了第二逻辑生命周期的时间戳下界和第二逻辑生命周期的时间戳上界,Value字段记录了可见版本数据的值。
在步骤206中,协调节点设备响应于数据节点设备返回的数据读取结果满足事务验证条件,向数据节点设备发送事务验证请求和本地写集。
在数据节点设备将数据读取结果返回协调节点设备后,协调节点设备判断数据读取结果是否满足事务验证条件,然后在确定数据读取结果满足事务验证条件时,向数据节点设备发送事务验证请求和本地写集,以使数据节点设备对目标事务进行验证。
示例性地,在协调节点设备判断数据读取结果是否满足事务验证条件的过程中,先判断数据读取结果是否携带Abort(回滚)消息,即检查rrpm中的IsAbort字段是否等于1。若数据读取结果携带Abort消息,即rrpm.IsAbort=1,则认为数据读取结果不满足事务验证条件,此时,进入到全局回滚阶段。
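If the data read result does not carry an Abort message, the logical life cycle of the target transaction in the first transaction state list is updated as follows: the maximum of the timestamp lower bounds of the first and second logical life cycles is taken as the timestamp lower bound of the third logical life cycle of the target transaction, and the minimum of the timestamp upper bounds of the first and second logical life cycles is taken as the timestamp upper bound of the third logical life cycle. That is, T.Bts = max(T.Bts, rrpm.Bts) and T.Ets = min(T.Ets, rrpm.Ets), where the T.Bts and T.Ets inside the parentheses are the lower and upper bounds of the logical life cycle before the update (the first logical life cycle), and rrpm.Bts and rrpm.Ets are the lower and upper bounds of the second logical life cycle carried in the data read result.
After the logical life cycle of the target transaction in the first transaction state list is updated, the coordinating node device checks whether T.Bts in the first transaction state list is less than T.Ets, that is, whether the timestamp lower bound of the third logical life cycle is less than its timestamp upper bound, to determine whether the third logical life cycle is valid. If the lower bound is not less than the upper bound, the third logical life cycle is invalid; the data read result is considered not to satisfy the transaction verification condition, and the global rollback phase is entered. If the lower bound is less than the upper bound, the third logical life cycle is valid; the data read result is considered to satisfy the transaction verification condition, and a transaction verification request carrying the third logical life cycle is sent to the data node device.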
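The coordinator's handling of one read reply can be sketched as follows. The reply is modelled as a dictionary with the field names `IsAbort`, `Bts`, and `Ets` used in the description; the function name and the tuple return value are assumptions.

```python
def merge_read_reply(t_bts, t_ets, rrpm):
    """Merge a ReadReplyMessage into the coordinator's view of the life cycle.

    Returns (bts, ets, ok). ok is False when the reply carries an Abort
    message or the merged (third) life cycle is invalid, i.e. the
    transaction must enter global rollback.
    """
    if rrpm.get("IsAbort") == 1:
        return t_bts, t_ets, False
    bts = max(t_bts, rrpm["Bts"])
    ets = min(t_ets, rrpm["Ets"])
    return bts, ets, bts < ets
```

With several data node devices, applying this function to each reply in turn yields the third logical life cycle as the intersection of all replies with the first life cycle.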
在示例性实施例中,如果协调节点设备决定回滚目标事务,需要将第一事务状态列表中目标事务的全局事务状态修改为Gaborting(全局正在回滚),通知相关子节点(即数据节点设备)进行局部回滚。
在示例性实施例中,在发送事务验证请求之前,协调节点设备将第一事务状态列表中目标事务的全局事务状态修改为Gvalidating(全局正在验证)。示例性地,事务验证请求表示为ValidateRequestMessage(验证请求消息),简称vrm。示例性地,vrm中包括Bts和Ets字段。其中,Bts字段和Ets字段分别记录了目标事务在第一事务状态列表中最新的逻辑生命周期的时间戳下界和时间戳上界,即第三逻辑生命周期的时间戳下界和时间戳上界。
在一些实施例中,数据节点设备的数量为多个,在此种情况下,每个数据节点设备均返回一个数据读取结果,数据读取结果满足事务验证条件是指各个数据节点设备返回的各个数据读取结果均满足事务验证条件。此种情况下,第三逻辑生命周期为通过综合考虑各个数据读取结果确定的逻辑生命周期。
在一些实施例中,在读取完全部所需数据且将更新写到本地内存中后,认为满足事务验证条件。也就是说,协调节点设备响应于第三逻辑生命周期有效,且目标事务的全局写集存储到本地内存中,向数据节点设备发送事务验证请求。目标事务的全局写集由终端生成并传输到协调节点设备,或者由协调节点设备自行生成,本申请实施例对此不加以限定。
一个事务的写集中记录了该事务需要更新的数据项,与读集结构类似,同样可以使用内存链表结构来维护事务的写集。需要说明的是,对一个分布式写事务而言,该分布式写事务 的写集可以划分为本地写集和全局写集,本地写集存在于数据节点设备上,而全局写集存在于协调节点设备上。当然,协调节点设备可以定期将全局写集同步至各个数据节点设备上,使得数据节点设备上也能够维护事务的全局写集。
在将目标事务的全局写集写到协调节点设备的本地内存后,协调节点设备能够基于全局写集确定数据节点设备的本地写集,以将事务验证请求和本地写集一同发送给数据节点设备。数据节点设备的本地写集是指目标事务的全局写集中需要由数据节点设备负责写入的写集。
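In the read phase of the target transaction, communication occurs mainly between the coordinating node device and the relevant data node devices. Each successful read requires two communications: the coordinating node device sends a data read request to the relevant data node device, and that data node device returns a data read result to the coordinating node device. Therefore, in the data read phase, assuming n (n is an integer greater than 1) is the number of remote reads, at most 2n communications are needed, and the maximum communication volume can be expressed as n × (data read request message size + data read result message size). In an exemplary embodiment, when the target transaction needs to read multiple data items from one relevant data node device, the read requests for those data items are packaged and sent together so that the data is read in a batch, which reduces the number of communications and improves read efficiency.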
在步骤207中,数据节点设备基于协调节点设备发送的事务验证请求和本地写集,获取目标事务的验证结果,将目标事务的验证结果返回协调节点设备。
数据节点设备接收到协调节点设备发送的事务验证请求和本地写集后,验证目标事务的合法性,以获取目标事务的验证结果。此阶段为事务提交前的事务合法性验证阶段。
数据节点设备的验证过程为本地验证过程,数据节点设备基于事务验证请求和本地写集,获取目标事务的验证结果的过程为数据节点设备执行本地验证操作的过程。在一种可能实现方式中,事务验证请求携带第三逻辑生命周期,第三逻辑生命周期为由协调节点设备基于第一逻辑生命周期和第二逻辑生命周期确定的有效逻辑生命周期。第三逻辑生命周期为协调节点设备发送事务验证请求之前维护的目标事务的最新逻辑生命周期。
在一种可能实现方式中,在数据节点设备基于事务验证请求和本地写集,获取目标事务的验证结果的过程中,数据节点设备先更新本地事务状态列表中目标事务T的状态信息,更新方式为:T.Bts=max(T.Bts,vrm.Bts)、T.Ets=min(T.Ets,vrm.Ets),其中,括号中的vrm.Bts和vrm.Ets分别为事务验证请求携带的第三逻辑生命周期的时间戳下界和时间戳上界。在本申请实施例中,在接收事务验证请求之前,数据节点设备的本地事务状态列表中维护的目标事务的逻辑生命周期为第二逻辑生命周期,为便于区分,将在接收事务验证请求之后且对目标事务的状态信息进行更新后,本地事务状态列表维护的目标事务的逻辑生命周期称为第四逻辑生命周期。
也就是说,数据节点设备将第三逻辑生命周期的时间戳下界和第二逻辑生命周期的时间戳下界中的最大值作为目标事务的第四逻辑生命周期的下界;将第三逻辑生命周期的时间戳上界和第二逻辑生命周期的时间戳上界中的最小值作为目标事务的第四逻辑生命周期的时间戳上界。由此,得到第四逻辑生命周期。需要说明的是,此处更新的是数据节点设备的本地事务状态列表中维护的目标事务的逻辑生命周期,此种更新能够用于事务并发访问控制,即用于保证事务一致性。
在示例性实施例中,对于可串行化隔离级别,在确定第四逻辑生命周期后,通过检查第四逻辑生命周期的时间戳下界是否小于第四逻辑生命周期的时间戳上界,来验证第四逻辑生命周期是否有效。
响应于第四逻辑生命周期的时间戳下界不小于第四逻辑生命周期的时间戳上界,说明第四逻辑生命周期无效,此时,目标事务的本地验证不通过,数据节点设备向协调节点设备返回携带Abort消息的验证结果。该Abort消息用于引发全局回滚。向协调节点设备返回目标事务的验证结果的过程可看作向协调节点设备发送本地验证反馈消息lvm的过程,对于目标事务的验证结果为携带Abort消息的验证结果的情况,本地验证反馈消息lvm中的IsAbort字段等于1,即lvm.IsAbort=1。
响应于第四逻辑生命周期的时间戳下界小于第四逻辑生命周期的时间戳上界,说明第四 逻辑生命周期有效,此种情况下,基于本地写集对应的各个待写入数据项的读事务相关信息和第四逻辑生命周期,确定目标事务的第五逻辑生命周期。第五逻辑生命周期是指在对本地写集中的各个待写入数据项进行读写冲突验证的过程中更新得到的逻辑生命周期。
在一种可能实现方式中,一个待写入数据项的读事务相关信息包括该一个待写入数据项的最大读事务时间戳和该一个待写入数据项的目标读事务的结束时间戳中的至少一项。其中,一个待写入数据项的最大读事务时间戳(记为Rts)用于指示读取过该一个待写入数据项的各个读事务的逻辑提交时间戳中的最大值,一个待写入数据项的目标读事务为该一个待写入数据项对应的本地验证通过或者处于提交阶段的读事务,目标读事务的结束时间戳为目标读事务的逻辑生命周期的时间戳上界。
在示例性实施例中,一个待写入数据项的目标读事务为该一个待写入数据项对应的活跃事务集合中本地验证通过或者处于提交阶段的读事务。通过检测该一个待写入数据项对应的活跃事务集合中各读事务的事务状态,即可确定该一个待写入数据项的目标读事务。
在一种可能实现方式中,在一个待写入数据项的读事务相关信息的三种不同情况下,基于本地写集对应的各个待写入数据项的读事务相关信息和第四逻辑生命周期,确定目标事务的第五逻辑生命周期的过程也有所不同。
情况1、一个待写入数据项的读事务相关信息包括该一个待写入数据项的最大读事务时间戳。
在此种情况1下,基于本地写集对应的各个待写入数据项的读事务相关信息和第四逻辑生命周期,确定目标事务的第五逻辑生命周期的过程为:基于各个待写入数据项的最大读事务时间戳和第四逻辑生命周期,确定目标事务的第五逻辑生命周期,其中,第五逻辑生命周期的时间戳下界大于各个待写入数据项的最大读事务时间戳中的最大值。
第四逻辑生命周期为在确定第五逻辑生命周期之前,数据节点设备的本地事务状态列表中维护的目标事务的最新逻辑生命周期。在一种可能实现方式中,基于各个待写入数据项的最大读事务时间戳和第四逻辑生命周期,确定目标事务的第五逻辑生命周期的方式为:基于各个待写入数据项的最大读事务时间戳对第四逻辑生命周期的时间戳下界进行调整,将调整后得到的逻辑生命周期作为第五逻辑生命周期。
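For example, the timestamp lower bound of the fourth logical life cycle is adjusted based on the maximum read transaction timestamps of the data items to be written as follows: the adjusted lower bound is T.Bts = max(T.Bts, y.Rts + 1), where the T.Bts inside the parentheses is the timestamp lower bound of the fourth logical life cycle, y.Rts is the largest of the maximum read transaction timestamps of the data items to be written, and the value 1 ensures that the timestamp lower bound of the resulting fifth logical life cycle is greater than that largest value.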
在一些实施例中,数据节点设备在接收到本地写集后,先检测本地写集对应的各个待写入数据项的待写事务WT是否为空,若某一该待写入数据项的待写事务WT不为空,说明有其他事务正在修改该待写入数据项,且该事务已经进入了验证阶段,此时,需要回滚目标事务以消除写写冲突,即向协调节点设备返回携带Abort消息的验证结果。若各个待写入数据项的待写事务WT均为空,则将目标事务的事务标识赋值给各个待写入数据项的待写事务WT,以表示进入验证阶段的目标事务需要修改各个待写入数据项。在实现上,使用无锁的CAS(Compare and Swap,比较与交换)技术为待写入数据项y的待写事务WT进行赋值,以提高性能;或者,先对待写入数据项y的待写事务WT加锁,防止其他并发事务并发修改y,然后对加锁后的待写事务WT赋值。示例性地,在待写入数据项y上施加建议性锁,该建议性锁用于指示互斥对待写入数据项y的待写事务WT的修改操作。
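The write-write check on the to-be-written transaction WT can be sketched as a claim-or-fail operation. Python's standard library has no user-level compare-and-swap, so this sketch uses a per-item `threading.Lock` to stand in for the CAS or advisory lock mentioned in the description; the class `DataItem`, the function `claim_write_set`, and the release-on-failure behaviour are assumptions.

```python
import threading

class DataItem:
    def __init__(self):
        self.wt = None                    # id of the validated pending writer
        self._lock = threading.Lock()     # stands in for CAS / advisory lock

    def try_claim(self, tid):
        """Set WT to tid only if WT is currently empty."""
        with self._lock:
            if self.wt is None:
                self.wt = tid
                return True
            return False

    def release(self, tid):
        with self._lock:
            if self.wt == tid:
                self.wt = None

def claim_write_set(items, tid):
    """Claim WT on every item; on any conflict undo and report failure."""
    claimed = []
    for item in items:
        if item.try_claim(tid):
            claimed.append(item)
        else:
            for done in claimed:
                done.release(tid)
            return False                  # caller replies with an Abort message
    return True
```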
情况2、一个待写入数据项的读事务相关信息包括该一个待写入数据项的目标读事务的结束时间戳。
在此种情况2下,基于本地写集对应的各个待写入数据项的读事务相关信息和第四逻辑生命周期,确定目标事务的第五逻辑生命周期的过程为:基于各个待写入数据项的目标读事务的结束时间戳和第四逻辑生命周期,确定目标事务的第五逻辑生命周期,其中,第五逻辑 生命周期的时间戳下界大于各个待写入数据项的目标读事务的结束时间戳中的最大值。
在一种可能实现方式中,基于各个待写入数据项的目标读事务的结束时间戳和第四逻辑生命周期,确定目标事务的第五逻辑生命周期的方式为:基于各个待写入数据项的目标读事务的结束时间戳对第四逻辑生命周期的时间戳下界进行调整,将调整后得到的逻辑生命周期作为第五逻辑生命周期。示例性地,基于各个待写入数据项的目标读事务的结束时间戳对第四逻辑生命周期的时间戳下界进行调整的方式为:使调整后的时间戳下界T.Bts=max(T.Bts,T1.Ets+1),其中,括号内的T.Bts表示第四逻辑生命周期的时间戳下界,T1.Ets表示各个待写入数据项的目标读事务的结束时间戳中的最大值,数值1用于保证得到的第五逻辑生命周期的时间戳下界大于各个待写入数据项的目标读事务的结束时间戳中的最大值。
需要说明的是,一个待写入数据项的目标读事务的数量可能为一个或多个,对于一个待写入数据项的目标读事务的数量为多个的情况,上述T1.Ets是指全部待写入数据项的全部目标读事务的结束时间戳中的最大值。
基于此种方式能够使目标事务的写操作的发生推迟到目标读事务的读操作之后,以避免读写冲突。
情况3、一个待写入数据项的读事务相关信息包括该一个待写入数据项的最大读事务时间戳和该一个待写入数据项的目标读事务的结束时间戳。
在此种情况3下,基于本地写集对应的各个待写入数据项的读事务相关信息和第四逻辑生命周期,确定目标事务的第五逻辑生命周期的过程为:基于各个待写入数据项的最大读事务时间戳以及各个待写入数据项的目标读事务的结束时间戳对第四逻辑生命周期进行连续的两次调整,将两次调整后得到的逻辑生命周期作为目标事务的第五逻辑生命周期。本申请实施例对两次调整的先后顺序不加以限定,示例性地,先基于各个待写入数据项的最大读事务时间戳对第四逻辑生命周期进行调整,然后基于各个待写入数据项的目标读事务的结束时间戳对一次调整后得到的逻辑生命周期进行调整。当然,在一些实施例中,还可以先基于各个待写入数据项的目标读事务的结束时间戳对第四逻辑生命周期进行调整,然后基于各个待写入数据项的最大读事务时间戳对一次调整后得到的逻辑生命周期进行调整。
示例性地,对于将两次调整后得到的逻辑生命周期作为第五逻辑生命周期的情况,在一次调整后得到逻辑生命周期后,若为可串行化隔离级别,先验证得到的逻辑生命周期的时间戳下界是否小于时间戳上界,若是,则继续进行下一次调整;若否,则直接认为本地验证失败,向协调节点设备返回携带Abort消息的验证结果。
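After the fifth logical life cycle is obtained, whether it is valid is determined by checking whether its timestamp lower bound is less than its timestamp upper bound. If the fifth logical life cycle is valid, a verification result indicating that verification passed is taken as the verification result of the target transaction; if it is invalid, a verification result indicating that verification failed is taken as the verification result of the target transaction. When the verification result indicates that verification passed, the local verification feedback message lvm of the data node device records the timestamp lower bound Bts and the timestamp upper bound Ets of the latest logical life cycle of the target transaction on the data node device (that is, the fifth logical life cycle). For example, a verification result indicating that verification failed is a verification result carrying an Abort message.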
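The local validation for case 3 (both the maximum read transaction timestamps and the end timestamps of target read transactions are available) can be sketched as below, with the early exit after the first adjustment that the description gives for the serializable level. The function name, the argument layout, and the tuple result are assumptions; integer timestamps are assumed for the `+ 1` steps.

```python
def local_validate(bts, ets, max_rts=(), reader_ets=()):
    """Compute the fifth logical life cycle from the fourth one.

    max_rts:    Rts of every data item in the local write set
    reader_ets: Ets of every validated or committing reader of those items
    Returns (ok, bts, ets); ok == False corresponds to lvm.IsAbort == 1.
    """
    if not bts < ets:                       # fourth life cycle already invalid
        return False, bts, ets
    if max_rts:
        bts = max(bts, max(max_rts) + 1)    # write after every committed read
        if not bts < ets:
            return False, bts, ets
    if reader_ets:
        bts = max(bts, max(reader_ets) + 1) # write after validated readers end
    return bts < ets, bts, ets
```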
在示例性实施例中,当确定第五逻辑生命周期有效时,认为目标事务的本地验证通过,数据节点设备更新本地事务状态列表中目标事务的状态信息,将目标事务的事务状态更新为Validated(验证通过),即,使T.Status=Validated。在示例性实施例中,在确定目标事务的本地验证通过后,数据节点设备根据待写入数据项的更新值,创建待写入数据项的新版本数据。在示例性实施例中,为创建的新版本数据设置用于指示该新版本数据并未全局提交的第一标记。具有第一标记的新版本数据对外不可见。
需要说明的是,如果目标事务在数据节点设备中的本地验证不通过,则需要更新数据节点设备的本地事务状态列表中目标事务的事务状态为Aborted(回滚),即,使T.Status=Aborted。
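In a possible implementation, the active transaction set of a data item to be written includes running read transactions in addition to target read transactions. The logical life cycles of the running read transactions need to be adjusted according to the fifth logical life cycle of the target transaction so that they cannot read the data newly written by the target transaction, which avoids read-write conflicts and ensures correct execution. For example, a running read transaction is a transaction in the active transaction set whose state is Running. Its logical life cycle is adjusted so that its timestamp upper bound is less than the timestamp lower bound of the fifth logical life cycle of the target transaction. Assuming the running read transaction is T2, the update is T2.Ets = min(T2.Ets, T.Bts - 1). If, after the update, the timestamp lower bound of a running read transaction is not less than its timestamp upper bound, that read transaction is notified that it should roll back.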
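The adjustment of running readers can be sketched as follows. Readers are modelled as dictionaries with `tid`, `bts`, `ets`, and `status` keys, which is an assumed representation; the update rule T2.Ets = min(T2.Ets, T.Bts - 1) is from the description.

```python
def push_back_readers(readers, writer_bts):
    """Cap the upper bound of every running reader below the writer's Bts.

    Returns the ids of readers whose life cycle became invalid and that
    therefore need to be told to roll back.
    """
    to_roll_back = []
    for reader in readers:
        if reader["status"] != "Running":
            continue                      # validated/committing readers are untouched
        reader["ets"] = min(reader["ets"], writer_bts - 1)
        if not reader["bts"] < reader["ets"]:
            to_roll_back.append(reader["tid"])
    return to_roll_back
```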
从上述事务验证阶段可以看出,在目标事务的验证过程中,通信主要在协调节点设备和相关的数据节点设备之间发生,通信主要包含以下两步:协调节点设备向每个相关的数据节点设备发送事务验证请求及本地写集、相关的数据节点设备反馈验证结果给协调节点设备。因此,在目标事务的验证阶段,假设m(m为不小于1的整数)为与目标事务T相关的数据节点设备的数量,那么最多需要进行2m次通信,最大通信量可以表示为m×(事务验证请求消息大小+验证结果消息大小)+全局写集大小。
在步骤208中,协调节点设备基于数据节点设备返回的目标事务的验证结果,确定目标事务的处理指令,向数据节点设备发送处理指令,处理指令为提交指令或者回滚指令。
协调节点设备在接收到数据节点设备返回的验证结果后,根据接收到的验证结果来判断目标事务能否通过全局验证,进而确定目标事务的处理指令,向数据节点设备发送处理指令。其中,处理指令为提交指令或回滚指令。
在一种可能实现方式中,数据节点设备的数量为一个或多个,当数据节点设备的数量为多个时,每个数据节点设备均返回一个验证结果。
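When there are at least two data node devices, the processing instruction of the target transaction is determined from the verification results they return as follows: if any of the at least two verification results indicates that verification failed, the rollback instruction is taken as the processing instruction of the target transaction. If all of the at least two verification results indicate that verification passed, the intersection of the logical life cycles carried in those verification results is taken as the target logical life cycle; if the target logical life cycle is valid, the commit instruction is taken as the processing instruction of the target transaction; if it is invalid, the rollback instruction is taken as the processing instruction.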
在示例性实施例中,用于指示验证不通过的验证结果为携带Abort消息的验证结果,若某一验证结果不携带Abort消息,而是携带逻辑生命周期(即步骤207中确定的第五逻辑生命周期),则该验证结果指示验证通过。也就是说,在协调节点设备根据接收到的验证结果来判断目标事务能否通过全局验证的过程中,若接收到的验证结果中存在至少一个携带Abort消息的验证结果,即IsAbort字段等于1的lvm,表明目标事务没有通过全部的本地验证,目标事务的全局验证不通过,目标事务需要进行全局回滚。此种情况下,将回滚指令作为目标事务的处理指令。协调节点设备将第一事务状态列表中目标事务的全局事务状态更新为Gaborting(全局正在回滚)。协调节点设备向数据节点设备发送回滚指令,以通知数据节点设备进行本地回滚。示例性地,处理指令通过写入提交/回滚消息coarm发送,当处理指令为回滚指令时,coarm中的IsAbort字段等于1,即coarm.IsAbort=1。
若接收到的验证结果中不存在携带Abort消息的验证结果,或者接收到的验证结果均携带逻辑生命周期,则说明目标事务通过全部的本地验证。此种情况下,协调节点设备计算接收到的各个验证结果携带的逻辑生命周期的交集,得到目标逻辑生命周期。若目标逻辑生命周期的时间戳下界不小于目标逻辑生命周期的时间戳上界,则说明目标逻辑生命周期无效,确定目标事务的全局验证不通过,目标事务需要进行全局回滚,协调节点设备将回滚指令作为目标事务的处理指令。此外,协调节点设备还将第一事务状态列表中目标事务的全局事务状态更新为Gaborting(全局正在回滚),协调节点设备向数据节点设备发送回滚指令,以通知数据节点设备进行本地回滚。
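If the timestamp lower bound of the target logical life cycle is less than its timestamp upper bound, the target logical life cycle is valid and the global verification of the target transaction passes. The coordinating node device then randomly selects a timestamp from the target logical life cycle and assigns it as the logical commit timestamp Cts of the target transaction; for example, it selects the timestamp lower bound of the target logical life cycle as the logical commit timestamp.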
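Global validation on the coordinator amounts to intersecting the life cycles reported by the data nodes. In this sketch each local verification message lvm is a dictionary with the fields `IsAbort`, `Bts`, and `Ets`; the function name, the string return values, and the requirement of at least one result are assumptions. Choosing the lower bound as Cts is the example given in the description.

```python
def global_decision(lvms):
    """Decide commit or rollback from the local verification results.

    Returns ("rollback", None) or ("commit", cts). Expects at least one lvm.
    """
    if any(lvm.get("IsAbort") == 1 for lvm in lvms):
        return "rollback", None
    bts = max(lvm["Bts"] for lvm in lvms)   # intersection lower bound
    ets = min(lvm["Ets"] for lvm in lvms)   # intersection upper bound
    if not bts < ets:
        return "rollback", None             # empty intersection
    return "commit", bts                    # use the lower bound as Cts
```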
在确定逻辑提交时间戳后,协调节点设备将第一事务状态列表中目标事务T的逻辑生命周期的时间戳下界以及逻辑生命周期的时间戳上界均更新为逻辑提交时间戳,即,使T.Bts=T.Ets=T.Cts。此外,将第一事务状态列表中目标事务的全局事务状态更新为Gcommitted(全局验证通过),请求全局时间戳生成集群为目标事务分配全局提交时间戳,记录到第一事务状态列表中目标事务的全局提交时间戳Gts字段中。此外,协调节点设备将提交指令作为目标事务的处理指令,向数据节点设备发送提交指令,以通知数据节点设备对目标事务进行提交。示例性地,对于处理指令通过写入提交/回滚消息coarm发送的情况,当处理指令为提交指令时,coarm中的IsAbort字段等于0,即coarm.IsAbort=0,coarm中的Cts和Gts字段中分别记录了目标事务的逻辑提交时间戳和目标事务的全局提交时间戳。
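In step 209, in response to receiving the processing instruction for the target transaction sent by the coordinator node device, the data node device executes the processing instruction, which is a commit instruction or a rollback instruction.
After receiving the processing instruction, the data node device executes it. This stage is the wrap-up phase of the transaction's commit or rollback operation.
When the processing instruction is a commit instruction, the target transaction has passed global validation and enters the commit phase: the target transaction's updates are persisted to the database and some follow-up cleanup is performed. In an exemplary embodiment, after receiving the commit instruction sent by the coordinator node device, the data node device can perform operations A to E below.
Operation A: for each data item x in the local read set of the target transaction, modify the maximum read transaction timestamp Rts of x so that it is greater than or equal to the logical commit timestamp Cts of the target transaction, that is, x.Rts=max(x.Rts,T.Cts); delete the transaction identifier TID of the target transaction from the active transaction list RTlist of x.
Operation B: for each data item y in the local write set of the target transaction: a) change the creation timestamp Wts of y to the logical commit timestamp T.Cts of the target transaction; b) update the maximum read transaction timestamp of y to the larger of its original value and the logical commit timestamp, that is, y.Rts=max(y.Rts,T.Cts); c) persist y to the database and change its flag from the first flag to the second flag, where the second flag indicates that the item is externally visible; d) clear the active transaction list RTlist of y; e) clear the to-be-written transaction WT of y.
Operation C: clear the local read set and the local write set of the target transaction.
Operation D: in the local transaction status list, update both the timestamp lower bound and the timestamp upper bound of the target transaction's logical life cycle to the logical commit timestamp, that is, T.Bts=T.Ets=T.Cts, and update the transaction status of the target transaction to Committed (commit completed). Note that the local transaction status list is used here to ensure transaction consistency and does not require synchronization of the global transaction status.
Operation E: return a commit-success ACK (Acknowledge Character) to the coordinator node device.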
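Operations A to E can be summarized in a short Python sketch. The DataItem and LocalTxn types are hypothetical in-memory stand-ins, the initial status string is an assumption, and persistence to the database is reduced to flipping a visibility flag; the sketch only mirrors the field updates listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DataItem:
    wts: int = 0            # creation timestamp Wts
    rts: int = 0            # maximum read transaction timestamp Rts
    visible: bool = False   # False ~ first flag, True ~ second flag
    rtlist: Set[str] = field(default_factory=set)  # active transaction list
    wt: Optional[str] = None                       # to-be-written transaction


@dataclass
class LocalTxn:
    tid: str
    read_set: Dict[str, DataItem]
    write_set: Dict[str, DataItem]
    bts: int = 0
    ets: int = 0
    status: str = "Validated"  # assumed name for the pre-commit status


def apply_commit(txn: LocalTxn, cts: int) -> str:
    for x in txn.read_set.values():      # Operation A
        x.rts = max(x.rts, cts)
        x.rtlist.discard(txn.tid)
    for y in txn.write_set.values():     # Operation B
        y.wts = cts
        y.rts = max(y.rts, cts)
        y.visible = True                 # persisting to storage is omitted
        y.rtlist.clear()
        y.wt = None
    txn.read_set.clear()                 # Operation C
    txn.write_set.clear()
    txn.bts = txn.ets = cts              # Operation D
    txn.status = "Committed"
    return "ACK"                         # Operation E
```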
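After receiving commit-success ACKs from all data node devices, the coordinator node device changes the global transaction status of the target transaction in the first transaction status list to Gcommitted (global commit completed), and then sends a status information cleanup instruction to the data node devices so that they delete the target transaction's status information from their local transaction status lists.
When the processing instruction is a rollback instruction, the target transaction has failed global validation and enters the global rollback phase: the target transaction is rolled back and the corresponding cleanup is performed. For example, the cleanup includes: deleting the transaction identifier TID of the target transaction from the active transaction list RTlist of each data item x in the local read set of the target transaction; removing the newly created data for each data item y in the local write set of the target transaction and clearing the to-be-written transaction WT of y; clearing the local read set and the local write set of the target transaction; updating the transaction status of the target transaction in the local transaction status list to Aborted (rollback completed); and returning a rollback-complete ACK to the coordinator node device.
After receiving rollback-complete ACKs from all data node devices, the coordinator node device changes the global transaction status of the target transaction in the first transaction status list to Gaborted (global rollback completed), and then sends a status information cleanup instruction to the data node devices so that they delete the target transaction's status information from their local transaction status lists. In a possible implementation, the coordinator node device sends status information cleanup instructions to the data node devices in batches to reduce the number of communications.
As described above, in the commit/rollback phase of the target transaction, communication occurs mainly between the coordinator node device and the related data node devices and consists of two steps: the coordinator node device sends a commit/rollback instruction to each related data node device, and each related data node device sends the corresponding commit/rollback completion message (ACK) to the coordinator node device. The commit/rollback phase therefore requires at most 2m communications, and the communication volume is m×(size of the commit/rollback instruction message + size of the commit/rollback completion message), where m (an integer not less than 1) is the number of data node devices related to target transaction T.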
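The communication bounds for the validation phase and the commit/rollback phase follow the same pattern and can be checked with a small helper. The function name and the byte sizes in the usage note are assumptions for illustration; extra_payload stands for the global write set term, which appears only in the validation phase formula.

```python
from typing import Tuple


def phase_cost(m: int, request_size: int, reply_size: int,
               extra_payload: int = 0) -> Tuple[int, int]:
    """Return (max number of messages, max communication volume) for one phase.

    m is the number of data node devices related to the target transaction.
    """
    if m < 1:
        raise ValueError("m must be an integer not less than 1")
    # One request and one reply per related data node device.
    messages = 2 * m
    volume = m * (request_size + reply_size) + extra_payload
    return messages, volume
```

With m=3, a 100-byte instruction, and a 20-byte ACK, the commit/rollback phase needs at most 6 messages and 360 bytes.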
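Note that the embodiments of this application are described using a target transaction that involves both read and write operations as an example, but are not limited to this. A target transaction that involves only read operations or only write operations can still be processed by the transaction processing method provided in the embodiments of this application; the details are not repeated here.
Processing transactions through steps 201 to 209 achieves decentralized transaction processing and resolves the data anomalies caused by conflicting operations between concurrent transactions. In terms of principle, the transaction processing method provided in the embodiments of this application mainly applies the OCC (Optimistic Concurrency Control) algorithm framework combined with the DTA (Dynamic Timestamp Allocation) algorithm, which reduces the transaction data transmitted over the network, improves the validation efficiency of distributed transactions, and increases their concurrent processing capacity. It also uses MVCC (Multi-Version Concurrency Control) to read and write data without locks, which increases the concurrent processing capacity of individual node devices. The DTA algorithm is a TO (Timestamp Ordering) algorithm in which the timestamp lower bound and timestamp upper bound of a transaction's logical life cycle can be adjusted dynamically.
The method provided in the embodiments of this application is independent of the data storage format. The distributed database system in the embodiments supports both the key-value data storage format (KV data storage format), such as the format used by the HBase database system, and the segment-page data storage format, such as the formats used by the PostgreSQL and MySQL/InnoDB database systems.
In an exemplary embodiment, for the segment-page data storage format, a data buffer is established in the node device to buffer data transferred from the shared storage system, which speeds up subsequent access to that data; the buffer format is kept consistent with the underlying data storage format. Data transferred from the shared storage system is buffered in the local data buffer and is not cleared when the transaction ends; it remains until the local data buffer is full, dirty data needs to be flushed back to the shared storage system, or the buffered data becomes invalid (for example, because the same data was modified on another node device).
Before a transaction commits, each node device flushes the transaction log (for example, a WAL log) to the shared storage system, and the transaction log requests an LSN value from the shared storage system; this value is globally unique and increasing. Transaction logs generated during transaction processing have different formats under different data storage formats. For example, when the data storage format is the KV data storage format, the format of the transaction log is as shown in FIG. 3.
The regions (Region) into which a large table maintained by the database system is divided share one log file. Entries of a single region are stored in the log in chronological order, but entries of multiple regions may not be strictly in chronological order. Each minimum log unit consists of two parts: a log key (HLogKey) and a log edit (WALEdit). HLogKey consists of a sequence number (sequenceid), a timestamp (timestamp), cluster IDs (cluster ids), a region name (region name), a table name (table name), and so on. WALEdit consists of a series of key-value pairs (Key Value); the update operations on all columns of a row (that is, all Key Values) are contained in the same WALEdit object, mainly to make writes to multiple columns of one row atomic. sequenceid is a storage-level auto-incrementing sequence number on which region data recovery and log expiry cleanup both depend; for example, sequenceid is the LSN value of the transaction log.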
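The KV log unit described above can be modeled with a few Python dataclasses. This is a sketch of the structure only: the Python field names, the LogEntry wrapper, and the entries_for_region helper are hypothetical, and it is not the HBase API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HLogKey:
    sequence_id: int              # storage-level increasing number (the LSN)
    timestamp: int
    cluster_ids: Tuple[str, ...]
    region_name: str
    table_name: str


@dataclass
class WALEdit:
    # All column updates of one row travel together so the row write is atomic.
    key_values: List[Tuple[str, bytes]]


@dataclass
class LogEntry:
    key: HLogKey
    edit: WALEdit


def entries_for_region(log: List[LogEntry], region: str) -> List[LogEntry]:
    """Pick one region's entries out of the shared log, ordered by sequence id."""
    mine = (e for e in log if e.key.region_name == region)
    return sorted(mine, key=lambda e: e.key.sequence_id)
```

Because several regions interleave in one file, recovery of a single region filters by region name and replays in sequence id order, which is why the text says recovery depends on sequenceid.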
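For example, when the data storage format is the segment-page data storage format, the format of the transaction log is as shown in FIG. 4. The Regions share one log file; entries of a single Region are stored in the log in chronological order, while entries of multiple Regions may not be strictly in chronological order. Each minimum log unit no longer consists of HLogKey and WALEdit, but of one log record (XLog Record).
An XLog Record has two parts. The first part is the header, which has a fixed size (for example, 24 bytes) and corresponds to the XLogRecord structure; the second part is the log record data (XLog Record data).
By stored content, XLog Records fall into the following three types.
Type 1: Record for backup block: stores the block of a full-write-page; this type of record addresses the partial page write problem. The first time a data page is modified after a checkpoint completes, the whole page is written when the change is recorded in the transaction log file (this requires the corresponding initialization parameter to be set; it is on by default).
Type 2: Record for tuple data block: stores tuple changes within a page.
Type 3: Record for Checkpoint: when a checkpoint occurs, checkpoint information (including the Redo point) is recorded in the transaction log file.
XLog Record data is where the actual data is stored and consists of the following four parts.
Part 1: 0 to N XLogRecordBlockHeader structures (log record block headers), each corresponding to one block data. If the BKPBLOCK_HAS_IMAGE flag is set, the XLogRecordBlockHeader structure is followed by an XLogRecordBlockImageHeader structure; if the BKPBLOCK_HAS_HOLE&BKPIMAGE_IS_COMPRESSED flags are set, the XLogRecordBlockHeader structure is followed by an XLogRecordBlockCompressHeader structure; if the BKPBLOCK_SAME_REL flag is not set, the XLogRecordBlockHeader structure is followed by a RelFileNode. For example, the XLogRecordBlockHeader structure can also be followed by a BlockNumber (block number).
Part 2: XLogRecordDataHeader[Short|Long] (log record data header, short or long form): if the data size is less than 256 bytes, the Short format is used; otherwise the Long format is used.
Part 3: block data: full-write-page data and tuple data. For full-write-page data, if compression is enabled, the data is stored compressed, and the metadata of the page after compression is stored in XLogRecordBlockCompressHeader (log record block compression header).
Part 4: main data: records log data such as checkpoints.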
For example, an XLog Record is laid out as follows:
Header (fixed-size XLogRecord structure)
XLogRecordBlockHeader structure
XLogRecordBlockHeader structure
...
XLogRecordDataHeader[Short|Long] structure
block data
block data
...
main data
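The ordering in this listing and the Short/Long rule for the data header can be expressed as a small Python sketch. It follows the listing as given here, with one block header and one block data section per block; the function names are hypothetical, and optional sub-headers such as XLogRecordBlockImageHeader are left out.

```python
from typing import List


def data_header_kind(data_len: int) -> str:
    """Short form when the data size is under 256 bytes, Long form otherwise."""
    return "Short" if data_len < 256 else "Long"


def record_layout(num_blocks: int, data_len: int) -> List[str]:
    """Return the section order of one XLog Record as a list of labels."""
    layout = ["XLogRecord"]                            # fixed-size header
    layout += ["XLogRecordBlockHeader"] * num_blocks   # 0..N block headers
    layout.append("XLogRecordDataHeader" + data_header_kind(data_len))
    layout += ["block data"] * num_blocks              # one per block header
    layout.append("main data")
    return layout
```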
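In a possible implementation, when the data storage format is the segment-page data storage format and concurrent transactions are processed on different node devices (ES) and modify different data items on the same page, a page-level conflict occurs and causes data to be overwritten. For example, transaction Ta modifies the data item X=2 on node device ES-1 and transaction Tb modifies the data item X=3 on node device ES-2, and the two data items are on the same page. The transaction processing mechanism permits these transactions to execute concurrently and in parallel, and no data anomaly exists at the transaction level. At the page level, however, a choice has to be made between the transaction log flushed by ES-1 and the one flushed by ES-2, so the modifications to the same physical page cannot coexist.
A segment-page list (list) is added to the transaction log that supports the segment-page data storage format. It marks the addresses of the pages in this log segment (for example, file number, tablespace number, and relative offset within the file) and the identifiers of the transactions performing write operations on each page. When transaction logs are flushed to the underlying shared storage system, the validation device checks whether the pages in the lists submitted to the shared storage system by all concurrent transactions overlap. If they do, concurrent transactions have written the same page. If they had written the same data item, the transaction conflict would already have been detected in the transaction validation phase and resolved by rollback, so the transactions wrote different data items on the same page. A page-level conflict therefore exists and a data overwrite event occurs. The transaction of one of the node devices ES must be rolled back, so that the transaction log of that node device is no longer flushed and the problem is avoided.
In an exemplary embodiment, the page-level conflict check is performed by a validation device in the distributed database system. The validation device may be on the same physical machine as any node device or may be a standalone device; the embodiments of this application do not limit this.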
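The overlap check on the segment-page lists can be sketched in Python as a pairwise set intersection. The page address tuple, the function name, and the use of transaction identifiers as dictionary keys are assumptions for illustration; choosing which transaction to roll back is left to the caller.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical page address: (tablespace number, file number, offset in file).
PageAddr = Tuple[int, int, int]


def find_page_conflicts(
    page_lists: Dict[str, Set[PageAddr]]
) -> List[Tuple[str, str, Set[PageAddr]]]:
    """Return every pair of concurrent transactions whose page lists overlap."""
    tids = sorted(page_lists)
    conflicts = []
    for i, a in enumerate(tids):
        for b in tids[i + 1:]:
            shared = page_lists[a] & page_lists[b]
            if shared:
                # Same page written by two transactions: page-level conflict.
                conflicts.append((a, b, shared))
    return conflicts
```

In the Ta/Tb example above, both lists contain the page holding X=2 and X=3, so the pair is reported and one of the two transactions is rolled back before its log is flushed.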
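Based on the transaction processing method provided in the embodiments of this application, a distributed database system can support distributed transactions while achieving globally consistent multi-reads, and decentralized transaction processing keeps performance in balance. The system gains globally consistent multi-read and consistent multi-write capabilities with transactional properties. The method can provide a decentralized distributed transaction processing solution for distributed database systems based on a share-disk architecture, such as the well-known HBase database system under NoSQL (Non-relational SQL, a general term for non-relational databases), giving HBase-like database systems efficient cross-region, cross-node transaction processing capability.
In the embodiments of this application, the coordinator node device that coordinates processing of the target transaction is determined according to the transaction allocation index of each node device; transaction allocation does not need to consider the data items a transaction involves or how those data items are distributed. In this way, every node device can act as a decentralized device that coordinates transactions, so transactions can be processed across nodes. This helps improve transaction processing efficiency, gives high transaction processing reliability, and helps improve the performance of the database system.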
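An embodiment of this application provides a transaction processing system that includes a coordinator node device and a data node device. The coordinator node device is the node device, among at least two node devices sharing the same storage system, that coordinates processing of a target transaction, and is determined according to the transaction allocation indexes respectively corresponding to the at least two node devices. The data node device is a node device, among the at least two node devices, that participates in processing the target transaction.
The coordinator node device is configured to obtain transaction information of the target transaction and, based on that transaction information, send a data read request to the data node device.
The data node device is configured to obtain a data read result based on the data read request sent by the coordinator node device and return the data read result to the coordinator node device.
The coordinator node device is further configured to send a transaction validation request and a local write set to the data node device in response to the data read result returned by the data node device satisfying a transaction validation condition.
The data node device is further configured to obtain a validation result of the target transaction based on the transaction validation request and the local write set sent by the coordinator node device, and return the validation result of the target transaction to the coordinator node device.
The coordinator node device is further configured to determine a processing instruction for the target transaction based on the validation result returned by the data node device and send the processing instruction to the data node device, the processing instruction being a commit instruction or a rollback instruction.
The data node device is further configured to execute the processing instruction in response to receiving the processing instruction for the target transaction sent by the coordinator node device.
In a possible implementation, the data read result carries a second logical life cycle, which the data node device determines from the first logical life cycle of the target transaction carried in the data read request; the first logical life cycle consists of a timestamp lower bound and a timestamp upper bound. The coordinator node device is further configured to: use the larger of the timestamp lower bounds of the first and second logical life cycles as the timestamp lower bound of a third logical life cycle of the target transaction; use the smaller of the timestamp upper bounds of the first and second logical life cycles as the timestamp upper bound of the third logical life cycle; and, in response to the third logical life cycle being valid, send a transaction validation request carrying the third logical life cycle to the data node device. The third logical life cycle is valid when its timestamp lower bound is less than its timestamp upper bound.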
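The derivation of the third logical life cycle is an interval intersection followed by a validity test. A minimal Python sketch, assuming life cycles are represented as (lower bound, upper bound) tuples of integers and using hypothetical function names:

```python
from typing import Tuple

LifeCycle = Tuple[int, int]  # (timestamp lower bound, timestamp upper bound)


def intersect(first: LifeCycle, second: LifeCycle) -> LifeCycle:
    """Lower bound is the max of the lower bounds, upper bound the min of the upper bounds."""
    return (max(first[0], second[0]), min(first[1], second[1]))


def is_valid(lc: LifeCycle) -> bool:
    """A logical life cycle is valid when its lower bound is less than its upper bound."""
    return lc[0] < lc[1]
```

The same two helpers describe the fourth logical life cycle on the data node device, which intersects the third and second logical life cycles.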
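In a possible implementation, there are at least two data node devices, and the coordinator node device is further configured to: in response to at least one of the at least two validation results returned by the at least two data node devices indicating that validation failed, use the rollback instruction as the processing instruction of the target transaction; in response to all of the at least two validation results indicating that validation passed, take the intersection of the logical life cycles carried by the at least two validation results as the target logical life cycle; in response to the target logical life cycle being valid, use the commit instruction as the processing instruction of the target transaction; and, in response to the target logical life cycle being invalid, use the rollback instruction as the processing instruction of the target transaction.
In a possible implementation, the data read request carries the first logical life cycle of the target transaction, which consists of a timestamp lower bound and a timestamp upper bound. The data node device is configured to: determine, based on the first logical life cycle, the visible version data of the data item to be read indicated by the data read request; determine the second logical life cycle of the target transaction based on the creation timestamp of the visible version data and the first logical life cycle; and use a result carrying the second logical life cycle and the visible version data as the data read result.
In a possible implementation, the transaction validation request carries the third logical life cycle of the target transaction, which is a valid logical life cycle determined by the coordinator node device based on the first logical life cycle and the second logical life cycle. The data node device is further configured to: use the larger of the timestamp lower bounds of the third and second logical life cycles as the timestamp lower bound of a fourth logical life cycle of the target transaction; use the smaller of the timestamp upper bounds of the third and second logical life cycles as the timestamp upper bound of the fourth logical life cycle; in response to the fourth logical life cycle being valid, determine a fifth logical life cycle of the target transaction based on the read transaction related information of each data item to be written in the local write set and the fourth logical life cycle; in response to the fifth logical life cycle being valid, use a validation result indicating that validation passed as the validation result of the target transaction; and, in response to the fifth logical life cycle being invalid, use a validation result indicating that validation failed as the validation result of the target transaction.
In a possible implementation, the read transaction related information of a data item to be written includes the maximum read transaction timestamp of that data item, which indicates the maximum of the logical commit timestamps of the read transactions that have read the data item. The data node device is further configured to determine the fifth logical life cycle of the target transaction based on the maximum read transaction timestamps of the data items to be written and the fourth logical life cycle; the timestamp lower bound of the fifth logical life cycle is greater than the largest of those maximum read transaction timestamps.
In a possible implementation, the read transaction related information of a data item to be written includes the end timestamp of a target read transaction of that data item, where a target read transaction is a read transaction that has passed local validation or is in the commit phase, and its end timestamp is the timestamp upper bound of its logical life cycle. The data node device is further configured to determine the fifth logical life cycle of the target transaction based on the end timestamps of the target read transactions of the data items to be written and the fourth logical life cycle; the timestamp lower bound of the fifth logical life cycle is greater than the largest of those end timestamps.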
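The two constraints on the lower bound of the fifth logical life cycle can be combined in one Python sketch. The text only requires the lower bound to be greater than the relevant timestamps; raising it to the largest timestamp plus 1 is an assumption that holds for integer timestamps, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

LifeCycle = Tuple[int, int]  # (timestamp lower bound, timestamp upper bound)


def fifth_life_cycle(
    fourth: LifeCycle,
    max_read_ts: Iterable[int],    # Rts of each data item to be written
    reader_end_ts: Iterable[int],  # Ets of each target read transaction
) -> Tuple[LifeCycle, bool]:
    """Return the fifth logical life cycle and whether local validation passes."""
    bts, ets = fourth
    floor = max(list(max_read_ts) + list(reader_end_ts), default=None)
    if floor is not None:
        # Assumed integer timestamps: floor + 1 is the smallest value
        # strictly greater than every read-related timestamp.
        bts = max(bts, floor + 1)
    return (bts, ets), bts < ets
```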
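The system provided in the foregoing embodiment belongs to the same concept as the method embodiments; for its specific implementation, see the method embodiments. Details are not repeated here.
Referring to FIG. 5, an embodiment of this application provides a transaction processing apparatus, which includes:
a first determining unit 501, configured to determine, in response to an allocation request for a target transaction, transaction allocation indexes respectively corresponding to at least two node devices sharing the same storage system, where the transaction allocation index corresponding to a node device indicates the degree of matching for allocating a new transaction to that node device; and
a second determining unit 502, configured to determine, based on the transaction allocation indexes respectively corresponding to the at least two node devices, a coordinator node device for the target transaction among the at least two node devices, the coordinator node device coordinating processing of the target transaction.
In a possible implementation, the first determining unit 501 is configured to determine a transaction allocation mode, which is any one of allocation based on transaction busyness, allocation based on device busyness, and allocation based on hybrid busyness; and to determine the transaction allocation indexes respectively corresponding to the at least two node devices in the manner indicated by the transaction allocation mode.
In a possible implementation, the transaction allocation mode is allocation based on hybrid busyness, and the first determining unit 501 is further configured to determine the transaction allocation index corresponding to a first node device based on the transaction processing quantity of the first node device, the device resource usage of the first node device, a transaction processing quantity weight, a device resource usage weight, and a weight adjustment parameter, where the first node device is any one of the at least two node devices.
In a possible implementation, the apparatus further includes:
a sending unit, configured to send device identification information of the coordinator node device to the terminal that initiated the allocation request, where the terminal is configured to send the transaction information of the target transaction to the coordinator node device according to the device identification information, and the coordinator node device coordinates processing of the target transaction based on the transaction information.
In a possible implementation, the distributed database system supports the key-value data storage format and the segment-page data storage format.
In the embodiments of this application, the coordinator node device that coordinates processing of the target transaction is determined according to the transaction allocation index of each node device; transaction allocation does not need to consider the data items a transaction involves or how those data items are distributed. In this way, every node device can act as a decentralized device that coordinates transactions, so transactions can be processed across nodes. This helps improve transaction processing efficiency, gives high transaction processing reliability, and helps improve the performance of the database system.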
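Referring to FIG. 6, an embodiment of this application provides a transaction processing apparatus, which includes:
an obtaining unit 601, configured to obtain transaction information of a target transaction;
a first sending unit 602, configured to send a data read request to a data node device based on the transaction information of the target transaction, the data node device being a node device, among at least two node devices sharing the same storage system, that participates in processing the target transaction;
a second sending unit 603, configured to send a transaction validation request and a local write set to the data node device in response to the data read result returned by the data node device satisfying a transaction validation condition;
a determining unit 604, configured to determine a processing instruction for the target transaction based on the validation result of the target transaction returned by the data node device; and
a third sending unit 605, configured to send the processing instruction to the data node device, the processing instruction being a commit instruction or a rollback instruction, and the data node device being configured to execute the processing instruction.
In a possible implementation, the data read result carries a second logical life cycle, which the data node device determines from the first logical life cycle of the target transaction carried in the data read request; the first logical life cycle consists of a timestamp lower bound and a timestamp upper bound. The second sending unit 603 is configured to: use the larger of the timestamp lower bounds of the first and second logical life cycles as the timestamp lower bound of a third logical life cycle of the target transaction; use the smaller of the timestamp upper bounds of the first and second logical life cycles as the timestamp upper bound of the third logical life cycle; and, in response to the third logical life cycle being valid, send a transaction validation request carrying the third logical life cycle to the data node device. The third logical life cycle is valid when its timestamp lower bound is less than its timestamp upper bound.
In a possible implementation, there are at least two data node devices, and the determining unit 604 is configured to: in response to at least one of the at least two validation results returned by the at least two data node devices indicating that validation failed, use the rollback instruction as the processing instruction of the target transaction; in response to all of the at least two validation results indicating that validation passed, take the intersection of the logical life cycles carried by the at least two validation results as the target logical life cycle; in response to the target logical life cycle being valid, use the commit instruction as the processing instruction of the target transaction; and, in response to the target logical life cycle being invalid, use the rollback instruction as the processing instruction of the target transaction.
In the embodiments of this application, the coordinator node device that coordinates processing of the target transaction is determined according to the transaction allocation index of each node device; transaction allocation does not need to consider the data items a transaction involves or how those data items are distributed. In this way, every node device can act as a decentralized device that coordinates transactions, so transactions can be processed across nodes. This helps improve transaction processing efficiency, gives high transaction processing reliability, and helps improve the performance of the database system.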
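Referring to FIG. 7, an embodiment of this application provides a transaction processing apparatus, which includes:
a first obtaining unit 701, configured to obtain a data read result based on a data read request sent by a coordinator node device, the coordinator node device being the node device, among at least two node devices sharing the same storage system, that coordinates processing of a target transaction, and being determined according to the transaction allocation indexes respectively corresponding to the at least two node devices;
a returning unit 702, configured to return the data read result to the coordinator node device;
a second obtaining unit 703, configured to obtain a validation result of the target transaction based on a transaction validation request and a local write set sent by the coordinator node device;
the returning unit 702 being further configured to return the validation result of the target transaction to the coordinator node device; and
an execution unit 704, configured to execute a processing instruction in response to receiving the processing instruction for the target transaction sent by the coordinator node device, the processing instruction being a commit instruction or a rollback instruction.
In a possible implementation, the data read request carries the first logical life cycle of the target transaction, which consists of a timestamp lower bound and a timestamp upper bound. The first obtaining unit 701 is configured to: determine, based on the first logical life cycle, the visible version data of the data item to be read indicated by the data read request; determine the second logical life cycle of the target transaction based on the creation timestamp of the visible version data and the first logical life cycle; and use a result carrying the second logical life cycle and the visible version data as the data read result.
In a possible implementation, the transaction validation request carries the third logical life cycle of the target transaction, which is a valid logical life cycle determined by the coordinator node device based on the first logical life cycle and the second logical life cycle. The second obtaining unit 703 is configured to: use the larger of the timestamp lower bounds of the third and second logical life cycles as the timestamp lower bound of a fourth logical life cycle of the target transaction; use the smaller of the timestamp upper bounds of the third and second logical life cycles as the timestamp upper bound of the fourth logical life cycle; in response to the fourth logical life cycle being valid, determine a fifth logical life cycle of the target transaction based on the read transaction related information of each data item to be written in the local write set and the fourth logical life cycle; in response to the fifth logical life cycle being valid, use a validation result indicating that validation passed as the validation result of the target transaction; and, in response to the fifth logical life cycle being invalid, use a validation result indicating that validation failed as the validation result of the target transaction.
In a possible implementation, the read transaction related information of a data item to be written includes the maximum read transaction timestamp of that data item, which indicates the maximum of the logical commit timestamps of the read transactions that have read the data item. The second obtaining unit 703 is further configured to determine the fifth logical life cycle of the target transaction based on the maximum read transaction timestamps of the data items to be written and the fourth logical life cycle; the timestamp lower bound of the fifth logical life cycle is greater than the largest of those maximum read transaction timestamps.
In a possible implementation, the read transaction related information of a data item to be written includes the end timestamp of a target read transaction of that data item, where a target read transaction is a read transaction that has passed local validation or is in the commit phase, and its end timestamp is the timestamp upper bound of its logical life cycle. The second obtaining unit 703 is further configured to determine the fifth logical life cycle of the target transaction based on the end timestamps of the target read transactions of the data items to be written and the fourth logical life cycle; the timestamp lower bound of the fifth logical life cycle is greater than the largest of those end timestamps.
In the embodiments of this application, the coordinator node device that coordinates processing of the target transaction is determined according to the transaction allocation index of each node device; transaction allocation does not need to consider the data items a transaction involves or how those data items are distributed. In this way, every node device can act as a decentralized device that coordinates transactions, so transactions can be processed across nodes. This helps improve transaction processing efficiency, gives high transaction processing reliability, and helps improve the performance of the database system.
Note that the division into functional units described above is only an example of how the apparatus provided in the foregoing embodiments implements its functions. In practice, the functions can be assigned to different functional units as needed; that is, the internal structure of the device can be divided into different functional units to complete all or some of the functions described above. In addition, the apparatus provided in the foregoing embodiments belongs to the same concept as the method embodiments; for its specific implementation, see the method embodiments. Details are not repeated here.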
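FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of this application. The computer device may vary considerably with configuration or performance, and may include one or more processors (Central Processing Units, CPU) 801 and one or more memories 802. The one or more memories 802 store at least one computer program, which is loaded and executed by the one or more processors 801 so that the computer device implements the transaction processing methods provided in the foregoing method embodiments. The computer device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions; these are not described here.
In an exemplary embodiment, a non-transitory computer-readable storage medium is further provided. The non-transitory computer-readable storage medium stores at least one computer program, which is loaded and executed by a processor of a computer device so that the computer implements any one of the foregoing transaction processing methods.
In a possible implementation, the non-transitory computer-readable storage medium may be a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product or computer program is further provided. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs any one of the foregoing transaction processing methods.
It should be understood that in this application the term "at least one" means one or more, and "multiple" or "at least two" means two or more. "And/or" describes an association between associated objects and indicates that three relationships can exist; for example, A and/or B can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The foregoing are merely exemplary embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.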

Claims (20)

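  1. A transaction processing method, wherein the method is applied to a transaction allocation device, the transaction allocation device is in a distributed database system, the distributed database system further comprises at least two node devices sharing the same storage system, and the method comprises:
    in response to an allocation request for a target transaction, determining transaction allocation indexes respectively corresponding to the at least two node devices, the transaction allocation index corresponding to a node device indicating the degree of matching for allocating a new transaction to that node device; and
    determining, based on the transaction allocation indexes respectively corresponding to the at least two node devices, a coordinator node device for the target transaction among the at least two node devices, the coordinator node device coordinating processing of the target transaction.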
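  2. The method according to claim 1, wherein the determining transaction allocation indexes respectively corresponding to the at least two node devices comprises:
    determining a transaction allocation mode, the transaction allocation mode being any one of allocation based on transaction busyness, allocation based on device busyness, and allocation based on hybrid busyness; and
    determining the transaction allocation indexes respectively corresponding to the at least two node devices in the manner indicated by the transaction allocation mode.
  3. The method according to claim 2, wherein the transaction allocation mode is allocation based on hybrid busyness, and the determining the transaction allocation indexes respectively corresponding to the at least two node devices in the manner indicated by the transaction allocation mode comprises:
    determining the transaction allocation index corresponding to a first node device based on the transaction processing quantity of the first node device, the device resource usage of the first node device, a transaction processing quantity weight, a device resource usage weight, and a weight adjustment parameter, the first node device being any one of the at least two node devices.
  4. The method according to any one of claims 1 to 3, wherein after the determining a coordinator node device for the target transaction among the at least two node devices, the method further comprises:
    sending device identification information of the coordinator node device to the terminal that initiated the allocation request, the terminal being configured to send transaction information of the target transaction to the coordinator node device according to the device identification information of the coordinator node device, and the coordinator node device coordinating processing of the target transaction based on the transaction information.
  5. The method according to any one of claims 1 to 3, wherein the distributed database system supports a key-value data storage format and a segment-page data storage format.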
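  6. A transaction processing method, wherein the method is applied to a coordinator node device, the coordinator node device is the node device, among at least two node devices sharing the same storage system, that coordinates processing of a target transaction, the coordinator node device is determined according to transaction allocation indexes respectively corresponding to the at least two node devices, and the method comprises:
    obtaining transaction information of the target transaction;
    sending a data read request to a data node device based on the transaction information of the target transaction, the data node device being a node device, among the at least two node devices, that participates in processing the target transaction;
    in response to a data read result returned by the data node device satisfying a transaction validation condition, sending a transaction validation request and a local write set to the data node device; and
    determining a processing instruction for the target transaction based on a validation result of the target transaction returned by the data node device, and sending the processing instruction to the data node device, the processing instruction being a commit instruction or a rollback instruction, and the data node device being configured to execute the processing instruction.
  7. The method according to claim 6, wherein the data read result carries a second logical life cycle, the second logical life cycle is determined by the data node device according to a first logical life cycle of the target transaction carried in the data read request, and the first logical life cycle consists of a timestamp lower bound and a timestamp upper bound; and the sending a transaction validation request to the data node device in response to a data read result returned by the data node device satisfying a transaction validation condition comprises:
    using the larger of the timestamp lower bound of the first logical life cycle and the timestamp lower bound of the second logical life cycle as the timestamp lower bound of a third logical life cycle of the target transaction; using the smaller of the timestamp upper bound of the first logical life cycle and the timestamp upper bound of the second logical life cycle as the timestamp upper bound of the third logical life cycle of the target transaction; and
    in response to the third logical life cycle being valid, sending a transaction validation request carrying the third logical life cycle to the data node device, the third logical life cycle being valid indicating that the timestamp lower bound of the third logical life cycle is less than the timestamp upper bound of the third logical life cycle.
  8. The method according to claim 6 or 7, wherein there are at least two data node devices, and the determining a processing instruction for the target transaction based on a validation result of the target transaction returned by the data node device comprises:
    in response to at least one of the at least two validation results returned by the at least two data node devices indicating that validation failed, using the rollback instruction as the processing instruction of the target transaction;
    in response to all of the at least two validation results returned by the at least two data node devices indicating that validation passed, taking the intersection of the logical life cycles carried by the at least two validation results as a target logical life cycle; and
    in response to the target logical life cycle being valid, using the commit instruction as the processing instruction of the target transaction; in response to the target logical life cycle being invalid, using the rollback instruction as the processing instruction of the target transaction.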
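  9. A transaction processing method, wherein the method is applied to a data node device, the data node device is a node device, among at least two node devices sharing the same storage system, that participates in processing a target transaction, and the method comprises:
    obtaining a data read result based on a data read request sent by a coordinator node device, and returning the data read result to the coordinator node device, the coordinator node device being determined according to transaction allocation indexes respectively corresponding to the at least two node devices;
    obtaining a validation result of the target transaction based on a transaction validation request and a local write set sent by the coordinator node device, and returning the validation result of the target transaction to the coordinator node device; and
    in response to receiving a processing instruction for the target transaction sent by the coordinator node device, executing the processing instruction, the processing instruction being a commit instruction or a rollback instruction.
  10. The method according to claim 9, wherein the data read request carries a first logical life cycle of the target transaction, and the first logical life cycle consists of a timestamp lower bound and a timestamp upper bound; and the obtaining a data read result based on a data read request sent by a coordinator node device comprises:
    determining, based on the first logical life cycle, visible version data of the data item to be read indicated by the data read request;
    determining a second logical life cycle of the target transaction based on the creation timestamp of the visible version data and the first logical life cycle; and
    using a result carrying the second logical life cycle and the visible version data as the data read result.
  11. The method according to claim 10, wherein the transaction validation request carries a third logical life cycle of the target transaction, and the third logical life cycle is a valid logical life cycle determined by the coordinator node device based on the first logical life cycle and the second logical life cycle; and the obtaining a validation result of the target transaction based on a transaction validation request and a local write set sent by the coordinator node device comprises:
    using the larger of the timestamp lower bound of the third logical life cycle and the timestamp lower bound of the second logical life cycle as the timestamp lower bound of a fourth logical life cycle of the target transaction; using the smaller of the timestamp upper bound of the third logical life cycle and the timestamp upper bound of the second logical life cycle as the timestamp upper bound of the fourth logical life cycle of the target transaction;
    in response to the fourth logical life cycle being valid, determining a fifth logical life cycle of the target transaction based on the read transaction related information of each data item to be written in the local write set and the fourth logical life cycle; and
    in response to the fifth logical life cycle being valid, using a validation result indicating that validation passed as the validation result of the target transaction; in response to the fifth logical life cycle being invalid, using a validation result indicating that validation failed as the validation result of the target transaction.
  12. The method according to claim 11, wherein the read transaction related information of a data item to be written comprises the maximum read transaction timestamp of the data item to be written, the maximum read transaction timestamp indicating the maximum of the logical commit timestamps of the read transactions that have read the data item to be written; and the determining a fifth logical life cycle of the target transaction based on the read transaction related information of each data item to be written in the local write set and the fourth logical life cycle comprises:
    determining the fifth logical life cycle of the target transaction based on the maximum read transaction timestamps of the data items to be written and the fourth logical life cycle, the timestamp lower bound of the fifth logical life cycle being greater than the largest of the maximum read transaction timestamps of the data items to be written.
  13. The method according to claim 11, wherein the read transaction related information of a data item to be written comprises the end timestamp of a target read transaction of the data item to be written, the target read transaction is a read transaction that has passed local validation or is in the commit phase, and the end timestamp of the target read transaction is the timestamp upper bound of the logical life cycle of the target read transaction; and the determining a fifth logical life cycle of the target transaction based on the read transaction related information of each data item to be written in the local write set and the fourth logical life cycle comprises:
    determining the fifth logical life cycle of the target transaction based on the end timestamps of the target read transactions of the data items to be written and the fourth logical life cycle, the timestamp lower bound of the fifth logical life cycle being greater than the largest of the end timestamps of the target read transactions of the data items to be written.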
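  14. A transaction processing system, wherein the transaction processing system comprises a coordinator node device and a data node device, the coordinator node device is the node device, among at least two node devices sharing the same storage system, that coordinates processing of a target transaction, the coordinator node device is determined according to transaction allocation indexes respectively corresponding to the at least two node devices, and the data node device is a node device, among the at least two node devices, that participates in processing the target transaction;
    the coordinator node device is configured to obtain transaction information of the target transaction, and send a data read request to the data node device based on the transaction information of the target transaction;
    the data node device is configured to obtain a data read result based on the data read request sent by the coordinator node device, and return the data read result to the coordinator node device;
    the coordinator node device is further configured to send a transaction validation request and a local write set to the data node device in response to the data read result returned by the data node device satisfying a transaction validation condition;
    the data node device is further configured to obtain a validation result of the target transaction based on the transaction validation request and the local write set sent by the coordinator node device, and return the validation result of the target transaction to the coordinator node device;
    the coordinator node device is further configured to determine a processing instruction for the target transaction based on the validation result of the target transaction returned by the data node device, and send the processing instruction to the data node device, the processing instruction being a commit instruction or a rollback instruction; and
    the data node device is further configured to execute the processing instruction in response to receiving the processing instruction for the target transaction sent by the coordinator node device.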
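  15. A transaction processing apparatus, wherein the apparatus comprises:
    a first determining unit, configured to determine, in response to an allocation request for a target transaction, transaction allocation indexes respectively corresponding to at least two node devices sharing the same storage system, the transaction allocation index corresponding to a node device indicating the degree of matching for allocating a new transaction to that node device; and
    a second determining unit, configured to determine, based on the transaction allocation indexes respectively corresponding to the at least two node devices, a coordinator node device for the target transaction among the at least two node devices, the coordinator node device coordinating processing of the target transaction.
  16. A transaction processing apparatus, wherein the apparatus comprises:
    an obtaining unit, configured to obtain transaction information of a target transaction;
    a first sending unit, configured to send a data read request to a data node device based on the transaction information of the target transaction, the data node device being a node device, among at least two node devices sharing the same storage system, that participates in processing the target transaction;
    a second sending unit, configured to send a transaction validation request and a local write set to the data node device in response to a data read result returned by the data node device satisfying a transaction validation condition;
    a determining unit, configured to determine a processing instruction for the target transaction based on a validation result of the target transaction returned by the data node device; and
    a third sending unit, configured to send the processing instruction to the data node device, the processing instruction being a commit instruction or a rollback instruction, and the data node device being configured to execute the processing instruction.
  17. A transaction processing apparatus, wherein the apparatus comprises:
    a first obtaining unit, configured to obtain a data read result based on a data read request sent by a coordinator node device, the coordinator node device being the node device, among at least two node devices sharing the same storage system, that coordinates processing of a target transaction, and the coordinator node device being determined according to transaction allocation indexes respectively corresponding to the at least two node devices;
    a returning unit, configured to return the data read result to the coordinator node device;
    a second obtaining unit, configured to obtain a validation result of the target transaction based on a transaction validation request and a local write set sent by the coordinator node device;
    the returning unit being further configured to return the validation result of the target transaction to the coordinator node device; and
    an execution unit, configured to execute a processing instruction in response to receiving the processing instruction for the target transaction sent by the coordinator node device, the processing instruction being a commit instruction or a rollback instruction.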
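  18. A computer device, wherein the computer device comprises a processor and a memory, the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor so that the computer device implements the transaction processing method according to any one of claims 1 to 5, the transaction processing method according to any one of claims 6 to 8, or the transaction processing method according to any one of claims 9 to 13.
  19. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores at least one computer program, and the at least one computer program is loaded and executed by a processor so that a computer implements the transaction processing method according to any one of claims 1 to 5, the transaction processing method according to any one of claims 6 to 8, or the transaction processing method according to any one of claims 9 to 13.
  20. A computer program product, wherein the computer program product comprises computer instructions stored in a computer-readable storage medium, and a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the transaction processing method according to any one of claims 1 to 5, the transaction processing method according to any one of claims 6 to 8, or the transaction processing method according to any one of claims 9 to 13.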
PCT/CN2021/126408 2020-11-27 2021-10-26 事务处理方法、系统、装置、设备、存储介质及程序产品 WO2022111188A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2023517375A JP2023541298A (ja) 2020-11-27 2021-10-26 トランザクション処理方法、システム、装置、機器、及びプログラム
EP21896690.1A EP4216061A4 (en) 2020-11-27 2021-10-26 TRANSACTION PROCESSING METHOD, SYSTEM, APPARATUS, DEVICE, RECORDING MEDIUM AND PROGRAM PRODUCT
US18/070,141 US20230099664A1 (en) 2020-11-27 2022-11-28 Transaction processing method, system, apparatus, device, storage medium, and program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011362629.2A CN112162846B (zh) 2020-11-27 2020-11-27 事务处理方法、设备及计算机可读存储介质
CN202011362629.2 2020-11-27

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/070,141 Continuation US20230099664A1 (en) 2020-11-27 2022-11-28 Transaction processing method, system, apparatus, device, storage medium, and program product

Publications (1)

Publication Number Publication Date
WO2022111188A1 true WO2022111188A1 (zh) 2022-06-02

Family

ID=73865889

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/126408 WO2022111188A1 (zh) 2020-11-27 2021-10-26 事务处理方法、系统、装置、设备、存储介质及程序产品

Country Status (5)

Country Link
US (1) US20230099664A1 (zh)
EP (1) EP4216061A4 (zh)
JP (1) JP2023541298A (zh)
CN (1) CN112162846B (zh)
WO (1) WO2022111188A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112162846B (zh) * 2020-11-27 2021-04-09 腾讯科技(深圳)有限公司 事务处理方法、设备及计算机可读存储介质
CN112950236B (zh) * 2021-03-31 2023-05-23 四川虹美智能科技有限公司 序列号写入方法、装置及计算机可读介质
CN115277735B (zh) * 2022-07-20 2023-11-28 北京达佳互联信息技术有限公司 数据的处理方法和装置、电子设备及存储介质
CN116389398B (zh) * 2023-05-30 2023-10-20 阿里巴巴(中国)有限公司 数据访问控制方法、车辆控制方法及设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110153566A1 (en) * 2009-12-18 2011-06-23 Microsoft Corporation Optimistic serializable snapshot isolation
CN108958942A (zh) * 2018-07-18 2018-12-07 郑州云海信息技术有限公司 一种分布式系统分配任务方法、调度器和计算机设备
CN110287022A (zh) * 2019-05-28 2019-09-27 北京大米科技有限公司 一种调度节点选择方法、装置、存储介质及服务器
CN111597015A (zh) * 2020-04-27 2020-08-28 腾讯科技(深圳)有限公司 事务处理方法、装置、计算机设备及存储介质
CN112162846A (zh) * 2020-11-27 2021-01-01 腾讯科技(深圳)有限公司 事务处理方法、设备及计算机可读存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7702783B2 (en) * 2007-09-12 2010-04-20 International Business Machines Corporation Intelligent performance monitoring of a clustered environment
CN103092683B (zh) * 2011-11-07 2017-12-26 Sap欧洲公司 用于数据分析的基于启发式的调度
CN103324534B (zh) * 2012-03-22 2016-08-03 阿里巴巴集团控股有限公司 作业调度方法及其调度器
CN102831011B (zh) * 2012-08-10 2015-11-18 上海交通大学 一种基于众核系统的任务调度方法及装置
CN103731909B (zh) * 2013-12-30 2016-08-24 北京工业大学 一种移动计算终端低功耗设计方法
CN109842947B (zh) * 2017-11-24 2021-01-08 中国科学院计算技术研究所 一种面向基站任务的调度方法和系统
CN111176840B (zh) * 2019-12-20 2023-11-28 青岛海尔科技有限公司 分布式任务的分配优化方法和装置、存储介质及电子装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4216061A4 *

Also Published As

Publication number Publication date
CN112162846B (zh) 2021-04-09
JP2023541298A (ja) 2023-09-29
US20230099664A1 (en) 2023-03-30
EP4216061A4 (en) 2024-03-20
CN112162846A (zh) 2021-01-01
EP4216061A1 (en) 2023-07-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21896690

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023517375

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021896690

Country of ref document: EP

Effective date: 20230421

NENP Non-entry into the national phase

Ref country code: DE