CN115145784A - Operating environment monitoring method for a distributed transaction commit protocol - Google Patents

Info

Publication number
CN115145784A
Authority
CN
China
Prior art keywords
alpha
level
fault
rlsm
reinforcement learning
Legal status
Pending
Application number
CN202210411985.1A
Other languages
Chinese (zh)
Inventor
潘鹤翔
李耀明
陈刚
黄铭钧
张美慧
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology BIT
Priority to CN202210411985.1A
Publication of CN115145784A

Classifications

    • G06F11/30 Error detection; error correction; monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements where the computing system component being monitored is a software system
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G06F9/466 Multiprogramming arrangements; transaction processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The disclosure relates to a method for monitoring the operating environment of a distributed transaction commit protocol, and belongs to the technical field of distributed transaction processing. The method maintains the system environment of each participant with a robustness level state machine (RLSM); the state maintained by the state machine can be adjusted dynamically through different inputs to track each participant's environment in real time. The coordinator determines the system environment level from the RLSM states of the participants, breaking the fixed assumption about the system environment made by existing distributed transaction commit protocols, so that the commit protocol can be adjusted dynamically according to the system environment and the efficiency of distributed transaction processing is improved. Further, the input of the state machine is set by the current protocol and its execution result, so the RLSM state of a participant adjusts automatically. Further, the input parameters that lower the level are not fixed but are learned through reinforcement learning from the execution results of historical commit protocols, which better fits the distributed transaction processing environment and improves transaction processing efficiency.

Description

Method for monitoring the operating environment of a distributed transaction commit protocol
Technical field:
The present disclosure relates to the field of distributed transaction processing technologies, and in particular, to a method for monitoring the operating environment of a distributed transaction commit protocol.
Background
Transactions are widely used in databases that store important information; they integrate a series of key user operations and guarantee the four ACID properties (atomicity, consistency, isolation, durability). Among these four properties, atomicity guarantees that the operations in a transaction either all take effect or none do, but implementing this property introduces extra overhead for databases, especially distributed databases. In a distributed database, data is split and distributed across different nodes to achieve horizontal scaling. This poses a new challenge for ensuring the atomicity of transactions: all participating nodes must agree on committing or rolling back a transaction. The distributed transaction commit problem thus arises, and it has received a great deal of attention from both industry and academia. However, to our knowledge, existing distributed transaction commit protocols suffer from a fundamental drawback, namely their fixed assumption about the system environment (node performance and network connection performance). This deficiency limits further improvements in distributed database efficiency.
Disclosure of Invention
The present disclosure aims to overcome, at least partially, the above technical problems, and provides an environment detection method for distributed transaction commit that enables the coordinator to grasp the system environment comprehensively and switch the distributed transaction commit protocol in time as the environment changes, thereby improving the efficiency of a distributed database.
In a first aspect, an embodiment of the present disclosure provides a method for monitoring the operating environment of a distributed transaction commit protocol, involving a coordinator C* and several participants C_T of a transaction T, wherein:
C* maintains a robustness level state machine (RLSM) for each participant;
the RLSM contains three states, representing three fault levels of the environment in which distributed transactions commit: a no-fault level, a crash fault level, and a network fault level;
the RLSM transitions among the three states according to different inputs;
C* determines an environment level L from the RLSM states of the participants in C_T, where L is the no-fault level, the crash fault level, or the network fault level, and L is used to determine the commit protocol Π_T of the distributed transaction T.
In a second aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a third aspect, the present disclosure provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method of the first aspect.
Advantageous effects:
the method provided by the present disclosure maintains the system environment of each participant by using a RLSM robustness level state machine, the state maintained by the state machine can be dynamically adjusted by inputting different parameters to track the environment of each participant in real time; the coordinator can determine the system environment level based on the state of the participant corresponding to the RLSM, and breaks through the fixed assumption of the system environment when the existing distributed transaction is submitted, so that the distributed transaction submission can dynamically adjust the submission protocol according to the system environment, and the efficiency of distributed transaction processing is improved. Furthermore, the input of the state machine is set by the current distributed transaction commit protocol and the execution result thereof, so that the state of the state machine of the participant RLSM can be automatically adjusted, namely, the state of the participant is dynamically determined by the execution result of the previous commit protocol, and the current state of the participant determines the commit protocol to be adopted when the next distributed transaction is committed. Furthermore, input parameters when the state machine is set to reduce the level are not fixed, and are learned through reinforcement learning based on a historical submission protocol execution result, so that the distributed transaction processing environment is better met, and the distributed transaction processing efficiency is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below; those skilled in the art can obviously obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic 2PC protocol flow diagram;
FIG. 2 is a schematic 3PC protocol flow diagram;
FIG. 3 is a schematic diagram of a conventional distributed system;
FIG. 4 is a schematic diagram of a distributed system architecture with context awareness capabilities;
FIG. 5 is a schematic diagram of a robust level state machine RLSM provided by the present disclosure;
FIG. 6 is a schematic diagram of a reinforcement learning optimizer provided in the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 8 illustrates the behavior of the various protocols in a stable environment; (a) shows how throughput varies with the number of clients, and (b) shows how latency varies with the number of clients;
FIG. 9 illustrates the influence of α_CF and α_NF on PRAC throughput, where τs-CF and τs-NF denote crash faults and network faults with period τ (e.g., 1s-CF), respectively; (a) shows the influence of α_CF on PRAC throughput under periodic crash failures, and (b) shows the influence of α_NF on PRAC throughput under periodic network failures;
FIG. 10 illustrates the impact of α_CF on PRAC in an unstable environment; (a) shows the influence of α_CF on the proportion of error-path executions (error rate), and (b) shows the influence of α_CF on the average robustness level;
FIG. 11 illustrates a comparison between the protocols at the various levels;
FIG. 12 illustrates a throughput comparison between reinforcement learning and different fixed parameters; (a) compares PRAC throughput when α_CF is adjusted by reinforcement learning (RL), fixed at 1 (fast robustness-decreasing transitions), or fixed at 128 (fewer false assumptions), and (b) compares PRAC throughput when α_NF is adjusted by RL, fixed at 1, or fixed at 128.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
The currently widely used distributed transaction commit protocols are the two-phase commit protocol (2PC), shown in FIG. 1 (see Nadia Nouali, Anne Doucet, and Habiba Drias. 2005. A two-phase commit protocol for mobile wireless environment. In Proceedings of the 16th Australasian Database Conference - Volume 39. 135-143), and the three-phase commit protocol (3PC), shown in FIG. 2 (see Dale Skeen. 1981. Nonblocking Commit Protocols (SIGMOD '81). Association for Computing Machinery, New York, NY, USA, 133-142). 2PC cannot guarantee non-blocking when a fault occurs; 3PC guarantees non-blocking by introducing an additional phase, but its extra message transmission reduces distributed transaction commit efficiency, especially when the system runs fault-free for long periods. The industry divides the faults expected at distributed transaction commit into two categories:
1. Crash fault: a node suspends or terminates operation.
2. Network fault: the transmission of messages between nodes takes more time than expected.
Based on these two fault types, we divide executions of the distributed transaction commit protocol into three categories (note that we only consider network faults between participants):
1. Fault-free execution: neither a crash fault nor a network fault occurs.
2. Crash-fault execution: no network fault occurs, but crash faults may occur.
3. Network-fault execution: both types of fault may occur.
A distributed transaction commit protocol that tolerates network faults necessarily tolerates crash faults, and a protocol that tolerates crash faults also executes correctly in a fault-free environment. Thus, the prior art selects a distributed transaction commit protocol based on the fault level of the environment in which it will execute. If, based on the requirements of the application environment, agreement, validity, and non-blocking must be ensured even when a network fault occurs, 3PC will likely be selected; but when 3PC runs in an environment where network faults occur only occasionally and the system is fault-free most of the time, the commit efficiency of distributed transactions drops greatly and the system loses throughput. Similarly, when 2PC is selected as the commit protocol, system throughput is high, but the protocol may block when a crash or network fault occurs, making the system unusable. The root cause of these problems is that the system cannot perceive the environment dynamically and adjust the distributed commit protocol it should currently use as the environment changes. Accordingly, the present disclosure provides an environment detection method for distributed transaction commit that can monitor the environmental conditions of the system in real time.
FIG. 3 shows the prototype of a conventional distributed system, in which the coordinator C receives a request for transaction T from a client, sends it synchronously to the several relevant participants, and replies to the client after the request executes successfully. Here, transaction T is executed with a fixed commit protocol adopted under a fixed assumption about the system environment, regardless of whether the system is currently fault-free or suffering a crash fault or a network fault.
Thus, the present disclosure proposes to deploy an environment detector at the coordinator C*, which executes the following environment detection method to sense the system environment and provides C* with a proposal for the distributed commit protocol that the several participants C_T should execute for the current transaction T, as shown in FIG. 4.
FIG. 5 relates to the method for monitoring the operating environment of a distributed transaction commit protocol according to an embodiment of the present disclosure, involving a coordinator C* and several participants C_T of the distributed transaction T, wherein:
C* maintains a robustness level state machine RLSM for each participant;
the RLSM contains three states, representing three fault levels for distributed transaction commit: a no-fault level, a crash fault level, and a network fault level, as shown in FIG. 5;
the RLSM transitions among the three states according to different inputs;
C* determines an environment level L from the RLSM states of the participants in C_T, where L is the no-fault level, the crash fault level, or the network fault level, and L is used to determine the commit protocol Π_T of the distributed transaction T.
The coordinator C* maintains the system environment of each participant using an RLSM; the maintained state can be adjusted dynamically through different inputs to track the status of each participant in real time. The coordinator can determine the system environment level based on the participants' states, breaking the fixed assumption about the system environment made by existing distributed transaction commit protocols, so that the commit protocol can be changed dynamically according to the system environment and the efficiency of distributed transaction processing is improved. For example, when there are 3 participants, C* maintains one RLSM for each of them; at any moment each RLSM is in the no-fault, crash fault, or network fault state and transitions between states according to its inputs. When C* is about to commit a transaction, it determines the current environment level of the system from the states of the three RLSMs and then determines the commit protocol to use according to that level. In this way, C* implements environment monitoring dynamically, knows the environment state of the system in real time, and can adjust the commit strategy based on the system's current environment level.
C* can determine the environment level L from the RLSM states of the participants in C_T by various schemes, such as taking the best state in C_T or the average state of the participants in C_T. Preferably, the present disclosure assigns to L the RLSM state level of the participant in C_T with the worst system environment, where the system environment from best to worst is the no-fault level, the crash fault level, and the network fault level; that is, a participant at the no-fault level has the best environment, the crash fault level is next, and the network fault level is worst. A commit protocol chosen based on the worst-state participant is therefore guaranteed to satisfy the transaction commit requirements of every participant whose state is better.
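As an illustrative sketch only (the enum and function names below are ours, not the patent's), the worst-participant rule amounts to taking a maximum over an ordered severity scale:

```python
from enum import IntEnum

class Level(IntEnum):
    """Fault levels, ordered from best to worst system environment."""
    NO_FAULT = 0
    CRASH_FAULT = 1
    NETWORK_FAULT = 2

def environment_level(rlsm_states: list[Level]) -> Level:
    """L is the RLSM state of the worst participant in C_T."""
    return max(rlsm_states)

# Example: two healthy participants and one at the crash fault level
# yield L = CRASH_FAULT, so a crash-tolerant commit protocol is chosen.
states = [Level.NO_FAULT, Level.CRASH_FAULT, Level.NO_FAULT]
assert environment_level(states) == Level.CRASH_FAULT
```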
Specifically, the initial state of the RLSM is set to the no-fault level;
the inputs are CF, NF, FF(α_CF), or FF(α_NF), where CF denotes a crash fault, NF denotes a network fault, FF(α_CF) denotes the robustness-decreasing transition from the crash fault level, i.e., an RLSM at the crash fault level drops to the no-fault level after α_CF consecutive fault-free executions, and FF(α_NF) denotes the robustness-decreasing transition from the network fault level, i.e., an RLSM at the network fault level drops to the no-fault level after α_NF consecutive fault-free executions;
the RLSM transitions among the three states according to the inputs:
when the RLSM is at the no-fault level, receiving input CF raises robustness and moves it to the crash fault level, and receiving input NF raises robustness and moves it to the network fault level;
when the RLSM is at the crash fault level, receiving input NF raises robustness and moves it to the network fault level, while receiving input FF(α_CF) lowers robustness and moves it to the no-fault level;
when the RLSM is at the network fault level, receiving input FF(α_NF) lowers robustness and moves it to the no-fault level.
The operation of the RLSM may be configured differently in different application scenarios, e.g., the initial state could be set to the worst level or chosen randomly, and the inputs and the transitions they trigger may also vary with the application environment. In this example, the initial state of the RLSM is set to the best state, i.e., the no-fault level, which reflects the objective fact that the network and devices are usually fault-free. The input domain is divided into symbols representing the several situations that can occur, and a state transition is performed when the symbol for the corresponding situation is input, so as to mirror the real conditions of the system. If the RLSM is at the no-fault level and receives input CF, i.e., the participant corresponding to this RLSM has suffered a crash fault, the participant should be adjusted to the crash fault state, so its robustness level is raised and the RLSM moves to the crash fault level; if input NF is received, i.e., the participant has suffered a network fault, its robustness level is raised and the RLSM moves directly to the network fault level. Similarly, when the RLSM is at the crash fault level and receives input NF, a robustness-increasing transition moves it to the network fault level. For robustness-decreasing transitions, an RLSM at the crash fault level drops to the no-fault level only after α_CF consecutive fault-free executions, i.e., upon receiving input FF(α_CF); an RLSM at the network fault level drops to the no-fault level only after α_NF consecutive fault-free executions, i.e., upon receiving input FF(α_NF).
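A minimal sketch of the RLSM transition function described above, continuing the Level enum from the previous sketch (the string encoding of the inputs is our assumption; the patent only names the inputs CF, NF, FF(α_CF), and FF(α_NF)):

```python
class RLSM:
    """Robustness level state machine for one participant."""

    def __init__(self) -> None:
        self.state = Level.NO_FAULT  # initial state: no-fault level

    def step(self, inp: str) -> Level:
        """Apply one input: "CF", "NF", "FF_CF" (= FF(alpha_CF)),
        or "FF_NF" (= FF(alpha_NF))."""
        if inp == "NF":
            # robustness increase: move to the network fault level
            self.state = Level.NETWORK_FAULT
        elif inp == "CF" and self.state == Level.NO_FAULT:
            # robustness increase: no-fault -> crash fault level
            self.state = Level.CRASH_FAULT
        elif inp == "FF_CF" and self.state == Level.CRASH_FAULT:
            # robustness decrease after alpha_CF consecutive fault-free runs
            self.state = Level.NO_FAULT
        elif inp == "FF_NF" and self.state == Level.NETWORK_FAULT:
            # robustness decrease after alpha_NF consecutive fault-free runs
            self.state = Level.NO_FAULT
        return self.state
```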
Specifically, the input of the RLSM is determined from Π_T and the execution result R_T of T according to the following rules:
if Π_T is a no-fault level protocol, check R_T to judge whether a malignant network fault occurred; if so, input NF and adjust L to the network fault level; otherwise check whether R_T indicates that a participant suffered a malignant crash fault, and if so, input CF and adjust L to the crash fault level;
if Π_T is a crash fault level protocol, check R_T to judge whether a malignant network fault occurred; if so, input NF and adjust L to the network fault level; otherwise check the number of consecutive fault-free executions, and if it reaches α_CF, input FF(α_CF);
if Π_T is a network fault level protocol, check the number of consecutive fault-free executions, and if it reaches α_NF, input FF(α_NF).
The input to the RLSM can come from a variety of sources, such as C * According to the state of the participant collected by computer programs such as a network management process or a network monitoring thread, the programs can be regularly communicated with the participant to acquire the state of the participant, when the state of the participant is changed, the input of the RLSM is determined by the corresponding state, and if a breakdown fault occurs, CF is input into the RLSM. This example is preferably implemented by the execution protocol Π of the transaction T T And execution result R T To determine the input that causes the RLSM state transition, C may be enabled * Implementing a self-running loop that does not rely on other means, i.e. when a transaction needs to be executed it requests an environment level from an environment detector (i.e. the environment detection method of the present disclosure), which feeds back its level, i.e. the commit protocol Π of the transaction T ,Π T After being executed, C * Will execute result R T And then fed back to the environment detector according to R T II and T determining the input to RLSM, i.e. determining whether RLSM is to beA state transition is performed. Thus, a positive loop is formed in which the protocol is decided to be submitted according to the environmental state, and the environmental state is monitored and transferred according to the protocol execution result.
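A sketch of how the RLSM input could be derived from Π_T's level and R_T under the rules above (the helper predicates are defined in the next sketch; the `ff_streak` counter of consecutive fault-free executions is our bookkeeping assumption):

```python
from typing import Optional

def rlsm_input(protocol_level: Level, result, ff_streak: int,
               alpha_cf: int, alpha_nf: int) -> Optional[str]:
    """Return the RLSM input implied by the last commit protocol run,
    or None if no state transition should be triggered."""
    if protocol_level == Level.NO_FAULT:
        if has_malignant_network_fault(result, protocol_level):
            return "NF"
        if has_malignant_crash_fault(result):
            return "CF"
    elif protocol_level == Level.CRASH_FAULT:
        if has_malignant_network_fault(result, protocol_level):
            return "NF"
        if ff_streak >= alpha_cf:
            return "FF_CF"
    elif protocol_level == Level.NETWORK_FAULT:
        if ff_streak >= alpha_nf:
            return "FF_NF"
    return None
```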
Specifically, the determination condition for a malignant crash fault is as follows: if Π_T is a no-fault level protocol or a crash fault level protocol, check whether any participant's result is missing from R_T. If one is missing, then since the network connections between C* and the participants are synchronous, the missing result shows that the corresponding participant suffered a crash fault before sending its result, so it is judged that a malignant crash fault occurred at the missing participant.
The determination condition for a malignant network fault: if Π_T is a no-fault level protocol and no malignant crash fault is detected, check whether R_T contains conflicting decisions; if so, it is judged that a malignant network fault occurred among the participants. If Π_T is a crash fault level protocol, then a malignant network fault is judged to have occurred among the participants if R_T contains conflicting decisions. This is because, by the nature of a distributed transaction commit protocol, the decisions of the participants remain consistent when no intolerable fault occurs, so a conflict shows that an intolerable fault must have occurred during protocol execution.
Judging whether a participant suffered a malignant crash fault or a malignant network fault from the results R_T that all participants produce under the different protocols lets the state machine transitions be determined solely by the execution result of transaction T. This simplifies the design of the system, avoids the extra overhead that dedicated fault detection would impose, and improves the independence of the system.
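The two judgment conditions can be sketched as predicates over R_T (the `votes`/`decisions` result layout below is hypothetical; the patent does not fix one):

```python
class Result:
    """Hypothetical execution result R_T: one vote/decision slot per
    participant; a None vote means that participant's reply is missing."""
    def __init__(self, votes, decisions):
        self.votes = votes
        self.decisions = decisions

def has_malignant_crash_fault(result: Result) -> bool:
    # With synchronous coordinator-participant connections, a missing
    # result means the participant crashed before sending it.
    return any(v is None for v in result.votes)

def has_malignant_network_fault(result: Result,
                                protocol_level: Level) -> bool:
    conflicting = len({d for d in result.decisions if d is not None}) > 1
    if protocol_level == Level.NO_FAULT:
        # a conflict not already explained by a crash fault
        return conflicting and not has_malignant_crash_fault(result)
    if protocol_level == Level.CRASH_FAULT:
        # the crash-tolerant protocol still diverged: an intolerable
        # (network) fault must have occurred during execution
        return conflicting
    return False
```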
Specifically, α_CF and/or α_NF are obtained through reinforcement learning based on L. The learning can be performed with existing reinforcement learning methods such as deep Q-learning or policy gradient methods.
The two parameters α_CF and α_NF mean that a robustness-decreasing transition is not necessarily immediate: we can always sacrifice some efficiency, but never correctness, by using a high robustness level protocol to do the work of a low robustness level protocol. α_CF and α_NF tune the protocol between two factors: smaller parameters mean more sensitive transitions, giving more opportunities to process transactions with lightweight protocols, while larger parameters mean more cautious transitions, giving more immunity against running on time-consuming fault paths. A user may adjust the two parameters manually, but doing so is not easy, and using only preset fixed values can sacrifice efficiency. Therefore, we let α_CF and/or α_NF be obtained through reinforcement learning based on L. Parameter values learned from the execution results of historical transactions fully reflect the environment changes of all participants, enabling reasonable tracking of the environment and efficient transaction processing. The system uses the adjusted environment level for reinforcement learning: if a fault occurs during the execution of transaction T, the fault is first used to adjust L, and L is then sent to the reinforcement learning optimizer. Later transactions determine their protocol based on the adjusted L. For example, if we select a no-fault level protocol for T and a network fault occurs during protocol execution, L is changed to the network fault level and sent to the reinforcement learning optimizer.
Taking α_CF as an example, reinforcement learning is realized by a reinforcement learning optimizer, which comprises a collector, a decision maker, and a learner, as shown in FIG. 6, wherein:
the collector receives and caches L. If L differs from the last cached environment level, the collector empties the buffer, sends the system's average throughput μ over the buffering period together with L to the decision maker, and sends a reset instruction to the learner; if L equals the last cached environment level and the buffer is full, the collector empties the buffer and sends the average throughput μ over the buffering period together with L to the decision maker;
the decision maker maintains a counter and stores a parameter α', where α' represents the α_CF to be obtained through reinforcement learning; the initial value of the counter is 1; the initial value of α' is k, where k is a real number different from any value that can occur during the decision maker's operation (in this example, k = -1). On receiving an (L, μ) pair, the decision maker judges:
if L is the no-fault level, decrement the counter by 1; if the counter reaches 0, perform [decide]; otherwise exit;
if L is not the no-fault level, perform [reset];
[decide] comprises the following:
if α' is not k, trigger the corresponding RLSM transition, i.e., set the input of the RLSM to FF(α_CF), and perform [reset];
if α' is k, send μ to the learner and obtain a feedback number H, possibly together with a feedback decision α; if H is 0, trigger the corresponding RLSM transition and set the counter to 1; otherwise set the counter to H; if α is successfully received, set α' to α;
[reset] comprises the following:
if α' is k, set the counter to 1;
if α' is not k, set the counter to α';
the learner resets the reinforcement learning model to its initial state upon receiving a reset instruction from the collector. Upon receiving μ from the decision maker, it trains the reinforcement learning model with μ as the reward and feeds the decision number H returned by the model back to the decision maker; once training of the reinforcement learning model finishes, it converts the decision scheme into a value α and feeds it back to the decision maker.
For the parameter α_NF, the reinforcement learning process is the same; one only needs to replace α_CF with α_NF. The difference from learning α_CF is that for α_CF each participant's RLSM is learned by its own reinforcement learning optimizer, whereas for α_NF the RLSMs of all participants share one reinforcement learning optimizer. The learning process is not repeated here.
Existing reinforcement learning methods, such as a standard Markov decision process (MDP), may be employed to learn the parameter α_CF. Preferably, this example uses a Q-learning model. The reinforcement learning optimizer adds a collector and a decision maker on top of the learner built around the reinforcement learning model. The collector receives and caches the environment level L from the RLSM and sends the consolidated cached results to the decision maker when the level changes or the cache is full. The collector's caching avoids overly frequent access to the decision maker and the reinforcement learning model, reducing the computational overhead this component imposes on the system and avoiding any impact on transaction processing efficiency. The decision maker processes the received results; when its counter reaches zero, it sends the average throughput μ to the learner for learning and obtains a decision. Specifically, once the learner's reinforcement learning model finishes training, the decision maker caches the learned parameter α locally and makes subsequent decisions accordingly. The decision maker converts the decision learned by the learner into a parameter, which is convenient for the system to use; caching the parameter once learning completes also avoids continuous access to the learner, further reducing computational overhead.
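A compact sketch of the decision maker's counter logic for α_CF, continuing the earlier sketches (the learner interface `feedback(mu) -> (H, alpha_or_None)` is our assumption about the component boundary in FIG. 6):

```python
K = -1  # k: sentinel distinct from any value the decision maker can see

class Decider:
    """Decision maker of the reinforcement learning optimizer."""

    def __init__(self, learner, rlsm: RLSM) -> None:
        self.learner = learner
        self.rlsm = rlsm
        self.counter = 1    # initial counter value
        self.alpha = K      # alpha': learned alpha_CF, k until trained

    def receive(self, level: Level, mu: float) -> None:
        if level != Level.NO_FAULT:
            self._reset()
            return
        self.counter -= 1
        if self.counter == 0:
            self._decide(mu)

    def _decide(self, mu: float) -> None:
        if self.alpha != K:
            # training finished earlier: act on the cached parameter
            self.rlsm.step("FF_CF")   # trigger the FF(alpha_CF) transition
            self._reset()
            return
        h, alpha = self.learner.feedback(mu)  # reward = average throughput
        if h == 0:
            self.rlsm.step("FF_CF")
            self.counter = 1
        else:
            self.counter = h
        if alpha is not None:         # model converged: cache alpha_CF
            self.alpha = alpha

    def _reset(self) -> None:
        self.counter = 1 if self.alpha == K else self.alpha
```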
For distributed transaction commit, the present disclosure creatively adds the state machine RLSM to monitor the system environment quickly, provides a feasible scheme for distributed commit protocols that dispenses with the fixed assumption about the system environment, provides a method for a distributed transaction commit protocol to perceive the system environment, and thereby captures more optimization space. In addition, the disclosure adds reinforcement learning to improve the adaptability of the protocol and its performance in unstable environments, realizing automatic parameter tuning. To our knowledge, the RLSM is the first real-time monitor of the system environment (crash faults, network faults) that utilizes the existing results of the protocol and incorporates reinforcement learning.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure; the electronic device can execute the processing flow provided in the foregoing method embodiments. As shown in fig. 7, the electronic device 110 includes: a memory 111, a processor 112, a computer program, and a communications interface 113; the computer program is stored in the memory 111 and configured to be executed by the processor 112 to perform the method described above.
In addition, the embodiment of the present disclosure also provides a computer readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the above embodiment.
The following experiments were made to verify the above-described environmental detection method.
In the experiments, corresponding protocols were designed for the no-fault level and the crash fault level. For the crash fault level protocol, the participants exchange votes among themselves instead of going through the coordinator, exploiting the synchrony of the connections between participants and thereby reducing the message delays required to execute a transaction. For the no-fault level protocol, a technique of implicit voting is introduced on top of the crash fault level protocol, further reducing message transmission among participants. For the network fault level, we directly use 3PC.
We integrated these optimized protocols into a distributed transaction commit protocol that uses the environment monitoring method of this disclosure, named PRAC, and conducted extensive experiments on it to evaluate the performance improvement brought by the environment monitoring method.
In the following, the implementation details and the evaluation are presented; the evaluation covers two aspects:
comparison of PRAC with standard protocols such as 2PC (two-phase commit) and 3PC (three-phase commit) and one of the most advanced protocols at present, G-PAC (Sujaya Maiyya, faisal Nawab, divyakant Agrawal, and Amr El Abbadi.2019. Unity consensus data management and atomic recommendation for effective closed data management. Proceedings of the VLDB entity 12,5 (2019), 611-623.). We also tested a centralized variant of G-PAC, named C-PAC.
2. The effect of reinforcement learning on improving the adaptability of the PRAC in an unstable system environment.
(1) Implementation of an Experimental System
We implemented the RLSM-based distributed transaction commit protocol PRAC using Golang and Python. Our system processes transactions in a Percolator-like manner, using PreRead to limit the work of the distributed transaction commit protocol to the commit of write operations, thereby keeping other components from affecting the experimental comparison. All protocols are built on the same key-value store and share APIs such as Commit, RollBack, and PreRead, avoiding the influence of irrelevant factors. We map rows of a database table into key-value pairs and adjust the storage size according to the amount of data. For two-phase commit, we consulted the implementation design in TiDB (2021. TiDB. https://github.com/pingcap/tidb. Online; accessed: 2021-09-01); in our 2PC implementation, a message is retransmitted three times when it receives no reply. For the 3PC three-phase commit protocol, we adopted the design in (Suyash Gupta and Mohammad Sadoghi. 2018. EasyCommit: A Non-blocking Two-phase Commit Protocol. In EDBT. 157-168). The implementation of G-PAC comes from its authors (Sujaya Maiyya, Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. 2019. Unifying consensus and atomic commitment for effective cloud data management. Proc. VLDB Endow. 12, 5 (January 2019), 611-623). We also implemented a centralized version of G-PAC, denoted C-PAC, in which the head node of the protocol is fixed as the leader. Similar to PAC, C-PAC requires three phases to perform a transaction: it (i) collects initial votes from all nodes, (ii) reaches agreement on a majority of nodes, and (iii) sends the decision asynchronously to all nodes.
(2) Experimental setup
We performed the experiments using Google E2 servers deployed in four different data centers, comprising one coordinator and three participants. The data centers are located in city A (C_1), city B (C_2), city C (C_3), and city D (C*). We used city D (C*) as the coordinator node; the other nodes act as participant nodes. For all experiments, we used compute-optimized E2 medium machines, each equipped with 2 vCPUs, 4 GB of memory, a Debian GNU/Linux 10 (buster) system, and a 10 GB disk.
For small-scale testing, we used a YCSB-like dataset [(Sujaya Maiyya, Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. 2019. Unifying consensus and atomic commitment for effective cloud data management. Proceedings of the VLDB Endowment 12, 5 (2019), 611-623), (Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing. 143-154), (arXiv:2107.11378 (2021))] to evaluate the performance of all protocols. In the test, clients continuously generate transactions that read and write multiple records using closed-loop threads; to simulate a realistic environment, 90% of the operations concentrate on 10% of the data objects. In the original benchmark all transactions are cross-node, which may not reflect real-world workloads well, so in this experiment we let 30% of the transactions involve only a single node, with the remaining transactions spanning nodes. For large datasets we use TPC-C, the standard test suite for OLTP systems, which includes three types of read-write and two types of read-only transactions. We built warehouses in the three data centers separately; each warehouse contains 10 districts, each maintaining information for 3000 customers. We simulate clients with closed-loop threads that constantly send transactions to the coordinator, and control the load by adjusting the number of threads.
The experiments involve three groups of parameters. (i) The two parameters α_CF and α_NF control the number of consecutive fault-free executions required for the downward transitions from the crash fault level and the network fault level, respectively. (ii) We created two different environments, a stable environment and an unstable environment, to simulate the occurrence of faults in the real world. Specifically, we generate faults at a fixed frequency in the unstable environment: with every 2τ seconds forming a period, the system suffers a crash or network fault during the first τ seconds and then returns to normal for the remaining τ seconds. (iii) We use the network buffer parameter r to adjust the tolerance for long message delays: we multiply r by the longest message delay σ between any two participants to compute the upper message-delay bound used in the algorithm.
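A sketch of the periodic fault-injection schedule used in the unstable-environment runs (the `trigger_fault`/`clear_fault` hooks are hypothetical; in a setup like ours they would kill a participant process or delay its links):

```python
import threading
import time

def inject_faults(kind: str, tau: float, stop: threading.Event,
                  trigger_fault, clear_fault) -> None:
    """With period 2*tau seconds, keep the fault active for the first
    tau seconds, then run fault-free for the remaining tau seconds."""
    while not stop.is_set():
        trigger_fault(kind)   # e.g. crash a node or delay its messages
        time.sleep(tau)
        clear_fault(kind)     # restore normal operation
        time.sleep(tau)
```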
(3) Testing in a Stable Environment
First, we evaluated the performance of the PRAC, 2PC, 3PC, G-PAC, and C-PAC protocols in a stable environment without injected faults. FIG. 8 shows the results of testing these protocols in a stable environment with the YCSB-like dataset; the abscissa is the number of clients, the ordinate in (a) is throughput, and the ordinate in (b) is latency. PRAC exhibits significant improvements in both latency and throughput. When there are 512 clients (indicated by the dashed lines), PRAC has a 2.30-fold, 2.67-fold, 2.47-fold, and 1.62-fold throughput improvement over the four baseline protocols, while its latency is 94.8%, 145.3%, 10.4%, and 22.8% of theirs, respectively. We explain the causes of this result. First, PRAC outperforms 2PC because PRAC is non-blocking and therefore less susceptible to lagging messages under high contention. PRAC is also superior to 3PC, C-PAC, and G-PAC thanks to its ability to adapt to the system environment: in a stable environment, PRAC switches to the more lightweight no-fault level to process transactions. Note that in this experiment G-PAC achieves a larger advantage over 2PC in latency but a smaller one in throughput, which differs slightly from the results reported in the literature (Sujaya Maiyya, Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. 2019. Unifying consensus and atomic commitment for effective cloud data management. Proceedings of the VLDB Endowment 12, 5 (2019), 611-623). This is because each transaction is retried 3 times in our 2PC implementation, increasing throughput at the expense of higher latency. The asynchronous decision phase of C-PAC reduces its latency overhead, and C-PAC does not need to re-elect a leader when executing a transaction, achieving lower latency than G-PAC.
(4) Testing in unstable environments
We evaluate the behavior of PRAC in unstable environments by injecting crash or network faults into the nodes and network connections. The fault injection is controlled by the parameter τ: once triggered, a fault lasts τ seconds, with a period of 2τ seconds. We start each experiment 5 + 2τ seconds after all machines boot to ensure the results are stable. Each protocol was run 10 times and the results averaged.
We adjust the parameters α_CF and α_NF to balance level-transition overhead against transition speed. Specifically, α_CF and α_NF control how many fault-free executions the RLSM must see before resetting to the no-fault level. We fixed α_CF and α_NF within each experimental run and varied them between experimental groups to explore the effect of these parameters on the protocol. In FIG. 9, we study the impact of different α_CF and α_NF on PRAC performance under different degrees of environmental stability.
Specifically, FIG. 9 (a) shows throughput at different α_CF values under periodic crash failures. It can be observed that throughput first rises and then falls as α_CF increases. We attribute this non-monotonic trend to two opposing effects of increasing α_CF. First, when α_CF grows, PRAC requires more fault-free executions before the RLSM transitions to the no-fault level, so robustness-decreasing transitions occur less frequently, and consequently fewer transactions enter the wrong path. FIG. 10 (a) shows that as α_CF increases, fewer transactions enter the wrong path, i.e., unnecessary robustness-decreasing transitions and transaction re-runs are reduced. As shown by the 1s-CF line in FIG. 9 (a), this overhead saving is particularly significant when the environment is unstable (τ = 1 s); in that case the system suffers a crash fault lasting 1 second every 2 seconds, and PRAC throughput rises from 618 tps to 778 tps as α_CF increases from 1 to 16. On the other hand, overly cautious robustness-decreasing transitions compromise the adaptability of PRAC: the RLSM tends to remain at a stricter level and thus misses opportunities to process transactions with lightweight protocols, as shown in FIG. 10 (b). The average level of PRAC for all τ increases with α_CF, which indicates that more transactions are processed at a more restrictive level. This explains why, for τ = 1 s, PRAC throughput drops to 466 tps as α_CF grows to 128.
FIG. 9 (b) reports PRAC throughput at different α_NF (α_NF controls the robustness-decreasing transition from the network fault level). We can observe that PRAC always prefers a smaller α_NF. When τ is 1 s, 4 s, or 16 s, PRAC reaches its peak performance with α_NF near 1; the throughputs reach 1540, 1303, and 1273 tps, respectively, as shown by the dashed lines. We attribute the different preferences for α_CF and α_NF to differences in performance gain, i.e., the throughput gap between the crash fault level protocol PRAC_CF and the network fault level protocol PRAC_NF. Specifically, we compare three non-switchable versions of PRAC, each operating at a fixed level. FIG. 11 compares the throughput gain of switching from the crash fault level (PRAC_CF) or the network fault level (PRAC_NF) to the no-fault level (PRAC_FF). It can be observed that the former achieves a throughput improvement of about 1.6-fold, while the latter reaches up to 2.14-fold. This difference explains why PRAC is always better suited to a smaller α_NF, i.e., fast robustness-decreasing transitions: PRAC still gains performance from the more relaxed level despite the increase in false assumptions and the consequent costly corrections.
We have shown in the above experiments that the robustness-decreasing transition parameter α_CF is difficult to tune. Specifically, FIG. 9 (a) shows that as the system becomes unstable, the optimal value of α_CF changes from 2 to 16. This demonstrates the difficulty of adjusting the parameter manually and motivates us to use reinforcement learning (RL) to adjust the parameter from feedback as system performance changes.
FIG. 12 (a) reports the throughput PRAC achieves when α_CF is adjusted by reinforcement learning and when it is fixed to 1 and to 128. In particular, reinforcement learning learns α_CF = 16, 4, 1 for the environments 1s-CF, 4s-CF, and 16s-CF, respectively, while it learns α_NF = 1, 2, 2 for 1s-NF, 4s-NF, and 16s-NF. We can observe that the RL-adjusted α_CF always yields better throughput in all experimental settings (faults occur periodically with τ = 1, 4, and 16 s). More specifically, when τ = 1 s, the RL-learned α_CF obtains a 25.7% throughput improvement compared to α_CF = 1; compared to α_CF = 128, the improvement even reaches 66.7%. In further investigation we found that the learned α_CF (average value) is close to 16 in this setting. In contrast, the performance gain decreases when τ = 4 s: using the learned parameter only improves throughput by 6.4%, because the learned α_CF is 4, close to the fixed value α_CF = 1, resulting in only a small performance difference. Nevertheless, its throughput is still 39.1% higher than with α_CF = 128. Finally, when τ = 16 s, the learned parameter and α_CF = 1 achieve the same performance (779 tps). This meets our expectation, since the learned parameter is 1, which has exactly the same effect as fixing α_CF = 1. Furthermore, both improve throughput by 21.9% over α_CF = 128. Overall, this experiment shows that adjusting α_CF at the crash fault level with RL yields better results than assigning α_CF a fixed value.
In addition, we also apply RL to adjust α_NF and compare the throughput with α_NF fixed to 1 and to 128. FIG. 12 (b) shows that the throughput with the RL-adjusted α_NF is significantly better than with α_NF = 128, but slightly behind and close to that with α_NF = 1. In particular, when τ is 1 s, the RL-adjusted α_NF and α_NF = 1 yield similar throughput (1540 tps), because the learned α_NF is also 1. We described the optimal parameter selection for τ = 1, 4, and 16 s in FIG. 9 (b), where α_NF = 1 is the optimum. This means RL can still adjust α_NF to a near-optimal value: it produces throughput close to the almost-optimal setting α_NF = 1. In practice, however, we propose configuring the parameter α_NF directly to 1 to support fast robustness-decreasing transitions for best performance.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims (10)

1. A method for monitoring the operating environment of a distributed transaction commit protocol, involving a coordinator C* and several participants C_T of a distributed transaction T, characterized in that:
C* maintains a robustness level state machine RLSM for each participant;
the RLSM contains three states, representing three fault levels of the environment in which the distributed transaction commits: a no-fault level, a crash fault level, and a network fault level;
the RLSM transitions among the three states according to different inputs;
C* determines an environment level L from the RLSM states of the participants in C_T, wherein L is the no-fault level, the crash fault level, or the network fault level, and wherein L is used to determine the commit protocol Π_T of the distributed transaction T.
2. The method of claim 1, wherein: said L is the RLSM state level of the participant in said C_T with the worst system environment, the system environment being, from best to worst, the no-fault level, the crash fault level, and the network fault level.
3. The method of claim 2, wherein:
the initial state of the RLSM is the no-fault level;
the input is CF, NF, FF(α_CF), or FF(α_NF), wherein CF denotes a crash fault, NF denotes a network fault, FF(α_CF) denotes the robustness-decreasing transition from the crash fault level, i.e., said RLSM at the crash fault level drops to the no-fault level after α_CF consecutive fault-free executions, and FF(α_NF) denotes the robustness-decreasing transition from the network fault level, i.e., said RLSM at the network fault level drops to the no-fault level after α_NF consecutive fault-free executions;
the RLSM transitions among the three states according to different inputs:
when the RLSM is at the no-fault level, if input CF is received, robustness is raised and the RLSM moves to the crash fault level, and if input NF is received, robustness is raised and the RLSM moves to the network fault level;
when the RLSM is at the crash fault level, if input NF is received, robustness is raised and the RLSM moves to the network fault level; if input FF(α_CF) is received, robustness is lowered and the RLSM moves to the no-fault level;
when the RLSM is at the network fault level, if input FF(α_NF) is received, robustness is lowered and the RLSM moves to the no-fault level.
4. The method of claim 3, wherein: said input is determined from Π_T and the execution result R_T of said T according to the following rules:
if Π_T is a no-fault level protocol, check said R_T to judge whether a malignant network fault occurred; if so, input NF and adjust said L to the network fault level; otherwise check whether said R_T indicates that a participant suffered a malignant crash fault, and if so, input CF and adjust said L to the crash fault level;
if Π_T is a crash fault level protocol, check said R_T to judge whether a malignant network fault occurred; if so, input NF and adjust said L to the network fault level; otherwise check the number of consecutive fault-free executions, and if the number reaches said α_CF, input FF(α_CF);
if Π_T is a network fault level protocol, check the number of consecutive fault-free executions, and if the number reaches said α_NF, input FF(α_NF).
5. The method of claim 4, wherein:
determination condition of a malignant crash fault: if Π_T is a no-fault-level protocol or a crash-fault-level protocol, check R_T; if any participant's result is missing, judge that a malignant crash fault has occurred on the missing participant;
determination condition of a malignant network fault: if Π_T is a no-fault-level protocol and no malignant crash fault is detected, check whether R_T contains conflicting decisions; if so, judge that a malignant network fault has occurred between the participants; if Π_T is a crash-fault-level protocol, then if R_T contains conflicting decisions, judge that a malignant network fault has occurred between the participants.
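
Claims 4 and 5 together define how an input is derived from Π_T and R_T. A hedged sketch follows, reusing the Level enum from the sketch above and assuming, purely for illustration, that R_T is a mapping from participant to its commit/abort decision; the helper names and this representation of R_T are not part of the claims.

    def missing_participants(R_T, expected):
        # Claim 5: a participant whose result is absent from R_T is judged
        # to have suffered a malignant crash fault.
        return [p for p in expected if p not in R_T]

    def has_conflicting_decisions(R_T):
        # Claim 5: R_T contains conflicting (e.g. both commit and abort) decisions.
        return len(set(R_T.values())) > 1

    def determine_input(pi_level, R_T, expected, fault_free_run, alpha_cf, alpha_nf):
        """Map the protocol level of Π_T and the result R_T to an RLSM input, per claim 4."""
        if pi_level == Level.NO_FAULT:
            crashed = missing_participants(R_T, expected)
            # Claim 5: at this level a malignant network fault is declared only
            # if no malignant crash fault has been detected.
            if has_conflicting_decisions(R_T) and not crashed:
                return 'NF'        # adjust L to the network-fault level
            if crashed:
                return 'CF'        # adjust L to the crash-fault level
            return None
        if pi_level == Level.CRASH_FAULT:
            if has_conflicting_decisions(R_T):
                return 'NF'        # conflicting decisions alone imply a network fault
            if fault_free_run >= alpha_cf:
                return 'FF'        # i.e. FF(α_CF)
            return None
        # network-fault-level protocol: only FF(α_NF) can be produced
        if fault_free_run >= alpha_nf:
            return 'FF'            # i.e. FF(α_NF)
        return None
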
6. The method of claim 4, wherein: said α_CF is obtained through reinforcement learning based on said L.
7. The method of claim 6, wherein: the reinforcement learning is realized through a reinforcement learning optimizer, which comprises a collector, a decision maker, and a learner; wherein:
the collector receives and caches said L; if L differs from the last cached environment level, the collector empties its buffer, sends the average throughput μ of the system during the buffering period together with L to the decision maker, and sends a reset instruction to the learner; if L is the same as the last cached environment level and the buffer is full, the collector empties the buffer and sends the average throughput μ of the system during the buffering period together with L to the decision maker;
the decision maker is provided with a counter and stores a parameter α′, where α′ represents the α_CF to be learned by reinforcement learning; the initial value of the counter is 1; the initial value of α′ is k, where k is a real number different from any value that may appear during the operation of the decision maker; upon receiving an (L, μ) pair, the decision maker judges:
if L is the no-fault level, decrement the counter by 1; if the counter reaches 0, perform [decision]; otherwise, exit;
if L is not the no-fault level, perform [reset];
said [decision] comprises the following:
if α′ is not k, trigger the corresponding RLSM transition, i.e., set the input of the RLSM to FF(α_CF);
if α′ is k, send μ to the learner and obtain a feedback decision number H, possibly together with a feedback decision α; if H is 0, trigger the corresponding RLSM transition and set the counter to 1; otherwise, set the counter to H; if α is successfully received, set α′ to α;
said [reset] comprises the following:
if α′ is k, set the counter to 1;
if α′ is not k, set the counter to α′;
the learner, upon receiving a reset instruction from the collector, restores the reinforcement learning model to its initial state; upon receiving μ from the decision maker, it trains the reinforcement learning model with μ as the reward and feeds the decision number H output by the model back to the decision maker; when training of the reinforcement learning model is finished, it converts the decision scheme into a numerical value α and feeds α back to the decision maker.
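
The decision-maker loop of claim 7 can be rendered as the sketch below (claim 9 is the symmetric α_NF variant). The sentinel k, the counter, and the [decision]/[reset] procedures follow the claim text; the class name, the callback interface, and the behavior where the claim is silent (which α the H = 0 transition uses, and how the counter is re-armed after a transition) are assumptions.

    K = object()  # the sentinel k: distinct from any value seen at run time

    class DecisionMaker:
        """Decision maker for α_CF per claim 7; claim 9 is symmetric for α_NF."""

        def __init__(self, learner, trigger_ff):
            self.learner = learner        # learner component of the optimizer
            self.trigger_ff = trigger_ff  # callback that feeds FF(α_CF) to the RLSM
            self.counter = 1              # the initial counter value is 1
            self.alpha_prime = K          # α′ starts as the sentinel k

        def on_pair(self, L, mu):
            """Handle one (L, μ) pair received from the collector."""
            if L == Level.NO_FAULT:
                self.counter -= 1
                if self.counter == 0:
                    self._decision(mu)
            else:
                self._reset()

        def _decision(self, mu):
            if self.alpha_prime is not K:
                # A learned α is available: set the RLSM input to FF(α_CF).
                self.trigger_ff(self.alpha_prime)
                self.counter = self.alpha_prime  # assumption: re-arm the counter
            else:
                # Still learning: report the reward μ; receive decision number H
                # and, once training has finished, the final α.
                H, alpha = self.learner.feedback(mu)
                if H == 0:
                    self.trigger_ff(None)  # the claim leaves the α used here open
                    self.counter = 1
                else:
                    self.counter = H
                if alpha is not None:
                    self.alpha_prime = alpha

        def _reset(self):
            self.counter = 1 if self.alpha_prime is K else self.alpha_prime
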
8. The method of claim 4, wherein: said α_NF is obtained through reinforcement learning based on said L.
9. The method of claim 8, wherein: the reinforcement learning is realized through a reinforcement learning optimizer, which comprises a collector, a decision maker, and a learner; wherein:
the collector receives and caches said L; if L differs from the last cached environment level, the collector empties its buffer, sends the average throughput μ of the system during the buffering period together with L to the decision maker, and sends a reset instruction to the learner; if L is the same as the last cached environment level and the buffer is full, the collector empties the buffer and sends the average throughput μ of the system during the buffering period together with L to the decision maker;
the decision maker is provided with a counter and stores a parameter α′, where α′ represents the α_NF to be learned by reinforcement learning; the initial value of the counter is 1; the initial value of α′ is k, where k is a real number different from any value that may appear during the operation of the decision maker; upon receiving an (L, μ) pair, the decision maker judges:
if L is the no-fault level, decrement the counter by 1; if the counter reaches 0, perform [decision]; otherwise, exit;
if L is not the no-fault level, perform [reset];
said [decision] comprises the following:
if α′ is not k, trigger the corresponding RLSM transitions, i.e., set the input of the RLSM corresponding to each participant in said C_T to FF(α_NF);
if α′ is k, send μ to the learner and obtain a feedback decision number H, possibly together with a feedback decision α; if H is 0, trigger the corresponding RLSM transitions and set the counter to 1; otherwise, set the counter to H; if α is successfully received, set α′ to α;
said [reset] comprises the following:
if α′ is k, set the counter to 1;
if α′ is not k, set the counter to α′;
the learner, upon receiving a reset instruction from the collector, restores the reinforcement learning model to its initial state; upon receiving μ from the decision maker, it trains the reinforcement learning model with μ as the reward and feeds the decision number H output by the model back to the decision maker; when training of the reinforcement learning model is finished, it converts the decision scheme into a numerical value α and feeds α back to the decision maker.
10. The method according to claim 7 or 9, wherein the reinforcement learning model is a Q-learning model.
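
Claim 10 fixes the learner's model as Q-learning. A toy, single-state (bandit-style) tabular Q-learner compatible with the decision-maker sketch above might look as follows; the candidate set, learning rate, exploration rate, and fixed training budget are all illustrative assumptions, not claimed values.

    import random

    class QLearner:
        """Toy single-state Q-learning over candidate α values (illustrative only)."""

        def __init__(self, candidates=(1, 2, 4, 8, 16), lr=0.1, eps=0.2, budget=200):
            self.candidates = candidates            # actions: candidate α values
            self.q = {a: 0.0 for a in candidates}   # one Q-value per candidate
            self.lr, self.eps = lr, eps
            self.remaining = budget                 # training steps before convergence
            self.last = None                        # candidate tried most recently

        def feedback(self, mu):
            """Take the throughput μ as the reward; return (H, final alpha or None)."""
            if self.last is not None:
                # single-state Q-update (no successor state, hence no discount term)
                self.q[self.last] += self.lr * (mu - self.q[self.last])
            self.remaining -= 1
            if self.remaining <= 0:
                # training finished: convert the decision scheme into the value α
                return 0, max(self.q, key=self.q.get)
            # ε-greedy choice of the next candidate to try
            if random.random() < self.eps:
                self.last = random.choice(self.candidates)
            else:
                self.last = max(self.q, key=self.q.get)
            return self.last, None   # decision number H

        def reset(self):
            """Collector's reset instruction: restore the initial model state."""
            self.q = {a: 0.0 for a in self.candidates}
            self.last = None
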
CN202210411985.1A 2022-04-19 2022-04-19 Operation environment monitoring method of distributed transaction submission protocol Pending CN115145784A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210411985.1A CN115145784A (en) 2022-04-19 2022-04-19 Operation environment monitoring method of distributed transaction submission protocol

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210411985.1A CN115145784A (en) 2022-04-19 2022-04-19 Operation environment monitoring method of distributed transaction submission protocol

Publications (1)

Publication Number Publication Date
CN115145784A true CN115145784A (en) 2022-10-04

Family

ID=83405799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210411985.1A Pending CN115145784A (en) 2022-04-19 2022-04-19 Operation environment monitoring method of distributed transaction submission protocol

Country Status (1)

Country Link
CN (1) CN115145784A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115904638A (en) * 2022-11-23 2023-04-04 中国科学院软件研究所 Intelligent management method and system for database affairs
CN115904638B (en) * 2022-11-23 2023-07-25 中国科学院软件研究所 Intelligent management method and system for database transaction

Similar Documents

Publication Publication Date Title
US10621200B2 (en) Method and apparatus for maintaining replica sets
CN111159252B (en) Transaction execution method and device, computer equipment and storage medium
Aguilera et al. Microsecond consensus for microsecond applications
US8868492B2 (en) Method for maximizing throughput and minimizing transactions response times on the primary system in the presence of a zero data loss standby replica
US9690679B2 (en) Transaction commitment and replication in a storage system
US7809690B2 (en) Performance metric-based selection of one or more database server instances to perform database recovery
JP2023546249A (en) Transaction processing methods, devices, computer equipment and computer programs
CN111190935B (en) Data reading method and device, computer equipment and storage medium
US20220245173A1 (en) Blockchain-based data synchronization method, apparatus, and computer-readable storage medium
US9710344B1 (en) Locality based quorum eligibility
US8065280B2 (en) Method, system and computer program product for real-time data integrity verification
WO2013025523A1 (en) Method and system for reducing write latency for database logging utilizing multiple storage devices
US20090043845A1 (en) Method, system and computer program for providing atomicity for a unit of work
US8046413B2 (en) Automatic commutativity detection for generalized paxos
CN115145784A (en) Operation environment monitoring method of distributed transaction submission protocol
WO2011120452A2 (en) Method for updating data and control apparatus thereof
US20230110826A1 (en) Log execution method and apparatus, computer device and storage medium
CN112214649B (en) Distributed transaction solution system of temporal graph database
CN110716793A (en) Execution method, device, equipment and storage medium of distributed transaction
US20230315713A1 (en) Operation request processing method, apparatus, device, readable storage medium, and system
US20140164324A1 (en) Utilization of data structures to synchronize copies of a resource
WO2023216636A1 (en) Transaction processing method and apparatus, and electronic device
CN114722062A (en) Self-adaptive distributed transaction submission method
Kokocinski et al. Make the leader work: Executive deferred update replication
CN114816682A (en) Distributed transaction processing method, system and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination