CN115878696A

CN115878696A - High-availability method and device for distributed data processing cluster

Info

Publication number: CN115878696A
Application number: CN202310201739.8A
Authority: CN
Inventors: 张军朋; 王元; 张乐; 伍斯; 罗盛君; 孙振宇; 李大鹏; 李晓伟; 李楠; 张汉勇; 郭延臣
Original assignee: China Xian Satellite Control Center
Current assignee: China Xian Satellite Control Center
Priority date: 2023-03-06
Filing date: 2023-03-06
Publication date: 2023-03-31
Anticipated expiration: 2043-03-06
Also published as: CN115878696B

Abstract

The invention discloses a high-availability method and a high-availability device for a distributed data processing cluster. The method comprises the following steps: when a distributed lock controller shared by a plurality of data processing subunits initiates lock grabbing, continuously querying whether first lock information of the plurality of data processing subunits exists in a distributed cache; if so, reading the first lock information in the distributed cache and judging whether the node identifier of the first lock information is consistent with the present node; if the node identifiers are consistent and the processing mark is "processing", creating second lock information and setting the data processing state of a service state controller to active, wherein the second lock information is the first lock information updated with the latest update timestamp and a new universal unique identification code; and storing the second lock information, and processing data with the data processing subunit, data processing service unit, or node whose data processing state is active acting as the main one. The method can solve the availability problem of the distributed lock when cluster time is discontinuous.

Description

High-availability method and device for distributed data processing cluster

Technical Field

The invention relates to the technical field of high availability of data processing systems, in particular to a high availability method and device of a distributed data processing cluster.

Background

To ensure the reliability of data processing, a conventional data processing system usually adopts a node-centered master/standby mode. This mode requires the two nodes to communicate with each other for data interaction; data processing is performed only on the master node, and the master node synchronizes its state to the standby node, the synchronized content including information such as the master/standby marks and data processing states. With the rapid growth of data processing scale and complexity, the defects of the traditional centralized data processing system have gradually become apparent: the standby machine is not fully used, the system is not independently controllable, scalability is poor, and deployment is not flexible enough, so that the system cannot meet the development requirements of increasingly complex and diverse data processing.

To overcome the disadvantages of the traditional centralized data processing system architecture, the data processing system may be designed as a distributed architecture in which the system is split by function into a plurality of data processing service units, each data processing service unit includes a plurality of data processing subunits, and the data processing service units operate in a cluster environment. Data processing service units operating in a cluster environment need to address both the consistency of data processing and high availability. The same data must not be processed multiple times, as this would have catastrophic consequences. For data processing service units in a cluster environment, distributed locks are a good way to solve these problems.

At present, many distributed lock solutions exist at home and abroad, and the common solutions fall into two types: lock services based on a distributed consensus algorithm, such as Zookeeper and Chubby; and lock services based on a distributed cache and their variants, such as lock services implemented using Redis and RedLock, which is implemented on top of Redis. Distributed locks implemented on a distributed cache such as Redis are simple, reliable, and efficient, and a high-availability distributed data processing system can be built on this design. However, a distributed lock implemented on a conventional distributed cache generally sets a validity period on the key and judges whether the lock exists according to whether the key exists. A distributed lock realized in this way places high requirements on the time continuity and consistency of each node of the cluster: if the time among the nodes is inconsistent and the nodes frequently synchronize their time from the time reference source, the distributed lock may fail frequently, causing service state switches in system data processing that should not occur.
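As a non-limiting illustration of the problem described above, the following Python sketch models a lock that is judged only by the existence and validity period of its key. The TtlLockCache and FakeClock classes, the key name, and the 60-second jump are hypothetical stand-ins for a real distributed cache and time source; the sketch only shows how a step in the time source can make a lock that is still held appear to have expired.

```python
class FakeClock:
    """Manually driven clock so that a time jump can be simulated."""

    def __init__(self, start=0.0):
        self.now = start

    def __call__(self):
        return self.now


class TtlLockCache:
    """Toy in-memory cache whose keys expire by wall-clock time (hypothetical)."""

    def __init__(self, clock):
        self._clock = clock      # callable returning the current time in seconds
        self._entries = {}       # key -> (owner, expires_at)

    def acquire(self, key, owner, ttl):
        """Take the lock if the key is absent or its validity period has passed."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0] == owner   # key still valid: only the holder succeeds
        self._entries[key] = (owner, now + ttl)
        return True


clock = FakeClock()
cache = TtlLockCache(clock)
first = cache.acquire("subunit-1071", "node-101", ttl=5.0)    # node-101 takes the lock
blocked = cache.acquire("subunit-1071", "node-102", ttl=5.0)  # node-102 is refused
clock.now += 60.0   # the clock is stepped forward by a time synchronization
stolen = cache.acquire("subunit-1071", "node-102", ttl=5.0)   # lock changes hands
```

In this toy model node-101 never stopped working, yet the lock moves to node-102 purely because of the time step, which is the kind of unwanted service state switch the background describes.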

Disclosure of Invention

The present invention is directed to solving at least one of the problems of the prior art. To this end, the first aspect of the present invention provides a distributed data processing cluster high availability method, which is applied to a plurality of data processing subunits, a plurality of data processing service units, or a plurality of nodes of a data processing cluster, wherein each node is provided with a plurality of different data processing service units and a distributed cache, each data processing service unit includes a plurality of data processing subunits, the data processing service units with the same function on the nodes form a cluster, and a distributed lock controller and a service state controller are provided on each node. The method includes:

when a distributed lock controller shared by a plurality of data processing subunits initiates lock grabbing, continuously inquiring whether first lock information of the plurality of data processing subunits exists in a distributed cache or not, wherein the first lock information comprises a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universal unique identification code and a processing mark;

if the first lock information exists, reading the first lock information in the distributed cache and judging whether the node identification of the first lock information is consistent with the node where the data processing subunit is located;

if the node identifiers are consistent and the processing marks are processing, creating second lock information and setting the data processing state of the service state controller to be a first state, wherein the second lock information is the first lock information updated by the latest updating time stamp and the universal unique identification code;

and writing the second lock information into the distributed cache, and processing the data by taking a data processing subunit with the data processing state being the first state as a main state, or taking a data processing service unit as a main state, or taking a node as a main state.

Further, if the first lock information of the data processing subunit does not exist, writing the first lock information of the data processing subunit into the distributed cache and setting the data processing state of the node to be the second state.

Further, if the node identification of the first lock information is inconsistent with the node where the data processing subunit is located, setting the data processing state to a second state and judging whether consecutive reads of the second lock information are consistent; if so, incrementing the count of the lock invariance counter by one, and if the count of the lock invariance counter exceeds a preset value, creating third lock information.

Further, if consecutive reads of the second lock information are inconsistent, saving the most recently read first lock information and clearing the count of the lock invariance counter, the data processing subunit remaining in a lock-grabbing state.

Further, the plurality of data processing subunits includes at least one data processing subunit.

Further, the method also includes:

when data is processed at a new node, fourth lock information is created and written into the distributed cache, the lock identifier of the fourth lock information is a new data processing subunit identifier, the creation timestamp is the current time, the update timestamp is the current time, the node identifier is a node identifier of the new node, the universal unique identifier code is a new random universal unique identifier code, and the processing mark is processing.

Further, when the data processing state is changed, the fifth lock information is newly created.

The invention also provides a distributed data processing cluster high-availability device, which is located on a plurality of data processing subunits, a plurality of data processing service units, or a plurality of nodes of a data processing cluster, wherein each node is provided with a plurality of different data processing service units and a distributed cache, each data processing service unit comprises a plurality of data processing subunits, the data processing service units with the same function on the nodes form a cluster, and the nodes are provided with distributed lock controllers and service state controllers. The device comprises:

the first judging module is used for continuously inquiring whether first lock information of the plurality of data processing subunits exists in the distributed cache or not when a distributed lock controller shared by the plurality of data processing subunits initiates lock grabbing, wherein the first lock information comprises a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universal unique identification code and a processing mark;

the second judging module is used for reading the first lock information in the distributed cache and judging whether the node identification of the first lock information is consistent with the node where the data processing subunit is located if the first lock information exists;

the state identification module is used for creating second lock information and setting the data processing state of the service state controller to be a first state if the node identifications are consistent and the processing marks are processing, wherein the second lock information is first lock information updated by using the latest update timestamp and the universal unique identification code;

and the main/standby determining module is used for writing the second lock information into the distributed cache, and taking a data processing subunit with the data processing state being the first state as a main state, or taking a data processing service unit as the main state, or taking a node as the main state to process the data.

The present invention also provides an electronic device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, the at least one instruction, the at least one program, set of codes, or set of instructions being loaded and executed by the processor to implement the distributed data processing cluster high availability method of the first aspect.

The invention also provides a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the distributed data processing cluster high availability method of the first aspect.

Compared with the prior art, the embodiment of the invention provides a high-availability method and a device for a distributed data processing cluster, which have the following beneficial effects:

1. when judging the validity of the distributed lock, the invention checks not only whether the lock exists but also whether the information content of the lock object keeps changing, which solves the availability problem of the distributed lock when cluster time is discontinuous;

2. the high-availability method provided by the invention is not limited by the number of nodes;

3. the high availability method provided by the invention can independently control a plurality of data processing subunits in each data processing service unit, and the high availability control is not limited to a single data processing service unit or the whole node.

Drawings

In order to more clearly illustrate the technical solution of the present invention, the drawings used in the description of the embodiment or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art it is also possible to derive other drawings from these drawings without inventive effort.

Fig. 1 is a flowchart of a high availability method for a distributed data processing cluster according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a data processing cluster of a distributed data processing cluster high availability method according to an embodiment of the present invention;

Fig. 3 is a flowchart illustrating a determination process of a distributed data processing cluster high availability method according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a distributed data processing cluster high-availability device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

The present specification provides the method steps as described in the examples or flowcharts, but more or fewer steps may be included based on routine or non-inventive effort. In practice, the system or server product may execute the steps sequentially or in parallel (e.g., in a parallel-processor or multi-threaded environment) according to the embodiments or methods shown in the figures.

As shown in Figs. 1-3, the present invention provides a high availability method for a distributed data processing cluster, which is applied to a plurality of data processing subunits, a plurality of data processing service units, or a plurality of nodes of a data processing cluster, where each node is provided with a plurality of different data processing service units and a distributed cache, each data processing service unit includes a plurality of data processing subunits, the data processing service units with the same function on the nodes form a cluster, and each node is provided with a distributed lock controller and a service state controller. The method includes:

Step 1, when a distributed lock controller shared by a plurality of data processing subunits initiates lock grabbing, continuously inquiring whether first lock information of the plurality of data processing subunits exists in a distributed cache or not, wherein the first lock information comprises a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universal unique identification code and a processing mark;

It should be noted that, as shown in Fig. 2, the cluster may include a node 101, a node 102, a node 103, and a network switch 113. The nodes 101, 102 and 103 run Redis services 104, 105 and 106, respectively. The nodes 101, 102 and 103 run data processing service units 107, 108 and 109, respectively, and also run data processing service units 110, 111 and 112, respectively. The data processing service unit 107, the data processing service unit 108 and the data processing service unit 109 form a first service cluster, and the data processing service unit 110, the data processing service unit 111 and the data processing service unit 112 form a second service cluster.

The data processing service unit 107 includes a data processing subunit 1071 and a data processing subunit 1072, the data processing service unit 108 includes a data processing subunit 1081 and a data processing subunit 1082, and the data processing service unit 109 includes a data processing subunit 1091 and a data processing subunit 1092; the data processing service unit 110 includes a data processing subunit 1101 and a data processing subunit 1102, the data processing service unit 111 includes a data processing subunit 1111 and a data processing subunit 1112, and the data processing service unit 112 includes a data processing subunit 1121 and a data processing subunit 1122. Homogeneous data processing subunits, i.e. subunits with the same function on different nodes, together provide high availability for data processing.

Node 101, node 102, and node 103 are interconnected via network switch 113. The nodes 101, 102 and 103 may be servers, industrial personal computers, or PCs based on architectures such as x86_64, MIPS64 and ARM, and the operating system running on them is a Linux operating system that supports the respective architecture. The Redis service 104, the Redis service 105 and the Redis service 106 form a Redis high-availability cluster. It should be understood that forming a Redis high-availability cluster from Redis service 104, Redis service 105, and Redis service 106 is merely illustrative, and the present invention does not necessarily use Redis services to store distributed locks.

It should be understood that the number of nodes of the cluster in Fig. 2, the number of data processing service units running on each node, and the number of subunits in each data processing service unit are illustrative, and the data processing service units running on each node need not be the same. According to actual needs, any number of nodes can be provided, and the data processing service units running on each node can be flexibly configured.

In step 1, when initiating lock grabbing, a distributed lock controller shared by multiple data processing subunits continuously queries whether first lock information of the multiple data processing subunits exists in a distributed cache. The distributed lock is also reentrant, that is, the lock can be locked again by its holder while the lock is held, and can also be locked again by a non-holder. The key to realizing a distributed lock is the lock-grabbing mechanism, whose core idea is as follows: the first contender writes its own identifier, such as the service IP, into the data; a later contender finds that the identifier already exists, fails to grab the lock, and continues to wait. When the first contender finishes executing its method, it clears the identifier and the other contenders continue to grab the lock.

For lock grabbing, a distributed lock controller is provided for each data processing subunit in each data processing service unit, or for all the data processing subunits in each data processing service unit as a whole, or for all the data processing subunits of all the data processing service units on each node as a whole, and the distributed lock controller is used for continuously inquiring whether the first lock information of the plurality of data processing subunits exists in the distributed cache. The first lock information, i.e. the information content of the distributed lock object, comprises a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universal unique identification code and a processing mark. The distributed lock identifier has a one-to-one correspondence with the data processing subunit it locks. Whether the distributed lock exists is judged according to whether a corresponding key exists in Redis.
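As a non-limiting illustration, the six fields of the lock information listed above can be modelled as a small record that is serialized before being stored under the lock's key. The field names, the JSON encoding, and the helper functions in the following Python sketch are assumptions made for illustration only; the description does not specify a storage format.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LockInfo:
    lock_id: str        # lock identifier: the data processing subunit being locked
    created_at: float   # creation timestamp
    updated_at: float   # update timestamp
    node_id: str        # node identifier of the lock holder
    token: str          # universal unique identification code, new on every write
    processing: bool    # processing mark: True = processing, False = unprocessed


def new_lock(lock_id, node_id, now=None):
    """Build lock information for a freshly created lock."""
    now = time.time() if now is None else now
    return LockInfo(lock_id, now, now, node_id, str(uuid.uuid4()), True)


def to_cache_value(lock):
    """Serialize lock information into a string value for the cache."""
    return json.dumps(asdict(lock), sort_keys=True)


def from_cache_value(raw):
    """Rebuild lock information from a value read out of the cache."""
    return LockInfo(**json.loads(raw))


lock = new_lock("subunit-1071", "node-101", now=1000.0)
restored = from_cache_value(to_cache_value(lock))
```

Because the universal unique identification code is regenerated on every write, two writes made at the same timestamp still produce distinguishable lock contents.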

In the present embodiment, the first case: each node is provided with a plurality of data processing service units, the same data processing service unit of each node in the cluster forms a service cluster, each data processing service unit comprises a plurality of data processing subunits with different functions, and each subunit comprises a distributed lock controller and a state controller. The method is suitable for determining the main and standby data processing subunits.

In the second case, all data processing subunits of one or several data processing service units share one distributed lock controller and one service state controller. The method is suitable for determining the main and standby data processing service units.

In a third case, all data processing subunits of all data processing service units on a node share one distributed lock. The method is suitable for determining the main and standby nodes.
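As a non-limiting illustration, the three cases above differ only in what a single lock covers, which can be expressed through the key under which the lock information is stored. The Scope enumeration, the "ha-lock:" key prefix, and the service and subunit names in the following Python sketch are hypothetical and are not taken from the description.

```python
from enum import Enum


class Scope(Enum):
    SUBUNIT = "subunit"   # first case: one lock per data processing subunit
    SERVICE = "service"   # second case: one lock shared by a service unit's subunits
    NODE = "node"         # third case: one lock shared by every subunit on a node


def lock_key(scope, service=None, subunit=None):
    """Build the cache key for a lock; the naming scheme is hypothetical."""
    if scope is Scope.SUBUNIT:
        return f"ha-lock:{service}:{subunit}"
    if scope is Scope.SERVICE:
        return f"ha-lock:{service}"
    return "ha-lock:node"


subunit_key = lock_key(Scope.SUBUNIT, service="svc-a", subunit="sub-1")
service_key = lock_key(Scope.SERVICE, service="svc-a")
node_key = lock_key(Scope.NODE)
```

Homogeneous units on different nodes compute the same key and therefore contend for the same lock, so the granularity of the key determines whether a main subunit, a main service unit, or a main node is elected.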

Step 2, if the first lock information exists, reading the first lock information in the distributed cache and judging whether the node identification of the first lock information is consistent with the node where the data processing subunit is located;

In step 2, the first lock information is read to determine whether the owner of the lock is the present node: the node identifier in the read first lock information is compared with the identifier of the present node. If they are consistent, the owner of the lock is the present node; if not, the owner of the lock is not the present node.

Step 3, if the node identifiers are consistent and the processing marks are processing, second lock information is created and the data processing state of the service state controller is set to be a first state, and the second lock information is the first lock information which is updated by using the latest updating time stamp and the universal unique identification code;

In this step, it is further determined whether the processing mark in the first lock information is "processing". If so, the server, application, thread, or process is processing data at this time and the lock cannot be released, so the hold on the lock needs to be reinforced. A new lock object, i.e. the second lock object carrying the second lock information, is then constructed: the lock identifier in the second lock information is the same data processing subunit identifier, the creation timestamp is the creation timestamp of the read first lock information, the update timestamp is the current time, the node identifier is the identifier of the present node, the universal unique identification code is a new random universal unique identification code, and the processing mark is "processing". The lock object is written into Redis, and the data processing state is set to the first state, namely Active.

Step 4, writing the second lock information into the distributed cache, and processing the data by taking a data processing subunit of which the data processing state is the first state as a main state, or taking a data processing service unit as the main state, or taking a node as the main state.

In step 4, after the second lock information is created, the second lock information read from Redis is the latest lock information of the data processing subunit, and whether the data processing subunit continues to hold the lock is determined by checking whether the node identifier in the second lock information is consistent with the identifier of the present node. Meanwhile, the data processing state of the data processing subunit is set to the first state, namely Active, which means that the data processing subunit is the main processing unit and the other data processing subunits are standby processing units.
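As a non-limiting illustration of step 3, the following Python sketch derives the second lock information from the first: the lock identifier, creation timestamp, node identifier, and processing mark are kept, while the update timestamp and the universal unique identification code are refreshed. The LockInfo record and the renew function are hypothetical names, and writing the result to the cache is left out.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LockInfo:
    lock_id: str
    created_at: float
    updated_at: float
    node_id: str
    token: str          # universal unique identification code
    processing: bool    # True = processing, False = unprocessed


def renew(current, local_node_id, now):
    """Return second lock information, or None if this node may not renew."""
    if current.node_id != local_node_id or not current.processing:
        return None   # not the owner, or not processing: step 3 does not apply
    # Keep identity and creation time; refresh update timestamp and code.
    return replace(current, updated_at=now, token=str(uuid.uuid4()))


first = LockInfo("subunit-1071", 1000.0, 1000.0, "node-101", str(uuid.uuid4()), True)
second = renew(first, "node-101", now=1001.0)
refused = renew(first, "node-102", now=1001.0)
```

Each renewal changes the stored content even if the clock does not advance, because the universal unique identification code is random; this is what lets other nodes observe that the holder is alive without relying on a key validity period.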

In summary, in the first case, the cluster includes a plurality of data processing nodes, each node has a plurality of data processing service units, the same data processing service unit on each node in the cluster forms a service cluster, each data processing service unit includes a plurality of data processing subunits with different functions, and each data processing subunit includes a distributed lock controller and a state controller. The processing method of the distributed lock controller includes: creating the lock, in which the initiating node constructs a distributed lock object and writes it into the distributed cache; updating the lock, in which the initiating node continuously constructs a new distributed lock object and updates the lock object in the distributed cache; and replacing the lock, in which the validity of the distributed lock is judged and, if the distributed lock is invalid, a new distributed lock object is constructed and the lock object in the distributed cache is updated. The processing method of the state controller is state maintenance, i.e. maintaining the data processing state of the subunit. Among the data processing service units on all nodes of the service cluster, only the subunit whose state is Active is in the main state and performs the corresponding data processing; the other, non-Active subunits are in the standby state and do not perform the corresponding data processing. The second case and the third case are similar to the first case.

In a possible implementation manner provided by the present invention, if there is no first lock information of the data processing subunit, the first lock information of the data processing subunit is written into the distributed cache and the data processing state of the node is set to the second state.

In the embodiment provided by the present invention, as shown in Fig. 3, in the first case, if the distributed lock controller of the data processing subunit finds that the lock does not exist when reading the first lock information, the state is set to the second state, namely Passive; the second case and the third case are similar to the first case.

In a possible implementation manner provided by the present invention, if the node identifier of the first lock information is inconsistent with the node where the data processing subunit is located, the data processing state is set to the second state, that is, Passive, and it is determined whether consecutive reads of the second lock information are consistent; if so, the count of the lock invariance counter is incremented by one, and if the count of the lock invariance counter exceeds the preset value, the third lock information is created.

In the embodiment provided by the present invention, as shown in Fig. 3, if the node identifier of the first lock information is not consistent with the node where the data processing subunit is located, the owner of the lock is not the present node, and the state controller sets the state to the second state, that is, Passive. The distributed lock controller then determines whether consecutive reads of the second lock information are consistent. If they are consistent, the lock may be faulty, which may be caused by the server, application program, thread, or process, and the distributed lock controller increments the count of the lock invariance counter by one. If the count of the lock invariance counter exceeds a preset value, a lock is created, that is, third lock information is created, in which the lock identifier is the identifier of the corresponding data processing subunit, the creation timestamp is the current time, the update timestamp is the current time, the node identifier is the identifier of the present node, the universal unique identification code is a new random universal unique identification code, and the processing mark is "processing"; the lock write operation is then performed.
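As a non-limiting illustration of the lock-absent branch, the following Python sketch writes first lock information when none is stored and reports the second state for the current round. A plain dictionary stands in for the distributed cache, the function name is hypothetical, and setting the processing mark to "processing" on creation is an assumption that mirrors the other lock-creation cases in the description.

```python
import uuid


def create_if_missing(cache, lock_id, node_id, now):
    """Write first lock information when none exists; return the resulting state."""
    if lock_id in cache:
        return None   # a lock exists: the ownership check of step 2 applies instead
    cache[lock_id] = {
        "lock_id": lock_id,
        "created_at": now,
        "updated_at": now,
        "node_id": node_id,
        "token": str(uuid.uuid4()),   # universal unique identification code
        "processing": True,           # assumption: mirrors other creation cases
    }
    return "Passive"   # second state for this round, as described above


cache = {}   # a dict stands in for the distributed cache
state = create_if_missing(cache, "subunit-1071", "node-101", now=1000.0)
again = create_if_missing(cache, "subunit-1071", "node-102", now=1001.0)
```

In this sketch the creating node becomes Active only on a later round, when it reads the lock back and finds its own node identifier in it.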

When the node identifier of the first lock information is inconsistent with the node where the data processing subunit is located, whether the lock information content keeps changing is judged as follows: the information content of the lock object is read and compared with the information content read last time; if the two are inconsistent, the lock invariance counter is reset to its initial value; if they are consistent, the lock invariance counter is incremented by 1; if the value of the lock invariance counter is greater than a preset threshold, which is 3 in the present embodiment, the lock object information content is considered to be continuously unchanged. The comparison checks the values of the creation timestamp, the update timestamp, the node identifier, the universal unique identification code and the processing mark one by one, and if any one of them differs, the result is considered inconsistent.

In a possible implementation manner provided by the present invention, if consecutive reads of the second lock information are inconsistent, the most recently read first lock information is saved and the count of the lock invariance counter is cleared, and the data processing subunit remains in a lock-grabbing state.

In the embodiment provided by the present invention, as shown in Fig. 3, this indicates that the lock is in use: no operation is performed other than clearing the counter, and no write operation is performed.

In one possible embodiment, the plurality of data processing subunits comprises at least one data processing subunit.
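As a non-limiting illustration of the judgment described above, the following Python sketch keeps the previously read lock content and a lock invariance counter. The class and field names are hypothetical, lock content is represented as a plain dictionary, and an initial counter value of 0 is assumed; the threshold of 3 is the value given for the present embodiment.

```python
COMPARED_FIELDS = ("created_at", "updated_at", "node_id", "token", "processing")
THRESHOLD = 3   # preset threshold used in the present embodiment


class InvarianceCounter:
    """Counts consecutive reads in which the lock content did not change."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.count = 0      # assumed initial value
        self.last = None    # lock content seen on the previous read

    def observe(self, lock):
        """Record one read; return True once the lock is continuously unchanged."""
        same = self.last is not None and all(
            lock[name] == self.last[name] for name in COMPARED_FIELDS
        )
        if same:
            self.count += 1
        else:
            self.last = dict(lock)   # save the most recently read content
            self.count = 0           # reset to the initial value
        return self.count > self.threshold


stale = {"created_at": 1000.0, "updated_at": 1005.0, "node_id": "node-101",
         "token": "a", "processing": True}
counter = InvarianceCounter()
results = [counter.observe(stale) for _ in range(5)]

changed = dict(stale, updated_at=1006.0, token="b")
after_change = counter.observe(changed)
```

With a threshold of 3, the fifth identical read is the first one that reports the lock as continuously unchanged, since the first read only establishes the baseline. The judgment depends on the number of polling rounds rather than on comparing timestamps against the local clock.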

In the embodiment provided by the present invention, the number of data processing subunits is not limited.

In one possible embodiment, the present invention further includes:

when data is processed at a new node, fourth lock information is created and written into the distributed cache, the lock identifier of the fourth lock information is a new data processing subunit identifier, the creation timestamp is the current time, the update timestamp is the current time, the node identifier is a node identifier of the new node, the universal unique identifier is a new random universal unique identifier, and the processing mark is processing.

In the embodiment provided by the present invention, if the data is required to be processed at a certain node, namely under external intervention, a new lock object is constructed in which the lock identifier is the identifier of the corresponding data processing subunit, the creation timestamp is the current time, the update timestamp is also the current time, the node identifier is the identifier of that node, the universal unique identification code is a new random universal unique identification code, and the processing mark is "processing", and the lock write operation is performed. If the subunit that is originally in the Active state needs to be excluded, the lock object is read and a new lock object is constructed in which the creation timestamp is the creation timestamp in the read lock object information content, the update timestamp is the current time, the node identifier is the node identifier in the read lock object information content, the universal unique identification code is a new random universal unique identification code, and the processing mark is "unprocessed", and the lock write operation is performed.
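As a non-limiting illustration of the two external interventions described above, the following Python sketch builds the lock content for forcing processing onto a chosen node and for excluding the currently Active holder. The function names and the dictionary representation are hypothetical, and writing the result to the distributed cache is left out.

```python
import uuid


def force_to_node(lock_id, node_id, now):
    """Lock content that moves processing to the given node."""
    return {
        "lock_id": lock_id,
        "created_at": now,            # creation timestamp is the current time
        "updated_at": now,            # update timestamp is the current time
        "node_id": node_id,
        "token": str(uuid.uuid4()),   # new random universal unique identification code
        "processing": True,           # processing mark: processing
    }


def exclude_holder(current, now):
    """Lock content that keeps the holder but marks it as unprocessed."""
    excluded = dict(current)          # keeps lock id, creation time, node id
    excluded.update(updated_at=now, token=str(uuid.uuid4()), processing=False)
    return excluded


forced = force_to_node("subunit-1071", "node-103", now=2000.0)
excluded = exclude_holder(forced, now=2010.0)
```

The exclusion keeps the holder's node identifier and only flips the processing mark, so the holder can recognize on its next read that it still owns the lock but must stop processing.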

In one possible embodiment provided by the present invention, if the subunit in the first state needs to be excluded, the lock object is read and a new lock object is constructed in which the creation timestamp is that of the read lock object, the update timestamp is the current time, the node identifier is that of the read lock object, the universal unique identifier is a new random universal unique identifier, and the processing flag is unprocessed; the write-lock operation is then performed. Thereafter, if the node identifier in the lock object is consistent with the identifier of the local node and the processing flag is unprocessed, the state is set to the third state, namely Unready; if the node identifier in the lock object is not consistent with the identifier of the local node, the state is set to the second state, namely Passive.
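
The mapping from lock content to the three states can be sketched as follows. The enum State and the function derive_state are hypothetical names, and the sketch assumes that "the node identifier" against which the lock is compared is the identifier of the node on which the subunit runs.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "Active"    # first state
    PASSIVE = "Passive"  # second state
    UNREADY = "Unready"  # third state


def derive_state(lock_node_id, lock_processing, local_node_id):
    """Derive the data processing state from the lock that was read."""
    if lock_node_id != local_node_id:
        # The lock belongs to another node.
        return State.PASSIVE
    # The lock belongs to this node: the processing flag decides.
    return State.ACTIVE if lock_processing else State.UNREADY
```

For example, after an exclusion lock has been written for node-A, a subunit on node-A would derive Unready while a subunit on any other node would derive Passive.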

As shown in fig. 4, another aspect of the present invention provides a distributed data processing cluster high availability apparatus 200, including:

the first judging module 201 is configured to continuously query whether first lock information of the multiple data processing subunits exists in the distributed cache when a distributed lock controller shared by the multiple data processing subunits initiates lock grabbing, where the first lock information includes a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universal unique identifier, and a processing flag;

a second determining module 202, configured to, if there is the first lock information, read the first lock information in the distributed cache and determine whether a node identifier of the first lock information is consistent with a node where the data processing subunit is located;

the state identification module 203 is configured to create second lock information and set the data processing state of the service state controller to be the first state if the node identifiers are consistent and the processing flag is processing, where the second lock information is the first lock information updated by using the latest update timestamp and the universal unique identification code;

the active-standby determining module 204 is configured to write the second lock information into the distributed cache, and to process data by using the data processing subunit whose data processing state is the first state (Active) as the master, or by using the data processing service unit as the master, or by using the node as the master.
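
As an illustration only, one pass through modules 201 to 204 might look like the following Python sketch. A plain dict stands in for the distributed cache, and the function name grab_once, the dictionary keys, and the returned state strings are assumptions rather than part of the specification. The case in which no lock information exists is left out and simply returns None.

```python
import time
import uuid


def grab_once(cache, lock_id, local_node_id, now=None):
    """Run one lock-grabbing pass and return the resulting state name."""
    now = time.time() if now is None else now

    # Module 201: query whether first lock information exists.
    first = cache.get(lock_id)
    if first is None:
        return None  # handled elsewhere; not covered by this sketch

    # Module 202: compare the lock's node identifier with the local node.
    if first["node_id"] != local_node_id:
        return "Passive"
    if not first["processing"]:
        return "Unready"

    # Module 203: second lock information is the first lock information
    # refreshed with the latest update timestamp and a new UUID.
    second = dict(first, updated_at=now, uuid=str(uuid.uuid4()))

    # Module 204: write the second lock information back; the caller now
    # processes data as the master.
    cache[lock_id] = second
    return "Active"
```

Because the holder rewrites the lock with a fresh universal unique identification code on every pass, other nodes can detect a live holder by observing that the lock content keeps changing, without having to compare timestamps taken on different nodes.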

In yet another embodiment provided by the present invention, an apparatus is also provided, which includes a processor and a memory storing at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by the processor to implement the distributed data processing cluster high availability method described in the embodiments of the present invention.

In yet another embodiment provided by the present invention, a computer readable storage medium is also provided, in which at least one instruction, at least one program, code set, or instruction set is stored, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the distributed data processing cluster high availability method described in the embodiment of the present invention.

In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes a plurality of computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, wireless, microwave). The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates multiple available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

The embodiments in the present specification are described in a related manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A distributed data processing cluster high-availability method, applied to a plurality of data processing subunits, or a plurality of data processing service units, or a plurality of nodes of a data processing cluster, wherein each node is provided with a plurality of different data processing service units and a distributed cache, each data processing service unit comprises a plurality of data processing subunits, and the cluster comprises a plurality of nodes having data processing service units with the same functions, characterized in that each node is provided with a distributed lock controller and a service state controller, and the method comprises the following steps:

when the distributed lock controller shared by the plurality of data processing subunits initiates lock grabbing, continuously inquiring whether first lock information of the plurality of data processing subunits exists in the distributed cache or not, wherein the first lock information comprises a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universal unique identification code and a processing mark;

if the first lock information exists, reading the first lock information in the distributed cache and determining whether the node identifier of the first lock information is consistent with the node where the data processing subunit is located;

if the node identifiers are consistent and the processing marks are processing, creating second lock information and setting the data processing state of the service state controller to be a first state, wherein the second lock information is the first lock information which is updated by using the latest update timestamp and the universal unique identification code;

and writing the second lock information into the distributed cache, and taking the data processing subunit with the data processing state being the first state as a main state, or taking the data processing service unit as a main state, or taking the node as a main state to process the data.

2. The distributed data processing cluster high availability method of claim 1, wherein if no first lock information exists for said data processing subunit, first lock information for said data processing subunit is written into said distributed cache and said data processing state of said node is set to a second state.

3. The distributed data processing cluster high availability method according to claim 1, wherein if the node identifier of the first lock information is not consistent with the node where the data processing subunit is located, the data processing state is set to a second state and it is determined whether the second lock information read on consecutive reads is consistent; if so, the count of a lock invariance counter is increased by one, and if the count of the lock invariance counter exceeds a preset value, third lock information is created.

4. The distributed data processing cluster high availability method according to claim 3, wherein if the second lock information read on consecutive reads is inconsistent, the most recently read first lock information is saved, the count of the lock invariance counter is cleared, and the data processing subunit remains in a lock grabbing state.

5. The distributed data processing cluster high availability method of claim 1, wherein said plurality of data processing subunits comprises at least two of said data processing subunits.

6. The distributed data processing cluster high availability method of claim 1, further comprising:

when data is to be processed at a new node, creating fourth lock information and writing it into the distributed cache, wherein the lock identifier of the fourth lock information is the identifier of the new data processing subunit, the creation timestamp is the current time, the update timestamp is the current time, the node identifier is the node identifier of the new node, the universal unique identification code is a new random universal unique identification code, and the processing flag is processing.

7. The distributed data processing cluster high availability method of claim 1, wherein fifth lock information is newly created when the data processing state is changed.

8. A distributed data processing cluster high-availability device, located on a plurality of data processing subunits, or a plurality of data processing service units, or a plurality of nodes of a data processing cluster, wherein each node is provided with a plurality of different data processing service units and a distributed cache, each data processing service unit comprises a plurality of data processing subunits, and the cluster comprises a plurality of nodes having data processing service units with the same functions, characterized in that each node is provided with a distributed lock controller and a service state controller, and the device comprises:

the first judging module is used for continuously inquiring whether first lock information of a plurality of data processing subunits exists in a distributed cache or not when the distributed lock controller shared by the plurality of data processing subunits initiates lock grabbing, wherein the first lock information comprises a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universal unique identification code and a processing mark;

a second determining module, configured to, if the first lock information exists, read the first lock information in the distributed cache and determine whether the node identifier of the first lock information is consistent with the node where the data processing subunit is located;

a state identification module, configured to create second lock information and set a data processing state of the service state controller to a first state if the node identifiers are consistent and the processing flag is processing, where the second lock information is first lock information updated by using the latest update timestamp and the universal unique identification code;

and the master-slave determining module is used for writing the second lock information into the distributed cache, and taking the data processing subunit with the data processing state being the first state as a master state, or taking the data processing service unit as a master state, or taking the node as a master state to process data.

9. An electronic device, comprising a processor and a memory, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the distributed data processing cluster high availability method of any one of claims 1-7.

10. A computer readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the distributed data processing cluster high availability method of any one of claims 1-7.