CN115878696B - High availability method and device for distributed data processing cluster - Google Patents

High availability method and device for distributed data processing cluster

Info

Publication number
CN115878696B
CN115878696B (application CN202310201739.8A)
Authority
CN
China
Prior art keywords
data processing
lock
state
node
lock information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310201739.8A
Other languages
Chinese (zh)
Other versions
CN115878696A (en)
Inventor
张军朋
王元
张乐
伍斯
罗盛君
孙振宇
李大鹏
李晓伟
李楠
张汉勇
郭延臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Xian Satellite Control Center
Original Assignee
China Xian Satellite Control Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Xian Satellite Control Center filed Critical China Xian Satellite Control Center
Priority to CN202310201739.8A
Publication of CN115878696A
Application granted
Publication of CN115878696B
Legal status: Active

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a high availability method and device for a distributed data processing cluster. When a distributed lock controller shared by a plurality of data processing subunits initiates a lock robbing, it continuously queries whether first lock information for those subunits exists in a distributed cache. If it exists, the first lock information is read from the distributed cache and its node identifier is checked for consistency with the local node; if consistent, second lock information is created and the data processing state of the service state controller is set to active, where the second lock information is the first lock information updated with the latest update timestamp and a new universally unique identifier. The second lock information is then stored, and the data is processed with one data processing subunit, one data processing service unit, or one node acting as the master. The method solves the availability problem of distributed locks when cluster time is discontinuous.

Description

High availability method and device for distributed data processing cluster
Technical Field
The invention relates to the technical field of high availability of data processing systems, in particular to a method and a device for high availability of a distributed data processing cluster.
Background
To ensure the reliability of data processing, traditional data processing systems often adopt a node-centric active-standby mode. This mode requires two nodes to communicate with each other for data interaction: data processing is performed only on the active node, while the standby node synchronizes the active node's state, including the active/standby flag, the data processing state, and similar information. As the scale and complexity of data processing have grown rapidly, the drawbacks of this traditional centralized architecture have become apparent: the standby machine is underutilized, the system is difficult to control autonomously, scalability is limited, and deployment is inflexible, so it can no longer meet increasingly complex and diversified data processing requirements.
To overcome these disadvantages of the conventional centralized architecture, the data processing system can be designed as a distributed architecture: the system is split by function into a plurality of data processing service units, each comprising a number of data processing subunits, all operating in a clustered environment. A data processing service unit operating in a cluster must address both data processing consistency and high availability. The same data must not be processed more than once, as that could have disastrous consequences. For data processing service units in a clustered environment, distributed locks are a natural way to solve these problems.
At present there are many distributed lock solutions, domestic and international, which fall into two common categories: lock services based on a distributed consensus algorithm, such as ZooKeeper and Chubby; and lock services implemented on a distributed cache and their variants, such as locks implemented with Redis and the Redis-based RedLock. Distributed locks built on a distributed cache such as Redis are simple, reliable, and efficient, and a highly available distributed data processing system can be built on this design. However, a conventional distributed lock implemented on a distributed cache generally determines whether the lock exists by the existence of a key with a configured validity period. A lock implemented this way places high demands on the time continuity and consistency of all cluster nodes: if node clocks are inconsistent and nodes frequently synchronize with a time reference source, the distributed lock can repeatedly expire, causing service state switches in the system's data processing that should never occur.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, a first aspect of the present invention proposes a distributed data processing cluster high availability method, applied to a plurality of data processing subunits, a plurality of data processing service units, or a plurality of nodes of a data processing cluster. Each node is provided with a plurality of different data processing service units and a distributed cache, each data processing service unit comprises a plurality of data processing subunits, the cluster comprises the same-function data processing service units of the nodes, and the nodes are provided with distributed lock controllers and service state controllers. The method comprises:
when a distributed lock controller shared by a plurality of data processing subunits initiates a lock robbing, continuously inquiring whether first lock information of the plurality of data processing subunits exists in a distributed cache or not, wherein the first lock information comprises a lock identifier, a creation time stamp, an update time stamp, a node identifier, a universal unique identifier code and a processing mark;
if the first lock information exists, the first lock information in the distributed cache is read, and whether the node identification of the first lock information is consistent with the node where the data processing subunit is located is judged;
if the node identifiers are consistent and the processing marks are processing, creating second lock information and setting the data processing state of the service state controller to be a first state, wherein the second lock information is first lock information updated by using the latest update time stamp and the universal unique identification code;
and writing the second lock information into the distributed cache, and processing the data by taking one data processing subunit with the data processing state being the first state as a main state, or taking one data processing service unit as a main state, or taking one node as the main state.
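For illustration only, the lock information described in the steps above can be sketched as a small record type (a non-authoritative Python sketch; the field and function names are hypothetical, not taken from the patent):

```python
import time
import uuid
from dataclasses import dataclass

@dataclass
class LockInfo:
    """The six fields the method stores per lock (names are illustrative)."""
    lock_id: str       # identifier of the data processing subunit being locked
    create_ts: float   # creation timestamp
    update_ts: float   # update timestamp
    node_id: str       # node identifier of the current holder
    uid: str           # universally unique identifier, refreshed on each update
    processing: bool   # processing mark

def new_lock(lock_id: str, node_id: str) -> LockInfo:
    """Build fresh first lock information: both timestamps are the current time."""
    now = time.time()
    return LockInfo(lock_id, now, now, node_id, uuid.uuid4().hex, True)
```

The second lock information of step 3 would be the same record with only `update_ts` and `uid` replaced.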
Further, if the first lock information of the data processing subunit does not exist, the first lock information of the data processing subunit is written into the distributed cache, and the data processing state of the node is set to be a second state.
Further, if the node identification of the first lock information is inconsistent with the node where the data processing subunit is located, the data processing state is set to the second state and it is judged whether the continuously read second lock information is consistent; if so, the count of the lock invariance counter is increased by one, and if the count of the lock invariance counter exceeds a preset value, third lock information is created.
Further, if the second lock information read continuously is inconsistent, the latest first lock information read is saved, the count of the lock invariance counter is cleared, and the data processing subunit is in a lock robbing state.
Further, the plurality of data processing subunits includes at least one data processing subunit.
Further, the method further comprises the following steps:
when data is processed at a new node, fourth lock information is created and written into the distributed cache, a lock identifier of the fourth lock information is a new data processing subunit identifier, a creation time stamp is the current time, an update time stamp is the current time, the node identifier is a node identifier of the new node, the universal unique identifier is a new random universal unique identifier, and the processing mark is processing.
Further, when the data processing state is changed, fifth lock information is newly built.
In another aspect, the present invention provides a distributed data processing cluster high availability device, located on a plurality of data processing subunits, a plurality of data processing service units, or a plurality of nodes, where each node is provided with a plurality of different data processing service units and a distributed cache, each data processing service unit includes a plurality of data processing subunits, the cluster includes the same-function data processing service units of the nodes, and the nodes are provided with distributed lock controllers and service state controllers. The device comprises:
the first judging module is used for continuously inquiring whether first lock information of the plurality of data processing subunits exists in the distributed cache when the distributed lock controller shared by the plurality of data processing subunits initiates the lock robbing, wherein the first lock information comprises a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universal unique identifier code and a processing mark;
the second judging module is used for reading the first lock information in the distributed cache and judging whether the node identification of the first lock information is consistent with the node where the data processing subunit is located if the first lock information exists;
the state identification module is used for creating second lock information and setting the data processing state of the service state controller to be a first state if the node identifications are consistent and the processing marks are processing, wherein the second lock information is the first lock information updated by the latest update time stamp and the universal unique identification code;
and the main and standby determining module is used for writing the second lock information into the distributed cache, and processing the data by taking one data processing subunit with the data processing state being the first state as a main state, one data processing service unit as a main state or one node as a main state.
The present invention also provides an electronic device comprising a processor and a memory in which at least one instruction, at least one program, code set, or instruction set is stored, the at least one instruction, at least one program, code set, or instruction set being loaded and executed by the processor to implement the distributed data processing cluster high availability method of the first aspect.
The present invention also provides a computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, at least one program, code set, or instruction set being loaded and executed by a processor to implement the distributed data processing cluster high availability method of the first aspect.
The embodiment of the invention provides a high availability method and a device for a distributed data processing cluster, which have the following beneficial effects compared with the prior art:
1. the invention judges the validity of the distributed lock not only by the existence of the lock but also by a mechanism that checks whether the information content of the lock object keeps changing, thereby solving the availability problem of the distributed lock when cluster time is discontinuous;
2. the high-availability mechanism method provided by the invention is not limited by the number of nodes;
3. the high availability method provided by the invention can independently perform high availability control on a plurality of data processing subunits in each data processing service unit, and the high availability control is not limited to a single data processing service unit or the whole node.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the following description will make a brief introduction to the drawings used in the description of the embodiments or the prior art. It should be apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained from these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a flow chart of a method for high availability of a distributed data processing cluster according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the data processing cluster structure of a distributed data processing cluster high availability method according to an embodiment of the present invention;
FIG. 3 is a flowchart of distributed lock determination in a distributed data processing cluster high availability method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a distributed data processing cluster high availability device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The present specification provides method operational steps as described in the examples or flowcharts, but may include more or fewer operational steps based on conventional or non-inventive labor. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment).
As shown in figs. 1-3, the present invention provides a high availability method for a distributed data processing cluster, applied to a plurality of data processing subunits, a plurality of data processing service units, or a plurality of nodes of a data processing cluster. Each node is provided with a plurality of different data processing service units and a distributed cache, each data processing service unit comprises a plurality of data processing subunits, the cluster comprises the same-function data processing service units of the nodes, and the nodes are provided with a distributed lock controller and a service state controller. The method comprises:
step 1, when a distributed lock controller shared by a plurality of data processing subunits initiates a lock robbing, continuously inquiring whether first lock information of the plurality of data processing subunits exists in a distributed cache or not, wherein the first lock information comprises a lock identifier, a creation time stamp, an update time stamp, a node identifier, a universal unique identifier code and a processing mark;
it should be noted that, as shown in fig. 2, a cluster may include a node 101, a node 102, a node 103, and a network switch 113. Redis service 104, redis service 105, and Redis service 106 are respectively executed in node 101, node 102, and node 103. The nodes 101, 102, and 103 run data processing service units 107, 108, and 109, respectively, 107, 108, and 109. The node 101, the node 102, and the node 103 respectively run a data processing service unit 110, a data processing service unit 111, and a data processing service unit 112, where the data processing service unit 107, the data processing service unit 108, and the data processing service unit 109 form a first service cluster, and the data processing service unit 110, the data processing service unit 111, and the data processing service unit 112 form a second service cluster.
The data processing service unit 107 includes a data processing subunit 1071 and a data processing subunit 1072, the data processing service unit 108 includes a data processing subunit 1081 and a data processing subunit 1082, and the data processing service unit 109 includes a data processing subunit 1091 and a data processing subunit 1092; the data processing service unit 110 includes a data processing subunit 1101 and a data processing subunit 1102, the data processing service unit 111 includes a data processing subunit 1111 and a data processing subunit 1112, and the data processing service unit 112 includes a data processing subunit 1121 and a data processing subunit 1122. Data processing subunits of the same class together provide high availability for their data processing.
Nodes 101, 102, and 103 are connected to each other by a network switch 113. Node 101, node 102, and node 103 may be servers, industrial personal computers, or PCs of x86_64, MIPS64, ARM, or other architectures, running a Linux operating system that supports the architecture. Redis service 104, Redis service 105, and Redis service 106 constitute a Redis high availability cluster. It should be appreciated that this composition is illustrative only; the present invention does not require that the distributed lock be stored in a Redis service.
It should be appreciated that the number of nodes in the cluster of fig. 2, the number of data processing service units running on each node, and the number of subunits in each data processing service unit are illustrative, and the data processing service units running on each node need not be identical. According to actual needs, the cluster can have any number of nodes, and the data processing service units running on each node can be flexibly configured.
In step 1, when the distributed lock controller shared by a plurality of data processing subunits initiates a lock robbing, it continuously queries whether first lock information for those subunits exists in the distributed cache. The distributed lock is reentrant, i.e. a holder can lock again while holding the lock, and a non-holder can also attempt to lock again. The key to implementing the distributed lock is the lock robbing mechanism, whose core idea is: the first arrival writes its own identification (for example a service IP) into the data; a later arrival finds that this identification already exists, so its lock robbing fails and it waits. When the first arrival finishes executing, it clears the mark, and the others can continue to contend for the lock. When the lock is robbed, either a distributed lock controller is provided separately for the data processing subunits in each data processing service unit, or all the data processing subunits in each data processing service unit are treated as a whole, or all the data processing subunits of all the data processing service units in each node are treated as a whole, and the distributed lock controller continuously queries whether the first lock information of the subunits exists in the distributed cache. The information content of the first lock information's distributed lock object comprises a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universally unique identifier, and a processing mark. Distributed lock identifiers correspond one-to-one with the data processing subunits they lock. Whether a distributed lock exists is determined by whether the corresponding key exists in Redis.
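The lock robbing mechanism just described can be sketched against a plain Python dict standing in for the Redis cache (the string states "Active"/"Passive" follow the embodiment; the function and field names are hypothetical illustrations, not the patented implementation):

```python
import time
import uuid

def try_grab_lock(cache: dict, lock_id: str, node_id: str) -> str:
    """One lock robbing attempt; returns the resulting data processing state."""
    info = cache.get(lock_id)
    if info is None:
        # No first lock information: write our own and start in the standby state.
        now = time.time()
        cache[lock_id] = {"lock_id": lock_id, "create_ts": now, "update_ts": now,
                          "node_id": node_id, "uuid": uuid.uuid4().hex,
                          "processing": True}
        return "Passive"
    if info["node_id"] == node_id and info["processing"]:
        # We already hold the lock: refresh update timestamp and UUID, go Active.
        cache[lock_id] = dict(info, update_ts=time.time(), uuid=uuid.uuid4().hex)
        return "Active"
    # Another node holds the lock: stay standby (invariance counting not shown).
    return "Passive"
```

In the real system the dict would be a Redis key shared by all nodes, and the query would repeat continuously rather than run once.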
In the present embodiment, the first case: each node is provided with a plurality of data processing service units, the same data processing service unit of each node in the cluster forms a service cluster, each data processing service unit comprises a plurality of data processing subunits with different functions, and each subunit comprises a distributed lock controller and a state controller. The method is suitable for primary and backup determination of the data processing subunit.
In the second case, all data processing subunits of one or more data processing service units share a distributed lock controller and a service state controller. The method is suitable for primary and backup determination of the data processing service unit.
In the third case, all data processing subunits of all data processing service units on a node share a distributed lock. The method is suitable for primary and backup determination of the node.
Step 2, if the first lock information exists, the first lock information in the distributed cache is read, and whether the node identification of the first lock information is consistent with the node where the data processing subunit is located is judged;
in step 2, by reading the first lock information, whether the owner of the lock is the own node is judged according to the first lock information, whether the node identification in the read first lock information is consistent with the own node identification is judged according to the judgment, if so, the owner of the lock is the own node, and if not, the owner of the lock is not the own node.
Step 3, if the node identifiers are consistent and the processing marks are processing, creating second lock information and setting the data processing state of the service state controller to be a first state, wherein the second lock information is first lock information updated by using the latest update time stamp and the universal unique identification code;
in this step, it is further determined whether the node identifier in the first lock information is processed, and if so, it is indicated that the server, the application, the thread, or the process is processing data at this time, and at this time, the lock cannot be released, and it is necessary to strengthen the lock occupancy. At this time, a new lock object, namely a second lock object, is constructed, the second lock object carries second lock information, a lock identifier in the second lock information is the same data processing subunit identifier, a creation time stamp is the first lock information creation time of the first lock object which is already read, an update time stamp is the current time, a node identifier is a node identifier of the node, a universal unique identifier code is a new random universal unique identifier code, a processing mark is processed, the lock object is written into Redis, and a data processing state is set to be a first state, namely Active.
And 4, writing the second lock information into the distributed cache, and processing the data by taking one data processing subunit with the data processing state as the first state as the main state, or taking one data processing service unit as the main state, or taking one node as the main state.
In step 4, after the second lock information is created, the lock information obtained from Redis is the latest lock information of the current data processing subunit; whether the current data processing subunit continues to occupy the lock is determined by judging whether the node identifier in that lock information is consistent with the current node identifier. At the same time, the data processing state of the data processing subunit is set to the first state, namely Active, indicating that it is the main processing unit while the other data processing subunits are standby processing units.
In summary, in the first case, a cluster includes a plurality of data processing nodes, each node includes a plurality of data processing service units, the same data processing service unit of each node in the cluster forms a service cluster, each data processing service unit includes a plurality of data processing subunits with different functions, and each data processing subunit includes a distributed lock controller and a state controller. The processing method of the distributed lock controller comprises: newly building a lock, in which the initiating node builds a distributed lock object and writes it into the distributed cache; updating the lock, in which the initiating node continuously builds new distributed lock objects and updates the lock object in the distributed cache; and changing the lock, in which the validity of the distributed lock is judged and, if the distributed lock is invalid, a new distributed lock object is built and the lock object in the distributed cache is updated. The processing method of the state controller is state maintenance: it maintains the data processing state of the subunits. Among the data processing service units on all nodes in the service cluster, only the subunit whose state is Active acts as the main unit and performs the corresponding data processing; the other, non-Active subunits are in the standby state and do not perform the corresponding data processing. The second case and the third case are similar to the first case.
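The three branches of the controller cycle summarized above can be condensed into one decision function (illustrative only; the "TakeOver" label for the lock-recreation branch is an assumed name, not a state defined in the patent):

```python
from typing import Optional

def decide_state(lock: Optional[dict], node_id: str,
                 unchanged_count: int, threshold: int = 3) -> str:
    """Map one read of the lock (or None if absent) plus the invariance
    count to the subunit's next data processing state."""
    if lock is None:
        return "Passive"   # no lock yet: write our own and stay standby
    if lock["node_id"] == node_id and lock["processing"]:
        return "Active"    # we hold the lock: act as the main unit
    if unchanged_count > threshold:
        return "TakeOver"  # holder presumed dead: create third lock information
    return "Passive"       # another live node holds the lock
```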
In one possible implementation manner provided by the invention, if the first lock information of the data processing subunit does not exist, the first lock information of the data processing subunit is written into the distributed cache, and the data processing state of the node is set to be the second state.
In the embodiment provided in the present invention, as shown in fig. 3, in the first case, when the distributed lock controller of the data processing subunit reads the first lock information, if the lock is not present, the state controller sets the state to the second state, i.e., Passive. The second case and the third case are similar to the first case.
In one possible implementation manner provided by the invention, if the node identifier of the first lock information is inconsistent with the node where the data processing subunit is located, the data processing state is set to be the second state, i.e. the Passive state, and whether the continuously read second lock information is consistent is judged, if so, the count of the lock invariance counter is set to be increased by one, and if the count of the lock invariance counter exceeds a preset value, third lock information is created.
In the embodiment provided by the invention, as shown in fig. 3, the distributed lock controller of the data processing subunit judges whether the node identifier of the first lock information is consistent with the node where the data processing subunit is located. If not, the lock owner is not the local node, and the state controller sets the state to the second state, i.e., Passive. The distributed lock controller then judges whether the continuously read lock information is consistent; if so, the count of the lock invariance counter is increased by one, which indicates that the lock may be faulty, possibly because the server, application, thread, or process has a problem. If the count of the lock invariance counter exceeds a preset value, a new lock is established, i.e., the third lock information is created: its lock identifier is the corresponding data processing subunit identifier, its creation timestamp is the current time, its node identifier is the local node's identifier, its universally unique identifier is a new random universally unique identifier, and its processing mark is "processing"; the write-lock operation is then performed.
When the node identification of the first lock information is inconsistent with the node where the data processing subunit is located, the method for judging whether the lock information content keeps changing is as follows: read the lock object information content and compare it with the content read last time; if they are inconsistent, initialize the lock invariance counter to its initial value; if they are consistent, add 1 to the lock invariance counter; if the lock invariance counter value is larger than a preset threshold (3 in this embodiment), the information content of the lock object is considered continuously unchanged. The comparison covers the values of the creation timestamp, the update timestamp, the node identification, the universally unique identifier, and the processing mark; if any of them differ, the contents are considered inconsistent.
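The continuous-invariance judgment can be sketched as a small counter class (a hypothetical illustration; the threshold of 3 follows the embodiment, while the class and method names are assumptions):

```python
THRESHOLD = 3  # preset threshold from the embodiment

class InvarianceCounter:
    """Detects a stale lock by counting consecutive identical reads of
    the lock object's information content (all fields compared)."""
    def __init__(self):
        self.last = None
        self.count = 0

    def observe(self, info: dict) -> bool:
        """Feed one read of the lock content. Returns True once the content
        has stayed unchanged for more than THRESHOLD consecutive reads,
        i.e. the holder is presumed dead and new lock information may be
        created."""
        if info == self.last:
            self.count += 1
        else:
            self.last = dict(info)  # save the latest lock information read
            self.count = 0          # reset the counter to its initial value
        return self.count > THRESHOLD
```

A changing UUID or update timestamp resets the counter, so a live holder that keeps renewing its lock is never displaced, even if node clocks disagree.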
In one possible implementation manner provided by the invention, if the continuously read second lock information is inconsistent, the latest first lock information read is saved, the count of the lock invariance counter is cleared, and the data processing subunit remains in a lock robbing state.
In the embodiment provided by the invention, as shown in fig. 3, the lock is in use in this case: the counter is cleared and no write operation is performed.
In one possible embodiment of the present invention, the plurality of digital processing subunits includes at least one digital processing subunit.
In the embodiments provided by the present invention, the number of data processing subunits is not limited.
In one possible embodiment provided by the present invention, the method further includes:
when data is to be processed at a new node, fourth lock information is created and written into the distributed cache; the lock identifier of the fourth lock information is the new data processing subunit identifier, the creation timestamp is the current time, the update timestamp is the current time, the node identifier is the node identifier of the new node, the universally unique identifier is a new random universally unique identifier, and the processing flag is "processing".
In the embodiments provided by the present invention, if data needs to be processed at a specific node, i.e. under external intervention, a new lock object is constructed: the lock identifier in its information content is the corresponding data processing subunit identifier, the creation timestamp is the current time, the update timestamp is the current time, the node identifier is the node identifier of that node, the universally unique identifier is a new random universally unique identifier, and the processing flag is "processing"; the write lock operation is then performed. If the subunit that was originally in the Active state needs to be excluded, the existing lock object is read and a new lock object is constructed: the creation timestamp is the creation timestamp in the read lock object's information content, the update timestamp is the current time, the node identifier is the node identifier in the read lock object's information content, the universally unique identifier is a new random universally unique identifier, and the processing flag is "not processing"; the write lock operation is then performed.
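The two manual-intervention writes above can be sketched as follows. This is a hypothetical illustration: the functions `takeover_lock` and `exclusion_lock`, and the field names, are assumptions rather than the patent's API.

```python
import time
import uuid

def takeover_lock(subunit_id, node_id):
    """Lock written when an operator forces processing onto this node."""
    now = time.time()
    return {
        "lock_id": subunit_id,       # identifies the data processing subunit
        "create_ts": now,            # fresh creation timestamp
        "update_ts": now,
        "node_id": node_id,          # this node becomes the holder
        "uuid": str(uuid.uuid4()),   # new random universally unique identifier
        "processing": True,          # processing flag set to "processing"
    }

def exclusion_lock(current_lock):
    """Lock written to demote the subunit currently in the Active state."""
    return {
        "lock_id": current_lock["lock_id"],
        "create_ts": current_lock["create_ts"],   # keep original creation time
        "update_ts": time.time(),
        "node_id": current_lock["node_id"],       # holder's node id is kept
        "uuid": str(uuid.uuid4()),                # new uuid, so stale renewals fail
        "processing": False,                      # flag set to "not processing"
    }
```

Writing either object into the distributed cache corresponds to the write lock operation described in the text.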
In one possible embodiment provided by the invention, if the subunit that was originally in the first state needs to be excluded, the lock object is read and a new lock object is constructed: its creation timestamp is the creation timestamp in the read lock object's information content, the update timestamp is the current time, the node identifier is the node identifier in the read lock object's information content, the universally unique identifier is a new random universally unique identifier, and the processing flag is "not processing"; the write lock operation is then performed. If the node identifier in the lock object is consistent with this node's identifier and the processing flag is "not processing", the state is set to the third state, i.e. Unready; if the node identifier in the lock object is inconsistent with this node's identifier, the state is set to the second state, i.e. Passive.
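The state-transition rule described above can be summarized in a small sketch (hypothetical function and field names; Active, Passive, and Unready correspond to the first, second, and third states):

```python
def next_state(lock, my_node_id):
    """State chosen by a node after reading a lock object (assumed rule)."""
    if lock["node_id"] == my_node_id:
        if lock["processing"]:
            return "Active"      # first state: this node holds the live lock
        return "Unready"         # third state: excluded by external intervention
    return "Passive"             # second state: another node holds the lock
```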
As shown in fig. 4, another aspect of the present invention provides a distributed data processing cluster high availability apparatus 200, comprising:
a first judging module 201, configured to continuously query whether first lock information of the plurality of data processing subunits exists in the distributed cache when a distributed lock controller shared by the plurality of data processing subunits initiates a lock preemption, where the first lock information includes a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universally unique identifier, and a processing flag;
a second judging module 202, configured to, if the first lock information exists, read the first lock information in the distributed cache and judge whether a node identifier of the first lock information is consistent with a node where the data processing subunit is located;
the state identification module 203 is configured to create second lock information and set a data processing state of the service state controller to a first state if the node identifiers are consistent and the processing flag is processing, where the second lock information is first lock information updated by using the latest update timestamp and the universal unique identifier;
the primary and standby determining module 204 is configured to write the second lock information into the distributed cache, and to process the data with the data processing subunit whose data processing state is Active as the primary, or with the corresponding data processing service unit as the primary, or with the corresponding node as the primary.
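As an illustration only, the interaction of the four modules can be sketched with a plain dictionary standing in for the distributed cache (all names are assumptions; a real deployment would use a shared cache such as Redis):

```python
import time
import uuid

cache = {}  # stands in for the distributed cache shared by all nodes

def preempt(subunit_id, my_node_id):
    """One preemption round; returns the resulting data processing state."""
    lock = cache.get(subunit_id)                 # first judging module: query
    if lock is None:                             # no lock yet: write one and wait
        now = time.time()
        cache[subunit_id] = {
            "node_id": my_node_id, "create_ts": now, "update_ts": now,
            "uuid": str(uuid.uuid4()), "processing": True,
        }
        return "Passive"   # second state until the written lock is confirmed
    if lock["node_id"] != my_node_id:            # second judging module
        return "Passive"
    if lock["processing"]:                       # state identification module
        lock["update_ts"] = time.time()          # renew: second lock information
        lock["uuid"] = str(uuid.uuid4())
        cache[subunit_id] = lock                 # primary/standby module: write
        return "Active"    # first state: process data as the primary
    return "Unready"
```

The sketch omits the lock-invariance counter and failover path, which the Passive branch would add in a full implementation.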
In yet another embodiment of the present invention, there is also provided an electronic device comprising a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set that is loaded and executed by the processor to implement the distributed data processing cluster high availability method described in the embodiments of the present invention.
In yet another embodiment of the present invention, a computer readable storage medium is provided, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored, where the at least one instruction, the at least one program, the set of codes, or the set of instructions are loaded and executed by a processor to implement the distributed data processing cluster high availability method described in the embodiments of the present invention.
In the above embodiments, the methods may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment mainly describes its differences from the others. In particular, the system embodiments are described relatively briefly since they are substantially similar to the method embodiments; for relevant details, see the corresponding parts of the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A distributed data processing cluster high availability method applied to a plurality of data processing subunits of data cluster processing, or a plurality of data processing service units, or a plurality of nodes, wherein each node is provided with a plurality of different data processing service units and a distributed cache, each data processing service unit comprises a plurality of data processing subunits, and the cluster comprises a plurality of data processing service units with the same functions of the nodes, and the method is characterized in that the nodes are provided with a distributed lock controller and a service state controller, and the method comprises the following steps:
when the distributed lock controller shared by the plurality of data processing subunits initiates a lock preemption, continuously inquiring whether first lock information of the plurality of data processing subunits exists in the distributed cache or not, wherein the first lock information comprises a lock identifier, a creation timestamp, an update timestamp, a node identifier, a universal unique identifier code and a processing mark;
if the first lock information exists, the first lock information in the distributed cache is read, and whether the node identification of the first lock information is consistent with the node where the data processing subunit is located is judged;
if the node identifiers are consistent and the processing marks are processing, creating second lock information and setting the data processing state of the service state controller to be a first state, wherein the second lock information is first lock information updated by using the latest update time stamp and the universal unique identification code;
and writing the second lock information into the distributed cache, and processing the data by taking the data processing subunit with the data processing state being the first state as a main state, or taking the data processing service unit as the main state, or taking the node as the main state.
2. A distributed data processing cluster high availability method as defined in claim 1, wherein if there is no first lock information for the data processing subunit, writing the first lock information for the data processing subunit into the distributed cache and setting the data processing state of the node to a second state.
3. The method of claim 1, wherein if the node identifier of the first lock information is inconsistent with the node in which the data processing subunit is located, setting the data processing state to a second state and determining whether the continuously read second lock information is consistent, if so, setting a count of a lock invariance counter to be incremented by one, and if the count of the lock invariance counter exceeds a preset value, creating third lock information.
4. A distributed data processing cluster high availability method as claimed in claim 3, wherein if said second lock information read in succession is inconsistent, said first lock information read last is saved and the count of said lock invariance counter is cleared, said data processing subunit being in a preemptive lock state.
5. A distributed data processing cluster high availability method according to claim 1, wherein said plurality of data processing subunits comprises at least two of said data processing subunits.
6. A distributed data processing cluster high availability method as recited in claim 1, further comprising:
when data is processed in a new node, fourth lock information is created and written into the distributed cache, a lock identifier of the fourth lock information is a new data processing subunit identifier, a creation time stamp is the current time, an update time stamp is the current time, the node identifier is a node identifier of the new node, a universal unique identifier is a new random universal unique identifier, and a processing mark is processing.
7. A distributed data processing cluster high availability method according to claim 1, wherein fifth lock information is newly established upon changing the data processing state.
8. A distributed data processing cluster high availability device located on a plurality of data processing subunits, or a plurality of data processing service units, or a plurality of nodes, each of said nodes having a plurality of different data processing service units and a distributed cache, each of said data processing service units comprising a plurality of said data processing subunits, said cluster comprising a plurality of said data processing service units of the same function of said nodes, characterized in that said nodes have distributed lock controllers and service state controllers, said device comprising:
the first judging module is used for continuously inquiring whether first lock information of the plurality of data processing subunits exists in a distributed cache when the distributed lock controller shared by the plurality of data processing subunits initiates a lock preemption, wherein the first lock information comprises a lock identifier, a creation time stamp, an update time stamp, a node identifier, a universal unique identifier code and a processing mark;
the second judging module is used for reading the first lock information in the distributed cache and judging whether the node identification of the first lock information is consistent with the node where the data processing subunit is located if the first lock information exists;
the state identification module is used for creating second lock information and setting the data processing state of the service state controller to be a first state if the node identifications are consistent and the processing mark is processing, wherein the second lock information is first lock information updated by the latest update time stamp and the universal unique identification code;
and the main and standby determining module is used for writing the second lock information into the distributed cache, and processing the data by taking the data processing subunit with the data processing state being the first state as a main state, or taking the data processing service unit as the main state, or taking the node as the main state.
9. An electronic device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set, or instruction set that is loaded and executed by the processor to implement the distributed data processing cluster high availability method of any of claims 1-7.
10. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the distributed data processing cluster high availability method of any of claims 1-7.
CN202310201739.8A 2023-03-06 2023-03-06 High availability method and device for distributed data processing cluster Active CN115878696B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310201739.8A CN115878696B (en) 2023-03-06 2023-03-06 High availability method and device for distributed data processing cluster

Publications (2)

Publication Number Publication Date
CN115878696A CN115878696A (en) 2023-03-31
CN115878696B true CN115878696B (en) 2023-04-28

Family

ID=85761965


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6523078B1 (en) * 1999-11-23 2003-02-18 Steeleye Technology, Inc. Distributed locking system and method for a clustered system having a distributed system for storing cluster configuration information
CN108874552A (en) * 2018-06-28 2018-11-23 杭州云英网络科技有限公司 Distributed lock executes method, apparatus and system, application server and storage medium
CN111258976A (en) * 2018-12-03 2020-06-09 北京京东尚科信息技术有限公司 Distributed lock implementation method, system, device and storage medium
CN114048265A (en) * 2021-11-11 2022-02-15 北京知道创宇信息技术股份有限公司 Task processing method and device, electronic equipment and computer readable storage medium
CN115277712A (en) * 2022-07-08 2022-11-01 北京城市网邻信息技术有限公司 Distributed lock service providing method, device and system and electronic equipment
CN115714722A (en) * 2022-12-15 2023-02-24 中国西安卫星测控中心 Dynamic configuration method, device and system for cluster network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7403945B2 (en) * 2004-11-01 2008-07-22 Sybase, Inc. Distributed database system providing data and space management methodology


Non-Patent Citations (3)

Title
"Research on Energy-Efficient Resource Virtualization Management Technology in Heterogeneous Networks"; Li Nan; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology; I136-1426 *
"Simulation of a Distributed Cache Replacement Algorithm for a Dynamic Data Processing Platform"; Wang Qinghua; Computer Simulation (02); 299-303 *
"Distributed Lock Service Based on ZooKeeper and Its Performance Optimization"; Liu Fen; Wang Fang; Tian Hao; Journal of Computer Research and Development (S1); 238-243 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant