CN115914375A - Disaster tolerance processing method and device for distributed message platform - Google Patents

Disaster tolerance processing method and device for distributed message platform

Info

Publication number
CN115914375A
Authority
CN
China
Prior art keywords
cluster
emergency
migration
recovery processing
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211562828.7A
Other languages
Chinese (zh)
Inventor
蔡佳纯
钟小威
冯子杰
杨旭杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211562828.7A priority Critical patent/CN115914375A/en
Publication of CN115914375A publication Critical patent/CN115914375A/en
Pending legal-status Critical Current

Landscapes

  • Hardware Redundancy (AREA)

Abstract

An embodiment of the application provides a distributed message platform disaster recovery processing method and device. The method comprises: collecting working data of the existing cluster nodes through a preset monitoring component, and comparing and evaluating the collected working data; and, according to the comparison and evaluation result, executing a cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition. The method and device address the problem of numerous manual emergency steps and low timeliness, and ensure the high-availability disaster recovery capability of the distributed message platform.

Description

Disaster tolerance processing method and device for distributed message platform
Technical Field
The present application relates to the field of distributed technologies, and in particular, to a distributed message platform disaster recovery processing method and apparatus.
Background
Distributed messaging systems are now used in the production business of many large companies. In the existing emergency approach, when conditions such as high CPU usage or consumption backlog occur in a cluster, the cluster state and a Topic migration scheme are judged manually, and a Kafka cluster and Topics are then created manually for emergency use. Under this scheme, an emergency cluster must be built by hand and the Topic migrated by hand: the corresponding Topic is created in the newly added emergency cluster, and the upstream and downstream applications that use the Topic are notified to change their program configuration version and reconnect to the emergency cluster.
The inventors found that, in the prior art, the server side must judge the situation manually and create the emergency cluster and Topic manually, and the client side must modify its code configuration and then restart to reconnect. As a result, manual maintenance cost is high, emergency handling takes a long time, and handling timeliness is low.
Disclosure of Invention
In view of the problems in the prior art, the application provides a distributed message platform disaster recovery processing method and device that can overcome the problem of numerous manual emergency steps and low timeliness, and ensure the high-availability disaster recovery capability of the distributed message platform.
In order to solve at least one of the above problems, the present application provides the following technical solutions:
in a first aspect, the present application provides a distributed message platform disaster recovery processing method, including:
collecting working data of the existing cluster nodes through a preset monitoring component, and comparing and evaluating the collected working data;
and executing cluster message queue emergency migration operation on the cluster nodes meeting the emergency migration condition according to the comparison evaluation result.
Further, the comparing and evaluating the collected working data includes:
comparing the collected working data with a preset performance threshold value;
and if the working data exceed the preset performance threshold, judging that the corresponding cluster node meets the emergency migration condition.
Further, the executing, according to the comparison evaluation result, a cluster message queue emergency migration operation on the cluster nodes meeting the emergency migration condition includes:
closing a message queue of the cluster nodes meeting the emergency migration condition according to the comparison evaluation result;
and executing data migration operation corresponding to the message queue in the corresponding emergency cluster.
Further, after the data migration operation corresponding to the message queue is executed in the corresponding emergency cluster, the method includes:
updating the cluster routing relation in the platform service list;
and establishing communication connection between the client and the emergency cluster according to the updated cluster routing relation so as to perform message production and consumption.
Further, if the working data exceeds the preset performance threshold, it is determined that the corresponding cluster node meets an emergency migration condition, including:
if at least one of the CPU load, inbound and outbound traffic, storage usage, partition count, and message backlog of the existing production cluster nodes exceeds its corresponding preset performance threshold, determining that the corresponding cluster node meets the emergency migration condition.
Further, the method also includes:
setting to closed the available state of the message queue of any cluster node that the comparison and evaluation result shows has reached a performance bottleneck;
creating an emergency cluster corresponding to the cluster node, and creating a message queue in the emergency cluster, so as to perform the data migration operation.
In a second aspect, the present application provides a distributed message platform disaster recovery processing apparatus, including:
the node monitoring and evaluating module is used for acquiring the working data of the existing cluster nodes through a preset monitoring component and comparing and evaluating the acquired working data;
and the cluster emergency migration module is used for executing cluster message queue emergency migration operation on the cluster nodes meeting the emergency migration conditions according to the comparison evaluation result.
Further, the node monitoring and evaluating module comprises:
the performance comparison unit is used for carrying out numerical comparison on the collected working data and a preset performance threshold;
and the emergency triggering unit is used for judging that the corresponding cluster node meets the emergency migration condition if the working data exceeds the preset performance threshold.
Further, the cluster emergency migration module includes:
the service closing unit is used for closing the message queue of the cluster node meeting the emergency migration condition according to the comparison evaluation result;
and the data migration unit is used for executing data migration operation corresponding to the message queue in the corresponding emergency cluster.
Further, the cluster emergency migration module further includes:
the list updating unit is used for updating the cluster routing relation in the platform service list;
and the newly-built routing unit is used for establishing communication connection between the client and the emergency cluster according to the updated cluster routing relation so as to perform message production and consumption.
In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the distributed message platform disaster recovery processing method when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the distributed message platform disaster recovery processing method.
In a fifth aspect, the present application provides a computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps of the distributed message platform disaster recovery processing method.
According to the above technical solution, the application provides a distributed message platform disaster recovery processing method and device that collect working data of the existing cluster nodes through a preset monitoring component, compare and evaluate the collected working data, and, according to the comparison and evaluation result, execute a cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition, thereby overcoming the problem of numerous manual emergency steps and low timeliness and ensuring the high-availability disaster recovery capability of the distributed message platform.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following descriptions are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a first schematic flowchart of a distributed message platform disaster recovery processing method in an embodiment of the present application;
Fig. 2 is a second schematic flowchart of a distributed message platform disaster recovery processing method in an embodiment of the present application;
Fig. 3 is a third schematic flowchart of a distributed message platform disaster recovery processing method in an embodiment of the present application;
Fig. 4 is a fourth schematic flowchart of a distributed message platform disaster recovery processing method in an embodiment of the present application;
Fig. 5 is a first structural diagram of a distributed message platform disaster recovery processing apparatus in an embodiment of the present application;
Fig. 6 is a second structural diagram of a distributed message platform disaster recovery processing apparatus in an embodiment of the present application;
Fig. 7 is a third structural diagram of a distributed message platform disaster recovery processing apparatus in an embodiment of the present application;
Fig. 8 is a fourth structural diagram of a distributed message platform disaster recovery processing apparatus in an embodiment of the present application;
Fig. 9 is a block diagram of a distributed message platform disaster recovery processing system in an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
According to the technical scheme, the data acquisition, storage, use, processing and the like meet relevant regulations of national laws and regulations.
In view of the problems in the prior art, the application provides a distributed message platform disaster recovery processing method and device that collect working data of the existing cluster nodes through a preset monitoring component, compare and evaluate the collected working data, and, according to the comparison and evaluation result, execute a cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition, thereby overcoming the problem of numerous manual emergency steps and low timeliness and ensuring the high-availability disaster recovery capability of the distributed message platform.
To overcome the problem of numerous manual emergency steps and low timeliness and to ensure the high-availability disaster recovery capability of the distributed message platform, the application provides an embodiment of a distributed message platform disaster recovery processing method. Referring to fig. 1, the method specifically includes the following:
Step S101: collecting working data of the existing cluster nodes through a preset monitoring component, and comparing and evaluating the collected working data.
Optionally, in the present application, the Kafka cluster provides the message service; the application producer client and consumer client establish connections with the Kafka cluster to produce and consume messages; and data such as cluster Topics, nodes, and access information are synchronized to the management control platform of the management control unit. The performance data acquisition unit collects data such as CPU load, inbound and outbound traffic, storage, partition count, and message backlog from the existing production cluster nodes. After the performance capacity data is collected, the management control unit compares and evaluates each performance index, and passes the performance capacity status of any cluster Topic whose index data exceeds its threshold to the emergency migration unit.
Step S102: and executing cluster message queue emergency migration operation on the cluster nodes meeting the emergency migration condition according to the comparison evaluation result.
Optionally, in the application, a cluster Topic whose performance capacity has reached a bottleneck may be migrated in an emergency according to the data passed on by the performance capacity evaluation module. The Topic in the bottlenecked cluster is grayed out, that is, its service is closed, and the corresponding Topic is created in the emergency cluster. The service notification module in the emergency migration unit emails the emergency migration status and the corresponding cluster Topic information to the responsible application developers, who restart the producer and consumer clients, while application support personnel analyze and handle the issue.
As can be seen from the above description, the distributed message platform disaster recovery processing method provided in the embodiment of the present application can collect working data of the existing cluster nodes through the preset monitoring component, compare and evaluate the collected working data, and, according to the comparison and evaluation result, execute a cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition, thereby overcoming the problem of numerous manual emergency steps and low timeliness and ensuring the high-availability disaster recovery capability of the distributed message platform.
In an embodiment of the distributed message platform disaster recovery processing method according to the present application, referring to fig. 2, the following may be further included:
Step S201: numerically comparing the collected working data with a preset performance threshold.
Step S202: if the working data exceeds the preset performance threshold, determining that the corresponding cluster node meets the emergency migration condition.
Optionally, in the present application, the Kafka cluster provides the message service; the application producer client and consumer client establish connections with the Kafka cluster to produce and consume messages; and data such as cluster Topics, nodes, and access information are synchronized to the management control platform of the management control unit. The performance data acquisition unit collects data such as CPU load, inbound and outbound traffic, storage, partition count, and message backlog from the existing production cluster nodes. After the performance capacity data is collected, the management control unit compares and evaluates each performance index, and passes the performance capacity status of any cluster Topic whose index data exceeds its threshold to the emergency migration unit.
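The comparison in steps S201 and S202 can be illustrated with a minimal Python sketch. The metric names and threshold values below are hypothetical; the application names the metric categories but gives no concrete limits, and it leaves the evaluation algorithm itself unspecified.

```python
# Hypothetical thresholds: the application lists the metric categories only.
THRESHOLDS = {
    "cpu_load": 0.85,              # fraction of CPU in use
    "inflow_mb_s": 200.0,          # inbound traffic
    "outflow_mb_s": 400.0,         # outbound traffic
    "storage_used": 0.80,          # fraction of disk in use
    "partition_count": 2000,       # partitions hosted on the node
    "message_backlog": 1_000_000,  # messages not yet consumed
}


def exceeded_metrics(sample, thresholds=THRESHOLDS):
    """Return the names of metrics whose value is strictly above its threshold.

    Metrics missing from the sample are treated as 0 and never trigger.
    """
    return [name for name, limit in thresholds.items()
            if sample.get(name, 0) > limit]


def meets_emergency_condition(sample, thresholds=THRESHOLDS):
    """A node qualifies as soon as at least one metric exceeds its threshold."""
    return bool(exceeded_metrics(sample, thresholds))
```

For example, a node reporting `{"cpu_load": 0.93, "storage_used": 0.40}` would qualify on CPU load alone, matching the "at least one of" wording used for the emergency migration condition.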
In an embodiment of the distributed message platform disaster recovery processing method according to the present application, referring to fig. 3, the following may be further included:
Step S301: closing the message queue of the cluster node that meets the emergency migration condition according to the comparison and evaluation result.
Step S302: executing the data migration operation corresponding to the message queue in the corresponding emergency cluster.
Optionally, in the application, a cluster Topic whose performance capacity has reached a bottleneck may be migrated in an emergency according to the data passed on by the performance capacity evaluation module. The Topic in the bottlenecked cluster is grayed out, that is, its service is closed, and the corresponding Topic is created in the emergency cluster. The service notification module in the emergency migration unit emails the emergency migration status and the corresponding cluster Topic information to the responsible application developers, who restart the producer and consumer clients, while application support personnel analyze and handle the issue.
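The order of operations in steps S301 and S302 can be sketched as follows. This is an in-memory model only: the `Cluster` class, the "open"/"closed" state strings, and `emergency_migrate` are hypothetical names introduced for illustration, and no real Kafka administration call is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a Kafka cluster: maps Topic name to service state."""
    name: str
    topics: dict = field(default_factory=dict)  # topic -> "open" or "closed"


def emergency_migrate(source, emergency, topic):
    """Close the Topic on the bottlenecked cluster, then create it on the emergency cluster."""
    if source.topics.get(topic) != "open":
        raise ValueError(f"topic {topic!r} is not open on {source.name}")
    # S301: gray out the Topic so the original cluster stops serving it.
    source.topics[topic] = "closed"
    # S302: create the corresponding Topic in the emergency cluster.
    emergency.topics[topic] = "open"


prod = Cluster("kafka-prod", {"orders": "open"})
standby = Cluster("kafka-emergency")
emergency_migrate(prod, standby, "orders")
```

After the call, `prod.topics` holds `{"orders": "closed"}` and `standby.topics` holds `{"orders": "open"}`. Closing first and creating second mirrors the order in which the text lists the steps; whether a real deployment would pre-create the Topic before closing the original is not stated in the application.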
In an embodiment of the distributed message platform disaster recovery processing method according to the present application, referring to fig. 4, the following may be further included:
Step S401: updating the cluster routing relation in the platform service list.
Step S402: establishing a communication connection between the client and the emergency cluster according to the updated cluster routing relation, so as to perform message production and consumption.
Optionally, the service discovery cluster in the service discovery unit of the present application routes automatically to the Kafka emergency cluster, and the client establishes a connection with the corresponding Kafka emergency cluster to produce and consume messages. The service discovery unit works as follows:
1) The service discovery cluster interacts with the management control platform and synchronizes its service list, which contains Topic, cluster, and node information; it periodically checks whether any cluster has exited or been added to Kafka service discovery, and updates the service list accordingly;
2) The application producer client and consumer client interact with the service discovery cluster: they initiate service address metadata requests and wait for the service discovery cluster to return the metadata information;
3) The service discovery cluster interacts with the Kafka cluster according to the service list and the metadata requests initiated by the producer and consumer clients. The service discovery module first checks node availability, that is, whether a connection to the node can be established, then forwards the client metadata request to the corresponding Kafka emergency cluster, parses the metadata information returned by the Kafka cluster, and encapsulates it and returns it to the producer and consumer clients;
4) The application producer client establishes a connection with the corresponding Kafka emergency cluster according to the metadata information returned by the service discovery cluster and produces messages, thereby interacting with the Kafka emergency cluster. Likewise, the application consumer client establishes a connection with the corresponding Kafka emergency cluster according to the returned metadata information and consumes messages.
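The four steps above can be sketched as a small routing table. The class name `ServiceDiscovery`, its method names, the node addresses, and the shape of the returned metadata are all assumptions made for illustration; the application does not define a wire format, and the real Kafka metadata protocol is not modeled here.

```python
class ServiceDiscovery:
    """Toy service discovery: maps a Topic to the nodes that currently serve it."""

    def __init__(self, is_reachable):
        self._routes = {}                  # topic -> list of node addresses
        self._is_reachable = is_reachable  # callable(node) -> bool, injected probe

    def sync_service_list(self, service_list):
        """Step 1: replace the local routes with the platform's service list."""
        self._routes = {t: list(nodes) for t, nodes in service_list.items()}

    def metadata_for(self, topic):
        """Steps 2-3: answer a client metadata request with reachable nodes only."""
        nodes = [n for n in self._routes.get(topic, []) if self._is_reachable(n)]
        if not nodes:
            raise LookupError(f"no reachable node for topic {topic!r}")
        return {"topic": topic, "bootstrap_nodes": nodes}


down = {"kafka-a1:9092"}                   # hypothetical unreachable node
sd = ServiceDiscovery(lambda node: node not in down)
sd.sync_service_list({"orders": ["kafka-a1:9092", "kafka-a2:9092"]})
before = sd.metadata_for("orders")

# After the emergency migration, the platform's service list points at the emergency cluster.
sd.sync_service_list({"orders": ["kafka-e1:9092"]})
after = sd.metadata_for("orders")
```

In step 4 a client would connect to the addresses in `after["bootstrap_nodes"]`. Because the client asks service discovery rather than a fixed address, the route change in step S401 needs no client-side configuration change.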
To overcome the problem of numerous manual emergency steps and low timeliness and to ensure the high-availability disaster recovery capability of the distributed message platform, the present application provides an embodiment of a distributed message platform disaster recovery processing apparatus for implementing all or part of the distributed message platform disaster recovery processing method. Referring to fig. 5, the apparatus specifically includes:
a node monitoring and evaluating module 10, configured to collect working data of the existing cluster nodes through the preset monitoring component, and to compare and evaluate the collected working data;
a cluster emergency migration module 20, configured to execute a cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition according to the comparison and evaluation result.
As can be seen from the above description, the distributed message platform disaster recovery processing apparatus provided in the embodiment of the present application can collect working data of the existing cluster nodes through the preset monitoring component, compare and evaluate the collected working data, and, according to the comparison and evaluation result, execute a cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition, thereby overcoming the problem of numerous manual emergency steps and low timeliness and ensuring the high-availability disaster recovery capability of the distributed message platform.
In an embodiment of the distributed message platform disaster recovery processing apparatus according to the present application, referring to fig. 6, the node monitoring and evaluating module 10 includes:
a performance comparison unit 11, configured to numerically compare the collected working data with a preset performance threshold;
an emergency triggering unit 12, configured to determine that the corresponding cluster node meets the emergency migration condition if the working data exceeds the preset performance threshold.
In an embodiment of the distributed message platform disaster recovery processing apparatus of the present application, referring to fig. 7, the cluster emergency migration module 20 includes:
a service closing unit 21, configured to close the message queue of the cluster node that meets the emergency migration condition according to the comparison and evaluation result;
a data migration unit 22, configured to execute the data migration operation corresponding to the message queue in the corresponding emergency cluster.
In an embodiment of the distributed message platform disaster recovery processing apparatus of the present application, referring to fig. 8, the cluster emergency migration module 20 further includes:
a list updating unit 23, configured to update the cluster routing relation in the platform service list;
a newly-built routing unit 24, configured to establish a communication connection between the client and the emergency cluster according to the updated cluster routing relation, so as to perform message production and consumption.
Referring to fig. 9, in order to further explain the present solution, the present application further provides a specific application example for implementing the distributed message platform disaster recovery processing method by using the distributed message platform disaster recovery processing apparatus, which specifically includes the following contents:
The client unit 101 includes a producer client and a consumer client, and establishes a connection with the Kafka cluster 102 to produce and consume messages. The performance data acquisition unit 103 collects data on Kafka cluster performance capacity and sends it to the management control unit 104. The management control unit 104 includes a management control platform and a performance capacity evaluation module: the management control platform registers information such as the Topics, producers, and consumers of the Kafka cluster, while the performance capacity evaluation module receives the monitoring data from the performance data acquisition unit 103, evaluates performance capacity through an evaluation algorithm, and reports the evaluation result to the emergency migration unit 105. The emergency migration unit 105 includes an emergency migration module and a service notification module: the emergency migration module receives the performance evaluation result from unit 104, performs emergency migration of any cluster Topic whose performance capacity has hit a bottleneck, stops service of that Topic in the original cluster, and creates the corresponding Topic in the Kafka emergency cluster 107. The service discovery unit 106 accepts client requests from the client unit 101, forwards them to the Kafka emergency cluster 107, parses the metadata information returned by the Kafka emergency cluster 107, and encapsulates it and returns it to the client unit 101. The client unit 101 obtains the metadata information by interacting with the service discovery unit 106, and then establishes a connection with the Kafka emergency cluster 107 to produce and consume messages.
Specifically, client unit 101: the Kafka application clients that access the Kafka cluster, comprising a producer client and a consumer client. They connect the upstream and downstream of a Kafka Topic to the Kafka cluster, send metadata acquisition requests, receive metadata information, and establish connections with Kafka cluster nodes to produce and consume data.
Kafka cluster 102: supports application clients in accessing it through Topics to produce and consume messages; receives metadata requests and returns metadata information; establishes connections with the corresponding accessing application clients; and provides the message service. The emergency migration module of the emergency migration unit controls starting and stopping the service of cluster Topics. Data such as cluster Topics, cluster nodes, and access information are registered in the management control platform.
The performance data acquisition unit 103: the monitoring component is used for collecting and monitoring data such as CPU load, inflow and outflow quantity, storage, partition number, message accumulation condition and the like of the cluster node. And collecting and transmitting the cluster performance data once every minute.
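The once-per-minute collection cycle can be sketched as a simple loop. The function `run_collector` and its parameters are hypothetical; the application states only that cluster performance data is collected and transmitted once every minute, not how the schedule is implemented. The sleep function is injectable so the loop can be exercised without waiting.

```python
import time

COLLECT_INTERVAL_S = 60  # the text states one collection per minute


def run_collector(collect, send, rounds, sleep=time.sleep,
                  interval=COLLECT_INTERVAL_S):
    """Collect a sample and forward it `rounds` times, pausing between rounds.

    collect: callable returning one metrics sample (for example a dict)
    send:    callable that forwards the sample to the management control unit
    """
    for i in range(rounds):
        send(collect())
        if i < rounds - 1:  # no pause after the final round
            sleep(interval)


# Dry run with a fake sleep that only records the requested pauses.
sent, pauses = [], []
run_collector(lambda: {"cpu_load": 0.5}, sent.append, rounds=3,
              sleep=pauses.append)
```

A production collector would run indefinitely and handle collection failures; both are omitted here to keep the sketch short.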
The management control unit 104: the system comprises a management control platform, a corresponding database and a performance capacity evaluation module. The management control platform records information of a Topic corresponding to the Kafka cluster, a client and the like, and supports a service discovery cluster to synchronize a service list from the management control platform; the performance capacity evaluation module evaluates each collected index data of the cluster nodes through an evaluation algorithm, takes each index data collected by the performance data collection unit 103 and a set corresponding index threshold as input items, is responsible for data comparison and evaluation, and transmits the cluster Topic performance capacity condition of which the index data exceeds the threshold to the emergency migration module of the emergency migration unit 105. The parameters such as the threshold value in the performance capacity evaluation algorithm can be set by a service provider which actually provides service according to the Kafka node equipment condition.
Emergency migration unit 105: comprises an emergency migration module and a service notification module. The emergency migration module receives the performance evaluation result from the management control unit 104, triggers Topic graying (closing of service) in the cluster whose performance capacity has hit a bottleneck, and synchronously creates the corresponding Topic in the emergency cluster to perform the Topic migration. The service notification module automatically emails the accessing applications concerned with the Kafka cluster performance capacity evaluation result, and notifies them to restart their clients and troubleshoot the problem.
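The notification step can be sketched with Python's standard `email.message.EmailMessage`. The function name, subject line, body wording, and recipient addresses are hypothetical; the application says only that an automatic mail carries the evaluation result and asks the accessing application to restart its clients and troubleshoot. Sending the message (for example over SMTP) is left out.

```python
from email.message import EmailMessage


def build_migration_notice(topic, source_cluster, emergency_cluster,
                           exceeded, recipients):
    """Build (but do not send) the mail describing one emergency migration."""
    msg = EmailMessage()
    msg["Subject"] = f"Kafka emergency migration: Topic {topic}"
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Topic {topic} on cluster {source_cluster} was closed and recreated "
        f"on emergency cluster {emergency_cluster}.\n"
        f"Metrics over threshold: {', '.join(exceeded)}.\n"
        "Please restart the producer and consumer clients and investigate.\n"
    )
    return msg


notice = build_migration_notice(
    "orders", "kafka-prod", "kafka-emergency",
    ["cpu_load"], ["dev-team@example.com"],
)
```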
The service discovery unit 106: the service discovery cluster in the service discovery unit automatically routes clients to the Kafka emergency cluster, and each client establishes a connection with the corresponding Kafka emergency cluster to produce and consume messages. The service discovery unit works as follows:
1) The service discovery cluster interacts with the management control platform and synchronizes the platform's service list, which contains Topic, cluster, and node information; it periodically checks whether any cluster has left or joined the Kafka service discovery cluster and updates the service list accordingly;
2) The application producer and consumer clients interact with the service discovery cluster: the application producer and consumer initiate service-address metadata requests and wait for the service discovery cluster to return the metadata information;
3) The service discovery cluster interacts with the Kafka cluster according to the service list and the metadata requests initiated by the producer and consumer clients. The service discovery module first checks node availability, that is, whether a connection can be established with the node; it then forwards the client metadata request to the corresponding Kafka emergency cluster, parses the metadata information returned by the Kafka cluster, and encapsulates it for return to the producer and consumer clients;
4) The application producer client establishes a connection with the corresponding Kafka emergency cluster according to the metadata information returned by the service discovery cluster and produces messages, thereby interacting with the Kafka emergency cluster. Likewise, the application consumer client establishes a connection with the corresponding Kafka emergency cluster according to the metadata information returned by the service discovery cluster and consumes messages.
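The routing principle in steps 1) to 4) can be illustrated with a minimal Python sketch. It is not the implementation of the present application: the names `ServiceList`, `ServiceDiscovery`, `tcp_probe`, and the returned dictionary layout are hypothetical, the service list is held in memory, and the forwarding and parsing of the Kafka metadata request is simplified to returning the addresses of the reachable nodes.

```python
import socket
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, int]  # (host, port); hypothetical representation of a broker node


@dataclass
class ServiceList:
    """Hypothetical copy of the management control platform's service list."""
    topic_to_cluster: Dict[str, str] = field(default_factory=dict)
    cluster_to_nodes: Dict[str, List[Node]] = field(default_factory=dict)


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Availability check: can a TCP connection to the node be established?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class ServiceDiscovery:
    def __init__(self, probe: Callable[[str, int], bool] = tcp_probe):
        self.service_list = ServiceList()
        self.probe = probe

    def sync(self, snapshot: ServiceList) -> None:
        # Step 1: replace the local service list with the platform's latest one.
        self.service_list = snapshot

    def resolve(self, topic: str) -> dict:
        # Steps 2-3: answer a client metadata request for a Topic.
        cluster = self.service_list.topic_to_cluster[topic]
        nodes = self.service_list.cluster_to_nodes.get(cluster, [])
        reachable = [n for n in nodes if self.probe(*n)]
        if not reachable:
            raise LookupError(f"no reachable node in cluster {cluster!r}")
        # Step 4 input: the client connects to these addresses.
        return {
            "topic": topic,
            "cluster": cluster,
            "bootstrap_servers": [f"{host}:{port}" for host, port in reachable],
        }
```

After an emergency migration, the platform only needs to publish a service list in which the Topic points at the emergency cluster; a client that restarts and calls `resolve` again is then routed to that cluster without any change to its own configuration.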
Kafka emergency cluster 107: a Kafka emergency cluster built to support rapid emergency response. It receives metadata requests from the service discovery unit 106 and returns metadata information, establishes connections with the corresponding access application clients, and provides the message service, supporting the application clients in producing and consuming messages through Topic access.
The present application provides the ability to quickly ensure that the message service continues to be provided normally during an emergency. It reduces manual emergency operations, improves the efficiency and timeliness of emergency migration, lowers operational risk, speeds up emergency response, and safeguards production more stably.
In terms of hardware, in order to solve the problem that emergency handling involves many manual steps and has low timeliness, and to ensure the high-availability disaster tolerance capability of a distributed message platform, the present application provides an embodiment of an electronic device for implementing all or part of the contents of the distributed message platform disaster tolerance processing method, where the electronic device specifically includes the following:
a processor, a memory, a communication interface, and a bus; the processor, the memory, and the communication interface communicate with one another through the bus; the communication interface is used for information transmission between the distributed message platform disaster recovery processing device and related equipment such as a core service system, a user terminal, and a related database; the logic controller may be a desktop computer, a tablet computer, a mobile terminal, or the like, but the embodiment is not limited thereto. In this embodiment, the logic controller may be implemented with reference to the embodiments of the distributed message platform disaster recovery processing method and the distributed message platform disaster recovery processing apparatus described above, the contents of which are incorporated herein; repeated details are not described again.
It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set-top box, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), a vehicle-mounted device, a smart wearable device, and the like. The smart wearable device may include smart glasses, a smart watch, a smart bracelet, and the like.
In practical applications, part of the distributed message platform disaster recovery processing method may be executed on the electronic device side as described above, or all operations may be completed in the client device. The selection may be specifically performed according to the processing capability of the client device, the limitation of the user usage scenario, and the like. This is not a limitation of the present application. The client device may further include a processor if all operations are performed in the client device.
The client device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.
Fig. 10 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 10, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, this fig. 10 is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one embodiment, the functions of the distributed message platform disaster recovery processing method may be integrated into the central processor 9100. The central processor 9100 may be configured to control as follows:
Step S101: collect working data of the existing cluster nodes through a preset monitoring component, and compare and evaluate the collected working data.
Step S102: according to the comparison evaluation result, execute a cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition.
As can be seen from the above description, the electronic device provided in the embodiment of the present application collects the working data of the existing cluster nodes through the preset monitoring component, compares and evaluates the collected working data, and then executes the cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition according to the comparison evaluation result. This addresses the problem that emergency handling involves many manual steps and has low timeliness, and ensures the high-availability disaster tolerance capability of the distributed message platform.
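As a rough illustration of how steps S101 and S102 fit together, the following Python sketch compares collected node metrics against preset thresholds and derives a migration plan for the nodes that exceed them. Everything in it is an assumption made for illustration: the metric names, the threshold values, the `NodeMetrics` and `plan_migration` names, and the three-action plan (close the Topic, create it in the emergency cluster, update the route) are hypothetical and are not taken from the present application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NodeMetrics:
    """Working data collected for one cluster node (hypothetical fields)."""
    node: str
    cpu_load: float         # fraction of CPU in use, 0.0 to 1.0
    traffic_mb_s: float     # combined inbound and outbound traffic
    disk_used_ratio: float  # fraction of storage in use, 0.0 to 1.0
    partitions: int         # number of partitions hosted
    backlog: int            # number of accumulated (unconsumed) messages


# Illustrative threshold values only; real limits depend on the deployment.
THRESHOLDS: Dict[str, float] = {
    "cpu_load": 0.85,
    "traffic_mb_s": 400.0,
    "disk_used_ratio": 0.90,
    "partitions": 4000,
    "backlog": 1_000_000,
}


def exceeded(metrics: NodeMetrics, thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Step S101: return the names of the metrics that exceed their threshold."""
    return [name for name, limit in thresholds.items() if getattr(metrics, name) > limit]


def plan_migration(
    nodes: List[NodeMetrics],
    topics_by_node: Dict[str, List[str]],
    emergency_cluster: str,
) -> List[Tuple[str, str, str]]:
    """Step S102: list the actions needed for every node meeting the condition."""
    plan: List[Tuple[str, str, str]] = []
    for metrics in nodes:
        if not exceeded(metrics):
            continue  # node is healthy, nothing to migrate
        for topic in topics_by_node.get(metrics.node, []):
            plan.append(("close", metrics.node, topic))        # stop serving the Topic
            plan.append(("create", emergency_cluster, topic))  # create it in the emergency cluster
            plan.append(("route", topic, emergency_cluster))   # point clients at the new cluster
    return plan
```

A node qualifies as soon as any single metric exceeds its threshold. Returning a plan rather than executing the actions directly keeps the evaluation logic testable without a running cluster.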
In another embodiment, the distributed message platform disaster recovery processing apparatus may be configured separately from the central processing unit 9100, for example, the distributed message platform disaster recovery processing apparatus may be configured as a chip connected to the central processing unit 9100, and the function of the distributed message platform disaster recovery processing method may be implemented by the control of the central processing unit.
As shown in fig. 10, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 also does not necessarily include all of the components shown in fig. 10; in addition, the electronic device 9600 may further include components not shown in fig. 10, which can be referred to in the prior art.
As shown in fig. 10, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store the information relating to the failure as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to store or process information.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 can be a solid-state memory, e.g., Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, can be selectively erased, and can be provided with more data; an example of such a memory is sometimes referred to as an EPROM or the like. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which is used for storing application programs and function programs, or a flow for the central processor 9100 to execute the operations of the electronic device 9600.
The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132 to implement general telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all the steps in the distributed message platform disaster recovery processing method with the execution subject being the server or the client in the foregoing embodiment, where the computer-readable storage medium stores a computer program thereon, and when the computer program is executed by a processor, the computer program implements all the steps in the distributed message platform disaster recovery processing method with the execution subject being the server or the client in the foregoing embodiment, for example, when the processor executes the computer program, the processor implements the following steps:
Step S101: collect working data of the existing cluster nodes through a preset monitoring component, and compare and evaluate the collected working data.
Step S102: according to the comparison evaluation result, execute a cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition.
As can be seen from the above description, the computer-readable storage medium provided in the embodiment of the present application collects the working data of the existing cluster nodes through the preset monitoring component, compares and evaluates the collected working data, and then executes the cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition according to the comparison evaluation result. This addresses the problem that emergency handling involves many manual steps and has low timeliness, and ensures the high-availability disaster tolerance capability of the distributed message platform.
The embodiments of the present application further provide a computer program product capable of implementing all steps in the distributed message platform disaster recovery processing method with a server or a client as an execution subject in the foregoing embodiments, where the computer program/instruction is executed by a processor to implement the steps of the distributed message platform disaster recovery processing method, for example, the computer program/instruction implements the following steps:
Step S101: collect working data of the existing cluster nodes through a preset monitoring component, and compare and evaluate the collected working data.
Step S102: according to the comparison evaluation result, execute a cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition.
As can be seen from the above description, the computer program product provided in the embodiment of the present application collects the working data of the existing cluster nodes through the preset monitoring component, compares and evaluates the collected working data, and then executes the cluster message queue emergency migration operation on the cluster nodes that meet the emergency migration condition according to the comparison evaluation result. This addresses the problem that emergency handling involves many manual steps and has low timeliness, and ensures the high-availability disaster tolerance capability of the distributed message platform.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A distributed message platform disaster recovery processing method is characterized by comprising the following steps:
collecting working data of the existing cluster nodes through a preset monitoring component, and comparing and evaluating the collected working data;
and executing cluster message queue emergency migration operation on the cluster nodes meeting the emergency migration condition according to the comparison evaluation result.
2. The distributed message platform disaster recovery processing method according to claim 1, wherein the comparing and evaluating the collected working data comprises:
comparing the collected working data with a preset performance threshold value;
and if the working data exceed the preset performance threshold, judging that the corresponding cluster node meets the emergency migration condition.
3. The distributed message platform disaster recovery processing method according to claim 1, wherein the performing, according to the comparison evaluation result, a cluster message queue emergency migration operation on the cluster nodes satisfying the emergency migration condition includes:
closing a message queue of the cluster nodes meeting the emergency migration condition according to the comparison evaluation result;
and executing data migration operation corresponding to the message queue in the corresponding emergency cluster.
4. The distributed message platform disaster recovery processing method according to claim 3, wherein after performing the data migration operation corresponding to the message queue in the corresponding emergency cluster, the method comprises:
updating the cluster routing relation in the platform service list;
and establishing communication connection between the client and the emergency cluster according to the updated cluster routing relation so as to carry out message production and consumption.
5. The distributed message platform disaster recovery processing method according to claim 2, wherein the determining that the corresponding cluster node satisfies the emergency migration condition if the working data exceeds the preset performance threshold includes:
and if at least one of the CPU load, inflow and outflow quantity, storage, partition number and message accumulation conditions of the existing cluster nodes in production exceeds a corresponding preset performance threshold, judging that the corresponding cluster nodes meet the emergency migration condition.
6. The distributed message platform disaster recovery processing method according to claim 3, further comprising:
setting to closed the availability state of the message queue of the cluster node whose comparison evaluation result indicates that its performance has reached a bottleneck;
and creating an emergency cluster corresponding to the cluster node and creating a message queue of the emergency cluster to perform the data migration operation.
7. A distributed message platform disaster recovery processing apparatus, comprising:
the node monitoring and evaluating module is used for acquiring the working data of the existing cluster nodes through a preset monitoring component and comparing and evaluating the acquired working data;
and the cluster emergency migration module is used for executing cluster message queue emergency migration operation on the cluster nodes meeting the emergency migration condition according to the comparison evaluation result.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the distributed message platform disaster recovery processing method according to any one of claims 1 to 6 when executing the program.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the distributed message platform disaster recovery processing method according to any one of claims 1 to 6.
10. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the steps of the distributed message platform disaster recovery processing method according to any of claims 1 to 6.
CN202211562828.7A 2022-12-07 2022-12-07 Disaster tolerance processing method and device for distributed message platform Pending CN115914375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211562828.7A CN115914375A (en) 2022-12-07 2022-12-07 Disaster tolerance processing method and device for distributed message platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211562828.7A CN115914375A (en) 2022-12-07 2022-12-07 Disaster tolerance processing method and device for distributed message platform

Publications (1)

Publication Number Publication Date
CN115914375A true CN115914375A (en) 2023-04-04

Family

ID=86491219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211562828.7A Pending CN115914375A (en) 2022-12-07 2022-12-07 Disaster tolerance processing method and device for distributed message platform

Country Status (1)

Country Link
CN (1) CN115914375A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117420967A (en) * 2023-12-19 2024-01-19 北京比格大数据有限公司 Method and system for improving storage performance of software acquisition data
CN117420967B (en) * 2023-12-19 2024-03-12 北京比格大数据有限公司 Method and system for improving storage performance of software acquisition data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination