CN112291082A - Computer room disaster recovery processing method, terminal and storage medium - Google Patents

Publication number
CN112291082A
Authority
CN
China
Prior art keywords
machine room
instruction
cluster
terminal
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011058742.1A
Other languages
Chinese (zh)
Other versions
CN112291082B (en)
Inventor
石鹏
宋磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dami Technology Co Ltd
Original Assignee
Beijing Dami Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dami Technology Co Ltd filed Critical Beijing Dami Technology Co Ltd
Priority to CN202011058742.1A priority Critical patent/CN112291082B/en
Publication of CN112291082A publication Critical patent/CN112291082A/en
Application granted granted Critical
Publication of CN112291082B publication Critical patent/CN112291082B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/55 Prevention, detection or correction of errors
    • H04L49/557 Error correction, e.g. fault recovery or fault tolerance
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/54 Store-and-forward switching systems
    • H04L12/56 Packet switching systems
    • H04L12/5601 Transfer mode dependent, e.g. ATM
    • H04L2012/5625 Operations, administration and maintenance [OAM]
    • H04L2012/5627 Fault tolerance and recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)

Abstract

The application discloses a computer room disaster recovery processing method, a terminal and a storage medium. The method comprises: when a fault of a first machine room is detected, determining the terminal with the fault in the first machine room as a target terminal, the target terminal being used for processing messages in a first cluster of the first machine room; and sending a disaster recovery instruction to the target terminal of a second machine room, the disaster recovery instruction being used for instructing the target terminal of the second machine room to process the messages of the first cluster. The scheme ensures that services using kafka are not affected when the first machine room fails, and avoids the situation in which backlogged messages of the failed machine room cannot be consumed in time because the whole machine room is switched over during disaster recovery.

Description

Computer room disaster recovery processing method, terminal and storage medium
Technical Field
The present application relates to the field of computer room disaster recovery processing, and in particular, to a computer room disaster recovery processing method, a terminal, and a storage medium.
Background
kafka is a common message queue middleware. If a service depends strongly on kafka, the service provider must ensure that the service's use of kafka is not affected when a failure occurs. The existing solution is to deploy one kafka cluster in each of two machine rooms; when one machine room fails, the kafka traffic of the failed machine room is switched to the kafka cluster of the other machine room, and the user terminals through which the service consumes kafka messages also switch to consuming from the kafka cluster of the other machine room. Because the whole machine room is switched over, messages backlogged in the failed machine room cannot be consumed in time.
Disclosure of Invention
In order to solve the above problem, embodiments of the present application provide a computer room disaster recovery processing method, a terminal, and a storage medium.
In a first aspect, an embodiment of the present application provides a computer room disaster recovery processing method, where the method includes:
when a first machine room fault is detected, determining a terminal with a fault in the first machine room as a target terminal; the target terminal is used for processing messages in the first cluster of the first machine room;
sending a disaster recovery instruction to the target terminal of the second machine room; the disaster recovery instruction is used for instructing the target terminal of the second machine room to process the messages of the first cluster.
Optionally, the target terminal is a server;
the sending of the disaster recovery instruction to the target terminal of the second machine room includes:
sending a first instruction to the server of the second machine room; the first instruction is used for instructing the server of the second machine room to start and configure first configuration information of the server of the first machine room; the first configuration information is configuration information required by the server of the second machine room to write a message into the first cluster;
after the server of the second machine room is started, sending a second instruction to the server of the second machine room; the second instruction is used for instructing the server of the second computer room to write a message to the first cluster.
Optionally, the target terminal is a user terminal;
the sending of the disaster recovery instruction to the target terminal of the second machine room includes:
sending a third instruction to the user terminal of the second machine room; the third instruction is used for instructing the user terminal of the second machine room to start and to configure second configuration information of the user terminal of the first machine room; the second configuration information is configuration information required by the user terminal of the second machine room to read messages from the first cluster;
after the user terminal of the second machine room is started, sending a fourth instruction to the user terminal of the second machine room; the fourth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster.
Optionally, after the fourth instruction is sent to the user terminal of the second machine room once that user terminal has started, the method further includes:
detecting whether a server of the first machine room has failed;
when the server of the first machine room has failed, sending a fifth instruction to the server of the second machine room; the fifth instruction is used for instructing the server of the second machine room to start and to write messages to a second cluster of the second machine room;
after the server of the second machine room is started, sending a sixth instruction to the user terminal of the second machine room; the sixth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster and the second cluster at the same time.
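As a non-limiting illustration, the simultaneous reading triggered by the sixth instruction can be sketched in Python as follows. The sketch models each cluster as an in-memory list rather than a real kafka cluster; the function name poll_clusters, the cluster names, and the offsets dictionary are hypothetical and are not part of the application.

```python
def poll_clusters(clusters, offsets, max_per_cluster=10):
    """Read the next batch from every cluster in one pass.

    clusters: dict mapping a cluster name to its message log (a list).
    offsets:  dict mapping a cluster name to the next index to read;
              it is updated in place so repeated calls make progress.
    """
    batch = []
    for name, log in clusters.items():
        start = offsets.get(name, 0)
        chunk = log[start:start + max_per_cluster]
        offsets[name] = start + len(chunk)
        # Tag each message with its source cluster so that backlog and
        # new traffic remain distinguishable downstream.
        batch.extend((name, message) for message in chunk)
    return batch


# Hypothetical state after failover: cluster-1 still holds the backlog,
# while the second machine room's server now writes to cluster-2.
clusters = {"cluster-1": ["backlog-1", "backlog-2"], "cluster-2": ["new-1"]}
offsets = {}
first_batch = poll_clusters(clusters, offsets)
```

Keeping a separate offset per cluster lets the backlog in the first cluster drain while new messages from the second cluster are consumed in the same loop.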
Optionally, the first cluster includes at least two partitions, each located at a different geographical location.
Optionally, before the terminal with the fault in the first machine room is determined to be the target terminal upon detecting the first machine room fault, the method further includes:
recording partition position information of the first cluster;
when a partition fault of the first cluster is detected, acquiring the partition position information corresponding to the faulty partition;
and generating warning information based on the partition position information and sending it to a preset manager terminal.
Optionally, after the sending of the disaster recovery instruction to the target terminal of the second machine room, the method further includes:
when the target terminal of the first machine room is detected to be repaired, sending a reset instruction to the target terminal of the second machine room; the reset instruction is used for instructing the target terminal of the second machine room to stop processing the messages of the first cluster.
In a second aspect, the present application provides a management terminal, comprising:
a detection module, configured to determine, when a fault of a first machine room is detected, a terminal with the fault in the first machine room as a target terminal; the target terminal is used for processing messages in the first cluster of the first machine room;
a sending module, configured to send a disaster recovery instruction to the target terminal of the second machine room; the disaster recovery instruction is used for instructing the target terminal of the second machine room to process the messages of the first cluster.
Optionally, the sending module includes:
a first sending unit, configured to send a first instruction to the server in the second computer room; the first instruction is used for instructing the server of the second machine room to start and configure first configuration information of the server of the first machine room; the first configuration information is configuration information required by the server of the second machine room to write a message into the first cluster;
the second sending unit is used for sending a second instruction to the server of the second machine room after the server of the second machine room is started; the second instruction is used for instructing the server of the second computer room to write a message to the first cluster.
Optionally, the sending module includes:
a third sending unit, configured to send a third instruction to the user terminal of the second machine room; the third instruction is used for instructing the user terminal of the second machine room to start and to configure second configuration information of the user terminal of the first machine room; the second configuration information is configuration information required by the user terminal of the second machine room to read messages from the first cluster;
a fourth sending unit, configured to send a fourth instruction to the user terminal of the second machine room after the user terminal of the second machine room is started; the fourth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster.
Optionally, the sending module further includes:
a detection unit for detecting whether a server of the first machine room is failed;
a fifth sending unit, configured to send a fifth instruction to the server of the second machine room when the server of the first machine room fails; the fifth instruction is used for instructing the server of the second machine room to start and to write messages to a second cluster of the second machine room;
a sixth sending unit, configured to send a sixth instruction to the user terminal of the second machine room after the server of the second machine room is started; the sixth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster and the second cluster at the same time.
Optionally, the first cluster includes at least two partitions, each located at a different geographical location.
Optionally, the terminal further includes:
the recording module is used for recording the partition position information of the first cluster;
the acquisition module is used for acquiring the partition position information corresponding to the fault partition when the partition fault of the first cluster is detected;
and the generating module is used for generating and sending warning information to a preset manager terminal based on the partition position information.
Optionally, the terminal further includes:
a recovery processing module, configured to send a reset instruction to the target terminal of the second machine room when the target terminal of the first machine room is detected to be repaired; the reset instruction is used for instructing the target terminal of the second machine room to stop processing the messages of the first cluster.
In a third aspect, an embodiment of the present application provides a management terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect or any one of the possible implementation manners of the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements a method as provided in the first aspect or any one of the possible implementations of the first aspect.
In one or more embodiments of the present application, when detecting a fault of a first machine room, a management terminal determines the target terminal with the fault in the first machine room. Because the second machine room is also provided with a corresponding target terminal, the management terminal can, after determining the target terminal, send the disaster recovery instruction to the target terminal in the second machine room, so that the target terminal in the second machine room replaces the target terminal in the first machine room and continues to process the messages of the first cluster in the first machine room. This ensures that the kafka service is not affected when the first machine room fails, and avoids the situation in which backlogged messages of the failed machine room cannot be consumed in time because the whole machine room is switched over during disaster recovery.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a computer room disaster recovery processing system according to an embodiment of the present disclosure;
Fig. 2 is a schematic flow chart of a computer room disaster recovery processing method according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of another computer room disaster recovery processing method according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of another computer room disaster recovery processing method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a management terminal according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another management terminal according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
In the following description, the terms "first" and "second" are used for descriptive purposes only and are not intended to indicate or imply relative importance. The following description provides embodiments of the present application, where different embodiments may be substituted or combined, and thus the present application is intended to include all possible combinations of the same and/or different embodiments described. Thus, if one embodiment includes features A, B, and C and another embodiment includes features B and D, then this application should also be considered to include embodiments comprising any other possible combination of one or more of A, B, C, and D, even though such an embodiment may not be explicitly recited in the text below.
The following description provides examples, and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For example, the described methods may be performed in an order different than the order described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.
It should be noted that, in distributed systems, message middleware is widely used to exchange data between systems, which facilitates asynchronous decoupling. Many open-source message middleware products now exist; three common ones are kafka, RabbitMQ, and RocketMQ. kafka is a distributed publish-subscribe messaging system open-sourced by LinkedIn and is now a top-level Apache project. Its main characteristics are that message consumption is pull-based and that it pursues high throughput; it was initially used for log collection and transmission. Replication has been supported since version 0.8. It does not support transactions, has no strict requirements regarding duplicated, lost, or erroneous messages, and is suited to data collection for internet services that generate large amounts of data. RabbitMQ is an open-source message queue system developed in Erlang and based on the AMQP protocol. The main features of AMQP are message orientation, queuing, routing (including point-to-point and publish/subscribe), reliability, and security. AMQP is used more in enterprise systems, where requirements on data consistency, stability, and reliability are high and performance and throughput are secondary. RocketMQ is message middleware open-sourced by Alibaba; it is developed in pure Java, offers high throughput and high availability, and is suited to large-scale distributed system applications. The design of RocketMQ originated from kafka, but it is not a copy of kafka: it optimizes reliable transmission and transactions for messages, and is currently widely used within Alibaba Group in scenarios such as transactions, recharging, stream computing, message push, log stream processing, and binlog distribution.
In the examples of the present application, kafka will be taken as an example to explain the scheme of the present application.
It should also be noted that kafka is a messaging system based on publish and subscribe. It is commonly described as a "distributed commit log" or a "distributed streaming platform". It is characterized by high throughput for both publishing and subscribing: kafka can produce about 25 thousand messages per second (50 MB) and process 55 thousand messages per second (110 MB). kafka generally consists of three parts: the kafka producer (the generator of messages), the kafka cluster, and the kafka consumer (the consumer of messages). The kafka producer is responsible for writing messages into the kafka cluster; the kafka cluster contains one or more brokers (caching agents; the one or more servers in a kafka cluster are collectively called brokers); and the kafka consumer is responsible for reading the messages written into the kafka cluster and consuming them.
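For illustration only, the three roles described above can be modeled with a small in-memory Python sketch. It does not use a real kafka client; the class names InMemoryCluster, Producer, and Consumer and the topic name are hypothetical, and the model only mirrors the append-only log and pull-based consumption described in the text.

```python
from collections import defaultdict


class InMemoryCluster:
    """Toy stand-in for a kafka cluster: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)

    def append(self, topic, message):
        self.logs[topic].append(message)
        return len(self.logs[topic]) - 1  # offset of the newly written message

    def fetch(self, topic, offset, max_messages=10):
        return self.logs[topic][offset:offset + max_messages]


class Producer:
    """Writes messages into the cluster."""

    def __init__(self, cluster):
        self.cluster = cluster

    def send(self, topic, message):
        return self.cluster.append(topic, message)


class Consumer:
    """Pull-based reader that tracks its own read offset per topic."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.offsets = defaultdict(int)

    def poll(self, topic, max_messages=10):
        batch = self.cluster.fetch(topic, self.offsets[topic], max_messages)
        self.offsets[topic] += len(batch)
        return batch
```

Because the consumer holds only a reference to a cluster and an offset, pointing a different consumer instance at the same cluster is enough for it to read that cluster's messages, which is the property the disaster recovery scheme relies on.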
Referring to fig. 1, fig. 1 is a schematic structural diagram of a computer room disaster recovery processing system according to an embodiment of the present disclosure.
As shown in fig. 1, the computer room disaster recovery processing system may include a first computer room 10, a second computer room 20, and a management terminal 30, where the first computer room 10 includes a first server 101, a first cluster 102, and a first user terminal 103, and the second computer room includes a second server 201, a second cluster 202, and a second user terminal 203. The first machine room 10 and the second machine room 20 may be two machine rooms set up at different geographical locations.
A cluster may have multiple kafka servers, and each kafka server may be regarded as a broker; that is, the first cluster 102 and the second cluster 202 may each be a server group made up of multiple kafka servers.
The first server 101 and the second server 201 are message producers, and are kafka producer terminals that send messages to brokers in the cluster, and specifically may be servers or clients on which kafka producer clients are installed.
The first user terminal 103 and the second user terminal 203 are message consumers, and are kafka consumer terminals that read messages from brokers in the cluster, and may specifically be servers or clients that are equipped with kafka consumer clients.
The management terminal 30 is responsible for managing and controlling the entire kafka message transmission process. It may send instructions over communication links to the first server 101, the first cluster 102, the first user terminal 103, the second server 201, the second cluster 202, and the second user terminal 203, respectively, so as to change, for example, the target to which the second server 201 sends messages and the target from which the second user terminal 203 reads messages. The management terminal 30 may specifically be a server terminal loaded with kafka management system software.
It should be noted that there may be one or more of each of the first server 101, the second server 201, the first user terminal 103, and the second user terminal 203. The number of servers and user terminals in the computer room disaster recovery processing system shown in fig. 1 is only an example; in practice, the system may include any number of first servers 101, second servers 201, first user terminals 103, and second user terminals 203, which is not limited in this application.
Illustratively, when a teacher wants to send a red envelope to the students in a class group, the teacher sends a red envelope sending request to the first server 101 through the teacher's mobile phone terminal, and the first server 101 generates red envelope information according to the request and writes it into the first cluster 102. After reading the red envelope information from the first cluster 102, the first user terminal 103 consumes it, so that the red envelope information is delivered to the students' mobile phone terminals, completing the sending of the red envelope. When the first server 101 fails, the management terminal 30 sends an instruction to the second server 201, so that the second server 201 replaces the first server 101 in receiving the red envelope request sent by the teacher's mobile phone terminal and writing red envelope information into the first cluster 102. When the first user terminal 103 fails, the management terminal 30 sends an instruction to the second user terminal 203, so that the second user terminal 203 replaces the first user terminal 103 in reading the red envelope information from the first cluster 102 and consuming it, and the red envelope information is delivered to the students' mobile phone terminals.
Next, a computer room disaster recovery processing method provided in an embodiment of the present application is described with reference to a computer room disaster recovery processing system shown in fig. 1.
Referring to fig. 2, fig. 2 is a schematic flowchart of a method for processing a computer room disaster tolerance according to an embodiment of the present application. In an embodiment of the present application, the method includes:
S201, when a first machine room fault is detected, determining a terminal with the fault in the first machine room as a target terminal; the target terminal is used for processing messages in the first cluster of the first machine room.
In the embodiment of the present application, the execution subject may be a management terminal. When the management terminal detects a fault of the first machine room, it first determines which specific terminal in the first machine room has failed and takes that terminal as the target terminal. The target terminal may be a server in the first machine room that writes messages to the first cluster, or a user terminal in the first machine room that reads messages from the first cluster.
S202, sending a disaster recovery instruction to the target terminal of the second machine room; the disaster recovery instruction is used for instructing the target terminal of the second machine room to process the messages of the first cluster.
In the embodiment of the application, after determining the target terminal with the fault in the first machine room, the management terminal sends the disaster recovery instruction to the target terminal in the second machine room, so that the target terminal in the second machine room replaces the target terminal in the first machine room to write or read the message to or from the first cluster after receiving the disaster recovery instruction.
In this embodiment of the application, the target terminal with the fault in the first machine room may fall into one of the following cases:
Case one: the target terminal with the fault in the first machine room is the server of the first machine room, which means that the server of the first machine room has failed and cannot generate new messages to be written into the first cluster. The management terminal sends a disaster recovery instruction to the server of the second machine room; after receiving the instruction, the server of the second machine room starts and replaces the server of the first machine room in producing messages and sending them to the first cluster.
Case two: the target terminal with the fault in the first machine room is the user terminal of the first machine room, which means that the user terminal of the first machine room has failed and cannot read new messages from the first cluster. The management terminal sends a disaster recovery instruction to the user terminal of the second machine room; after receiving the instruction, the user terminal of the second machine room starts and replaces the user terminal of the first machine room in reading the messages of the first cluster and consuming them.
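The two cases can be summarized, purely as an illustrative sketch, by a small Python routine that maps the type of failed terminal to the instruction sent to the second machine room. The names TerminalKind, DisasterRecoveryInstruction, build_instruction, "room-2", and "cluster-1" are hypothetical; the application does not prescribe any particular data format for the instruction.

```python
from dataclasses import dataclass
from enum import Enum


class TerminalKind(Enum):
    SERVER = "server"              # message producer (case one)
    USER_TERMINAL = "user"         # message consumer (case two)


@dataclass(frozen=True)
class DisasterRecoveryInstruction:
    target_room: str               # machine room whose terminal takes over
    target_kind: TerminalKind      # same kind of terminal as the failed one
    action: str                    # "write" for servers, "read" for user terminals
    cluster: str                   # cluster that keeps being processed


def build_instruction(failed_kind, standby_room="room-2", cluster="cluster-1"):
    """Build the instruction for the standby terminal matching the failed one."""
    action = "write" if failed_kind is TerminalKind.SERVER else "read"
    return DisasterRecoveryInstruction(standby_room, failed_kind, action, cluster)
```

In both cases the cluster field stays on the first cluster: only the failed terminal is replaced, not the whole machine room.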
In one embodiment, the first cluster includes at least two partitions, each located in a different geographic location.
In the embodiment of the application, the first cluster is provided with at least two partitions, and the kafka brokers in the first cluster can be distributed evenly across all available partitions. The partitions can be arranged in different regions, which is equivalent to arranging them in a plurality of different machine rooms. This ensures that the availability of kafka is not affected when a single available partition of the cluster fails.
Optionally, the available partitions of the first machine room and the second machine room may differ, for example: the first cluster of the first machine room is provided with 3 available partitions, and the second cluster of the second machine room is provided with 2 available partitions.
Optionally, the kafka leader (the primary replica among the multiple replicas of each kafka partition, to which producers preferentially send data and from which consumers preferentially consume data) and its other replicas may be located in different available partitions, thereby ensuring that the failure of one available partition does not affect message consumption by the kafka consumer, i.e., the user terminal.
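As a non-limiting sketch, the even distribution of brokers and the placement of replicas in different available partitions can be expressed in Python as follows. The function names spread_brokers and replicas_survive_failure are hypothetical, and round-robin assignment is only one assumed way of achieving an even spread; the application does not specify the placement algorithm.

```python
def spread_brokers(broker_ids, available_partitions):
    """Assign brokers round-robin so that group sizes differ by at most one."""
    placement = {name: [] for name in available_partitions}
    for index, broker in enumerate(broker_ids):
        target = available_partitions[index % len(available_partitions)]
        placement[target].append(broker)
    return placement


def replicas_survive_failure(replica_locations, failed_partition):
    """True if at least one replica lives outside the failed available partition."""
    return any(location != failed_partition for location in replica_locations)
```

For example, six brokers spread over three available partitions yield two brokers each, and a leader in one available partition with a replica in another remains readable when either one fails.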
In one possible implementation, before step S201, the method further includes:
recording partition position information of the first cluster;
when the partition fault of the first cluster is detected, acquiring the partition position information corresponding to the fault partition;
and generating and sending warning information to a preset manager terminal based on the partition position information.
In this embodiment of the present application, because the partitions of the first cluster are located in different regions, the management terminal first records the partition position information of all the partitions. When it detects that a partition in the first cluster has failed, the management terminal obtains the previously recorded partition position information corresponding to that partition, generates warning information based on it, and sends the warning information to a preset manager terminal. The manager terminal may be a terminal, set in advance, that is used by an administrator of the first cluster. The warning information may be a text message or a voice message with content such as "the first cluster partition located in xx region has failed, please check".
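The record-then-alert flow described above can be sketched in Python as follows, as an illustration only. The function names, the partition identifiers, and the region names are hypothetical; how the warning is actually delivered to the manager terminal (text or voice) is outside the sketch.

```python
def record_partition_locations(partition_regions):
    """Store a private copy of the partition -> region mapping for later lookup."""
    return dict(partition_regions)


def build_warning(recorded_locations, failed_partition):
    """Look up where the failed partition is and compose the warning text."""
    if failed_partition not in recorded_locations:
        raise KeyError(f"no recorded location for partition {failed_partition!r}")
    region = recorded_locations[failed_partition]
    return f"The first cluster partition located in {region} has failed, please check."


# Hypothetical example: two partitions of the first cluster in different regions.
locations = record_partition_locations({"partition-a": "region-north", "partition-b": "region-south"})
warning = build_warning(locations, "partition-b")
```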
In one possible implementation, step S202 is followed by:
when the target terminal of the first machine room is detected to be repaired, sending a reset instruction to the target terminal of the second machine room; the reset instruction is used for instructing the target terminal of the second machine room to stop processing the messages of the first cluster.
In the embodiment of the application, under normal conditions only the first machine room is open and serves as the main machine room, while the second machine room is temporarily closed and is used only when disaster recovery is performed. Therefore, after the fault of the target terminal in the first machine room is repaired, processing of the first cluster's messages needs to be switched back from the target terminal of the second machine room to the target terminal of the first machine room. After detecting that the target terminal of the first machine room is repaired, the management terminal sends a reset instruction to the target terminal of the second machine room so that it stops processing the messages of the first cluster.
Optionally, the management terminal sends a seventh instruction to the target terminal of the first machine room, where the seventh instruction is used to instruct the target terminal of the first machine room to resume processing the messages of the first cluster.
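By way of illustration, the switch back to the first machine room can be sketched in Python as an ordered list of instructions. The function name failback_instructions, the dictionary fields, and the room and cluster names are hypothetical. Sending the reset before the resume instruction is an assumption of this sketch; the application does not state the relative order of the two instructions.

```python
def failback_instructions(terminal_kind, primary_room="room-1",
                          standby_room="room-2", cluster="cluster-1"):
    """Return the instructions to send, in order, once the primary terminal is repaired."""
    return [
        # Reset instruction: the standby terminal stops processing the first cluster.
        {"room": standby_room, "terminal": terminal_kind,
         "instruction": "reset", "cluster": cluster},
        # Seventh instruction (optional in the text): the repaired terminal resumes.
        {"room": primary_room, "terminal": terminal_kind,
         "instruction": "resume", "cluster": cluster},
    ]
```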
Through the above steps, when the management terminal detects a fault of the first machine room, it determines the target terminal with the fault in the first machine room. Because the second machine room is also provided with a corresponding target terminal, the management terminal can, after determining the target terminal, send the disaster recovery instruction to the target terminal in the second machine room, so that the target terminal in the second machine room replaces the target terminal in the first machine room and continues to process the messages of the first cluster in the first machine room. This ensures that the kafka service is not affected when the first machine room fails, and avoids the situation in which backlogged messages of the failed machine room cannot be consumed in time because the whole machine room is switched over during disaster recovery.
Referring to fig. 3, fig. 3 is a schematic flow chart of another computer room disaster recovery processing method provided in the embodiment of the present application. As shown in fig. 3, the method includes:
S301, when a fault of the first machine room is detected, determining that the faulty terminal in the first machine room is the target terminal.
The detailed process is as described in step S201, and therefore, is not described herein again.
S302, sending a first instruction to the server of the second machine room; the first instruction is used for instructing the server of the second machine room to start and to configure first configuration information of the server of the first machine room; the first configuration information is configuration information required by the server of the second machine room to write messages to the first cluster.
In the embodiment of the application, the faulty target terminal in the first machine room is a server. The management terminal sends the first instruction to the server of the second machine room. The server of the second machine room is initially in a closed state; after receiving the first instruction, it switches to an open state and configures the first configuration information of the server of the first machine room. Since different clusters handle different message topics (e.g., a red-envelope topic), the server of the second machine room needs the same configuration as the server of the first machine room in order to write messages of that topic.
S303, after the server of the second machine room is started, sending a second instruction to the server of the second machine room; the second instruction is used for instructing the server of the second machine room to write messages to the first cluster.
In the embodiment of the application, after the server in the second machine room has started and configured the configuration information of the relevant topic, the management terminal sends the second instruction to the server in the second machine room. After receiving the second instruction, the server in the second machine room associates with the first cluster and takes over from the server in the first machine room, producing new messages and writing them into the first cluster.
Through the above steps, when the management terminal detects a fault in the first machine room, it determines the faulty target terminal in the first machine room. Because the second machine room is also provided with a corresponding target terminal, the management terminal can, after determining that the target terminal is the server, send the first instruction to the server in the second machine room, so that the server in the second machine room takes over from the server in the first machine room and continues to write messages into the first cluster in the first machine room. This ensures that the business can still use the Kafka service when the first machine room fails, and avoids the situation in which switching the whole machine room during disaster recovery leaves the backlogged messages of the faulty machine room unconsumed.
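The two-step server takeover (S302, then S303) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the `StandbyServer` class, the `fail_over_server` helper, the `topic` configuration key, and the dictionary that stands in for the clusters are all hypothetical, and no real Kafka client is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandbyServer:
    """Hypothetical producer-side server in the second machine room."""
    started: bool = False
    config: Optional[dict] = None
    target_cluster: Optional[str] = None

    def on_first_instruction(self, first_config: dict) -> None:
        # Start up and copy the first room's producer configuration.
        self.started = True
        self.config = dict(first_config)

    def on_second_instruction(self, cluster: str) -> None:
        # Only a started, configured server may begin writing.
        if not self.started or self.config is None:
            raise RuntimeError("server must be started and configured first")
        self.target_cluster = cluster

    def write(self, clusters: dict, payload: str) -> None:
        # Append the payload to the configured topic of the target cluster.
        topic = self.config["topic"]
        clusters[self.target_cluster].setdefault(topic, []).append(payload)


def fail_over_server(standby: StandbyServer, first_config: dict) -> None:
    """S302 then S303: configure the standby, then point it at the first cluster."""
    standby.on_first_instruction(first_config)
    if standby.started:
        standby.on_second_instruction("first_cluster")


# The first cluster already holds one message written before the fault.
clusters = {"first_cluster": {"red_envelope": ["m1"]}}
server = StandbyServer()
fail_over_server(server, {"topic": "red_envelope"})
server.write(clusters, "m2")
```

Keeping the second instruction separate from the first lets the management terminal confirm that the standby has started before any message is written, which is the ordering the steps above describe.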
Referring to fig. 4, fig. 4 is a schematic flowchart of another computer room disaster recovery processing method provided in the embodiment of the present application. As shown in fig. 4, the method includes:
S401, when a fault of the first machine room is detected, determining that the faulty terminal in the first machine room is the target terminal; the target terminal is used for processing the messages in the first cluster of the first machine room.
The detailed process is shown in step S201, and therefore, is not described herein again.
S402, sending a third instruction to the user terminal of the second machine room; the third instruction is used for instructing the user terminal of the second machine room to start and to configure second configuration information of the user terminal of the first machine room; the second configuration information is configuration information required by the user terminal of the second machine room to read messages from the first cluster.
In the embodiment of the application, the faulty target terminal in the first machine room is the user terminal. The management terminal sends the third instruction to the user terminal of the second machine room. The user terminal of the second machine room is initially in a closed state; after receiving the third instruction, it switches to an open state and configures the second configuration information of the user terminal of the first machine room. Since different clusters handle different message topics (e.g., a red-envelope topic), the user terminal of the second machine room needs the same configuration as the user terminal of the first machine room in order to read messages of that topic.
S403, after the user terminal of the second machine room is started, sending a fourth instruction to the user terminal of the second machine room; the fourth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster.
In this embodiment of the application, after the user terminal in the second machine room has started and configured the configuration information of the relevant topic, the management terminal sends the fourth instruction to the user terminal in the second machine room. After receiving the fourth instruction, the user terminal of the second machine room associates with the first cluster and takes over from the user terminal of the first machine room, reading and consuming the messages of the first cluster.
S404, detecting whether the server of the first machine room is in fault.
In the embodiment of the application, after the management terminal determines that the target terminal is the user terminal, the management terminal also detects whether the server of the first machine room has failed as well.
S405, when the server of the first machine room fails, sending a fifth instruction to the server of the second machine room; the fifth instruction is used for instructing the server of the second machine room to start and to write messages to a second cluster of the second machine room.
In this embodiment of the application, if the server in the first machine room also fails, the management terminal sends a fifth instruction to the server in the second machine room. And after receiving the fifth instruction, the server of the second machine room is switched to the open state and writes the message into the second cluster of the second machine room.
S406, after the server of the second machine room is started, sending a sixth instruction to the user terminal of the second machine room; the sixth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster and the second cluster at the same time.
In the embodiment of the application, both the server and the user terminal of the first machine room have failed, and the user terminal of the second machine room is already reading the backlogged messages in the first cluster. If the server of the second machine room wrote messages to the first cluster, each newly produced message would be written to the first cluster across machine rooms and then read back across machine rooms, which is inefficient and occupies resources. Therefore, newly produced messages are written directly to the second cluster, and the management terminal sends the sixth instruction to the user terminal of the second machine room so that it reads the messages of the first cluster and the second cluster at the same time. This maintains the consumption efficiency of new messages while the backlogged messages of the first cluster are processed.
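The reasoning above can be made concrete with a small sketch. It is illustrative only: `cross_room_hops` and `drain_both` are hypothetical helpers, the clusters are modeled as in-memory queues, and the round-robin order used to read "at the same time" is an assumption, since the description does not fix a scheduling policy.

```python
from collections import deque


def cross_room_hops(cluster_room: str, producer_room: str, consumer_room: str) -> int:
    """Count how many times one message crosses a machine room boundary."""
    return int(producer_room != cluster_room) + int(consumer_room != cluster_room)


def drain_both(first_cluster: deque, second_cluster: deque) -> list:
    """Alternate between the backlog and the new messages until both are empty."""
    consumed = []
    while first_cluster or second_cluster:
        if first_cluster:
            consumed.append(("first", first_cluster.popleft()))
        if second_cluster:
            consumed.append(("second", second_cluster.popleft()))
    return consumed


# Producer and consumer both run in the second room after the fault.
hops_via_first = cross_room_hops("first", "second", "second")    # write across, read back
hops_via_second = cross_room_hops("second", "second", "second")  # stays local

backlog = deque(["old-1", "old-2", "old-3"])   # backlogged in the first cluster
fresh = deque(["new-1", "new-2"])              # written to the second cluster
result = drain_both(backlog, fresh)
```

Under these assumptions a new message routed through the first cluster crosses machine rooms twice, while one written to the second cluster never leaves the second machine room; the backlog in the first cluster is still drained alongside it.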
Through the above steps, when the management terminal detects a fault in the first machine room, it determines the faulty target terminal in the first machine room. Because the second machine room is also provided with a corresponding target terminal, the management terminal can, after determining the target terminal, send the disaster recovery instruction to the target terminal in the second machine room, so that it takes over from the target terminal in the first machine room and continues to process the messages of the first cluster in the first machine room. This ensures that the Kafka service is not affected when the first machine room fails, and avoids the situation in which switching the whole machine room during disaster recovery leaves the backlogged messages of the faulty machine room unconsumed. If both the server and the user terminal of the first machine room are detected to have failed, the user terminal of the second machine room reads the backlogged messages of the first cluster while the server of the second machine room writes newly produced messages directly into the second cluster, which maintains message consumption efficiency and reduces excessive occupation of resources.
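The instruction sequences of the embodiments of fig. 3 and fig. 4 can be summarized in one decision function. This is a sketch under stated assumptions: the description presents the two flows as separate embodiments, so combining them in a single `plan_instructions` function, the tuple layout, and the recipient labels are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# (instruction name, recipient in the second machine room, purpose)
Instruction = Tuple[str, str, str]


def plan_instructions(server_failed: bool, user_terminal_failed: bool) -> List[Instruction]:
    """Return the instructions the management terminal would send, in order."""
    if user_terminal_failed:
        plan = [
            ("third", "user_terminal", "start and load the second configuration"),
            ("fourth", "user_terminal", "read from the first cluster"),
        ]
        if server_failed:
            # Both sides failed: new messages go to the local second cluster.
            plan += [
                ("fifth", "server", "start and write to the second cluster"),
                ("sixth", "user_terminal", "read from both clusters"),
            ]
        return plan
    if server_failed:
        return [
            ("first", "server", "start and load the first configuration"),
            ("second", "server", "write to the first cluster"),
        ]
    return []


def names(plan: List[Instruction]) -> List[str]:
    """Extract just the instruction names from a plan."""
    return [name for name, _, _ in plan]


both_failed = names(plan_instructions(server_failed=True, user_terminal_failed=True))
only_consumer = names(plan_instructions(server_failed=False, user_terminal_failed=True))
only_producer = names(plan_instructions(server_failed=True, user_terminal_failed=False))
```

Only the faulty role is switched in each case, which is the point of the method: the whole machine room is never switched at once.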
The management terminal provided in the embodiment of the present application will be described in detail below with reference to fig. 5. It should be noted that the management terminal shown in fig. 5 is used for executing the methods of the embodiments shown in fig. 2 to fig. 4 of the present application. For convenience of description, only the parts related to the embodiment of the present application are shown; for specific technical details that are not disclosed, please refer to the embodiments shown in fig. 2 to fig. 4 of the present application.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a management terminal according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:
a detection module 501, configured to determine, when a fault of the first machine room is detected, that the faulty terminal in the first machine room is the target terminal; the target terminal is used for processing the messages in the first cluster of the first machine room;
a sending module 502, configured to send a disaster recovery instruction to the target terminal of the second machine room; the disaster recovery instruction is used for instructing the target terminal of the second machine room to process the messages of the first cluster.
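The division into a detection module and a sending module can be sketched as two small classes composed by the management terminal. This is an illustration under stated assumptions: the class names, the health map keyed by terminal role, the `PROCESS_FIRST_CLUSTER` label, and the injected `deliver` callable are hypothetical and stand in for whatever monitoring and transport a real deployment uses.

```python
from typing import Callable, Dict, List, Tuple


class DetectionModule:
    """Finds faulty terminals in the first machine room from a health map."""

    def detect(self, health: Dict[str, bool]) -> List[str]:
        # A terminal is a target terminal when its health flag is False.
        return sorted(name for name, healthy in health.items() if not healthy)


class SendingModule:
    """Sends disaster recovery instructions through an injected callable."""

    def __init__(self, deliver: Callable[[str, str], None]) -> None:
        self._deliver = deliver

    def send_disaster_recovery(self, target: str) -> None:
        # Address the matching terminal in the second machine room.
        self._deliver(f"second_room/{target}", "PROCESS_FIRST_CLUSTER")


class ManagementTerminal:
    """Composes the two modules: detect first, then instruct the standby."""

    def __init__(self, deliver: Callable[[str, str], None]) -> None:
        self.detection = DetectionModule()
        self.sending = SendingModule(deliver)

    def handle(self, first_room_health: Dict[str, bool]) -> List[str]:
        targets = self.detection.detect(first_room_health)
        for target in targets:
            self.sending.send_disaster_recovery(target)
        return targets


outbox: List[Tuple[str, str]] = []
terminal = ManagementTerminal(lambda dest, instr: outbox.append((dest, instr)))
targets = terminal.handle({"server": True, "user_terminal": False})
```

Injecting the delivery function keeps the sketch testable without a network and leaves the actual transport unspecified, as it is in the description.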
In one embodiment, the sending module comprises:
a first sending unit, configured to send a first instruction to the server of the second machine room; the first instruction is used for instructing the server of the second machine room to start and to configure first configuration information of the server of the first machine room; the first configuration information is configuration information required by the server of the second machine room to write messages to the first cluster;
a second sending unit, configured to send a second instruction to the server of the second machine room after the server of the second machine room is started; the second instruction is used for instructing the server of the second machine room to write messages to the first cluster.
In one embodiment, the sending module comprises:
a third sending unit, configured to send a third instruction to the user terminal of the second machine room; the third instruction is used for instructing the user terminal of the second machine room to start and to configure second configuration information of the user terminal of the first machine room; the second configuration information is configuration information required by the user terminal of the second machine room to read messages from the first cluster;
a fourth sending unit, configured to send a fourth instruction to the user terminal of the second machine room after the user terminal of the second machine room is started; the fourth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster.
In one embodiment, the sending module further comprises:
a detection unit for detecting whether a server of the first machine room is failed;
a fifth sending unit, configured to send a fifth instruction to the server of the second machine room when the server of the first machine room fails; the fifth instruction is used for instructing the server of the second machine room to start and to write messages to a second cluster of the second machine room;
a sixth sending unit, configured to send a sixth instruction to the user terminal of the second machine room after the server of the second machine room is started; the sixth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster and the second cluster at the same time.
In one embodiment, the first cluster includes at least two partitions, each located in a different geographic location.
In one embodiment, the terminal further includes:
the recording module is used for recording the partition position information of the first cluster;
the acquisition module is used for acquiring the partition position information corresponding to the fault partition when the partition fault of the first cluster is detected;
and the generating module is used for generating and sending warning information to a preset manager terminal based on the partition position information.
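The recording, acquisition, and generating modules can be sketched together as one small monitor. This is a minimal illustration under stated assumptions: the `PartitionMonitor` class, the integer partition identifiers, the free-text location strings, and the wording of the warning are hypothetical, and sending the warning to the preset manager terminal is modeled simply as returning the text.

```python
from typing import Dict, Optional


class PartitionMonitor:
    """Records where each partition of the first cluster lives and builds warnings."""

    def __init__(self) -> None:
        self._locations: Dict[int, str] = {}

    def record(self, partition_id: int, location: str) -> None:
        # Recording module: store the partition position information.
        self._locations[partition_id] = location

    def on_partition_fault(self, partition_id: int) -> Optional[str]:
        # Acquisition module: look up the position of the faulty partition.
        location = self._locations.get(partition_id)
        if location is None:
            return None  # unknown partition, nothing to report
        # Generating module: build the warning for the manager terminal.
        return f"Partition {partition_id} of the first cluster failed at {location}"


monitor = PartitionMonitor()
monitor.record(0, "room-A rack 3")
monitor.record(1, "room-B rack 7")
alert = monitor.on_partition_fault(1)
```

Because the position is recorded ahead of time, the warning can name the physical location directly, so that maintenance staff do not have to locate the faulty partition themselves.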
In one embodiment, the terminal further includes:
a recovery processing module, configured to send a reset instruction to the target terminal of the second machine room when the target terminal of the first machine room is detected to be repaired; the reset instruction is used for instructing the target terminal of the second machine room to stop processing the messages of the first cluster.
It is clear to a person skilled in the art that the solution according to the embodiments of the present application can be implemented by means of software and/or hardware. The "unit" and "module" in this specification refer to software and/or hardware that can perform a specific function independently or in cooperation with other components, where the hardware may be, for example, a Field-Programmable Gate Array (FPGA), an Integrated Circuit (IC), or the like.
Each processing unit and/or module in the embodiments of the present application may be implemented by an analog circuit that implements the functions described in the embodiments of the present application, or may be implemented by software that executes the functions described in the embodiments of the present application.
Referring to fig. 6, a schematic structural diagram of a management terminal according to an embodiment of the present application is shown, where the management terminal may be used to implement the methods in the embodiments shown in fig. 2 to fig. 4. As shown in fig. 6, the management terminal 600 may include: at least one central processor 601, at least one network interface 604, a user interface 603, a memory 605, at least one communication bus 602.
The communication bus 602 is used to enable connection and communication between these components.
The user interface 603 may include a display screen (Display) and a camera (Camera); optionally, the user interface 603 may also include a standard wired interface and a wireless interface.
The network interface 604 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface).
The central processor 601 may include one or more processing cores. The central processor 601 connects the various parts of the terminal 600 using various interfaces and lines, and performs the functions of the terminal 600 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 605 and by calling data stored in the memory 605. Optionally, the central processor 601 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The central processor 601 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing the content to be displayed by the display screen; the modem is used to handle wireless communications. It is understood that the modem may also not be integrated into the central processor 601 and may instead be implemented by a separate chip.
The memory 605 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 605 includes a non-transitory computer-readable medium. The memory 605 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 605 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the stored data area may store the data referred to in the above method embodiments. Optionally, the memory 605 may also be at least one storage device located remotely from the central processor 601. As shown in fig. 6, the memory 605, which is one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and program instructions.
In the management terminal 600 shown in fig. 6, the user interface 603 is mainly used for providing an input interface for a user and acquiring data input by the user; the central processor 601 may be configured to invoke the machine room disaster recovery processing application stored in the memory 605, and specifically execute the following operations:
when a fault of the first machine room is detected, determining that the faulty terminal in the first machine room is the target terminal; the target terminal is used for processing the messages in the first cluster of the first machine room;
sending a disaster recovery instruction to the target terminal of the second machine room; the disaster recovery instruction is used for instructing the target terminal of the second machine room to process the messages of the first cluster.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method. The computer-readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some service interfaces, devices or units, and may be an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned memory comprises various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program, which is stored in a computer-readable memory, and the memory may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A machine room disaster recovery processing method, characterized by comprising the following steps:
when a fault of a first machine room is detected, determining that the faulty terminal in the first machine room is a target terminal; the target terminal is used for processing the messages in the first cluster of the first machine room;
sending a disaster recovery instruction to the target terminal of the second machine room; the disaster recovery instruction is used for instructing the target terminal of the second machine room to process the messages of the first cluster.
2. The method of claim 1, wherein the target terminal is a server;
the sending of the disaster recovery instruction to the target terminal of the second machine room includes:
sending a first instruction to the server of the second machine room; the first instruction is used for instructing the server of the second machine room to start and to configure first configuration information of the server of the first machine room; the first configuration information is configuration information required by the server of the second machine room to write messages to the first cluster;
after the server of the second machine room is started, sending a second instruction to the server of the second machine room; the second instruction is used for instructing the server of the second machine room to write messages to the first cluster.
3. The method of claim 1, wherein the target terminal is a user terminal;
the sending of the disaster recovery instruction to the target terminal of the second machine room includes:
sending a third instruction to the user terminal of the second machine room; the third instruction is used for instructing the user terminal of the second machine room to start and to configure second configuration information of the user terminal of the first machine room; the second configuration information is configuration information required by the user terminal of the second machine room to read messages from the first cluster;
after the user terminal of the second machine room is started, sending a fourth instruction to the user terminal of the second machine room; the fourth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster.
4. The method according to claim 3, wherein after the sending a fourth instruction to the user terminal of the second room after the user terminal of the second room is turned on, further comprising:
detecting whether a server of the first machine room fails;
when the server of the first machine room fails, sending a fifth instruction to the server of the second machine room; the fifth instruction is used for instructing the server of the second machine room to start and to write messages to a second cluster of the second machine room;
after the server of the second machine room is started, sending a sixth instruction to the user terminal of the second machine room; the sixth instruction is used for instructing the user terminal of the second machine room to read messages from the first cluster and the second cluster at the same time.
5. The method of claim 1, wherein the first cluster comprises at least two partitions, each located in a different geographic location.
6. The method of claim 5, wherein when the first machine room fault is detected, before determining that the terminal with the fault in the first machine room is the target terminal, further comprising:
recording partition position information of the first cluster;
when the partition fault of the first cluster is detected, acquiring the partition position information corresponding to the fault partition;
and generating and sending warning information to a preset manager terminal based on the partition position information.
7. The method according to claim 1, wherein after sending the disaster recovery instruction to the target terminal in the second computer room, the method further comprises:
when the target terminal of the first machine room is detected to be repaired, sending a reset instruction to the target terminal of the second machine room; the reset instruction is used for instructing the target terminal of the second machine room to stop processing the messages of the first cluster.
8. A management terminal, characterized in that the management terminal comprises:
a detection module, configured to determine, when a fault of a first machine room is detected, that the faulty terminal in the first machine room is a target terminal; the target terminal is used for processing the messages in the first cluster of the first machine room;
a sending module, configured to send a disaster recovery instruction to the target terminal of the second machine room; the disaster recovery instruction is used for instructing the target terminal of the second machine room to process the messages of the first cluster.
9. A management terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202011058742.1A 2020-09-30 2020-09-30 Disaster recovery processing method, terminal and storage medium for machine room Active CN112291082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011058742.1A CN112291082B (en) 2020-09-30 2020-09-30 Disaster recovery processing method, terminal and storage medium for machine room

Publications (2)

Publication Number Publication Date
CN112291082A true CN112291082A (en) 2021-01-29
CN112291082B CN112291082B (en) 2023-08-29

Family

ID=74422647

Country Status (1)

Country Link
CN (1) CN112291082B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254274A (en) * 2021-04-21 2021-08-13 北京大米科技有限公司 Message processing method, device, storage medium and server
CN113810456A (en) * 2021-02-09 2021-12-17 京东科技信息技术有限公司 Data acquisition method, device, system, computer equipment and storage medium
CN114095343A (en) * 2021-11-18 2022-02-25 深圳壹账通智能科技有限公司 Disaster recovery method, device, equipment and storage medium based on double-active system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106341454A (en) * 2016-08-23 2017-01-18 世纪龙信息网络有限责任公司 Across-room multiple-active distributed database management system and across-room multiple-active distributed database management method
CN107395729A (en) * 2017-07-27 2017-11-24 深圳乐信软件技术有限公司 A kind of consumption system of message queue, method and device
CN111130835A (en) * 2018-11-01 2020-05-08 中国移动通信集团河北有限公司 Data center dual-active system, switching method, device, equipment and medium

Also Published As

Publication number Publication date
CN112291082B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN112291082B (en) Disaster recovery processing method, terminal and storage medium for machine room
US9575745B1 (en) Immediately launching applications
US10601680B2 (en) Application resiliency using APIs
CN110750393B (en) Method, device, medium and equipment for avoiding split-brain in dual-machine hot standby of network services
CN109376197B (en) Data synchronization method, server and computer storage medium
CN109788068B (en) Heartbeat state information reporting method, device and equipment and computer storage medium
CN107666493B (en) Database configuration method and equipment thereof
CN103607428A (en) Method of accessing shared memory and apparatus thereof
CN111147274B (en) System and method for creating a highly available arbitration set for a cluster solution
CN112636992B (en) Dynamic routing method, device, equipment and storage medium
US20210311768A1 (en) Switching between master and standby container systems
CN112286723A (en) Computer room disaster recovery control method, terminal and storage medium
CN115328752B (en) Cluster simulation method and system for Kubernetes control plane test
US8832215B2 (en) Load-balancing in replication engine of directory server
CN112286904A (en) Cluster migration method and device and storage medium
CN107145399B (en) Shared memory management method and shared memory management equipment
CN110138753B (en) Distributed message service system, method, apparatus, and computer-readable storage medium
CN112363815B (en) Redis cluster processing method and device, electronic equipment and computer readable storage medium
CN111092828A (en) Network operation method, device, equipment and storage medium
CN110321199B (en) Method and device for notifying common data change, electronic equipment and medium
CN112350921A (en) Message processing method, terminal and storage medium
CN106851535B (en) Method and device for sharing Bluetooth by multiple systems
CN109151016B (en) Flow forwarding method and device, service system, computing device and storage medium
JP2012524923A (en) Method, apparatus, and computer program for maintaining service in a high availability environment
CN111405313A (en) Method and system for storing streaming media data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant