CN112256497A - Universal high-availability service implementation method, system, medium and terminal

Info

Publication number: CN112256497A
Application number: CN202011172230.8A
Authority: CN
Inventors: 嵇斌
Original assignee: Chongqing Unisinsight Technology Co Ltd
Current assignee: Chongqing Unisinsight Technology Co Ltd
Priority date: 2020-10-28
Filing date: 2020-10-28
Publication date: 2021-01-22
Anticipated expiration: 2040-10-28
Also published as: CN112256497B

Abstract

The invention provides a method, system, medium and terminal for implementing a universal high-availability service. The method comprises: when a service needs to be started, sending request information to the high-availability agent service through a daemon service agent; having the high-availability agent service in each node perform a query according to the service identification information in the request; writing the node identifier of the first node into the service path; after the write completes, monitoring the service in real time and starting its working state; monitoring the service path in a distributed database; and controlling the second node to switch to the working state according to the real-time monitoring result of the service and changes to the service path. The invention provides a general framework that integrates existing conventional services without additional secondary development, so that high availability is achieved at minimal cost and the conventional problems of high switching delay and the need for additional system resources are solved.

Description

Universal high-availability service implementation method, system, medium and terminal

Technical Field

The present invention relates to the field of computers and communications, and in particular, to a method, a system, a medium, and a terminal for implementing a general high-availability service.

Background

In computer clusters, high availability of services is a very important capability. Its primary purpose is to tolerate the inevitable faults in the software and hardware that implement a service, and to migrate to a standby service node promptly once the software or hardware the service depends on fails.

Currently, service migration is commonly handled in one of the following ways:

The first is heartbeat detection, which monitors the service on the working (Active) node and, when the service is found to be abnormal or inoperable, executes a switching process that moves the service to a standby (Standby) node. Heartbeat detection, however, has a tolerance time, that is, the maximum period for which the service may be unavailable before a fault is declared, and this increases the waiting delay of a switch (the time during which the service is unavailable). In addition, the standby service on the standby node may be either off (cold standby) or on (hot standby). Cold standby consumes few system resources (CPU, memory, etc.) but further increases the switching delay. Hot standby is the opposite: it adds no extra switching delay, but the service must stand by in real time, which increases system overhead.

The second is load sharing, which is also used when a single service node cannot meet the performance requirement. Its advantage is that there is no active/standby relationship: the services on all nodes are in the Active state. Its disadvantage is that the load balancer and the service must agree on how state is handled.

In addition, both approaches require additional development work, such as agreeing on the heartbeat content and on the tolerated failure interval. A new universal high-availability service implementation method is therefore needed, one that addresses both high switching delay and the need for additional system resources, which existing approaches cannot solve at the same time.

Disclosure of Invention

In view of the above-mentioned shortcomings of the prior art, the present invention provides a method, system, medium and terminal for implementing a universal high-availability service, so as to solve the above technical problems.

The invention provides a general high-availability service implementation method, which comprises the following steps:

when a service needs to be started, sending request information to a high-availability agent service through a daemon service agent, wherein the request information comprises service identification information for identifying the service;

the high-availability agent service in each node inquires according to the identification information in the service information and respectively acquires an inquiry result;

the query result of the high-availability proxy service in the first node comprises a service path, and the high-availability proxy service of the first node writes the node identification of the first node into the service path;

after the writing is finished, monitoring the service in real time, feeding back the service to the daemon service agent, and starting the working state of the service in the first node according to the feedback;

when the query result of the high-availability agent service of the second node is not empty, determining that an instance of the service already exists, and monitoring the service path in the distributed database;

and controlling the second node to switch the working state and start the working state of the service according to the real-time monitoring result of the service and the change of the service path in the monitored distributed database.

Optionally, the daemon service agent sends request information to the high-availability agent service through a message queue, and applies to enter the working state through the request information.

Optionally, after the high-availability proxy service of the first node writes the service path and the node identifier of the first node into a distributed database, setting a release time for the service path; and after the service in the first node starts the working state according to the feedback, the state is periodically updated to the distributed database through the high-availability proxy service within the release time.

Optionally, the distributed database is a highly available key value storage system, and the service process is monitored in real time through the service daemon.

Optionally, when the first node monitors that the currently running service fails, deleting a service path in the distributed database through the high-availability proxy service;

when monitoring the change of the service path in the distributed database, the second node triggers re-inquiry to obtain an inquiry result, and when the result is empty, the high-availability proxy service of the second node writes the node identifier of the second node into the service path;

and after the writing is finished, continuing to monitor the service in real time, feeding back to the daemon service agent, controlling the second node to switch the working state, and starting the working state of the service.

Optionally, when the network interruption occurs in the first node, after the network interruption exceeds a preset time threshold, the second node is controlled to switch the working state, and the service working state is started.

The present invention also provides a universal high availability service framework system, comprising: a distributed database, a highly available agent service module for providing a highly available agent service, a daemon service agent module for providing a daemon service agent,

Optionally, the high-availability agent service module includes a distributed data client and a message queue service unit, and the daemon service agent module includes a process monitoring unit and a message queue unit.

The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.

The present invention also provides an electronic terminal, comprising: a processor and a memory;

the memory is adapted to store a computer program and the processor is adapted to execute the computer program stored by the memory to cause the terminal to perform the method as defined in any one of the above.

The invention has the beneficial effects that: the universal high-availability service implementation method, the system, the medium and the terminal can provide a universal framework to integrate the conventional services without additional secondary development work, so that the high-availability requirement can be achieved at the minimum cost, and the conventional problems of high switching delay and the need of additional more system resources are solved.

Drawings

Fig. 1 is a schematic diagram of a component deployment relationship of a general method for implementing a high availability service in an embodiment of the present invention.

Fig. 2 is a schematic diagram of a high availability proxy service component of a general implementation method of a high availability service in an embodiment of the present invention.

Fig. 3 is a schematic diagram of a service agent component of a general method for implementing a high availability service in an embodiment of the present invention.

Fig. 4 is a flow diagram of service startup and active/standby switching in a general method for implementing a high availability service in an embodiment of the present invention.

Fig. 5 is a schematic normal state diagram of a node 1 in a general method for implementing a high availability service in the embodiment of the present invention.

Fig. 6 is a schematic diagram of switching to node 2 when node 1 fails in a general method for implementing a high availability service in an embodiment of the present invention.

Fig. 7 is a flowchart illustrating a general method for implementing a high availability service according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present invention, however, it will be apparent to one skilled in the art that embodiments of the present invention may be practiced without these specific details, and in other embodiments, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring embodiments of the present invention.

As shown in fig. 1, the method for implementing a general high availability service in this embodiment includes:

S101, when a service needs to be started, request information is sent to a high-availability agent service through a daemon service agent, the request information comprising service identification information used for identifying the service;

S102, the high-availability agent service in each node performs a query according to the identification information in the request information and obtains its own query result;

S103, the query result of the high-availability agent service in the first node comprises a service path, and the high-availability agent service of the first node writes the node identifier of the first node into the service path;

S104, after the write completes, the service is monitored in real time, feedback is returned to the daemon service agent, and the working state of the service in the first node is started according to the feedback;

S105, when the query result of the high-availability agent service of the second node is not empty, it is determined that an instance of the service already exists, and the service path in the distributed database is monitored;

S106, the second node is controlled to switch to the working state and start the service according to the real-time monitoring result of the service and changes to the monitored service path in the distributed database.
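The role decision in steps S102 to S105 can be illustrated with a minimal sketch. It assumes that a node writes its identifier only when the query finds no existing owner (as in the failover step described later), and it uses a plain dict as a stand-in for the distributed database. The names `request_active`, `service_path`, `demo-service`, `node-1` and `node-2` are hypothetical and do not come from the specification.

```python
from typing import Dict

ACTIVE = "Active"
STANDBY = "Standby"


def service_path(service_name: str) -> str:
    """Build the key under which the active node ID is stored."""
    return f"/services/{service_name}"


def request_active(kv: Dict[str, str], service_name: str, node_id: str) -> str:
    """Decide this node's role for one service.

    `kv` is a plain dict standing in for the distributed database; a real
    store would have to perform the check-and-write atomically.
    """
    path = service_path(service_name)
    owner = kv.get(path)      # S102: query by service identifier
    if owner is None:         # no instance yet: claim the path (S103)
        kv[path] = node_id
        return ACTIVE         # S104: the service may start on this node
    if owner == node_id:      # this node already holds the path
        return ACTIVE
    return STANDBY            # S105: an instance exists, so watch the path


kv_store: Dict[str, str] = {}
print(request_active(kv_store, "demo-service", "node-1"))  # Active
print(request_active(kv_store, "demo-service", "node-2"))  # Standby
```

The second caller receives `Standby` because the path already carries the first node's identifier; in the described method that node would then monitor the path rather than start the service.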

In this embodiment, the method is carried out mainly by the high-availability agent service (Agent), the daemon service agent (Proxy), the distributed database, the service daemon, and related components. A high-reliability cluster system usually has at least two nodes; the deployment relationship between the nodes, the above components, and the service (Service) of a specific business is shown in Fig. 1. Compared with the traditional cold standby and hot standby modes, the method in this embodiment can quickly give a service Active-Standby high availability. A standby node running under this framework completes a switch in a shorter time when a switch is needed, and uses fewer system resources while standing by.

In this embodiment, the high-availability agent service consists mainly of a distributed data client and a message queue service, and is associated with the daemon service agent and the distributed database, as shown in Fig. 2. The daemon service agent consists mainly of a process monitoring module and a message queue client, and is associated with the high-availability agent service and the application service, as shown in Fig. 3. The daemon service agent may send request information to the high-availability agent service through a message queue and, through that request, apply to enter the working state. When a service (Service) needs to be started, an instance of the daemon service agent (Proxy) must first be started as its dependency. At this point the Service is not actually started: it waits for confirmation from the Proxy, a mechanism that the service daemon can provide. The Proxy sends a message through the message queue to the high-availability agent service (Agent), applying to enter the working state (Active). The Agent identifies the service by the unique identifier in the service information (the Service Name); the Agent itself also has an identifier (the Node ID) that identifies the node on which it runs.
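The Proxy-to-Agent request over a message queue might look roughly like the following sketch. It uses Python's in-process `queue.Queue` as a stand-in for whatever message queue the framework actually uses, and the message layout (`ActiveRequest` with a `reply` queue) is an assumption made for illustration. For brevity the Agent here answers `Standby` immediately; in the described method the Proxy would simply keep waiting instead.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class ActiveRequest:
    """Message a Proxy sends to the Agent; the field names are hypothetical."""
    service_name: str
    reply: "queue.Queue[str]" = field(default_factory=queue.Queue)


def agent_loop(inbox: "queue.Queue", node_id: str, owners: dict) -> None:
    """Serve Proxy requests until a None sentinel arrives."""
    while True:
        req = inbox.get()
        if req is None:
            break
        # `owners` stands in for the distributed database lookup and write.
        owner = owners.setdefault(req.service_name, node_id)
        req.reply.put("Active" if owner == node_id else "Standby")


def proxy_request(inbox: "queue.Queue", service_name: str, timeout: float = 5.0) -> str:
    """Send the request and block until the Agent answers."""
    req = ActiveRequest(service_name)
    inbox.put(req)
    return req.reply.get(timeout=timeout)


inbox: "queue.Queue" = queue.Queue()
owners: dict = {}
worker = threading.Thread(target=agent_loop, args=(inbox, "node-1", owners))
worker.start()
print(proxy_request(inbox, "demo-service"))  # Active
inbox.put(None)                              # stop the Agent loop
worker.join()
```

Because `proxy_request` blocks on the reply, the Service that depends on the Proxy cannot start until the Agent has confirmed the role, which mirrors the "wait for confirmation" behaviour described above.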

In this embodiment, after the high-availability agent service of the first node writes the service path and the node identifier of the first node into the distributed database, a release time is set for the service path; after the Service on the first node enters the working state according to the feedback, its state is periodically updated in the distributed database through the high-availability agent service within the release time. Specifically, the Agent of the first node queries the distributed database using the service identifier (Service Name), one of the two unique identifiers described above. Optionally, the distributed database in this embodiment is a distributed KV library (a highly available key/value storage system), and the query path is, for example, /services/<Service Name>. If the result is empty, the Agent writes its Node ID to the corresponding path in the distributed KV library, for example /services/<Service Name>, and sets a release time t0 on the path, for example 30 seconds. Similarly, the high-availability agent service of the second node performs the same query; if the returned result is not empty, the service is considered to already have an instance.
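The release time on the service path behaves like a lease. The sketch below models it with an in-memory store and an injectable clock so that expiry can be shown without waiting; `LeaseStore` and its method names are hypothetical and are not the API of any particular KV library.

```python
from typing import Callable, Dict, Optional, Tuple


class LeaseStore:
    """In-memory stand-in for the distributed KV library with per-key leases."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # path -> (value, expiry)

    def get(self, path: str) -> Optional[str]:
        entry = self._data.get(path)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:  # release time elapsed: path is gone
            del self._data[path]
            return None
        return value

    def put_if_absent(self, path: str, value: str, ttl: float) -> bool:
        """Write the value with a lease only if the path is currently empty."""
        if self.get(path) is not None:
            return False
        self._data[path] = (value, self._clock() + ttl)
        return True

    def refresh(self, path: str, value: str, ttl: float) -> bool:
        """Extend the lease, but only for the node that currently holds it."""
        if self.get(path) != value:
            return False
        self._data[path] = (value, self._clock() + ttl)
        return True


now = [0.0]                          # simulated time in seconds
store = LeaseStore(clock=lambda: now[0])
T0 = 30.0                            # release time t0 from the text
path = "/services/demo-service"
store.put_if_absent(path, "node-1", T0)
now[0] = 20.0
store.refresh(path, "node-1", T0)    # renewed: the lease now ends at t = 50
now[0] = 45.0
print(store.get(path))               # node-1
now[0] = 50.0
print(store.get(path))               # None
```

As long as the holder refreshes within t0 the path stays in place; once refreshes stop, the path disappears after at most t0 and another node's query comes back empty.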

In this embodiment, after the write completes, the Agent of the first node uses the service daemon (systemd) to begin monitoring the state of the service process in real time. At the same time it returns the result to the Proxy, and the Service is then formally started on the first node to provide service. The Agent periodically updates the state in the distributed KV library (within the release time t0).
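The text relies on the service daemon to report the state of the service process. As a rough stand-in, the sketch below swaps in Python's `subprocess` module: it starts a child process, waits for it to exit, and invokes a callback. This is not how systemd monitoring is wired up; the helper names and the dict used as the KV store are assumptions for illustration only.

```python
import subprocess
import sys
from typing import Callable, List


def run_and_watch(cmd: List[str], on_exit: Callable[[int], None]) -> int:
    """Start a service process, block until it exits, then report the exit code."""
    proc = subprocess.Popen(cmd)
    code = proc.wait()
    on_exit(code)
    return code


def make_exit_handler(kv: dict, path: str, node_id: str) -> Callable[[int], None]:
    """Return a callback that removes the service path if this node owns it."""
    def handler(code: int) -> None:
        # Once the process has exited, the active instance no longer exists.
        if kv.get(path) == node_id:
            del kv[path]
    return handler


kv = {"/services/demo-service": "node-1"}
handler = make_exit_handler(kv, "/services/demo-service", "node-1")
# A child process that exits with status 3 stands in for a failing Service.
run_and_watch([sys.executable, "-c", "raise SystemExit(3)"], handler)
print(kv)  # {}
```

The ownership check in the handler keeps a node from deleting a path that another node has already taken over.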

In this embodiment, when the second node's query shows that the Service already exists, its Agent starts a listener that watches the /services/<Service Name> path for changes, so that the Agent is notified promptly when the path changes. During this time the Proxy remains in a waiting state, so the Service is not actually started, which reduces system overhead.
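The path-watching behaviour can be sketched with a small observer pattern. `WatchableStore` below is a hypothetical dict-backed stand-in for a KV store that supports watches; the callback signature is an assumption.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

Watcher = Callable[[str, Optional[str]], None]  # (path, new value or None)


class WatchableStore:
    """Dict-backed stand-in for a KV store that notifies watchers of changes."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: DefaultDict[str, List[Watcher]] = defaultdict(list)

    def watch(self, path: str, callback: Watcher) -> None:
        self._watchers[path].append(callback)

    def get(self, path: str) -> Optional[str]:
        return self._data.get(path)

    def put(self, path: str, value: str) -> None:
        self._data[path] = value
        self._notify(path)

    def delete(self, path: str) -> None:
        # Notify only if something was actually removed.
        if self._data.pop(path, None) is not None:
            self._notify(path)

    def _notify(self, path: str) -> None:
        for callback in list(self._watchers[path]):
            callback(path, self._data.get(path))


events = []
store = WatchableStore()
store.put("/services/demo-service", "node-1")
store.watch("/services/demo-service", lambda p, v: events.append((p, v)))
store.delete("/services/demo-service")
print(events)  # [('/services/demo-service', None)]
```

The standby side does no polling: it stays idle until the callback fires, which is consistent with the Proxy remaining in a waiting state.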

In this embodiment, when the first node detects that the currently running service has failed, it deletes the service path in the distributed database through its high-availability agent service. When the second node observes the change to the service path in the distributed database, it re-queries; if the result is empty, the high-availability agent service of the second node writes the second node's identifier into the service path. After the write completes, it continues to monitor the service in real time and reports back to the daemon service agent, and the second node is switched to the working state and starts the service. When a network interruption occurs on the first node and lasts longer than a preset time threshold, the second node is likewise switched to the working state and starts the service.
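The network-interruption case can be simulated by combining a lease with a watch: when the first node stops refreshing, the path expires and the watcher on the second node claims it. The sketch below uses a simulated clock; `LeasedWatchStore`, `advance` and the node names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


class LeasedWatchStore:
    """Stand-in KV store: keys carry a lease, and expiry notifies watchers."""

    def __init__(self) -> None:
        self.now = 0.0
        self._data: Dict[str, Tuple[str, float]] = {}  # path -> (value, expiry)
        self._watchers: Dict[str, List[Callable[[], None]]] = {}

    def get(self, path: str) -> Optional[str]:
        entry = self._data.get(path)
        return entry[0] if entry else None

    def put_if_absent(self, path: str, value: str, ttl: float) -> bool:
        if path in self._data:
            return False
        self._data[path] = (value, self.now + ttl)
        return True

    def refresh(self, path: str, value: str, ttl: float) -> bool:
        if self.get(path) != value:
            return False
        self._data[path] = (value, self.now + ttl)
        return True

    def watch(self, path: str, callback: Callable[[], None]) -> None:
        self._watchers.setdefault(path, []).append(callback)

    def advance(self, seconds: float) -> None:
        """Move the simulated clock forward and release expired paths."""
        self.now += seconds
        expired = [p for p, (_, exp) in self._data.items() if exp <= self.now]
        for path in expired:
            del self._data[path]
            for callback in list(self._watchers.get(path, [])):
                callback()


T0 = 30.0
PATH = "/services/demo-service"
store = LeasedWatchStore()
store.put_if_absent(PATH, "node-1", T0)


def node2_takeover() -> None:
    # The re-query is folded into put_if_absent: write only if the path is empty.
    store.put_if_absent(PATH, "node-2", T0)


store.watch(PATH, node2_takeover)
store.advance(10.0)
store.refresh(PATH, "node-1", T0)  # last refresh before the outage (expiry t = 40)
store.advance(29.0)                # t = 39: the lease is still valid
print(store.get(PATH))             # node-1
store.advance(1.0)                 # t = 40: the lease expires, node 2 takes over
print(store.get(PATH))             # node-2
```

In this model the worst-case detection time for a network outage is bounded by the release time, which plays the role of the preset time threshold in the text.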

A specific embodiment of a cluster system with 2 nodes is described below:

S01. When a service (Service) needs to be started, an instance of the daemon service agent (Proxy) must first be started as its dependency.

S02. The Proxy sends a message through the message queue to the high-availability agent service (Agent), applying to enter the working state (Active). The Agent identifies the service by the unique identifier in the service information (the Service Name).

S03. The Agent of node 1 queries the distributed KV library using the Service Name, one of the two unique identifiers from the preceding step. The query path is, for example, /services/<Service Name>. If the result is empty, the Agent writes its Node ID to the corresponding path in the distributed KV library, for example /services/<Service Name>, and sets a release time t0 (e.g., 30 seconds) on the path.

S04. Similarly, the Agent of node 2 performs the same query; if the returned result is not empty, the service is considered to already have an instance.

S05. After the Agent of node 1 finishes writing, it uses the service daemon (systemd) to begin monitoring the state of the service process in real time. At the same time it returns the result to the Proxy, and the Service is then formally started on node 1 to provide service. The Agent periodically updates the state in the distributed KV library (within the release time t0).

S06. Having found that the Service already exists, the Agent of node 2 starts a listener that watches the /services/<Service Name> path for changes; when the path changes, the Agent is notified.

S07. When the Service currently running on node 1 fails, it exits abnormally. The Agent on node 1 detects the abnormality immediately and deletes the /services/<Service Name> path from the distributed KV library. Node 1 then repeats from step S01.

S08. When the /services/<Service Name> path in the distributed KV library is deleted, node 2 promptly observes the change to the watched path and re-queries starting from step S03. The query result is expected to be empty, so node 2 proceeds to step S05, switches to the working state, and starts the Service to provide service externally.

S09. If a network interruption occurs on the working (Active) node, the distributed KV library detects the failure after at most the timeout t0, which triggers step S08 above. The main service startup and switching steps are shown in Fig. 4.

Steps S08 -> S03 -> S05 above make up the entire state-switching process. Because these changes are observed in real time, no fault-tolerance timeout is involved. The duration of the whole process depends on the response time of operations on the distributed database, which in a local area network is usually on the order of milliseconds, or even 1 millisecond. This is a large improvement in switching time over heartbeat detection (typically at least 1 second). Step S06 corresponds to the service state on the standby node: the standby (Standby) service is waiting and none of its resources have been allocated, which saves system resources.
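The two-node sequence above can be traced end to end with a small simulation. It is a sketch only: leases and the message queue are left out, the store is an in-memory stand-in, and the `Node` class, its method names and the log format are hypothetical.

```python
from typing import Callable, Dict, List


class Store:
    """Minimal stand-in for the distributed KV library (no leases here)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.watchers: Dict[str, List[Callable[[], None]]] = {}

    def put_if_absent(self, path: str, value: str) -> bool:
        if path in self.data:
            return False
        self.data[path] = value
        return True

    def delete(self, path: str) -> None:
        if self.data.pop(path, None) is not None:
            for callback in list(self.watchers.get(path, [])):
                callback()

    def watch(self, path: str, callback: Callable[[], None]) -> None:
        self.watchers.setdefault(path, []).append(callback)


class Node:
    def __init__(self, node_id: str, store: Store, service: str, log: List[str]) -> None:
        self.node_id, self.store, self.log = node_id, store, log
        self.path = f"/services/{service}"
        self.role = "Stopped"
        self.watching = False

    def request_active(self) -> None:
        """S02 to S06: claim the path if it is free, otherwise wait as Standby."""
        if self.store.put_if_absent(self.path, self.node_id):
            self.role = "Active"       # S05: the Service really starts
        else:
            self.role = "Standby"      # S06: the Proxy keeps waiting
            if not self.watching:
                self.store.watch(self.path, self._on_path_changed)
                self.watching = True
        self.log.append(f"{self.node_id}:{self.role}")

    def _on_path_changed(self) -> None:
        """S08: a waiting node re-queries when the watched path changes."""
        if self.role == "Standby":
            self.request_active()

    def service_failed(self) -> None:
        """S07: the Agent sees the process exit and deletes the path."""
        if self.role == "Active":
            self.role = "Stopped"
            self.log.append(f"{self.node_id}:Failed")
            self.store.delete(self.path)
            self.request_active()      # the failed node starts over from S01


log: List[str] = []
store = Store()
node1 = Node("node-1", store, "demo-service", log)
node2 = Node("node-2", store, "demo-service", log)
node1.request_active()
node2.request_active()
node1.service_failed()
print(log)
# ['node-1:Active', 'node-2:Standby', 'node-1:Failed', 'node-2:Active', 'node-1:Standby']
```

The switch happens inside the `delete` call itself, with no timer involved, which is the property the text contrasts with heartbeat detection.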

A specific embodiment of a cluster system with 3 nodes is described below:

Take the installation and deployment service of a product as an example. The service must be deployed on 3 nodes, but only one service instance runs at a time while the other two wait in a ready state. When the running instance fails, one of the two waiting instances must be able to take over immediately.

For example, the service for installing the deployment service in this embodiment is: service "

The distributed database adopted in this embodiment is a distributed KV library, deployed on the 3 nodes to form a distributed consistent database. The high-availability agent service running on each node is eha-agent, and the daemon service agent is eha-proxy; their deployment is shown in Fig. 5. eha-proxy provides the ability to dynamically bind to the application service being monitored, and provides this through a systemd service description. In this embodiment, when node 1 fails, nodes 2 and 3 observe the change as in the method above, and the service is migrated to node 2 or node 3 in time, as shown in Fig. 6.
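With three nodes, two standbys may react to the same path change at once, so the write must be an atomic create-if-absent for exactly one of them to take over. The sketch below illustrates this with threads and a lock-protected in-memory store; `AtomicStore` and `race_for_active` are hypothetical names, and a real distributed KV library would provide the atomicity itself.

```python
import threading
from typing import Dict, List, Optional


class AtomicStore:
    """Stand-in KV store whose create-if-absent is atomic under a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, str] = {}

    def put_if_absent(self, path: str, value: str) -> bool:
        with self._lock:
            if path in self._data:
                return False
            self._data[path] = value
            return True

    def get(self, path: str) -> Optional[str]:
        with self._lock:
            return self._data.get(path)


def race_for_active(store: AtomicStore, path: str, node_ids: List[str]) -> List[str]:
    """Let all given nodes try to claim the path at once; return the winners."""
    winners: List[str] = []
    barrier = threading.Barrier(len(node_ids))

    def attempt(node_id: str) -> None:
        barrier.wait()                   # release all contenders together
        if store.put_if_absent(path, node_id):
            winners.append(node_id)      # at most one thread reaches this line

    threads = [threading.Thread(target=attempt, args=(n,)) for n in node_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


# Node 1 has failed and its path is gone; nodes 2 and 3 both try to take over.
store = AtomicStore()
winners = race_for_active(store, "/services/demo-service", ["node-2", "node-3"])
print(len(winners))  # 1
```

Which of the two nodes wins is not determined, matching the text's "node 2 or node 3"; the loser finds the path occupied and stays on standby.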

Correspondingly, the present embodiment further provides a universal high-availability service framework system, including: a distributed database, a highly available agent service module for providing a highly available agent service, a daemon service agent module for providing a daemon service agent,

The high-availability agent service module in the embodiment comprises a distributed data client and a message queue service unit, and the daemon service agent module comprises a process monitoring unit and a message queue unit.

The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements any of the methods in the present embodiments.

The present embodiment further provides an electronic terminal, including: a processor and a memory;

the memory is used for storing computer programs, and the processor is used for executing the computer programs stored by the memory so as to enable the terminal to execute the method in the embodiment.

The computer-readable storage medium in the present embodiment can be understood by those skilled in the art as follows: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

The electronic terminal provided by the embodiment comprises a processor, a memory, a transceiver and a communication interface, wherein the memory and the communication interface are connected with the processor and the transceiver and are used for completing mutual communication, the memory is used for storing a computer program, the communication interface is used for carrying out communication, and the processor and the transceiver are used for operating the computer program so that the electronic terminal can execute the steps of the method.

In this embodiment, the Memory may include a Random Access Memory (RAM), and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In the above embodiments, unless otherwise specified, the description of common objects by using "first", "second", etc. ordinal numbers only indicate that they refer to different instances of the same object, rather than indicating that the objects being described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner. In the above-described embodiments, reference in the specification to "the present embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least some embodiments, but not necessarily all embodiments. The multiple occurrences of "the present embodiment" do not necessarily all refer to the same embodiment.

Although the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory structures (e.g., dynamic RAM (DRAM)) may be used with the discussed embodiments. The embodiments of the invention are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims.

The invention is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The foregoing embodiments are merely illustrative of the principles of the present invention and its efficacy, and are not to be construed as limiting the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. A method for realizing a universal high-availability service is characterized by comprising the following steps:

2. The method as claimed in claim 1, wherein the daemon service agent sends request information to the high availability agent service by means of message queue, and applies for working status by the request information.

3. The method for implementing universal high availability service according to claim 1, wherein after the high availability proxy service of the first node writes the service path and the node identifier of the first node into a distributed database, a release time is set for the service path; and after the service in the first node starts the working state according to the feedback, the state is periodically updated to the distributed database through the high-availability proxy service within the release time.

4. The method as claimed in claim 3, wherein the distributed database is a high availability key-value storage system, and the service process is monitored in real time by a service daemon process.

5. The generic high availability service implementation method of claim 4,

when the first node monitors that the service running at present has a fault, deleting a service path in the distributed database through the high-availability proxy service;

6. The method as claimed in claim 4, wherein when the network outage occurs in the first node, the second node is controlled to switch the working state and start the service working state after a preset time threshold is exceeded.

7. A universal high availability service framework system, comprising: a distributed database, a highly available agent service module for providing a highly available agent service, a daemon service agent module for providing a daemon service agent,

8. The universal high availability service framework system according to claim 7, wherein the high availability agent service module comprises a distributed data client and a message queue service unit, and the daemon service agent module comprises a process monitoring unit and a message queue unit.

9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the method of any one of claims 1 to 6.

10. An electronic terminal, comprising: a processor and a memory;

the memory is for storing a computer program and the processor is for executing the computer program stored by the memory to cause the terminal to perform the method of any of claims 1 to 6.