CN112965847A - Fault processing method, device, equipment and storage medium of micro-service architecture - Google Patents

Fault processing method, device, equipment and storage medium of micro-service architecture

Info

Publication number
CN112965847A
Authority
CN
China
Prior art keywords
component
service
target
determining
components
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110236680.7A
Other languages
Chinese (zh)
Other versions
CN112965847B (en)
Inventor
许超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110236680.7A
Publication of CN112965847A
Application granted
Publication of CN112965847B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Hardware Redundancy (AREA)

Abstract

The disclosure relates to a fault processing method, device, equipment and storage medium of a micro-service architecture, in the technical field of computers, and in particular the fields of cloud computing, the Internet of Things, and the like. The specific implementation scheme is as follows: when an abnormal operating parameter is detected, determining the failed component that caused the abnormality; determining a target replacement component having the same function as the failed component; and disconnecting the downstream component of the failed component from the failed component and connecting that downstream component to the target replacement component. When operating parameters in the micro-service architecture become abnormal, the fault can be located quickly and connections can be switched, which enables multi-active operation across different regions and keeps the micro-service architecture operating normally.

Description

Fault processing method, device, equipment and storage medium of micro-service architecture
Technical Field
The present disclosure relates to the technical field of computers, and in particular, to the fields of cloud computing, internet of things, and the like.
Background
As an architectural model, the microservice architecture is used to implement the partitioning of a complex system or application into multiple microservices, each of which may implement an independent business logic.
To reduce development cost, the microservice architecture separates the business logic from the communication logic in each microservice program, and abstracts the communication logic into a proxy program for that microservice program.
The proxy program performs data communication on behalf of its associated microservice program, and thereby implements the service management function. The proxy programs that communicate on behalf of the individual microservice programs together form a Service Mesh.
A shortcoming of the related art is that service management cannot be carried out in a timely manner when the service management function is executed.
Disclosure of Invention
The disclosure provides a fault handling method, a device, equipment and a storage medium for a micro-service architecture.
According to an aspect of the present disclosure, there is provided a fault handling method for a microservice architecture, which may include the following steps:
when an abnormal operating parameter is detected, determining the failed component that caused the abnormality;
determining a target replacement component having the same function as the failed component;
and disconnecting the downstream component of the failed component from the failed component, and establishing a connection between that downstream component and the target replacement component.
According to another aspect of the present disclosure, a fault handling apparatus of a microservice architecture is provided, which may include the following components:
a fault finding module, configured to determine, when an abnormal operating parameter is detected, the failed component that caused the abnormality;
a target replacement component determining module, configured to determine a target replacement component having the same function as the failed component;
and a transfer module, configured to disconnect the downstream component of the failed component from the failed component and to establish a connection between that downstream component and the target replacement component.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
According to the technology of the present disclosure, when operating parameters in the micro-service architecture become abnormal, the fault can be located quickly and connections can be switched, which enables multi-active operation across different regions and keeps the micro-service architecture operating normally.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram of a method of fault handling for a microservice architecture in accordance with the present disclosure;
FIG. 2 is a flow diagram of determining a target replacement component according to the present disclosure;
FIG. 3 is a flow diagram of determining a target replacement component according to the present disclosure;
FIG. 4 is a schematic diagram of a microservice architecture according to the present disclosure;
FIG. 5 is a schematic diagram of a microservice architecture according to the present disclosure;
FIG. 6 is a schematic diagram of a fault handling apparatus of a microservice architecture according to the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing a fault handling method of a microservice architecture of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, the present disclosure provides a method for fault handling of a microservice architecture, which may include the steps of:
S101: when an abnormal operating parameter is detected, determining the failed component that caused the abnormality;
S102: determining a target replacement component having the same function as the failed component;
S103: disconnecting the downstream component of the failed component from the failed component, and establishing a connection between that downstream component and the target replacement component.
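Steps S101 to S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the disclosure: the `Component` class, its fields (including the `healthy` flag, which stands in for whatever monitoring detects abnormal operating parameters), and the `handle_fault` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality keeps the component graph simple
class Component:
    name: str
    function: str          # hypothetical labels, e.g. "control_plane" or "data_plane"
    region: str
    healthy: bool = True   # assumed to be set by the monitoring of operating parameters
    downstream: List["Component"] = field(default_factory=list)


def handle_fault(components: List[Component]) -> Optional[Component]:
    """Return the target replacement component, or None if nothing was switched."""
    # S101: determine the failed component.
    failed = next((c for c in components if not c.healthy), None)
    if failed is None:
        return None
    # S102: determine a healthy component with the same function.
    target = next(
        (c for c in components if c.healthy and c.function == failed.function),
        None,
    )
    if target is None:
        return None
    # S103: disconnect the downstream components from the failed component
    # and connect them to the target replacement component.
    target.downstream.extend(failed.downstream)
    failed.downstream.clear()
    return target


# Example: the control plane in region A fails; its data plane moves to region B.
sidecar_a = Component("sidecar-a", "data_plane", "A")
pilot_a = Component("pilot-a", "control_plane", "A", healthy=False, downstream=[sidecar_a])
pilot_b = Component("pilot-b", "control_plane", "B")
target = handle_fault([sidecar_a, pilot_a, pilot_b])
print(target.name, [c.name for c in target.downstream])
```

The sketch picks the first healthy component with a matching function; the embodiments described later refine this choice with a predetermined rule or a physical distance.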
The execution subject of the above embodiment may be a control terminal in the micro-service architecture. The microservice architecture may include at least one service grid (Service Mesh) system component, each of which may be deployed in an Internet Data Center (IDC).
The Internet data centers may be located in different geographic regions. For example, they may be located in different cities or even different countries; alternatively, they may be located in different areas of the same city.
Each service grid system component may include at least a control plane component and a data plane component that have upstream and downstream communication logic.
The data plane component takes over all traffic entering and leaving the microservice program and implements the core service management capabilities.
For example, the core service management capabilities may fall into several categories: connection, security, control, and observation.
Under each category, it can be subdivided into different subcategories.
For example, the connection category may include sub-categories such as load balancing, circuit breaking, fault injection, and retry.
Security categories may include authentication, authorization, and encryption, among other sub-categories.
The control categories may include sub-categories of access, rate, and quota.
The observation category may include sub-categories of dynamic acquisition, call chain tracking and monitoring, and the like.
In one example, the data plane component may include a sidecar (Sidecar) component that may be deployed alongside the microservice program.
The control plane pushes content such as service policy information to the data plane. Taking load balancing as an example of service management, the control plane may send random policy information, least-connection policy information, or round-robin (polling) policy information to the sidecar component according to an instruction from the user.
The control plane may include a Pilot component.
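The three load-balancing policies mentioned above can be illustrated with a short sketch. The function names, the instance names, and the connection counts are assumptions made for this example; the disclosure does not specify how a sidecar component implements each policy.

```python
import itertools
import random
from typing import Dict, Iterator, List


def random_policy(instances: List[str], rng: random.Random) -> str:
    """Random policy: pick any instance with equal probability."""
    return rng.choice(instances)


def least_connection_policy(active_connections: Dict[str, int]) -> str:
    """Least-connection policy: pick the instance with the fewest open connections."""
    return min(active_connections, key=active_connections.get)


def round_robin_policy(instances: List[str]) -> Iterator[str]:
    """Round-robin (polling) policy: cycle through the instances in order."""
    return itertools.cycle(instances)


instances = ["svc-1", "svc-2", "svc-3"]          # hypothetical instance names
rr = round_robin_policy(instances)
first_four = [next(rr) for _ in range(4)]        # wraps around after the third pick
least_loaded = least_connection_policy({"svc-1": 5, "svc-2": 2, "svc-3": 7})
any_instance = random_policy(instances, random.Random(0))
```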
In addition, the service grid system component may further include a storage system component that stores the service policy information which the control plane pushes to the data plane. When the control plane pushes service policy information to the data plane, it first acquires the corresponding service policy information from the storage system component. All service grid system components may share one storage system component. Alternatively, each service grid system component may be provided with its own storage system component.
When each service grid system component has its own storage system component, a consistency guarantee can be used to keep the content stored by the storage system components identical.
The interaction between the control plane component and the data plane component falls into two categories. In the first, the control plane pushes service policy information to the data plane; in the second, the data plane component sends a request to the control plane to acquire service policy information. When a large number of data plane components send a large number of requests to the control plane component, the surge of requests can easily overwhelm the control plane component and eventually cause it to fail. Once the control plane component fails, it can no longer handle the requests sent by the data plane components, and the failure propagates.
Similarly, the interaction between the control plane component and the storage system component falls into two categories: the control plane component sends a policy request to the storage system component, and the storage system component issues policy information to the control plane component. Here too, a failure is likely when the number of requests or the amount of data to be transmitted is large. These are only examples of how failures arise in the microservice architecture; actual failures have many other causes, which are not described in detail here.
Service discovery techniques may be used to monitor the operating parameters of the service grid system components so that faults are discovered in real time. For example, when a port of the control plane component is detected to be unavailable, or the CPU load exceeds a threshold, it may be determined that an operating parameter is abnormal. The abnormal parameter is then analyzed to determine the failed component that caused it.
For example, it may be determined that a storage system component, a Pilot component, and/or a sidecar component has failed. For instance, when it is determined that the operating parameter abnormality occurred in the control plane component of the service grid system component in region A, that control plane component may be determined to be the failed component.
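The two example checks (an unavailable port and a CPU load above a threshold) can be sketched as a function over a snapshot of operating parameters. The `Metrics` type, the component names, and the threshold value of 0.9 are assumptions for illustration; the disclosure does not give a concrete threshold or data format.

```python
from typing import Dict, List, NamedTuple


class Metrics(NamedTuple):
    port_available: bool
    cpu_load: float  # fraction of capacity in use, 0.0 to 1.0


CPU_LOAD_THRESHOLD = 0.9  # assumed example value, not taken from the source


def find_abnormal(
    snapshot: Dict[str, Metrics],
    cpu_threshold: float = CPU_LOAD_THRESHOLD,
) -> Dict[str, List[str]]:
    """Map each component with abnormal operating parameters to the reasons found."""
    abnormal: Dict[str, List[str]] = {}
    for name, metrics in snapshot.items():
        reasons: List[str] = []
        if not metrics.port_available:
            reasons.append("port unavailable")
        if metrics.cpu_load > cpu_threshold:
            reasons.append("cpu load above threshold")
        if reasons:
            abnormal[name] = reasons
    return abnormal


snapshot = {
    "control-plane-A": Metrics(port_available=False, cpu_load=0.95),
    "control-plane-B": Metrics(port_available=True, cpu_load=0.20),
}
failed = find_abnormal(snapshot)
```

Here only `control-plane-A` is reported, with both reasons attached, so it would be treated as the failed component.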
Next, the service grid system components in other regions are queried for a replacement component having the same function as the failed component. In the embodiments of the present disclosure, service grid system components in different regions may be configured as disaster-recovery backups for one another. That is, the service grid system components are kept consistent with each other, so that when any node of one service grid system component fails, a replacement node can be quickly determined from the other service grid system components.
For example, when the control plane component of the service grid system component in region A is determined to be the failed component, the control plane component of the service grid system component in region B may be used as the replacement node. The connection between the downstream component of the failed component (the data plane component in region A) and the failed component (the control plane component in region A) is then disconnected, and the downstream component is connected to the control plane component of the service grid system component in region B.
For a single failed component, a fail-safe (FailSafe) fault tolerance mechanism may be applied; for multiple failed components or a cluster failure, a fail-fast (FailFast) fault tolerance mechanism and a failover (FailOver) fault tolerance mechanism may be applied. In addition, a failback (FailBack) fault tolerance mechanism may be applied to a failed component.
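The disclosure names these four mechanisms without defining them. The sketch below follows the meanings these terms commonly carry in RPC frameworks, which is an assumption: fail-fast reports the error immediately, fail-safe suppresses it, failover retries against another replica, and failback records the failed call for a later retry. All function and class names are hypothetical.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def fail_fast(call: Callable[[], T]) -> T:
    """Fail-fast: make a single attempt and let any error propagate at once."""
    return call()


def fail_safe(call: Callable[[], T], default: Optional[T] = None) -> Optional[T]:
    """Fail-safe: swallow the error and return a default value instead."""
    try:
        return call()
    except Exception:
        return default


def fail_over(replicas: List[Callable[[], T]]) -> T:
    """Failover: try each replica in turn until one succeeds."""
    last_error: Optional[Exception] = None
    for call in replicas:
        try:
            return call()
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all replicas failed") from last_error


class FailBack:
    """Failback: remember failed calls so that they can be retried later."""

    def __init__(self) -> None:
        self.pending: List[Callable[[], object]] = []

    def submit(self, call: Callable[[], T]) -> Optional[T]:
        try:
            return call()
        except Exception:
            self.pending.append(call)
            return None

    def retry_pending(self) -> None:
        still_failing = []
        for call in self.pending:
            try:
                call()
            except Exception:
                still_failing.append(call)
        self.pending = still_failing


def broken_call() -> str:
    raise ConnectionError("control plane in region A is down")


result = fail_over([broken_call, lambda: "answer from region B"])
```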
With this scheme, when operating parameters in the micro-service architecture become abnormal, the fault can be located quickly and connections can be switched, which enables multi-active operation across different regions and keeps the micro-service architecture operating normally.
As shown in fig. 2, in an embodiment, determining the target replacement component having the same function as the failed component in step S102 may further include the following sub-steps:
S201: determining a plurality of candidate replacement components having the same function as the failed component;
S202: determining a target replacement component from the candidate replacement components according to a predetermined rule.
When the service grid system components in different geographic locations are configured as disaster-recovery backups for one another, the storage system components in the service grid system components may be determined as candidate replacement components having the same function.
In addition, the control plane component in each service grid system component can also be determined as a candidate substitute component with the same function; alternatively, the storage system components in each service grid system component may also be determined as candidate replacement components with the same functionality.
The predetermined rule may be a fallback rule. For example, suppose there are service grid system components in three regions A, B, and C. The first to third candidate replacement components may be ranked, illustratively in the order A, B, C, according to the geographic location of each service grid system component, its hardware conditions, and so on. That is, if a service grid system component in region B or C fails, the service grid system component in region A may be determined to be the target replacement component. If the service grid system component in region A fails, the service grid system component in region B may be determined to be the target replacement component.
Through this scheme, when there are multiple candidate replacement components, the target replacement component can be determined directly according to the predetermined rule. The time needed to compare candidates with one another is saved, so the target replacement component is selected efficiently.
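The fallback rule with the illustrative order A, B, C can be sketched as a lookup over a fixed priority list. The function name, the `healthy` map, and the choice to skip unhealthy regions are assumptions added for this example.

```python
from typing import Dict, List, Optional

FALLBACK_ORDER: List[str] = ["A", "B", "C"]  # illustrative order taken from the text


def pick_by_fallback_order(
    failed_region: str,
    healthy: Dict[str, bool],
    order: List[str] = FALLBACK_ORDER,
) -> Optional[str]:
    """Return the first region in the fixed order that is healthy and did not fail."""
    for region in order:
        if region != failed_region and healthy.get(region, False):
            return region
    return None


# A failure in region B or C falls back to A; a failure in A falls back to B.
after_b_fails = pick_by_fallback_order("B", {"A": True, "B": False, "C": True})
after_c_fails = pick_by_fallback_order("C", {"A": True, "B": True, "C": False})
after_a_fails = pick_by_fallback_order("A", {"A": False, "B": True, "C": True})
```

Because the order is fixed in advance, no comparison between candidates is needed at failure time, which matches the efficiency argument made above.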
As shown in FIG. 3, in one embodiment, determining a target replacement component from the candidate replacement components according to a predetermined rule includes:
S301: determining the physical distance between the failed component and each candidate replacement component;
S302: determining a target replacement component from the plurality of candidate replacement components based on the physical distance.
Illustratively, the three regions A, B, and C are Beijing, Nanjing, and Guangzhou, respectively. When the control plane component of the service grid system component in Beijing is determined to have failed, the physical distance from Beijing to Nanjing and the physical distance from Beijing to Guangzhou may each be determined.
Comparing the two shows that the physical distance from Nanjing to Beijing is shorter than that from Guangzhou to Beijing. On this basis, the control plane component of the service grid system component in Nanjing may be determined to be the target replacement component.
Through this scheme, when a component in any region fails, the candidate at the shortest physical distance can be selected as the target replacement component. This minimizes the additional service processing latency after the target replacement component is put into operation.
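Steps S301 and S302 can be sketched with the Beijing, Nanjing, and Guangzhou example. The disclosure does not say how the physical distance is measured; this sketch assumes a great-circle (haversine) distance between approximate city-center coordinates, and the function names and coordinate table are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees

# Approximate city-center coordinates, used only for illustration.
REGION_COORDINATES: Dict[str, Coordinate] = {
    "Beijing": (39.90, 116.40),
    "Nanjing": (32.06, 118.80),
    "Guangzhou": (23.13, 113.26),
}

EARTH_RADIUS_KM = 6371.0


def great_circle_km(a: Coordinate, b: Coordinate) -> float:
    """Haversine distance between two points on the Earth's surface."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_candidate(
    failed_region: str,
    candidates: List[str],
    coordinates: Dict[str, Coordinate] = REGION_COORDINATES,
) -> str:
    """S301 and S302: measure the distance to each candidate and pick the nearest."""
    origin = coordinates[failed_region]
    return min(candidates, key=lambda region: great_circle_km(origin, coordinates[region]))


target_region = nearest_candidate("Beijing", ["Nanjing", "Guangzhou"])
```

In practice a network-level measure such as round-trip time could serve the same purpose; the geographic distance is used here only because it is what the example in the text compares.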
In one embodiment, the micro-service architecture includes service grid system components disposed in different regions;
the service grid system component comprises a plurality of functional components with upstream and downstream relations, and each functional component correspondingly executes different functions.
In the current embodiment, the service grid system components may be located in different geographic regions. Illustratively, the different regions may be different cities or different countries. As the entity that acquires and executes service policies, each service grid system component can serve users in the corresponding city or country from nearby through its functional components with upstream and downstream relations, which improves access efficiency for users in that geographic location.
In one embodiment, the plurality of functional components having an upstream and downstream relationship comprises: a storage system component and a service policy execution component;
the function of the storage system component comprises storing a plurality of service strategy information of the micro service;
the service policy enforcement component functions include determining and enforcing a target service policy from a plurality of service policy information.
In one embodiment, the service policy enforcement component includes a control plane component and a data plane component;
the control plane component is used for sending target service strategy information determined from the multiple service strategy information to the data plane component according to the strategy information determination instruction;
the data plane component functions include executing a service policy corresponding to the target service policy information according to the received target service policy information.
The example shown in fig. 4 includes three regions: Beijing, Nanjing, and Guangzhou.
The service grid system component may include a storage system component and a service policy enforcement component that have an upstream and downstream relationship. The service policy enforcement component may in turn consist of a control plane component and a data plane component.
The solid lines in fig. 4 represent the upstream and downstream communication between the service grid system components when there is no fault. The dashed lines represent the adjusted upstream and downstream relationships in the event of a failure. For example, if the control plane component in Beijing fails, the data plane component in Beijing may be connected downstream of the control plane component in Guangzhou or Nanjing.
In addition, the architecture may further include a service management platform through which the user issues policy information determination instructions, for example instructions for issuing, adjusting, and updating service policy information.
With this scheme, fast switching can be achieved when any node in a service grid system component fails, which meets the requirement of multi-active operation across different regions.
As shown in connection with FIG. 5, in one embodiment, the plurality of functional components in an upstream and downstream relationship further comprises a distribution layer component;
the distribution layer component is used for making the stored content consistent when the content stored by the storage system components in the service grid system components of different regions is inconsistent.
The micro-service architecture of fig. 5 is a further improvement on the micro-service architecture shown in fig. 4. When the user changes a policy corresponding to a service management capability, the distribution layer component distributes the changed policy according to the service policy information changes the user made on the service management platform. This keeps the content stored by the storage system components in the service grid system components of different regions consistent.
Illustratively, the user upgrades a policy on the service management platform in region A. The distribution layer component acquires the upgraded policy and distributes it to the storage system components of every region, so that each region's storage system component holds the full set of policies. This both ensures consistency among the storage system components of the regions and allows them to serve as disaster-recovery backups for one another.
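The role of the distribution layer component can be sketched as a fan-out of a changed policy to every region's store, followed by a consistency check. The dictionary-based stores, the function names, and the policy keys are assumptions for this example; a real distribution layer would also need to handle partial failures and ordering, which the sketch ignores.

```python
from typing import Dict

PolicyStore = Dict[str, str]  # policy name -> policy content


def distribute(change: PolicyStore, stores: Dict[str, PolicyStore]) -> None:
    """Apply the changed policies to the storage system component of every region."""
    for store in stores.values():
        store.update(change)


def is_consistent(stores: Dict[str, PolicyStore]) -> bool:
    """Check that all regional stores hold identical content."""
    contents = list(stores.values())
    return all(store == contents[0] for store in contents[1:])


stores = {
    "A": {"load_balancing": "random"},
    "B": {"load_balancing": "random"},
    "C": {"load_balancing": "random"},
}

# The user upgrades the policy in region A only, so the stores diverge.
stores["A"]["load_balancing"] = "round_robin"
diverged = not is_consistent(stores)

# The distribution layer pushes the upgraded policy to every region.
distribute({"load_balancing": "round_robin"}, stores)
restored = is_consistent(stores)
```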
As shown in fig. 6, the present disclosure relates to a fault handling apparatus of a microservice architecture, which may specifically include the following components:
a fault finding module 601, configured to determine a faulty component causing an abnormal operating parameter when the abnormal operating parameter is detected;
a target replacement component determination module 602 for determining a target replacement component having the same function as the failed component;
the transferring module 603 is configured to disconnect the downstream component of the failed component from the failed component, and establish a connection between the downstream component of the failed component and the target replacement component.
In an embodiment, the target replacement component determining module 602 may specifically include:
a candidate replacement component determination sub-module for determining a plurality of candidate replacement components having the same function as the failed component;
and a target replacement component determination execution submodule, configured to determine the target replacement component from the candidate replacement components according to a predetermined rule.
In one embodiment, the target replacement component determination execution submodule comprises:
a physical distance determination unit for determining the physical distance between the failed component and each candidate replacement component;
and a unit for determining a target replacement component from the plurality of candidate replacement components based on the physical distance.
In one embodiment, the micro-service architecture includes service grid system components disposed in different regions;
the service grid system component comprises a storage system component, a control plane component and a data plane component which have upstream and downstream relations;
the storage system component is used for storing a plurality of service strategy information of the microservice;
the control plane component is used for sending, according to a policy information determination instruction, target service policy information determined from the multiple kinds of service policy information to the data plane component;
and the data plane component is used for executing the service strategy corresponding to the target service strategy information according to the received target service strategy information.
In one embodiment, the components in the microservice architecture further include a distribution layer component;
and the distribution layer component is used for performing consistency adjustment on the storage content under the condition that the storage content of the storage system components in the service grid system components in different regions is inconsistent.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 comprises a computing unit 710, which may perform various suitable actions and processes in accordance with a computer program stored in a Read Only Memory (ROM)720 or a computer program loaded from a storage unit 780 into a Random Access Memory (RAM) 730. In the RAM 730, various programs and data required for the operation of the device 700 can also be stored. The computing unit 710, the ROM 720 and the RAM 730 are connected to each other by a bus 740. An input/output (I/O) interface 750 is also connected to bus 740.
Various components in electronic device 700 are connected to I/O interface 750, including: an input unit 760 such as a keyboard, a mouse, and the like; an output unit 770 such as various types of displays, speakers, and the like; a storage unit 780 such as a magnetic disk, an optical disk, or the like; and a communication unit 790 such as a network card, modem, wireless communication transceiver, etc. The communication unit 790 allows the electronic device 700 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 710 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 710 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 710 performs the various methods and processes described above, such as the fault handling method. For example, in some embodiments, the fault handling method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 780. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 720 and/or the communication unit 790. When the computer program is loaded into the RAM 730 and executed by the computing unit 710, one or more steps of the fault handling method described above may be performed. Alternatively, in other embodiments, the computing unit 710 may be configured to perform the fault handling method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A fault handling method of a micro-service architecture comprises the following steps:
under the condition that the operating parameters are detected to be abnormal, determining a fault component causing the abnormal operating parameters;
determining a target replacement component having the same function as the failed component;
and disconnecting the downstream component of the failed component from the failed component, and establishing the connection between the downstream component of the failed component and the target replacement component.
2. The method of claim 1, wherein said determining a target replacement component having the same functionality as the failed component comprises:
determining a plurality of candidate replacement components having the same function as the failed component;
determining the target replacement component from the plurality of candidate replacement components according to a predetermined rule.
3. The method of claim 2, wherein the determining the target replacement component from the plurality of candidate replacement components according to a predetermined rule comprises:
determining a physical distance of the failed component from each of the candidate replacement components;
determining a target replacement component from the plurality of candidate replacement components based on the physical distance.
4. The method of claim 1, wherein the micro-service architecture includes service grid system components located in different geographic regions;
the service grid system component comprises a plurality of functional components with upstream and downstream relations, and each functional component correspondingly executes different functions.
5. The method of claim 4, wherein the plurality of functional components having an upstream and downstream relationship comprises: a storage system component and a service policy execution component;
the function of the storage system component comprises storing a plurality of service strategy information of the micro service;
the function of the service policy enforcement component includes determining and enforcing a target service policy from the plurality of service policy information.
6. The method of claim 5, wherein the service policy enforcement component comprises a control plane component and a data plane component;
the control plane component is used for sending target service strategy information determined from the multiple service strategy information to the data plane component according to a strategy information determination instruction;
and the function of the data plane component comprises executing a service strategy corresponding to the target service strategy information according to the received target service strategy information.
7. The method of claim 5, wherein the plurality of functional components having an upstream and downstream relationship further comprises a distribution layer component;
the distribution layer component has the function of performing consistency adjustment on the stored contents under the condition that the stored contents of the storage system components in the service grid system components in different regions are inconsistent.
8. A fault handling apparatus of a microservice architecture, comprising:
the fault finding module is used for determining a fault component causing the abnormal operation parameter under the condition that the abnormal operation parameter is detected;
a target replacement component determination module for determining a target replacement component having the same function as the failed component;
and the transfer module is used for disconnecting the downstream component of the failed component from the failed component and establishing the connection between the downstream component of the failed component and the target replacing component.
9. The apparatus of claim 8, wherein the target replacement component determination module comprises:
a candidate replacement component determination sub-module for determining a plurality of candidate replacement components having the same function as the failed component;
and a target replacement component determination execution sub-module for determining the target replacement component from the plurality of candidate replacement components according to a predetermined rule.
10. The apparatus of claim 8, wherein the target replacement component determination execution sub-module comprises:
a physical distance determination unit for determining a physical distance of the failed component from each of the candidate replacement components;
determining a target replacement component from the plurality of candidate replacement components based on the physical distance.
11. The apparatus of claim 8, wherein the micro-service architecture comprises service grid system components located in different regions;
the service grid system component comprises a plurality of functional components with upstream and downstream relations, and each functional component correspondingly executes different functions.
12. The apparatus of claim 11, wherein the plurality of functional components in an upstream and downstream relationship comprises: a storage system component and a service policy execution component;
the function of the storage system component comprises storing a plurality of service strategy information of the micro service;
the function of the service policy enforcement component includes determining and enforcing a target service policy from the plurality of service policy information.
13. The apparatus of claim 12, wherein the service policy enforcement component comprises a control plane component and a data plane component;
the control plane component is used for sending target service strategy information determined from the multiple service strategy information to the data plane component according to a strategy information determination instruction;
and the function of the data plane component comprises executing a service strategy corresponding to the target service strategy information according to the received target service strategy information.
14. The apparatus of claim 12, wherein the plurality of functional components in an upstream and downstream relationship further comprises a distribution layer component;
the distribution layer component has the function of performing consistency adjustment on the stored contents under the condition that the stored contents of the storage system components in the service grid system components in different regions are inconsistent.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7.
16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
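
The flow recited in claims 1 to 3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Component` and `Mesh` names, the `position` coordinates, the component names in the demo, and the choice of the nearest healthy candidate as the "predetermined rule" are all assumptions made for illustration. The claims only state that the target is determined based on physical distance, and they do not specify how the failed component is identified, so the sketch takes its name as input.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Component:
    name: str
    function: str            # e.g. "storage" or "control_plane" (illustrative labels)
    position: tuple          # hypothetical (x, y) coordinates of the hosting region
    healthy: bool = True


class Mesh:
    def __init__(self, components, links):
        self.components = {c.name: c for c in components}
        # Maps each downstream component name to the component it is connected to.
        self.links = dict(links)

    def candidates(self, failed):
        """Healthy components with the same function as the failed one (claim 2)."""
        return [
            c for c in self.components.values()
            if c.function == failed.function and c.healthy and c.name != failed.name
        ]

    def pick_target(self, failed):
        """Assumed rule: choose the physically nearest candidate (claim 3)."""
        cands = self.candidates(failed)
        if not cands:
            return None
        return min(cands, key=lambda c: dist(c.position, failed.position))

    def fail_over(self, failed_name):
        """Disconnect downstream components from the failed component and
        connect them to the target replacement component (claim 1)."""
        failed = self.components[failed_name]
        failed.healthy = False
        target = self.pick_target(failed)
        if target is None:
            return None  # no replacement available; links are left unchanged
        for downstream, upstream in self.links.items():
            if upstream == failed_name:
                self.links[downstream] = target.name
        return target.name


# Demo with made-up component names and coordinates.
mesh = Mesh(
    [
        Component("storage-a", "storage", (0.0, 0.0)),
        Component("storage-b", "storage", (3.0, 4.0)),
        Component("storage-c", "storage", (6.0, 8.0)),
        Component("control-a", "control_plane", (0.0, 0.0)),
    ],
    {"control-a": "storage-a"},
)
target = mesh.fail_over("storage-a")
print(target, mesh.links)
```

In the demo, `storage-b` lies at distance 5 from the failed `storage-a` and `storage-c` at distance 10, so `control-a` is reconnected to `storage-b`. A real deployment would likely combine distance with other criteria such as load, which this sketch leaves out.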
CN202110236680.7A 2021-03-03 2021-03-03 Fault processing method, device, equipment and storage medium of micro-service architecture Active CN112965847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110236680.7A CN112965847B (en) 2021-03-03 2021-03-03 Fault processing method, device, equipment and storage medium of micro-service architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110236680.7A CN112965847B (en) 2021-03-03 2021-03-03 Fault processing method, device, equipment and storage medium of micro-service architecture

Publications (2)

Publication Number Publication Date
CN112965847A true CN112965847A (en) 2021-06-15
CN112965847B CN112965847B (en) 2024-05-24

Family

ID=76276952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110236680.7A Active CN112965847B (en) 2021-03-03 2021-03-03 Fault processing method, device, equipment and storage medium of micro-service architecture

Country Status (1)

Country Link
CN (1) CN112965847B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101741675A (en) * 2006-06-30 2010-06-16 三菱电机株式会社 Communication node and ring forming method and ring establishing method for communication system
CN103297396A (en) * 2012-02-28 2013-09-11 国际商业机器公司 Management failure transferring device and method in cluster system
CN103401944A (en) * 2013-08-14 2013-11-20 青岛大学 Service combination dynamic reconstruction system
CN107566508A (en) * 2017-09-19 2018-01-09 广东电网有限责任公司信息中心 A kind of short message micro services system for automating O&M
CN109446008A (en) * 2018-10-31 2019-03-08 Oppo广东移动通信有限公司 A kind of failure cause detection method, failure cause detection device and terminal device
CN111722988A (en) * 2020-06-11 2020-09-29 苏州浪潮智能科技有限公司 Fault switching method and device for data space nodes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵建涛;黄立松;: "微服务故障诊断相关技术研究探讨", 网络新媒体技术, no. 01 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434218A (en) * 2021-07-06 2021-09-24 北京百度网讯科技有限公司 Micro-service configuration method, device, electronic equipment and medium
CN113434218B (en) * 2021-07-06 2023-08-15 北京百度网讯科技有限公司 Micro-service configuration method, micro-service configuration device, electronic equipment and medium
CN116192863A (en) * 2023-01-13 2023-05-30 中科驭数(北京)科技有限公司 Micro-service flow processing method, DPU service grid deployment method and system
CN116192863B (en) * 2023-01-13 2023-11-28 中科驭数(北京)科技有限公司 Micro-service flow processing method, DPU service grid deployment method and system

Also Published As

Publication number Publication date
CN112965847B (en) 2024-05-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant