CN112965847B - Fault processing method, device, equipment and storage medium of micro-service architecture - Google Patents

Fault processing method, device, equipment and storage medium of micro-service architecture Download PDF

Info

Publication number
CN112965847B
CN112965847B CN202110236680.7A
Authority
CN
China
Prior art keywords
component
service
determining
fault
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110236680.7A
Other languages
Chinese (zh)
Other versions
CN112965847A (en)
Inventor
许超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110236680.7A priority Critical patent/CN112965847B/en
Publication of CN112965847A publication Critical patent/CN112965847A/en
Application granted granted Critical
Publication of CN112965847B publication Critical patent/CN112965847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Hardware Redundancy (AREA)

Abstract

The disclosure relates to a fault processing method, device, equipment and storage medium for a micro-service architecture, relates to the technical field of computers, and in particular to fields such as cloud computing and the Internet of Things. The specific implementation scheme is as follows: in a case where an abnormal operating parameter is detected, determining a faulty component that causes the abnormal operating parameter; determining a target replacement component having the same function as the faulty component; and disconnecting the downstream component of the faulty component from the faulty component, and establishing a connection between the downstream component of the faulty component and the target replacement component. When an abnormal operating parameter occurs in the micro-service architecture, faults can be located quickly and connections switched over, and multi-site active-active operation across different regions can be achieved, ensuring normal operation of the micro-service architecture.

Description

Fault processing method, device, equipment and storage medium of micro-service architecture
Technical Field
The disclosure relates to the technical field of computers, and in particular to fields such as cloud computing and the Internet of Things.
Background
As an architectural pattern, a micro-service architecture is used to split a complex system or application into multiple micro-service programs, each of which may implement independent business logic.
In order to reduce the development cost for developers, the micro-service architecture separates the business logic and the communication logic in each micro-service program, and abstracts and generalizes the communication logic of each micro-service program to form a proxy program for each micro-service program.
The proxy program is responsible for data communication on behalf of the micro-service program with which it is associated, so as to implement the service governance function. The plurality of proxy programs that handle data communication on behalf of the plurality of micro-service programs form a service grid (Service Mesh).
A drawback of the related art is that service governance cannot be performed in a timely manner when the service governance function is executed.
Disclosure of Invention
The disclosure provides a fault processing method, device and equipment of a micro-service architecture and a storage medium.
According to an aspect of the present disclosure, there is provided a fault handling method for a micro-service architecture, which may include the following steps:
determining, in a case where an abnormal operating parameter is detected, a faulty component that causes the abnormal operating parameter;
determining a target replacement component having the same function as the faulty component;
and disconnecting the downstream component of the faulty component from the faulty component, and establishing a connection between the downstream component of the faulty component and the target replacement component.
According to another aspect of the present disclosure, there is provided a fault handling apparatus of a micro-service architecture, which may specifically include the following components:
a fault discovery module, configured to determine, in a case where an abnormal operating parameter is detected, a faulty component that causes the abnormal operating parameter;
a target replacement component determining module, configured to determine a target replacement component having the same function as the faulty component;
and a transfer module, configured to disconnect the downstream component of the faulty component from the faulty component, and establish a connection between the downstream component of the faulty component and the target replacement component.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
According to the technology of the present disclosure, when an abnormal operating parameter occurs in the micro-service architecture, faults can be located quickly and connections switched over, and multi-site active-active operation across different regions can be achieved, ensuring normal operation of the micro-service architecture.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a fault handling method for a micro-service architecture according to the present disclosure;
FIG. 2 is a flow chart for determining a target replacement component according to the present disclosure;
FIG. 3 is a flow chart for determining a target replacement component according to the present disclosure;
FIG. 4 is a schematic diagram of a micro-service architecture according to the present disclosure;
FIG. 5 is a schematic diagram of a micro-service architecture according to the present disclosure;
FIG. 6 is a schematic diagram of a fault handling apparatus for a micro-service architecture according to the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing the fault handling method for a micro-service architecture according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in FIG. 1, the present disclosure provides a fault handling method for a micro-service architecture, which may include the following steps:
S101: determining, in a case where an abnormal operating parameter is detected, a faulty component that causes the abnormal operating parameter;
S102: determining a target replacement component having the same function as the faulty component;
S103: disconnecting the downstream component of the faulty component from the faulty component, and establishing a connection between the downstream component of the faulty component and the target replacement component.
The execution subject of the above embodiment may be a control end in the micro-service architecture. The micro-service architecture may include at least one service grid system component, each of which may be deployed in an Internet Data Center (IDC).
The Internet data centers may be located in different geographical regions. For example, they may be located in different cities or even different countries, or in different areas of the same city.
The service grid system component may include at least a control plane component and a data plane component having upstream-downstream communication logic.
The data plane component is used to take over all traffic entering and leaving the micro-service program and to implement the core service governance capabilities.
For example, the core service governance capabilities may include the categories of connection, security, control, and observability.
Each category may be further subdivided into different subcategories.
For example, the connection category may include subcategories of load balancing, circuit breaking, fault injection, and retry.
The security category may include subcategories of authentication, authorization, and encryption.
The control category may include subcategories of access, rate, and quota.
The observability category may include subcategories such as dynamic collection, call-chain tracing, and monitoring.
In one example, the data plane component may include a sidecar (Sidecar) component, which may be deployed in correspondence with the micro-service program.
The control plane is used to push content such as service policy information to the data plane. Taking load balancing as an example of service governance, the control plane may send random, least-connections, or round-robin policy information to the sidecar component according to a user instruction.
The control plane may include a Pilot component.
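For illustration only, the following Python sketch shows how a control plane might push one of the load-balancing policies mentioned above (random, least-connections, or round-robin) to sidecar proxies. The names ControlPlane, SidecarProxy, and push_policy are hypothetical and are not part of the disclosure; a production control plane would push configuration over a network protocol rather than through in-process calls.

```python
import random
from dataclasses import dataclass


@dataclass
class SidecarProxy:
    """Illustrative data plane proxy that applies a pushed load-balancing policy."""
    name: str
    policy: str = "round_robin"
    _rr_index: int = 0

    def apply_policy(self, policy: str) -> None:
        self.policy = policy

    def pick_endpoint(self, endpoints, connections):
        # Select an upstream endpoint according to the currently pushed policy.
        if self.policy == "random":
            return random.choice(endpoints)
        if self.policy == "least_connections":
            return min(endpoints, key=lambda e: connections.get(e, 0))
        endpoint = endpoints[self._rr_index % len(endpoints)]  # round robin (default)
        self._rr_index += 1
        return endpoint


class ControlPlane:
    """Illustrative control plane that pushes policy information to its sidecars."""
    def __init__(self, sidecars):
        self.sidecars = list(sidecars)

    def push_policy(self, policy: str) -> None:
        for sidecar in self.sidecars:
            sidecar.apply_policy(policy)


# Example: a user instruction switches the mesh to least-connections balancing.
mesh = ControlPlane([SidecarProxy("sidecar-a"), SidecarProxy("sidecar-b")])
mesh.push_policy("least_connections")
print(mesh.sidecars[0].pick_endpoint(["10.0.0.1", "10.0.0.2"], {"10.0.0.1": 3, "10.0.0.2": 1}))
```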
In addition, the service grid system component may further include a storage system component for storing the service policy information to be pushed by the control plane. When the control plane pushes service policy information to the data plane, the corresponding service policy information needs to be acquired from the storage system component. The service grid system components may share one storage system component, or each service grid system component may be provided with its own storage system component.
In a case where each service grid system component is provided with its own storage system component, consistency of the contents stored by the storage system components can be guaranteed, so that the stored contents are identical to one another.
The interaction between the control plane component and the data plane component may include the following two types: first, the control plane pushes service policy information to the data plane; second, the data plane component sends a request to the control plane to acquire service policy information. When a large number of data plane components send a large number of requests to a control plane component, the requests can easily overwhelm the control plane component, ultimately causing it to fail. Once the control plane fails, the failed control plane component can no longer process the requests sent by the data plane components, resulting in fault propagation.
In addition, the interaction between the control plane component and the storage system component also includes two types: the control plane component sends a policy request to the storage system component, and the storage system component issues the policy. Similarly, when the number of requests is large or the amount of data to be issued is large, a failure is likely to occur. The foregoing merely exemplifies one cause of failure in the micro-service architecture; the actual causes of failure are of various types and are not described in detail herein.
The operating parameters of the service grid system components may be monitored using service discovery techniques to enable real-time discovery of faults. For example, in a case where a port of the control plane component is detected as being unavailable, or the CPU load exceeds a threshold, an abnormal operating parameter may be determined. The abnormal parameter is then analyzed to determine the faulty component that causes the abnormal operating parameter.
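A minimal sketch, assuming a TCP port probe and an arbitrary CPU-load threshold, of how the abnormality check described above could be expressed; the function names and the threshold value are illustrative assumptions, not taken from the disclosure.

```python
import socket
from typing import Optional

CPU_LOAD_THRESHOLD = 0.9  # assumed threshold; the disclosure does not specify a value


def port_available(host: str, port: int, timeout: float = 1.0) -> bool:
    """Probe a TCP port; failure to connect is treated as an abnormal operating parameter."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_abnormality(host: str, port: int, cpu_load: float) -> Optional[dict]:
    """Return a description of the abnormal parameter, or None if nothing looks abnormal."""
    if not port_available(host, port):
        return {"component": f"{host}:{port}", "reason": "port_unavailable"}
    if cpu_load > CPU_LOAD_THRESHOLD:
        return {"component": f"{host}:{port}", "reason": "cpu_overload", "cpu_load": cpu_load}
    return None
```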
For example, a storage system component failure, a Pilot component failure, and/or a sidecar component failure may be determined. For example, in a case where it is determined that an abnormal operating parameter occurs in the control plane component of the service grid system component in zone A, the control plane component of the service grid system component in zone A may be determined as the faulty component.
Next, the service grid system components in other regions are queried for a replacement component having the same function as the faulty component. In the embodiments of the present application, the service grid system components arranged in different regions may be set to be mutually disaster-tolerant. That is, the service grid system components are consistent with one another, so that when any one of them fails, a replacement node can be quickly determined from the other service grid system components.
For example, in a case where the control plane component of the service grid system component in zone A is determined to be the faulty component, the control plane component of the service grid system component in zone B may be taken as the replacement node. Based on this, the connection between the downstream component of the faulty component (the data plane component of the service grid system component in zone A) and the faulty component (the control plane component of the service grid system component in zone A) can be disconnected, and the downstream component of the faulty component can be switched over to the control plane component of the service grid system component in zone B.
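The switch-over just described can be sketched, under assumed in-memory data structures, as disconnecting the downstream component from its faulty upstream and re-homing it to the replacement; Component and switch_over are illustrative names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    name: str
    zone: str
    role: str                                # e.g. "control_plane", "data_plane", "storage"
    upstream: Optional["Component"] = None   # the component this one is connected to


def switch_over(downstream: Component, failed: Component, replacement: Component) -> None:
    """Disconnect the downstream component from the faulty component and
    establish a connection to the target replacement component."""
    if downstream.upstream is failed:
        downstream.upstream = None      # disconnect from the faulty component
    downstream.upstream = replacement   # connect to the replacement


# Zone A's data plane is re-homed to zone B's control plane after zone A's control plane fails.
pilot_a = Component("pilot-a", "A", "control_plane")
pilot_b = Component("pilot-b", "B", "control_plane")
sidecar_a = Component("sidecar-a", "A", "data_plane", upstream=pilot_a)
switch_over(sidecar_a, pilot_a, pilot_b)
assert sidecar_a.upstream is pilot_b
```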
For a single failed component, a fail-safe (FailSafe) fault-tolerance mechanism may be implemented; for multiple failed components or a cluster fault, a fail-fast (FailFast) fault-tolerance mechanism may be implemented, as well as a failover (Failover) fault-tolerance mechanism. Additionally, for failed components, an automatic fault recovery (FailBack) fault-tolerance mechanism is included.
Through the above scheme, when an abnormal operating parameter occurs in the micro-service architecture, faults can be located quickly and connections switched over, achieving multi-site active-active operation across different regions and ensuring normal operation of the micro-service architecture.
As shown in FIG. 2, in one embodiment, determining the target replacement component having the same function as the faulty component in step S102 may further include the following sub-steps:
S201: determining a plurality of candidate replacement components having the same function as the faulty component;
S202: determining the target replacement component from the candidate replacement components according to a predetermined rule.
In a case where the service grid system components provided at different geographical locations are set to be mutually disaster-tolerant, the storage system component in each service grid system component may be determined as a candidate replacement component having the same function.
In addition, the control plane component in each service grid system component may be determined as a candidate replacement component having the same function; or the storage system component in each service grid system component may also be determined as a candidate replacement component having the same function.
The predetermined rule may be, for example, an order-of-precedence rule. For example, suppose there are service grid system components in three zones A, B, and C. Candidate replacement components of the first through third orders may be determined (illustratively) in the order A, B, C, based on factors such as the geographic location and the hardware condition of each service grid system. That is, in the event of a failure of a service grid system component in zone B or zone C, the service grid system component in zone A may be determined as the candidate replacement component; in the event of a failure of the service grid system component in zone A, the service grid system component in zone B may be determined as the target replacement component.
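A minimal sketch, assuming a fixed zone ordering, of how such an order-of-precedence rule could be applied; ZONE_PRIORITY and pick_by_priority are illustrative names, not part of the disclosure.

```python
ZONE_PRIORITY = ["A", "B", "C"]  # assumed precedence, e.g. derived from geography and hardware


def pick_by_priority(failed_zone: str, candidate_zones) -> str:
    """Pick the highest-priority zone among the candidates, skipping the failed zone."""
    for zone in ZONE_PRIORITY:
        if zone != failed_zone and zone in candidate_zones:
            return zone
    raise RuntimeError("no candidate replacement component available")


# B or C fails -> A is chosen; A fails -> B is chosen.
assert pick_by_priority("C", ["A", "B"]) == "A"
assert pick_by_priority("A", ["B", "C"]) == "B"
```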
Through the above scheme, in a case where a plurality of candidate replacement components exist, the target replacement component can be determined directly according to the predetermined rule. The time spent on mutual comparison can be saved, so that the target replacement component is selected in the most efficient manner.
As shown in FIG. 3, in one embodiment, determining the target replacement component from the candidate replacement components according to the predetermined rule includes:
S301: determining the physical distance between the faulty component and each candidate replacement component;
S302: determining the target replacement component from the plurality of candidate replacement components based on the physical distances.
Illustratively, zones A, B, and C are Beijing, Nanjing, and Guangzhou, respectively. In a case where the control plane component of the service grid system component in Beijing is determined to be faulty, the physical distance from Beijing to Nanjing and the physical distance from Beijing to Guangzhou can be measured respectively.
By comparing the distances, it can be found that the physical distance from Nanjing to Beijing is shorter than the physical distance from Guangzhou to Beijing. Based on this, the control plane component of the service grid system component in Nanjing may be determined as the target replacement component.
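One possible way to realize the distance comparison, assuming approximate zone coordinates and a haversine great-circle distance; the coordinates, function names, and distance formula are illustrative assumptions rather than the disclosed method.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate coordinates, for illustration only.
ZONE_COORDS = {
    "Beijing": (39.90, 116.40),
    "Nanjing": (32.06, 118.80),
    "Guangzhou": (23.13, 113.26),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))


def pick_nearest(failed_zone: str, candidate_zones) -> str:
    """Choose the candidate whose zone is physically closest to the failed zone."""
    origin = ZONE_COORDS[failed_zone]
    return min(candidate_zones, key=lambda z: haversine_km(origin, ZONE_COORDS[z]))


assert pick_nearest("Beijing", ["Nanjing", "Guangzhou"]) == "Nanjing"
```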
Through the above scheme, when a component in any region fails, a target replacement component that is relatively close can be selected according to the physical distance. Therefore, the latency of service processing can be minimized after the target replacement component is put into operation.
In one embodiment, the micro-service architecture includes service grid system components disposed in different regions;
the service grid system component includes a plurality of functional components having an upstream-downstream relationship, each functional component performing a different function.
In this embodiment, the service grid system components may be located in different regions. For example, the different regions may be different cities or different countries. The service grid system component serves as the main body that acquires and executes the service policy, and its plurality of functional components having an upstream-downstream relationship can provide services for nearby users in the corresponding cities or countries, thereby improving access efficiency for users at the corresponding geographic locations.
In one embodiment, a plurality of functional components having an upstream-downstream relationship includes: a storage system component, a service policy enforcement component;
the functions of the storage system component include storing a plurality of service policy information for the micro-service;
the functions of the service policy enforcement component include determining and enforcing a target service policy from a variety of service policy information.
In one embodiment, the service policy enforcement component includes a control plane component and a data plane component;
the functions of the control plane component include sending, according to a policy information determining instruction, target service policy information determined from the plurality of types of service policy information to the data plane component;
The function of the data plane component includes executing a service policy corresponding to the target service policy information based on the received target service policy information.
In the example shown in FIG. 4, three regions are included: Beijing, Nanjing, and Guangzhou.
The service grid system component may include a storage system component and a service policy enforcement component having an upstream-downstream relationship, and the service policy enforcement component may in turn be composed of a control plane component and a data plane component.
The solid lines in FIG. 4 illustrate the upstream-downstream communication of the service grid system components in the absence of a failure. The dashed lines illustrate the adjusted upstream-downstream relationship in the event of a failure. For example, in the event of a failure of the Beijing control plane component, the Beijing data plane component may be connected, as a downstream component, to the Guangzhou or Nanjing control plane component.
In addition, the system further includes a service governance platform, through which a user may issue policy information determining instructions, for example, instructions corresponding to operations such as issuing, adjusting, and updating service policy information.
Through the above scheme, in the event that any node in a service grid system component fails, a rapid switchover can be achieved, meeting the demand for multi-site active-active operation across different regions.
As shown in FIG. 5, in one embodiment, the plurality of functional components having an upstream-downstream relationship further includes a distribution layer component;
the functions of the distribution layer component include adjusting the consistency of the stored contents in a case where the contents stored by the storage system components in the service grid system components of different regions are inconsistent.
The micro-service architecture corresponding to FIG. 5 is further refined on the basis of the micro-service architecture shown in FIG. 4. In a case where the user changes a policy corresponding to a service governance capability, the distribution layer component can distribute the changed policy according to the change made by the user to the service policy information on the service governance platform, so as to meet the requirement of adjusting the consistency of the contents stored by the storage system components in the service grid system components of different regions.
Illustratively, the user performs a policy upgrade on the service governance platform of region A. The distribution layer component acquires the updated policy and distributes it to the storage system components of all regions, so that the storage system components of all regions achieve full storage. First, the consistency of the storage system components in each region can be guaranteed; second, the storage system components in the regions can be made mutually disaster-tolerant.
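A minimal sketch of the distribution layer fanning an updated policy out to per-region storage system components so that all of them hold the full set of policies; the class names and the in-memory stores are assumptions made for illustration.

```python
class StorageSystem:
    """Illustrative per-region storage component holding the full set of service policies."""
    def __init__(self, region: str):
        self.region = region
        self.policies = {}

    def store(self, name: str, policy: str) -> None:
        self.policies[name] = policy


class DistributionLayer:
    """Fans a changed policy out to every region so that all stores remain consistent."""
    def __init__(self, stores):
        self.stores = list(stores)

    def distribute(self, name: str, policy: str) -> None:
        for store in self.stores:
            store.store(name, policy)


# A policy upgraded on region A's governance platform is distributed to every region.
stores = [StorageSystem(r) for r in ("Beijing", "Nanjing", "Guangzhou")]
DistributionLayer(stores).distribute("lb-policy", "least_connections")
assert all(s.policies["lb-policy"] == "least_connections" for s in stores)
```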
As shown in FIG. 6, the present disclosure relates to a fault handling apparatus for a micro-service architecture, which may specifically include the following components:
a fault discovery module 601, configured to determine, in a case where an abnormal operating parameter is detected, a faulty component that causes the abnormal operating parameter;
a target replacement component determining module 602, configured to determine a target replacement component having the same function as the faulty component;
and a transfer module 603, configured to disconnect the downstream component of the faulty component from the faulty component, and establish a connection between the downstream component of the faulty component and the target replacement component.
In one embodiment, the target replacement component determining module 602 may specifically include:
a candidate replacement component determining submodule, configured to determine a plurality of candidate replacement components having the same function as the faulty component;
a target replacement component determination execution submodule, configured to determine the target replacement component from the plurality of candidate replacement components according to a predetermined rule.
In one embodiment, the target replacement component determination execution submodule includes:
a physical distance determining unit, configured to determine the physical distance between the faulty component and each candidate replacement component;
and determining the target replacement component from the plurality of candidate replacement components based on the physical distances.
In one embodiment, the micro-service architecture includes service grid system components disposed in different regions;
The service grid system component comprises a storage system component, a control plane component and a data plane component which have an upstream-downstream relationship;
the storage system component is used for storing various service policy information of the micro-service;
the control plane component is used for sending, according to a policy information determining instruction, target service policy information determined from the various service policy information to the data plane component;
the data plane component is used for executing the service policy corresponding to the target service policy information according to the received target service policy information.
In one embodiment, the components in the micro-service architecture further comprise a distribution layer component;
the distribution layer component is used for adjusting the consistency of the stored contents in a case where the contents stored by the storage system components in the service grid system components of different regions are inconsistent.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 shows a schematic block diagram of an electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 710 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 720 or a computer program loaded from a storage unit 780 into a Random Access Memory (RAM) 730. In RAM 730, various programs and data required for the operation of device 700 may also be stored. The computing unit 710, ROM 720, and RAM 730 are connected to each other by a bus 740. An input output (I/O) interface 750 is also connected to bus 740.
Various components in electronic device 700 are connected to I/O interface 750, including: an input unit 760 such as a keyboard, a mouse, etc.; an output unit 770 such as various types of displays, speakers, etc.; a storage unit 780 such as a magnetic disk, an optical disk, or the like; and a communication unit 790 such as a network card, modem, wireless communication transceiver, etc. The communication unit 790 allows the electronic device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 710 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 710 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 710 performs the various methods and processes described above, such as fault handling methods. For example, in some embodiments, the fault handling method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 780. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 720 and/or the communication unit 790. When the computer program is loaded into RAM 730 and executed by computing unit 710, one or more steps of the fault handling method described above may be performed. Alternatively, in other embodiments, the computing unit 710 may be configured to perform the fault handling method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor, and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A fault processing method for a micro-service architecture, wherein the micro-service architecture comprises service grid system components arranged in different regions, the method comprising the following steps:
in a case where an abnormal operating parameter is detected, analyzing the abnormal parameter among the operating parameters, and determining a faulty component that causes the abnormal operating parameter;
determining a target replacement component having the same function as the faulty component;
disconnecting a downstream component of the faulty component from the faulty component, and establishing a connection between the downstream component of the faulty component and the target replacement component;
wherein the determining a target replacement component having the same function as the faulty component comprises:
determining a plurality of candidate replacement components having the same function as the faulty component;
determining the target replacement component from the plurality of candidate replacement components according to a predetermined rule;
wherein the determining the target replacement component from the candidate replacement components according to a predetermined rule comprises:
determining a physical distance between the faulty component and each candidate replacement component, wherein the physical distance is used to represent the distance between the region where the faulty component is located and the region where the candidate replacement component is located, the faulty component is a functional component in which a fault occurs among the service grid system components, and the service grid system components in different regions are set to be mutually disaster-tolerant;
determining the target replacement component from the plurality of candidate replacement components based on the physical distance.
2. The method of claim 1, wherein the service grid system component comprises a plurality of functional components having an upstream-downstream relationship, each of the functional components correspondingly performing a different function.
3. The method of claim 2, wherein the plurality of functional components having an upstream-downstream relationship comprises: a storage system component, a service policy enforcement component;
The functions of the storage system component include storing various service policy information of the micro-service;
the functions of the service policy enforcement component include determining and enforcing a target service policy from the plurality of service policy information.
4. The method of claim 3, wherein the service policy enforcement component comprises a control plane component and a data plane component;
the functions of the control plane component comprise sending, according to a policy information determining instruction, target service policy information determined from the plurality of service policy information to the data plane component;
The function of the data plane component comprises executing a service policy corresponding to the target service policy information according to the received target service policy information.
5. The method of claim 3, wherein the plurality of functional components having an upstream-downstream relationship further comprises a distribution layer component;
the function of the distribution layer component comprises the step of adjusting consistency of storage contents under the condition that the storage contents of the storage system components in the service grid system components in different regions are inconsistent.
6. A fault handling apparatus for a micro-service architecture, wherein the micro-service architecture comprises service grid system components arranged in different regions, the apparatus comprising:
a fault discovery module, configured to, in a case where an abnormal operating parameter is detected, analyze the abnormal parameter among the operating parameters and determine a faulty component that causes the abnormal operating parameter;
a target replacement component determining module, configured to determine a target replacement component having the same function as the faulty component;
a transfer module, configured to disconnect a downstream component of the faulty component from the faulty component, and establish a connection between the downstream component of the faulty component and the target replacement component;
wherein the target replacement component determining module comprises:
a candidate replacement component determining submodule, configured to determine a plurality of candidate replacement components having the same function as the faulty component;
a target replacement component determination execution submodule, configured to determine the target replacement component from the plurality of candidate replacement components according to a predetermined rule;
wherein the target replacement component determination execution submodule comprises:
a physical distance determining unit, configured to determine a physical distance between the faulty component and each candidate replacement component, wherein the physical distance is used to represent the distance between the region where the faulty component is located and the region where the candidate replacement component is located, the faulty component is a functional component in which a fault occurs among the service grid system components, and the service grid system components in different regions are set to be mutually disaster-tolerant;
and determining the target replacement component from the plurality of candidate replacement components based on the physical distance.
7. The apparatus of claim 6, wherein the service grid system component comprises a plurality of functional components having an upstream-downstream relationship, each of the functional components correspondingly performing a different function.
8. The apparatus of claim 7, wherein the plurality of functional components having an upstream-downstream relationship comprise: a storage system component, a service policy enforcement component;
The functions of the storage system component include storing various service policy information of the micro-service;
the functions of the service policy enforcement component include determining and enforcing a target service policy from the plurality of service policy information.
9. The apparatus of claim 8, wherein the service policy enforcement component comprises a control plane component and a data plane component;
the functions of the control plane component comprise sending, according to a policy information determining instruction, target service policy information determined from the plurality of service policy information to the data plane component;
The function of the data plane component comprises executing a service policy corresponding to the target service policy information according to the received target service policy information.
10. The apparatus of claim 9, wherein the plurality of functional components having an upstream-downstream relationship further comprises a distribution layer component;
the function of the distribution layer component comprises the step of adjusting consistency of storage contents under the condition that the storage contents of the storage system components in the service grid system components in different regions are inconsistent.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN202110236680.7A 2021-03-03 2021-03-03 Fault processing method, device, equipment and storage medium of micro-service architecture Active CN112965847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110236680.7A CN112965847B (en) 2021-03-03 2021-03-03 Fault processing method, device, equipment and storage medium of micro-service architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110236680.7A CN112965847B (en) 2021-03-03 2021-03-03 Fault processing method, device, equipment and storage medium of micro-service architecture

Publications (2)

Publication Number Publication Date
CN112965847A CN112965847A (en) 2021-06-15
CN112965847B true CN112965847B (en) 2024-05-24

Family

ID=76276952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110236680.7A Active CN112965847B (en) 2021-03-03 2021-03-03 Fault processing method, device, equipment and storage medium of micro-service architecture

Country Status (1)

Country Link
CN (1) CN112965847B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434218B (en) * 2021-07-06 2023-08-15 北京百度网讯科技有限公司 Micro-service configuration method, micro-service configuration device, electronic equipment and medium
CN116192863B (en) * 2023-01-13 2023-11-28 中科驭数(北京)科技有限公司 Micro-service flow processing method, DPU service grid deployment method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101741675A (en) * 2006-06-30 2010-06-16 三菱电机株式会社 Communication node and ring forming method and ring establishing method for communication system
CN103297396A (en) * 2012-02-28 2013-09-11 国际商业机器公司 Management failure transferring device and method in cluster system
CN103401944A (en) * 2013-08-14 2013-11-20 青岛大学 Service combination dynamic reconstruction system
CN107566508A (en) * 2017-09-19 2018-01-09 广东电网有限责任公司信息中心 A kind of short message micro services system for automating O&M
CN109446008A (en) * 2018-10-31 2019-03-08 Oppo广东移动通信有限公司 A kind of failure cause detection method, failure cause detection device and terminal device
CN111722988A (en) * 2020-06-11 2020-09-29 苏州浪潮智能科技有限公司 Fault switching method and device for data space nodes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and discussion on technologies related to micro-service fault diagnosis; 赵建涛; 黄立松; 网络新媒体技术 (Issue 01); full text *

Also Published As

Publication number Publication date
CN112965847A (en) 2021-06-15

Similar Documents

Publication Publication Date Title
CN112965847B (en) Fault processing method, device, equipment and storage medium of micro-service architecture
JP7450750B2 (en) Methods, apparatus, electronic devices, systems and storage media for configuring microservices
WO2016206456A1 (en) Physical machine upgrading method, service migration method and apparatus
CN103188098B (en) A kind of disaster tolerance switching method, system and device
CN109873714B (en) Cloud computing node configuration updating method and terminal equipment
CN112527567A (en) System disaster tolerance method, device, equipment and storage medium
CN111683139A (en) Method and apparatus for balancing load
JP5647561B2 (en) Power system supervisory control system
CN111510480A (en) Request sending method and device and first server
KR20230091168A (en) Techniques for creating configurations to electrically isolate fault domains in data centers
CN117687778A (en) Redis cluster switching method and device, electronic equipment and storage medium
CN117076196A (en) Database disaster recovery management and control method and device
CN114070889B (en) Configuration method, traffic forwarding device, storage medium, and program product
CN114143196B (en) Instance configuration updating method, device, equipment, storage medium and program product
CN114051029B (en) Authorization method, authorization device, electronic equipment and storage medium
CN112559084B (en) Method, apparatus, device, storage medium and program product for administering services
CN115454872A (en) Database test method, device, equipment and storage medium
CN104426704A (en) Integration network device and service integration method thereof
CN114205414A (en) Data processing method, device, electronic equipment and medium based on service grid
CN112596922B (en) Communication management method, device, equipment and medium
CN116938826B (en) Network speed limiting method, device and system and electronic equipment
CN118733331A (en) Self-pointing disaster recovery management method and device
CN115695288A (en) Login control method and device, electronic equipment and storage medium
CN115801357A (en) Global exception handling method, device, equipment and storage medium
CN116800665A (en) Edge node management method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant