WO2023116276A1 - Fault handling method, device, electronic equipment, and storage medium - Google Patents

Fault handling method, device, electronic equipment, and storage medium

Info

Publication number
WO2023116276A1
Authority
WO
WIPO (PCT)
Prior art keywords
business
business service
fault
event
service nodes
Prior art date
Application number
PCT/CN2022/132370
Other languages
English (en)
French (fr)
Inventor
常诚
刘建华
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023116276A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/221 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test buses, lines or interfaces, e.g. stuck-at or open line faults
    • G06F11/26 Functional testing

Definitions

  • the embodiments of the present application relate to the technical field of mobile communications, and in particular to a fault handling method, device, electronic equipment, and storage medium.
  • in current frameworks, each professional network of the same manufacturer productizes its own capabilities, which leads to single-domain autonomy per professional field, and barriers between manufacturers lead to single-domain autonomy per manufacturer. Because fault location is carried out manually, segment by segment, across domains, end-to-end integration of business processes is limited by the barriers between manufacturers and by the cooperation required between professional fields.
  • when facing business scenarios with complex fault variations or unknown faults, for example fault events in an autonomous network, there are problems such as high development difficulty, long cycles, and low coordination efficiency, so faults cannot be handled effectively.
  • the main purpose of the embodiments of the present application is to propose a fault handling method, device, electronic equipment, and storage medium that decouple the business services of each manufacturer/professional network and allow them to be designed independently, so that unknown fault events can be located and fault events can be handled efficiently.
  • an embodiment of the present application provides a fault handling method, including: acquiring the business services provided by each business service node; arranging the business service nodes corresponding to each fault event according to the business services provided by each business service node, wherein each fault event corresponds to a plurality of business service nodes; when a fault event is detected, scheduling the plurality of business service nodes corresponding to the detected fault event to detect the fault event respectively; obtaining the results of the detection of the fault event by the plurality of business service nodes; and determining the cause of the fault according to those detection results.
  • an embodiment of the present application further provides a fault processing device, including: a first acquisition module, configured to acquire the business services provided by each business service node; an orchestration module, configured to arrange the business service nodes corresponding to each fault event according to the business services provided by each business service node, wherein each fault event corresponds to a plurality of business service nodes; a scheduling module, configured to schedule, when a fault event is detected, the plurality of business service nodes corresponding to the detected fault event to detect the fault event respectively; a second acquisition module, configured to obtain the results of the detection of the fault event by the plurality of business service nodes; and a determination module, configured to determine the cause of the fault according to those detection results.
  • an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above fault handling method.
  • an embodiment of the present application further provides a computer-readable storage medium storing a computer program, wherein the above fault handling method is implemented when the computer program is executed by a processor.
  • the fault handling method proposed in the embodiment of the present application acquires the business services provided by each business service node and arranges the business service nodes corresponding to each fault event according to those business services, wherein each fault event corresponds to multiple business service nodes. When a fault event is detected, the multiple business service nodes corresponding to the detected fault event are scheduled to detect the fault event respectively, their detection results are obtained, and the cause of the fault is then determined according to those detection results.
  • Fig. 1 is a schematic diagram of the working mechanism of an EDA provided according to an embodiment of the present application.
  • FIG. 2 is an overall architecture diagram of an orchestration/execution engine provided according to an embodiment of the present application.
  • Fig. 3 is a transaction flowchart of a fault handling method provided according to an embodiment of the present application.
  • FIG. 4 is an interaction diagram for implementing step 301 according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a processing service flow provided according to an embodiment of the present application.
  • FIG. 6 is an interaction diagram for implementing step 302 according to an embodiment of the present application.
  • FIG. 7 is an interaction diagram for implementing steps 303 and 304 according to an embodiment of the present application.
  • FIG. 8 is a structural diagram of an RPC model provided according to an embodiment of the present application.
  • Fig. 9 is a panoramic view of a component provided according to an embodiment of the present application.
  • Fig. 10 is a schematic diagram of a practical application provided according to an embodiment of the present application.
  • Fig. 11 is a schematic diagram of a fault handling device provided according to an embodiment of the present application.
  • Fig. 12 is a structural diagram of an electronic device provided according to an embodiment of the present application.
  • each manufacturer/professional network splits the large and complex business system into many highly cohesive business services, and each business service is responsible for relatively independent logic.
  • the original business processing logic may not change, but business value is not achieved through the capability of a single service; it is achieved through collaboration between business services that together form a complete end-to-end business process.
  • the business process is usually pre-defined: the collaborative relationship between services is determined in advance according to the business process in the design phase, and the dependencies and calling order of the services are preset in the domain layer or application layer for the running phase. After the services are deployed, they run according to the pre-defined business process.
  • This way of presetting the relationship between services is easy to implement and low in cost when the business process and scenarios are determined.
  • there are often many uncertainties in actual business processes and scenarios, which generally change according to user needs and troubleshooting.
  • presetting business processes in advance cannot respond flexibly and quickly to changes in business processes and scenarios. It brings additional costs, including re-development and testing and on-site redeployment and verification, and the system becomes difficult to maintain.
  • the embodiment of the present application arranges the business service nodes corresponding to each fault event, that is, it pre-defines the cooperative relationship between the business services of each manufacturer/professional network according to the fault event. This keeps the business services of each manufacturer/professional network decoupled: each business service implements only its own atomic processing logic, while the collaboration between services is maintained by a dedicated service collaboration program, namely the arrangement, in the embodiment of this application, of the business service nodes corresponding to each fault event according to the business services provided by each business service node.
  • the fault handling method of the embodiment of the present application provides a low-code capability opening method, which can directly allow users to combine and superimpose application scenarios conveniently and quickly, so as to achieve the goal of efficient intelligent operation and
  • An embodiment of the present application relates to a fault handling method applied to an event-driven architecture (EDA). The working mechanism of the EDA is shown schematically in Figure 1 and includes a business arrangement/execution engine that implements the fault handling method of the embodiment of this application.
  • the overall architecture of the orchestration/execution engine is shown in Figure 2 and includes: an application programming interface (API), a service module, and a storage module.
  • the API includes: a management task queue, metadata, and an execution task queue.
  • the management task queue is used to start and manage tasks; metadata, such as performance data and alarm data, is used to define tasks, that is, business processes; the execution task queue is used to obtain tasks and execute them.
  • the service module includes: a business flow service, a task service, a state machine service, and a queue service.
  • the state machine service matches the task with the current task state, schedules the task according to the recognized task state, or updates the task state.
  • state machine service and queue service are also used to manage and schedule tasks.
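  • The state machine behaviour described above can be sketched as follows. This is a minimal illustration, assuming a small set of task states; the state names, the `TRANSITIONS` table, and the `advance` helper are hypothetical and are not taken from the application.

```python
from enum import Enum


class TaskState(Enum):
    """Hypothetical task states; the application does not enumerate them."""
    PENDING = "pending"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition table: the states a task may move to from each state.
TRANSITIONS = {
    TaskState.PENDING: {TaskState.SCHEDULED},
    TaskState.SCHEDULED: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: {TaskState.PENDING},  # assumption: a failed task may be retried
}


def advance(current: TaskState, target: TaskState) -> TaskState:
    """Match the task against its current state and update it if the move is allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

  • Keeping the allowed transitions in a table means a new state can be added without touching the scheduling logic, which is one plausible reason to separate the state machine service from the queue service.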
  • the storage module is used to store persistent data.
  • application scenarios of the embodiments of the present application include: fault location and full-process business processing within a single professional communication field, such as the core network side or the wireless side; cross-professional fault location and full-process end-to-end business processing in an autonomous communication network under the same manufacturer, for example wireless side, core network side, and bearer side; cross-professional fault location and full-process end-to-end business processing in an autonomous communication network across different manufacturers, for example wireless side, core network side, and bearer side; and low-code opening of business processing capabilities for enterprise-user (To Business, ToB) scenarios.
  • Step 301: acquire the business services provided by each business service node.
  • the registration information of each business service node is received, and the registration information includes the name of the business service or the unique identifier of the business service.
  • business services include one or any combination of the following: components, data, services, algorithms, and scripts in the planning, construction, operation and maintenance, and optimization processes in the communication field.
  • in one example, the registration information also includes a service startup interface, an event listening interface, and a sending event definition interface. After the business services provided by each business service node are acquired, these three interfaces are uniformly encapsulated: they are added to the registration information so that the orchestration/execution engine encapsulates them for the business service node to call. The business service node therefore only needs to focus on implementing its own business logic.
  • the registration information needs to comply with the specification of the orchestration engine.
  • the service startup interface, event monitoring interface, and sending event definition interface refer to the practice of service plug-ins.
  • the public plug-in installation package supports parsing and sending. Each business service node only needs to be packaged according to the file and directory specification definitions.
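  • The idea of uniformly encapsulating the three interfaces can be sketched as follows. This is an illustrative sketch only: `EngineInterfaces`, `BusinessService`, `LinkCheck`, the event type strings, and the `loss` field are all hypothetical names, and the in-process dispatch stands in for whatever transport the engine actually uses.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class EngineInterfaces:
    """Hypothetical engine-side wrapper exposing start, listen, and send-event."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable[[dict], None]]] = {}
        self.started: List[str] = []

    def start_service(self, service_name: str) -> None:
        # Service startup interface: record that the service was started.
        self.started.append(service_name)

    def listen(self, event_type: str, handler: Callable[[dict], None]) -> None:
        # Event listening interface: subscribe a handler to an event type.
        self._listeners.setdefault(event_type, []).append(handler)

    def send_event(self, event_type: str, payload: dict) -> int:
        # Sending event definition interface: deliver the payload to all listeners.
        handlers = self._listeners.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


class BusinessService(ABC):
    """A business service implements only its own logic in handle()."""

    def __init__(self, name: str, engine: EngineInterfaces) -> None:
        self.name = name
        self.engine = engine

    @abstractmethod
    def handle(self, payload: dict) -> None:
        ...


class LinkCheck(BusinessService):
    """Hypothetical detection service: reports a link as down above 50% loss."""

    def handle(self, payload: dict) -> None:
        status = "down" if payload.get("loss", 0) > 0.5 else "ok"
        self.engine.send_event("detection_result", {"service": self.name, "link": status})
```

  • A node would subscribe `LinkCheck.handle` to a fault event type through `listen`, and the engine would collect what the node publishes by listening on `detection_result`.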
  • step 301 can also be implemented in the manner shown in Figure 4, specifically including:
  • the business service node initiates a registration request to the message middleware.
  • the message middleware monitors the registration request, and updates and stores the registration information in the service perspective, so that the service perspective presents the registration information.
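  • The registration interaction of step 301 can be sketched as follows, assuming an in-memory stand-in for the message middleware and the service perspective. The class names `RegistrationInfo`, `ServiceView`, and `MessageMiddleware`, and the node identifier field, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RegistrationInfo:
    node_id: str        # hypothetical identifier of the business service node
    service_name: str   # name or unique identifier of the business service


class ServiceView:
    """Stand-in for the service perspective that presents registration information."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistrationInfo] = {}

    def update(self, info: RegistrationInfo) -> None:
        # Store or overwrite the entry for this business service.
        self._entries[info.service_name] = info

    def list_services(self) -> List[str]:
        return sorted(self._entries)

    def node_for(self, service_name: str) -> str:
        return self._entries[service_name].node_id


class MessageMiddleware:
    """Receives registration requests and updates the service view."""

    def __init__(self, view: ServiceView) -> None:
        self._view = view

    def on_registration_request(self, info: RegistrationInfo) -> None:
        self._view.update(info)
```

  • Keying the view by service name means a node that registers again simply replaces its earlier entry; whether the real system behaves this way is an assumption.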
  • Step 302: according to the business services provided by each business service node, arrange the business service nodes corresponding to each fault event, wherein each fault event corresponds to multiple business service nodes.
  • in one example, the business services provided by each business service node are displayed on the human-computer interaction interface (User Interface, UI); the operation information of each business service selected by the user for each fault event is received; and the processing business flow of each fault event is generated according to that operation information, wherein the processing business flow includes the business service nodes where the business services selected by the user are located and the scheduling sequence of each business service node.
  • the scheduling sequence includes: concurrent scheduling and/or upstream and downstream scheduling.
  • the business process is started by obtaining the processing business flow of each failure event.
  • the processing business flow of each fault event, generated by the business arrangement/execution engine according to the operation information of each business service selected by the user, is shown schematically in Figure 5 and includes: business service node A, business service node B, business service node C, business service An, business service Bn, and business service Cn.
  • step 302 can also be implemented in the manner shown in Figure 6, specifically including:
  • the orchestration transformation sends the format-converted operation information to the service orchestration, so as to generate a processing service flow.
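  • One way to picture the processing business flow generated in step 302 is as an ordered list of stages, where the nodes inside a stage are scheduled concurrently and the stages run upstream to downstream. The sketch below assumes that representation; `ProcessingFlow`, `build_flow`, and the shape of the user selection are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessingFlow:
    fault_event: str
    # Each inner list is one stage: its nodes are scheduled concurrently,
    # and stages are ordered from upstream to downstream.
    stages: List[List[str]] = field(default_factory=list)


def build_flow(
    fault_event: str,
    selection: List[List[str]],
    service_to_node: Dict[str, str],
) -> ProcessingFlow:
    """Turn the services a user selected for a fault event into a flow of nodes."""
    stages: List[List[str]] = []
    for group in selection:
        unknown = [name for name in group if name not in service_to_node]
        if unknown:
            raise KeyError(f"unregistered business services: {unknown}")
        stages.append([service_to_node[name] for name in group])
    return ProcessingFlow(fault_event=fault_event, stages=stages)
```

  • For instance, selecting services A and B together and then service C yields two stages, which matches the ordering of the Figure 7 interaction where A and B are started before C.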
  • Step 303: when a fault event is detected, schedule the multiple business service nodes corresponding to the detected fault event to detect the fault event respectively.
  • multiple business service nodes corresponding to the detected fault events are started through the uniformly encapsulated service startup interface, and the detected fault events are notified to corresponding multiple business service nodes through the uniformly encapsulated sending event definition interface , triggering multiple business service nodes to detect fault events respectively.
  • in one example, according to the processing business flow of the detected fault event, the multiple business service nodes in that processing business flow are scheduled to detect the fault event respectively.
  • when the processing business flow of the detected fault event includes concurrently scheduled business service nodes, those nodes are scheduled concurrently and each detects the fault event;
  • when the processing business flow of the detected fault event includes business service nodes scheduled upstream and downstream, the upstream business service node is scheduled first, and, depending on its execution result, either the processing business flow ends or the downstream business service node is scheduled.
  • scheduling the multiple business service nodes of the processing business flow to detect the fault event separately allows the fault to be located more accurately.
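  • The two scheduling modes can be sketched with a thread pool: nodes in the same stage run concurrently, and the next (downstream) stage runs only if an upstream result asks for it. The `continue` flag in each result and the rule "proceed if any upstream node requests it" are assumptions made for this sketch; the application only says the decision depends on the upstream execution result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Detector = Callable[[dict], dict]


def run_flow(
    stages: List[List[str]],
    detectors: Dict[str, Detector],
    event: dict,
) -> Dict[str, dict]:
    """Run each stage concurrently; stop before downstream stages when not requested."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor() as pool:
        for stage in stages:
            # Concurrent scheduling: submit every node of the stage at once.
            futures = {node: pool.submit(detectors[node], event) for node in stage}
            stage_results = {node: future.result() for node, future in futures.items()}
            results.update(stage_results)
            # Upstream/downstream scheduling: end the flow unless some upstream
            # node reports that downstream detection is still needed (assumed flag).
            if not any(r.get("continue", False) for r in stage_results.values()):
                break
    return results
```

  • Detectors here are plain callables; in the described system they would be remote business service nodes reached through the uniformly encapsulated interfaces.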
  • Step 304: obtain the detection results of the fault event from the multiple business service nodes.
  • in one example, the detection results are obtained through the uniformly encapsulated event listening interface, or by listening on the channels corresponding to the respective business service nodes.
  • obtaining the detection results from the multiple business service nodes confirms the execution status of the business services, which in turn determines whether to continue scheduling downstream services or to end the current business flow.
  • step 303 and step 304 can also be implemented in the manner shown in Figure 7, specifically including:
  • the orchestration/execution engine starts a business flow, specifically: delivering operation information of business services, for example, business service A and business service B.
  • the message middleware monitors the service flow and starts the service flow. Specifically: monitor business service A, start business service A on business service node A, then monitor business service B, and start business service B on business service node B.
  • the business service node A and the business service node B respectively return the execution results of the business service A and the business service B to the message middleware.
  • the orchestration/execution engine delivers the operation information of the business service C.
  • the message middleware monitors the business service C, and starts the business service C on the business service node C.
  • the business service node C returns the execution result of the business service C to the message middleware.
  • the message middleware returns the execution result of the business flow to the orchestration/execution engine.
  • Step 305: determine the cause of the fault according to the detection results of the fault event from the multiple business service nodes.
  • in one example, the detection results of the fault event are sent to the composite service node, which triggers the composite service node to perform a comprehensive evaluation based on them; the comprehensive evaluation result returned by the composite service node gives the cause of the fault.
  • the composite service node comprehensively evaluates the detection results of the fault event according to the rules of an expert database or the analysis results of machine learning, so as to obtain the comprehensive evaluation result and determine the cause of the fault.
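  • A rule-based evaluation of the kind an expert database might drive can be sketched as follows. The rules, field names (`link`, `cpu`), thresholds, and cause strings are invented for illustration; the application does not specify the rule format, and a machine-learning evaluator would replace `evaluate` entirely.

```python
from typing import Callable, Dict, List, Tuple

Results = Dict[str, dict]
Rule = Tuple[Callable[[Results], bool], str]

# Hypothetical expert rules, checked in order; the first match gives the cause.
RULES: List[Rule] = [
    (lambda r: r.get("bearer", {}).get("link") == "down", "bearer link failure"),
    (lambda r: r.get("core", {}).get("cpu", 0) > 0.9, "core network overload"),
]


def evaluate(results: Results, rules: List[Rule] = RULES, default: str = "unknown") -> str:
    """Combine per-node detection results into a single fault cause."""
    for predicate, cause in rules:
        if predicate(results):
            return cause
    return default
```

  • Ordering the rules encodes priority: when both a bearer link failure and a core overload are reported, the sketch attributes the fault to the first matching rule.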
  • in one example, business services are implemented by business service nodes and communicate through APIs. This can be realized in two ways: through a Representational State Transfer (REST) interface called by the process engine, or by periodically checking the status of suspended business services.
  • in one example, this embodiment is implemented based on a remote procedure call (RPC) communication model, in which business services run on different servers, communicate with the server through the Hypertext Transfer Protocol (HTTP), and use a polling model to manage the business service queue.
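  • The polling model can be sketched with an in-process queue standing in for the business service queue that a node would poll over HTTP. `poll_once`, the task dictionary layout, and the handler table are hypothetical; a real node would issue an HTTP request where this sketch calls `get`.

```python
import queue
from typing import Callable, Dict, Optional


def poll_once(
    task_queue: "queue.Queue[dict]",
    handlers: Dict[str, Callable[[dict], object]],
    timeout: float = 0.1,
) -> Optional[dict]:
    """Poll the business service queue once and run the matching handler."""
    try:
        task = task_queue.get(timeout=timeout)
    except queue.Empty:
        return None  # nothing to do in this polling cycle
    result = handlers[task["service"]](task["payload"])
    task_queue.task_done()
    return {"service": task["service"], "result": result}
```

  • With polling, the worker initiates every exchange, so business service nodes do not need to expose an inbound endpoint to the engine; that is a general property of the pattern rather than a claim made in the application.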
  • the structure of the RPC model is shown in Figure 8, including: business service node 1, business service node 2, business service node 3, API/HTTP, business service queue, management and execution service module, orchestration/execution engine, database, and index.
  • the component panorama of this embodiment is shown in FIG. 9 , including: process, task, history, monitoring, client, communication and background.
  • the process includes a process engine, which uses a domain-specific language (DSL) to write the process definition file.
  • the process definition file is a file in a data exchange format; it can be written by hand or generated by dragging and dropping through the interface.
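  • As an illustration of a process definition file in a data exchange format, the sketch below assumes JSON and a small invented schema (`fault_event`, `stages`, `mode`, `services`). The application does not name the format or the fields, so every key here is hypothetical.

```python
import json

# Hypothetical process definition: A and B run concurrently, then C downstream.
DEFINITION = """
{
  "fault_event": "cell_out_of_service",
  "stages": [
    {"mode": "concurrent", "services": ["A", "B"]},
    {"mode": "downstream", "services": ["C"]}
  ]
}
"""


def load_definition(text: str) -> dict:
    """Parse a process definition and check the fields this sketch relies on."""
    data = json.loads(text)
    if not isinstance(data.get("fault_event"), str):
        raise ValueError("fault_event must be a string")
    stages = data.get("stages")
    if not isinstance(stages, list) or not stages:
        raise ValueError("stages must be a non-empty list")
    for stage in stages:
        if stage.get("mode") not in {"concurrent", "downstream"}:
            raise ValueError(f"unknown scheduling mode: {stage.get('mode')!r}")
        if not stage.get("services"):
            raise ValueError("each stage needs at least one service")
    return data
```

  • A plain-text definition like this can be written by hand or emitted by a drag-and-drop editor, which is consistent with the two authoring paths the description mentions.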
  • Tasks: task creation, task deletion, task cancellation, task list, and parallel computing tasks.
  • History: functions such as historical tasks, historical activities, and process queries.
  • Monitoring: the monitoring engine and the task scheduling function. The monitoring engine makes corresponding decisions on the running status of the business services of each business service node when the business flow is lengthy; task scheduling is used for distributed timed scheduling.
  • Client and communication: business service queues, business service requests, business service nodes, and cross-language functions.
  • multiple languages are supported in the implementation of business services, for example Java, Python, and Go.
  • asset services include: layout component sets, scene template sets, industry model sets, data sets, business services, and scripts/software development kits/APIs.
  • Domain business services include: planning application services, construction application services, operation and maintenance application services, and optimization application services.
  • the design domain includes: interface layout, script function enhancement, business logic layout, data layout, business intelligence (Business Intelligence, BI) analysis, AI algorithm layout, and data access and governance.
  • the execution domain includes: execution engine, orchestration engine, and unified data platform.
  • the business service nodes corresponding to each fault event are arranged, wherein each fault event corresponds to multiple business service nodes.
  • when a fault event is detected, the multiple business service nodes corresponding to the detected fault event are scheduled to detect the fault event, the detection results of the fault event are obtained from each of them, and the cause of the fault is then determined according to those detection results.
  • the division of the above methods into steps is only for clarity of description. During implementation, steps can be combined into one step, or a step can be split into multiple steps; as long as the same logical relationship is included, they are all within the scope of protection of this patent. Adding insignificant modifications or introducing insignificant designs to the algorithm or process, without changing the core design of the algorithm and process, is also within the scope of protection of this patent.
  • FIG. 11 is a schematic diagram of the fault processing device in this embodiment, which includes: a first acquisition module 1101, an orchestration module 1102, a scheduling module 1103, a second acquisition module 1104, and a determination module 1105.
  • the first acquiring module 1101 is configured to acquire business services provided by each business service node.
  • the first obtaining module 1101 is further configured to receive registration information of each business service node; wherein, the registration information includes the name of the business service.
  • the first obtaining module 1101 is also used to uniformly encapsulate the service startup interface, the event listening interface, and the sending event definition interface.
  • the orchestration module 1102 is configured to arrange business service nodes corresponding to each fault event according to the business services provided by each business service node; wherein, a fault event corresponds to multiple business service nodes.
  • the orchestration module 1102 is also used to display the business services provided by each business service node on the human-computer interaction interface; receive the operation information of each business service selected by the user for each fault event; and generate the processing business flow of each fault event according to that operation information, wherein the processing business flow includes the business service nodes where the business services selected by the user are located and the scheduling sequence of each business service node.
  • the scheduling module 1103 is configured to schedule a plurality of business service nodes corresponding to the detected fault event to respectively detect the fault event when a fault event is detected.
  • the scheduling module 1103 is further configured to start multiple business service nodes corresponding to detected fault events through the uniformly encapsulated service start interface; through the uniformly encapsulated sending event definition interface, the detected fault The event is notified to corresponding multiple business service nodes, and the multiple business service nodes are triggered to detect the fault event respectively.
  • the scheduling module 1103 is further configured to schedule multiple business service nodes for processing the service flow to respectively detect the failure event according to the processing service flow of the detected failure event.
  • the scheduling module 1103 is further configured so that, where the scheduling sequence includes concurrent scheduling and/or upstream and downstream scheduling: when the processing business flow of the detected fault event includes concurrently scheduled business service nodes, those nodes are scheduled concurrently and each detects the fault event; when the processing business flow of the detected fault event includes business service nodes scheduled upstream and downstream, the upstream business service node is scheduled first, and, depending on its execution result, either the processing business flow ends or the downstream business service node is scheduled.
  • the second obtaining module 1104 is configured to obtain the detection results of the fault events by multiple business service nodes respectively.
  • the second obtaining module 1104 is further configured to obtain detection results of fault events by multiple business service nodes respectively by monitoring channels respectively corresponding to multiple business service nodes.
  • the second obtaining module 1104 is further configured to obtain the detection results of the fault events respectively by the plurality of business service nodes through the uniformly packaged event monitoring interface.
  • the determination module 1105 is configured to determine the cause of the failure according to the detection results of the failure events by the plurality of business service nodes.
  • the determining module 1105 is further configured to send the detection result of the fault event to the composite service node, triggering the composite service node to perform comprehensive evaluation based on the detection result of the fault event; to obtain the comprehensive evaluation result of the fault event by the composite service node, Get the cause of the failure.
  • this embodiment is an apparatus embodiment corresponding to the above method embodiment, and this embodiment can be implemented in cooperation with the above method embodiment.
  • the relevant technical details and technical effects mentioned in the above embodiments are still valid in this embodiment, and will not be repeated here to reduce repetition.
  • the relevant technical details mentioned in this embodiment can also be applied in the above embodiments.
  • the modules involved in this embodiment are logical modules.
  • a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • units that are not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, but this does not mean that there are no other units in this embodiment.
  • Another embodiment of the present application relates to an electronic device which, as shown in FIG. 12, includes: at least one processor 1201; and a memory 1202 communicatively connected to the at least one processor 1201; wherein the memory 1202 stores instructions executable by the at least one processor 1201, and the instructions are executed by the at least one processor 1201 so that the at least one processor 1201 can execute the fault handling methods in the foregoing embodiments.
  • the memory and the processor are connected by a bus
  • the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors and various circuits of the memory together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the data processed by the processor is transmitted on the wireless medium through the antenna, further, the antenna also receives the data and transmits the data to the processor.
  • the processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory, in turn, can be used to store data that the processor uses when performing operations.
  • Another embodiment of the present application relates to a computer-readable storage medium storing a computer program.
  • the above method embodiments are implemented when the computer program is executed by the processor.
  • the program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include media that can store program code, such as a U disk, a mobile hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

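Embodiments of the present application relate to the field of mobile communication technology and disclose a fault handling method and apparatus, an electronic device, and a storage medium. The fault handling method includes: acquiring the business services provided by each business service node; orchestrating, according to the business services provided by each business service node, the business service nodes corresponding to each fault event, where a fault event corresponds to a plurality of business service nodes; when a fault event is detected, scheduling the plurality of business service nodes corresponding to the detected fault event to each perform detection on the fault event; acquiring the results of the detection performed on the fault event by each of the plurality of business service nodes; and determining the fault cause according to those results.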

Description

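Fault Handling Method and Apparatus, Electronic Device, and Storage Medium
Related Application
This application claims priority to Chinese patent application No. 202111574524.8, filed on December 21, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of mobile communication technology, and in particular to a fault handling method and apparatus, an electronic device, and a storage medium.
Background
Against the backdrop of a rapidly developing digital economy, the automation, digitalization, and intelligentization of network operation and maintenance (O&M) have become a consensus in the communications industry. In recent years, major operators and standards organizations have successively released autonomous network white papers, under which the autonomy level of a network is divided into six levels, L0 to L5. One example is the autonomous network reference framework with a three-layer structure of "single-domain autonomy, cross-domain collaboration" and four closed loops. By splitting O&M tasks, with each vendor and each professional domain providing O&M capabilities, operators can assemble workflows according to their O&M processes, automate O&M and operations, strengthen O&M knowledge, promote the independent accumulation of O&M capabilities, and stimulate the vitality of O&M transformation and control over core capabilities.
However, under the current framework, the professional networks of a single vendor each productize their own capabilities, which leads to autonomy confined to individual professional domains, and barriers between vendors lead to autonomy confined to individual vendors. Fault localization is carried out manually, segment by segment, across domains, so connecting the business process end to end is limited by the barriers between vendors and by the cooperation between professional domains. For business scenarios with complex fault variations, or for unknown faults, such as fault events in an autonomous network, development is difficult, cycles are long, and collaboration is inefficient, so faults cannot be handled effectively.
Summary
The main purpose of the embodiments of the present application is to provide a fault handling method and apparatus, an electronic device, and a storage medium that decouple the business services of each vendor/professional network and allow them to be designed independently, so that unknown fault events can be located and fault events can be handled efficiently.
To achieve at least the above purpose, an embodiment of the present application provides a fault handling method, including: acquiring the business services provided by each business service node; orchestrating, according to the business services provided by each business service node, the business service nodes corresponding to each fault event, where a fault event corresponds to a plurality of business service nodes; when a fault event is detected, scheduling the plurality of business service nodes corresponding to the detected fault event to each perform detection on the fault event; acquiring the results of the detection performed on the fault event by each of the plurality of business service nodes; and determining the fault cause according to those results.
To achieve at least the above purpose, an embodiment of the present application further provides a fault handling apparatus, including: a first acquisition module, configured to acquire the business services provided by each business service node; an orchestration module, configured to orchestrate, according to the business services provided by each business service node, the business service nodes corresponding to each fault event, where a fault event corresponds to a plurality of business service nodes; a scheduling module, configured to, when a fault event is detected, schedule the plurality of business service nodes corresponding to the detected fault event to each perform detection on the fault event; a second acquisition module, configured to acquire the detection results of each of the plurality of business service nodes for the fault event; and a determination module, configured to determine the fault cause according to those detection results.
To achieve at least the above purpose, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above fault handling method.
To achieve at least the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the above fault handling method.
In the fault handling method proposed in the embodiments of the present application, the business services provided by each business service node are acquired, and the business service nodes corresponding to each fault event are orchestrated according to those business services, where a fault event corresponds to a plurality of business service nodes. When a fault event is detected, the plurality of business service nodes corresponding to the detected fault event are scheduled to perform detection on the fault event, the detection results of each of those nodes for the fault event are acquired, and the fault cause is determined according to those results. Orchestrating the business service nodes corresponding to each fault event, that is, predefining the collaboration relationships among the business services of each vendor/professional network according to the fault event, keeps those business services decoupled and independently designed, so that the business process is less prone to interruption and reorganization. In real autonomous network fault handling scenarios, the business services actually needed can be combined flexibly and freely, so that unknown fault events can be located and fault events can be handled efficiently.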
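Brief Description of the Drawings
One or more embodiments are illustrated by way of example with the figures in the corresponding drawings. These illustrations do not limit the embodiments. Elements bearing the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures are not drawn to scale.
FIG. 1 is a schematic diagram of the working mechanism of an EDA according to an embodiment of the present application;
FIG. 2 is an overall architecture diagram of an orchestration/execution engine according to an embodiment of the present application;
FIG. 3 is a flowchart of a fault handling method according to an embodiment of the present application;
FIG. 4 is an interaction diagram of an implementation of step 301 according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a processing business flow according to an embodiment of the present application;
FIG. 6 is an interaction diagram of an implementation of step 302 according to an embodiment of the present application;
FIG. 7 is an interaction diagram of an implementation of steps 303 and 304 according to an embodiment of the present application;
FIG. 8 is a structural diagram of an RPC model according to an embodiment of the present application;
FIG. 9 is a component panorama according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a practical application according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a fault handling apparatus according to an embodiment of the present application;
FIG. 12 is a structural diagram of an electronic device according to an embodiment of the present application.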
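Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the drawings. Those of ordinary skill in the art will appreciate that many technical details are given in the embodiments to help the reader better understand the application. However, the technical solutions claimed in this application can be implemented even without these technical details and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description and should not limit the specific implementation of the application; the embodiments may be combined with and refer to one another where they do not contradict.
For ease of understanding, the related technology of the embodiments of the present application is described as follows:
A system built on the business service architecture of each vendor/professional network splits a large, complex business system into many highly cohesive business services, each responsible for relatively independent logic. After a system has been split into many new business services, the original business processing logic may not have changed, but business value is realized not through the capability of any single service but through collaboration among the business services, which together implement a complete end-to-end business process.
Usually, the business process is defined in advance: at design time the collaboration relationships among services are fixed according to the business process, and at run time the dependencies and call order of the services are preset at the domain layer or application layer, so that after deployment the services simply run according to the predefined process. Presetting the relationships between services in this way is easy to implement and inexpensive when the business process and scenario are fixed. In practice, however, business processes and scenarios involve considerable uncertainty and generally keep changing with user needs and fault handling requirements. Presetting the business process therefore cannot respond flexibly and quickly to such changes, and brings additional costs, including redevelopment and retesting, and on-site redeployment and verification. As fault handling scenarios grow more complex, the system becomes difficult to maintain.
Meanwhile, as business processes and scenarios in professional domains keep changing, business processing capabilities actually sink down into the domains of each vendor/professional network. To make cross-domain/cross-vendor business processes combine and collaborate more flexibly, and to break away from the fixed process composition of ordinary business flows, the embodiments of the present application orchestrate the business service nodes corresponding to each fault event, that is, they predefine the collaboration relationships among the business services of each vendor/professional network according to the fault event. This keeps the business services of each vendor/professional network separated and decoupled: a business service implements only its own atomic processing logic, and the collaboration relationships between services are maintained by a dedicated service-collaboration program, namely, in the embodiments of the present application, the business service nodes corresponding to each fault event as orchestrated according to the business services provided by each business service node. The fault handling method of the embodiments also offers a low-code way of opening up capabilities, letting users conveniently and quickly combine and stack application scenarios, and thereby achieve efficient intelligent O&M and intelligent operations.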
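One embodiment of the present application relates to a fault handling method applied to an event-driven architecture (Event Driven Architecture, EDA). A schematic diagram of the working mechanism of the EDA is shown in FIG. 1; it includes a business orchestration/execution engine, which implements the fault handling method of the embodiments of the present application.
In one example, the overall architecture of the orchestration/execution engine is shown in FIG. 2 and includes an application programming interface (Application Programming Interface, API), a service module, and a storage module.
Specifically, the API includes a task management queue, metadata, and a task execution queue.
The task management queue is used to start and manage tasks; the metadata, for example performance data and alarm data, is used to define tasks, that is, business processes; and the task execution queue is used to fetch and execute tasks.
The service module includes a business flow service, a task service, a state machine service, and a queue service.
Specifically, when a business flow event occurs, for example a task fails or a task completes, the state machine service matches the task against its current task state and, according to the identified state, schedules the task or updates the task state.
The state machine service and the queue service are also used to manage and schedule tasks.
The storage module is used to store persistent data.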
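The interplay between the state machine service and the queue service can be pictured with a minimal Python sketch. The class and method names (`StateMachineService`, `submit`, `on_event`) and the single-retry policy are illustrative assumptions and do not come from the application, which only states that a task is scheduled or its state updated when a business flow event arrives.

```python
from collections import deque
from enum import Enum


class TaskState(Enum):
    SCHEDULED = "scheduled"
    COMPLETED = "completed"
    FAILED = "failed"


class StateMachineService:
    """Hypothetical state machine service backed by a simple task queue."""

    def __init__(self, max_retries=1):
        self.states = {}        # task name -> TaskState
        self.retries = {}       # task name -> retries used so far
        self.queue = deque()    # stands in for the queue service
        self.max_retries = max_retries

    def submit(self, task):
        """Register a task and put it on the queue."""
        self.states[task] = TaskState.SCHEDULED
        self.retries[task] = 0
        self.queue.append(task)

    def on_event(self, task, event):
        """Handle a business flow event ("completed" or "failed") for a task."""
        if event == "completed":
            self.states[task] = TaskState.COMPLETED
        elif event == "failed" and self.retries[task] < self.max_retries:
            # Assumed policy: reschedule a failed task until retries run out.
            self.retries[task] += 1
            self.states[task] = TaskState.SCHEDULED
            self.queue.append(task)
        elif event == "failed":
            self.states[task] = TaskState.FAILED
        else:
            raise ValueError(f"unknown event: {event}")
        return self.states[task]
```

Keeping the state table separate from the queue mirrors the split between the state machine service and the queue service described for FIG. 2.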
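Application scenarios of the embodiments of the present application include, but are not limited to, the following:
Fault localization and full-process business handling within a single professional communication domain, for example the core network side or the wireless side; fault localization and full-process end-to-end business handling across professional domains of an autonomous communication network with a single vendor, for example wireless side - core network side - bearer side; fault localization and full-process end-to-end business handling across professional domains of an autonomous communication network with different vendors, for example wireless side - core network side - bearer side; and low-code opening of business processing capabilities for enterprise-user (To Business, ToB) scenarios.
The implementation details of the fault handling method of this embodiment are described below. The following details are provided only for ease of understanding and are not required for implementing this solution. The implementation flow of the fault handling method of this embodiment is shown in FIG. 3 and specifically includes:
Step 301: acquire the business services provided by each business service node.
Specifically, registration information is received from each business service node; the registration information includes the name of the business service or a unique identifier of the business service. A business service includes one or any combination of the following: components, data, services, algorithms, and scripts used in the planning, construction, O&M, and optimization of the communications domain. Acquiring the registration information of the business services provided by each node allows the business services to be orchestrated in the orchestration/execution engine.
In one example, the registration information further includes a service start interface, an event listening interface, and a send-event definition interface. After the business services provided by each business service node are acquired, these three interfaces are encapsulated in a unified manner. Because the orchestration/execution engine encapsulates the three interfaces for the business service nodes to call, each business service node needs to attend only to implementing its own business logic.
It should be noted that the registration information must conform to the specification of the orchestration engine. In addition, the service start interface, event listening interface, and send-event definition interface follow the approach used for service plug-ins: a common plug-in installation package supports parsing and sending, and each business service node only needs to package itself according to the defined file and directory conventions.
In one example, step 301 may also be implemented as shown in FIG. 4, specifically including:
S401: a business service node sends a registration request to the message middleware.
S402: the message middleware listens for the registration request and stores the updated registration information in the service perspective, which presents the registration information.
S403: the registration request is updated and cached in the orchestration/execution engine, for the engine to use when orchestrating the business service nodes corresponding to each fault event.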
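As a rough illustration of step 301, the following Python sketch models the registration information and the cache kept by the orchestration/execution engine. The names `Registration`, `Registry`, and `services_on`, and the representation of the three interfaces as two callables plus a list of event names, are assumptions made for this sketch; the application does not specify a data model, and the message middleware hop is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Registration:
    name: str                            # business service name
    service_id: str                      # unique identifier of the service
    node: str                            # hosting business service node
    start: Callable[[], str]             # service start interface
    listen: Callable[[dict], dict]       # event listening interface
    emits: List[str] = field(default_factory=list)  # send-event definitions


class Registry:
    """Hypothetical registration cache of the orchestration/execution engine."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Registration] = {}

    def register(self, reg: Registration) -> None:
        # The source allows a name or a unique identifier;
        # this sketch keys the cache on the identifier.
        if not reg.service_id:
            raise ValueError("registration needs a unique identifier")
        self._by_id[reg.service_id] = reg

    def services_on(self, node: str) -> List[str]:
        """Names of the services registered for one node, sorted."""
        return sorted(r.name for r in self._by_id.values() if r.node == node)
```

A UI could call `services_on` for each node to list the services available for orchestration.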
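Step 302: orchestrate, according to the business services provided by each business service node, the business service nodes corresponding to each fault event, where a fault event corresponds to a plurality of business service nodes.
Specifically, the business services provided by each business service node are displayed on a human-machine interaction interface (User Interface, UI); operation information on the business services that the user selects for each fault event is received; and a processing business flow is generated for each fault event according to that operation information. The processing business flow includes the business service nodes hosting the business services selected by the user and the scheduling order of those nodes. The scheduling order includes concurrent scheduling and/or upstream-downstream scheduling. Once the processing business flow of a fault event has been obtained, the business process can be started.
In one example, a schematic diagram of the processing business flow generated for each fault event by the business orchestration/execution engine according to the operation information on the business services selected by the user is shown in FIG. 5, and includes business service node A, business service node B, business service node C, business service An, business service Bn, and business service Cn.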
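One possible way to represent a processing business flow is sketched below in Python. The `order` field, the `build_flow` function, and the event name `link_down` are hypothetical: the sketch assumes that selections sharing the same order value are scheduled concurrently and that a lower value means upstream, which is only one way to encode the scheduling order described above.

```python
def build_flow(fault_event, selections):
    """Group the user's selections into stages.

    selections: list of dicts with "node", "service", and "order" keys
    (an assumed shape for the operation information coming from the UI).
    Selections with equal "order" form one concurrent stage; stages are
    sorted so that upstream stages come first.
    """
    by_order = {}
    for sel in selections:
        by_order.setdefault(sel["order"], []).append((sel["node"], sel["service"]))
    stages = [by_order[key] for key in sorted(by_order)]
    return {"fault_event": fault_event, "stages": stages}


# Example with the node and service names of FIG. 5; the ordering is assumed.
flow = build_flow(
    "link_down",
    [
        {"node": "A", "service": "An", "order": 0},
        {"node": "B", "service": "Bn", "order": 0},
        {"node": "C", "service": "Cn", "order": 1},
    ],
)
```

Here nodes A and B form one concurrent stage and node C is downstream of both.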
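In one example, step 302 may also be implemented as shown in FIG. 6, specifically including:
S601: the operation information on the business services selected by the user for each fault event is obtained from the UI.
S602: the operation information is sent to the orchestration conversion for format conversion.
S603: the orchestration conversion sends the format-converted operation information to the service orchestration to generate the processing business flow.
S604: the processing business flow is returned to the UI via the orchestration conversion.
The main operations and display logic of the UI are shown in Table 1:
Figure PCTCN2022132370-appb-000001
Step 303: when a fault event is detected, schedule the plurality of business service nodes corresponding to the detected fault event to each perform detection on the fault event.
Specifically, the plurality of business service nodes corresponding to the detected fault event are started through the uniformly encapsulated service start interface, and the detected fault event is notified to those nodes through the uniformly encapsulated send-event definition interface, triggering each of the nodes to perform detection on the fault event.
In one example, the plurality of business service nodes of the processing business flow of the detected fault event are scheduled, according to that flow, to each perform detection on the fault event. Specifically: when the processing business flow includes concurrently scheduled business service nodes, those nodes are scheduled concurrently to each perform detection on the fault event; when the processing business flow includes business service nodes scheduled in upstream-downstream order, the upstream node is scheduled first, and the decision to end the processing business flow or to schedule the downstream node is made according to the upstream node's execution result. Scheduling the nodes of the processing business flow according to the scheduling order allows the fault to be located more precisely.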
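The two scheduling modes of step 303 can be sketched in Python with the standard library thread pool. The function name `run_flow`, the representation of nodes as plain callables, and the `should_continue` predicate are assumptions made for illustration; in the application the nodes are remote services reached through the encapsulated interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flow(stages, fault_event, should_continue):
    """Run stages upstream to downstream; nodes in a stage run concurrently.

    stages: non-empty lists of callables, each taking the fault event and
        returning a detection result.
    should_continue: callable that decides, from one stage's results,
        whether the downstream stage is scheduled or the flow ends.
    """
    all_results = []
    for stage in stages:
        with ThreadPoolExecutor(max_workers=len(stage)) as pool:
            # pool.map returns results in the same order as the stage list.
            results = list(pool.map(lambda detect: detect(fault_event), stage))
        all_results.append(results)
        if not should_continue(results):
            break  # end the processing business flow early
    return all_results
```

A typical predicate continues downstream only when some upstream node reports an abnormal finding, which matches the idea of deciding on the upstream execution result.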
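Step 304: acquire the results of the detection performed on the fault event by each of the plurality of business service nodes.
Specifically, the results of the detection performed on the fault event by each of the plurality of business service nodes are acquired by listening on the channels respectively corresponding to those nodes. Acquiring the results by listening on the channels confirms the execution status of each business service, which in turn determines whether to go on to schedule downstream services or to end the current business flow.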
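Listening on one channel per node can be approximated in Python with `queue.Queue` objects standing in for the channels. This is a sketch under that assumption; the real channels would be provided by the message middleware, and `collect_results` is a hypothetical helper name.

```python
import queue
import threading


def collect_results(channels, timeout=1.0):
    """Listen on one channel per node and gather the detection results.

    channels: dict mapping a node name to a queue.Queue that the node
        writes its result to.
    Returns {node: result}, with None for nodes that did not report
    within the timeout. Channels are read one after another, so the
    worst-case wait is the timeout multiplied by the number of nodes.
    """
    results = {}
    for node, channel in channels.items():
        try:
            results[node] = channel.get(timeout=timeout)
        except queue.Empty:
            results[node] = None  # no result arrived in time
    return results
```

A `None` entry gives the engine a simple signal that a node's execution status is unknown, which it can treat as a reason to end the current business flow.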
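In this embodiment, steps 303 and 304 may also be implemented as shown in FIG. 7, specifically including:
S701: the orchestration/execution engine starts the business flow, specifically by issuing the operation information of business services, for example business service A and business service B.
S702: the message middleware listens for the business flow and starts it. Specifically, it listens for business service A and starts business service A on business service node A, then listens for business service B and starts business service B on business service node B.
S703: business service node A and business service node B return the execution results of business service A and business service B, respectively, to the message middleware.
S704: the orchestration/execution engine issues the operation information of business service C.
S705: the message middleware listens for business service C and starts business service C on business service node C.
S706: business service node C returns the execution result of business service C to the message middleware.
S707: the message middleware returns the execution result of the business flow to the orchestration/execution engine.
Step 305: determine the fault cause according to the results of the detection performed on the fault event by each of the plurality of business service nodes.
Specifically, the detection results for the fault event are sent to a combined service node, triggering the combined service node to perform a comprehensive evaluation based on those results; the combined service node's comprehensive evaluation result for the fault event is then acquired, yielding the fault cause.
The combined service node performs the comprehensive evaluation of the detection results according to expert-library rules or the analysis results of machine learning, to obtain the comprehensive evaluation result for the fault event and determine the fault cause.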
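A rule-based comprehensive evaluation of the kind attributed to the combined service node might look like the following Python sketch. The rule format, the `evaluate` function, and the example findings and causes are all hypothetical; the application does not disclose the content of the expert library, and the machine learning alternative is not shown.

```python
def evaluate(detections, rules, default="unknown"):
    """Return the cause of the first rule whose conditions all match.

    detections: {node: finding} collected from the business service nodes.
    rules: list of (conditions, cause) pairs, where conditions is a dict
        {node: expected finding}; this stands in for an expert library.
    """
    for conditions, cause in rules:
        if all(detections.get(node) == expected
               for node, expected in conditions.items()):
            return cause
    return default


# Hypothetical expert rules, most specific first.
RULES = [
    ({"wireless": "alarm", "bearer": "packet_loss"}, "transmission link fault"),
    ({"wireless": "alarm"}, "base station fault"),
]
```

Ordering the rules from most to least specific lets a finding confirmed by several domains take precedence over a single-domain explanation.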
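Those skilled in the art will understand that in this embodiment business services are implemented by business service nodes and communicate through the API. This can be done in two ways: one is a Representational State Transfer (REST) interface called by the process engine; the other is periodically checking the status of pending business services.
The API is provided over the Hypertext Transfer Protocol (HTTP); using HTTP makes it possible to integrate with different clients or to add other protocols.
In one example, this embodiment is implemented on a Remote Procedure Call (RPC) communication model, in which business services run on different servers and communicate with the server over HTTP, and a polling model is used to manage the business service queue. The structure of the RPC model is shown in FIG. 8 and includes business service node 1, business service node 2, business service node 3, API HTTP, the business service queue, the management and execution service module, the orchestration/execution engine, a database, and an index.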
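The polling model can be illustrated with an in-memory Python sketch. `ServiceQueue` stands in for the business service queue that would be reached over the HTTP API, and `worker_poll_once` for one polling cycle of a business service node; both names and the method set are assumptions, and no real HTTP calls are made.

```python
from collections import defaultdict, deque


class ServiceQueue:
    """In-memory stand-in for the business service queue behind the API."""

    def __init__(self):
        self._pending = defaultdict(deque)  # service name -> pending tasks
        self.results = {}                   # task id -> reported result

    def enqueue(self, service, task_id, payload):
        self._pending[service].append((task_id, payload))

    def poll(self, service):
        """Return the next (task_id, payload) for a service, or None."""
        pending = self._pending[service]
        return pending.popleft() if pending else None

    def report(self, task_id, result):
        self.results[task_id] = result


def worker_poll_once(service_queue, service, handler):
    """One polling cycle of a node: fetch a task, run it, report back."""
    task = service_queue.poll(service)
    if task is None:
        return False  # nothing pending; a real worker would sleep and retry
    task_id, payload = task
    service_queue.report(task_id, handler(payload))
    return True
```

Because each node pulls its own work, the engine does not need to know where a node runs, which fits services deployed on different servers.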
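In one example, the component panorama of this embodiment is shown in FIG. 9 and includes: process, task, history, monitoring, client, communication, and back end.
The process part includes a process engine, which uses a domain-specific language (Domain Specified Language, DSL) to write process definition files. A process definition file is a file in a data exchange format; it can be written by hand and can also be generated by drag-and-drop in the interface.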
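To make the idea of a process definition file concrete, the Python sketch below parses a small definition and derives an execution order. JSON is assumed as the data exchange format, and the schema (`fault_event`, `tasks`, `depends_on`) and all task names are invented for illustration; the application does not describe the actual DSL.

```python
import json

# Hypothetical process definition; the real DSL schema is not disclosed.
FLOW_DEFINITION = """
{
  "fault_event": "cell_out_of_service",
  "tasks": [
    {"name": "check_wireless", "node": "A", "depends_on": []},
    {"name": "check_bearer", "node": "B", "depends_on": []},
    {"name": "check_core", "node": "C",
     "depends_on": ["check_wireless", "check_bearer"]}
  ]
}
"""


def load_flow(text):
    """Parse a definition and check that every dependency is declared."""
    flow = json.loads(text)
    names = {task["name"] for task in flow["tasks"]}
    for task in flow["tasks"]:
        missing = set(task["depends_on"]) - names
        if missing:
            raise ValueError(f"{task['name']} depends on unknown tasks: {sorted(missing)}")
    return flow


def execution_levels(flow):
    """Group tasks into levels; tasks in one level can run concurrently."""
    remaining = {t["name"]: set(t["depends_on"]) for t in flow["tasks"]}
    done, levels = set(), []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic dependency in process definition")
        levels.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return levels
```

A declarative file of this kind can be produced equally well by hand or by a drag-and-drop editor, since both only need to emit the same structure.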
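The task part includes functions such as task creation, task deletion, task cancellation, task listing, and parallel computing tasks.
The history part includes functions such as historical tasks, historical activities, and process queries.
The monitoring part includes a monitoring engine and task scheduling. The monitoring engine makes decisions based on the running status of the business service on each business service node when the process of a business flow is long; task scheduling performs distributed timed scheduling.
The client and communication parts include the business service queue, business service requests, business service nodes, and cross-language support. Business services may be implemented in different languages, for example Java, Python, and Go.
The back end includes back-end management functions.
A practical application of the embodiments of the present application to end-to-end business handling/fault localization in an autonomous communication network is shown in FIG. 10 and includes asset services, domain business services, a design domain, and an execution domain.
The asset services include a layout component set, a scenario template set, an industry model set, a data set, business services, and scripts/software development kits/APIs.
The domain business services include planning application services, construction application services, O&M application services, and optimization application services.
The design domain includes interface orchestration, script function enhancement, business logic orchestration, data orchestration, business intelligence (Business Intelligence, BI) analysis, AI algorithm orchestration, and data access and governance.
The execution domain includes an execution engine, an orchestration engine, and a unified data platform.
In this embodiment, the business services provided by each business service node are acquired, and the business service nodes corresponding to each fault event are orchestrated according to those business services, where a fault event corresponds to a plurality of business service nodes. When a fault event is detected, the plurality of business service nodes corresponding to the detected fault event are scheduled to perform detection on the fault event, the detection results of each of those nodes for the fault event are acquired, and the fault cause is determined according to those results. Orchestrating the business service nodes corresponding to each fault event, that is, predefining the collaboration relationships among the business services of each vendor/professional network according to the fault event, keeps those business services decoupled and independently designed, so that the business process is less prone to interruption and reorganization. In real autonomous network fault handling scenarios, the business services actually needed can be combined flexibly and freely, so that unknown fault events can be located and fault events can be handled efficiently.
It should be noted that the above examples in this embodiment are given for ease of understanding and do not limit the technical solution of the present application.
The division of the above methods into steps is only for clarity of description. In implementation, steps may be merged into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, they fall within the scope of protection of this patent. Adding insignificant modifications to, or introducing insignificant designs into, an algorithm or process without changing the core design of the algorithm or process also falls within the scope of protection of this patent.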
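Another embodiment of the present application relates to a fault handling apparatus. The details of the fault handling apparatus of this embodiment are described below; they are provided only for ease of understanding and are not required for implementing this example. FIG. 11 is a schematic diagram of the fault handling apparatus of this embodiment, which includes a first acquisition module 1101, an orchestration module 1102, a scheduling module 1103, a second acquisition module 1104, and a determination module 1105.
Specifically, the first acquisition module 1101 is configured to acquire the business services provided by each business service node.
In one example, the first acquisition module 1101 is further configured to receive registration information from each business service node, where the registration information includes the name of the business service.
In one example, where the registration information further includes a service start interface, an event listening interface, and a send-event definition interface, the first acquisition module 1101 is further configured to encapsulate the service start interface, the event listening interface, and the send-event definition interface in a unified manner after the business services provided by each business service node are acquired.
The orchestration module 1102 is configured to orchestrate, according to the business services provided by each business service node, the business service nodes corresponding to each fault event, where a fault event corresponds to a plurality of business service nodes.
In one example, the orchestration module 1102 is further configured to display the business services provided by each business service node on a human-machine interaction interface; receive operation information on the business services that the user selects for each fault event; and generate a processing business flow for each fault event according to that operation information, where the processing business flow includes the business service nodes hosting the business services selected by the user and the scheduling order of those nodes.
The scheduling module 1103 is configured to, when a fault event is detected, schedule the plurality of business service nodes corresponding to the detected fault event to each perform detection on the fault event.
In one example, the scheduling module 1103 is further configured to start the plurality of business service nodes corresponding to the detected fault event through the uniformly encapsulated service start interface, and to notify those nodes of the detected fault event through the uniformly encapsulated send-event definition interface, triggering each of the nodes to perform detection on the fault event.
In one example, the scheduling module 1103 is further configured to schedule, according to the processing business flow of the detected fault event, the plurality of business service nodes of that flow to each perform detection on the fault event.
In one example, where the scheduling order includes concurrent scheduling and/or upstream-downstream scheduling, the scheduling module 1103 is further configured to: when the processing business flow of the detected fault event includes concurrently scheduled business service nodes, schedule those nodes concurrently to each perform detection on the fault event; and when the processing business flow of the detected fault event includes business service nodes scheduled in upstream-downstream order, schedule the upstream node first and decide, according to the upstream node's execution result, whether to end the processing business flow or to schedule the downstream node.
The second acquisition module 1104 is configured to acquire the detection results of each of the plurality of business service nodes for the fault event.
In one example, the second acquisition module 1104 is further configured to acquire the detection results of each of the plurality of business service nodes for the fault event by listening on the channels respectively corresponding to those nodes.
In one example, the second acquisition module 1104 is further configured to acquire the detection results of each of the plurality of business service nodes for the fault event through the uniformly encapsulated event listening interface.
The determination module 1105 is configured to determine the fault cause according to the detection results of each of the plurality of business service nodes for the fault event.
In one example, the determination module 1105 is further configured to send the detection results for the fault event to a combined service node, triggering the combined service node to perform a comprehensive evaluation based on those results, and to acquire the combined service node's comprehensive evaluation result for the fault event, yielding the fault cause.
It is easy to see that this embodiment is an apparatus embodiment corresponding to the method embodiment above and can be implemented in cooperation with it. The related technical details and technical effects mentioned in the method embodiment remain valid in this embodiment and are not repeated here to avoid redundancy. Correspondingly, the related technical details mentioned in this embodiment also apply to the method embodiment.
It is worth noting that the modules involved in this embodiment are all logical modules. In practice, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, units not closely related to solving the technical problem posed by the application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.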
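Another embodiment of the present application relates to an electronic device which, as shown in FIG. 12, includes: at least one processor 1201; and a memory 1202 communicatively connected to the at least one processor 1201, wherein the memory 1202 stores instructions executable by the at least one processor 1201, and the instructions are executed by the at least one processor 1201 so that the at least one processor 1201 can execute the fault handling method of the above embodiments.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and connects the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and are therefore not described further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and for general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory can be used to store data that the processor uses when performing operations.
Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. The above method embodiments are implemented when the computer program is executed by a processor.
That is, those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that the above embodiments are specific examples of implementing the present application, and that in practice various changes in form and detail may be made to them without departing from the spirit and scope of the present application.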

Claims (11)

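  1. A fault handling method, comprising:
    acquiring the business services provided by each business service node;
    orchestrating, according to the business services provided by each business service node, the business service nodes corresponding to each fault event, wherein the fault event corresponds to a plurality of business service nodes;
    when a fault event is detected, scheduling the plurality of business service nodes corresponding to the detected fault event to each perform detection on the fault event;
    acquiring the results of the detection performed on the fault event by each of the plurality of business service nodes; and
    determining a fault cause according to the results of the detection performed on the fault event by each of the plurality of business service nodes.
  2. The fault handling method according to claim 1, wherein acquiring the detection results of each of the plurality of business service nodes for the fault event comprises:
    acquiring the detection results of each of the plurality of business service nodes for the fault event by listening on the channels respectively corresponding to the plurality of business service nodes.
  3. The fault handling method according to claim 1, wherein acquiring the business services provided by each business service node comprises:
    receiving registration information from each business service node;
    wherein the registration information includes the name of the business service or a unique identifier of the business service.
  4. The fault handling method according to claim 3, wherein the registration information further includes a service start interface, an event listening interface, and a send-event definition interface;
    the method further comprises:
    after acquiring the business services provided by each business service node, encapsulating the service start interface, the event listening interface, and the send-event definition interface in a unified manner;
    scheduling the plurality of business service nodes corresponding to the detected fault event to each perform detection on the fault event comprises:
    starting the plurality of business service nodes corresponding to the detected fault event through the uniformly encapsulated service start interface;
    notifying the corresponding plurality of business service nodes of the detected fault event through the uniformly encapsulated send-event definition interface, triggering each of the plurality of business service nodes to perform detection on the fault event;
    acquiring the detection results of each of the plurality of business service nodes for the fault event comprises:
    acquiring the detection results of each of the plurality of business service nodes for the fault event through the uniformly encapsulated event listening interface.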
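  5. The fault handling method according to any one of claims 1 to 4, wherein orchestrating, according to the business services provided by each business service node, the business service nodes corresponding to each fault event comprises:
    displaying the business services provided by each business service node on a human-machine interaction interface;
    receiving operation information on the business services that a user selects for each fault event;
    generating a processing business flow for each fault event according to the operation information on the business services selected by the user, wherein the processing business flow includes the business service nodes hosting the business services selected by the user and the scheduling order of the business service nodes.
  6. The fault handling method according to claim 5, wherein scheduling the plurality of business service nodes corresponding to the detected fault event to each perform detection on the fault event comprises:
    scheduling, according to the processing business flow of the detected fault event, the plurality of business service nodes of the processing business flow to each perform detection on the fault event.
  7. The fault handling method according to claim 6, wherein the scheduling order includes concurrent scheduling or upstream-downstream scheduling;
    scheduling, according to the processing business flow of the detected fault event, the plurality of business service nodes of the processing business flow to each perform detection on the fault event comprises:
    when the processing business flow of the detected fault event includes concurrently scheduled business service nodes, scheduling those business service nodes concurrently to each perform detection on the fault event;
    when the processing business flow of the detected fault event includes business service nodes scheduled in upstream-downstream order, scheduling the upstream business service node first, and deciding, according to the execution result of the upstream business service node, whether to end the processing business flow or to schedule the downstream business service node.
  8. The fault handling method according to any one of claims 1 to 4, wherein determining the fault cause according to the detection results of each of the plurality of business service nodes for the fault event comprises:
    sending the detection results for the fault event to a combined service node, triggering the combined service node to perform a comprehensive evaluation based on the detection results for the fault event;
    acquiring the combined service node's comprehensive evaluation result for the fault event to obtain the fault cause.
  9. A fault handling apparatus, comprising:
    a first acquisition module, configured to acquire the business services provided by each business service node;
    an orchestration module, configured to orchestrate, according to the business services provided by each business service node, the business service nodes corresponding to each fault event, wherein the fault event corresponds to a plurality of business service nodes;
    a scheduling module, configured to, when a fault event is detected, schedule the plurality of business service nodes corresponding to the detected fault event to each perform detection on the fault event;
    a second acquisition module, configured to acquire the detection results of each of the plurality of business service nodes for the fault event;
    a determination module, configured to determine a fault cause according to the detection results of each of the plurality of business service nodes for the fault event.
  10. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the fault handling method according to any one of claims 1 to 8.
  11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the fault handling method according to any one of claims 1 to 8.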
PCT/CN2022/132370 2021-12-21 2022-11-16 Fault handling method and apparatus, electronic device, and storage medium WO2023116276A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111574524.8A CN116361081A (zh) 2021-12-21 2021-12-21 Fault handling method and apparatus, electronic device, and storage medium
CN202111574524.8 2021-12-21

Publications (1)

Publication Number Publication Date
WO2023116276A1 true WO2023116276A1 (zh) 2023-06-29

Family

ID=86901257

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/132370 WO2023116276A1 (zh) 2021-12-21 2022-11-16 Fault handling method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN116361081A (zh)
WO (1) WO2023116276A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108306748A (zh) * 2017-01-12 2018-07-20 阿里巴巴集团控股有限公司 Network fault localization method and apparatus, and interaction apparatus
CN108833184A (zh) * 2018-06-29 2018-11-16 腾讯科技(深圳)有限公司 Service fault localization method and apparatus, computer device, and storage medium
US20200153502A1 (en) * 2018-11-13 2020-05-14 Infinera Corporation Method and apparatus for rapid recovery of optical power after transient events in c+l band optical networks
US20200403985A1 (en) * 2019-06-19 2020-12-24 Hewlett Packard Enterprise Development Lp Method for federating a cluster from a plurality of computing nodes
CN113435846A (zh) * 2021-06-30 2021-09-24 深圳平安智汇企业信息管理有限公司 Business process orchestration method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN116361081A (zh) 2023-06-30

Similar Documents

Publication Publication Date Title
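CN111401903B (zh) Blockchain message processing method and apparatus, computer, and readable storage medium
JP7042317B2 (ja) Flexible node configuration method and system in a local or distributed computer system
CN109889575B (zh) Collaborative computing platform system and method in an edge environment
CN112882813B (zh) Task scheduling method, apparatus, and system, and electronic device
CN107729139B (zh) Method and apparatus for concurrently acquiring resources
CN111813570A (zh) Event-driven message interaction method for the power Internet of Things
US20060282886A1 (en) Service oriented security device management network
US11818152B2 (en) Modeling topic-based message-oriented middleware within a security system
US11294740B2 (en) Event to serverless function workflow instance mapping mechanism
US10079865B2 (en) Method and system for an ontology based request/reply service
CN109656690A (zh) Scheduling system, method, and storage medium
US7500251B2 (en) Method and system for managing programs for web service system
CN102497453A (zh) Calling device and calling method for a remote program
CN109743399B (zh) Intranet/extranet data transmission method and system for multi-task scheduling in a physical examination center
CN110308984A (zh) Cross-cluster computing system for processing geographically distributed data
CN109558239A (zh) Task scheduling method, apparatus, and system, computer device, and storage medium
US20220171652A1 (en) Distributed container image construction scheduling system and method
US20220179711A1 (en) Method For Platform-Based Scheduling Of Job Flow
KR20210129584A (ko) Dynamically allocated cloud worker management system and method thereof
EP4024761A1 (en) Communication method and apparatus for multiple management domains
WO2023116276A1 (zh) Fault handling method and apparatus, electronic device, and storage medium
CN109525443B (zh) Processing method and apparatus for distributed front-end acquisition communication links, and computer device
US9323509B2 (en) Method and system for automated process distribution
CN111913784A (zh) Task scheduling method and apparatus, network element, and storage medium
CN115617768A (zh) Log management method and system, electronic device, and storage medium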

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909592

Country of ref document: EP

Kind code of ref document: A1