CN115941442A - Business fault analysis method and device, electronic equipment and medium - Google Patents

Info

Publication number
CN115941442A
Authority
CN
China
Prior art keywords
service
alarm information
domain
fault
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211531745.1A
Other languages
Chinese (zh)
Inventor
张帆
杨艳松
王宏鼎
刘雪峰
许�鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202211531745.1A priority Critical patent/CN115941442A/en
Publication of CN115941442A publication Critical patent/CN115941442A/en
Pending legal-status Critical Current

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a service fault analysis method and apparatus, an electronic device, and a medium. The method comprises: obtaining service alarm information and non-service alarm information reported by a controller; and establishing an association between each service and the alarm information, wherein the service alarm information is directly associated with the related service and the non-service alarm information is indirectly associated with the service through the service's end-to-end path. When a single-domain service fault is detected, the alarm information corresponding to the faulty service is obtained according to the association relationship and analyzed to obtain a single-domain service fault analysis result. The method performs OTN dedicated line fault analysis centered on the service, so that service faults can be analyzed in a timely and accurate manner.

Description

Business fault analysis method and device, electronic equipment and medium
Technical Field
The present application relates to communications technologies, and in particular, to a method and an apparatus for analyzing a service fault, an electronic device, and a medium.
Background
An Optical Transport Network (OTN) is a basic network of an operator. Because of its high bandwidth, low delay, high security, and high privacy, the OTN dedicated line has become the first choice of many customers for building their own networks and carrying services. In practice, the service is what the customer directly requires, and a service fault is the direct manifestation of an OTN dedicated line fault alarm, so fault analysis of OTN dedicated line services is an important part of operator network maintenance.
In the prior art, fault analysis uses an XML interface protocol: resource alarm information reported by a controller is acquired, and a fault analysis result is obtained by analyzing and processing that information.
In that scheme, the XML interface only supports non-service fault analysis based on resource states. Service faults come to light only when a customer reports them and can only be handled passively, so service fault analysis is not timely.
Disclosure of Invention
The application provides a service fault analysis method, a service fault analysis device, electronic equipment and a medium, which are used for solving the problem that service fault analysis cannot be timely and accurately carried out.
In one aspect, the present application provides a service fault analysis method, including: acquiring alarm information reported by a controller, wherein the alarm information comprises service alarm information and non-service alarm information; executing association processing to establish an association relationship between each service and the alarm information, wherein the association processing comprises: for the service alarm information, establishing an association between the service alarm information and the service represented by the service field in the service alarm information; and for the non-service alarm information, determining the service related to the non-service alarm information according to the service's end-to-end path, and establishing an association between the non-service alarm information and that service; and, if a single-domain faulty service is detected, acquiring the alarm information corresponding to the single-domain faulty service according to the association relationship, and performing fault analysis on that alarm information to obtain a service fault analysis result.
In a possible implementation manner, the acquiring of alarm information reported by the controller includes: establishing a long-lived connection with the controller through a YANG-SSE or WebSocket interface; receiving the message events reported by the controller over the established connection; and parsing and identifying the message events reported by the controller based on the YANG data model of the ACTN or T-API protocol to obtain the alarm information in the message events.
In a possible implementation manner, after the alarm information reported by the controller is acquired, the method further includes: standardizing the format of the alarm information; screening the target fields in the alarm information and removing all other fields, wherein the target fields comprise a field representing the information type, a field representing time, a field representing a name, and a field representing a state, and the information type comprises service alarm information and non-service alarm information; and writing the resulting alarm information into a database.
In a possible implementation manner, before the alarm information is written into the database, the method further includes: removing repeated alarms, synonymous alarms, and edge alarms from the alarm information.
In a possible implementation, the method further includes: if the existence of the cross-domain fault service is detected, determining each single-domain service under the cross-domain fault service according to inter-domain link and time slot allocation; detecting whether the cross-domain service fault is related to the fault of each single-domain service; if so, analyzing by combining a knowledge base and experience rules according to the service fault analysis result of each single-domain service to obtain a fault analysis result of the cross-domain fault service; and if not, acquiring link alarm information of a cross-domain link corresponding to the cross-domain fault service, and analyzing by combining a single-domain service fault analysis result, a knowledge base and an experience rule corresponding to the cross-domain service to obtain a fault analysis result of the cross-domain fault service.
In another aspect, the present application provides a service failure analysis apparatus, including: the acquisition module is used for acquiring alarm information reported by the controller, wherein the alarm information comprises service alarm information and non-service alarm information; the association module is used for executing association processing to establish association relation between each service and the alarm information, and the association processing comprises the following steps: aiming at the service alarm information, establishing association between the service alarm information and the service represented by the service field according to the service field in the service alarm information; aiming at the non-service alarm information, determining a service related to the non-service alarm information according to a service end-to-end path, and establishing association between the non-service alarm information and the service; and the analysis module is used for acquiring alarm information corresponding to the single-domain fault service according to the association relation and performing fault analysis according to the alarm information corresponding to the single-domain fault service to acquire a service fault analysis result if the single-domain fault service is detected to exist.
In a possible implementation manner, the obtaining module is specifically configured to: establish a long-lived connection with the controller through a YANG-SSE or WebSocket interface; receive the message events reported by the controller over the established connection; and parse and identify the message events reported by the controller based on the YANG data model of the ACTN or T-API protocol to obtain the alarm information in the message events.
In a possible implementation manner, the obtaining module is further configured to: standardize the format of the alarm information; screen the target fields in the alarm information and remove all other fields, wherein the target fields comprise a field representing the information type, a field representing time, a field representing a name, and a field representing a state, and the information type comprises service alarm information and non-service alarm information; and write the resulting alarm information into a database.
In a possible implementation manner, the obtaining module is further configured to: remove repeated alarms, synonymous alarms, and edge alarms from the alarm information.
In a possible implementation manner, the apparatus further includes: the second analysis module is used for determining each single-domain service under the cross-domain fault service according to inter-domain link and time slot allocation if the cross-domain fault service is detected to exist; detecting whether the cross-domain service fault is related to the fault of each single-domain service; if yes, analyzing by combining a knowledge base and experience rules according to the service fault analysis result of each single-domain service to obtain a fault analysis result of the cross-domain fault service; if the single domain service is not related to the cross-domain service, acquiring link alarm information of a cross-domain link corresponding to the cross-domain fault service, and analyzing by combining a single domain service fault analysis result, a knowledge base and an experience rule corresponding to the cross-domain service to obtain a fault analysis result of the cross-domain fault service.
In yet another aspect, the present application provides an electronic device comprising: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored by the memory to implement the method as previously described.
In yet another aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the method as described above when executed by a processor.
In the service fault analysis method and apparatus, the electronic device, and the medium provided by the present application, the service alarm information and the non-service alarm information reported by the controller are acquired, and each service is associated with the alarm information: the service alarm information is directly associated with the related service, and the non-service alarm information is indirectly associated with the service through the service's end-to-end path. When a single-domain service fault is detected, the alarm information corresponding to the faulty service is acquired according to the association relationship and analyzed to obtain a single-domain service fault analysis result. By acquiring both kinds of alarm information and associating them with services, the present application makes service-centered fault analysis of OTN dedicated lines possible and solves the problem that service faults cannot be analyzed in a timely and accurate manner.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic flowchart of a service fault analysis method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another service fault analysis method provided in an embodiment of the present application;
Fig. 3 is a schematic flowchart of a further service fault analysis method provided in an embodiment of the present application;
Fig. 4 is a schematic flowchart of a cross-domain service fault analysis method according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the architecture of a cross-domain service provided in an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a service fault analysis apparatus provided in the second embodiment of the present application;
Fig. 7 is a schematic structural diagram of a service fault analysis electronic device provided in the third embodiment of the present application.
Specific embodiments of the present application have been shown by way of example in the drawings and will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The modules in this application are functional modules or logical modules. A module may be implemented in software, with its functions carried out by a processor executing program code, or it may be implemented in hardware.
The terms referred to in the present application are explained first:
ACTN (Abstraction and Control of Traffic Engineered Networks): a northbound interface standard for OTN controllers led by the IETF. Its architecture follows the basic layered SDN model and is divided, from south to north, into a forwarding layer, an intermediate control layer, and an application layer. For the ACTN controller, the underlying equipment is the physical OTN network topology, and the controller completes its control and configuration. The control plane uses hierarchical control to maintain the topology of the whole network, collect node information, configure flow table information, and compute network-wide routes. It also provides an open, application-facing northbound interface and automatically completes route forwarding and deployment by issuing instructions to the control units through that interface. The application layer mainly hosts applications built on SDN technology, such as a customized management application like a network management system; this reduces the interaction logic between the control and forwarding layers and enables rapid service deployment through software API programming.
T-API (Transport API): a northbound interface standard for OTN controllers led by the ONF. In the T-API interface model, a single physical network node is abstracted into nodes on three layers: the ETH layer, the ODU layer, and the OCH layer. On each layer's nodes, a NEP (Node Edge Point) corresponds to a physical port, and the start and end points of a service are described by SIPs (Service Interface Points); the equipment vendor's controller is able to abstract link information and SIP information. Different layers are connected by inter-layer links, nodes within a layer are connected by intra-layer links, and other network domains are reached through inter-domain links. Each node in the abstraction layer has multiple NEPs, and each NEP corresponds to multiple SIPs.
Software Defined Network (SDN): a new network architecture and one way of implementing network virtualization. Its core technology, OpenFlow, separates the control plane of network devices from the data plane, which enables flexible control of network traffic, makes the network more intelligent, and provides a good platform for innovation in core networks and applications.
An Optical Transport Network (OTN) is a basic network of an operator. Because of its high bandwidth, low delay, high security, and high privacy, the OTN dedicated line has become the first choice of many customers for building their own networks and carrying services. As SDN technology matures and enters large-scale commercial use, the drawbacks of conventional, resource-centered fault analysis of OTN dedicated lines are increasingly evident: service faults are discovered only when customers report them, and troubleshooting is passive.
To bring SDN to the OTN dedicated line, the present application acquires service alarm information and non-service alarm information from the controller, associates the alarm information with services, and analyzes the associated information, thereby achieving service-centered fault analysis of the OTN dedicated line.
The technical solution of the present application is illustrated below with specific embodiments. These embodiments may be combined with one another, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Embodiment One
Fig. 1 is a schematic flow chart of a service fault analysis method according to an embodiment of the present application. As shown in fig. 1, the service fault analysis method provided in this embodiment may include:
S101, acquiring alarm information reported by a controller, wherein the alarm information comprises service alarm information and non-service alarm information;
S102, executing association processing to establish an association relationship between each service and the alarm information, wherein the association processing comprises: for the service alarm information, establishing an association between the service alarm information and the service represented by the service field in the service alarm information; and for the non-service alarm information, determining the service related to the non-service alarm information according to the service's end-to-end path, and establishing an association between the non-service alarm information and that service;
S103, if a single-domain faulty service is detected, acquiring the alarm information corresponding to the single-domain faulty service according to the association relationship, and performing fault analysis on that alarm information to obtain a service fault analysis result.
In practical applications, the execution subject of this embodiment may be a service fault analysis device. The device may be implemented as a computer program, for example application software; as a medium storing the relevant computer program, for example a USB drive or a cloud disk; or as a physical device in which the relevant computer program is integrated or installed, such as a chip or a server.
Specifically, the service fault analysis device obtains the alarm information from the controller. The alarm information includes a service type (such as service state and service details) and a non-service type. In practice, the non-service alarm information may include resource alarm information (such as network element, port, link, and layer) and performance alarm information (such as delay, rate, and bandwidth); which kinds are used may be chosen according to the interface protocol in use and actual production requirements, and is not limited here. For example, the device may obtain service alarm information and non-service alarm information, the latter including resource and performance alarm information, from the controller through the ACTN protocol and the T-API protocol. After the alarm information is obtained, an association relationship is established between the alarm information and each service. Service alarm information can be associated with a service directly, by matching the service name field in the alarm information. For non-service alarm information, the devices and links that a service traverses are determined from the service route according to the service's end-to-end path, and the alarm information of those devices and links is associated with the service. Because a service fault is accompanied by changes in the service state, such as changes in the state and performance of the network elements and links on the service path, the device obtains the alarm information corresponding to the single-domain faulty service, extracts these changes from it, and integrates and analyzes them; the fault point and fault cause obtained in this way constitute the single-domain service fault analysis result.
Finally, the fault delimitation result and the fault point can be sent to users and other systems, for example through a RESTful interface.
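The association step (S102) can be illustrated with a minimal Python sketch. It is not the patented implementation: the alarm dictionaries, the field names "type", "service", and "resource", and the sample service paths are all assumptions made for illustration. Service alarms are matched directly by their service field, while non-service alarms are matched through a reverse index built from each service's end-to-end path.

```python
from collections import defaultdict

def associate_alarms(alarms, service_paths):
    """Map each service name to the alarms associated with it.

    alarms: list of dicts; service alarms carry a "service" field and
            non-service alarms carry a "resource" field (hypothetical names).
    service_paths: service name -> ordered resource ids on its end-to-end path.
    """
    # Reverse index: resource id -> services whose path traverses it.
    services_by_resource = defaultdict(set)
    for service, path in service_paths.items():
        for resource in path:
            services_by_resource[resource].add(service)

    associated = defaultdict(list)
    for alarm in alarms:
        if alarm["type"] == "service":
            # Direct association through the service field.
            associated[alarm["service"]].append(alarm)
        else:
            # Indirect association through the end-to-end path.
            for service in sorted(services_by_resource.get(alarm["resource"], ())):
                associated[service].append(alarm)
    return dict(associated)

# Hypothetical sample data: two services sharing network element "ne-2".
service_paths = {
    "line-A": ["ne-1", "link-1-2", "ne-2"],
    "line-B": ["ne-2", "link-2-3", "ne-3"],
}
alarms = [
    {"type": "service", "service": "line-A", "state": "degraded"},
    {"type": "resource", "resource": "ne-2", "state": "down"},
    {"type": "performance", "resource": "link-2-3", "state": "high-delay"},
]
result = associate_alarms(alarms, service_paths)
```

In this sample, the alarm on the shared network element is associated with both services, which is the behavior the end-to-end path lookup is meant to provide.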
For example, assume that an OTN dedicated line service is currently degraded. The service fault analysis device obtains alarm information of the service, resource, and performance types from the controller: the service alarm information indicates that the service is in a degraded state, the resource alarm information indicates that some devices and links are abnormal, and the performance alarm information indicates that the rate is low. The service alarm information represents the state and details of the service and can be associated with the service directly. The resource alarm information is associated with the service after the network elements, links, and ports that the service traverses have been determined from its end-to-end path. The device then matches the alarm information associated with the service, lists the possible fault causes, ranks them by priority using a knowledge base and experience rules to find the key cause, and finally outputs, for instance, that the service fault is caused by the interruption of a certain network element.
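The prioritization described in this example could look roughly like the following Python sketch. The patent does not specify how the knowledge base or experience rules are represented, so the weight table, the field names, and the sample alarms below are purely hypothetical; the sketch only shows the idea of scoring candidate causes and picking the top one.

```python
def rank_fault_causes(associated_alarms, rule_weights):
    """Return candidate causes sorted from most to least likely.

    associated_alarms: alarms already associated with the faulty service.
    rule_weights: hypothetical experience rules, (alarm type, state) -> weight.
    """
    candidates = []
    for alarm in associated_alarms:
        weight = rule_weights.get((alarm["type"], alarm["state"]), 0)
        if weight > 0:
            candidates.append((weight, alarm["name"], alarm["state"]))
    # Highest weight first; the name breaks ties so the order is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [(name, state) for _, name, state in candidates]

# Hypothetical rule table: an interrupted resource outranks a slow link.
rule_weights = {
    ("resource", "interrupted"): 10,
    ("resource", "abnormal"): 6,
    ("performance", "low-rate"): 3,
}
associated = [
    {"type": "service", "name": "line-A", "state": "degraded"},
    {"type": "performance", "name": "link-1-2", "state": "low-rate"},
    {"type": "resource", "name": "ne-2", "state": "interrupted"},
    {"type": "resource", "name": "link-1-2", "state": "abnormal"},
]
ranked = rank_fault_causes(associated, rule_weights)
key_cause = ranked[0]
```

The service alarm itself receives no weight here because it describes the symptom rather than a cause; a real rule base would likely be richer than a flat weight table.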
Fig. 2 is a schematic flow diagram of another service fault analysis method provided in an embodiment of the present application, and as shown in fig. 2, in an example, S101 may specifically include:
S201, establishing a long-lived connection with the controller through a YANG-SSE or WebSocket interface;
S202, receiving the message events reported by the controller over the established connection;
S203, parsing and identifying the message events reported by the controller based on the YANG data model of the ACTN or T-API protocol to obtain the alarm information in the message events.
Specifically, the service fault analysis device establishes a long-lived connection with the controller through a YANG-SSE or WebSocket interface, thereby setting up a listening channel over which it receives the message events pushed by the controller. It parses these message events using the YANG data models of the ACTN and T-API protocols and extracts the alarm information from them, thereby SDN-enabling the traditional service. For example, "down" in the ACTN protocol and "cut" in the T-API protocol are field values that represent an alarm. Using the YANG models of the two protocols, the device can identify the actual operating state of a service, which can be classified into conditions such as normal, off-network interruption, in-network interruption, and standby path interruption.
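As a minimal sketch of S203, the Python snippet below recognizes an alarm in an already received message event. Only the two tokens "down" (ACTN) and "cut" (T-API) come from the text; the JSON layout and the field names "name" and "state" are assumptions, real YANG-modeled notifications are considerably richer, and the long-lived SSE or WebSocket connection itself is not shown.

```python
import json

# Alarm tokens named in the text; the surrounding event layout is hypothetical.
ALARM_TOKENS = {"ACTN": "down", "T-API": "cut"}

def extract_alarm(raw_event, protocol):
    """Return a small alarm dict if the event signals an alarm, else None."""
    event = json.loads(raw_event)
    state = event.get("state")
    if state != ALARM_TOKENS[protocol]:
        return None
    return {"protocol": protocol, "name": event.get("name"), "state": state}

# Hypothetical message events as they might arrive over the connection.
actn_event = '{"name": "line-A", "state": "down"}'
tapi_event = '{"name": "line-B", "state": "cut"}'
normal_event = '{"name": "line-C", "state": "up"}'

parsed = [
    extract_alarm(actn_event, "ACTN"),
    extract_alarm(tapi_event, "T-API"),
    extract_alarm(normal_event, "ACTN"),
]
```

Keeping the protocol-specific tokens in one table means that supporting another controller protocol would only require another table entry, assuming its events can be reduced to the same shape.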
Fig. 3 is a schematic flow chart of another service fault analysis method provided in an embodiment of the present application, and as shown in fig. 3, in an example, after acquiring alarm information reported by a controller, the method may further include:
S301, standardizing the format of the alarm information;
S302, screening the target fields in the alarm information and removing all other fields, wherein the target fields comprise a field representing the information type, a field representing time, a field representing a name, and a field representing a state, and the information type comprises service alarm information and non-service alarm information;
S303, writing the resulting alarm information into a database.
Specifically, alarm information that the service fault analysis device acquires through different protocols is inconsistent in format, so the format of the acquired alarm information needs to be standardized. For example, the northbound interfaces of some vendors currently conform to the ACTN standard while those of others conform to the T-API 2.0 standard; this incompatibility between the northbound interfaces of control systems makes cross-vendor cooperation and unified orchestration impractical, so the device converts the acquired alarm information into a unified format. In addition, because the alarm information contains redundant information, its important fields can be screened so that only the fields required for alarm analysis are retained. For example, for a piece of resource alarm information, fields such as the information type (here, resource), occurrence time, resource id, and running state are retained, and other fields irrelevant to alarm analysis are removed. The processed alarm information is written into a database for storage, and it can also be held in memory so that the required alarm information can be retrieved in real time during fault analysis.
This processing unifies the format of the alarm information, retains only the important fields, and writes them into a database, which improves the quality of the information and the accuracy of subsequent fault analysis.
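Steps S301 to S303 can be sketched in Python as follows. The four target fields (type, time, name, state) follow the text; everything else is assumed for illustration, including the per-protocol source field names, the table layout, and the use of an in-memory SQLite database as a stand-in for whatever database the device actually uses.

```python
import sqlite3

TARGET_FIELDS = ("type", "time", "name", "state")

# Hypothetical per-protocol field names mapped onto one common schema.
FIELD_MAPS = {
    "ACTN": {"alarm-type": "type", "raised-at": "time",
             "object": "name", "oper-status": "state"},
    "T-API": {"category": "type", "timestamp": "time",
              "target": "name", "status": "state"},
}

def normalize(raw_alarm, protocol):
    """Rename protocol-specific fields and keep only the target fields."""
    mapping = FIELD_MAPS[protocol]
    renamed = {mapping[k]: v for k, v in raw_alarm.items() if k in mapping}
    return {field: renamed.get(field) for field in TARGET_FIELDS}

def store(connection, alarm):
    """Write one normalized alarm into the alarms table."""
    connection.execute(
        "INSERT INTO alarms (type, time, name, state) VALUES (?, ?, ?, ?)",
        tuple(alarm[f] for f in TARGET_FIELDS),
    )

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE alarms (type TEXT, time TEXT, name TEXT, state TEXT)"
)

# Hypothetical raw alarm with one redundant field that screening removes.
raw = {"category": "resource", "timestamp": "2022-12-01T08:00:00Z",
       "target": "ne-2", "status": "interrupted", "vendor-note": "ignored"}
clean = normalize(raw, "T-API")
store(connection, clean)
connection.commit()
rows = connection.execute("SELECT type, time, name, state FROM alarms").fetchall()
```

Normalizing before storage means that later analysis steps only ever see one schema, regardless of which vendor's controller reported the alarm.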
Further, to improve the effectiveness of the alarm information, in an example, before S303, the method may further include:
removing repeated alarms, synonymous alarms, and edge alarms from the alarm information.
Specifically, a single-domain controller reports a large number of irrelevant alarms, so repeated alarms, synonymous alarms, and edge alarms in the alarm information are deleted and deduplicated. For example, an alarm compression technique may be used for this purpose; the specific technical means may be chosen according to the actual application and is not limited here. This operation retains only the effective alarm information and improves the efficiency of fault analysis.
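One simple way to picture this filtering is the Python sketch below. The patent leaves the alarm compression technique open, so this is only an illustration under stated assumptions: the synonym table, the boolean "edge" flag used to mark edge alarms, and the sample alarms are all hypothetical.

```python
# Hypothetical synonym table: different wordings for the same condition.
SYNONYMS = {"los": "signal-loss", "loss-of-signal": "signal-loss"}

def compress_alarms(alarms):
    """Drop edge alarms and keep the first alarm per (name, canonical state)."""
    seen = set()
    kept = []
    for alarm in alarms:
        if alarm.get("edge", False):
            continue  # the "edge" flag is an assumption of this sketch
        canonical = SYNONYMS.get(alarm["state"], alarm["state"])
        key = (alarm["name"], canonical)
        if key in seen:
            continue  # repeated or synonymous alarm
        seen.add(key)
        kept.append({**alarm, "state": canonical})
    return kept

alarms = [
    {"name": "port-1", "state": "los"},
    {"name": "port-1", "state": "los"},             # repeated
    {"name": "port-1", "state": "loss-of-signal"},  # synonymous
    {"name": "port-9", "state": "los", "edge": True},
    {"name": "ne-2", "state": "interrupted"},
]
compressed = compress_alarms(alarms)
```

Mapping states to a canonical form before building the deduplication key lets repeated and synonymous alarms be handled by the same check.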
After the single-domain service fault analysis result is obtained, cross-domain service faults may be further analyzed. Fig. 4 is a schematic flowchart of a cross-domain service fault analysis method provided in an embodiment of the present application. As shown in fig. 4, in an example, the method may further include:
S401, if a cross-domain faulty service is detected, determining each single-domain service under the cross-domain faulty service according to the inter-domain links and their time slot allocation;
S402, detecting whether the cross-domain service fault is related to a fault of any of the single-domain services;
if so, executing S403: analyzing the service fault analysis result of each single-domain service in combination with a knowledge base and experience rules to obtain a fault analysis result of the cross-domain faulty service;
if not, executing S404: collecting link alarm information of the cross-domain link corresponding to the cross-domain faulty service, and analyzing it in combination with the single-domain service fault analysis results corresponding to the cross-domain service, a knowledge base, and experience rules to obtain a cross-domain service fault analysis result.
Specifically, a cross-domain service is formed by combining multiple single-domain service segments, so for a cross-domain faulty service, the corresponding single-domain services can be determined and the cross-domain fault analyzed by analyzing them. Because different single-domain services are joined by inter-domain links, the single-domain services under a cross-domain faulty service can be determined from the inter-domain links and their time slot allocation. The fault point of a cross-domain service fault is either intra-domain or inter-domain, and which of the two applies can be judged by checking whether the cross-domain fault is related to a fault of the corresponding single-domain services: if it is, the fault point is intra-domain; if it is not, the fault point is inter-domain. For intra-domain faults, the alarm information of each single domain is acquired for fault analysis, and the cross-domain service fault analysis result is derived from the fault analysis results of the single-domain services. For inter-domain faults, the analysis combines the inter-domain alarm information with the fault analysis results of the single-domain services.
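The intra-domain versus inter-domain decision (S401 to S404) can be sketched in Python as below. This is a simplified illustration, not the method as claimed: the text determines single-domain services from inter-domain links and time slot allocation, whereas the sketch assumes a ready-made node-to-domain map, and the node names, domain names, and alarm dictionaries are hypothetical.

```python
def split_by_domain(path, domain_of):
    """Group consecutive nodes of a cross-domain path into single-domain segments."""
    segments = []
    for node in path:
        domain = domain_of[node]
        if segments and segments[-1][0] == domain:
            segments[-1][1].append(node)
        else:
            segments.append((domain, [node]))
    return segments

def locate_cross_domain_fault(segments, faulty_domains, inter_domain_link_alarms):
    """Return ("intra-domain", domains) or ("inter-domain", link alarms)."""
    related = [domain for domain, _ in segments if domain in faulty_domains]
    if related:
        # The cross-domain fault is related to at least one single-domain fault.
        return ("intra-domain", related)
    # No single-domain service is faulty, so look at the inter-domain links.
    return ("inter-domain", inter_domain_link_alarms)

# Hypothetical two-domain path joined by one inter-domain link.
path = ["bj-1", "bj-2", "gz-1", "gz-2"]
domain_of = {"bj-1": "Beijing", "bj-2": "Beijing",
             "gz-1": "Guangzhou", "gz-2": "Guangzhou"}
segments = split_by_domain(path, domain_of)
link_alarms = [{"link": "bj-2/gz-1", "state": "interrupted"}]

inter_case = locate_cross_domain_fault(segments, set(), link_alarms)
intra_case = locate_cross_domain_fault(segments, {"Guangzhou"}, link_alarms)
```

The returned tuple only delimits where the fault lies; the subsequent analysis with the knowledge base and experience rules would take that result as its input.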
For example, Fig. 5 is a schematic diagram of the architecture of a cross-domain service provided in an embodiment of the present application. As shown in Fig. 5, assume that a long-distance telephone service between Beijing and Guangzhou exists and that an interruption of this service is the cross-domain service fault; the single-domain services corresponding to this service are then the Beijing telephone service and the Guangzhou telephone service. The Beijing and Guangzhou telephone services are analyzed separately, and if both are in a normal state, it is judged that the fault of the long-distance telephone service is unrelated to faults of the corresponding single-domain services and that the fault point is inter-domain. Alarm information such as the state and performance of the cross-domain links related to the service is then loaded from the managed inter-domain information, and, according to the state changes and performance changes of the inter-domain links and the single-domain analysis results, combined with an expert knowledge base and empirical algorithms, the fault cause of the cross-domain service is judged to be the interruption of a certain link.
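The intra-domain versus inter-domain decision described above can be sketched in Python. This is only an illustration under stated assumptions: the names `SingleDomainResult` and `locate_cross_domain_fault`, the link identifiers, and the "down" link state are hypothetical, and "related" is simplified to "at least one single-domain service is itself faulty". The knowledge base and experience rules of the embodiment are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SingleDomainResult:
    """Outcome of a single-domain service fault analysis (hypothetical shape)."""
    domain: str
    faulty: bool
    cause: str = ""


def locate_cross_domain_fault(single_results: List[SingleDomainResult],
                              link_alarms: Dict[str, str]) -> dict:
    """Classify a cross-domain fault as intra-domain or inter-domain.

    Assumption: the fault is treated as related to the single-domain
    services when at least one of them is faulty.
    """
    faulty = [r for r in single_results if r.faulty]
    if faulty:
        # Intra-domain: reuse the single-domain analysis results.
        return {"location": "intra-domain",
                "domains": [r.domain for r in faulty],
                "causes": [r.cause for r in faulty]}
    # Inter-domain: inspect the alarm state of the cross-domain links.
    down_links = sorted(link for link, state in link_alarms.items()
                        if state == "down")
    return {"location": "inter-domain", "links": down_links}


# Both single-domain services are normal, so the fault point is between domains.
results = [SingleDomainResult("Beijing", False),
           SingleDomainResult("Guangzhou", False)]
report = locate_cross_domain_fault(
    results, {"BJ-GZ-link-1": "down", "BJ-GZ-link-2": "up"})
print(report)
```

With both single-domain services normal, as in the Beijing and Guangzhou example, the sketch falls through to the inter-domain branch and reports the interrupted link.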
The OTN network and the management and control system thereof are independent in domain, form barriers with each other and lack the unified cooperation of cross-domain and cross-manufacturer, and the scheme further analyzes by combining the single-domain service fault analysis result to obtain the cross-domain service fault analysis result, thereby realizing the analysis of cross-domain faults.
In the service fault analysis method provided by this embodiment, the service alarm information reported by the controller and the non-service alarm information are acquired, and each service is associated with the alarm information, where the service alarm information may be directly associated with the relevant service, and the non-service alarm information is indirectly associated with the service through the service end-to-end path, and when a fault occurs in the single domain service, the alarm information corresponding to the faulty service is acquired according to the association relationship, and the single domain service fault analysis result is obtained through analysis. According to the method and the device, the service fault analysis result is obtained by acquiring the service alarm information and the non-service alarm information and establishing the association between the alarm information and the service, so that the fault analysis of the OTN special line is realized by taking the service as the center, and the problem that the service fault analysis cannot be timely and accurately performed is solved.
Example two
Fig. 6 is a schematic structural diagram of a service fault analysis apparatus according to an embodiment of the present application. As shown in fig. 6, the service failure analysis apparatus provided in this embodiment may include:
an obtaining module 61, configured to obtain alarm information reported by a controller, where the alarm information includes service alarm information and non-service alarm information;
the association module 62 is configured to execute association processing to establish an association relationship between each service and the alarm information, where the association processing includes: aiming at the service alarm information, establishing association between the service alarm information and the service represented by the service field according to the service field in the service alarm information; aiming at the non-service alarm information, determining a service related to the non-service alarm information according to a service end-to-end path, and establishing association between the non-service alarm information and the service;
and the analysis module 63 is configured to, if it is detected that a single-domain fault service exists, obtain alarm information corresponding to the single-domain fault service according to the association relationship, perform fault analysis according to the alarm information corresponding to the single-domain fault service, and obtain a service fault analysis result.
In practical applications, the service fault analysis apparatus may be implemented by a computer program, for example, application software; alternatively, it may be implemented as a medium storing the related computer program, for example, a USB flash drive, a cloud disk, or the like; still alternatively, it may be implemented by a physical device, such as a chip or a server, in which the related computer program is integrated or installed.
Specifically, the service fault analysis device obtains alarm information from the controller, and the alarm information includes a service type and a non-service type. In practical applications, the non-service alarm information may specifically include resource alarm information and performance alarm information, which may be selected according to the interface protocol used and the requirements of actual production; the selection is not limited here. For example, the service fault analysis device may obtain service alarm information and non-service alarm information, including resource alarm information and performance alarm information, from the controller through the ACTN protocol and the T-API protocol. After the alarm information is obtained, an association relationship is established between the alarm information and each service. Service alarm information can be directly associated with a service, and the corresponding service can be matched according to the service name field in the alarm information. For non-service alarm information, the equipment and links that the service route passes through are determined according to the service end-to-end path, and the alarm information related to that equipment and those links is associated with the service. Because a service fault may cause changes in service state, such as changes in the state and performance of the network elements and links on the service path, these changes are extracted from the alarm information corresponding to the single-domain fault service and are integrated, analyzed and processed, so that the fault point and the fault cause can finally be obtained as the single-domain service fault analysis result. Finally, the fault delimitation result and the fault point can be sent to users and other systems through a RESTful interface or in other ways.
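The direct and indirect association described above might look like the following Python sketch. The record layout (`type`, `service_name`, `resource_id`), the service and resource names, and the function name `associate_alarms` are assumptions made for illustration; the source does not specify a data model.

```python
from collections import defaultdict
from typing import Dict, List


def associate_alarms(alarms: List[dict],
                     service_paths: Dict[str, List[str]]) -> Dict[str, List[dict]]:
    """Map each service to the alarms that concern it.

    service_paths gives, per service, the network elements and links on its
    end-to-end path (hypothetical identifiers).
    """
    # Reverse index: resource id -> services whose path traverses it.
    resource_to_services = defaultdict(set)
    for service, path in service_paths.items():
        for resource in path:
            resource_to_services[resource].add(service)

    associations = defaultdict(list)
    for alarm in alarms:
        if alarm["type"] == "service":
            # Direct association through the service name field.
            if alarm["service_name"] in service_paths:
                associations[alarm["service_name"]].append(alarm)
        else:
            # Indirect association through the service end-to-end path.
            for service in sorted(resource_to_services.get(alarm["resource_id"], ())):
                associations[service].append(alarm)
    return dict(associations)


paths = {"svc-A": ["ne-1", "link-1-2", "ne-2"],
         "svc-B": ["ne-2", "link-2-3", "ne-3"]}
alarms = [
    {"type": "service", "service_name": "svc-A", "state": "down"},
    {"type": "resource", "resource_id": "ne-2", "state": "down"},
    {"type": "performance", "resource_id": "link-2-3", "state": "degraded"},
]
assoc = associate_alarms(alarms, paths)
print(assoc)
```

The reverse index is a design choice of this sketch: it lets one resource alarm, such as the one on the shared element `ne-2`, be attached to every service routed through that element.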
The manner of acquiring the alarm information reported by the controller may be various, and in an example, the acquiring module 61 may be specifically configured to:
establishing long connection with a controller through a YANG-SSE or Websocket interface;
receiving the message event reported by the controller through the established long connection;
and parsing and identifying the message event reported by the controller based on the YANG data structure model of the ACTN or T-API protocol, to obtain the alarm information in the message event.
Specifically, the service fault analysis device establishes a long connection with the controller through a YANG-SSE or WebSocket interface, thereby establishing a monitoring channel and receiving the message events pushed by the controller. The message events reported by the controller are parsed in combination with the YANG data structure models of the ACTN and T-API protocols, and the alarm information is extracted from the message events, thereby realizing SDN enablement of the traditional service. For example, "down" in the ACTN protocol is a field representing an alarm, and "cut" in the T-API protocol is a field representing an alarm; the service fault analysis device can identify the actual operating state of the service according to the YANG model. By combining the YANG models of the ACTN and T-API protocols, the service operating state can be divided into conditions such as normal, off-network interruption, in-network interruption, and standby path interruption.
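As a rough illustration of the parsing step, the Python sketch below extracts an alarm from a message event that has already been received over the long connection. It assumes a flat JSON payload with hypothetical `name` and `state` keys, and it treats "down" and "cut" as state values. Real YANG-modelled ACTN or T-API notifications are nested and differ from this, so the layout here is not taken from either protocol.

```python
import json
from typing import Optional

# Alarm markers named in the text; treating them as values of a "state"
# key is an assumption of this sketch.
ALARM_VALUES = {"ACTN": "down", "T-API": "cut"}


def extract_alarm(raw_event: str, protocol: str) -> Optional[dict]:
    """Return a unified alarm record, or None if the event is not an alarm."""
    event = json.loads(raw_event)
    if event.get("state") == ALARM_VALUES[protocol]:
        return {"protocol": protocol,
                "name": event.get("name"),
                "state": "alarm"}
    return None


actn_alarm = extract_alarm('{"name": "svc-A", "state": "down"}', "ACTN")
tapi_normal = extract_alarm('{"name": "svc-B", "state": "up"}', "T-API")
print(actn_alarm, tapi_normal)
```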
In order to pre-process the obtained alarm information, in an example, the obtaining module 61 may be specifically configured to:
carrying out format standardization processing on the format of the alarm information;
screening a target field in alarm information, and removing fields except the target field in the alarm information, wherein the target field comprises a field for representing the type of information, a field for representing time, a field for representing a name and a field for representing a state, and the type of information comprises service alarm information and non-service alarm information;
and writing the current alarm information into a database.
Specifically, the alarm information acquired by the service fault analysis device through different protocols has inconsistent formats, so the format of the acquired alarm information needs to be standardized. For example, the northbound interfaces used by some manufacturers currently conform to the ACTN standard, while those used by other manufacturers conform to the T-API 2.0 standard. This incompatibility between the northbound interfaces of the control systems makes cross-manufacturer cooperation and unified orchestration objectively impossible, so the service fault analysis device unifies the format of the obtained alarm information. In addition, because the alarm information contains redundant information, its important fields can be screened so that only the information fields required for alarm analysis are retained. For example, for a piece of resource alarm information, fields such as the information type (here, resource), occurrence time, resource id, and running state are retained, and other fields irrelevant to alarm analysis are removed. The processed alarm information is written into a database for storage, and it can also be processed in memory so that the required alarm information can be called in real time during fault analysis.
Through the processing, the format unification of the alarm information can be realized, important information fields are screened out and written into a database, so that the information quality is improved, and the accuracy of subsequent fault analysis is improved.
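The format standardization and field screening can be sketched in Python as follows. The source names `vendor_a` and `vendor_b`, the alias tables, and the raw field names are invented for illustration and do not come from the ACTN or T-API specifications; only the four target fields (type, time, name, state) follow the text. Writing to the database is omitted.

```python
from typing import Dict

# Target fields named in the text: information type, time, name and state.
TARGET_FIELDS = ("type", "time", "name", "state")

# Hypothetical per-source field aliases mapping raw keys to unified keys.
FIELD_ALIASES = {
    "vendor_a": {"alarm-type": "type", "raised-at": "time",
                 "resource-id": "name", "oper-status": "state"},
    "vendor_b": {"category": "type", "timestamp": "time",
                 "uuid": "name", "operational-state": "state"},
}


def normalize_alarm(raw: Dict[str, str], source: str) -> Dict[str, str]:
    """Rename source-specific keys, then keep only the target fields."""
    aliases = FIELD_ALIASES[source]
    renamed = {aliases.get(key, key): value for key, value in raw.items()}
    return {field: renamed[field] for field in TARGET_FIELDS if field in renamed}


raw_alarm = {"alarm-type": "resource", "raised-at": "2022-12-01T08:00:00Z",
             "resource-id": "ne-1", "oper-status": "down",
             "vendor-note": "not needed for analysis"}
unified = normalize_alarm(raw_alarm, "vendor_a")
print(unified)
```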
Further, to improve the effectiveness of the alarm information, in an example, the obtaining module 61 may further be configured to:
and removing repeated alarms, synonymous alarms and edge alarms in the alarm information.
Specifically, a large number of irrelevant alarms exist under a single-domain controller, so the repeated alarms, synonymous alarms and edge alarms in the alarm information are removed. For example, an alarm compression technique may be used to remove them; the technical means used may be selected according to the actual application and is not limited here. This operation retains the effective alarm information and improves the efficiency of fault analysis.
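One simple way to remove repeated and synonymous alarms is sketched below in Python. The synonym table, the alarm codes, and the `(name, code)` deduplication key are assumptions of this sketch, and edge alarms are not handled because the text does not define a criterion for them. This is one possible compression approach, not the specific technique of the embodiment.

```python
from typing import Dict, List

# Hypothetical synonym table: different codes that describe the same condition.
SYNONYMS = {"LOS": "signal-loss", "loss-of-signal": "signal-loss"}


def compress_alarms(alarms: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first alarm for each (resource name, canonical code) pair."""
    seen = set()
    kept = []
    for alarm in alarms:
        # Map synonymous codes onto one canonical code before comparing.
        code = SYNONYMS.get(alarm["code"], alarm["code"])
        key = (alarm["name"], code)
        if key in seen:
            continue  # repeated or synonymous alarm, drop it
        seen.add(key)
        kept.append({**alarm, "code": code})
    return kept


incoming = [
    {"name": "ne-1", "code": "LOS"},
    {"name": "ne-1", "code": "loss-of-signal"},  # synonymous with the first
    {"name": "ne-1", "code": "LOS"},             # exact repeat
    {"name": "ne-2", "code": "LOS"},
]
compressed = compress_alarms(incoming)
print(compressed)
```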
After obtaining the single domain service fault analysis result, the cross-domain service fault may be further analyzed, and in an example, the apparatus may further include:
the second analysis module is used for determining each single-domain service under the cross-domain fault service according to inter-domain link and time slot distribution if the cross-domain fault service is detected to exist;
detecting whether the cross-domain service fault is related to the fault of each single-domain service;
if so, analyzing by combining a knowledge base and experience rules according to the service fault analysis result of each single-domain service to obtain a fault analysis result of the cross-domain fault service; if not, acquiring the link alarm information of the cross-domain link corresponding to the cross-domain fault service, and analyzing by combining the single-domain service fault analysis result, the knowledge base and the experience rules corresponding to the cross-domain service to obtain the cross-domain service fault analysis result.
Specifically, a cross-domain service is formed by combining multiple segments of single-domain services. For a cross-domain fault service, each corresponding single-domain service can therefore be determined, and the cross-domain fault service can be analyzed by analyzing those single-domain services. Different single-domain services are connected by inter-domain links, so each single-domain service under the cross-domain fault service can be determined according to the inter-domain links and their time slot allocation. The fault point of a cross-domain service fault is either intra-domain or inter-domain, and it can be located by judging whether the fault of the cross-domain service is related to the faults of the corresponding single-domain services: if it is related, the fault point is intra-domain; if it is not related, the fault point is inter-domain. For intra-domain faults, the alarm information of each single domain is acquired for fault analysis, and the cross-domain service fault analysis result is obtained from the service fault analysis result of each single-domain service; for inter-domain faults, the analysis combines the inter-domain alarm information with the fault analysis result of each single-domain service.
The OTN and the management and control system thereof are independent in domain, form barriers with each other, lack the unified cooperation of cross-domain and cross-manufacturer, and the scheme further analyzes by combining the single-domain service fault analysis result to obtain the cross-domain service fault analysis result, thereby realizing the analysis of cross-domain faults.
In the service fault analysis apparatus provided in this embodiment, the service alarm information and the non-service alarm information reported by the controller are acquired, and each service is associated with the alarm information, where the service alarm information may be directly associated with the related service, and the non-service alarm information is indirectly associated with the service through the service end-to-end path. When a fault occurs in a single-domain service, the alarm information corresponding to the faulty service is acquired according to the association relationship, and the single-domain service fault analysis result is obtained through analysis. According to the method and the device, the service fault analysis result is obtained by acquiring the service alarm information and the non-service alarm information and establishing the association between the alarm information and the service, so that the fault analysis of the OTN special line is realized by taking the service as the center, and the problem that the service fault analysis cannot be timely and accurately performed is solved.
Example three
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure, and as shown in fig. 7, the electronic device includes:
a processor 291 and a memory 292; a communication interface 293 and a bus 294 may also be included. The processor 291, the memory 292, and the communication interface 293 may communicate with each other via the bus 294. The communication interface 293 may be used for the transmission of information. The processor 291 may invoke logic instructions in the memory 292 to perform the methods of the embodiments described above.
Further, the logic instructions in the memory 292 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 292 is a computer-readable storage medium that can be used for storing software programs, computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 291 executes the functional application and data processing by executing the software program, instructions and modules stored in the memory 292, so as to implement the method in the above method embodiments.
The memory 292 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. Further, the memory 292 may include a high speed random access memory and may also include a non-volatile memory.
The disclosed embodiments provide a non-transitory computer-readable storage medium having stored therein computer-executable instructions for implementing the method of the foregoing embodiments when executed by a processor.
Example four
The embodiments of the present disclosure provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the method provided in any of the embodiments of the present disclosure.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (12)

1. A service fault analysis method is characterized by comprising the following steps:
acquiring alarm information reported by a controller, wherein the alarm information comprises service alarm information and non-service alarm information;
executing association processing to establish an association relationship between each service and the alarm information, wherein the association processing comprises the following steps: aiming at the service alarm information, establishing association between the service alarm information and the service represented by the service field according to the service field in the service alarm information; aiming at the non-service alarm information, determining a service related to the non-service alarm information according to a service end-to-end path, and establishing association between the non-service alarm information and the service;
if the single-domain fault service is detected to exist, acquiring alarm information corresponding to the single-domain fault service according to the association relation, and performing fault analysis according to the alarm information corresponding to the single-domain fault service to acquire a service fault analysis result.
2. The method of claim 1, wherein the obtaining the alarm information reported by the controller comprises:
establishing long connection with a controller through a YANG-SSE or Websocket interface;
receiving the message event reported by the controller through the established long connection;
and parsing and identifying the message event reported by the controller based on the YANG data structure model of the ACTN or T-API protocol, to obtain the alarm information in the message event.
3. The method of claim 2, wherein after obtaining the alarm information reported by the controller, the method further comprises:
carrying out format standardization processing on the format of the alarm information;
screening a target field in alarm information, and removing fields except the target field in the alarm information, wherein the target field comprises a field for representing the type of information, a field for representing time, a field for representing a name and a field for representing a state, and the type of information comprises service alarm information and non-service alarm information;
and writing the current alarm information into a database.
4. The method of claim 3, wherein prior to writing the current alarm information to the database, further comprising:
and removing repeated alarms, synonymous alarms and edge alarms in the alarm information.
5. The method according to any one of claims 1-4, further comprising:
if the existence of the cross-domain fault service is detected, determining each single-domain service under the cross-domain fault service according to inter-domain link and time slot allocation;
detecting whether the cross-domain service fault is related to the fault of each single-domain service;
if yes, analyzing by combining a knowledge base and experience rules according to the service fault analysis result of each single-domain service to obtain a fault analysis result of the cross-domain fault service; and if not, acquiring link alarm information of a cross-domain link corresponding to the cross-domain fault service, and analyzing by combining a single-domain service fault analysis result, a knowledge base and an experience rule corresponding to the cross-domain service to obtain a fault analysis result of the cross-domain fault service.
6. A service failure analysis apparatus, comprising:
the acquisition module is used for acquiring alarm information reported by the controller, wherein the alarm information comprises service alarm information and non-service alarm information;
the association module is used for executing association processing to establish association relation between each service and the alarm information, and the association processing comprises the following steps: aiming at the service alarm information, establishing association between the service alarm information and the service represented by the service field according to the service field in the service alarm information; aiming at the non-service alarm information, determining a service related to the non-service alarm information according to a service end-to-end path, and establishing association between the non-service alarm information and the service;
and the analysis module is used for acquiring alarm information corresponding to the single-domain fault service according to the association relation and performing fault analysis according to the alarm information corresponding to the single-domain fault service to acquire a service fault analysis result if the single-domain fault service is detected to exist.
7. The apparatus of claim 6, wherein the obtaining module is specifically configured to:
establishing long connection with a controller through a YANG-SSE or Websocket interface;
receiving the message event reported by the controller through the established long connection;
and parsing and identifying the message event reported by the controller based on the YANG data structure model of the ACTN or T-API protocol, to obtain the alarm information in the message event.
8. The apparatus of claim 7, wherein the obtaining module is further configured to:
carrying out format standardization processing on the format of the alarm information;
screening a target field in alarm information, and removing fields except the target field in the alarm information, wherein the target field comprises a field for representing the type of information, a field for representing time, a field for representing a name and a field for representing a state, and the type of information comprises service alarm information and non-service alarm information;
and writing the current alarm information into a database.
9. The apparatus of claim 8, wherein the obtaining module is further configured to:
and removing repeated alarms, synonymous alarms and edge alarms in the alarm information.
10. The apparatus of any of claims 6-9, further comprising:
the second analysis module is used for determining each single-domain service under the cross-domain fault service according to inter-domain link and time slot allocation if the cross-domain fault service is detected to exist;
detecting whether the cross-domain service fault is related to the fault of each single-domain service;
if so, analyzing by combining a knowledge base and experience rules according to the service fault analysis result of each single-domain service to obtain a fault analysis result of the cross-domain fault service; if not, acquiring link alarm information of a cross-domain link corresponding to the cross-domain fault service, and analyzing by combining a single-domain service fault analysis result, a knowledge base and an experience rule corresponding to the cross-domain service to obtain a fault analysis result of the cross-domain fault service.
11. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer execution instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1-5.
12. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the method of any one of claims 1-5.
CN202211531745.1A 2022-12-01 2022-12-01 Business fault analysis method and device, electronic equipment and medium Pending CN115941442A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211531745.1A CN115941442A (en) 2022-12-01 2022-12-01 Business fault analysis method and device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN115941442A true CN115941442A (en) 2023-04-07

Family

ID=86652031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211531745.1A Pending CN115941442A (en) 2022-12-01 2022-12-01 Business fault analysis method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN115941442A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination