CN114844768A - Information analysis method and device and electronic equipment - Google Patents

Information analysis method and device and electronic equipment

Publication number
CN114844768A
Authority
CN
China
Prior art keywords
fault
service
abnormal
determining
service node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210457613.2A
Other languages
Chinese (zh)
Inventor
冯鹏
欧阳晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yaxin Technology Co ltd
Original Assignee
Guangzhou Yaxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yaxin Technology Co ltd filed Critical Guangzhou Yaxin Technology Co ltd
Priority to CN202210457613.2A priority Critical patent/CN114844768A/en
Publication of CN114844768A publication Critical patent/CN114844768A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the present application provide an information analysis method and device, electronic equipment and a computer-readable storage medium, and relate to the technical field of computers. The method comprises: detecting a first service fault and determining the abnormal service nodes corresponding to the first service fault; acquiring operation information of the abnormal service nodes and determining a fault association coefficient corresponding to each abnormal service node; and determining a root cause service node among the abnormal service nodes according to the fault association coefficients. In the prior art, operation and maintenance personnel check service logs and rely on their operation and maintenance experience to judge the root cause service node. By contrast, the present application determines the fault association coefficient according to the fault topology and then determines the root cause service node according to the fault association coefficient, which improves the efficiency and accuracy of root cause service node determination and saves human resources.

Description

Information analysis method and device and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to an information analysis method, an information analysis device, an electronic device, and a computer-readable storage medium.
Background
With the development of technologies such as cloud computing, big data and micro-services, business systems have become structurally more complex and their technical components more diverse, so a failure of the business system has a greater impact on the normal operation of the business. In the related art, once a business system fails, operation and maintenance personnel usually determine the possible causes of common failures based on years of operation and maintenance experience. However, the accuracy of fault location varies between operation and maintenance personnel, and this approach incurs a high labor cost and takes a long time to locate the fault.
Disclosure of Invention
The purpose of the present application is to solve at least one of the above technical defects, especially the low accuracy and high labor cost of manually locating the cause of a fault.
According to an aspect of the present application, there is provided an information analysis method including:
detecting a first service fault, and determining an abnormal service node corresponding to the first service fault;
acquiring operation information of the abnormal service nodes, and determining a fault association coefficient corresponding to each abnormal service node; the fault association coefficient is determined according to a fault topology corresponding to a first service fault, and the fault topology is determined according to the operation information of the abnormal service node;
and determining root cause service nodes in the abnormal service nodes according to the fault association coefficient.
Optionally, the method further includes:
acquiring alarm information of the root cause service node;
and determining target alarm information corresponding to the first service fault in the alarm information.
Optionally, the determining a fault association coefficient corresponding to each abnormal service node includes:
determining a corresponding fault topology of the first service fault according to the operation information of the abnormal service node; the fault topology comprises the position information of the abnormal service node on the call chain and the times of abnormity when the abnormal service node executes mutual call;
and determining the fault association coefficient according to the fault topology.
Optionally, the determining, according to the fault association coefficient, a root cause service node in the abnormal service node includes:
and sequencing the fault association coefficients, and determining the abnormal service node with the maximum fault association coefficient as the root cause service node.
Optionally, the obtaining operation information of the abnormal service node includes:
acquiring a service log of the first service;
preprocessing the service log to acquire the running information of the abnormal service node;
wherein the preprocessing comprises at least one of deleting redundant data, merging data and adding data.
Optionally, the operation information includes at least one of the following:
a service invocation duration; a service invocation success rate; the number of service calls.
According to another aspect of the present application, there is provided an information analysis apparatus including:
the first determining module is used for detecting a first service fault and determining an abnormal service node corresponding to the first service fault;
the second determining module is used for acquiring the operation information of the abnormal service nodes and determining the fault association coefficient corresponding to each abnormal service node; the fault association coefficient is determined according to a fault topology corresponding to a first service fault, and the fault topology is determined according to the operation information of the abnormal service node;
and the third determining module is used for determining a root cause service node in the abnormal service nodes according to the fault association coefficient.
Optionally, the second determining module is specifically configured to:
determining a corresponding fault topology of the first service fault according to the operation information of the abnormal service node; the fault topology comprises position information of the abnormal service node on the call chain and the times of abnormity of the abnormal service node when the abnormal service node executes mutual call;
and determining the fault association coefficient according to the fault topology.
According to another aspect of the present application, there is provided an electronic device including:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the information analysis method according to any one of the first aspects of the present application.
For example, in a third aspect of the present application, there is provided a computing device comprising: the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the corresponding operation of the information analysis method as shown in the first aspect of the application.
According to yet another aspect of the present application, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the information analysis method of any one of the first aspects of the present application.
For example, in a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the information analysis method shown in the first aspect of the present application.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method provided in the various alternative implementations of the first aspect described above.
The technical solution provided by the present application brings the following beneficial effects:
By detecting a first service fault, the abnormal service nodes corresponding to the first service fault are determined; operation information of the abnormal service nodes is acquired, and a fault association coefficient corresponding to each abnormal service node is determined; and a root cause service node among the abnormal service nodes is determined according to the fault association coefficients. In the prior art, operation and maintenance personnel check service logs and rely on their operation and maintenance experience to judge the root cause service node. By contrast, the present application determines the fault association coefficient according to the fault topology and then determines the root cause service node according to the fault association coefficient, which improves the efficiency and accuracy of root cause service node determination and saves human resources.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of an information analysis method according to an embodiment of the present disclosure;
fig. 2 is a schematic view of an application scenario of an information analysis method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of an information analysis method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of an information analysis method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of an information analysis method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an information analysis apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device for information analysis according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising," when used in this specification in connection with embodiments of the present application, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof, as embodied in the art. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term, e.g., "A and/or B" can be implemented as "A", or as "B", or as "A and B".
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The application provides an information analysis method, an information analysis device, an electronic device and a computer-readable storage medium, which aim to solve the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The embodiment of the present application provides an information analysis method. The method may be executed by any electronic device, or by a device or chip integrated on such a device. For example, the method may be executed by a server or a terminal device capable of service information analysis. Optionally, the method is executed by a server, for example a server or other computer device on which a micro service architecture system is deployed.
As shown in fig. 1, which is a schematic flow chart of an information analysis method provided in an embodiment of the present application, the method may include:
Step S101: detecting a first service fault, and determining an abnormal service node corresponding to the first service fault.
Optionally, the embodiments of the present application may be applied to the field of computer technologies; for example, the method can be particularly applied to a service information analysis scene in a micro service system.
Specifically, the microservice (or microservice architecture) is a cloud-native architecture; in a microservice architecture, a single application consists of multiple loosely coupled and independently deployable smaller components or services. The services in the micro service architecture can be understood as service nodes, that is, the micro service architecture includes a plurality of service nodes.
By way of example, the micro service system of the present application may be a business processing system, wherein the business processing system may include a plurality of service nodes, for example, a reimbursement processing node, an audit processing node, a leave processing node, and the like. Optionally, in an actual scenario, the nodes in the micro service system call one another. For example, when a user triggers a reimbursement service, the reimbursement service processing flow requires the reimbursement to be audited during processing; at this point, the reimbursement processing node may invoke the audit processing node and the like.
In the embodiment of the application, when a first service fault is detected in a micro service system, an abnormal service node corresponding to the first service fault can be determined.
Specifically, the first service failure may include any service failure; for example, in an actual scenario, the first service failure may include a failure that the corresponding service cannot be started through a trigger operation, a service processing result error, and the like.
In the embodiment of the application, the service processing system may be checked at a preset time point or during a preset time period to determine the first service fault. In addition, when a user or the system automatically triggers the corresponding service processing, the first service fault may be determined when the corresponding service cannot be completed.
The abnormal service node is a node corresponding to the first service fault; for example, the abnormal service node may be a node whose abnormal operation is related to the first service fault.
Optionally, the operation parameters of each service node may be obtained through a service log (such as a Trace log) of the service processing system, and an abnormal service node in each service node is determined according to whether the operation parameters satisfy a preset threshold condition. The relevant operation parameters may include parameters such as service node call relationship between service nodes, service call duration, service call success rate, service call times, and call state.
For example, the threshold range of the preset service call duration is less than or equal to 2 milliseconds; when the calling duration of the service node is more than 2 milliseconds, the service node can be considered as an abnormal service node. As another example, the threshold range of the preset service invocation success rate is greater than or equal to 90%; when the service invocation success rate of the service node is 70%, the service node can be considered as an abnormal service node.
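The threshold check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `NodeMetrics` class, its field names and the sample values are hypothetical, and treating a node as abnormal when any single parameter violates its threshold is an assumption. Only the two thresholds (2 milliseconds and 90%) come from the text.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """Operation parameters of one service node (hypothetical field names)."""
    name: str
    call_duration_ms: float  # service call duration
    success_rate: float      # service call success rate, 0.0 to 1.0


# Thresholds taken from the example in the text.
MAX_DURATION_MS = 2.0
MIN_SUCCESS_RATE = 0.90


def is_abnormal(m: NodeMetrics) -> bool:
    """Assumption: a node is abnormal if any parameter violates its threshold."""
    return m.call_duration_ms > MAX_DURATION_MS or m.success_rate < MIN_SUCCESS_RATE


def find_abnormal_nodes(metrics: list[NodeMetrics]) -> list[str]:
    """Return the names of all nodes that fail the threshold check."""
    return [m.name for m in metrics if is_abnormal(m)]


# Sample values are invented for illustration.
nodes = [
    NodeMetrics("reimbursement", 1.5, 0.99),  # within both thresholds
    NodeMetrics("audit", 3.2, 0.95),          # call duration above 2 ms
    NodeMetrics("leave", 1.0, 0.70),          # success rate below 90%
]
abnormal = find_abnormal_nodes(nodes)
print(abnormal)
```

With these sample values, the audit and leave nodes are flagged, mirroring the two cases in the text.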
Step S102: acquiring the operation information of the abnormal service nodes, and determining the fault association coefficient corresponding to each abnormal service node.
The fault association coefficient is determined according to a fault topology corresponding to the first service fault, and the fault topology is determined according to the operation information of the abnormal service node.
Specifically, the operation information may include operation parameters of the service nodes, where the operation parameters may include parameters of service node call relationships between the service nodes, service call durations, service call success rates, service call times, call states, and the like.
The fault association coefficient may be used to characterize the degree of association between the abnormal service node and the first service fault. Optionally, in this embodiment of the present application, the fault association coefficient may be determined according to a fault topology corresponding to the first service fault.
The fault topology may be a topology between the abnormal service nodes generated according to the operation information of the abnormal service nodes. Optionally, the fault topology may be a tree structure, that is, the fault topology may be a fault tree; the fault tree may reflect the calling relationship between the abnormal service nodes and the number of times that the abnormal service nodes are abnormal when performing the mutual calling.
As an example, in the fault tree shown in fig. 2, a call relationship exists between abnormal service nodes having a connection relationship; for example, a calling relationship exists between the head node and the child node 1, that is, the head node calls the child node 1, and a calling relationship exists between the child node 1 and the child node r-003, that is, the child node 1 calls the child node r-003, and a calling relationship exists between the child node 1 and the child node r-004, that is, the child node 1 calls the child node r-004. It can be understood that a call chain is formed among a plurality of abnormal service nodes with call relations; for example, the head node, child node 1, child node r-003, child node-003, and end node form a call chain; as another example, the head node, child node 1, child node r-004, and child node 003 form another call chain. In addition, the abnormal service nodes in the fault tree are also marked with the abnormal times of the nodes; for example, the number of times of occurrence of an abnormality of child node 1 is 102; the number of times of abnormality occurrence of the child node r-003 is 33, and so on.
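One possible way to represent such a fault tree and enumerate its call chains is sketched below. The dictionary layout and node labels are hypothetical. Only the call relationships and anomaly counts stated in the text are included, so the chains stop at r-003 and r-004 rather than covering the rest of Fig. 2.

```python
# Fault tree as an adjacency mapping: caller -> list of callees.
# Only the edges explicitly described in the text are modelled here.
calls = {
    "head": ["child-1"],
    "child-1": ["r-003", "r-004"],
    "r-003": [],
    "r-004": [],
}

# Anomaly counts marked on the nodes (values stated in the text).
anomaly_counts = {"child-1": 102, "r-003": 33}


def call_chains(tree, root):
    """Return every root-to-leaf path (call chain) by depth-first traversal."""
    children = tree.get(root, [])
    if not children:
        return [[root]]
    return [[root] + chain for child in children for chain in call_chains(tree, child)]


chains = call_chains(calls, "head")
for chain in chains:
    print(" -> ".join(chain))
```

Each chain gives the position order of a node (its 1-based index in the chain), which together with the anomaly count is the information the fault topology is said to carry.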
It can be understood that, in the embodiment of the present application, information such as the service node call relationships, the service call duration, the service call success rate and the number of service calls may be obtained from the operation information of the abnormal service nodes, and the fault topology may be determined from this information. The fault association coefficient corresponding to each abnormal service node is then determined according to the fault topology corresponding to the first service fault.
Optionally, the fault association coefficient corresponding to each abnormal service node may be determined according to a position order of the abnormal service node in the call chain and the number of times of occurrence of an abnormality when the abnormal service node performs mutual call.
Optionally, the position order of the abnormal service node in the call chain and the number of times of abnormality occurrence when the abnormal service node performs mutual call may be subjected to weighting operation, and the comprehensive weight of the weighting operation is used as the fault association coefficient corresponding to the abnormal service node. For example, the weight values corresponding to the position order and the number of times of occurrence of the abnormality may be predetermined, for example, the weight value corresponding to the position order is 60%; the number of times of occurrence of the abnormality corresponds to a weight value of 40%, and so on. As an example, as shown in fig. 2, in the call chain formed by the head node, child node 1, child node r-004, and child node 003, the position order of the head node is 1, and the number of times of occurrence of an anomaly is 20; then, the comprehensive weight corresponding to the head node is 1 × 60% + 20 × 40% = 8.6. For another example, the position order of the child node 1 is 2, and the number of times of occurrence of an exception is 102; then, the comprehensive weight of the child node 1 is 2 × 60% + 102 × 40% = 42.
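The weighting operation can be checked with a short sketch. The function and constant names are hypothetical; the 60% and 40% weights and the input values are those of the example above.

```python
POSITION_WEIGHT = 0.6  # weight of the position order (60% in the example)
ANOMALY_WEIGHT = 0.4   # weight of the anomaly count (40% in the example)


def fault_association_coefficient(position: int, anomaly_count: int) -> float:
    """Weighted sum of a node's call-chain position and its anomaly count."""
    return POSITION_WEIGHT * position + ANOMALY_WEIGHT * anomaly_count


# Head node: position 1, 20 anomalies. Child node 1: position 2, 102 anomalies.
head = fault_association_coefficient(position=1, anomaly_count=20)
child_1 = fault_association_coefficient(position=2, anomaly_count=102)
print(round(head, 2), round(child_1, 2))
```

Rounded to two decimals, the results match the values 8.6 and 42 given in the example. Because the anomaly count is typically much larger than the position order, it dominates the sum at these weights; an implementation might normalise the two inputs first, but the text does not say so.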
Step S103: determining root cause service nodes in the abnormal service nodes according to the fault association coefficient.
Specifically, the root cause service node may be understood as the node that causes the first service fault to occur, that is, the first service fault may occur due to an abnormality of the root cause service node.
It can be understood that the greater the degree of association between the abnormal service node and the first service fault, the greater the probability that the abnormal service node is the root cause service node. That is, the larger the fault association coefficient is, the more likely the abnormal service node is a root cause service node.
Optionally, in the embodiment of the present application, the fault association coefficients corresponding to the abnormal service nodes may be sorted from large to small, and the node corresponding to the largest fault association coefficient is determined as the root cause service node. In addition, a preset number of root cause service nodes may be determined according to the ranking of the fault association coefficients. For example, the top 3 ranked abnormal service nodes may be determined to be root cause service nodes, and so on.
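The ranking step can be sketched as a sort followed by a cut-off. The function name, the `top_n` parameter and the coefficient for `r-003` are hypothetical; the values for the head node and child node 1 are taken from the worked example in the description.

```python
def rank_root_causes(coefficients: dict[str, float], top_n: int = 1) -> list[str]:
    """Sort nodes by fault association coefficient, largest first, and keep top_n."""
    ranked = sorted(coefficients, key=coefficients.get, reverse=True)
    return ranked[:top_n]


coefficients = {
    "head": 8.6,      # from the worked example
    "child-1": 42.0,  # from the worked example
    "r-003": 15.0,    # hypothetical value, for illustration only
}

root_cause = rank_root_causes(coefficients)          # single root cause node
top_two = rank_root_causes(coefficients, top_n=2)    # preset number of nodes
print(root_cause, top_two)
```

Setting `top_n` to a preset number corresponds to the variant in which several top-ranked abnormal service nodes are all treated as root cause service nodes.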
By detecting a first service fault, the abnormal service nodes corresponding to the first service fault are determined; operation information of the abnormal service nodes is acquired, and a fault association coefficient corresponding to each abnormal service node is determined; and a root cause service node among the abnormal service nodes is determined according to the fault association coefficients. In the prior art, operation and maintenance personnel check service logs and rely on their operation and maintenance experience to judge the root cause service node. By contrast, the present application determines the fault association coefficient according to the fault topology and then determines the root cause service node according to the fault association coefficient, which improves the efficiency and accuracy of root cause service node determination and saves human resources.
In another embodiment of the present application, the obtaining operation information of the abnormal service node includes:
acquiring a service log of the first service;
preprocessing the service log to acquire the running information of the abnormal service node;
wherein the preprocessing comprises at least one of deleting redundant data, merging data and adding data.
In this embodiment of the application, the operation information of the abnormal service node may be obtained from a service log (for example, a Trace log) of the service processing system. For example, the service log may be preprocessed by data cleaning to obtain the operation information of the abnormal service node. The preprocessing comprises at least one of deleting redundant data, merging data and adding data.
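A minimal sketch of such preprocessing is given below. The record layout (`trace_id`, `span_id`, `service`, `duration_ms`, `status`) is a hypothetical log schema, not one specified by the application. Also assumed are the meanings given to the three operations: dropping repeated spans, defaulting a missing status to "ok", and merging records per service node.

```python
from collections import defaultdict


def preprocess(records):
    """Clean raw trace records and derive per-node operation information."""
    seen = set()
    merged = defaultdict(lambda: {"calls": 0, "failures": 0, "total_ms": 0.0})
    for rec in records:
        key = (rec["trace_id"], rec["span_id"])
        if key in seen:  # delete redundant data: skip repeated spans
            continue
        seen.add(key)
        status = rec.get("status", "ok")  # add data: assumed default for a missing field
        node = merged[rec["service"]]     # merge data: aggregate per service node
        node["calls"] += 1
        node["failures"] += int(status != "ok")
        node["total_ms"] += rec["duration_ms"]
    return {
        name: {
            "call_count": v["calls"],
            "success_rate": 1 - v["failures"] / v["calls"],
            "avg_duration_ms": v["total_ms"] / v["calls"],
        }
        for name, v in merged.items()
    }


# Invented sample records: one duplicate span and one record without a status.
records = [
    {"trace_id": "t1", "span_id": "s1", "service": "audit", "duration_ms": 3.0, "status": "error"},
    {"trace_id": "t1", "span_id": "s1", "service": "audit", "duration_ms": 3.0, "status": "error"},
    {"trace_id": "t2", "span_id": "s2", "service": "audit", "duration_ms": 1.0},
]
info = preprocess(records)
print(info)
```

The output carries the three kinds of operation information named in the description: number of service calls, service call success rate and service call duration.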
In another embodiment of the present application, the method further comprises:
acquiring alarm information of the root cause service node;
and determining target alarm information corresponding to the first service fault in the alarm information.
Optionally, in this embodiment of the present application, after determining the root cause service node, target alarm information that causes the first service fault may be determined from the alarm information of the root cause service node.
The target alarm information may include software alarm information and hardware alarm information. Software alarm information includes, for example, process interruption alarms and network interruption alarms; hardware alarm information includes, for example, insufficient hard disk space alarms and CPU fault alarms.
When the target alarm information is determined, it can be selected from the alarm information according to a first association relationship between the time at which each piece of alarm information was generated and the time at which the first service fault occurred, and a second association relationship between the type of the alarm information and the first service fault.
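The two association relationships could be applied as simple filters, as in the sketch below. The application does not define either relationship precisely, so the time window and the set of related alarm types are assumptions, as are the function name, alarm record layout and sample values.

```python
from datetime import datetime, timedelta


def select_target_alarms(alarms, fault_time, window, related_types):
    """Keep alarms raised within `window` of the fault whose type relates to it."""
    return [
        a for a in alarms
        if abs(a["time"] - fault_time) <= window  # first relationship: time (assumed window)
        and a["type"] in related_types            # second relationship: type (assumed set)
    ]


fault_time = datetime(2022, 4, 27, 10, 0, 0)  # invented fault time
alarms = [
    {"type": "disk_full", "time": datetime(2022, 4, 27, 9, 58, 0)},           # close in time, related
    {"type": "cpu_fault", "time": datetime(2022, 4, 27, 7, 0, 0)},            # related, too early
    {"type": "process_interrupt", "time": datetime(2022, 4, 27, 10, 1, 0)},   # close, type not related
]
targets = select_target_alarms(
    alarms,
    fault_time,
    window=timedelta(minutes=5),
    related_types={"disk_full", "cpu_fault"},
)
print([a["type"] for a in targets])
```

Only the disk-space alarm passes both filters here; a real system might score the two relationships rather than apply hard cut-offs.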
As an example, as shown in fig. 3, in the embodiment of the present application, after a Trace log is obtained, data cleaning may be performed on the log data. Abnormal service nodes are then determined according to the operation information of each service node, that is, the abnormal nodes of the call chain are aggregated. Next, a fault tree, that is, the fault topology of the embodiment of the application, is determined according to the operation information of the abnormal service nodes. The fault contribution rate of each abnormal service node, that is, its fault association coefficient, is then determined according to the fault topology, and the fault contribution rates are ranked. Furthermore, the root cause service node can be determined in combination with other resource topologies (the relationships between service nodes). Finally, the target alarm information, that is, the root cause alarm, is determined from the alarm information of the root cause service node.
In another embodiment of the present application, the determining a fault association coefficient corresponding to each abnormal service node includes:
determining a corresponding fault topology of the first service fault according to the operation information of the abnormal service node; the fault topology comprises the position information of the abnormal service node on the call chain and the times of abnormity when the abnormal service node executes mutual call;
and determining the fault association coefficient according to the fault topology.
Optionally, in an embodiment of the present application, the operation information includes at least one of the following:
a service invocation duration; a service invocation success rate; the number of service calls.
The fault topology can be a topology between abnormal service nodes generated according to the operation information of the abnormal service nodes; optionally, the fault topology may be a tree structure, that is, the fault topology may be a fault tree; the fault tree may reflect the calling relationship between the abnormal service nodes and the number of times that the abnormal service nodes are abnormal when performing the mutual calling.
As an example, as shown in fig. 2, in the fault tree in the diagram, a call relationship exists between abnormal service nodes having a connection relationship; for example, a calling relationship exists between the head node and the child node 1, that is, the head node calls the child node 1, and a calling relationship exists between the child node 1 and the child node r-003, that is, the child node 1 calls the child node r-003, and a calling relationship exists between the child node 1 and the child node r-004, that is, the child node 1 calls the child node r-004. It can be understood that a call chain is formed among a plurality of abnormal service nodes with call relations; for example, the head node, child node 1, child node r-003, child node-003, and end node form a call chain; as another example, the head node, child node 1, child node r-004, and child node 003 form another call chain. In addition, the abnormal service nodes in the fault tree are also marked with the abnormal times of the nodes; for example, the number of times of occurrence of an abnormality of child node 1 is 102; the number of times of abnormality occurrence of the child node r-003 is 33, and so on.
It can be understood that, in the embodiment of the present application, information such as the service node call relationships, the service call duration, the service call success rate and the number of service calls may be obtained from the operation information of the abnormal service nodes, and the fault topology may be determined from this information. The fault association coefficient corresponding to each abnormal service node is then determined according to the fault topology corresponding to the first service fault.
Optionally, a weighting operation may be performed on the position order of the abnormal service node in the call chain and the number of times the abnormal service node is abnormal when performing mutual calls, and the resulting comprehensive weight is used as the fault association coefficient corresponding to the abnormal service node. The weight values corresponding to the position order and to the number of abnormal occurrences may be predetermined; for example, the weight value corresponding to the position order is 60%, the weight value corresponding to the number of abnormal occurrences is 40%, and so on. As an example, as shown in fig. 2, in the call chain formed by the head node, child node 1, child node r-004, and child node 003, the position order of the head node is 1 and its number of abnormal occurrences is 20; the comprehensive weight corresponding to the head node is then 1 × 60% + 20 × 40% = 8.6. For another example, the position order of child node 1 is 2 and its number of abnormal occurrences is 102; the comprehensive weight corresponding to child node 1 is then 2 × 60% + 102 × 40% = 42.
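The weighting operation above can be reproduced with a few lines of code. This is only a sketch: the function name `fault_association_coefficient` is hypothetical, and the 60% and 40% weights are the example values given in this description rather than fixed parameters of the method.

```python
POSITION_WEIGHT = 0.6  # example weight for the position order
COUNT_WEIGHT = 0.4     # example weight for the number of abnormal occurrences


def fault_association_coefficient(position, abnormal_count,
                                  position_weight=POSITION_WEIGHT,
                                  count_weight=COUNT_WEIGHT):
    """Weighted sum of the position order and the abnormal count."""
    return position * position_weight + abnormal_count * count_weight


head_coeff = fault_association_coefficient(position=1, abnormal_count=20)
child1_coeff = fault_association_coefficient(position=2, abnormal_count=102)

# Rounded to one decimal place to hide floating-point noise.
print(round(head_coeff, 1), round(child1_coeff, 1))
```

Because the two inputs have different scales, the abnormal count dominates the result for the example weights; other weight values may be chosen when the position order should carry more influence.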
As an example, as shown in fig. 4, in the embodiment of the present application, after a Trace log is obtained, data cleaning may be performed on the log data; abnormal service nodes are then marked according to the operation information (service golden metrics) of each service node. In addition, suspected abnormal service nodes may be marked by combining the calling relationship among the service nodes (the service call map) with other resource topologies (relationships among the service nodes); the abnormal node set is then determined from the abnormal service nodes and the suspected abnormal service nodes.
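The marking step can be sketched as follows. The thresholds, the metric field names, the service names, and the rule that a direct caller or callee of an abnormal node is "suspected" are all assumptions introduced for illustration; the description does not state how the marking is decided.

```python
def mark_abnormal(metrics, min_success_rate=0.95, max_duration_ms=500.0):
    """Return the nodes whose metrics breach the assumed thresholds."""
    return {
        name for name, m in metrics.items()
        if m["success_rate"] < min_success_rate
        or m["duration_ms"] > max_duration_ms
    }


def mark_suspected(abnormal, calls):
    """Return normal-looking nodes that directly call, or are called by,
    an abnormal node."""
    suspected = set()
    for caller, callee in calls:
        if caller in abnormal and callee not in abnormal:
            suspected.add(callee)
        if callee in abnormal and caller not in abnormal:
            suspected.add(caller)
    return suspected


# Hypothetical metrics and call relationships.
metrics = {
    "gateway": {"success_rate": 0.99, "duration_ms": 120.0},
    "order": {"success_rate": 0.80, "duration_ms": 300.0},
    "payment": {"success_rate": 0.97, "duration_ms": 900.0},
    "inventory": {"success_rate": 0.99, "duration_ms": 80.0},
    "report": {"success_rate": 0.99, "duration_ms": 60.0},
}
calls = [("gateway", "order"), ("order", "payment"), ("order", "inventory")]

abnormal = mark_abnormal(metrics)
suspected = mark_suspected(abnormal, calls)
abnormal_node_set = abnormal | suspected
print(sorted(abnormal_node_set))
```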
In addition, as shown in fig. 5, in some embodiments of the present application, the log data may further include historical log data, i.e., offline data access shown in fig. 5; the learning model can be trained through historical log data, and then abnormal service nodes are obtained through the prediction of the trained learning model.
In another embodiment of the present application, the determining a root cause service node in the abnormal service nodes according to the fault association coefficient includes:
sorting the fault association coefficients, and determining the abnormal service node with the largest fault association coefficient as the root cause service node.
It can be understood that the greater the degree of association between an abnormal service node and the first service fault, the greater the probability that the abnormal service node is the root cause service node. That is, the larger the fault association coefficient, the more likely it is that the abnormal service node is the root cause service node.
Optionally, in this embodiment of the present application, the fault association coefficients corresponding to the abnormal service nodes may be sorted from large to small, and the node corresponding to the largest fault association coefficient is determined as the root cause service node.
In addition, in another embodiment of the present application, a preset number of root cause service nodes may also be determined according to the ranking of the fault association coefficients. For example, the top 3 ranked abnormal service nodes may be determined to be root cause service nodes, and so on.
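The ranking step can be sketched as a sort in descending order followed by taking the first entries. The function name `rank_root_causes` and the parameter `top_k` are hypothetical; the coefficients for the head node and child node 1 are the example values from this description, and the value for r-003 assumes position order 3 with the same 60%/40% weights.

```python
def rank_root_causes(coefficients, top_k=1):
    """Return the top_k node names ordered by descending coefficient."""
    ranked = sorted(coefficients.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]


coefficients = {
    "head": 8.6,      # 1 x 60% + 20 x 40%
    "child-1": 42.0,  # 2 x 60% + 102 x 40%
    "r-003": 15.0,    # assumed position 3: 3 x 60% + 33 x 40%
}

print(rank_root_causes(coefficients))           # single root cause node
print(rank_root_causes(coefficients, top_k=2))  # preset number of nodes
```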
In the embodiment of the present application, a first service fault is detected and the abnormal service nodes corresponding to the first service fault are determined; operation information of the abnormal service nodes is acquired and a fault association coefficient corresponding to each abnormal service node is determined; and a root cause service node among the abnormal service nodes is determined according to the fault association coefficients. Compared with the prior art, in which operation and maintenance personnel inspect service logs and rely on operation and maintenance experience to identify the root cause service node, the present application determines the fault association coefficients according to the fault topology and then determines the root cause service node according to those coefficients, which improves the efficiency and accuracy of root cause service node determination and saves human resources.
An embodiment of the present application provides an information analysis apparatus, and as shown in fig. 6, the information analysis apparatus 60 may include: a first determination module 601, a second determination module 602, and a third determination module 603, wherein,
a first determining module 601, configured to detect a first service failure, and determine an abnormal service node corresponding to the first service failure;
a second determining module 602, configured to obtain operation information of the abnormal service node, and determine a fault association coefficient corresponding to each abnormal service node; the fault association coefficient is determined according to a fault topology corresponding to a first service fault, and the fault topology is determined according to the operation information of the abnormal service node;
a third determining module 603, configured to determine a root cause service node in the abnormal service nodes according to the fault association coefficient.
In another embodiment of the present application, the apparatus further comprises:
the fourth determining module is used for acquiring the alarm information of the root cause service node;
and determining target alarm information corresponding to the first service fault in the alarm information.
In another embodiment of the present application, the second determining module is specifically configured to:
determining the fault topology corresponding to the first service fault according to the operation information of the abnormal service node; the fault topology comprises the position information of the abnormal service node on the call chain and the number of times the abnormal service node is abnormal when performing mutual calls;
and determining the fault association coefficient according to the fault topology.
In another embodiment of the present application, the third determining module is specifically configured to:
sorting the fault association coefficients, and determining the abnormal service node with the largest fault association coefficient as the root cause service node.
In another embodiment of the present application, the second determining module is specifically configured to:
acquiring a service log of the first service;
preprocessing the service log to acquire the operation information of the abnormal service node;
wherein the preprocessing comprises at least one of deleting redundant data, merging data and adding data.
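The three preprocessing operations can be sketched on a toy service log. The record fields (`trace_id`, `caller`, `callee`, `duration_ms`, `status`), the default value `"unknown"`, and the interpretation of "merging data" as grouping records by trace are assumptions made for illustration only.

```python
def preprocess(log_records):
    """Deduplicate, complete, and group raw log records by trace."""
    # Deleting redundant data: drop exact duplicates, preserving order.
    seen = set()
    unique = []
    for rec in log_records:
        key = (rec["trace_id"], rec["caller"], rec["callee"],
               rec["duration_ms"], rec.get("status"))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Adding data: fill in a default for a missing status field.
    completed = [{**rec, "status": rec.get("status", "unknown")} for rec in unique]

    # Merging data: group the records that belong to the same trace.
    traces = {}
    for rec in completed:
        traces.setdefault(rec["trace_id"], []).append(rec)
    return traces


# Hypothetical log records; the second one is an exact duplicate.
records = [
    {"trace_id": "t1", "caller": "head", "callee": "child-1",
     "duration_ms": 120, "status": "error"},
    {"trace_id": "t1", "caller": "head", "callee": "child-1",
     "duration_ms": 120, "status": "error"},
    {"trace_id": "t1", "caller": "child-1", "callee": "r-003", "duration_ms": 80},
    {"trace_id": "t2", "caller": "head", "callee": "child-1",
     "duration_ms": 95, "status": "ok"},
]

traces = preprocess(records)
print({trace_id: len(recs) for trace_id, recs in traces.items()})
```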
In another embodiment of the present application, the operational information includes at least one of:
a service invocation duration; a service invocation success rate; the number of service calls.
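As an illustration of the listed operation information, the following sketch aggregates per-node call count, success rate, and average invocation duration from individual call records. The record fields (`callee`, `ok`, `duration_ms`) and the function name `operation_info` are hypothetical, and using the average as the duration statistic is an assumption.

```python
from collections import defaultdict


def operation_info(call_records):
    """Aggregate call count, success rate, and average duration per callee."""
    totals = defaultdict(lambda: {"calls": 0, "successes": 0, "duration_ms": 0.0})
    for rec in call_records:
        t = totals[rec["callee"]]
        t["calls"] += 1
        t["successes"] += 1 if rec["ok"] else 0
        t["duration_ms"] += rec["duration_ms"]
    return {
        node: {
            "call_count": t["calls"],
            "success_rate": t["successes"] / t["calls"],
            "avg_duration_ms": t["duration_ms"] / t["calls"],
        }
        for node, t in totals.items()
    }


# Hypothetical call records.
records = [
    {"callee": "child-1", "ok": True, "duration_ms": 100.0},
    {"callee": "child-1", "ok": False, "duration_ms": 300.0},
    {"callee": "r-003", "ok": True, "duration_ms": 50.0},
]

info = operation_info(records)
print(info["child-1"])
```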
The apparatus of the embodiment of the present application may execute the method provided by the embodiment of the present application, and its implementation principle is similar. The actions executed by the modules of the apparatus correspond to the steps of the method; for a detailed functional description of the modules, reference may be made to the description of the corresponding method above, and details are not repeated here.
An embodiment of the present application provides an electronic device, including a memory and a processor, with at least one program stored in the memory for execution by the processor. When executed by the processor, the program implements the following: a first service fault is detected and the abnormal service nodes corresponding to the first service fault are determined; operation information of the abnormal service nodes is acquired and a fault association coefficient corresponding to each abnormal service node is determined; and a root cause service node among the abnormal service nodes is determined according to the fault association coefficients. Compared with the prior art, in which operation and maintenance personnel inspect service logs and rely on operation and maintenance experience to identify the root cause service node, the present application determines the fault association coefficients according to the fault topology and then determines the root cause service node according to those coefficients, which improves the efficiency and accuracy of root cause service node determination and saves human resources.
In an alternative embodiment, an electronic device is provided. As shown in fig. 7, the electronic device 4000 comprises a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, which may be used for data interaction between the electronic device and other electronic devices, such as transmitting and/or receiving data. It should be noted that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor 4001 may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 4001 may also be a combination that performs a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The memory 4003 may be a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The memory 4003 is used for storing application program codes (computer programs) for executing the present scheme, and is controlled by the processor 4001 to execute. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.
The electronic device includes, but is not limited to: mobile phones, notebook computers, multimedia players, desktop computers, and the like.
The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding content in the foregoing method embodiments.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as desired, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times. In a scenario where execution times are different, an execution sequence of the sub-steps or the phases may be flexibly configured according to requirements, which is not limited in the embodiment of the present application.
The foregoing describes only optional implementations of some implementation scenarios of the present application. It should be noted that, for those skilled in the art, other similar implementations based on the technical idea of the present application, adopted without departing from that technical idea, also fall within the protection scope of the embodiments of the present application.

Claims (10)

1. An information analysis method, comprising:
detecting a first service fault, and determining an abnormal service node corresponding to the first service fault;
acquiring operation information of the abnormal service nodes, and determining a fault association coefficient corresponding to each abnormal service node; the fault association coefficient is determined according to a fault topology corresponding to a first service fault, and the fault topology is determined according to the operation information of the abnormal service node;
and determining root cause service nodes in the abnormal service nodes according to the fault association coefficient.
2. The information analysis method of claim 1, further comprising:
acquiring alarm information of the root cause service node;
and determining target alarm information corresponding to the first service fault in the alarm information.
3. The information analysis method according to claim 1, wherein the determining the fault association coefficient corresponding to each abnormal service node comprises:
determining the fault topology corresponding to the first service fault according to the operation information of the abnormal service node; the fault topology comprises the position information of the abnormal service node on the call chain and the number of times the abnormal service node is abnormal when performing mutual calls;
and determining the fault association coefficient according to the fault topology.
4. The information analysis method according to claim 1, wherein the determining a root cause service node among the abnormal service nodes according to the fault association coefficient includes:
sorting the fault association coefficients, and determining the abnormal service node with the largest fault association coefficient as the root cause service node.
5. The information analysis method according to claim 1, wherein the obtaining the operation information of the abnormal service node includes:
acquiring a service log of the first service;
preprocessing the service log to acquire the operation information of the abnormal service node;
wherein the preprocessing comprises at least one of deleting redundant data, merging data and adding data.
6. The information analysis method according to any one of claims 1 to 5, wherein the operation information includes at least one of:
a service node call relationship; a service invocation duration; a service invocation success rate; the number of service calls.
7. An information analysis apparatus, characterized by comprising:
the first determining module is used for detecting a first service fault and determining an abnormal service node corresponding to the first service fault;
the second determining module is used for acquiring the operation information of the abnormal service nodes and determining the fault association coefficient corresponding to each abnormal service node; the fault association coefficient is determined according to a fault topology corresponding to a first service fault, and the fault topology is determined according to the operation information of the abnormal service node;
and the third determining module is used for determining a root cause service node in the abnormal service nodes according to the fault association coefficient.
8. The information analysis apparatus according to claim 7, wherein the second determining module is specifically configured to:
determining the fault topology corresponding to the first service fault according to the operation information of the abnormal service node; the fault topology comprises the position information of the abnormal service node on the call chain and the number of times the abnormal service node is abnormal when performing mutual calls;
and determining the fault association coefficient according to the fault topology.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the information analysis method according to any one of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the information analysis method according to any one of claims 1 to 6.
CN202210457613.2A 2022-04-27 2022-04-27 Information analysis method and device and electronic equipment Pending CN114844768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210457613.2A CN114844768A (en) 2022-04-27 2022-04-27 Information analysis method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210457613.2A CN114844768A (en) 2022-04-27 2022-04-27 Information analysis method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114844768A true CN114844768A (en) 2022-08-02

Family

ID=82567717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210457613.2A Pending CN114844768A (en) 2022-04-27 2022-04-27 Information analysis method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114844768A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180253350A1 (en) * 2015-11-03 2018-09-06 Alibaba Group Holding Limited Monitoring node usage in a distributed system
CN110166264A (en) * 2018-02-11 2019-08-23 北京三快在线科技有限公司 A kind of Fault Locating Method, device and electronic equipment
CN108833184A (en) * 2018-06-29 2018-11-16 腾讯科技(深圳)有限公司 Service fault localization method, device, computer equipment and storage medium
CN110888755A (en) * 2019-11-15 2020-03-17 亚信科技(中国)有限公司 Method and device for searching abnormal root node of micro-service system
CN114202206A (en) * 2021-12-14 2022-03-18 中国工商银行股份有限公司 System abnormal root cause analysis method and device
CN114024837A (en) * 2022-01-06 2022-02-08 杭州大乘智能科技有限公司 Fault root cause positioning method of micro-service system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115016976A (en) * 2022-08-08 2022-09-06 深圳壹师城科技有限公司 Root cause positioning method, device, equipment and storage medium
CN115016976B (en) * 2022-08-08 2022-11-25 深圳壹师城科技有限公司 Root cause positioning method, device, equipment and storage medium
CN115396296A (en) * 2022-08-18 2022-11-25 中电金信软件有限公司 Service processing method and device, electronic equipment and computer readable storage medium
CN115396296B (en) * 2022-08-18 2023-06-27 中电金信软件有限公司 Service processing method, device, electronic equipment and computer readable storage medium
CN115514617A (en) * 2022-09-13 2022-12-23 上海驻云信息科技有限公司 Universal abnormal root cause positioning and analyzing method and device

Similar Documents

Publication Publication Date Title
CN112162878B (en) Database fault discovery method and device, electronic equipment and storage medium
CN114844768A (en) Information analysis method and device and electronic equipment
US9122784B2 (en) Isolation of problems in a virtual environment
CN114095567B (en) Data access request processing method and device, computer equipment and medium
US9811447B2 (en) Generating a fingerprint representing a response of an application to a simulation of a fault of an external service
CN111913824A (en) Method for determining data link fault reason and related equipment
CN115858311A (en) Operation and maintenance monitoring method and device, electronic equipment and readable storage medium
CN113656252B (en) Fault positioning method, device, electronic equipment and storage medium
CN111159029A (en) Automatic testing method and device, electronic equipment and computer readable storage medium
CN110543462A (en) Microservice reliability prediction method, prediction device, electronic device, and storage medium
CN112087320A (en) Abnormity positioning method and device, electronic equipment and readable storage medium
CN115587017A (en) Data processing method and device, electronic equipment and storage medium
CN115113528A (en) Operation control method, device, equipment and medium of neural network model
CN114546799A (en) Point burying log checking method and device, electronic equipment, storage medium and product
CN113934595A (en) Data analysis method and system, storage medium and electronic terminal
CN113360342A (en) Method and equipment for monitoring service function operating environment
CN112187527A (en) Micro-service abnormity positioning method and device, electronic equipment and readable storage medium
CN116820826B (en) Root cause positioning method, device, equipment and storage medium based on call chain
CN116915463B (en) Call chain data security analysis method, device, equipment and storage medium
CN117971640A (en) Method, device, equipment and medium for testing embedded software
CN117950891A (en) Business exception processing method and device, electronic equipment and storage medium
CN114327991A (en) Data processing method and device, electronic equipment and storage medium
CN117493050A (en) Pinpoint-based fault positioning method and Pinpoint-based fault positioning system
CN116155688A (en) Link fault detection method, device, equipment and medium
CN114253846A (en) Automatic test exception positioning method, device and equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination