CN110677304A - Distributed problem tracking system and equipment - Google Patents

Publication number
CN110677304A
Authority
CN
China
Prior art keywords
module
node
abnormal
information
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910963441.4A
Other languages
Chinese (zh)
Inventor
严加乔
鲜强
陈光尧
谢睿
徐志坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Quwan Network Technology Co Ltd
Original Assignee
Guangzhou Quwan Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-11
Filing date
2019-10-11
Publication date
2020-01-10
Application filed by Guangzhou Quwan Network Technology Co Ltd
Priority to CN201910963441.4A
Publication of CN110677304A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/042 Network management architectures or arrangements comprising distributed management centres cooperatively managing the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Abstract

The application discloses a distributed problem tracking system and device. The system comprises an acquisition module, an alarm module and a positioning module. The acquisition module is used for acquiring data from nodes in the distributed system, and for classifying and storing the acquired data; the alarm module is used for raising alarms on the abnormal index data acquired by the acquisition module and the corresponding abnormal index events, and for generating alarm information; and the positioning module is used for locating the abnormal node, the user ID, and the context and log information corresponding to the node according to the alarm information. Abnormal nodes in the distributed system can thus be quickly tracked and located, avoiding the low efficiency and frequent misjudgment of manual log scanning.

Description

Distributed problem tracking system and equipment
Technical Field
The present application relates to the field of distributed system technologies, and in particular, to a distributed problem tracking system and device.
Background
The microservice architecture is currently a mainstream service architecture, and it works well for decoupling services and enabling rapid development.
However, microservices pose a problem that is painful for all developers: how to quickly locate faults. A microservice system has many nodes, the call path of a session is not fixed, and manually scanning logs is inefficient and prone to misjudgment.
Meanwhile, as the business grows, the business logic becomes more and more complex, and the alarms raised may contain an excessive amount of information, such as core alarm information, non-core alarm information, similar alarm information and associated alarm information, so that developers cannot effectively use the alarm information to judge the severity of a problem.
Disclosure of Invention
The embodiment of the application provides a distributed problem tracking system and equipment, so that abnormal nodes in the distributed system can be quickly tracked and positioned, and the problems that the manual log scanning mode is low in efficiency and misjudgment is easy are avoided.
In view of the above, a first aspect of the present application provides a distributed problem tracking system, the system comprising: an acquisition module, an alarm module and a positioning module;
the acquisition module is used for acquiring data of nodes in the distributed system, classifying and storing the acquired data;
the alarm module is used for alarming the abnormal index data acquired in the acquisition module and the corresponding abnormal index event and generating alarm information;
and the positioning module is used for positioning the problem information and the node information of the abnormal node according to the alarm information.
Preferably, the acquisition module comprises a sub-acquisition module, a classification module and a storage module;
the sub-acquisition module is used for acquiring data of each node in the distributed system;
the classification module is used for classifying the data of each node, and the classification is as follows: system indexes, service indexes and node logs;
the storage module is used for storing the acquired data of each node at regular time.
Preferably, the system indicators include: CPU utilization rate, residual memory size, network rate and disk IO of the node;
the service indexes are custom indexes defined by the business system, including: a service error code event index and a consumption event index;
each node log entry includes the log tracking identifier TraceID of the service call in progress when the entry was written.
Preferably, the service index includes the following information: node name, IP address, user unique ID, TraceID, error code and time.
Preferably, the acquisition module comprises a plurality of sub-acquisition modules;
the sub-acquisition modules are distributed in each node and used for transmitting the data acquired from each node to the acquisition module.
Preferably, the alarm module comprises an aggregation module, a priority division module and an alarm information generation module;
the aggregation module is used for aggregating the abnormal index data and the abnormal index events according to the nodes and the time of the abnormal index events;
the priority dividing module is used for carrying out priority division on the abnormal index events;
and the alarm information generation module is used for sending alarm information to corresponding operation and maintenance personnel according to the priority of the abnormal index event and the node information.
Preferably, the aggregation module is configured to aggregate the same abnormal index data and abnormal index events at different time points on the same node.
Preferably, the prioritization module is configured to prioritize the abnormal index data and the abnormal index event according to the user experience.
Preferably, the positioning module is further configured to retrieve, according to the transit ID carried by the alarm information, a node where the abnormal indicator event is located and log information and a user ID corresponding to the node.
A second aspect of the present application provides a distributed problem tracking device, the device comprising: a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the solution in the distributed problem tracking system of the first aspect according to instructions in the program code.
According to the above technical solution, the embodiments of the present application have the following advantages: the application provides a distributed problem tracking system comprising an acquisition module, an alarm module and a positioning module; the acquisition module is used for acquiring data of nodes in the distributed system, classifying and storing the acquired data; the alarm module is used for alarming the abnormal index data acquired in the acquisition module and the corresponding abnormal index event and generating alarm information; and the positioning module is used for positioning the problem information and the node information of the abnormal node according to the alarm information.
In the embodiment of the application, the data of each node is collected and classified through the collection module, the classified abnormal index data generate the alarm signal, the information of the abnormal node is positioned according to the alarm signal, the abnormal node in the distributed system can be quickly tracked and positioned, and the condition that the manual log scanning mode is low in efficiency and easy to misjudge is avoided.
Drawings
FIG. 1 is a system architecture diagram of one embodiment of a distributed problem tracking system of the present application;
FIG. 2 is a system architecture diagram of another embodiment of a distributed problem tracking system of the present application;
FIG. 3 is a system architecture diagram of an embodiment of a distributed problem tracking system according to the present application.
Detailed Description
According to the method and the device, the data of each node are collected and classified through the collection module, the classified abnormal index data generate the alarm signal, and therefore the information of the abnormal node is positioned according to the alarm signal, the abnormal node in the distributed system can be quickly tracked and positioned, and the problem that the manual log scanning mode is low in efficiency and easy to misjudge is avoided.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that the present application applies to a distributed problem tracking system. Referring to FIG. 1, which is a system architecture diagram of one embodiment of the distributed problem tracking system of the present application, the system includes: an acquisition module 101, an alarm module 102 and a positioning module 103;
the acquisition module 101 is configured to perform data acquisition on nodes in the distributed system, and classify and store the acquired data.
It should be noted that the acquisition module may be provided with a corresponding sub-acquisition module or other corresponding monitoring devices at each node, and configured to transmit acquired information to the acquisition module in real time; because the collected information may be different kinds of information, such as CPU utilization, residual memory size, service type information, or node location information, it is necessary to classify various information; and meanwhile, the classified data is stored, so that the problems of the nodes can be quickly positioned according to the data of the nodes.
The alarm module 102 is configured to alarm the abnormal index data and the corresponding abnormal index event acquired by the acquisition module 101, and generate alarm information.
It should be noted that the alarm module may analyze the abnormal data to obtain a problem corresponding to the abnormal data, and generate a corresponding alarm signal to the corresponding operation and maintenance personnel, so that the operation and maintenance personnel can quickly determine the problem occurring in the node.
The positioning module 103 is configured to position the problem information and the node information of the abnormal node according to the alarm information.
It should be noted that the positioning module may determine the problem information of the abnormal node according to the content of the warning information, and position the abnormal node according to the identifier carried by the warning information.
The distributed problem tracking system collects data of all nodes through the collection module and classifies the data, abnormal index data after classification generate alarm signals, information of the abnormal nodes is located according to the alarm signals, the abnormal nodes in the distributed system can be quickly tracked and located, and the problem that manual log scanning is inefficient and misjudgment is easy is avoided.
To facilitate further understanding of the solution, refer to FIG. 2, which is a system architecture diagram of another embodiment of the distributed problem tracking system according to the embodiments of the present application. As shown in FIG. 2, the system specifically includes: an acquisition module 201, an alarm module 202 and a positioning module 203;
the acquisition module 201 is configured to acquire data of nodes in the distributed system, classify and store the acquired data.
It should be noted that the acquisition module may be provided with a corresponding sub-acquisition module or other corresponding monitoring devices at each node, and configured to transmit acquired information to the acquisition module in real time; because the collected information may be different kinds of information, such as CPU utilization, residual memory size, service type information, or node location information, it is necessary to classify various information; and meanwhile, the classified data is stored, so that the problems of the nodes can be quickly positioned according to the data of the nodes.
In a specific implementation, the acquisition module 201 further includes a sub-acquisition module 2011, a classification module 2012, and a storage module 2013.
The sub-collection module 2011 is configured to collect data of each node in the distributed system.
It should be noted that the SDK of the acquisition module may be embedded in each node to form a plurality of sub-acquisition modules, each of which outputs monitoring data for its node. During data acquisition, the acquisition module can pull monitoring data from each node at regular intervals; however, for a monitoring index whose data life cycle is less than 10 s, the sub-acquisition module in the node needs to actively report to the acquisition module so that the index is not lost.
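The choice between timed pulling and active reporting can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Indicator` structure, the `lifecycle_s` field and the example indicator names are assumptions; only the 10 s threshold comes from the text.

```python
from dataclasses import dataclass

# Threshold from the text: indexes that live for less than 10 s must be pushed.
SHORT_LIVED_THRESHOLD_S = 10.0


@dataclass(frozen=True)
class Indicator:
    """A monitored index exposed by a node (hypothetical structure)."""
    name: str
    lifecycle_s: float  # how long the value remains readable on the node


def reporting_mode(indicator: Indicator) -> str:
    """Return "push" for short-lived indexes, otherwise "pull"."""
    if indicator.lifecycle_s < SHORT_LIVED_THRESHOLD_S:
        return "push"  # the sub-acquisition module reports actively
    return "pull"      # the acquisition module polls on a timer


cpu = Indicator("cpu_utilization", lifecycle_s=60.0)
error_event = Indicator("service_error_code_event", lifecycle_s=0.5)
print(reporting_mode(cpu), reporting_mode(error_event))
```

An index that would disappear between two polling rounds is the case the push path is meant to cover.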
The classification module 2012 is configured to classify the data of each node into: system indexes, service indexes, and node logs.
It should be noted that, because the collected information may be of different types, the collected data can be divided into system indexes, service indexes and node logs. In a specific embodiment, the system indexes include general indexes such as the CPU utilization, remaining memory size, network rate and disk IO of the node; the service indexes are custom indexes defined by the business system, including a service error code event index and a consumption event index (such as the consumption amount or the number of people at a certain time point); and each node log entry includes the log tracking identifier TraceID of the service call in progress when the entry was written.
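The three-way classification can be illustrated with a short sketch. The record layout (a dictionary with a `kind` key) and the string names of the individual indexes are assumptions made for the example; the three categories and their members follow the text.

```python
from enum import Enum


class Category(Enum):
    SYSTEM_INDEX = "system index"
    SERVICE_INDEX = "service index"
    NODE_LOG = "node log"


# Members follow the text; the string spellings are hypothetical.
SYSTEM_KINDS = frozenset(
    {"cpu_utilization", "remaining_memory", "network_rate", "disk_io"}
)
SERVICE_KINDS = frozenset({"service_error_code_event", "consumption_event"})


def classify(record: dict) -> Category:
    """Map a collected record to one of the three categories."""
    kind = record.get("kind")
    if kind in SYSTEM_KINDS:
        return Category.SYSTEM_INDEX
    if kind in SERVICE_KINDS:
        return Category.SERVICE_INDEX
    if kind == "log":
        return Category.NODE_LOG
    raise ValueError(f"unknown record kind: {kind!r}")


print(classify({"kind": "disk_io", "value": 120}).value)
```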
It should also be noted that each piece of index data, whether a system index, a service index or a node log, includes the corresponding node information, behavior information and log tracking identifier TraceID. For example, the service error code event index includes six information dimensions: node name, IP address, user unique ID, TraceID, error code and time.
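As an illustration, the six dimensions of a service error code event could be carried in a record like the one below. The field names, types and sample values are assumptions; the text only names the dimensions.

```python
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class ServiceErrorEvent:
    """One service error code event with the six dimensions from the text."""
    node_name: str
    ip_address: str
    user_id: str      # user unique ID
    trace_id: str     # log tracking identifier (TraceID)
    error_code: int
    time: datetime


# Sample values are invented for the example.
event = ServiceErrorEvent(
    node_name="order-service-1",
    ip_address="10.0.0.12",
    user_id="u-1001",
    trace_id="t-abc123",
    error_code=5001,
    time=datetime(2019, 10, 11, 8, 0, 0, tzinfo=timezone.utc),
)
print(asdict(event)["trace_id"])
```

Keeping the TraceID on every record is what later lets alarms and logs from different nodes be joined.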
The storage module 2013 is configured to store the acquired data of each node at regular time.
It should be noted that the storage module is configured to store the data of each node acquired by the acquisition module, so that the stored data can be queried when a specific node position and the corresponding problem need to be located.
The alarm module 202 is configured to alarm the abnormal index data and the corresponding abnormal index event collected in the acquisition module 201, and generate alarm information.
It should be noted that the alarm module may analyze the abnormal data to obtain a problem corresponding to the abnormal data, and generate a corresponding alarm signal to the corresponding operation and maintenance personnel, so that the operation and maintenance personnel can quickly determine the problem occurring in the node.
In a specific implementation, the alarm module 202 includes an aggregation module 2021, a prioritization module 2022, and an alarm information generation module 2023.
The aggregation module 2021 is configured to aggregate the abnormal index data and the abnormal index event according to the node where the abnormal index event occurs and the time.
It should be noted that, in an embodiment, the aggregation module may be configured to aggregate identical abnormal index data and abnormal index events that occur at different time points on the same node. Alarm information can also be grouped by node name and time, for example by aggregating the large amount of identical alarm information that accumulates within a short period (which may be several minutes, for example 5 minutes), so that one node does not generate a flood of repeated alarms in a short time. Aggregation is then performed by TraceID (link tracing), so that a single service call does not produce the same alarm information on multiple nodes. After these two rounds of abnormal index event aggregation, the amount of alarm information is minimized while no key information is lost.
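The two rounds of aggregation can be sketched as below. This is one possible reading, with stated simplifications: events are bucketed into fixed 5-minute windows, each first-round group keeps the TraceID of its earliest event as a representative, and the `Event` and `Alarm` structures are hypothetical.

```python
from dataclasses import dataclass

WINDOW_S = 300  # the text gives 5 minutes as an example window


@dataclass(frozen=True)
class Event:
    node: str
    code: str      # identifies the kind of abnormal index event
    trace_id: str
    ts: int        # seconds since some epoch


@dataclass
class Alarm:
    code: str
    trace_id: str  # TraceID of the earliest event in the group (simplification)
    nodes: set
    count: int


def aggregate_by_node_and_time(events):
    """Round 1: collapse identical events from one node within one window."""
    groups = {}
    for e in sorted(events, key=lambda ev: ev.ts):
        key = (e.node, e.code, e.ts // WINDOW_S)
        if key in groups:
            groups[key].count += 1
        else:
            groups[key] = Alarm(e.code, e.trace_id, {e.node}, 1)
    return list(groups.values())


def aggregate_by_trace(alarms):
    """Round 2: merge alarms that one service call raised on several nodes."""
    merged = {}
    for a in alarms:
        key = (a.trace_id, a.code)
        if key in merged:
            merged[key].nodes |= a.nodes
            merged[key].count += a.count
        else:
            merged[key] = Alarm(a.code, a.trace_id, set(a.nodes), a.count)
    return list(merged.values())


events = [
    Event("node-a", "E5001", "t-1", 0),
    Event("node-a", "E5001", "t-1", 30),
    Event("node-a", "E5001", "t-2", 90),
    Event("node-b", "E5001", "t-1", 40),
    Event("node-c", "E7000", "t-9", 50),
]
alarms = aggregate_by_trace(aggregate_by_node_and_time(events))
for alarm in alarms:
    print(alarm.code, sorted(alarm.nodes), alarm.count)
```

Each merged alarm keeps the set of affected nodes and an event count, so the reduction in volume does not discard where or how often the problem occurred.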
The prioritization module 2022 is configured to prioritize the anomaly index events.
It should be noted that, in a specific embodiment, the prioritization module may be configured to prioritize the abnormal index data and the abnormal index events according to their impact on user experience. For example, an abnormal index event that involves consumption or otherwise leads to a very poor user experience is given priority.
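A simple way to picture prioritization by user experience is a lookup table that ranks event kinds. The table below is entirely hypothetical; the text only states that events affecting consumption or badly degrading user experience come first.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical impact table; real rankings would be defined by the business.
USER_IMPACT = {
    "consumption_event": Priority.HIGH,
    "service_error_code_event": Priority.MEDIUM,
    "disk_io": Priority.LOW,
}


def prioritize(event_kinds):
    """Order event kinds from highest to lowest user-experience impact."""
    return sorted(
        event_kinds,
        key=lambda kind: USER_IMPACT.get(kind, Priority.LOW),
        reverse=True,
    )


print(prioritize(["disk_io", "consumption_event", "service_error_code_event"]))
```

Unknown event kinds default to the lowest priority in this sketch; a real system might prefer to flag them for review instead.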
The alarm information generating module 2023 is configured to send alarm information to corresponding operation and maintenance staff according to the priority of the abnormal index event and the node information.
It should be noted that the higher the priority of an abnormal index event, the higher the priority of its alarm and the more urgently it needs to be handled. Once an abnormal index event requiring urgent handling is identified, the alarm information is sent to the operation and maintenance personnel responsible for the node according to the node information, and they can determine the specific node position and the corresponding abnormal index event from the alarm information. For example, when "CPU utilization above 80%" is treated as the most urgent abnormal index event, the operation and maintenance personnel can be notified by a WeChat or SMS alarm.
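The routing step for the CPU example can be sketched as follows. The owner table, the `Notification` structure and the fallback recipient are assumptions; the 80% threshold and the WeChat and SMS channels come from the text. No real messaging API is called here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CPU_URGENT_THRESHOLD = 80.0  # percent, from the example in the text

# Hypothetical mapping from node to responsible operations staff.
NODE_OWNERS = {"node-a": "alice", "node-b": "bob"}


@dataclass(frozen=True)
class Notification:
    recipient: str
    channels: Tuple[str, ...]  # candidate channels for an urgent alarm
    message: str


def route_cpu_alarm(node: str, cpu_percent: float) -> Optional[Notification]:
    """Build a notification when CPU utilization exceeds the threshold."""
    if cpu_percent <= CPU_URGENT_THRESHOLD:
        return None
    recipient = NODE_OWNERS.get(node, "on-call")  # fallback is an assumption
    message = (
        f"{node}: CPU utilization {cpu_percent:.0f}% "
        f"exceeds {CPU_URGENT_THRESHOLD:.0f}%"
    )
    return Notification(recipient, ("wechat", "sms"), message)


print(route_cpu_alarm("node-a", 91.0))
```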
The positioning module 203 is configured to position the problem information and the node information of the abnormal node according to the alarm information.
It should be noted that the positioning module may determine the abnormal index event of the abnormal node according to the content of the alarm information, and locate the abnormal node according to the identifier carried by the alarm information. In a specific implementation, the positioning module is further configured to retrieve, according to the transit ID carried by the alarm information, the node where the abnormal index event is located and the log information and user ID corresponding to the node. Node information and the abnormal index events of a node can be queried by searching the data of each node in the storage module; for example, the corresponding dimension information can be entered on a WEB query page provided by the positioning module to look up the nodes, detailed logs, corresponding user IDs and other data related to the alarm information. For example, after receiving an abnormal index event such as "a certain interface service is abnormal", it is possible to retrieve the node at which the service error occurred, the user who triggered it, the corresponding context at that node, detailed log records and other information. This lets the person handling the problem locate it quickly, without having to search step by step through a vast sea of logs starting from node 1.
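The lookup performed by the positioning module can be pictured as a filter over stored records. This sketch assumes the ID carried by the alarm can be matched against a `trace_id` stored with each record; the `LogRecord` structure and the sample data are invented for illustration, and an in-memory list stands in for the storage module.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    node: str
    user_id: str
    trace_id: str
    message: str


# In-memory stand-in for the storage module, with invented sample data.
STORE = [
    LogRecord("gateway-1", "u-1001", "t-1", "forwarding /pay"),
    LogRecord("order-service-2", "u-1001", "t-1", "error 5001: stock lookup failed"),
    LogRecord("gateway-1", "u-2002", "t-2", "forwarding /list"),
]


def locate(trace_id: str, store: List[LogRecord]) -> Dict[str, list]:
    """Return the nodes, users and log lines associated with one ID."""
    hits = [r for r in store if r.trace_id == trace_id]
    return {
        "nodes": sorted({r.node for r in hits}),
        "user_ids": sorted({r.user_id for r in hits}),
        "logs": [f"{r.node}: {r.message}" for r in hits],
    }


result = locate("t-1", STORE)
print(result["nodes"], result["user_ids"])
```

A production system would more likely query an indexed log store than scan a list, but the join key is the same.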
Through aggregation and prioritization, the present application aggregates alarm information and filters out redundant and related but useless alarm information, so that only the alarm information of the core problem is notified and developers can judge the severity of the problem. Meanwhile, the alarm information for an abnormal condition carries a session id, from which developers can quickly locate the node where the problem occurred and its cause. The application can also provide operation and maintenance personnel with detailed problem information, which helps them locate problems quickly and improves system stability, and it lets them monitor every index of the entire system and keep track of the health of the running system.
In addition, the present application also provides a system architecture diagram of a specific application embodiment of the distributed problem tracking system. As shown in FIG. 3, the architecture includes a plurality of nodes, and each node includes a sub-acquisition module that sends the acquired information to the acquisition module. On the one hand, the acquisition module stores the acquired information in the storage module, which provides the information the positioning module needs for problem location and node location; on the other hand, when abnormal data occurs, the alarm module detects the corresponding abnormal condition and sends a WeChat message or short message to the operation and maintenance personnel, who then use the positioning module to locate the corresponding node and the information of the abnormal index event.
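The two paths described for FIG. 3 (store everything, alarm on abnormal data) can be wired together in a few lines. The class name, the callback signatures and the abnormality rule are assumptions for illustration only.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class AcquisitionModule:
    """Stores every record and forwards abnormal ones to the alarm side."""

    def __init__(
        self,
        is_abnormal: Callable[[Record], bool],
        raise_alarm: Callable[[Record], None],
    ) -> None:
        self.storage: List[Record] = []  # stands in for the storage module
        self._is_abnormal = is_abnormal
        self._raise_alarm = raise_alarm

    def receive(self, record: Record) -> None:
        self.storage.append(record)      # path 1: keep data for later location
        if self._is_abnormal(record):    # path 2: hand over to the alarm module
            self._raise_alarm(record)


sent: List[Record] = []
module = AcquisitionModule(
    is_abnormal=lambda r: r.get("error_code") is not None,  # hypothetical rule
    raise_alarm=sent.append,
)
module.receive({"node": "node-a", "cpu": 35.0})
module.receive({"node": "node-b", "error_code": 5001, "trace_id": "t-1"})
print(len(module.storage), len(sent))
```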
The foregoing are embodiments of the system of the present application. The present application also provides an embodiment of a distributed problem tracking device, the device comprising a processor and a memory: the memory is used for storing the program codes and transmitting the program codes to the processor; the processor is configured to execute the solution in the distributed problem tracking system provided by the above embodiments according to instructions in the program code.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A distributed problem tracking system, comprising: an acquisition module, an alarm module and a positioning module;
the acquisition module is used for acquiring data of nodes in the distributed system, classifying and storing the acquired data;
the alarm module is used for alarming the abnormal index data acquired in the acquisition module and the corresponding abnormal index event and generating alarm information;
and the positioning module is used for positioning the problem information and the node information of the abnormal node according to the alarm information.
2. The distributed problem tracking system of claim 1, wherein said acquisition module comprises: a sub-acquisition module, a classification module and a storage module;
the sub-acquisition module is used for acquiring data of each node in the distributed system;
the classification module is used for classifying the data of each node, and the classification is as follows: system indexes, service indexes and node logs;
the storage module is used for storing the acquired data of each node at regular time.
3. The distributed problem tracking system of claim 2, wherein said system metrics include: CPU utilization rate, residual memory size, network rate and disk IO of the node;
the service indexes are custom indexes defined by the business system, including: a service error code event index and a consumption event index;
each node log entry includes the log tracking identifier TraceID of the service call in progress when the entry was written.
4. The distributed problem tracking system according to claim 3, wherein said service index includes the following information: node name, IP address, user unique ID, TraceID, error code and time.
5. The distributed problem tracking system of claim 2, wherein said acquisition module comprises a plurality of sub-acquisition modules;
the sub-acquisition modules are distributed in each node and used for transmitting the data acquired from each node to the acquisition module.
6. The distributed problem tracking system of claim 1, wherein said alarm module comprises an aggregation module, a prioritization module, and an alarm information generation module;
the aggregation module is used for aggregating the abnormal index data and the abnormal index events according to the nodes and the time of the abnormal index events;
the priority dividing module is used for carrying out priority division on the abnormal index events;
and the alarm information generation module is used for sending alarm information to corresponding operation and maintenance personnel according to the priority of the abnormal index event and the node information.
7. The distributed problem tracking system of claim 6, wherein said aggregation module is configured to aggregate the same anomaly index data and anomaly index events at different time points on the same node.
8. The distributed problem tracking system of claim 6, wherein said prioritization module is configured to prioritize anomaly index data and anomaly index events based on user experience.
9. The distributed problem tracking system according to claim 1, wherein the positioning module is further configured to retrieve, according to a transit ID carried by the alarm information, the node where the abnormal index event is located and the log information and user ID corresponding to the node.
10. A distributed problem tracking device, the device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute, according to instructions in the program code, the functions of the distributed problem tracking system of any one of claims 1-9.
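
As an illustration of how the claimed modules could fit together, the following is a minimal Python sketch, not the applicant's implementation. It assumes that abnormal service-index records carry the fields listed in claim 4, that aggregation (claims 6 and 7) groups records by node and error code, and that the alarm's TraceID is used to look up node logs (claim 9). The names IndexEvent, LogLine, aggregate, priority, build_alarms and locate, the "P1"/"P2" labels, and the rule that ranks an anomaly by the number of distinct affected users (one possible reading of claim 8) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEvent:
    """One abnormal service-index record; fields follow claim 4."""
    node: str
    ip: str
    user_id: str
    trace_id: str
    error_code: int
    time: float  # seconds since epoch


@dataclass(frozen=True)
class LogLine:
    """One node log line tagged with the TraceID of its service call."""
    node: str
    trace_id: str
    user_id: str
    message: str


def aggregate(events):
    """Group identical anomalies on the same node across time points."""
    groups = defaultdict(list)
    for ev in events:
        groups[(ev.node, ev.error_code)].append(ev)
    for evs in groups.values():
        evs.sort(key=lambda e: e.time)  # oldest first within each group
    return dict(groups)


def priority(group, user_threshold=2):
    """Hypothetical user-experience rule: more affected users, higher priority."""
    affected = {ev.user_id for ev in group}
    return "P1" if len(affected) >= user_threshold else "P2"


def build_alarms(events):
    """Produce one alarm per aggregated group, carrying a TraceID for lookup."""
    alarms = []
    for (node, code), group in aggregate(events).items():
        alarms.append({
            "node": node,
            "error_code": code,
            "count": len(group),
            "first_seen": group[0].time,
            "last_seen": group[-1].time,
            "priority": priority(group),
            "trace_id": group[-1].trace_id,  # most recent occurrence
        })
    return sorted(alarms, key=lambda a: (a["priority"], a["node"]))


def locate(alarm, logs):
    """Retrieve nodes, log messages and user IDs matching the alarm's TraceID."""
    hits = [line for line in logs if line.trace_id == alarm["trace_id"]]
    return {
        "nodes": sorted({h.node for h in hits}),
        "user_ids": sorted({h.user_id for h in hits}),
        "messages": [h.message for h in hits],
    }
```

In this sketch an operator would call build_alarms on a batch of abnormal records and pass any resulting alarm, together with the collected node logs, to locate. Because the lookup key is the TraceID rather than the node name, log lines from every node that took part in the same service call are returned, not only those from the node that raised the alarm.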
CN201910963441.4A 2019-10-11 2019-10-11 Distributed problem tracking system and equipment Pending CN110677304A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910963441.4A CN110677304A (en) 2019-10-11 2019-10-11 Distributed problem tracking system and equipment

Publications (1)

Publication Number Publication Date
CN110677304A true CN110677304A (en) 2020-01-10

Family

ID=69081608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910963441.4A Pending CN110677304A (en) 2019-10-11 2019-10-11 Distributed problem tracking system and equipment

Country Status (1)

Country Link
CN (1) CN110677304A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339852A (en) * 2020-02-14 2020-06-26 北京百度网讯科技有限公司 Tracking method, device, electronic equipment and computer readable storage medium
CN111756579A (en) * 2020-06-24 2020-10-09 北京百度网讯科技有限公司 Abnormity early warning method, device, equipment and storage medium
CN112526905A (en) * 2020-11-27 2021-03-19 杭州萤石软件有限公司 Processing method and system for index abnormity
CN112737856A (en) * 2020-12-31 2021-04-30 青岛海尔科技有限公司 Link tracking method and device, storage medium and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168845A (en) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 Fault locating method and device
CN107943668A (en) * 2017-12-15 2018-04-20 江苏神威云数据科技有限公司 Log monitoring method and monitoring platform for computer server clusters
CN108390782A (en) * 2018-02-12 2018-08-10 黄倚霄 Comprehensive performance problem analysis method for a centralized application system
CN109660407A (en) * 2019-01-18 2019-04-19 鑫涌算力信息科技(上海)有限公司 Distributed system monitoring system and method
CN109921927A (en) * 2019-02-20 2019-06-21 苏州人之众信息技术有限公司 Real-time call chain tracing method based on microservices

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339852A (en) * 2020-02-14 2020-06-26 北京百度网讯科技有限公司 Tracking method, device, electronic equipment and computer readable storage medium
CN111339852B (en) * 2020-02-14 2023-12-26 阿波罗智联(北京)科技有限公司 Tracking method, tracking device, electronic equipment and computer readable storage medium
CN111756579A (en) * 2020-06-24 2020-10-09 北京百度网讯科技有限公司 Abnormity early warning method, device, equipment and storage medium
CN112526905A (en) * 2020-11-27 2021-03-19 杭州萤石软件有限公司 Processing method and system for index abnormity
CN112737856A (en) * 2020-12-31 2021-04-30 青岛海尔科技有限公司 Link tracking method and device, storage medium and electronic device
CN112737856B (en) * 2020-12-31 2023-02-03 青岛海尔科技有限公司 Link tracking method and device, storage medium and electronic device

Similar Documents

Publication Publication Date Title
CN110661659B (en) Alarm method, device and system and electronic equipment
CN110677304A (en) Distributed problem tracking system and equipment
CN110213068B (en) Message middleware monitoring method and related equipment
CN103220173B (en) A kind of alarm monitoring method and supervisory control system
CN112965874B (en) Configurable monitoring alarm method and system
CN110535713B (en) Monitoring management system and monitoring management method
CN103401698B (en) For the monitoring system that server health is reported to the police in server set group operatione
CN112311617A (en) Configured data monitoring and alarming method and system
CN111339175B (en) Data processing method, device, electronic equipment and readable storage medium
CN109977089A (en) Blog management method, device, computer equipment and computer readable storage medium
CN111245672A (en) Monitoring method and system for general extensible tracking service full link
CN112395156A (en) Fault warning method and device, storage medium and electronic equipment
WO2011017955A1 (en) Method for analyzing alarm data and system thereof
CN114036022A (en) Monitoring alarm processing method, device, equipment and medium
CN113746703A (en) Abnormal link monitoring method, system and device
CN114328107A (en) Monitoring method and system for optomagnetic fusion storage server cluster and electronic equipment
CN108039971A (en) A kind of alarm method and device
CN103763143A (en) Method and system for equipment abnormality alarming based on storage server
CN114513400A (en) Log aggregation system and method for improving availability of log aggregation system
CN108984362A (en) Log collection method and device, storage medium, electronic equipment
CN109194532B (en) Method and device for pushing power grid alarm information
CN111983960A (en) Monitoring system and method
CN110601885A (en) Artificial intelligence public cloud abnormity indication alarm system
CN116431872B (en) Observable system and service observing method based on observable system
CN114422324B (en) Alarm information processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200110