CN110875832A - Abnormal service monitoring method, device and system and computer readable storage medium - Google Patents

Info

Publication number
CN110875832A
Authority
CN
China
Prior art keywords
service
error
information
error information
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811014428.6A
Other languages
Chinese (zh)
Other versions
CN110875832B (en)
Inventor
张凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811014428.6A
Publication of CN110875832A
Application granted
Publication of CN110875832B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0677 Localisation of faults

Abstract

The invention provides a method, a device and a system for monitoring abnormal services, and a computer-readable storage medium. Error information of a first service, generated while the first service runs online, is acquired. Error chain information of a second service is then acquired, wherein the second service is an upper-layer service of the first service; because the second service is a preceding process of the first service, the error chain information of the second service is likely to be a cause of the error information of the first service. Error chain information of the first service is determined from the acquired error information of the first service and the acquired error chain information of the second service, so that all error information related to the abnormality of the first service is collected. Finally, an abnormal service notification is sent to the user corresponding to the first service according to the error chain information of the first service. This improves the reliability of abnormal service monitoring, helps the user corresponding to the first service resolve the abnormality of the first service more quickly, and improves the efficiency of abnormal service monitoring.

Description

Abnormal service monitoring method, device and system and computer readable storage medium
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a method, a device, a system, and a computer-readable storage medium for monitoring abnormal services.
Background
With the continuous development of the internet and the gradual updating of business requirements, internet services are becoming more and more diverse, and a single internet service is mostly implemented through the cooperation of a plurality of services with different functions. For example, internet e-commerce is a typical internet service, which generally comprises a series of services with an ordered hierarchical relationship: a commodity transaction data analysis service, a commodity information pushing service, a commodity detailed information display service, an order generation service, a logistics inquiry service and an after-sale information maintenance service. In order to improve the user experience, each sub-service, especially the user-facing core services that implement the main functions, needs to be monitored and managed for abnormalities.
In existing abnormal service monitoring methods, a developer generally configures abnormality monitoring for the service that he or she is responsible for developing when the service goes online, and once an abnormality occurs in the service, the developer is notified to handle it.
In the process of implementing the invention, the inventor found at least the following problem in the prior art: during abnormal service monitoring, the cause of an abnormality has to be determined by manual troubleshooting, which results in long troubleshooting time and low troubleshooting efficiency, and in turn low abnormal service monitoring efficiency.
Disclosure of Invention
The embodiment of the invention provides a method, a device and a system for monitoring abnormal services and a computer readable storage medium, which improve the efficiency and reliability of monitoring the abnormal services.
According to a first aspect of the present invention, a method for monitoring abnormal services is provided, including:
acquiring error information of a first service, wherein the error information of the first service is generated in the online running process of the first service;
acquiring error chain information of a second service, wherein the second service is an upper-layer service of the first service;
determining error chain information of the first service according to the error information of the first service and the error chain information of the second service;
and sending an abnormal service notification to a user corresponding to the first service according to the error chain information of the first service.
Optionally, in a possible implementation manner of the first aspect, the determining, according to the error information of the first service and the error chain information of the second service, the error chain information of the first service includes:
analyzing error chain information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper-layer service of the second service;
according to a preset error causal relationship, determining upper-layer error information having a causal relationship with the error information of the first service in an error information set corresponding to the second service;
and determining the error chain information of the first service according to the error information of the first service and the upper-layer error information.
Optionally, in another possible implementation manner of the first aspect, the determining, according to the error information of the first service and the upper-layer error information, the error chain information of the first service includes:
and sequentially combining the error information of the first service with the upper-layer error information according to the error cause-effect relationship to obtain the error chain information of the first service.
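The causal filtering and sequential combination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ErrorInfo` record, the `CAUSES` table and the error identifiers are hypothetical, and the preset error causal relationship is assumed to be a mapping from an error identifier to the identifiers of the upper-layer errors that may cause it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorInfo:
    service: str  # name of the service that produced the error
    code: str     # hypothetical error identifier


# Assumed form of the preset error causal relationship:
# effect code -> set of codes that may directly cause it.
CAUSES = {
    "PUSH_DELAY": {"SOURCE_DATA_ERROR"},
    "SOURCE_DATA_ERROR": {"ANALYSIS_TIMEOUT"},
}


def build_error_chain(first_error, upper_chain):
    """Combine first_error with the causally related entries of upper_chain.

    upper_chain is ordered from the topmost service down to the second
    service. The result keeps that order and ends with first_error.
    """
    related = []
    wanted = set(CAUSES.get(first_error.code, ()))
    # Walk from the nearest upper service upwards, following cause links.
    for err in reversed(upper_chain):
        if err.code in wanted:
            related.append(err)
            wanted |= CAUSES.get(err.code, set())
    related.reverse()
    return related + [first_error]
```

Upper-layer errors with no causal link to the error of the first service are dropped, so the resulting chain contains only entries that may explain the abnormality.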
Optionally, in yet another possible implementation manner of the first aspect, before the obtaining the error chain information of the second service, the method further includes:
querying, upward from the first service and according to a preset service level order, for the nearest upper-layer service that has generated error information;
and determining the nearest upper-layer service as the second service.
Optionally, in another possible implementation manner of the first aspect, the sending, according to the error chain information of the first service, an abnormal service notification to a user corresponding to the first service includes:
determining a notification mode corresponding to the first service and a user to be notified according to the importance level of the first service;
generating an abnormal service notification according to the error chain information of the first service;
and sending the abnormal service notification to the user to be notified in the notification mode.
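The notification steps above might look like the following sketch. The importance levels, notification modes and recipient roles in `NOTIFY_POLICY` are hypothetical placeholders; the text only states that the mode and the users to be notified depend on the importance level of the first service.

```python
# Hypothetical policy table: importance level -> (notification mode, recipients).
NOTIFY_POLICY = {
    "core": ("phone", ["owner", "team_lead"]),
    "normal": ("email", ["owner"]),
}


def build_notification(service, importance, error_chain):
    """Pick the mode and recipients by importance and render the error chain."""
    # Unknown importance levels fall back to the "normal" policy (an assumption).
    mode, recipients = NOTIFY_POLICY.get(importance, NOTIFY_POLICY["normal"])
    body = f"Service {service} is abnormal. Error chain: " + " -> ".join(error_chain)
    return {"mode": mode, "recipients": recipients, "body": body}
```

The returned dictionary would then be handed to whatever delivery channel corresponds to the chosen mode.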
Optionally, in another possible implementation manner of the first aspect, before the obtaining the error information of the first service, the method further includes:
receiving a verification instruction input by a user and data to be verified of the first service;
acquiring error configuration information corresponding to the first service according to the verification instruction, wherein the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier;
if an error log generation model matching the corresponding error identifier is found at the position indicated by each position identifier in the data to be verified of the first service, obtaining a result that the first service passes verification;
correspondingly, the acquiring the error information of the first service, which is generated in the online running process of the first service, includes:
acquiring an error log generated by the error log generation model of the first service while the first service runs online;
and acquiring the error information of the first service according to the error log.
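The pre-launch verification above can be illustrated with a small sketch. Both data shapes are assumptions made for illustration: the error configuration is modelled as a mapping from position identifier to expected error identifier, and the data to be verified as a mapping from position identifier to the error identifier actually found at that position.

```python
def verify_service(error_config, data_to_verify):
    """Return (passed, missing_positions).

    error_config:   {position_id: expected error_id}   (assumed shape)
    data_to_verify: {position_id: error_id found there} (assumed shape)
    The service passes only if every configured position carries the
    expected error identifier.
    """
    missing = [
        pos
        for pos, err_id in error_config.items()
        if data_to_verify.get(pos) != err_id
    ]
    return not missing, missing
```

Returning the list of failing positions alongside the pass or fail result lets the user see which configured positions still lack a matching error log generation model.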
Optionally, in yet another possible implementation manner of the first aspect, before the acquiring, according to the verification instruction, the error configuration information corresponding to the first service, the method further includes:
receiving a position identifier input by a user for the first service and an error identifier corresponding to the position identifier;
and acquiring or updating error configuration information corresponding to the first service according to the position identifier and the error identifier corresponding to the position identifier.
According to a second aspect of the present invention, there is provided an abnormal service monitoring apparatus, including:
an acquisition module, configured to acquire error information of a first service, wherein the error information of the first service is generated in the online running process of the first service;
the query module is used for acquiring error chain information of a second service, wherein the second service is an upper-layer service of the first service;
the link building module is used for determining the error chain information of the first service according to the error information of the first service and the error chain information of the second service;
and the notification module is used for sending an abnormal service notification to the user corresponding to the first service according to the error chain information of the first service.
Optionally, in a possible implementation manner of the second aspect, the link building module is specifically configured to:
analyzing error chain information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper-layer service of the second service; according to a preset error causal relationship, determining upper-layer error information having a causal relationship with the error information of the first service in an error information set corresponding to the second service; and determining the error chain information of the first service according to the error information of the first service and the upper-layer error information.
Optionally, in another possible implementation manner of the second aspect, the link building module is specifically configured to:
analyzing error chain information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper-layer service of the second service; according to a preset error causal relationship, determining upper-layer error information having a causal relationship with the error information of the first service in an error information set corresponding to the second service; and sequentially combining the error information of the first service with the upper-layer error information according to the error cause-effect relationship to obtain the error chain information of the first service.
Optionally, in yet another possible implementation manner of the second aspect, before the obtaining of the error chain information of the second service, the query module is further configured to:
according to a preset service level sequence, sequentially inquiring the nearest upper-layer service generating error information from the first service upwards; and determining the nearest upper layer service as a second service.
Optionally, in another possible implementation manner of the second aspect, the notification module is specifically configured to:
determining a notification mode corresponding to the first service and a user to be notified according to the importance level of the first service; generating an abnormal service notification according to the error chain information of the first service; and sending the abnormal service notification to the user to be notified in the notification mode.
Optionally, in yet another possible implementation manner of the second aspect, the apparatus further includes a verification module, configured to:
before the acquisition module acquires the error information of the first service, receiving a verification instruction input by a user and data to be verified of the first service; acquiring error configuration information corresponding to the first service according to the verification instruction, wherein the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier; if an error log generation model matching the corresponding error identifier is found at the position indicated by each position identifier in the data to be verified of the first service, obtaining a result that the first service passes verification;
correspondingly, the acquisition module is specifically configured to:
acquiring an error log generated by the error log generation model of the first service while the first service runs online; and acquiring the error information of the first service according to the error log.
Optionally, in yet another possible implementation manner of the second aspect, the apparatus further includes an application module, configured to:
before the verification module acquires the error configuration information corresponding to the first service according to the verification instruction, receiving a position identifier input by a user for the first service and an error identifier corresponding to the position identifier; and acquiring or updating the error configuration information corresponding to the first service according to the position identifier and the error identifier corresponding to the position identifier.
According to a third aspect of the present invention, there is provided an abnormal service monitoring system, comprising: a memory, at least one processor and at least one computer program, wherein the at least one computer program is stored in the memory, and the at least one processor runs the at least one computer program to execute the abnormal service monitoring method according to the first aspect of the present invention and the various possible designs of the first aspect.
According to a fourth aspect of the present invention, there is provided a readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements the abnormal service monitoring method according to the first aspect of the present invention and the various possible designs of the first aspect.
The embodiment of the invention provides a method, a device and a system for monitoring abnormal services, and a computer-readable storage medium. Error information of a first service, generated while the first service runs online, is acquired. Error chain information of a second service is then acquired, wherein the second service is an upper-layer service of the first service; because the second service is a preceding process of the first service, the error chain information of the second service is likely to be a cause of the error information generated by the first service. Error chain information of the first service is determined from the acquired error information of the first service and the acquired error chain information of the second service, so that all error information associated with the abnormality of the first service is collected. Finally, an abnormal service notification is sent to the user corresponding to the first service according to the error chain information of the first service, so that the user obtains not only the error information of the first service but also the abnormal conditions of the upper-layer services that may have caused it. This improves the reliability of abnormal service monitoring, helps the user corresponding to the first service resolve the abnormality of the first service more quickly, and improves the efficiency of abnormal service monitoring.
Drawings
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present invention;
Fig. 2 is a schematic flow chart of an abnormal service monitoring method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a service level sequence provided by an embodiment of the present invention;
Fig. 4 is a schematic flow chart of another abnormal service monitoring method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an abnormal service monitoring apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of another abnormal service monitoring apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of a hardware structure of an abnormal service monitoring system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and in the claims, and in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B and C are comprised; "comprises A, B or C" means that one of A, B and C is comprised; "comprises A, B and/or C" means that any one, any two, or all three of A, B and C are comprised.
It should be understood that in the present invention, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
As used herein, "if" may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context.
It should be understood that in the present invention, a "service" may be understood as a program module that can be developed separately; it may be an application program that provides a service, or an independent functional module within an application program. For example, an e-commerce application installed on a mobile phone provides e-commerce services to its users, but it can be subdivided into multiple functions that provide services independently, such as pre-sales (providing pre-sales services) and after-sales (providing after-sales services).
The term Flume refers to a distributed, reliable, and highly available system for collecting, aggregating, and transmitting massive volumes of logs. Flume supports customizing various data senders in a log system for collecting data; at the same time, Flume provides simple processing of data.
The term Kafka refers to a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website. Because of throughput requirements, such data are typically handled by processing logs and log aggregation. The purpose of Kafka is to unify online and offline message processing through the parallel loading mechanism of Hadoop, and also to provide real-time messages through a cluster.
The term Storm refers to a distributed, fault-tolerant, real-time computing system, typically consisting of a master node and a plurality of worker nodes. The master node runs a daemon named "Nimbus", which is responsible for code distribution, task assignment and fault detection. Each worker node runs a daemon named "Supervisor", which listens for assigned work and starts and stops worker processes.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present invention. The servers 1 shown in Fig. 1 are used by developers to develop, test and run a plurality of services online; once running online, the services are provided to the clients 2 through the internet or other connections. After a service goes online, developers also need to deal promptly with abnormal problems found while the service is running. One or more servers 1 may perform the method according to the embodiments of the present invention described below to monitor abnormal services and notify the persons responsible for them. For example, when data overload, a data access error or a loading timeout occurs, developers are notified so that they can optimize and adjust the code of the service in time, thereby improving the experience of the clients 2. Depending on the type of service provided, the server 1 may be an e-commerce server, a game server, an interactive system server, a data storage server, or the like; the present invention does not specifically limit the type of the server or the type of the service. The embodiment of the present invention shown in Fig. 1 may be executed by one server 1 or by a plurality of servers 1. In the latter case, the plurality of servers 1 may be a server group located in the same machine room, or may be servers that are deployed on a plurality of different development platforms and cooperate in a distributed manner.
When a service becomes abnormal, the abnormality may be caused by the service itself or by other services. For example, the commodity information pushing service generates error information indicating a push delay, but the push delay occurs because the source data of the commodity information pushing service contains errors, and that source data is produced by the commodity transaction data analysis service in the layer above. The commodity transaction data analysis service therefore needs to be adjusted or updated to resolve the push delay. In the prior art, however, each service usually has its own developer or maintainer as the responsible person; after receiving the push delay notification, the person responsible for the commodity information pushing service checks the code of the commodity information pushing service itself, which wastes a great deal of time. The abnormal service monitoring method provided by the embodiment of the invention combines the error chain information of the related upstream services with the error information of the current service and sends the combined information to the responsible person, which improves the reliability of the notification and the accuracy and efficiency of locating the cause of an abnormality during abnormal service monitoring.
Referring to fig. 2, which is a schematic flowchart of an abnormal service monitoring method according to an embodiment of the present invention, an execution main body of the method shown in fig. 2 may be a software and/or hardware device. The method shown in fig. 2 mainly includes steps S101 to S104, and specifically includes the following steps:
S101, acquiring error information of a first service, wherein the error information of the first service is generated in the process of online running of the first service.
It can be understood that when an abnormality occurs in the running process of the first service, an error log is automatically generated, and the execution main body of this embodiment collects these error logs in real time or periodically and extracts the error information therein, thereby acquiring the error information of the first service.
The error information of the first service may be acquired in a variety of ways. In an optional implementation manner, the execution main body of this embodiment may include a data acquisition subsystem dedicated to acquiring the error information of the first service. The data acquisition subsystem monitors each running service for error logs, and once it detects that a service has generated an error log, it reads the error log through Flume. Flume, which is responsible for collecting the error logs, can also aggregate and transmit the logs in a distributed manner. Acting as a producer, Flume imports the collected error logs of the first service into a Kafka cluster, and Kafka receives the massive log information in real time in a distributed manner. The error information produced from the error logs is then transmitted to a Storm cluster for real-time processing. Storm is a distributed system that can process the messages produced by Kafka in real time, and Kafka plus Storm is a high-performance real-time data processing mechanism in big data processing. The error information consumed by Storm is first stored in HBase; error information older than 7 days is transferred from HBase to HDFS for long-term archiving. The data stored in HBase is the acquired error information of the first service.
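The storage tiering at the end of this pipeline can be sketched in isolation. The snippet below does not call Flume, Kafka, Storm, HBase or HDFS; it only illustrates the 7-day rule with plain Python values, and the record layout `(timestamp, message)` and the function name `split_by_age` are assumptions made for illustration.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # threshold taken from the description above


def split_by_age(records, now):
    """Split (timestamp, message) records into a hot tier and an archive tier.

    Records older than RETENTION correspond to data that would be moved to
    long-term storage; the rest stay in the store used for recent data.
    """
    hot, archive = [], []
    for ts, msg in records:
        (archive if now - ts > RETENTION else hot).append((ts, msg))
    return hot, archive
```

In the described system, the hot tier corresponds to HBase and the archive tier to HDFS.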
S102, obtaining error chain information of a second service, wherein the second service is an upper layer service of the first service.
It can be understood that, before the error chain information of the second service is acquired, the second service is selected from the upper-layer services of the first service.
There are many ways to determine the second service. In an alternative implementation, the second service may be understood as all services hierarchically above the first service. In this case, all services above the first service are determined as the second service according to a preset service level order; at least one piece of error information generated by the second service is acquired; and the error chain information of the second service is obtained from the at least one piece of error information generated by the second service. For example, each service saves only the error information it generates itself, until the error is resolved. Then, when a service generates error information, it only needs to traverse the error information generated by each of its upper-layer services and combine that error information to obtain the error chain information of the second service.
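A minimal sketch of this traverse-all variant follows. The service names, the `SERVICE_ORDER` list and the shape of `current_errors` are hypothetical; the preset service level order is assumed to be a simple list with the topmost service first.

```python
# Hypothetical preset service level order, topmost service first.
SERVICE_ORDER = ["A", "B", "C", "D", "E", "F"]


def upper_error_chain(first_service, current_errors):
    """Collect the errors of every service above first_service, in level order.

    current_errors maps a service name to its unresolved error message;
    services without an unresolved error are simply absent from the mapping.
    """
    idx = SERVICE_ORDER.index(first_service)
    return [
        (svc, current_errors[svc])
        for svc in SERVICE_ORDER[:idx]
        if svc in current_errors
    ]
```

Every lookup walks all upper-layer services, which keeps per-service storage small at the cost of a full traversal each time an error occurs.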
In another alternative implementation, the second service may be understood as a single service. Specifically, the latest upper layer service generating error information may be sequentially queried from the first service upwards according to a preset service level order; and then determining the nearest upper layer service as a second service. Each service creates and stores an error chain message when it generates an error message, and the error chain message is not canceled until the error is removed. Therefore, when each service generates error information, the service can establish the own error link information only by searching the upper-layer service which has the error information and is closest to the service, and the service does not need to traverse all the upper-layer services, so that the efficiency of establishing the error link information is improved. The preset service level sequence may be a level sequence that specifies different services according to a time sequence or implementation logic of a data processing flow. For example, service B is to process output data of service a, and it can be seen that the implementation of service B is based on the output of service a, the hierarchy of service a is before service B. Then, the first service is used as a starting point to judge whether the error information is generated or not by carrying out hierarchy-by-hierarchy judgment, and the upper-layer service which has the error information and is closest to the hierarchy of the first service is used as a second service.
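The upward search for the nearest upper-layer service with error information can be sketched as follows. The sketch assumes the level sequence is an ordered list and that stored error chains are kept in a dictionary keyed by service name; both are illustrative assumptions, as is the function name.

```python
def nearest_upper_with_error(order, chains, first):
    """Walk upwards from `first` and return the closest upper-layer service
    that currently holds error chain information, or None if there is none."""
    idx = order.index(first)
    # reversed() visits the service directly above `first` before any other.
    for svc in reversed(order[:idx]):
        if chains.get(svc):
            return svc
    return None


level_order = ["A", "B", "C", "D", "E", "F"]
# Error chains stored by the services that currently have errors (Fig. 3).
stored_chains = {
    "C": "5138/513G/5131",
    "D": "5138/513G/5131/51384/513F9",
}
second_service = nearest_upper_with_error(level_order, stored_chains, "E")
```

Because the search stops at the first hit, the cost depends on the distance to the nearest faulty upstream service rather than on the total number of upper-layer services.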
Fig. 3 is a schematic diagram of a service level sequence according to an embodiment of the present invention. Fig. 3 shows 6 services with a sequential hierarchical relationship: service A, service B, service C, service D, service E and service F. The 6 services may belong to an e-commerce service application. The e-commerce service application may be an application program on a mobile phone terminal, website information in a browser, or a system in a dedicated platform device. The e-commerce service application may include various hierarchical relationships. In the hierarchical relationship shown in Fig. 3, if service E is the first service, only service C and service D among its upper-layer services have generated error information. In Fig. 3, the error information 31 of service C is illustrated by a triangle; the error information 32 of service D is illustrated by a square, and the error chain information 321 of service D is illustrated by a triangle and a square connected in sequence; the error information 33 of service E is illustrated by a circle; and the error chain information 331 of service E is illustrated by a triangle, a square and a circle connected in sequence. Then, if service D is taken as the second service, the error chain information 321 of service D may be used directly as the error chain information of the second service; if service C and service D are taken as the second service, or service A, service B, service C and service D are taken as the second service, the error information 31 of service C and the error information 32 of service D are obtained and combined to obtain the error chain information of the second service.
S103, determining the error chain information of the first service according to the error information of the first service and the error chain information of the second service.
It can be understood that the error information of the first service and the error chain information of the second service are combined, spliced or extracted to obtain the error chain information of the first service. The error chain information of the first service should contain error information about the upstream service in addition to the error information of the first service itself.
The error chain information of the first service may be obtained in multiple ways. In an optional implementation, the error information of the first service may be spliced directly after the error chain information of the second service to obtain the error chain information of the first service. Continuing with the example shown in Fig. 3, suppose that each piece of error information is identified by an error code, the error information 31 of service C is 5138/513G/5131, the error information 32 of service D is 51384/513F9, and the error chain information 321 of the second service is 5138/513G/5131/51384/513F9. If the error information of the first service is 51384Y, the acquired error chain information of the first service is 5138/513G/5131/51384/513F9/51384Y. Each error code may be preset by the system for each error type, or may be applied for and preset in advance by a user (e.g., a developer) for a detection location during the service code writing stage.
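A minimal sketch of this direct splicing, using the error codes of the example above. The function name and the choice of "/" as the separator are illustrative assumptions based on how the chains are written in the text.

```python
def splice_chain(second_chain, first_error, sep="/"):
    """Append the first service's error code after the second service's chain."""
    if not second_chain:
        # No upstream errors: the chain is just the service's own error.
        return first_error
    return f"{second_chain}{sep}{first_error}"


chain_321 = "5138/513G/5131/51384/513F9"   # error chain of the second service
spliced = splice_chain(chain_321, "51384Y")
```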
In another optional implementation, the error chain information of the second service may first be parsed to obtain an error information set corresponding to the second service, where the error information set includes the error information generated by the second service and the error information generated by the upper-layer services of the second service. Assuming that the error chain information of the second service is a queue of the error information of the second service and its upper-layer services, each piece of error information is parsed out of the queue. In the above example, in which error codes are used as the error information, 5138, 513G, 5131, 51384 and 513F9 are obtained by parsing 5138/513G/5131/51384/513F9. Then, according to a preset error causal relationship, the upper-layer error information that has a causal relationship with the error information of the first service is determined in the error information set corresponding to the second service. The preset error causal relationship is, for example, a logical cause-and-effect relationship determined from the data processing flow. For example, when the uploading of a commodity image fails in the data storage service, an error that the commodity cannot be displayed may occur in the subsequent commodity display service, so the error information "the uploading of the commodity image fails" and the error information "the commodity cannot be displayed" have a causal relationship. In the above example, the error chain information 321 of the second service is 5138/513G/5131/51384/513F9, in which 5138 and 51384 are the upper-layer error information having a causal relationship with the error information 51384Y of the first service. Finally, the error chain information of the first service is determined according to the error information of the first service and the upper-layer error information.
It can be understood that the error information of the first service and the upper-layer error information are combined in sequence according to the error causal relationship, so that together they form the error chain information of the first service. For example, the acquired error chain information of the first service is 5138/51384/51384Y. The error chain information of the first service obtained according to the error causal relationship indicates the cause of the service error more clearly, which further improves the reliability of the subsequent abnormal service notification and thus the efficiency of abnormality monitoring.
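The causal filtering can be sketched as follows, under the assumption that the preset error causal relationship is available as a table mapping each error code to its direct causes. The table contents, the function name and the "/" separator are hypothetical; only the error codes and the resulting chain come from the example in the text.

```python
# Hypothetical preset causal table: effect code -> set of direct cause codes.
CAUSES = {
    "51384Y": {"51384"},
    "51384": {"5138"},
}


def causal_chain(second_chain, first_error, causes, sep="/"):
    """Keep only the upstream errors causally linked to first_error, in their
    original order, then append first_error itself."""
    upstream = second_chain.split(sep) if second_chain else []
    # Collect direct and indirect causes of first_error.
    relevant = set()
    frontier = [first_error]
    while frontier:
        code = frontier.pop()
        for cause in causes.get(code, ()):
            if cause not in relevant:
                relevant.add(cause)
                frontier.append(cause)
    kept = [code for code in upstream if code in relevant]
    return sep.join(kept + [first_error])


filtered = causal_chain("5138/513G/5131/51384/513F9", "51384Y", CAUSES)
```

With this table the unrelated codes 513G, 5131 and 513F9 are dropped, which reproduces the chain 5138/51384/51384Y given in the example.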
And S104, sending an abnormal service notification to a user corresponding to the first service according to the error chain information of the first service.
The abnormal service notification may be sent by telephone voice prompt, short message, mail or system prompt. The user corresponding to the first service may be an actual developer and/or an administrator. In the service level sequence, because a lower-layer service implements its functions based on the processing results of upper-layer services, the lower the layer of a service, the more services the error chain information obtained by its users may cover; it can be understood that the users corresponding to a lower-layer service have broader permissions and need to consider more causes of abnormality. In an optional implementation, a notification mode and a user to be notified corresponding to the first service may be determined according to the importance level of the first service. An abnormal service notification is then generated according to the error chain information of the first service. Finally, the abnormal service notification is sent to the user to be notified in the notification mode. For example, if the first service is a preference information push service, which may not affect the main functions of the e-commerce application, an abnormal service notification may be sent to the maintainer of the preference information push service in a weak notification mode (e.g., a system message); if the first service is a goods interface display service, which may directly affect the main functions of the e-commerce application, a strong notification mode (e.g., a telephone voice prompt) is used to send the abnormal service notification to the maintainer, the developer and the supervisor of the goods interface display service.
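A minimal sketch of choosing the notification mode and recipients by importance level. The two importance levels, the channel names, the recipient roles and the message format are assumptions modelled on the weak and strong notification examples above; no real messaging interface is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotifyPolicy:
    channel: str        # how the notification is delivered
    recipients: tuple   # which roles are notified


# Hypothetical mapping from importance level to notification policy.
POLICIES = {
    "low": NotifyPolicy("system_message", ("maintainer",)),
    "high": NotifyPolicy("phone_voice", ("maintainer", "developer", "supervisor")),
}


def build_notification(service, importance, error_chain):
    """Pick the policy for the importance level and build the notification."""
    policy = POLICIES[importance]
    return {
        "channel": policy.channel,
        "recipients": list(policy.recipients),
        "body": f"Service {service} abnormal; error chain: {error_chain}",
    }


weak = build_notification("preference_push", "low", "51390")
strong = build_notification("goods_display", "high", "5138/51384/51384Y")
```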
This embodiment provides an abnormal service monitoring method. First, error information of a first service is obtained, that is, the error information generated by the first service during online running. Then, error chain information of a second service is obtained, where the second service is an upper-layer service of the first service; because the second service is a preceding process of the first service, the error chain information of the second service is likely to be the cause of the error information generated by the first service. Next, the error chain information of the first service is determined based on the obtained error information of the first service and the error chain information of the second service, thereby collecting all error information associated with the abnormality of the first service. Finally, an abnormal service notification is sent to the user corresponding to the first service according to the error chain information of the first service, so that the user obtains not only the error information of the first service but also the abnormal conditions of the upper-layer services that may have caused it. This improves the reliability of abnormal service monitoring, helps the user corresponding to the first service resolve the abnormality more quickly, and improves the efficiency of abnormal service monitoring.
Referring to fig. 4, which is a schematic flow chart of another abnormal service monitoring method provided in the embodiment of the present invention, in order to more clearly describe various implementation manners of the present invention, in an alternative implementation manner, based on the embodiment of the method shown in fig. 2 and various possible implementation manners thereof, a service verification process shown in fig. 4 may be further included before step S101 (obtaining error information of the first service). The method shown in fig. 4 mainly includes steps S201 to S203, and specifically includes the following steps:
S201, receiving a verification instruction input by a user and data to be verified of the first service.
It is understood that, during the development test or online approval process, a user (e.g., a developer) inputs a verification instruction and data to be verified of the first service (e.g., a code of the first service), and a verification process is initiated in the system.
S202, acquiring error configuration information corresponding to the first service according to the verification instruction, wherein the error configuration information comprises a location identifier and an error identifier corresponding to the location identifier.
It can be understood that, before the user initiates the verification process, or before the system starts verification, the system first carries out an application process for the error configuration information of the first service. Specifically, a location identifier input by the user for the first service and an error identifier corresponding to the location identifier may be received, and the error configuration information corresponding to the first service is acquired or updated according to the location identifier and the corresponding error identifier. The user (for example, a developer) needs to apply to register the error configuration information to be configured in the code of the first service, so that it is entered into the system in advance and is available as a record for subsequent verification and for identifying possible error information. For example, a user (for example, a developer) needs to place the code of an error log generation model at line 530 of the service code, so that a corresponding error log is generated when a commodity image is displayed incorrectly. In the application process, the user then needs to apply for error configuration information covering the error type targeted by the error log generation model and the location at which the model is configured, for example, "error code: 51384Y; image display error; 530". Because the error configuration information of each service is pre-stored during the application process, the system can acquire the error configuration information pre-stored for the first service when it detects, during verification, that the first service is to be verified.
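The application process can be sketched as a small registry keyed by service and location identifier. The registry layout, the function name and the service name are illustrative assumptions; the error code, description and line number are those of the example above.

```python
def register_error_config(registry, service, location_id, error_id, description=""):
    """Create or update the error configuration entry for one location.

    `registry` maps service name -> {location_id: entry}; registering the
    same location again overwrites the earlier entry (the "update" case).
    """
    entries = registry.setdefault(service, {})
    entries[location_id] = {"error_id": error_id, "description": description}
    return entries


config_registry = {}
# Register the example from the text: error code 51384Y at line 530.
register_error_config(config_registry, "goods_display", 530, "51384Y")
# A second application for the same location updates the stored entry.
register_error_config(
    config_registry, "goods_display", 530, "51384Y", "image display error"
)
```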
S203, if an error log generation model matching the error identifier is found at the location indicated by each location identifier in the data to be verified of the first service, obtaining a result that the first service passes verification.
It can be understood that the configuration of the error log generation models in the data to be verified of the first service is compared with the pre-stored error configuration information. If the comparison result is consistent, the verification passes; if it is inconsistent, the verification fails. If the verification passes, the test is deemed qualified or the remaining online approval processes continue; if the verification fails, the inconsistent locations and the types of error log generation models whose configuration is missing may be sent to the user so that the user can correct them.
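The comparison can be sketched as follows, assuming the data to be verified is the source code of the service given as a list of lines, and that an error log generation model appears in the code as a call of the form `log_error("CODE")`. That call syntax, the function name and the sample source are hypothetical and serve only to make the check concrete.

```python
import re

# Hypothetical shape of an error log generation model in the source code.
LOG_CALL = re.compile(r'log_error\("([^"]+)"\)')


def verify_service(source_lines, error_config):
    """Compare the code with the pre-stored error configuration.

    error_config maps a 1-based line number to the expected error code.
    Returns (passed, problems), where each problem is a tuple
    (line_number, expected_code, found_code_or_None).
    """
    problems = []
    for line_no, expected in sorted(error_config.items()):
        in_range = 0 < line_no <= len(source_lines)
        text = source_lines[line_no - 1] if in_range else ""
        match = LOG_CALL.search(text)
        found = match.group(1) if match else None
        if found != expected:
            problems.append((line_no, expected, found))
    return (not problems, problems)


sample_source = ["image = load()", 'log_error("51384Y")', "render(image)"]
passed, issues = verify_service(sample_source, {2: "51384Y"})
failed, failed_issues = verify_service(sample_source, {3: "51384Y"})
```

The list of problems corresponds to the inconsistent locations and missing model types that may be reported back to the user when verification fails.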
In the embodiment shown in Fig. 4, step S101 of the foregoing embodiment (obtaining error information of a first service, where the error information of the first service is generated during online running of the first service) may correspondingly include: acquiring an error log generated by an error log generation model of the first service during online running of the first service; and acquiring the error information of the first service according to the error log. When the first service is abnormal, the error log generation model configured in the first service generates an error log, and the system extracts the error information of the first service from the error log.
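Extracting error information from an error log can be sketched with a regular expression. The log line format below is purely an assumption made for illustration, since the text does not specify the format of the error log.

```python
import re

# Hypothetical log format:
#   <timestamp> ERROR [service=<name>] code=<error code> msg=<description>
LOG_PATTERN = re.compile(
    r"ERROR \[service=(?P<service>\w+)\] code=(?P<code>\w+) msg=(?P<message>.*)"
)


def extract_error_info(log_line):
    """Return the error information found in one log line, or None."""
    match = LOG_PATTERN.search(log_line)
    return match.groupdict() if match else None


sample_line = "2018-08-30 12:00:00 ERROR [service=E] code=51384Y msg=image display error"
error_info = extract_error_info(sample_line)
no_error = extract_error_info("2018-08-30 12:00:01 INFO service started")
```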
In the embodiment, by pre-storing the error configuration information in the application process, the error configuration information can be automatically checked and approved before the service is online, so that the error log generation model is correctly configured in each service, and the reliability of monitoring the abnormal service can be improved.
Referring to Fig. 5, which is a schematic structural diagram of an abnormal service monitoring apparatus according to an embodiment of the present invention, the abnormal service monitoring apparatus 50 shown in Fig. 5 mainly includes the following modules:
The obtaining module 51 is configured to obtain error information of a first service, where the error information of the first service is generated during online running of the first service.
The query module 52 is configured to obtain error chain information of a second service, where the second service is an upper layer service of the first service.
A link establishing module 53, configured to determine error link information of the first service according to the error information of the first service and the error link information of the second service.
And a notification module 54, configured to send an abnormal service notification to a user corresponding to the first service according to the error chain information of the first service.
The abnormal service monitoring apparatus in the embodiment shown in fig. 5 may be correspondingly used to execute the steps in the method embodiment shown in fig. 2, and the implementation principle and the technical effect are similar, which are not described herein again.
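The cooperation of the four modules can be sketched as a small class that wires four callables together in the order of steps S101 to S104. The class name, the constructor parameters and the stub callables in the usage lines are illustrative assumptions, not the structure of an actual implementation.

```python
class AbnormalServiceMonitor:
    """Wires together the obtaining, query, link establishing and
    notification modules as plain callables (illustrative only)."""

    def __init__(self, obtain, query, build_chain, notify):
        self.obtain = obtain            # module 51: error info of the first service
        self.query = query              # module 52: error chain of the second service
        self.build_chain = build_chain  # module 53: error chain of the first service
        self.notify = notify            # module 54: abnormal service notification

    def run(self, service):
        error = self.obtain(service)
        upper_chain = self.query(service)
        chain = self.build_chain(error, upper_chain)
        return self.notify(service, chain)


# Stub callables standing in for the real modules.
monitor = AbnormalServiceMonitor(
    obtain=lambda svc: "51384Y",
    query=lambda svc: "5138/51384",
    build_chain=lambda err, upper: f"{upper}/{err}",
    notify=lambda svc, chain: f"notify users of {svc}: {chain}",
)
result = monitor.run("E")
```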
Optionally, the link establishing module 53 is specifically configured to:
analyzing error chain information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper-layer service of the second service; according to a preset error causal relationship, determining upper-layer error information having a causal relationship with the error information of the first service in an error information set corresponding to the second service; and determining the error chain information of the first service according to the error information of the first service and the upper-layer error information.
Optionally, the link establishing module 53 is specifically configured to:
analyzing error chain information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper-layer service of the second service; according to a preset error causal relationship, determining upper-layer error information having a causal relationship with the error information of the first service in an error information set corresponding to the second service; and sequentially combining the error information of the first service with the upper-layer error information according to the error cause-effect relationship to obtain the error chain information of the first service.
Optionally, before the obtaining the error chain information of the second service, the query module 52 is further configured to:
according to a preset service level sequence, sequentially inquiring the nearest upper-layer service generating error information from the first service upwards; and determining the nearest upper layer service as a second service.
Optionally, the notification module 54 is specifically configured to:
determining a notification mode corresponding to the first service and a user to be notified according to the importance level of the first service; generating an abnormal service notification according to the error chain information of the first service; and sending the abnormal service notification to the user to be notified in the notification mode.
Fig. 6 is a schematic structural diagram of another abnormal service monitoring apparatus according to an embodiment of the present invention. On the basis of the foregoing embodiment, the abnormal service monitoring apparatus 50 may further include a checking module 55, configured to:
before the obtaining module obtains the error information of the first service, receive a verification instruction input by a user and data to be verified of the first service; acquire error configuration information corresponding to the first service according to the verification instruction, wherein the error configuration information comprises a location identifier and an error identifier corresponding to the location identifier; and if an error log generation model matching the error identifier is found at the location indicated by each location identifier in the data to be verified of the first service, obtain a result that the first service passes verification.
Correspondingly, the obtaining module 51 is specifically configured to:
acquire an error log generated by an error log generation model of the first service during online running of the first service; and acquire the error information of the first service according to the error log.
With continued reference to the structure of the abnormal service monitoring apparatus shown in Fig. 6, optionally, an application module 56 may be further included, configured to:
before the checking module acquires the error configuration information corresponding to the first service according to the verification instruction, receive a location identifier input by a user for the first service and an error identifier corresponding to the location identifier; and acquire or update the error configuration information corresponding to the first service according to the location identifier and the error identifier corresponding to the location identifier.
The abnormal service monitoring apparatus in the embodiment shown in fig. 6 may be correspondingly used to execute the steps in the method embodiment shown in fig. 4, and the implementation principle and the technical effect are similar, which are not described herein again.
Referring to Fig. 7, which is a schematic diagram of a hardware structure of an abnormal service monitoring system according to an embodiment of the present invention, the abnormal service monitoring system 60 includes: a memory 62, at least one processor 61, and at least one computer program, wherein:
The memory 62 is configured to store the at least one computer program, and may also be a flash memory (flash). The computer program is, for example, an application program, a functional module, or the like that implements the above method.
The processor 61 is configured to execute the at least one computer program stored in the memory, so as to implement each step of the above abnormal service monitoring method. Reference may be made to the description of the foregoing method embodiments for details.
Alternatively, the memory 62 may be separate or integrated with the processor 61.
When the memory 62 is a device independent of the processor 61, the abnormal service monitoring system 60 may further include:
a bus 63 for connecting the memory 62 and the processor 61. The terminal of Fig. 7 may further include a transmitter (not shown) for transmitting the abnormal service notification generated by the processor 61 to the user corresponding to the first service.
The specific implementation manner of the abnormal service monitoring system may be implemented by a server or a terminal, which is not limited in the embodiments of the present invention.
The present invention also provides a readable storage medium in which a computer program is stored, and the computer program, when executed by a processor, implements the abnormal service monitoring method provided by the above embodiments.
The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general purpose or special purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Additionally, the ASIC may reside in user equipment. Of course, the processor and the readable storage medium may also reside as discrete components in a communication device.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. At least one processor of a device may read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the device to implement the abnormal service monitoring method provided in the above embodiments.
In the embodiment of the system, it should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (16)

1. An abnormal service monitoring method is characterized by comprising the following steps:
acquiring error information of a first service, wherein the error information of the first service is generated in the online running process of the first service;
acquiring error chain information of a second service, wherein the second service is an upper-layer service of the first service;
determining error chain information of the first service according to the error information of the first service and the error chain information of the second service;
and sending an abnormal service notification to a user corresponding to the first service according to the error chain information of the first service.
2. The method of claim 1, wherein the determining the error chain information of the first service according to the error information of the first service and the error chain information of the second service comprises:
analyzing error chain information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper-layer service of the second service;
according to a preset error causal relationship, determining upper-layer error information having a causal relationship with the error information of the first service in an error information set corresponding to the second service;
and determining the error chain information of the first service according to the error information of the first service and the upper-layer error information.
3. The method of claim 2, wherein the determining the error chain information of the first service according to the error information of the first service and the upper-layer error information comprises:
and sequentially combining the error information of the first service with the upper-layer error information according to the error cause-effect relationship to obtain the error chain information of the first service.
4. The method according to any one of claims 1 to 3, further comprising, before said obtaining the error chain information of the second service:
according to a preset service level sequence, sequentially inquiring the nearest upper-layer service generating error information from the first service upwards;
and determining the nearest upper layer service as a second service.
5. The method according to claim 1, wherein the sending an abnormal service notification to the user corresponding to the first service according to the error chain information of the first service includes:
determining a notification mode corresponding to the first service and a user to be notified according to the importance level of the first service;
generating an abnormal service notification according to the error chain information of the first service;
and sending the abnormal service notification to the user to be notified in the notification mode.
6. The method according to claim 1 or 5, wherein before the obtaining the error information of the first service, the method further comprises:
receiving a verification instruction input by a user and data to be verified of a first service;
acquiring error configuration information corresponding to the first service according to the verification instruction, wherein the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier;
if an error log generation model matched with the error identifier is obtained at the position indicated by each position identifier in the data to be verified of the first service, obtaining a result that the first service passes verification;
correspondingly, the acquiring the error information of the first service, which is generated in the online running process of the first service, includes:
acquiring an error log generated by an error log generation model of the first service in the online running process of the first service;
and acquiring the error information of the first service according to the error log.
7. The method according to claim 6, further comprising, before the acquiring error configuration information corresponding to the first service according to the verification instruction:
receiving a position identification input by a user aiming at a first service and an error identification corresponding to the position identification;
and acquiring or updating error configuration information corresponding to the first service according to the position identifier and the error identifier corresponding to the position identifier.
8. An abnormal service monitoring apparatus, comprising:
the acquisition module is used for acquiring error information of a first service, wherein the error information of the first service is generated in the online running process of the first service;
the query module is used for acquiring error chain information of a second service, wherein the second service is an upper-layer service of the first service;
the link establishing module is used for determining the error link information of the first service according to the error information of the first service and the error link information of the second service;
and the notification module is used for sending an abnormal service notification to the user corresponding to the first service according to the error chain information of the first service.
9. The apparatus of claim 8, wherein the link establishment module is specifically configured to:
analyzing error chain information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper-layer service of the second service; according to a preset error causal relationship, determining upper-layer error information having a causal relationship with the error information of the first service in an error information set corresponding to the second service; and determining the error chain information of the first service according to the error information of the first service and the upper-layer error information.
10. The apparatus according to claim 9, wherein the link establishment module is specifically configured to:
analyzing error chain information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper-layer service of the second service; according to a preset error causal relationship, determining upper-layer error information having a causal relationship with the error information of the first service in an error information set corresponding to the second service; and sequentially combining the error information of the first service with the upper-layer error information according to the error cause-effect relationship to obtain the error chain information of the first service.
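The chain-building steps of claims 9 and 10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ErrorInfo` record, the `CAUSES` table standing in for the preset error causal relationship, and the representation of an error chain as a list ordered from root cause to final error are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorInfo:
    service: str  # service that generated the error (hypothetical field)
    code: str     # error identifier (hypothetical field)


# Hypothetical preset causal relationship: effect code -> possible cause codes.
CAUSES = {
    "ORDER_TIMEOUT": {"GATEWAY_5XX"},
    "GATEWAY_5XX": {"DB_DOWN"},
}


def build_error_chain(first_error, upper_chain, causes=CAUSES):
    """Return the first service's error chain, ordered root cause first.

    upper_chain is the second service's chain (root cause first); it holds
    errors of the second service and of that service's upper-layer services.
    """
    related = []
    wanted = set(causes.get(first_error.code, ()))
    # Walk from the nearest upper-layer error back toward the root, keeping
    # only errors causally linked to the most recently accepted error.
    for err in reversed(upper_chain):
        if err.code in wanted:
            related.append(err)
            wanted = set(causes.get(err.code, ()))
    related.reverse()
    # Combine in causal order: upper-layer causes first, then the first
    # service's own error.
    return related + [first_error]


if __name__ == "__main__":
    upper = [
        ErrorInfo("storage", "DB_DOWN"),
        ErrorInfo("cache", "CACHE_MISS"),
        ErrorInfo("gateway", "GATEWAY_5XX"),
    ]
    chain = build_error_chain(ErrorInfo("order", "ORDER_TIMEOUT"), upper)
    print(" -> ".join(e.code for e in chain))
```

Upper-layer errors with no causal link to the first service's error (the `CACHE_MISS` entry here) are left out of the resulting chain.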
11. The apparatus according to any one of claims 8 to 10, wherein the query module, before acquiring the error chain information of the second service, is further configured to:
querying upward from the first service, following a preset service level sequence, for the nearest upper-layer service that has generated error information; and determining the nearest upper-layer service as the second service.
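The upward query in claim 11 might look like the sketch below. It assumes, purely for illustration, that the preset service level sequence is a list ordered from the top layer to the bottom layer and that the set of services that have generated error information is already known; the function and parameter names are hypothetical.

```python
def find_second_service(first_service, service_levels, services_with_errors):
    """Return the nearest upper-layer service that has generated errors.

    service_levels is assumed to list services from top layer to bottom
    layer. Returns None when no upper-layer service has reported an error.
    """
    idx = service_levels.index(first_service)
    # Walk upward, starting at the layer directly above the first service.
    for candidate in reversed(service_levels[:idx]):
        if candidate in services_with_errors:
            return candidate
    return None


if __name__ == "__main__":
    levels = ["gateway", "order", "payment", "ledger"]
    print(find_second_service("ledger", levels, {"gateway", "order"}))
```

Returning `None` when nothing is found lets the caller treat the first service's error as the start of a new chain, which is one possible handling the claims leave open.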
12. The apparatus of claim 8, wherein the notification module is specifically configured to:
determining a notification mode corresponding to the first service and a user to be notified according to the importance level of the first service; generating an abnormal service notification according to the error chain information of the first service; and sending the abnormal service notification to the user to be notified in the notification mode.
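One way to picture the importance-based notification of claim 12 is a lookup table from importance level to notification mode and recipients. The levels, modes, and recipient roles in `NOTIFY_POLICY` below are invented for this sketch and do not come from the patent.

```python
# Hypothetical policy: importance level -> (notification mode, recipients).
NOTIFY_POLICY = {
    "high": ("phone", ["oncall", "owner"]),
    "medium": ("sms", ["owner"]),
    "low": ("email", ["owner"]),
}


def build_notification(service, importance, error_chain, policy=NOTIFY_POLICY):
    """Build an abnormal service notification from an error chain.

    error_chain is assumed to be a list of error identifiers, root cause
    first.
    """
    mode, recipients = policy[importance]
    message = f"[{service}] abnormal service: " + " -> ".join(error_chain)
    return {"mode": mode, "recipients": list(recipients), "message": message}


if __name__ == "__main__":
    print(build_notification("order", "high", ["DB_DOWN", "ORDER_TIMEOUT"]))
```

Actual delivery (phone call, SMS, email) is outside the scope of this sketch; it only shows how the mode and the users to be notified follow from the importance level.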
13. The apparatus of claim 8 or 12, further comprising a verification module configured to:
before the acquisition module acquires the error information of the first service, receiving a check indication input by a user and data to be verified of the first service; acquiring error configuration information corresponding to the first service according to the check indication, wherein the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier; if an error log generation model matching the corresponding error identifier is found at the position indicated by each position identifier in the data to be verified of the first service, determining that the first service passes verification;
correspondingly, the acquisition module is specifically configured to:
acquiring an error log generated by the error log generation model of the first service while the first service runs online; and acquiring the error information of the first service according to the error log.
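The verification step of claim 13 can be illustrated with a simple check over the configured positions. The data shapes are assumptions: the error configuration information is modelled as a mapping from position identifier to expected error identifier, and the data to be verified as a mapping from position identifier to the error identifiers of the log generation models found there.

```python
def verify_service(error_config, data_to_verify):
    """Check that every configured position has a matching log generator.

    error_config: position identifier -> expected error identifier.
    data_to_verify: position identifier -> error identifiers of the error
    log generation models found at that position (hypothetical shape).
    Returns (passed, missing), where missing lists unmatched pairs.
    """
    missing = [
        (position, error_id)
        for position, error_id in error_config.items()
        if error_id not in data_to_verify.get(position, ())
    ]
    return (not missing, missing)


if __name__ == "__main__":
    config = {"OrderService.create:42": "E1001", "PayClient.call:17": "E2003"}
    found = {"OrderService.create:42": ["E1001"]}
    print(verify_service(config, found))
```

Reporting the unmatched pairs alongside the pass/fail result is a design choice of this sketch, so that a failed verification indicates which positions still lack an error log generation model.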
14. The apparatus of claim 13, further comprising an application module configured to:
before the verification module acquires the error configuration information corresponding to the first service according to the check indication, receiving a position identifier input by a user for the first service and an error identifier corresponding to the position identifier; and acquiring or updating the error configuration information corresponding to the first service according to the position identifier and the error identifier corresponding to the position identifier.
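Acquiring or updating the error configuration information from user input, as in claim 14, could be as small as the following sketch. The in-memory `config_store` dictionary and the function name are assumptions for illustration; a real system would presumably persist this configuration.

```python
def update_error_config(config_store, service, entries):
    """Create or update a service's error configuration.

    config_store: service name -> {position identifier: error identifier}.
    entries: iterable of (position identifier, error identifier) pairs
    entered by the user for that service.
    """
    # setdefault creates the configuration on first use, so the same call
    # covers both "acquiring" a new configuration and "updating" an old one.
    service_config = config_store.setdefault(service, {})
    for position_id, error_id in entries:
        service_config[position_id] = error_id
    return service_config


if __name__ == "__main__":
    store = {}
    update_error_config(store, "order", [("OrderService.create:42", "E1001")])
    update_error_config(store, "order", [("OrderService.create:42", "E1002")])
    print(store)
```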
15. An abnormal service monitoring system, comprising: a memory, at least one processor and at least one computer program, wherein the at least one computer program is stored in the memory, and the at least one processor runs the at least one computer program to execute the abnormal service monitoring method according to any one of claims 1 to 7.
16. A readable storage medium, wherein a computer program is stored in the readable storage medium, and the computer program, when executed by a processor, implements the abnormal service monitoring method according to any one of claims 1 to 7.
CN201811014428.6A 2018-08-31 2018-08-31 Abnormal service monitoring method, device and system and computer readable storage medium Active CN110875832B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811014428.6A CN110875832B (en) 2018-08-31 2018-08-31 Abnormal service monitoring method, device and system and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811014428.6A CN110875832B (en) 2018-08-31 2018-08-31 Abnormal service monitoring method, device and system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110875832A (en) 2020-03-10
CN110875832B (en) 2023-05-12

Family

ID=69715440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811014428.6A Active CN110875832B (en) 2018-08-31 2018-08-31 Abnormal service monitoring method, device and system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110875832B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837013A (en) * 2021-02-02 2021-05-25 拉扎斯网络科技(上海)有限公司 Service processing method, device and equipment
CN114064387A (en) * 2020-08-07 2022-02-18 中国电信股份有限公司 Log monitoring method, system, device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103378982A (en) * 2012-04-17 2013-10-30 深圳市腾讯计算机系统有限公司 Internet business operation monitoring method and Internet business operation monitoring system
US20140136692A1 (en) * 2012-11-14 2014-05-15 International Business Machines Corporation Diagnosing distributed applications using application logs and request processing paths
CN106100913A (en) * 2016-08-25 2016-11-09 北京票之家科技有限公司 Error message alignment system and method
CN107172113A (en) * 2016-03-08 2017-09-15 阿里巴巴集团控股有限公司 Treating method and apparatus when service call is abnormal
CN107301125A (en) * 2017-06-19 2017-10-27 广州华多网络科技有限公司 A kind of method, device and electronic equipment for finding root mistake

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064387A (en) * 2020-08-07 2022-02-18 中国电信股份有限公司 Log monitoring method, system, device and computer readable storage medium
CN112837013A (en) * 2021-02-02 2021-05-25 拉扎斯网络科技(上海)有限公司 Service processing method, device and equipment
CN112837013B (en) * 2021-02-02 2023-08-11 拉扎斯网络科技(上海)有限公司 Service processing method, device and equipment

Also Published As

Publication number Publication date
CN110875832B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
US9329982B2 (en) Deployment pattern monitoring
CN111913818B (en) Method for determining dependency relationship between services and related device
US10169203B2 (en) Test simulation for software defined networking environments
US10644973B2 (en) Monitoring of availability data for system management environments
EP3239840B1 (en) Fault information provision server and fault information provision method
CN111327647B (en) Method and device for providing service to outside by container and electronic equipment
US11061669B2 (en) Software development tool integration and monitoring
US8914798B2 (en) Production control for service level agreements
CN110851471A (en) Distributed log data processing method, device and system
CN112313627A (en) Mapping mechanism of events to serverless function workflow instances
CN113377626A (en) Visual unified alarm method, device, equipment and medium based on service tree
CN110875832A (en) Abnormal service monitoring method, device and system and computer readable storage medium
CN113656252B (en) Fault positioning method, device, electronic equipment and storage medium
US11063946B2 (en) Feedback framework
CN108111343B (en) Method and equipment for realizing terminal monitoring based on cloud platform and computer storage medium
CN110618943B (en) Security service test method and device, electronic equipment and readable storage medium
CN111835566A (en) System fault management method, device and system
US20150193496A1 (en) Indoor positioning service scanning with trap enhancement
US10928986B1 (en) Transaction visibility frameworks implemented using artificial intelligence
CN114338494B (en) Service dependency topological relation obtaining method and device, storage medium and electronic equipment
US11941564B2 (en) Event monitoring with support system integration
CN114841648B (en) Material distribution method, device, electronic equipment and medium
CN109450700B (en) Visual service detection method and device
CN116743553A (en) Mirror image testing method and device
CN117692499A (en) Distributed monitoring management method, system, equipment and medium for operation and maintenance service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant