CN110875832B - Abnormal service monitoring method, device and system and computer readable storage medium - Google Patents


Info

Publication number
CN110875832B
CN110875832B (application CN201811014428.6A)
Authority
CN
China
Prior art keywords
service
error
information
error information
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811014428.6A
Other languages
Chinese (zh)
Other versions
CN110875832A (en)
Inventor
张凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201811014428.6A
Publication of CN110875832A
Application granted
Publication of CN110875832B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; post-processing of notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0677 Localisation of faults

Abstract

The invention provides an abnormal service monitoring method, device and system and a computer readable storage medium. Error information of a first service, generated while the first service runs online, is first acquired. Error link information of a second service is then acquired, where the second service is an upper layer service of the first service; because the second service is an upstream stage preceding the first service, its error link information may be the cause of the error information generated by the first service. The error link information of the first service is determined from the acquired error information of the first service and the error link information of the second service, so that all error information relevant to the abnormality of the first service is gathered. Finally, an abnormal service notification is sent to the user corresponding to the first service according to the error link information of the first service. This improves the reliability of abnormal service monitoring, helps the user corresponding to the first service resolve the abnormality of the first service more quickly, and improves the efficiency of abnormal service monitoring.

Description

Abnormal service monitoring method, device and system and computer readable storage medium
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a method, an apparatus, a system, and a computer readable storage medium for monitoring abnormal services.
Background
With the continuous development of the internet and the constant evolution of business demands, internet services have become increasingly diverse, and a single internet service is usually realized by the cooperation of multiple businesses with different functions. For example, internet e-commerce is a typical internet service that generally comprises a series of businesses with a sequential hierarchical relationship: a commodity buying and selling data analysis service, a commodity information pushing service, a commodity detail display service, an order generation service, a logistics inquiry service and an after-sale information maintenance service. To improve the user experience, it is necessary to monitor and manage abnormalities for each sub-service, especially the user-facing core services that carry the main functions.
In the existing abnormal service monitoring method, a developer generally configures abnormality monitoring for the service he or she is responsible for when the service goes online, and is notified to handle the abnormality once the service behaves abnormally.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problem: during abnormal service monitoring, the cause of an abnormality has to be determined by manual investigation, which makes the investigation time-consuming and inefficient and therefore lowers the monitoring efficiency for abnormal services.
Disclosure of Invention
The embodiment of the invention provides an abnormal service monitoring method, an abnormal service monitoring device, an abnormal service monitoring system and a computer readable storage medium, which improve the monitoring efficiency and the reliability of abnormal service.
According to a first aspect of the present invention, there is provided an abnormal service monitoring method, including:
acquiring error information of a first service, wherein the error information of the first service is generated in an on-line operation process of the first service;
acquiring error link information of a second service, wherein the second service is an upper layer service of the first service;
determining error link information of the first service according to the error information of the first service and the error link information of the second service;
and sending abnormal service notification to the user corresponding to the first service according to the error link information of the first service.
Optionally, in a possible implementation manner of the first aspect, the determining the error link information of the first service according to the error information of the first service and the error link information of the second service includes:
analyzing the error link information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper layer service of the second service;
determining, according to a preset error causal relationship, upper layer error information in the error information set corresponding to the second service that has a causal relationship with the error information of the first service;
and determining error link information of the first service according to the error information of the first service and the upper layer error information.
Optionally, in another possible implementation manner of the first aspect, the determining, according to the error information of the first service and the upper layer error information, error link information of the first service includes:
and sequentially combining the error information of the first service with the upper layer error information according to the error causal relationship, so as to obtain the error link information of the first service.
Optionally, in still another possible implementation manner of the first aspect, before the acquiring the error link information of the second service, the method further includes:
querying upward from the first service, in a preset service hierarchy order, for the nearest upper layer service that has generated error information;
and determining the nearest upper layer service as the second service.
Optionally, in a further possible implementation manner of the first aspect, the sending, according to the error link information of the first service, an abnormal service notification to a user corresponding to the first service includes:
Determining a notification mode and a user to be notified corresponding to the first service according to the importance level of the first service;
generating an abnormal service notification according to the error link information of the first service;
and sending the abnormal service notification to the user to be notified in the notification mode.
Optionally, in a further possible implementation manner of the first aspect, before the acquiring the error information of the first service, the method further includes:
receiving a verification instruction input by a user and data to be verified of a first service;
obtaining error configuration information corresponding to the first service according to the verification indication, wherein the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier;
if an error log generation model matched with the error identifier is obtained at the position indicated by each position identifier in the data to be verified of the first service, obtaining a result that the first service is verified;
correspondingly, the obtaining the error information of the first service, where the error information of the first service is generated in the on-line running process of the first service, includes:
during the online operation of the first service, acquiring an error log generated by the error log generation model of the first service;
And acquiring error information of the first service according to the error log.
Optionally, in a further possible implementation manner of the first aspect, before the obtaining, according to the check indication, error configuration information corresponding to the first service, the method further includes:
receiving a position identifier input by a user aiming at a first service and an error identifier corresponding to the position identifier;
and acquiring or updating error configuration information corresponding to the first service according to the position identifier and the error identifier corresponding to the position identifier.
According to a second aspect of the present invention, there is provided an abnormal service monitoring apparatus, comprising:
an acquisition module, configured to acquire error information of a first service, wherein the error information of the first service is generated during the online operation of the first service;
the query module is used for acquiring error link information of a second service, wherein the second service is an upper-layer service of the first service;
the link establishment module is used for determining the error link information of the first service according to the error information of the first service and the error link information of the second service;
and the notification module is used for sending abnormal service notification to the user corresponding to the first service according to the error link information of the first service.
Optionally, in one possible implementation manner of the second aspect, the chain building module is specifically configured to:
analyzing the error link information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper layer service of the second service; determining, according to a preset error causal relationship, upper layer error information in the error information set corresponding to the second service that has a causal relationship with the error information of the first service; and determining error link information of the first service according to the error information of the first service and the upper layer error information.
Optionally, in another possible implementation manner of the second aspect, the chain building module is specifically configured to:
analyzing the error link information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper layer service of the second service; determining, according to a preset error causal relationship, upper layer error information in the error information set corresponding to the second service that has a causal relationship with the error information of the first service; and sequentially combining the error information of the first service with the upper layer error information according to the error causal relationship to obtain the error link information of the first service.
Optionally, in still another possible implementation manner of the second aspect, before the acquiring the error link information of the second service, the query module is further configured to:
querying upward from the first service, in a preset service hierarchy order, for the nearest upper layer service that has generated error information; and determining the nearest upper layer service as the second service.
Optionally, in a further possible implementation manner of the second aspect, the notification module is specifically configured to:
determining a notification mode and a user to be notified corresponding to the first service according to the importance level of the first service; generating an abnormal service notification according to the error link information of the first service; and sending the abnormal service notification to the user to be notified in the notification mode.
Optionally, in a further possible implementation manner of the second aspect, a verification module is further included, where the verification module is configured to:
before the acquisition module acquires the error information of the first service, receiving a verification instruction input by a user and data to be verified of the first service; obtaining error configuration information corresponding to the first service according to the verification indication, wherein the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier; if an error log generation model matched with the error identifier is obtained at the position indicated by each position identifier in the data to be verified of the first service, obtaining a result that the first service is verified;
Correspondingly, the acquisition module is specifically configured to:
during the online operation of the first service, acquiring an error log generated by the error log generation model of the first service; and acquiring the error information of the first service according to the error log.
Optionally, in a further possible implementation manner of the second aspect, the method further includes an application module, configured to:
before the verification module acquires error configuration information corresponding to the first service according to the verification instruction, receiving a position identifier input by a user aiming at the first service and an error identifier corresponding to the position identifier; and acquiring or updating error configuration information corresponding to the first service according to the position identifier and the error identifier corresponding to the position identifier.
According to a third aspect of the present invention, there is provided an abnormal service monitoring system comprising: a memory, at least one processor and at least one computer program, wherein the at least one computer program is stored in the memory, and the at least one processor runs the at least one computer program to execute the abnormal service monitoring method according to the first aspect and its various possible designs.
According to a fourth aspect of the present invention, there is provided a readable storage medium in which a computer program is stored, the computer program, when executed by a processor, implementing the abnormal service monitoring method according to the first aspect and its various possible designs.
The embodiment of the invention provides an abnormal service monitoring method, device and system and a computer readable storage medium. Error information of a first service, generated while the first service runs online, is acquired; error link information of a second service is then acquired, where the second service is an upper layer service of the first service and, as an upstream stage preceding the first service, may be the cause of the error information generated by the first service; the error link information of the first service is determined from the acquired error information of the first service and the error link information of the second service, so that all error information relevant to the abnormality of the first service is gathered; finally, an abnormal service notification is sent to the user corresponding to the first service according to the error link information of the first service. In this way the user corresponding to the first service obtains not only the error information of the first service but also the abnormal conditions of the upper layer services that may have caused the abnormality, which improves the reliability of abnormal service monitoring, helps that user resolve the abnormality of the first service more quickly, and improves the efficiency of abnormal service monitoring.
Drawings
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present invention;
fig. 2 is a schematic flow chart of an abnormal service monitoring method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a business hierarchy order provided by an embodiment of the present invention;
FIG. 4 is a schematic flow chart of another abnormal service monitoring method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an abnormal service monitoring device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of another abnormal service monitoring apparatus according to an embodiment of the present invention;
fig. 7 is a schematic hardware structure of an abnormal service monitoring system according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.
It should be understood that, in the various embodiments of the present invention, the sequence numbers of the processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present invention.
It should be understood that in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, system, article or apparatus that comprises a list of steps or elements is not limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article or apparatus.
It should be understood that in the present invention, "plurality" means two or more. "And/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "Comprising A, B and C" and "comprising A, B, C" mean that all of A, B and C are included; "comprising A, B or C" means that one of A, B and C is included; and "comprising A, B and/or C" means that any one, any two, or all three of A, B and C are included.
It should be understood that in the present invention, "B corresponding to A", "B corresponds to A" or "A corresponds to B" means that B is associated with A and that B can be determined from A. Determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information.
As used herein, "if" may be interpreted as "when", "once", "in response to determining" or "in response to detecting", depending on the context.
It should be understood that, in the present invention, a "service" may be understood as a program module that may be developed separately, and may be an application program that provides a service, or may be a separate functional module in the application program. For example, an e-commerce application installed in a cell phone, whose user provides e-commerce services, but which may be subdivided into a plurality of functions, for example, may include pre-sales (pre-sales) and after-sales (after-sales) services, which provide services independently.
The term Flume refers to a distributed, reliable and highly available system for collecting, aggregating and transmitting massive logs. Flume supports customizing various data senders in a log system to collect data, and also provides simple processing of the collected data.
The term Kafka refers to a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of a consumer-scale website. Because of the throughput requirements, such data is usually handled by log processing and log aggregation. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and also provides real-time messages across a cluster.
The term Storm refers to a distributed, fault-tolerant real-time computing system, which typically consists of one master node and multiple worker nodes. The master node runs a daemon called "Nimbus" for code distribution, task assignment and fault detection. Each worker node runs a daemon called "Supervisor" that listens for work and starts and terminates worker processes.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Referring to fig. 1, which schematically shows an application scenario of an embodiment of the present invention. The plurality of servers 1 shown in fig. 1 are used by developers for development, testing and online operation of a plurality of services, and after the services go online they provide those services to the clients 2 through the internet or other connections. After a service goes online, developers also need to promptly fix abnormal problems found during its operation. By executing the methods described in the embodiments of the present invention, one or more servers 1 can monitor abnormal services, such as data overload, data access errors or loading timeouts, and notify the relevant responsible persons, so that developers can optimize and adjust the service code in time and improve the experience of the clients 2. Depending on the type of service provided, a server 1 may be an e-commerce server, a game server, an interactive system server, a data storage server or the like; the present invention does not limit the type of the server or the type of the service. The embodiment of the present invention shown in fig. 1 may be executed by one server 1 or by a plurality of servers 1. When executed by a plurality of servers 1, those servers may be a server group deployed in the same machine room, or servers deployed in a distributed manner that provide several different development platforms and operate cooperatively.
An abnormality of a service may be caused by the service itself or by another service. For example, the commodity information pushing service may generate error information indicating a pushing delay, but the cause of the delay may be that its source data is wrong, and that source data is produced by the commodity buying and selling data analysis service one level above it; in that case the commodity buying and selling data analysis service, rather than the pushing service, needs to be adjusted or updated to resolve the pushing-delay abnormality. In the prior art, however, each service is usually handled by its own developer or maintainer as the responsible person, so after the responsible person of the commodity information pushing service receives the abnormality notification about the pushing delay, a great deal of time is spent checking the code of the pushing service itself and is therefore wasted. The abnormal service monitoring method provided by the embodiment of the invention combines the error link information of the related upstream services with the error information of the current service and sends the result to the responsible person, thereby improving the reliability of the notification and the accuracy and efficiency of locating the cause of the abnormality.
Referring to fig. 2, a flowchart of an abnormal service monitoring method provided by an embodiment of the present invention is shown, where an execution body of the method shown in fig. 2 may be a software and/or hardware device. The method shown in fig. 2 mainly includes steps S101 to S104, and specifically includes the following steps:
s101, obtaining error information of a first service, wherein the error information of the first service is generated in an online operation process of the first service.
It can be understood that, when the first service is abnormal in the operation process, an error log is automatically generated, and the execution body of the embodiment collects the error logs in real time or periodically and extracts error information therein, thereby obtaining the error information of the first service.
In an optional implementation, the execution body of this embodiment may include a data acquisition subsystem dedicated to acquiring the error information of the first service. The data acquisition subsystem monitors the error logs of every running service and, once it detects that a service has generated an error log, reads the error log through Flume. Flume, which is responsible for collecting the error logs, can aggregate and transmit the logs in a distributed manner. Flume then imports the collected error logs of the first service into the Kafka cluster as a producer; Kafka can receive massive log information in real time in a distributed manner. The error information produced from the error logs is passed to the Storm cluster for real-time processing; Storm is a distributed system capable of consuming the messages produced by Kafka in real time, and Kafka plus Storm is a high-performance real-time data processing mechanism in big data processing. The error information consumed by Storm is first stored in HBase, and data older than 7 days is moved from HBase to HDFS for long-term archiving. The data stored in HBase is the acquired error information of the first service.
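As an illustration only, the following minimal Python sketch shows the kind of processing described above. Plain lists stand in for HBase and HDFS, the JSON log layout is an assumption, and the sketch is not the Flume/Kafka/Storm deployment itself.

```python
import json
import time

SEVEN_DAYS = 7 * 24 * 3600

def parse_error_log(raw_line: str) -> dict:
    """Extract error information from one error-log line.

    The JSON layout (service, error_code, message, timestamp) is an
    assumption for illustration; the embodiment only requires that an
    error log produced by the error log generation model can be reduced
    to error information.
    """
    record = json.loads(raw_line)
    return {
        "service": record["service"],        # e.g. "service_E"
        "error_code": record["error_code"],  # e.g. "51384Y"
        "message": record.get("message", ""),
        "timestamp": record.get("timestamp", time.time()),
    }

def archive_old_records(hot_store: list, cold_store: list, now: float) -> None:
    """Move error records older than 7 days from hot storage (standing in
    for HBase) to long-term storage (standing in for HDFS)."""
    for record in list(hot_store):
        if now - record["timestamp"] > SEVEN_DAYS:
            hot_store.remove(record)
            cold_store.append(record)
```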
S102, obtaining error link information of a second service, wherein the second service is an upper layer service of the first service.
It may be understood that the second service is selected from the upper layer services of the first service before the error link information of the second service is acquired.
There are several ways to determine the second service. In an optional implementation, the second service may be understood as all the services ranked above the first service in the hierarchy. All the services above the first service are determined as the second service according to a preset service hierarchy order; at least one piece of error information generated by the second service is acquired; and the error link information of the second service is obtained from that error information. For example, each service only saves the error information it generates and does not clear that error information until the error is eliminated; when a service generates error information, it is then sufficient to traverse the error information saved by each upper layer service and combine it to obtain the error link information of the second service.
In another optional implementation, the second service may be understood as a single service. Specifically, starting from the first service and following a preset service hierarchy order, the nearest upper layer service that has generated error information is queried upward, and that nearest upper layer service is determined as the second service. Each service creates and saves its own error link information when it generates error information, and does not clear it until the error is eliminated. When a service generates error information, it can therefore build its own error link information merely by finding the nearest upper layer service that holds error information, without traversing all upper layer services, which improves the efficiency of building error link information. The preset service hierarchy order may rank the different services according to the timing or implementation logic of the data processing flow. For example, if service B processes the output data of service A, then service B is implemented based on the output of service A, so service A is ranked above service B. Starting from the first service, each layer is checked in turn for generated error information, and the upper layer service that holds error information and is closest to the first service in the hierarchy is taken as the second service.
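A minimal sketch of this nearest-upper-layer lookup, assuming for illustration that the preset hierarchy is simply an ordered list of service names (top-most first) and that each service's saved error information is kept in a dictionary:

```python
from typing import Optional

def find_second_service(hierarchy: list, first_service: str,
                        errors: dict) -> Optional[str]:
    """Walk upward from the first service along the preset hierarchy and
    return the nearest upper layer service that has generated error
    information, i.e. the second service; None if no upper layer erred."""
    index = hierarchy.index(first_service)
    for service in reversed(hierarchy[:index]):   # nearest upper layer first
        if errors.get(service):                   # this layer produced errors
            return service
    return None

# Example matching Fig. 3: only C and D have errors, E is the first service.
hierarchy = ["A", "B", "C", "D", "E", "F"]
errors = {"C": ["5138", "513G", "5131"], "D": ["51384", "513F9"]}
assert find_second_service(hierarchy, "E", errors) == "D"
```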
Referring to fig. 3, which is a schematic diagram of a service hierarchy order provided by an embodiment of the present invention. Fig. 3 shows six services with a sequential hierarchical relationship: service A, service B, service C, service D, service E and service F. The six services may belong to the same e-commerce application, which may be an application program on a mobile phone, website content in a browser, or a system on a dedicated platform device. In the hierarchy shown in fig. 3, suppose that service E is the first service and that, among its upper layer services, only service C and service D have generated error information. In fig. 3, the error information 31 of service C is illustrated with a triangle; the error information 32 of service D is illustrated with a square, and the error link information 321 of service D with a triangle and a square connected in sequence; the error information 33 of service E is illustrated with a circle, and the error link information 331 of service E with a triangle, a square and a circle connected in sequence. If service D is taken as the second service, its error link information 321 can be used directly as the error link information of the second service; if service C and service D, or services A, B, C and D, are taken as the second service, the error information 31 of service C and the error information 32 of service D are acquired and combined to obtain the error link information of the second service.
S103, determining the error link information of the first service according to the error information of the first service and the error link information of the second service.
It can be understood that the error information of the first service is combined, spliced or extracted with the error link information of the second service to obtain the error link information of the first service. The error link information of the first service should contain related error information of the upstream service in addition to the error information of the first service itself.
The error link information of the first service can be obtained in several ways. In an optional implementation, it may be obtained by directly splicing the error information of the first service onto the error link information of the second service in sequence. Continuing with the example shown in fig. 3, suppose each piece of error information is identified with an error code: the error information 31 of service C is 5138/513G/5131, the error information 32 of service D is 51384/513F9, and the error link information 321 of the second service is therefore 5138/513G/5131/51384/513F9. If the error information of the first service is 51384Y, the resulting error link information of the first service is 5138/513G/5131/51384/513F9/51384Y. The error codes may be preset by the system for the various error types, or may be applied for in advance by a user (for example, a developer) at the service code writing stage together with preset detection positions.
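A minimal sketch of the direct-splicing implementation, assuming (as in the example above, and only for illustration) that error link information is represented as a "/"-separated string of error codes:

```python
def splice_error_link(second_service_link: str, first_service_error: str,
                      sep: str = "/") -> str:
    """Append the first service's error code to the second service's
    error link information, as in the direct-splicing implementation."""
    if not second_service_link:
        return first_service_error
    return f"{second_service_link}{sep}{first_service_error}"

# Matches the example in the text.
assert splice_error_link("5138/513G/5131/51384/513F9", "51384Y") \
    == "5138/513G/5131/51384/513F9/51384Y"
```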
In another optional implementation, the error link information of the second service may first be parsed to obtain an error information set corresponding to the second service, where that set comprises the error information generated by the second service and the error information generated by the upper layer services of the second service. Assuming the error link information of the second service is a queue of the error information of the second service and of its upper layer services, each piece of error information is parsed out of the queue; in the example above, parsing 5138/513G/5131/51384/513F9 yields 5138, 513G, 5131, 51384 and 513F9. Then, according to a preset error causal relationship, the upper layer error information that has a causal relationship with the error information of the first service is determined within the error information set corresponding to the second service. The preset error causal relationship is, for example, a front-to-back logical causal relationship determined from the data processing flow: if a commodity image fails to upload in the data storage service, the subsequent commodity display service may report that the commodity cannot be displayed, so the error information "commodity image upload failed" and "commodity cannot be displayed" form a causal pair. In the example above, within the error link information 321 of the second service (5138/513G/5131/51384/513F9), 5138 and 51384 are the upper layer error information having a causal relationship with the error information 51384Y of the first service. Finally, the error link information of the first service is determined from the error information of the first service and that upper layer error information: the two are combined in sequence according to the causal relationship, so that the error link information of the first service consists of the error information of the first service and the causally related upper layer error information. In this example the resulting error link information of the first service is 5138/51384/51384Y. Error link information built from the error causal relationship points more clearly to the cause of the service's error, which further improves the reliability of the subsequent abnormal service notification and the efficiency of abnormality monitoring.
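A minimal sketch of this causal-filtering implementation, assuming the preset error causal relationship is given as a mapping from an error code to the set of upstream error codes that can cause it (the mapping content below is an assumption used only for illustration):

```python
def build_causal_error_link(second_service_link: str, first_service_error: str,
                            causes: dict, sep: str = "/") -> str:
    """Parse the second service's error link queue, keep only the upper-layer
    error codes that have a preset causal relationship with the first
    service's error code, and combine them in order with that code."""
    upper_errors = second_service_link.split(sep)          # parse the queue
    related = [e for e in upper_errors
               if e in causes.get(first_service_error, set())]
    return sep.join(related + [first_service_error])

# Assumed preset causality: 5138 and 51384 can cause 51384Y.
causes = {"51384Y": {"5138", "51384"}}
assert build_causal_error_link("5138/513G/5131/51384/513F9", "51384Y", causes) \
    == "5138/51384/51384Y"
```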
S104, according to the error link information of the first service, sending an abnormal service notification to the user corresponding to the first service.
The abnormal service notification may be sent by telephone voice prompt, short message, mail, system prompt or the like. The user corresponding to the first service may be the actual developer and/or a manager. In the service hierarchy, a lower layer service implements its functions on the basis of the processing results of the upper layer services, so the lower a service sits, the more upstream services its error link information may involve and the more potential causes of abnormality its users need to consider. In an optional implementation, the notification mode and the users to be notified corresponding to the first service may be determined according to the importance level of the first service; an abnormal service notification is then generated from the error link information of the first service; finally, the abnormal service notification is sent to the users to be notified in that notification mode. For example, if the first service is a promotional information pushing service, whose failure may not affect the main e-commerce functions, the abnormal service notification may be sent to the maintainer of that service in a weak notification mode (such as a system message); if the first service is the commodity interface display service, whose failure may directly affect the main e-commerce functions, the abnormal service notification is sent to the maintainers, developers and managers of that service in a strong notification mode (such as a telephone voice prompt).
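A minimal sketch of this importance-based routing; the level names, notification modes and recipient roles below are illustrative assumptions rather than values fixed by the embodiment:

```python
def route_notification(importance: str):
    """Pick a notification mode and recipient roles from the importance
    level of the first service (level names are assumed)."""
    if importance == "core":        # e.g. commodity interface display service
        return "phone_voice", ["maintainer", "developer", "manager"]
    return "system_message", ["maintainer"]   # e.g. promotional info push

def notify(first_service: str, error_link: str, importance: str) -> dict:
    """Generate an abnormal service notification from the error link
    information and send it to the users to be notified (represented here
    simply as a returned dictionary)."""
    mode, roles = route_notification(importance)
    return {
        "service": first_service,
        "mode": mode,
        "recipients": roles,
        "content": f"Abnormal service {first_service}, error link: {error_link}",
    }

print(notify("service_E", "5138/51384/51384Y", "core"))
```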
This embodiment provides an abnormal service monitoring method. Error information of a first service, generated while the first service runs online, is acquired; error link information of a second service is then acquired, where the second service is an upper layer service of the first service and, as an upstream stage preceding the first service, may be the cause of the error information generated by the first service; the error link information of the first service is determined from the acquired error information of the first service and the error link information of the second service, so that all error information relevant to the abnormality of the first service is gathered; finally, an abnormal service notification is sent to the user corresponding to the first service according to the error link information of the first service. In this way the user corresponding to the first service obtains not only the error information of the first service but also the abnormal conditions of the upper layer services that may have caused the abnormality, which improves the reliability of abnormal service monitoring, helps that user resolve the abnormality of the first service more quickly, and improves the efficiency of abnormal service monitoring.
Referring to fig. 4, a flowchart of another abnormal service monitoring method according to an embodiment of the present invention is provided, and in order to more clearly describe various implementations of the present invention, in an alternative implementation, based on the method embodiment shown in fig. 2 and various possible implementations thereof, a service verification process shown in fig. 4 may be further included before step S101 (obtaining error information of the first service). The method shown in fig. 4 mainly includes steps S201 to S203, specifically as follows:
s201, receiving a verification instruction input by a user and data to be verified of the first service.
It may be understood that, during the development test or online approval process, a user (for example, a developer) inputs a verification instruction and data to be verified of the first service (for example, a code of the first service), and initiates a verification process in the system.
S202, error configuration information corresponding to the first service is obtained according to the check indication, wherein the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier.
It can be understood that before the user initiates the verification process, or before the system starts verifying, the system carries out the error configuration information application process for the first service: specifically, a position identifier input by the user for the first service and the error identifier corresponding to that position identifier are received, and the error configuration information corresponding to the first service is created or updated according to the position identifier and its corresponding error identifier. The user (for example, a developer) applies to register the error configuration information that needs to be configured in the code of the first service, and the system records it in advance so that it can be checked during the subsequent verification and during any later identification of error information. For example, if a user needs to place the code of an error log generation model at line 530 of a service so that a corresponding error log is generated when a commodity image fails to display, then during the error configuration information application process the user applies for error configuration information describing the error type handled by that error log generation model and its configuration position, for example "error code: 51384Y; image display error; line 530". Because the error configuration information of every service is stored in advance during the application process, the system can retrieve the pre-stored error configuration information of the first service when it detects, during verification, that the first service is being verified.
S203, if an error log generation model matched with the error identifier is obtained at the position indicated by each position identifier in the data to be checked of the first service, a result of passing the check of the first service is obtained.
It can be understood that, in the data to be verified of the first service, the configuration of the error log generation models is compared with the pre-stored error configuration information: if they are consistent the verification passes, and if they are inconsistent it fails. If the verification passes, testing can be deemed qualified and the other online approval procedures can continue; if it fails, the inconsistent positions and the types of error log generation models that are missing can be sent to the user so that the user can remedy them.
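A minimal sketch of this comparison step, assuming the error configuration information maps a line-number position identifier to an error identifier and that a configured error log generation model can be recognised by its error identifier appearing on that line (a simplifying assumption made only for illustration):

```python
def verify_service(code_lines: list, error_config: dict) -> tuple:
    """Check that an error log generation model matching the registered
    error identifier appears at every registered position.

    error_config maps a position identifier (1-based line number) to an
    error identifier, e.g. {530: "51384Y"}. Returns (passed, missing),
    where missing lists the positions and identifiers still to fix."""
    missing = []
    for position, error_id in error_config.items():
        line = code_lines[position - 1] if position <= len(code_lines) else ""
        if error_id not in line:
            missing.append((position, error_id))
    return (len(missing) == 0, missing)
```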
In the embodiment shown in fig. 4, correspondingly, step S101 in the foregoing embodiment (acquiring the error information of the first service, where the error information of the first service is generated during the online operation of the first service) may specifically include: during the online operation of the first service, acquiring the error log generated by the error log generation model of the first service, and acquiring the error information of the first service from that error log. It can be understood that when the first service becomes abnormal, the error log generation model configured in the first service generates an error log, and the system extracts the error information of the first service from it.
In this embodiment, through the pre-storage of the error configuration information during the application process, the error configuration information can be checked and approved automatically before the service goes online, ensuring that the error log generation models are correctly configured in each service, which improves the reliability of abnormal service monitoring.
Referring to fig. 5, which is a schematic structural diagram of an abnormal service monitoring apparatus according to an embodiment of the present invention, an abnormal service monitoring apparatus 50 shown in fig. 5 mainly includes the following modules:
the obtaining module 51 is configured to obtain error information of a first service, where the error information of the first service is generated in an online operation process of the first service.
And the query module 52 is configured to obtain error link information of a second service, where the second service is an upper layer service of the first service.
And the link establishment module 53 is configured to determine the error link information of the first service according to the error information of the first service and the error link information of the second service.
And the notification module 54 is configured to send an abnormal service notification to a user corresponding to the first service according to the error link information of the first service.
The abnormal service monitoring apparatus of the embodiment shown in fig. 5 may be correspondingly used to perform the steps in the embodiment of the method shown in fig. 2, and the implementation principle and technical effects are similar, and are not repeated herein.
Optionally, the chain building module 53 is specifically configured to:
analyzing the error link information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper layer service of the second service; determining, according to a preset error causal relationship, upper layer error information in the error information set corresponding to the second service that has a causal relationship with the error information of the first service; and determining error link information of the first service according to the error information of the first service and the upper layer error information.
Optionally, the chain building module 53 is specifically configured to:
analyzing the error link information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper layer service of the second service; determining, according to a preset error causal relationship, upper layer error information in the error information set corresponding to the second service that has a causal relationship with the error information of the first service; and sequentially combining the error information of the first service with the upper layer error information according to the error causal relationship to obtain the error link information of the first service.
Optionally, before the obtaining the error link information of the second service, the query module 52 is further configured to:
querying upward from the first service, in a preset service hierarchy order, for the nearest upper layer service that has generated error information; and determining the nearest upper layer service as the second service.
Optionally, the notification module 54 is specifically configured to:
determining a notification mode and a user to be notified corresponding to the first service according to the importance level of the first service; generating an abnormal service notification according to the error link information of the first service; and sending the abnormal service notification to the user to be notified in the notification mode.
Referring to fig. 6, a schematic structural diagram of another abnormal service monitoring apparatus according to an embodiment of the present invention is shown. On the basis of the above embodiment, the abnormal service monitoring apparatus 50 may further include a verification module 55 configured to:
before the acquisition module acquires the error information of the first service, receiving a verification instruction input by a user and data to be verified of the first service; obtaining error configuration information corresponding to the first service according to the verification indication, wherein the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier; and if an error log generation model matched with the error identifier is acquired at the position indicated by each position identifier in the data to be verified of the first service, acquiring a result of passing the verification of the first service.
Accordingly, the obtaining module 51 is specifically configured to:
during the online operation of the first service, acquiring an error log generated by the error log generation model of the first service; and acquiring the error information of the first service according to the error log.
With continued reference to the abnormal traffic monitoring apparatus structure shown in fig. 6, optionally, an application module 56 may further be included for:
before the verification module acquires error configuration information corresponding to the first service according to the verification instruction, receiving a position identifier input by a user aiming at the first service and an error identifier corresponding to the position identifier; and acquiring or updating error configuration information corresponding to the first service according to the position identifier and the error identifier corresponding to the position identifier.
The abnormal service monitoring apparatus of the embodiment shown in fig. 6 may be correspondingly used to perform the steps in the method embodiment shown in fig. 4, and the implementation principle and technical effects are similar, and are not repeated herein.
Referring to fig. 7, a hardware structure diagram of an abnormal service monitoring system according to an embodiment of the present invention, the abnormal service monitoring system 60 includes: a memory 62, at least one processor 61 and at least one computer program; wherein:
the memory 62 is configured to store the at least one computer program and may also be a flash memory (flash); the stored programs include, for example, application programs or functional modules implementing the methods described above.
The processor 61 is configured to execute the at least one computer program stored in the memory to implement the steps of the abnormal service monitoring method described above. Reference may be made in particular to the description of the method embodiments above.
Alternatively, the memory 62 may be separate or integrated with the processor 61.
When the memory 62 is a device independent of the processor 61, the abnormal service monitoring system 60 may further include:
a bus 63 for connecting the memory 62 and the processor 61. The terminal of fig. 7 may further include a transmitter (not shown) for transmitting the abnormal service notification generated by the processor 61 to the user corresponding to the first service.
The specific implementation manner of the abnormal service monitoring system may be implemented by a server manner or a terminal manner, and the embodiment of the present invention is not limited thereto.
The present invention also provides a readable storage medium in which a computer program is stored, the computer program, when executed by a processor, implementing the abnormal service monitoring method provided by the various embodiments above.
The readable storage medium may be a computer storage medium or a communication medium. Communication media includes any medium that facilitates transfer of a computer program from one place to another. Computer storage media can be any available media that can be accessed by a general purpose or special purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. In the alternative, the readable storage medium may be integral to the processor. The processor and the readable storage medium may reside in an application specific integrated circuit (Application Specific Integrated Circuits, ASIC for short). In addition, the ASIC may reside in a user device. The processor and the readable storage medium may reside as discrete components in a communication device.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. At least one processor of a device may read the execution instructions from the readable storage medium, and execution of those instructions by the at least one processor causes the device to implement the abnormal service monitoring method provided by the various embodiments above.
In the above system embodiment, it should be understood that the processor may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications and substitutions do not depart from the spirit of the invention.

Claims (16)

1. An abnormal service monitoring method is characterized by comprising the following steps:
acquiring error information of a first service, wherein the error information of the first service is generated during the online running of the first service;
acquiring error link information of a second service, wherein the second service is an upper layer service of the first service;
determining error link information of the first service according to the error information of the first service and the error link information of the second service;
according to the error link information of the first service, sending an abnormal service notification to a user corresponding to the first service;
before the error information of the first service is acquired, the method further comprises:
checking the first service according to the error configuration information corresponding to the first service to determine whether the first service is configured with an error log generation model; the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier; the error configuration information of the first service is entered into the system when a user applies to register the error configuration information to be configured in the service code of the first service; the system stores the error configuration information of each service; the error information of the first service is obtained according to an error log of the first service; the error log of the first service is generated according to the error log generation model, and the error log generation model matches the error identifier included in the error configuration information of the first service.
2. The method of claim 1, wherein the determining the error link information of the first service according to the error information of the first service and the error link information of the second service comprises:
parsing the error link information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper layer service of the second service;
determining, according to a preset error causality relation, upper layer error information in the error information set corresponding to the second service that has a causal relation with the error information of the first service;
and determining the error link information of the first service according to the error information of the first service and the upper layer error information.
3. The method of claim 2, wherein the determining the error link information of the first service according to the error information of the first service and the upper layer error information comprises:
sequentially combining the error information of the first service with the upper layer error information according to the error causality relation to obtain the error link information of the first service.
4. The method according to any one of claims 1 to 3, further comprising, before the acquiring the error link information of the second service:
sequentially querying upward from the first service, according to a preset service level order, for the nearest upper-layer service that has generated error information;
and determining this nearest upper-layer service as the second service.
5. The method of claim 1, wherein the sending, according to the error link information of the first service, an abnormal service notification to the user corresponding to the first service includes:
determining a notification mode and a user to be notified corresponding to the first service according to the importance level of the first service;
generating an abnormal service notification according to the error link information of the first service;
and sending the abnormal service notification to the user to be notified in the notification mode.
6. The method according to claim 1 or 5, wherein the checking the first service according to the error configuration information corresponding to the first service to determine whether the first service is configured with an error log generation model comprises:
receiving a verification instruction input by a user and data to be verified of the first service;
obtaining the error configuration information corresponding to the first service according to the verification instruction;
and if an error log generation model matching the error identifier is obtained at the position indicated by each position identifier in the data to be verified of the first service, obtaining a result that the first service passes the verification;
correspondingly, the acquiring the error information of the first service, wherein the error information of the first service is generated during the online running of the first service, comprises:
during the online running of the first service, obtaining an error log generated by the error log generation model of the first service;
and acquiring the error information of the first service according to the error log.
7. The method of claim 6, further comprising, before the obtaining the error configuration information corresponding to the first service according to the verification instruction:
receiving a position identifier input by a user for the first service and an error identifier corresponding to the position identifier;
and acquiring or updating error configuration information corresponding to the first service according to the position identifier and the error identifier corresponding to the position identifier.
8. An abnormal service monitoring apparatus, comprising:
an acquisition module, configured to acquire error information of a first service, wherein the error information of the first service is generated during the online running of the first service;
a query module, configured to acquire error link information of a second service, wherein the second service is an upper-layer service of the first service;
a link establishment module, configured to determine the error link information of the first service according to the error information of the first service and the error link information of the second service;
a notification module, configured to send an abnormal service notification to the user corresponding to the first service according to the error link information of the first service;
the apparatus further comprises a verification module, configured to:
before the acquisition module acquires the error information of the first service, check the first service according to the error configuration information corresponding to the first service to determine whether the first service is configured with an error log generation model; the error configuration information comprises a position identifier and an error identifier corresponding to the position identifier; the error configuration information of the first service is entered into the system when a user applies to register the error configuration information to be configured in the service code of the first service; the system stores the error configuration information of each service; the error information of the first service is obtained according to an error log of the first service; the error log of the first service is generated according to the error log generation model, and the error log generation model matches the error identifier included in the error configuration information of the first service.
9. The apparatus according to claim 8, wherein the link establishment module is specifically configured to:
parse the error link information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper layer service of the second service; determine, according to a preset error causality relation, upper layer error information in the error information set corresponding to the second service that has a causal relation with the error information of the first service; and determine the error link information of the first service according to the error information of the first service and the upper layer error information.
10. The apparatus according to claim 9, wherein the link establishment module is specifically configured to:
parse the error link information of the second service to obtain an error information set corresponding to the second service, wherein the error information set corresponding to the second service comprises error information generated by the second service and error information generated by an upper layer service of the second service; determine, according to a preset error causality relation, upper layer error information in the error information set corresponding to the second service that has a causal relation with the error information of the first service; and sequentially combine the error information of the first service with the upper layer error information according to the error causality relation to obtain the error link information of the first service.
11. The apparatus according to any one of claims 8 to 10, wherein, before acquiring the error link information of the second service, the query module is further configured to:
sequentially query upward from the first service, according to a preset service level order, for the nearest upper-layer service that has generated error information; and determine this nearest upper-layer service as the second service.
12. The apparatus of claim 8, wherein the notification module is specifically configured to:
determine a notification mode and a user to be notified corresponding to the first service according to the importance level of the first service; generate an abnormal service notification according to the error link information of the first service; and send the abnormal service notification to the user to be notified in the notification mode.
13. The apparatus according to claim 8 or 12, wherein the verification module is specifically configured to:
receive a verification instruction input by a user and data to be verified of the first service; obtain the error configuration information corresponding to the first service according to the verification instruction; and if an error log generation model matching the error identifier is obtained at the position indicated by each position identifier in the data to be verified of the first service, obtain a result that the first service passes the verification;
correspondingly, the acquisition module is specifically configured to:
during the online running of the first service, obtain an error log generated by the error log generation model of the first service; and acquire the error information of the first service according to the error log.
14. The apparatus of claim 13, further comprising an application module configured to:
before the verification module acquires the error configuration information corresponding to the first service according to the verification instruction, receive a position identifier input by a user for the first service and an error identifier corresponding to the position identifier; and acquire or update the error configuration information corresponding to the first service according to the position identifier and the error identifier corresponding to the position identifier.
15. An abnormal service monitoring system, comprising: a memory, at least one processor, and at least one computer program, wherein the at least one computer program is stored in the memory, and the at least one processor runs the at least one computer program to perform the abnormal service monitoring method of any one of claims 1 to 7.
16. A readable storage medium, wherein a computer program is stored in the readable storage medium, and the computer program, when executed by a processor, implements the abnormal service monitoring method according to any one of claims 1 to 7.
CN201811014428.6A 2018-08-31 2018-08-31 Abnormal service monitoring method, device and system and computer readable storage medium Active CN110875832B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811014428.6A CN110875832B (en) 2018-08-31 2018-08-31 Abnormal service monitoring method, device and system and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811014428.6A CN110875832B (en) 2018-08-31 2018-08-31 Abnormal service monitoring method, device and system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110875832A CN110875832A (en) 2020-03-10
CN110875832B true CN110875832B (en) 2023-05-12

Family

ID=69715440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811014428.6A Active CN110875832B (en) 2018-08-31 2018-08-31 Abnormal service monitoring method, device and system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110875832B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064387A (en) * 2020-08-07 2022-02-18 中国电信股份有限公司 Log monitoring method, system, device and computer readable storage medium
CN112837013B (en) * 2021-02-02 2023-08-11 拉扎斯网络科技(上海)有限公司 Service processing method, device and equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103378982A (en) * 2012-04-17 2013-10-30 深圳市腾讯计算机系统有限公司 Internet business operation monitoring method and Internet business operation monitoring system
US9069668B2 (en) * 2012-11-14 2015-06-30 International Business Machines Corporation Diagnosing distributed applications using application logs and request processing paths
CN107172113B (en) * 2016-03-08 2020-06-12 阿里巴巴集团控股有限公司 Processing method and device in abnormal service call
CN106100913A (en) * 2016-08-25 2016-11-09 北京票之家科技有限公司 Error message alignment system and method
CN107301125B (en) * 2017-06-19 2021-08-24 广州华多网络科技有限公司 Method and device for searching root error and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wen Lihui. Java EE Programming Technology (《Java EE编程技术》). Beijing Institute of Technology Press, 2016 (1st edition), pp. 175-176. *

Also Published As

Publication number Publication date
CN110875832A (en) 2020-03-10

Similar Documents

Publication Publication Date Title
US11449379B2 (en) Root cause and predictive analyses for technical issues of a computing environment
US11269718B1 (en) Root cause detection and corrective action diagnosis system
US9898397B2 (en) Deployment pattern monitoring
US10158726B2 (en) Supporting high availability for orchestrated services
CN111913818B (en) Method for determining dependency relationship between services and related device
US10169203B2 (en) Test simulation for software defined networking environments
CN110851471A (en) Distributed log data processing method, device and system
US11416379B1 (en) Creation of software tests matching production personas
CN110875832B (en) Abnormal service monitoring method, device and system and computer readable storage medium
CN110740071A (en) network interface monitoring method, device and system
US11410049B2 (en) Cognitive methods and systems for responding to computing system incidents
CN109299124B (en) Method and apparatus for updating a model
CN110727575B (en) Information processing method, system, device and storage medium
CN115705190A (en) Method and device for determining dependence degree
WO2023163846A1 (en) System and methods for application failover automation
CN115934453A (en) Troubleshooting method, troubleshooting device and storage medium
CN115190008B (en) Fault processing method, fault processing device, electronic equipment and storage medium
CN116708135B (en) Network service fault monitoring method and device, electronic equipment and storage medium
CN113535568B (en) Verification method, device, equipment and medium for application deployment version
CN113031960B (en) Code compiling method, device, server and storage medium
US10928986B1 (en) Transaction visibility frameworks implemented using artificial intelligence
Afifah et al. Cost-Effective Automation: Cloud-Based Monitoring Combining HPA with VPA for Scalable Startups
CN116032745A (en) Automatic configuration method and device of hadoop cluster
CN115469882A (en) Software project management method and device, electronic equipment and storage medium
CN115080424A (en) Service instance testing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant