CN111010291B - Business process abnormity warning method and device, electronic equipment and storage medium - Google Patents

Business process abnormity warning method and device, electronic equipment and storage medium

Info

Publication number
CN111010291B
Authority
CN
China
Prior art keywords
monitoring
server
parameter value
data packet
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911169771.2A
Other languages
Chinese (zh)
Other versions
CN111010291A (en
Inventor
李扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Enyike Beijing Data Technology Co ltd
Original Assignee
Enyike Beijing Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Enyike Beijing Data Technology Co ltd filed Critical Enyike Beijing Data Technology Co ltd
Priority to CN201911169771.2A priority Critical patent/CN111010291B/en
Publication of CN111010291A publication Critical patent/CN111010291A/en
Application granted granted Critical
Publication of CN111010291B publication Critical patent/CN111010291B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring

Abstract

The application provides a business process abnormity warning method and device, electronic equipment and a storage medium. The method comprises the following steps: collecting, at intervals of a preset time period, monitoring data packets returned by buried points arranged in the servers at all levels, wherein each monitoring data packet carries identification information of its source server; extracting the monitoring parameter value in each monitoring data packet, and acquiring the corresponding monitoring threshold according to the identification information carried by the monitoring data packet; performing abnormality judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold to generate a total judgment result for the service system; and generating alarm information according to the total judgment result. By sorting the monitoring data packets according to a preset rule, important monitoring data packets can be processed first as required, the root cause of a fault or abnormality can be found quickly, and the efficiency with which staff maintain the service system is improved.

Description

Business process abnormity warning method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer network technologies, and in particular, to a method and an apparatus for alarming a business process exception, an electronic device, and a storage medium.
Background
With the development of software technology, software functions have become richer and more powerful. The corresponding back-end business processes have become longer and more complex, and a single function often needs to call several business systems. As a result, when a business process becomes abnormal or fails, operators or developers cannot quickly and effectively locate the link in which the abnormality or fault occurred.
In view of the above problems, no effective technical solution exists at present.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method, an apparatus, an electronic device, and a storage medium for alarming a business process exception, which can quickly locate a fault or an exception of a business system, and improve a maintenance speed of the fault or the exception.
In a first aspect, an embodiment of the present application provides a method for alarming an abnormal service flow, which is used for monitoring a service system, where the service system includes cascaded multi-stage servers, and the method includes the following steps:
collecting monitoring data packets returned by buried points arranged in all levels of servers at intervals of a preset time period, wherein each monitoring data packet carries identification information of a server of a data source;
extracting a monitoring parameter value in each monitoring data packet, and acquiring a corresponding monitoring threshold value according to identification information carried by the monitoring data packet;
performing abnormity judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold value to generate a total judgment result of the service system;
and generating alarm information according to the total judgment result.
According to the embodiment of the application, the monitoring data packets returned by the buried points of the servers at all levels are collected, and the monitoring parameter value in each monitoring data packet is compared with the corresponding preset monitoring threshold, so that abnormality judgment of the service system is realized, the position of an abnormality or fault can be located quickly, staff can carry out maintenance quickly, and the service system runs more smoothly.
Optionally, in the method for alarming an abnormal service flow in an embodiment of the present application, the step of extracting the monitoring parameter value in each monitoring data packet and obtaining the corresponding monitoring threshold according to the identification information carried by the monitoring data packet includes:
sequencing the monitoring data packets returned by each buried point according to a preset rule to generate a data packet queue;
and sequentially extracting the monitoring parameter value of each monitoring data packet from the data packet queue, and acquiring a corresponding monitoring threshold according to the identification information carried by the monitoring data packet.
According to the embodiment of the application, the monitoring data packets are sorted according to a preset rule, so that important monitoring data packets can be processed first as required and the root cause of a fault or abnormality can be found quickly.
Optionally, in the method for alarming an abnormal service flow in an embodiment of the present application, the step of sorting the monitoring data packets returned by each buried point according to a preset rule to generate a data packet queue includes:
sorting the monitoring data packets returned by each buried point according to the cascade order of the multi-stage servers and the acquisition time of the monitoring data packets to generate a data packet queue; wherein, among monitoring data packets collected at the same time, the packet of a superior server is placed before the packet of a subordinate server; and among monitoring data packets collected at different times, the packet with the earlier acquisition time is placed before the packet with the later acquisition time.
Optionally, in the method for alarming an abnormal service flow in an embodiment of the present application, the step of performing abnormality judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold to generate a total judgment result includes:
comparing the monitoring parameter value of each server with the corresponding monitoring threshold, in the order of the data packet queue, to judge whether the service of the server corresponding to the monitoring parameter value is abnormal;
and generating a total judgment result from the abnormality judgment results of the monitoring parameter values of the servers at all levels, wherein the total judgment result comprises the abnormality information of the servers at all levels and the source of the abnormality.
According to the embodiment of the application, the root of the abnormality of the whole service system is determined by comprehensively analyzing the abnormality judgment results of all the servers, so that the source of the fault can be located quickly.
Optionally, in the method for alarming an abnormal service flow according to the embodiment of the present application, the monitoring data packet includes a first monitoring parameter value and at least one second monitoring parameter value, where the first monitoring parameter value is a parameter value used for determining whether the current-stage server is abnormal, and the second monitoring parameter value is a parameter value that affects the services of the servers at each stage below the current-stage server;
the step of performing abnormality judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold to generate a total judgment result comprises:
comparing the first monitoring parameter value of each server with the corresponding first monitoring threshold, in the order of the data packet queue, to judge whether the service of the server corresponding to the first monitoring parameter value is abnormal;
if the service of the server corresponding to the first monitoring parameter value is abnormal, acquiring the second monitoring parameter value in the monitoring data packet corresponding to the superior server of the abnormal server;
comparing the second monitoring parameter value with a second preset threshold to judge whether the root of the abnormality of the abnormal server lies in the server itself or in the superior server;
and generating a total judgment result of the service system according to the abnormality judgment results of the servers at all levels.
According to the embodiment of the application, the second monitoring parameter value of the superior server is used to judge whether the root of a server's abnormality is a problem in its own operation or originates from the superior server, so that the source of the abnormality can be found quickly and accurately and the maintenance efficiency of the service system is improved.
Optionally, in the method for alarming an abnormal service flow in an embodiment of the present application, the monitoring data packet includes a first monitoring parameter value and at least one second monitoring parameter value, where the first monitoring parameter value is a parameter value used for determining whether the current-stage server is abnormal, and the second monitoring parameter value is a parameter value that affects the services of the servers at each stage below the current-stage server;
the step of performing abnormality judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold to generate a total judgment result comprises:
comparing the first monitoring parameter value of each server with the corresponding first monitoring threshold, in the order of the data packet queue, to judge whether the service of the server corresponding to the first monitoring parameter value is abnormal;
if the service of the server corresponding to the first monitoring parameter value is abnormal, acquiring the second monitoring parameter values in a plurality of monitoring data packets corresponding to the superior server of the abnormal server, wherein the plurality of monitoring data packets corresponding to the superior server comprise the monitoring data packet at the time point corresponding to the first monitoring parameter value and the monitoring data packets at each time point within a preset time period before that time point;
and judging, according to the fluctuation of the acquired second monitoring parameter values, whether the root of the abnormality of the abnormal server lies in the server itself or in the superior server.
According to the embodiment of the application, the second monitoring parameter values of the superior server are used to judge whether the root of a server's abnormality is a problem in its own operation or originates from the superior server, so that the source of the abnormality can be found quickly and accurately and the maintenance efficiency of the service system is improved.
In a second aspect, an embodiment of the present application further provides a device for warning about an abnormal service flow, which is used for monitoring a service system, where the service system includes cascaded multi-stage servers, and the device includes:
the collection module is used for collecting, at intervals of a preset time period, monitoring data packets returned by buried points arranged on the servers at all levels, wherein each monitoring data packet carries identification information of its source server;
the obtaining module is used for extracting the monitoring parameter value in each monitoring data packet and acquiring a corresponding monitoring threshold according to the identification information carried by the monitoring data packet;
the first generation module is used for carrying out abnormity judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold value to generate a total judgment result of the service system;
and the second generation module is used for generating alarm information according to the total judgment result.
Optionally, in the device for warning about an abnormal service flow in an embodiment of the present application, the obtaining module includes:
the sorting unit is used for sorting the monitoring data packets returned by the buried points according to a preset rule so as to generate a data packet queue;
and the acquisition unit is used for sequentially extracting the monitoring parameter value of each monitoring data packet from the data packet queue and acquiring a corresponding monitoring threshold according to the identification information carried by the monitoring data packet.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the steps in the method as provided in the first aspect are executed.
In a fourth aspect, embodiments of the present application provide a storage medium, on which a computer program is stored, and the computer program, when executed by a processor, performs the steps in the method as provided in the first aspect.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic view of an implementation scenario of a method and an apparatus for alarming an abnormal business process provided in an embodiment of the present application.
Fig. 2 is a flowchart of a method for alarming an abnormal business process according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a service flow abnormality warning apparatus according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to Fig. 1, Fig. 1 is a schematic view of an application scenario of the method and apparatus for alarming an abnormal business process provided by the present application, which are integrated, in the form of a computer program, in an electronic device such as a computer or a monitoring server. The method and apparatus are mainly used to monitor a service system for abnormalities so that an abnormal server can be located quickly. The service system comprises a plurality of servers cascaded in sequence, for example a first-level server C1, ..., an (n-1)th-level server Cn-1 and an nth-level server Cn, where n is a natural number greater than 2. The first-level server C1 receives a user's service request, performs the first step of processing, and transmits the resulting service data to the second-level server; the second-level server processes the service data again and passes its result on to the next level, and so on until the entire service flow is completed on the nth-level server.
The monitoring server 100 is connected to each level of server. At intervals of a preset time period it collects the monitoring data packets returned by the buried points arranged in each level of server, wherein each monitoring data packet carries identification information of its source server; it extracts the monitoring parameter value in each monitoring data packet and acquires the corresponding monitoring threshold according to the identification information carried by the packet; it performs abnormality judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold to generate a total judgment result of the service system; and it generates alarm information according to the total judgment result, so that maintenance personnel can quickly locate and repair the abnormal server.
The present application will be described in detail with reference to specific examples.
Referring to fig. 2, fig. 2 is a flowchart of a method for alarming an abnormal business process in some embodiments of the present application. The method is used for monitoring a service system, wherein the service system comprises cascaded multi-stage servers, and a superior server sends data obtained by the operation of the current stage to a subordinate server for service operation. The business process abnormity warning method comprises the following steps:
S101, collecting, at intervals of a preset time period, monitoring data packets returned by buried points arranged in the servers at all levels, wherein each monitoring data packet carries identification information of its source server.
S102, extracting the monitoring parameter value in each monitoring data packet, and acquiring a corresponding monitoring threshold according to the identification information carried by the monitoring data packet.
S103, performing abnormality judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold, and generating a total judgment result of the service system.
S104, generating alarm information according to the total judgment result.
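Steps S101 to S104 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the names `MonitoringPacket`, `THRESHOLDS`, `judge` and `make_alarm`, the threshold values, and the use of a fixed list in place of periodic collection are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonitoringPacket:
    server_id: str   # identification information of the source server
    value: float     # monitoring parameter value, e.g. a success rate


# Hypothetical thresholds: server id -> (low, high) acceptable range.
THRESHOLDS = {"C1": (0.95, 1.0), "C2": (0.90, 1.0)}


def judge(packets, thresholds):
    """S102-S103: look up each packet's threshold and flag out-of-range values."""
    result = {}
    for p in packets:
        low, high = thresholds[p.server_id]                  # S102: lookup by identification
        result[p.server_id] = not (low <= p.value <= high)   # True means abnormal
    return result


def make_alarm(total_result):
    """S104: turn the total judgment result into alarm text (None if all normal)."""
    abnormal = sorted(s for s, bad in total_result.items() if bad)
    return f"ALARM: abnormal servers {abnormal}" if abnormal else None


# S101 is stood in for by a fixed list; a real collector would poll periodically.
packets = [MonitoringPacket("C1", 0.99), MonitoringPacket("C2", 0.80)]
total = judge(packets, THRESHOLDS)
alarm = make_alarm(total)
```

Here the total judgment result is a plain dictionary; the embodiments described later enrich it with the source of each abnormality.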
In step S101, the monitoring data gathered by the buried points of the servers at each level may be collected by means of a Flume component. Preset acquisition rules are configured in the buried points so that the required monitoring parameter values are collected from the corresponding servers in a targeted manner. The servers at all levels may all report the same monitoring parameter, or parameters of suitable dimensions may be selected as monitoring parameters according to the service handled by each level of server. For example, the service completion success rate, the average service time, or the running-state parameters of each sub-service of each level of server may be collected.
In step S102, since the servers at different levels process different services, the monitoring parameters to be monitored differ, and so do the monitoring thresholds corresponding to the different monitoring parameters. Before this step is executed, a mapping among servers, monitoring parameters and monitoring thresholds therefore needs to be established. When a monitoring data packet is obtained, the monitoring parameter values in the packet are extracted, and the monitoring threshold preset for that type of monitoring parameter on that server is looked up on the basis of the identification information of the packet.
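The mapping among servers, monitoring parameters and monitoring thresholds might be held in a simple lookup table, as in the following hedged sketch. The dictionary layout, the parameter names `success_rate` and `avg_latency_ms`, and all numeric ranges are assumptions made for illustration only.

```python
# Hypothetical mapping: (server id, parameter name) -> (low, high) threshold range.
THRESHOLD_MAP = {
    ("C1", "success_rate"): (0.95, 1.0),
    ("C1", "avg_latency_ms"): (0.0, 200.0),
    ("C2", "success_rate"): (0.90, 1.0),
}


def lookup_thresholds(packet):
    """Return {parameter: (value, (low, high))} for every parameter in the packet."""
    server_id = packet["server_id"]          # identification information
    found = {}
    for name, value in packet["params"].items():
        found[name] = (value, THRESHOLD_MAP[(server_id, name)])
    return found


packet = {"server_id": "C1",
          "params": {"success_rate": 0.97, "avg_latency_ms": 350.0}}
pairs = lookup_thresholds(packet)
```

Keying the table on both the server and the parameter type reflects the point that the same parameter can have different thresholds on different servers.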
In some embodiments, because the monitoring server continuously collects data from the buried points on every level of server at preset time intervals, it accumulates a large number of monitoring data packets. A queue therefore needs to be established so that the packets are processed in order and confusion is avoided.
Specifically, step S102 includes the following sub-steps:
S1021, sorting the monitoring data packets returned by the buried points according to a preset rule to generate a data packet queue;
S1022, sequentially extracting the monitoring parameter value of each monitoring data packet from the data packet queue, and acquiring a corresponding monitoring threshold according to the identification information carried by the monitoring data packet.
In step S1021, the monitoring data packets returned by each buried point are sorted according to the cascade order of the multi-stage servers and the acquisition time of the monitoring data packets to generate a data packet queue. Among monitoring data packets collected at the same time, the packet of a superior server is placed before the packet of a subordinate server; among monitoring data packets collected at different times, the packet with the earlier acquisition time is placed before the packet with the later acquisition time. Because service data flows from the superior servers to the subordinate servers, processing the packets of the superior servers first allows the source of a fault to be found more quickly.
Of course, it is understood that the order of the monitoring packets for each level of servers may also be reversed at the server level, with the lower level of servers ranked in front and the upper level of servers ranked in the back.
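The queue ordering described above amounts to a two-key sort, sketched below. The `Packet` fields and the `upstream_first` flag are hypothetical names; level 1 is assumed to denote the first-level (most upstream) server.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    level: int          # cascade level, 1 = first-level (most upstream) server
    collected_at: int   # acquisition time, e.g. seconds since some epoch


def build_queue(packets, upstream_first=True):
    """Sort by acquisition time first, then by cascade level within the same time."""
    sign = 1 if upstream_first else -1   # -1 gives the reversed server-level order
    return sorted(packets, key=lambda p: (p.collected_at, sign * p.level))


packets = [Packet(2, 100), Packet(1, 160), Packet(1, 100), Packet(3, 100)]
order = [(p.collected_at, p.level) for p in build_queue(packets)]
reversed_order = [(p.collected_at, p.level)
                  for p in build_queue(packets, upstream_first=False)]
```

With `upstream_first=False` the earlier acquisition time still comes first; only the order of server levels within one collection time is reversed.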
In this step S103, the monitoring server may perform abnormality determination on the servers based on comparison of the monitoring parameter value of each server and the corresponding monitoring threshold value. When the monitoring parameter value is not in the range of the monitoring threshold value, the server is indicated to be abnormal, and when the monitoring parameter value is in the range of the monitoring threshold value, the server is indicated to be normally operated. And finally, after the judgment of all levels of servers is finished, the monitoring server counts the abnormal conditions of all the servers to generate a total judgment result.
It will be appreciated that in some embodiments, step S103 comprises the following steps:
S1031, comparing the monitoring parameter value of each server with the corresponding monitoring threshold, in the order of the data packet queue, to judge whether the service of the server corresponding to the monitoring parameter value is abnormal;
S1032, generating a total judgment result from the abnormality judgment results of the monitoring parameter values of the servers at all levels, wherein the total judgment result comprises the abnormality information of the servers at all levels and the source of the abnormality.
For example, if statistics show that no abnormality occurs in any of the first-level to third-level servers, that abnormalities begin at the fourth-level server, and, from analysis of the abnormality information, that the abnormalities of the fourth-level server and of the servers below it are similar, the fourth-level server can be determined to be the root of the failure. If the second-level server is abnormal, the third-level to fifth-level servers are normal, and the sixth-level server is abnormal, it can be determined that the second-level server and the sixth-level server are both abnormal, that the two abnormalities have little to do with each other, and that each is caused by a bug in its own system.
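The two examples above can be approximated with a short sketch. It makes a simplifying assumption that the text does not state: an abnormal server is treated as a source whenever the level immediately above it is normal, and the comparison of abnormality information for similarity is left out. The function name `total_judgment` is hypothetical.

```python
def total_judgment(abnormal_by_level):
    """abnormal_by_level maps cascade level -> True if that server is abnormal.

    Returns (abnormal levels, levels treated as sources of an abnormality).
    Assumption: the first server of each unbroken run of abnormal levels
    is taken to be the source for that run.
    """
    abnormal = sorted(lvl for lvl, bad in abnormal_by_level.items() if bad)
    sources = [lvl for lvl in abnormal
               if not abnormal_by_level.get(lvl - 1, False)]
    return abnormal, sources


# Levels 1-3 normal, levels 4-6 abnormal: level 4 is taken as the root.
case_one = total_judgment({1: False, 2: False, 3: False, 4: True, 5: True, 6: True})

# Levels 2 and 6 abnormal with normal servers in between: two separate sources.
case_two = total_judgment({1: False, 2: True, 3: False, 4: False, 5: False, 6: True})
```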
It is understood that, in some embodiments, the monitoring data packet includes a first monitoring parameter value and at least one second monitoring parameter value, where the first monitoring parameter value is a parameter value used for determining whether the current-level server is abnormal, and the second monitoring parameter value is a parameter value that affects the services of the servers at each level below the current-level server. Correspondingly, step S103 includes:
S1033, comparing the first monitoring parameter value of each server with the corresponding first monitoring threshold, in the order of the data packet queue, to judge whether the service of the server corresponding to the first monitoring parameter value is abnormal;
S1034, if the service of the server corresponding to the first monitoring parameter value is abnormal, acquiring the second monitoring parameter value in the monitoring data packet corresponding to the superior server of the abnormal server;
S1035, comparing the second monitoring parameter value with a second preset threshold to judge whether the root of the abnormality of the abnormal server lies in the server itself or in the superior server;
S1036, generating a total judgment result of the service system according to the abnormality judgment results of the servers at all levels.
A fluctuation in a second monitoring parameter value may not disturb the service processing of the server in which it occurs. However, when the fluctuation is passed on to the next-level server it can cause an abnormality there, and the server in which the fluctuation arose can therefore be determined to be the root cause of the failure of the next-level server.
For example, server A processes a plurality of pieces of service data sent by its superior server to obtain a plurality of pieces of processed service data a. While server A performs its service calculation, one or two pieces of the received service data are lost because of packet loss or a network failure. Since the service processing in server A does not depend on the association among the pieces of service data, server A itself is unaffected. Server B, the subordinate server of server A, however, must perform an integrated calculation over all the pieces of service data according to a certain algorithm; because of the data loss in server A the service data chain is incomplete, and a serious failure occurs when server B runs. In the actual detection process, when the first monitoring parameter value in the monitoring data packet of server A is judged, server A is found to have no fault and its service operation is normal. When server B is judged on the basis of its first monitoring parameter value, server B is found to be abnormal. At this point it is necessary to go back, extract and analyze the second monitoring parameter value of server A, and check whether it is within the preset threshold range. For example, the second monitoring parameter value of server A is the packet loss rate, and the threshold range of the packet loss rate is 99-100%. If the packet loss rate is not within this range, the fault of server B is attributed to an implicit abnormality of server A (an abnormality that does not show in the business operation of server A). Because packet loss is to some extent random, the lost data may be irrelevant data even when the packet loss rate is outside this range, in which case the packet loss does not cause the abnormality of server B.
When the current-level server is found to be abnormal, the relevant monitoring parameter value of its superior server is checked to determine whether the root of the fault lies in the current-level server or in the superior server, so that the real fault or abnormality can be located quickly.
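The threshold-based check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `MonitoringPacket` class, the `locate_root` function, the field names and the numeric ranges are all hypothetical, and thresholds are assumed to be inclusive (low, high) ranges.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[float, float]  # inclusive (low, high) bounds, an assumed representation


@dataclass
class MonitoringPacket:
    server_id: str
    first_value: float   # judges this server's own service, e.g. a success rate
    second_value: float  # may affect lower-level servers, e.g. a packet loss rate


def in_range(value: float, bounds: Range) -> bool:
    low, high = bounds
    return low <= value <= high


def locate_root(current: MonitoringPacket,
                superior: Optional[MonitoringPacket],
                first_range: Range,
                superior_second_range: Range) -> Optional[str]:
    """Return the id of the server where the root cause lies, or None if normal."""
    if in_range(current.first_value, first_range):
        return None  # the current-level server is running normally
    if superior is not None and not in_range(superior.second_value,
                                             superior_second_range):
        return superior.server_id  # hidden abnormality in the superior server
    return current.server_id  # the abnormality originates in the server itself


# Hypothetical values: success rate must stay in [0.95, 1.0],
# and the superior's packet loss rate in [0.0, 0.01].
server_a = MonitoringPacket("A", first_value=0.99, second_value=0.05)
server_b = MonitoringPacket("B", first_value=0.60, second_value=0.0)
print(locate_root(server_b, server_a, (0.95, 1.0), (0.0, 0.01)))
```

Here server A looks healthy on its own first parameter, yet its second parameter is out of range, so the abnormality found in server B is attributed to server A.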
It is understood that, in some embodiments, the monitoring data packet includes a first monitoring parameter value and at least one second monitoring parameter value. The first monitoring parameter value is used to determine whether the current-level server is abnormal, and the second monitoring parameter value is a parameter value that affects the service of the servers at the levels below the current-level server. Correspondingly, step S103 comprises the following sub-steps:
S1037, comparing the first monitoring parameter value of each server with the corresponding first monitoring threshold according to the order of the data packet queue, so as to determine whether the service of the server corresponding to the first monitoring parameter value is abnormal;
S1038, if the service of the server corresponding to the first monitoring parameter value is abnormal, acquiring the second monitoring parameter values in a plurality of monitoring data packets corresponding to the superior server of the abnormal server, wherein the plurality of monitoring data packets corresponding to the superior server comprise the monitoring data packet of the time point at which the first monitoring parameter value was acquired and the monitoring data packets of each time point within a preset time period before that time point;
S1039, determining, according to the fluctuation of the acquired plurality of second monitoring parameter values, whether the root of the abnormality of the abnormal server lies in the server itself or in the superior server.
In some servers, an abnormal monitoring parameter value may or may not cause an abnormality in the next-level server, while not causing any abnormality in the service data processing of the server itself.
When a server at a certain level is abnormal, the trend of the corresponding second monitoring parameter value of the previous-level server needs to be examined. If the second monitoring parameter of the previous-level server fluctuates abnormally at exactly the time the server becomes abnormal, the cause of the abnormality lies in the previous-level server.
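The text does not define what counts as an abnormal fluctuation, so the sketch below substitutes one plausible rule as an assumption: the value at the abnormal time point is compared with the mean of the earlier values in the window and flagged when it deviates by more than `k` standard deviations. The function names, `k` and `floor` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def fluctuates_abnormally(history: Sequence[float], latest: float,
                          k: float = 3.0, floor: float = 1e-6) -> bool:
    """Assumed rule: latest deviates from the window mean by more than k sigma."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline = mean(history)
    spread = max(pstdev(history), floor)  # floor avoids a zero spread
    return abs(latest - baseline) > k * spread


def root_by_fluctuation(abnormal_id: str, superior_id: str,
                        superior_second_values: Sequence[float]) -> str:
    """superior_second_values is ordered by time; the last entry is the value
    collected at the time point where the abnormality was observed."""
    if not superior_second_values:
        return abnormal_id
    *history, latest = superior_second_values
    if fluctuates_abnormally(history, latest):
        return superior_id  # the superior fluctuated when the abnormality appeared
    return abnormal_id


# Packet loss rate of the superior server A over the preset window (made-up data).
print(root_by_fluctuation("B", "A", [0.010, 0.012, 0.011, 0.010, 0.250]))
```

A deployment could replace the deviation rule with any trend test that suits the parameter; only the window of earlier time points plus the abnormal time point comes from the text.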
In step S104, alarm information is generated according to the total judgment result to remind an operator to perform maintenance, and alarms are graded according to the abnormality or fault condition. For example, primary-level alarm information may be generated for an abnormality of a single server that does not affect the service operation of lower-level servers. High-level alarm information is generated when a server at one level causes an abnormality or failure of its lower-level server or of several subsequent levels of servers. Middle-level alarm information is generated when a plurality of servers are abnormal or faulty without affecting one another.
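The three alarm grades can be illustrated with a short sketch. It assumes, hypothetically, that the total judgment result is available as a mapping from each abnormal server to the server where its root cause lies; `AlarmLevel` and `classify_alarm` are illustrative names.

```python
from enum import Enum
from typing import Dict, Optional


class AlarmLevel(Enum):
    PRIMARY = "primary"
    MIDDLE = "middle"
    HIGH = "high"


def classify_alarm(roots: Dict[str, str]) -> Optional[AlarmLevel]:
    """roots maps each abnormal server id to the id of its root-cause server."""
    if not roots:
        return None  # nothing is abnormal, so no alarm is raised
    if any(server != root for server, root in roots.items()):
        return AlarmLevel.HIGH  # one server caused an abnormality downstream
    if len(roots) > 1:
        return AlarmLevel.MIDDLE  # several independent abnormalities
    return AlarmLevel.PRIMARY  # a single, self-contained abnormality


print(classify_alarm({"B": "A"}))  # server B is abnormal because of server A
```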
As can be seen from the above, the service flow abnormality warning method provided in the embodiment of the present application collects the monitoring data packets returned by the embedded points of the servers at all levels and compares the monitoring parameter value in each monitoring data packet with the corresponding preset monitoring threshold. Abnormality judgment on the service system is thereby implemented, and an abnormal or faulty position can be located quickly, which makes it easier for staff to maintain the service system quickly and improves how smoothly the service system runs.
Referring to fig. 3, fig. 3 is a structural diagram of a service flow abnormality warning apparatus in some embodiments of the present application. The apparatus is used for monitoring a service system that comprises cascaded multi-stage servers, and the apparatus comprises a collection module 201, an obtaining module 202, a first generating module 203 and a second generating module 204.
The collection module 201 is configured to collect, every preset time period, the monitoring data packets returned by the embedded points arranged in the servers at all levels, where each monitoring data packet carries identification information of the server from which the data originates. The collection module 201 can collect the monitoring data gathered by the embedded points of the servers at all levels through the flash component. Preset collection rules are configured in the embedded points so that the corresponding monitoring parameter values are collected from the corresponding servers in a targeted manner. The servers at all levels may all report the same monitoring parameter, or a parameter of a suitable dimension may be selected as the monitoring parameter value according to the service of each level of server. For example, the service completion success rate, the average service time consumption, or the running state parameters of each sub-service of each level of server may be collected.
The obtaining module 202 is configured to extract the monitoring parameter value in each monitoring data packet and to obtain the corresponding monitoring threshold according to the identification information carried by the monitoring data packet. Because servers at different levels process different services, the monitoring parameters to be monitored differ, and so do the monitoring thresholds corresponding to them. A mapping among servers, monitoring parameters and monitoring thresholds therefore needs to be established in advance. When a monitoring data packet is obtained, the monitoring parameter values in it are extracted, and the monitoring threshold preset for that type of monitoring parameter value of that server is looked up based on the identification information of the monitoring data packet.
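One way to picture the mapping among servers, monitoring parameters and monitoring thresholds is a nested dictionary keyed by the identification information in the packet. The packet layout, server ids, parameter names and ranges below are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # inclusive (low, high) bounds, an assumed representation
ThresholdTable = Dict[str, Dict[str, Range]]

# Hypothetical mapping: server id -> monitoring parameter -> threshold range.
THRESHOLDS: ThresholdTable = {
    "level-1": {"success_rate": (0.95, 1.0)},
    "level-2": {"success_rate": (0.90, 1.0), "avg_latency_ms": (0.0, 200.0)},
}


def thresholds_for(packet: dict, table: ThresholdTable) -> Dict[str, Range]:
    """Look up the preset threshold for every parameter carried by the packet."""
    per_server = table[packet["server_id"]]  # KeyError signals an unknown server
    return {name: per_server[name]
            for name in packet["values"]
            if name in per_server}  # parameters without a threshold are skipped


packet = {"server_id": "level-2",
          "values": {"success_rate": 0.97, "avg_latency_ms": 350.0}}
print(thresholds_for(packet, THRESHOLDS))
```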
In some embodiments, the monitoring server collects a large number of monitoring data packets, because it continuously collects the data of the embedded points on each level of server at preset time intervals. Queues therefore need to be established so that the packets are processed in order and confusion is avoided.
Specifically, the obtaining module 202 includes a sorting unit and an obtaining unit. The sorting unit is configured to sort the monitoring data packets returned by each embedded point according to a preset rule so as to generate a data packet queue; the obtaining unit is configured to sequentially extract the monitoring parameter value of each monitoring data packet from the data packet queue and to obtain the corresponding monitoring threshold according to the identification information carried by the monitoring data packet.
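As a sketch of the sorting unit, the example below assumes the preset rule is the one spelled out in the claims: packets collected earlier come first, and among packets collected at the same time the superior server precedes the inferior one. The `QueuedPacket` fields and the convention that level 1 is the top of the cascade are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QueuedPacket:
    server_id: str
    level: int         # assumed convention: 1 = top of the cascade
    collected_at: int  # collection time, e.g. epoch seconds


def build_packet_queue(packets: Iterable[QueuedPacket]) -> List[QueuedPacket]:
    """Order by collection time first, then by cascade level."""
    return sorted(packets, key=lambda p: (p.collected_at, p.level))


incoming = [
    QueuedPacket("B", level=2, collected_at=100),
    QueuedPacket("A", level=1, collected_at=200),
    QueuedPacket("A", level=1, collected_at=100),
    QueuedPacket("B", level=2, collected_at=200),
]
for item in build_packet_queue(incoming):
    print(item.collected_at, item.server_id)
```

Sorting on a tuple key keeps both ordering conditions in one pass, and because Python's sort is stable, packets with identical time and level keep their arrival order.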
The first generating module 203 is configured to perform abnormality judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold, and to generate a total judgment result of the service system. The monitoring server can judge whether a server is abnormal by comparing the monitoring parameter value of that server with the corresponding monitoring threshold. A monitoring parameter value outside the range of the monitoring threshold indicates that the server is abnormal, and a value within the range indicates that the server is operating normally. After the servers at all levels have been judged, the monitoring server aggregates the abnormal conditions of all the servers to generate the total judgment result.
It can be understood that, in some embodiments, the first generating module 203 is specifically configured to compare the monitoring parameter value of each server with the corresponding monitoring threshold according to the order of the data packet queue, so as to determine whether the service of the server corresponding to the monitoring parameter value is abnormal, and to generate a total judgment result according to the abnormality judgment results of the monitoring parameter values of the servers at each level, where the total judgment result includes the abnormality information and the abnormality source of the servers at each level. For example, if the statistics show that no abnormality occurs in the first-level to third-level servers, that abnormalities begin at the fourth-level server, and that, according to analysis of the abnormality information, the abnormalities of the fourth-level server and of the servers below it are similar, the fourth-level server can be determined to be the root of the failure. If the second-level server is abnormal, the third-level to fifth-level servers are normal, and the sixth-level server is abnormal, it can be determined that the second-level and sixth-level servers are both abnormal but have little correlation with each other, each abnormality being caused by a bug in the respective system.
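The two examples above can be reproduced with a small sketch that treats the first server of each unbroken run of abnormal levels as a root candidate. This is a simplification: the similarity analysis of the abnormality information mentioned in the text is not modelled, and `summarize` and its output keys are hypothetical.

```python
from typing import Dict, List


def summarize(status: Dict[int, bool]) -> Dict[str, List[int]]:
    """status maps cascade level -> True if that server was judged abnormal."""
    abnormal = [level for level in sorted(status) if status[level]]
    # A level is a root candidate when the level directly above it is normal
    # (or absent), i.e. it starts a run of consecutive abnormal levels.
    roots = [level for level in abnormal if not status.get(level - 1, False)]
    return {"abnormal_levels": abnormal, "root_candidates": roots}


# Levels 1-3 normal, levels 4-6 abnormal: level 4 is the single root candidate.
print(summarize({1: False, 2: False, 3: False, 4: True, 5: True, 6: True}))
# Levels 2 and 6 abnormal with normal levels between: two independent roots.
print(summarize({1: False, 2: True, 3: False, 4: False, 5: False, 6: True}))
```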
It is understood that, in some embodiments, the monitoring data packet includes a first monitoring parameter value and at least one second monitoring parameter value. The first monitoring parameter value is used to determine whether the current-level server is abnormal, and the second monitoring parameter value is a parameter value that affects the service of the servers at the levels below the current-level server. Correspondingly, the first generating module 203 is configured to: compare the first monitoring parameter value of each server with the corresponding first monitoring threshold according to the order of the data packet queue, so as to determine whether the service of the server corresponding to the first monitoring parameter value is abnormal; if the service of that server is abnormal, acquire the second monitoring parameter value in the monitoring data packet corresponding to the superior server of the abnormal server; compare the second monitoring parameter value with a second preset threshold to determine whether the root of the abnormality of the abnormal server lies in the server itself or in the superior server; and generate a total judgment result of the service system according to the abnormality judgment results of the servers at all levels.
However, when such a fluctuation propagates to the next-level server, it causes an abnormality there. The server that produced the fluctuation can therefore be identified as the root cause of the failure of the next-level server.
For example, server A processes a plurality of service data items sent by its upper-level server to obtain a plurality of processed service data items a. While server A performs its service calculation, one or two of the received service data items are lost because of packet loss or a network failure. Because the service processing in server A does not depend on associations among the service data items, server A itself is unaffected. Server B, the lower-level server of server A, must however perform an integration calculation over every service data item according to a certain algorithm; the data loss in server A leaves the service data chain incomplete, so a serious failure occurs when server B runs. In the actual detection process, judging the first monitoring parameter value in the monitoring data packet of server A indicates that server A has no fault and that its service is running normally. Judging server B on its first monitoring parameter value finds that server B is abnormal. It is then necessary to go back, extract the second monitoring parameter value of server A, and check whether it lies within the preset threshold range. For example, the second monitoring parameter value of server A is the packet loss rate, and the threshold range of the packet loss rate is 99-100%. If the packet loss rate is not within this range, the fault of server B is caused by an implicit abnormality of server A (an abnormality that does not show up in the service operation of server A). Because packet loss is to some degree random, the lost data may be irrelevant data even when the packet loss rate is outside this range, in which case no abnormality is caused in server B.
When the current-level server is found to be abnormal, the relevant monitoring parameter value of its superior server is checked to determine whether the root of the fault lies in the current-level server or in the superior server, so that the real fault or abnormality can be located quickly.
It is understood that, in some embodiments, the monitoring data packet includes a first monitoring parameter value and at least one second monitoring parameter value. The first monitoring parameter value is used to determine whether the current-level server is abnormal, and the second monitoring parameter value is a parameter value that affects the service of the servers at the levels below the current-level server. Correspondingly, the first generating module 203 is specifically configured to: compare the first monitoring parameter value of each server with the corresponding first monitoring threshold according to the order of the data packet queue, so as to determine whether the service of the server corresponding to the first monitoring parameter value is abnormal; if the service of that server is abnormal, acquire the second monitoring parameter values in a plurality of monitoring data packets corresponding to the superior server of the abnormal server, wherein the plurality of monitoring data packets corresponding to the superior server comprise the monitoring data packet of the time point at which the first monitoring parameter value was acquired and the monitoring data packets of each time point within a preset time period before that time point; and determine, according to the fluctuation of the second monitoring parameter value across the plurality of monitoring data packets, whether the root of the abnormality of the server lies in the server itself or in the superior server. In some servers, an abnormal monitoring parameter value may or may not cause an abnormality in the next-level server, while not causing any abnormality in the service data processing of the server itself.
When a server at a certain level is abnormal, the trend of the corresponding second monitoring parameter value of the previous-level server needs to be examined. If the second monitoring parameter of the previous-level server fluctuates abnormally at exactly the time the server becomes abnormal, the cause of the abnormality lies in the previous-level server.
The second generating module 204 is configured to generate alarm information according to the total judgment result, to prompt an operator to perform maintenance, and to grade alarms according to the abnormality or fault condition. For example, primary-level alarm information may be generated for an abnormality of a single server that does not affect the service operation of lower-level servers. High-level alarm information is generated when a server at one level causes an abnormality or failure of its lower-level server or of several subsequent levels of servers. Middle-level alarm information is generated when a plurality of servers are abnormal or faulty without affecting one another.
As can be seen from the above, the service flow abnormality warning device provided in the embodiment of the present application collects the monitoring data packets returned by the embedded points of the servers at all levels and compares the monitoring parameter value in each monitoring data packet with the corresponding preset monitoring threshold value, thereby implementing abnormality judgment on the service system, quickly locating an abnormality or a fault position, facilitating quick maintenance by a worker, and improving the operation smoothness of the service system.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 3 includes a processor 301 and a memory 302, which are interconnected and communicate with each other via a communication bus 303 and/or another form of connection mechanism (not shown). The memory 302 stores a computer program executable by the processor 301, and when the electronic device is running, the processor 301 executes the computer program to perform the method of any of the optional implementations of the embodiments described above.
The embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, performs the method in any optional implementation of the above embodiments. The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (7)

1. A business process abnormality warning method for use in a business system comprising cascaded multi-stage servers, the method comprising the following steps:
collecting, at intervals of a preset time period, monitoring data packets returned by embedded points arranged in the servers at all levels, wherein each monitoring data packet carries identification information of the server from which the data originates;
extracting a monitoring parameter value in each monitoring data packet, and acquiring a corresponding monitoring threshold value according to identification information carried by the monitoring data packet;
performing abnormity judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold value to generate a total judgment result of the service system;
generating alarm information according to the total judgment result;
the steps of extracting the monitoring parameter value in each monitoring data packet and acquiring the corresponding monitoring threshold value according to the identification information carried by the monitoring data packet include:
sorting the monitoring data packets returned by each embedded point according to a preset rule to generate a data packet queue;
sequentially extracting the monitoring parameter value of each monitoring data packet from the data packet queue, and acquiring a corresponding monitoring threshold value according to identification information carried by the monitoring data packet;
the monitoring data packet comprises a first monitoring parameter value and at least one second monitoring parameter value, the first monitoring parameter value is a parameter value used for judging whether the service of the current-stage server is abnormal, and the second monitoring parameter value is a parameter value having influence on the service of the lower-stage server of the current-stage server;
and the step of performing abnormity judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold value and generating a judgment result comprises the following steps:
comparing the first monitoring parameter value of each server with the corresponding first monitoring threshold value according to the sequence of the data packet queue to judge whether the server corresponding to the first monitoring parameter value is abnormal in service;
if the server service corresponding to the first monitoring parameter value is abnormal, acquiring a second monitoring parameter value in a monitoring data packet corresponding to a superior server of the abnormal server;
comparing the second monitoring parameter value with a second preset threshold value to judge whether the abnormal root of the abnormal server is originated from the server or from the superior server;
and generating a total judgment result of the service system according to the abnormal judgment result of the servers at all levels.
2. The business process abnormality warning method according to claim 1, wherein the step of sorting the monitoring data packets returned by each embedded point according to a preset rule to generate a data packet queue comprises:
sorting the monitoring data packets returned by each embedded point according to the cascade order of the multi-stage servers and the collection time of the monitoring data packets to generate the data packet queue; wherein, among monitoring data packets collected at the same time, the monitoring data packet of a superior server is placed before the monitoring data packet of an inferior server; and among monitoring data packets collected at different times, the monitoring data packet with the earlier collection time is placed before the monitoring data packet with the later collection time.
3. The business process abnormality warning method according to claim 2, wherein the step of performing abnormality judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold value and generating a judgment result comprises:
comparing the monitoring parameter value of each server with a corresponding monitoring threshold value according to the sequence of the data packet queue to judge whether the server corresponding to the monitoring parameter value is abnormal in service;
and generating a total judgment result according to the abnormal judgment result of each level of server, wherein the total judgment result comprises the abnormal information and the abnormal source of each level of server.
4. The method according to claim 1, wherein the monitoring data packet includes a first monitoring parameter value and at least one second monitoring parameter value, the first monitoring parameter value is a parameter value used for determining whether the service of the current-stage server is abnormal, and the second monitoring parameter value is a parameter value having an influence on the service of each-stage server below the current-stage server;
and the step of performing abnormity judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold value and generating a judgment result comprises the following steps:
comparing the first monitoring parameter value of each server with the corresponding first monitoring threshold value according to the sequence of the data packet queue to judge whether the server corresponding to the first monitoring parameter value is abnormal in service;
if the server service corresponding to the first monitoring parameter value is abnormal, acquiring a second monitoring parameter value in a plurality of monitoring data packets corresponding to a superior server of the abnormal server, wherein the plurality of monitoring data packets corresponding to the superior server comprise a monitoring data packet at a time point corresponding to the first monitoring parameter value and a monitoring data packet at each time point in a preset time before the corresponding time point;
and judging whether the abnormal root of the abnormal server is originated from the server or the superior server according to the fluctuation condition of the acquired plurality of second monitoring parameter values.
5. A business process abnormality warning device for use in a business system comprising cascaded multi-stage servers, the device comprising:
a collection module, configured to collect, at intervals of a preset time period, monitoring data packets returned by embedded points arranged in the servers at all levels, wherein each monitoring data packet carries identification information of the server from which the data originates;
an obtaining module, configured to extract the monitoring parameter value in each monitoring data packet and to obtain a corresponding monitoring threshold according to the identification information carried by the monitoring data packet;
a first generating module, configured to perform abnormality judgment on the service system according to each monitoring parameter value and the corresponding monitoring threshold value to generate a total judgment result of the service system;
a second generating module, configured to generate alarm information according to the total judgment result;
wherein the obtaining module comprises a sorting unit and an obtaining unit;
the sorting unit is configured to sort the monitoring data packets returned by each embedded point according to a preset rule so as to generate a data packet queue;
the obtaining unit is configured to sequentially extract the monitoring parameter value of each monitoring data packet from the data packet queue and to obtain a corresponding monitoring threshold according to the identification information carried by the monitoring data packet;
the monitoring data packet comprises a first monitoring parameter value and at least one second monitoring parameter value, the first monitoring parameter value is a parameter value used for judging whether the service of the current-stage server is abnormal, and the second monitoring parameter value is a parameter value having influence on the service of the lower-stage server of the current-stage server;
the first generating module is further configured to compare the first monitoring parameter value of each server with the corresponding first monitoring threshold according to the sequence of the data packet queue, so as to determine whether the server corresponding to the first monitoring parameter value is abnormal in service; if the server service corresponding to the first monitoring parameter value is abnormal, acquiring a second monitoring parameter value in a monitoring data packet corresponding to a superior server of the abnormal server; comparing the second monitoring parameter value with a second preset threshold value to judge whether the abnormal root of the server is from the server or from a superior server of the server; and generating a total judgment result of the service system according to the abnormal judgment result of the servers at all levels.
6. An electronic device comprising a processor and a memory, said memory storing computer readable instructions which, when executed by said processor, perform the steps of the method of any of claims 1-4.
7. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method according to any one of claims 1-4.
Application CN201911169771.2A (priority date 2019-11-25, filing date 2019-11-25): Business process abnormity warning method and device, electronic equipment and storage medium. Status: Active. Granted as CN111010291B (en).


Publications (2)

Publication Number Publication Date
CN111010291A CN111010291A (en) 2020-04-14
CN111010291B true CN111010291B (en) 2022-08-09


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205237A (en) * 2020-12-15 2021-08-03 格创东智(深圳)科技有限公司 Glass production information processing method and device, electronic equipment and storage medium thereof
CN112597203A (en) * 2020-12-28 2021-04-02 恩亿科(北京)数据科技有限公司 General data monitoring method and system based on big data platform
CN112685256B (en) * 2020-12-30 2023-05-09 上海掌门科技有限公司 Method, equipment and medium for monitoring server
CN112857450A (en) * 2021-01-20 2021-05-28 武汉新泽安科技有限公司 Automatic monitoring system for environmental air quality
CN113377627B (en) * 2021-06-10 2023-12-05 广州朗国电子科技股份有限公司 Business server abnormality detection method, system, equipment and storage medium
CN114710401B (en) * 2022-04-29 2024-02-06 北京达佳互联信息技术有限公司 Abnormality positioning method and device
CN115514678B (en) * 2022-09-23 2023-09-26 四川新网银行股份有限公司 Continuity monitoring method for internet financial business

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102075369A (en) * 2011-02-28 2011-05-25 杭州华三通信技术有限公司 Method and equipment for managing monitoring equipment
CN105451036A (en) * 2014-09-18 2016-03-30 中国电信股份有限公司 Video quality monitoring method, device and CDN system
CN107886242A (en) * 2017-11-10 2018-04-06 平安科技(深圳)有限公司 Data monitoring method, device, computer equipment and storage medium
CN108092836A (en) * 2016-11-21 2018-05-29 深圳市蓝希领地科技有限公司 The monitoring method and device of a kind of server
WO2018176496A1 (en) * 2017-04-01 2018-10-04 华为技术有限公司 Iptv service quality detection method, device and system
CN109828883A (en) * 2017-11-23 2019-05-31 腾讯科技(北京)有限公司 Task data treating method and apparatus, storage medium and electronic device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8868736B2 (en) * 2012-04-27 2014-10-21 Motorola Mobility Llc Estimating a severity level of a network fault

Also Published As

Publication number Publication date
CN111010291A (en) 2020-04-14

Similar Documents

Publication Publication Date Title
CN111010291B (en) Business process abnormity warning method and device, electronic equipment and storage medium
US9672085B2 (en) Adaptive fault diagnosis
US7779467B2 (en) N grouping of traffic and pattern-free internet worm response system and method using N grouping of traffic
US8635498B2 (en) Performance analysis of applications
US10346744B2 (en) System and method for visualisation of behaviour within computer infrastructure
US11348023B2 (en) Identifying locations and causes of network faults
JP5098821B2 (en) Monitoring device and monitoring method for detecting a sign of failure of monitored system
CN110928718A (en) Exception handling method, system, terminal and medium based on correlation analysis
US20150346066A1 (en) Asset Condition Monitoring
CN106104496A (en) The abnormality detection not being subjected to supervision for arbitrary sequence
US9547545B2 (en) Apparatus and program for detecting abnormality of a system
CN111814999A (en) Fault work order generation method, device and equipment
CN113271224A (en) Node positioning method and device, storage medium and electronic device
Harutyunyan et al. Abnormality analysis of streamed log data
CN113704018A (en) Application operation and maintenance data processing method and device, computer equipment and storage medium
CN115865649A (en) Intelligent operation and maintenance management control method, system and storage medium
KR101281460B1 (en) Method for anomaly detection using statistical process control
CN110469461A (en) A kind of fracture predictor method of blower tooth band, its device and readable storage medium storing program for executing
Zhu et al. Automatic fault diagnosis in cloud infrastructure
Rafique et al. TSDN-enabled network assurance: A cognitive fault detection architecture
CN114327988B (en) Visual network fault relation determination method and device
CN110838940A (en) Underground cable inspection task configuration method and device
ZHANG et al. Approach to anomaly detection in microservice system with multi-source data streams
AU2014200806B1 (en) Adaptive fault diagnosis
US20220269577A1 (en) Data-Center Management using Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant