CN115081969B - Abnormal data determination method and related device - Google Patents

Abnormal data determination method and related device

Info

Publication number
CN115081969B
Authority
CN
China
Prior art keywords
response time
system response
baseline
target
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211010956.0A
Other languages
Chinese (zh)
Other versions
CN115081969A (en
Inventor
孟庆江
田忠毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Cicc Wealth Securities Co ltd
Original Assignee
China Cicc Wealth Securities Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Cicc Wealth Securities Co ltd filed Critical China Cicc Wealth Securities Co ltd
Priority to CN202211010956.0A priority Critical patent/CN115081969B/en
Publication of CN115081969A publication Critical patent/CN115081969A/en
Application granted granted Critical
Publication of CN115081969B publication Critical patent/CN115081969B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Alarm Systems (AREA)

Abstract

The embodiment of the application discloses an abnormal data determining method and a related device, wherein the method comprises the following steps: determining a plurality of alarm events; acquiring index data corresponding to each alarm event in a plurality of alarm events; generating a system response time sequence according to the system response time data corresponding to each alarm event; correcting the system response time data corresponding to each alarm event according to the baselines corresponding to the alarm events to obtain a target system response time sequence corresponding to the system response time sequence; performing mutation detection on the target system response time sequence to screen out alarm events corresponding to the target system response time sequence with data mutation, so as to obtain a plurality of target alarm events; and carrying out root cause positioning on index data corresponding to each target alarm event, and determining abnormal index data corresponding to a plurality of target alarm events. By adopting the embodiment of the application, false alarms are eliminated, and the efficiency and accuracy of abnormal data determination are improved.

Description

Abnormal data determination method and related device
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method and an apparatus for determining abnormal data.
Background
With the development of internet technology and artificial intelligence, online transactions have become a widely used mode of transaction. As transaction data grows and transaction monitoring technology develops, more faults are detected in transaction systems, and checking them manually one by one consumes considerable manpower. There is therefore a growing demand for finding abnormal data efficiently.
Disclosure of Invention
The embodiment of the application provides an abnormal data determining method and a related device, which are beneficial to improving the accuracy and efficiency of abnormal data determination.
In a first aspect, an embodiment of the present application provides a method for determining abnormal data, applied to an electronic device, where the electronic device is a server in communication with a transaction system, and the method includes:
determining a plurality of service alarm events, including a baseline comparison operation as in steps a-c:
step a, acquiring system response time data of each event in a plurality of events within a preset period, wherein the event is a service event of a service type, and the service type comprises at least one of the following: the system response time data of a single event comprises a plurality of system response times counted by the single event which occurs for a plurality of times in the preset period;
step b, comparing the system response time data corresponding to each event with a baseline corresponding to the system response time data, wherein the baseline comprises an upper baseline and a lower baseline, each of which may correspond to a plurality of baseline values, a single baseline value may correspond to one moment, and the baseline corresponding to the system response time data is characterized as follows: the baseline against which the system response time of the x-th occurrence of each event is compared is the baseline corresponding to the moment of the x-th occurrence, x is any one of the plurality of occurrences of each event within the preset period, and the baseline is obtained through historical experience learning;
step c, selecting, as the service alarm events, the events for which the number of times the system response time data exceeds the upper baseline plus the number of times it goes beyond the lower baseline is greater than a preset number of times, to obtain the plurality of service alarm events;
generating a system response time sequence according to the system response time data corresponding to each service alarm event in the service alarm events, including: arranging a plurality of system response times of each service alarm event according to the sequence of the occurrence time to obtain a system response time sequence;
Correcting the system response time data corresponding to each service alarm event according to the baselines corresponding to the service alarm events to obtain a target system response time sequence corresponding to the system response time sequence, wherein the method comprises the following steps of d-i:
step d, for each service alarm event in the plurality of service alarm events, performing the following correction operation to obtain a plurality of target system response time sequences corresponding to the plurality of service alarm events one by one:
step e, determining a baseline average value sequence of a plurality of baseline values in a baseline corresponding to a system response time sequence of a currently processed service alarm event according to an upper baseline value and a lower baseline value of the plurality of baseline values in the baseline, wherein the baseline average value sequence corresponds to a plurality of baseline average values corresponding to a plurality of system response times in the system response time sequence one by one;
step f, calculating the difference value between each system response time in the system response time sequence and the corresponding baseline mean value to obtain a difference value sequence;
step g, filtering abnormal values of a plurality of differences in the difference sequence to obtain filtered h target differences;
step h, determining the average value of the h target difference values to obtain a target change average value, wherein h is a positive integer greater than 1;
Step i, calculating the difference value between the system response time in the system response time sequence and the target change mean value to obtain each corrected system response time, and obtaining a target system response time sequence;
performing the baseline comparison operation on the plurality of target system response time sequences of the plurality of service alarm events, filtering out the target system response time sequences that do not exceed the baseline the preset number of times, and obtaining at least two remaining target system response time sequences;
performing mutation detection on the at least two target system response time sequences to screen out service alarm events corresponding to the target system response time sequences without data mutation, so as to obtain a plurality of target service alarm events;
and carrying out root cause positioning on index data corresponding to each target service alarm event, and determining abnormal index data corresponding to the plurality of target service alarm events, wherein the index data comprises a target system response time sequence corresponding to each target service alarm event and a system response time sequence corresponding to a corresponding back-end platform, the back-end platform is a platform to which a service which is pulled or called by the transaction system for responding to a calling instruction, the calling instruction is an instruction which is generated by the transaction system for responding to a request instruction of the service event, and a single system response time sequence corresponding to the back-end platform is a sequence which is obtained by arranging a plurality of system response times of the single service in the preset period according to the sequence of occurrence times.
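The baseline comparison operation of steps a-c can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the method: the names `Sample`, `Baseline`, `count_exceedances`, and `select_alarm_events`, the integer timestamps, and the example values are all hypothetical, and going beyond the lower baseline is read here as falling below it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One occurrence of an event (hypothetical structure)."""
    timestamp: int        # moment of the x-th occurrence
    response_time: float  # system response time measured at that moment


@dataclass
class Baseline:
    """Upper and lower baseline values, one pair per moment (hypothetical structure)."""
    upper: Dict[int, float]
    lower: Dict[int, float]


def count_exceedances(samples: List[Sample], baseline: Baseline) -> int:
    """Step b: compare each response time with the baseline values of its own moment."""
    count = 0
    for s in samples:
        above = s.response_time > baseline.upper[s.timestamp]
        below = s.response_time < baseline.lower[s.timestamp]  # assumption: "beyond" means below
        if above or below:
            count += 1
    return count


def select_alarm_events(
    events: Dict[str, List[Sample]],
    baselines: Dict[str, Baseline],
    preset_times: int,
) -> List[str]:
    """Step c: keep events whose exceedance count is greater than the preset number."""
    return [
        name
        for name, samples in events.items()
        if count_exceedances(samples, baselines[name]) > preset_times
    ]


# Hypothetical data: a flat baseline of [50, 200] at four moments.
flat = Baseline(upper={t: 200.0 for t in range(4)}, lower={t: 50.0 for t in range(4)})
events = {
    "search": [Sample(0, 120.0), Sample(1, 310.0), Sample(2, 40.0), Sample(3, 330.0)],
    "login": [Sample(0, 100.0), Sample(1, 110.0), Sample(2, 105.0), Sample(3, 300.0)],
}
alarms = select_alarm_events(events, {"search": flat, "login": flat}, preset_times=2)
print(alarms)  # ['search']
```

In this example the search event leaves the baseline band three times and is selected, while the login event leaves it only once and is not.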
In a second aspect, an embodiment of the present application provides an abnormal data determining apparatus applied to an electronic device, where the electronic device is a server in communication with a transaction system, the apparatus includes:
a determining unit, configured to determine a plurality of service alarm events, including a baseline comparison operation as in steps a-c:
step a, acquiring system response time data of each event in a plurality of events within a preset period, wherein the event is a service event of a service type, and the service type comprises at least one of the following: the system response time data of a single event comprises a plurality of system response times counted by the single event which occurs for a plurality of times in the preset period;
step b, comparing the system response time data corresponding to each event with a baseline corresponding to the system response time data, wherein the baseline comprises an upper baseline and a lower baseline, each of which may correspond to a plurality of baseline values, a single baseline value may correspond to one moment, and the baseline corresponding to the system response time data is characterized as follows: the baseline against which the system response time of the x-th occurrence of each event is compared is the baseline corresponding to the moment of the x-th occurrence, and x is any one of the plurality of occurrences of each event within the preset period;
step c, selecting, as the service alarm events, the events for which the number of times the system response time data exceeds the upper baseline plus the number of times it goes beyond the lower baseline is greater than a preset number of times, to obtain the plurality of service alarm events;
the generating unit is configured to generate a system response time sequence according to the system response time data corresponding to each service alarm event, where the generating unit includes: arranging a plurality of system response times of each service alarm event according to the sequence of the occurrence time to obtain a system response time sequence;
the correction unit is configured to correct the system response time data corresponding to each service alarm event according to the baselines corresponding to the plurality of service alarm events, so as to obtain a target system response time sequence corresponding to the system response time sequence, and includes the following steps:
step d, for each service alarm event in the plurality of service alarm events, performing the following correction operation to obtain a plurality of target system response time sequences corresponding to the plurality of service alarm events one by one:
step e, determining a baseline average value sequence of a plurality of baseline values in a baseline corresponding to a system response time sequence of a currently processed service alarm event according to an upper baseline value and a lower baseline value of the plurality of baseline values in the baseline, wherein the baseline average value sequence corresponds to a plurality of baseline average values corresponding to a plurality of system response times in the system response time sequence one by one;
step f, calculating the difference value between each system response time in the system response time sequence and the corresponding baseline mean value to obtain a difference value sequence;
step g, filtering abnormal values of a plurality of differences in the difference sequence to obtain filtered h target differences;
step h, determining the average value of the h target difference values to obtain a target change average value, wherein h is a positive integer greater than 1;
step i, calculating the difference value between the system response time in the system response time sequence and the target change mean value to correct each system response time, so as to obtain a target system response time sequence;
the baseline comparison unit is configured to perform the baseline comparison operation on the plurality of target system response time sequences of the plurality of service alarm events, filter out the target system response time sequences that do not exceed the baseline the preset number of times, and obtain at least two remaining target system response time sequences;
the mutation detection unit is used for carrying out mutation detection on the at least two target system response time sequences so as to screen out service alarm events corresponding to the target system response time sequences without data mutation, and a plurality of target service alarm events are obtained;
The root cause positioning unit is used for performing root cause positioning on index data corresponding to each target service alarm event, determining abnormal index data corresponding to the plurality of target service alarm events, wherein the index data comprises a target system response time sequence corresponding to each target service alarm event and a system response time sequence corresponding to a corresponding back-end platform, the back-end platform is a platform to which a service which is pulled or invoked by the transaction system for responding to a calling instruction, the calling instruction is an instruction which is generated by the transaction system for responding to a request instruction of the service event, and a single system response time sequence corresponding to the back-end platform is a sequence which is obtained by arranging a plurality of system response times of the single service which occur multiple times within the preset period according to a sequence relation of occurrence times.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing part or all of the steps as described in any of the methods of the first aspect of the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program causes a computer to perform some or all of the steps as described in any of the methods of the first aspect of the embodiments of the present application.
It can be seen that, in the embodiments of the present application, the electronic device may determine a plurality of alarm events and acquire index data corresponding to each of the plurality of alarm events, where the index data includes system response time data corresponding to each alarm event. The electronic device generates a system response time sequence according to the system response time data corresponding to each alarm event, and corrects the system response time data corresponding to each alarm event according to the baselines corresponding to the plurality of alarm events to obtain a target system response time sequence corresponding to the system response time sequence. It then performs mutation detection on the target system response time sequence to screen out the alarm events corresponding to target system response time sequences without data mutation, obtaining a plurality of target alarm events. Finally, it performs root cause positioning on the index data corresponding to each target alarm event to determine the abnormal index data corresponding to the plurality of target alarm events, which helps eliminate false alarm events and improves the accuracy and efficiency of abnormal data determination.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a method for determining abnormal data according to an embodiment of the present application;
FIG. 2a is a schematic illustration of a baseline and system response time provided by an embodiment of the present application;
FIG. 2b is a schematic illustration of another baseline and system response time provided by an embodiment of the present application;
FIG. 2c is a schematic illustration of another baseline and system response time provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an abnormal data determining apparatus provided in an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The electronic device according to the embodiments of the present application may include various handheld devices, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to a wireless modem, as well as various forms of User Equipment (UE), Mobile Station (MS), terminal devices, and so on. For convenience of description, the above-mentioned devices are collectively referred to as electronic devices. In some examples, the electronic device may also be a server, and may specifically include an abnormal data determination server.
Referring to fig. 1, fig. 1 is a flowchart of an abnormal data determining method provided in an embodiment of the present application, which is applied to an electronic device, and as shown in the drawing, the abnormal data determining method includes the following operations.
S101, determining a plurality of alarm events.
The electronic device may comprise a data processing (transaction) system and/or an abnormal data determination system. When the electronic device is a server, the server may be used to determine abnormal data and is communicatively connected to the server corresponding to the data processing system, so as to determine abnormal conditions in that system.
The abnormal data can be understood as data corresponding to abnormal conditions that arise while the data processing system is working, caused by faults such as configuration changes, processing delays, network delays, system crashes, and system freezes.
An event is an action or occurrence within a system; when an event occurs, the system generates or triggers a signal and provides a mechanism to automatically load the corresponding action.
For example, the electronic device may use a monitoring unit to identify or monitor how transactions or services are processed in the data processing environment, and promptly obtain the services corresponding to a plurality of processing situations during system operation. The service types may include at least one of: search services, transaction services, authentication services, listening services, data deletion services, reminder services, and the like, which are not limited herein.
The electronic device may generate a plurality of events when processing the service corresponding to a service type, and the events may include at least one of the following: alert events, verification events, reminder events, data deletion events, listening events, search events, and the like, which are not limited herein. Different service scenarios may produce different types of events.
When an abnormal condition occurs in a service, that is, when a service type becomes abnormal (for example, the search service, the transaction service, or the listening service behaves abnormally), abnormal data may be generated. When an abnormal condition occurs in the system, the monitoring unit of the system may issue an alarm signal according to the abnormal condition, and after the alarm signal is issued, the electronic device may generate an alarm event.
When an abnormal condition occurs in the system, the electronic device can determine a plurality of alarm events within a preset period (which may be a system default or set by a user, and is not limited herein).
In this application, a plurality may refer to two or more, and will not be described in detail later.
S102, acquiring index data corresponding to each alarm event in the plurality of alarm events, wherein the index data comprises system response time data corresponding to each alarm event.
An index is a unit or a method for measuring the degree of development of a thing; it quantifies how that thing changes, or measures a target, in numerical form.
The index data may be the quantized or numerical representation of the index as the index fluctuates.
Here, the index may be a unit or a method for evaluating abnormal conditions of the system, and the index data may be the specific data corresponding to the index obtained by the electronic device over a period of time. For example, the electronic device may receive a request instruction for transaction system A issued by a user's terminal device. When the request instruction is processed by a front-end platform of transaction system A, a call instruction is generated according to the request instruction, and each service corresponding to transaction system A is pulled or called on the back-end platform according to the call instruction, so as to complete or execute the client request.
The data processing system corresponding to the electronic device may correspond to a plurality of front-end platforms and/or back-end platforms. A response time abnormality of some front-end platforms may be caused by a response time abnormality of the back-end platforms, but the associations between front-end and back-end platforms, and among back-end platforms, may change at any time. When an abnormal situation occurs, an alarm event is generated. In the present application, the electronic device may take the abnormal situation corresponding to the alarm event as the index, and take the system response time with which the front-end platform or the back-end platform responds to the request instruction as the index data. Each alarm event may correspond to one system response time, and the abnormal situation of the data processing system corresponding to the electronic device can be measured through the system response time.
The index data corresponding to the alarm events may be a set of data that varies linearly or non-linearly.
S103, generating a system response time sequence according to the system response time data corresponding to each alarm event.
In the present application, because the electronic device corresponds to a plurality of front-end and/or back-end platforms, the system response times of the alarm events differ from platform to platform. When there are multiple platforms, the order in which the electronic device acquires their system response times varies, and when the number of alarm events is large, the alarm times acquired from the platforms may become confused. The electronic device can therefore serialize the system response time data corresponding to the alarm events, arranging the system response times in chronological order to obtain a system response time sequence, so that the abnormal data, that is, the cause of the abnormal condition in the system, can be determined from the system response times.
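The serialization in S103 amounts to grouping the collected response times by alarm event and sorting each group by occurrence time. The following Python sketch illustrates this; the `Record` structure, the function name `build_sequences`, and the sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    """One collected measurement (hypothetical structure)."""
    event: str            # alarm event the measurement belongs to
    occurred_at: int      # occurrence time, e.g. seconds since the period started
    response_time: float  # system response time


def build_sequences(records: List[Record]) -> Dict[str, List[float]]:
    """Group records by alarm event and order each group by occurrence time."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        grouped[r.event].append(r)
    return {
        event: [r.response_time for r in sorted(rs, key=lambda r: r.occurred_at)]
        for event, rs in grouped.items()
    }


# Records arrive interleaved and out of order, as they might from several platforms.
records = [
    Record("search", 3, 130.0),
    Record("login", 1, 40.0),
    Record("search", 1, 110.0),
    Record("login", 2, 45.0),
    Record("search", 2, 120.0),
]
sequences = build_sequences(records)
print(sequences)  # {'search': [110.0, 120.0, 130.0], 'login': [40.0, 45.0]}
```

Python's `sorted` is stable, so measurements that share an occurrence time keep their arrival order.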
S104, correcting the system response time data corresponding to each alarm event according to the baselines corresponding to the alarm events to obtain a target system response time sequence corresponding to the system response time sequence.
The electronic device may preset a baseline, and the value or parameter corresponding to the baseline may be set by a user or be a system default, which is not limited herein. The baseline can be obtained through historical experience learning. The baseline can correspond to a plurality of moments, and the baseline values corresponding to different moments can be the same or different. Different indexes can correspond to different baselines, and when an index is abnormal, the baseline corresponding to the system response time data over a period of time can be selected.
The baseline can be any one of the preset standard baselines that the electronic device preselects for the current data processing, and can be used to determine the alarm events among the plurality of events generated by the electronic device, so that abnormal data corresponding to the index data can be detected through the alarm events.
In a specific implementation, when the system configuration changes, the response time of the front-end platform and/or the back-end platform in responding to user requests changes and shifts or drifts as a whole. Because the baseline lags behind this change, the response time often falls outside the range of the baseline. Since the determination of alarm events is closely tied to the baseline, the system generates false alarm events in this case, and when the index data corresponding to a false alarm event is processed, the electronic device misjudges the real abnormal condition of the system, so that abnormal data is located inaccurately. Therefore, in the present application, the system response time data corresponding to each alarm event can be corrected through the baselines corresponding to the alarm events to obtain effective index data, namely the target system response time sequence, so that the target system response time sequence matches the correct response time. This avoids or reduces the influence of false alarms caused by system configuration changes on the determination of the abnormal condition of the whole system, and improves the accuracy of locating system abnormal data or abnormal conditions.
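The correction operation (steps e-i of the first aspect) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `correct_sequence` and the parameter `k` are hypothetical, and because the text does not say how abnormal differences are filtered in step g, the sketch swaps in a median absolute deviation (MAD) rule purely for illustration.

```python
from statistics import fmean, median
from typing import List


def correct_sequence(
    response_times: List[float],
    upper: List[float],
    lower: List[float],
    k: float = 3.0,  # hypothetical tolerance, in multiples of the MAD
) -> List[float]:
    """Shift a response time sequence by its typical offset from the baseline mean."""
    # Step e: baseline mean at each moment, from the upper and lower baseline values.
    baseline_means = [(u + l) / 2 for u, l in zip(upper, lower)]
    # Step f: difference between each response time and its baseline mean.
    diffs = [rt - m for rt, m in zip(response_times, baseline_means)]
    # Step g (assumed rule): drop differences far from the median difference.
    med = median(diffs)
    mad = median([abs(d - med) for d in diffs])
    kept = [d for d in diffs if abs(d - med) <= k * mad]
    if len(kept) < 2:  # the text requires h > 1, so fall back to all differences
        kept = diffs
    # Step h: target change mean over the h remaining differences.
    target_change_mean = fmean(kept)
    # Step i: subtract the target change mean from every response time.
    return [rt - target_change_mean for rt in response_times]


# Hypothetical data: a configuration change shifted responses by about +50,
# and the fifth sample is an isolated spike.
times = [150.0, 152.0, 148.0, 151.0, 300.0, 149.0]
corrected = correct_sequence(times, upper=[120.0] * 6, lower=[80.0] * 6)
print(corrected)  # [100.0, 102.0, 98.0, 101.0, 250.0, 99.0]
```

After correction the shifted samples sit around the baseline mean of 100 again, while the isolated spike still stands out at 250, so only the overall offset caused by the lagging baseline is removed.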
S105, carrying out mutation detection on the target system response time sequence to screen out alarm events corresponding to the target system response time sequence without data mutation, and obtaining a plurality of target alarm events.
Because the baseline of each front-end and/or back-end platform changes with hysteresis, and the time or moment of the change differs between platforms, the alarm events determined by the electronic device according to the baselines differ, and the corresponding system response times may differ. Thus, if the baseline difference between two platforms is too large, the evaluation criteria that the electronic device applies to the alarm events of the multiple front-end and/or back-end platforms are different, and the system response time sequence generated from the multiple pieces of system response time data contains severe abrupt changes. Because the correction step corrects the system response time sequence as a whole, the portions in which a data mutation occurs may be overlooked, and if that portion of the system response time data is not removed, locating the association between alarm events and abnormal conditions evidently becomes more difficult. Therefore, mutation detection may further be performed on the target system response time sequence to screen out the alarm events corresponding to the target system response time sequence without data mutation, thereby further screening out data offsets caused by false alarms and helping to improve the accuracy of the alarm events.
S106, performing root cause positioning on the index data corresponding to each target alarm event, and determining abnormal index data corresponding to the plurality of target alarm events.
After the system response time data of the error alarm condition is screened, a plurality of target alarm events can be obtained, and root cause positioning is performed according to index data corresponding to the target alarm events, so that abnormal index data corresponding to the target alarm events can be determined.
The abnormal index data may be a particular piece of system response time data. It may be used to locate an alarm event and, through that alarm event, to determine the specific cause affecting the system. For example, the alarm event may be generated by a system fault of one or more platforms; specifically, it may be generated by a network switch fault of back-end platform A or by a router fault of back-end platform B. In this way the fault point can be accurately located, thereby implementing root cause positioning for the abnormal data.
It can be seen that, in this embodiment of the present application, the electronic device may determine a plurality of alarm events and obtain index data corresponding to each alarm event in the plurality of alarm events, where the index data includes system response time data corresponding to each alarm event. The electronic device generates a system response time sequence according to the system response time data corresponding to each alarm event, and corrects the system response time data corresponding to each alarm event according to the baselines corresponding to the plurality of alarm events to obtain a target system response time sequence corresponding to the system response time sequence. It then performs mutation detection on the target system response time sequence to screen out the alarm events corresponding to the target system response time sequence in which no data mutation occurs, obtaining a plurality of target alarm events, and finally performs root cause positioning on the index data corresponding to each target alarm event to determine the abnormal index data corresponding to the plurality of target alarm events. In this way, correcting the index data and performing mutation detection on it eliminates system false alarms and prepares the data for the subsequent root cause positioning, which helps improve the accuracy and efficiency of abnormal data determination.
In one possible example, determining the plurality of alarm events may include the following steps: acquiring, in every preset time period, first system response time data corresponding to each event in a plurality of events; comparing the first system response time data corresponding to each event with the baseline; and taking each event whose first system response time data exceeds the baseline a preset number of times as an alarm event, to obtain the plurality of alarm events.
Wherein the baseline comprises an upper baseline and a lower baseline, the upper baseline and the lower baseline may correspond to a plurality of baseline values, each of which may correspond to a time instant.
The preset time period and/or the preset number of times may be set by a user or by system default, and are not limited herein; together they constrain the selection of alarm events from the plurality of events and serve as the selection criterion for an alarm event. The preset number of times may be the minimum threshold for the sum of the number of times the upper baseline is exceeded and the number of times the lower baseline is exceeded.
In a specific implementation, when comparing the first system response time data corresponding to each event with the baseline, the electronic device may select, as an alarm event, an event for which the sum of the number of times the first system response time data goes beyond the upper baseline and the number of times it goes beyond the lower baseline is greater than the preset number of times.
It should be noted that, the upper baseline may correspond to the first preset times, and the lower baseline may correspond to the second preset times, that is, events exceeding the upper baseline or the lower baseline may be respectively restrained according to different preset times, and the implementation manner is the same as that of the present example, and detailed descriptions thereof are omitted herein.
For example, the preset time period may be defined as M and the preset number of times as N, where M and N are positive integers greater than 1 and the unit of M may be minutes, hours, seconds, etc. The electronic device may obtain, for every period M, the first system response time data corresponding to each event in the plurality of events. Fig. 2a is a schematic diagram of a baseline and system response time, in which the system response time sequence and the baseline are expressed in the same unit, the abscissa is the system response time, and the events include event A (solid points) and event B (open points). The electronic device may generate a first system response time sequence from the multiple pieces of first system response time data. If M is 5 minutes and N is 3, then, as shown in the figure, the first system response time data corresponding to event A goes beyond the upper and lower baselines 5 times in total, and that corresponding to event B goes beyond them 2 times in total, so event A may be considered an alarm event and event B a non-alarm event.
It can be seen that in this example, the preset time period and the preset number of times can be dynamically adjusted to determine the alarm event through the baseline, which is beneficial to accurately and sensitively determine the alarm event from the plurality of events.
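The selection rule in this example can be sketched as follows. This is a minimal illustration only: the function names `count_excursions` and `find_alarm_events`, the sample values, and the millisecond unit are assumptions introduced here and do not appear in the application.

```python
from typing import Dict, List, Sequence


def count_excursions(values: Sequence[float],
                     upper: Sequence[float],
                     lower: Sequence[float]) -> int:
    """Count samples lying above the upper baseline or below the lower baseline."""
    return sum(1 for v, u, l in zip(values, upper, lower) if v > u or v < l)


def find_alarm_events(events: Dict[str, Sequence[float]],
                      upper: Sequence[float],
                      lower: Sequence[float],
                      preset_times: int) -> List[str]:
    """Return the events whose excursion count is greater than preset_times (N)."""
    return [name for name, values in events.items()
            if count_excursions(values, upper, lower) > preset_times]


# Hypothetical response times (ms) sampled within one preset time period M.
upper = [120.0] * 8
lower = [80.0] * 8
events = {
    "A": [130, 125, 70, 100, 140, 75, 95, 110],  # 5 samples outside the band
    "B": [100, 125, 90, 100, 105, 75, 95, 110],  # 2 samples outside the band
}
alarms = find_alarm_events(events, upper, lower, preset_times=3)
print(alarms)  # ['A']
```

With N = 3, event A (5 excursions) is kept as an alarm event and event B (2 excursions) is not, mirroring the Fig. 2a description.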
In one possible example, the baseline includes an upper baseline and a lower baseline, and correcting the system response time data corresponding to each alarm event according to the baselines corresponding to the plurality of alarm events may include the following steps: determining a baseline average sequence of the baseline according to the upper baseline and the lower baseline, where the baseline average sequence corresponds to i baseline values and i is a positive integer greater than 1; calculating the difference between each value in the system response time sequence and the corresponding one of the i baseline values to obtain i differences, which form a difference sequence; and correcting the system response time data corresponding to each alarm event according to the difference sequence.
Wherein each baseline value in the baseline mean sequence is an average of each baseline value of the corresponding upper and lower baselines.
Correcting the system response time data corresponding to an alarm event does not mean that this data itself causes the false alarm. Rather, after the system configuration is changed the baseline should change accordingly, but the baseline change often lags behind the change in the system response time of the front-end platform, so false alarms arise when the baseline is used to check the system response time data and determine alarm events. Since the baseline is the reference standard for deciding whether an event is an alarm event, changing that standard would reduce the accuracy and reliability of the final result; therefore the system response time data corresponding to the alarm event is corrected instead, and the subsequent steps are performed on the corrected data to determine the abnormal index data.
Wherein the upper base line and/or the lower base line corresponds to i values, i being a positive integer greater than 1. The value of i can be determined according to a preset correction time period P starting from the current moment, and the value of i is closely related to the preset correction time period P.
Optionally, in order to clearly describe the local features of the subsequent system response time sequence (i.e. the shape, fluctuation direction, central position, peaks, etc. of the line graph or curve formed by the sequence, not limited herein), the ratio i/P may be dynamically increased when the preset correction time period P is less than or equal to an interval value a, i.e. relatively more values are taken when P is less than or equal to a. For example, with the interval value a set to 6: if a time period of 10 min is selected, i/P is 1/2 and i may be taken as 5; if a time period of 6 min is selected, i/P may be adjusted to 2/3 and i may be taken as 4. This increases the readability of the system response time sequence and benefits the subsequent correction of the system response time data.
When the baseline average sequence is denoted by the sequence A, it is expressed as:
$$A = \{\text{mean}_1, \text{mean}_2, \ldots, \text{mean}_i\}, \qquad \text{mean}_i = \frac{up_i + low_i}{2}$$
where up_i and low_i are the i-th baseline value of the upper baseline and the i-th baseline value of the lower baseline, respectively. When the difference sequence is denoted by the sequence B, it is expressed as:
$$B = \{C_1 - \text{mean}_1, C_2 - \text{mean}_2, \ldots, C_i - \text{mean}_i\}$$
where C_i is the i-th value of the system response time sequence and mean_i is the i-th value of the baseline average sequence.
In this example, the difference value between the value in the system response time sequence and the baseline value of the baseline at the corresponding moment is calculated to obtain a difference value sequence composed of a plurality of difference values, and then the system response time data corresponding to the alarm event is corrected according to the difference value sequence, so that the comparison between the subsequent baseline and the corrected system time data can be realized, the error alarm event caused by the hysteresis of the baseline change is eliminated, and the accuracy of determining the abnormal data is improved.
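As a small illustration of the two sequences just described, the sketch below averages the upper and lower baselines point by point and subtracts the result from the system response time sequence. The function names and the numeric values are hypothetical, and the sign convention (response time minus baseline average) is an assumption chosen to agree with the later relation D = C - B.

```python
from typing import List, Sequence


def baseline_average_sequence(upper: Sequence[float],
                              lower: Sequence[float]) -> List[float]:
    """Sequence A: point-by-point average of the upper and lower baselines."""
    return [(u + l) / 2.0 for u, l in zip(upper, lower)]


def difference_sequence(response: Sequence[float],
                        baseline_average: Sequence[float]) -> List[float]:
    """Sequence B: response time value minus the baseline average at the same moment."""
    return [c - m for c, m in zip(response, baseline_average)]


# Hypothetical baselines and response times (ms) at i = 4 moments.
upper = [120.0, 120.0, 130.0, 130.0]
lower = [80.0, 80.0, 90.0, 90.0]
response = [150.0, 152.0, 161.0, 158.0]

A = baseline_average_sequence(upper, lower)
B = difference_sequence(response, A)
print(A)  # [100.0, 100.0, 110.0, 110.0]
print(B)  # [50.0, 52.0, 51.0, 48.0]
```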
In one possible example, correcting the system response time data corresponding to each alarm event according to the difference sequence may include the following steps: performing outlier filtering on the i differences in the difference sequence to obtain h target differences, where h is a positive integer greater than 1; determining the average of the h target differences to obtain a target change mean; and determining the difference between each value in the system response time sequence and the target change mean to obtain the target system response time sequence, where the target system response time sequence corresponds to i target values.
Wherein the h value is less than or equal to the i value, and when no outlier exists in the i differences, the h value is equal to the i value.
Outlier filtering of the i differences in the difference sequence may be implemented by a clustering algorithm. For example, the i differences are clustered by the clustering algorithm to obtain a clustering result containing i points that correspond to the i differences, and the clustering result is classified according to point density. If two classes are obtained, namely a first target clustering result and a second target clustering result, with the point density of the first greater than that of the second, the differences corresponding to the points in the first target clustering result are kept as normal values, and the differences corresponding to the points in the second target clustering result are filtered out as outliers, yielding the h target differences. The outliers may also be determined by a mean square error method, a box plot method, or the like, or the data may be filtered by manual judgment based on experience, which is not limited herein.
The difference between each value in the system response time sequence and the target change mean is calculated; the i resulting differences correspond one-to-one to the target values and form the target system response time sequence.
When the target system response time sequence is denoted by the sequence D, it may be expressed as D = C - B, where C is the system response time sequence and B here denotes the target change mean.
Fig. 2b is another schematic diagram of the baseline and the system response time, in which the system response time sequence and the baseline are expressed in the same unit and the abscissa is the system response time. After the system configuration is changed, the system response time sequence has changed, but because the baseline change lags, the baseline has not yet changed; some normal events then produce the situation shown in Fig. 2b, and comparing the baseline with the system response time causes a plurality of normal events to be erroneously determined as alarm events. To eliminate false alarm events and accurately find the abnormal data, the system response time data of the alarm events is corrected according to the baseline, and the target system response time sequence obtained after correction is shown in Fig. 2c. The target system response time sequence is then compared with the baseline, the events whose target system response time sequence does not exceed the baseline the preset number of times are screened out, and these screened-out events are determined to be normal events.
Wherein, the event corresponding to the response time sequence of the screened target system is a false alarm event, and the reason for generating the false alarm is that the baseline change has hysteresis when the system configuration is changed.
It can be seen that in this example, the system response time data corresponding to the alarm event is corrected, and the authenticity of the alarm event is determined by using the baseline, which is beneficial to eliminating the false alarm event and to accurately determining the abnormal data subsequently.
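The correction steps of this example can be sketched as below. The sketch uses the box plot rule (1.5 times the interquartile range), which is one of the alternatives the text mentions, rather than the density-based clustering variant; the function names, the factor 1.5, and the sample values are assumptions for illustration.

```python
import statistics
from typing import List, Sequence


def filter_outliers_boxplot(diffs: Sequence[float], k: float = 1.5) -> List[float]:
    """Keep the differences inside [Q1 - k*IQR, Q3 + k*IQR] (box plot rule)."""
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [d for d in diffs if low <= d <= high]


def correct_response_times(response: Sequence[float],
                           diffs: Sequence[float]) -> List[float]:
    """Subtract the target change mean (mean of the filtered differences)."""
    target_diffs = filter_outliers_boxplot(diffs)   # the h target differences
    change_mean = statistics.fmean(target_diffs)    # target change mean
    return [c - change_mean for c in response]      # target sequence D


# Hypothetical data: the baseline average is 100 ms at every moment, the
# response times shifted by about 50 ms after a configuration change, and
# the last sample is a genuine spike.
response = [150.0, 152.0, 151.0, 148.0, 149.0, 220.0]
diffs = [c - 100.0 for c in response]

targets = filter_outliers_boxplot(diffs)
D = correct_response_times(response, diffs)
print(targets)  # [50.0, 52.0, 51.0, 48.0, 49.0]
print(D)        # [100.0, 102.0, 101.0, 98.0, 99.0, 170.0]
```

After correction the shifted samples fall back inside a band around 100 ms, while the spike at the last moment remains clearly outside it, which is the behaviour the Fig. 2b and Fig. 2c description relies on.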
In one possible example, before the mutation detection is performed on the target system response time sequence to screen out the alarm events corresponding to the target system response time sequence in which no data mutation occurs, the method may include the following steps: taking a moment point k as a reference, obtaining a first target system response time sequence before the moment point k and a second target system response time sequence after the moment point k, where the first target system response time sequence includes n_1 first target values, the second target system response time sequence includes n_2 second target values, and n_1 and n_2 are positive integers greater than or equal to 1; determining, according to the n_1 first target values and the n_2 second target values, a first mean, a second mean, a first variance and a second variance, where the first mean and the first variance are the mean and the variance of the first target system response time sequence, and the second mean and the second variance are the mean and the variance of the second target system response time sequence; determining a pre-contrast value according to the first mean, the second mean, the first variance and the second variance; determining a target critical value; and if the absolute value of the pre-contrast value is greater than the target critical value, determining that a mutation has occurred in the target system response time sequence.
The moment points are selected one by one from the 2nd moment point to the (i-1)-th moment point, so that i-2 moment points are selected and i-2 pre-contrast values are obtained.
Here n_1 + n_2 = i - 1, and n_1 and n_2 are positive integers greater than or equal to 1.
Wherein, when t is used to represent the pre-contrast value, the pre-contrast value calculation formula is shown as (1):
Figure SMS_3
(1);
where x_1 is the mean of the first target system response time sequence, x_2 is the mean of the second target system response time sequence, and S is calculated as shown in (2):
Figure SMS_4
(2).
where S_1^2 is the first variance and S_2^2 is the second variance.
The target critical value is determined by looking up the t distribution table according to the significance level and the degree of freedom V; the value found is the target critical value. The significance level may be preset manually or by system default and may be, for example, 95%, 90% or 85%, and the degree of freedom is V = n_1 + n_2 - 2.
After the absolute value of the pre-contrast value t is compared with the target critical value to determine whether a mutation has occurred in the target system response time sequence, the alarm events corresponding to the target system response time sequence in which no mutation has occurred are screened out to obtain the plurality of target alarm events.
In a specific implementation, after the moment point k is selected, the first mean, the second mean, the first variance and the second variance are substituted into the above formulas to determine the pre-contrast value t; after the significance level and the degree of freedom are determined, the t distribution table is looked up to determine the target critical value; and if the absolute value of the pre-contrast value is greater than the target critical value, it is determined that a mutation has occurred in the target system response time sequence.
It can be seen that, in this example, the algorithm of the present application is used to determine whether the response sequence of the target system is mutated, and screen out the alarm event corresponding to the response sequence of the target system that is not mutated, so as to further eliminate the false alarm event and facilitate the subsequent accurate determination of the abnormal data.
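The mutation test can be sketched as a sliding two-sample t-test. Because formulas (1) and (2) are not reproduced in this text, the sketch assumes the standard pooled-variance form of the t statistic with sample variances, which is consistent with the degree of freedom V = n_1 + n_2 - 2 but may differ in detail from the application's formulas. The function names are hypothetical, the critical value is passed in as a number read from a t distribution table, and each segment is required to hold at least two values so that a sample variance exists (a stricter condition than n_1, n_2 >= 1).

```python
import math
import statistics
from typing import Sequence


def pre_contrast_value(first: Sequence[float], second: Sequence[float]) -> float:
    """Pooled two-sample t statistic for the segments before and after point k."""
    n1, n2 = len(first), len(second)
    mean1, mean2 = statistics.fmean(first), statistics.fmean(second)
    var1, var2 = statistics.variance(first), statistics.variance(second)
    pooled = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled == 0.0:
        # Both segments are constant: no spread to compare against.
        return 0.0 if mean1 == mean2 else math.copysign(math.inf, mean1 - mean2)
    return (mean1 - mean2) / (pooled * math.sqrt(1.0 / n1 + 1.0 / n2))


def has_mutation(series: Sequence[float], critical_value: float) -> bool:
    """True if |t| exceeds the critical value at any candidate point k.

    The point k itself is excluded from both segments, so n1 + n2 = len(series) - 1.
    """
    for k in range(2, len(series) - 2):
        t = pre_contrast_value(series[:k], series[k + 1:])
        if abs(t) > critical_value:
            return True
    return False


# With 10 values, V = n1 + n2 - 2 = 7; 2.365 is the two-sided 5% critical
# value of the t distribution for 7 degrees of freedom.
CRITICAL = 2.365
steady = [100, 101, 100, 101, 100, 101, 100, 101, 100, 101]
stepped = [100, 101, 100, 101, 100, 150, 151, 150, 151, 150]
print(has_mutation(steady, CRITICAL))   # False
print(has_mutation(stepped, CRITICAL))  # True
```

Under this sketch, the alarm event behind `steady` would be retained as a target alarm event, while the one behind `stepped` would be treated as showing a data mutation.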
In one possible example, the root cause positioning is performed on the index data corresponding to each of the target alarm events, and abnormal index data corresponding to the plurality of target alarm events is determined, and the method may include the following steps: acquiring the back-end index data of a back-end platform corresponding to each target alarm event; clustering the index data corresponding to each target alarm event and the corresponding back end index data by using a clustering algorithm to obtain a plurality of clustering results; and determining index data corresponding to the target alarm event and corresponding to the clustering result with the outlier in the plurality of clustering results as the abnormal index data.
The back-end platform index data may be system response time data corresponding to the back-end platform.
The clustering algorithm may be the DBSCAN clustering algorithm, which does not require the number of classes to be determined in advance and is therefore suitable for determining abnormal index data. Because the abnormal index data is determined by clustering the target system response time sequence corresponding to the target alarm event together with the system response time sequence corresponding to the back-end platform, a distance metric based on the Pearson correlation coefficient or on the Spearman correlation coefficient is suitable.
Wherein, the metric formula based on Pearson correlation coefficient can be expressed as shown in (3):
Figure SMS_5
Figure SMS_6
(3)。
the metric formula based on Spearman correlation coefficient can be expressed as shown in (4):
Figure SMS_7
Figure SMS_8
(4)。
where CDS_1 and CDS_2 are, respectively, the system response time sequence corresponding to the target alarm event and the back-end system response time sequence of the corresponding back-end platform; CD_1i is the i-th component of the system response time sequence corresponding to the target alarm event; CD_2i is the i-th component of the back-end system response time sequence of the corresponding back-end platform; n is the length of the time sequence; RK_1 and RK_2 are the sequences obtained by converting the components of CDS_1 and CDS_2 into their positions in descending order; and rk_1i and rk_2i are the i-th components of RK_1 and RK_2.
A plurality of sample points can be obtained by clustering the index data, and the clustering result includes at least one of the following: clusters, each being a region that includes a plurality of sample points, and outliers, which are independent sample points that do not belong to any cluster.
Considering that the transmission of abnormal index data from the front-end platform to the back-end platform may cross a minute boundary, i.e. the front-end platform transmits the same abnormal index data to the back-end platform in two adjacent minutes, the i-th and (i-1)-th components of CDS_1 may each be substituted into the formula together with the i-th component of CDS_2 to obtain two distance metric values, namely a first distance metric value and a second distance metric value, and the smaller of the two is selected as the i-th distance metric value. This further reduces error and facilitates accurate determination of the abnormal index data.
In a specific implementation, a distance metric based on the Pearson correlation coefficient or on the Spearman correlation coefficient is chosen according to the index data corresponding to the alarm event; the system response time sequence corresponding to the target alarm event is taken as CDS_1 and the back-end system response time sequence of the back-end platform corresponding to the target alarm event is taken as CDS_2; clustering is performed with the corresponding distance metric formula to obtain a clustering result; and if an outlier exists in the clustering result, the index data corresponding to the target alarm event is determined to be abnormal index data.
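A correlation-based distance of this kind can be sketched as follows. Formulas (3) and (4) are not reproduced in this text, so the sketch assumes the common form distance = 1 - correlation, which may differ from the application's exact metric. The minute-crossing handling is also one possible reading: the distance is computed once with the two sequences aligned and once with the front-end sequence lagged by one step, and the smaller value is kept. All function names and sample values are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # assumption: treat an undefined correlation as no correlation
    return cov / (sx * sy)


def descending_ranks(x: Sequence[float]) -> List[int]:
    """Position of each component when the sequence is sorted in descending order."""
    order = sorted(range(len(x)), key=lambda i: -x[i])
    ranks = [0] * len(x)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    return 1.0 - pearson(x, y)


def spearman_distance(x: Sequence[float], y: Sequence[float]) -> float:
    # Spearman correlation computed as the Pearson correlation of the ranks.
    return 1.0 - pearson(descending_ranks(x), descending_ranks(y))


def shift_tolerant_distance(cds1: Sequence[float], cds2: Sequence[float],
                            distance: Callable[[Sequence[float], Sequence[float]], float]
                            ) -> float:
    """Smaller of the aligned distance and the distance with cds1 lagged one step."""
    aligned = distance(cds1[1:], cds2[1:])
    lagged = distance(cds1[:-1], cds2[1:])
    return min(aligned, lagged)


front = [10, 50, 10, 10, 10]   # spike seen by the front end in minute 2
back = [10, 10, 50, 10, 10]    # same spike recorded by the back end in minute 3
print(round(pearson_distance(front[1:], back[1:]), 3))
print(round(shift_tolerant_distance(front, back, pearson_distance), 3))
```

In this example the aligned distance is large because the spike falls in adjacent minutes, whereas the shift-tolerant distance is close to zero, so the two sequences are still recognised as similar.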
It can be seen that, in this example, the abnormal data is determined by using the clustering algorithm to determine the system response time data corresponding to the target alarm event and the system response time data of the corresponding back-end platform, which is beneficial to quickly determining the abnormal data.
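To illustrate how outliers emerge from DBSCAN, the sketch below is a compact, self-contained implementation that works on a precomputed distance matrix (for instance, correlation-based distances between response time sequences). It is a teaching sketch, not the application's implementation; the parameters `eps` and `min_pts`, the label -1 for outliers, and the sample matrix are assumptions.

```python
from typing import List, Optional, Sequence

NOISE = -1  # label used here for outliers (points that join no cluster)


def dbscan(dist: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]  # includes p itself
        if len(neighbors) < min_pts:
            labels[p] = NOISE
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster      # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is a core point: keep expanding
    return [label if label is not None else NOISE for label in labels]


# Hypothetical pairwise distances between four response time sequences:
# sequences 0-2 behave alike, sequence 3 does not.
dist = [
    [0.00, 0.05, 0.04, 0.90],
    [0.05, 0.00, 0.06, 0.85],
    [0.04, 0.06, 0.00, 0.95],
    [0.90, 0.85, 0.95, 0.00],
]
labels = dbscan(dist, eps=0.2, min_pts=2)
outliers = [i for i, label in enumerate(labels) if label == NOISE]
print(labels)    # [0, 0, 0, -1]
print(outliers)  # [3]
```

If the sequence flagged as an outlier belongs to a target alarm event, the index data of that event would be the candidate abnormal index data in the sense of this example.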
In one possible example, after the determining of the plurality of alarm events, the method may further include the following steps: determining an abnormal business event corresponding to each alarm event in the plurality of alarm events to obtain a plurality of abnormal business events, where each abnormal business event corresponds to at least one alarm event; determining, according to the plurality of alarm events, a first probability corresponding to each alarm event that corresponds to a first abnormal business event, where the first abnormal business event is any one of the plurality of abnormal business events; determining a second probability of occurrence of each second abnormal business event in a plurality of second abnormal business events to obtain a plurality of second probabilities, where a second abnormal business event is any abnormal business event other than the first abnormal business event in the plurality of abnormal business events; determining, according to the first probability and each second probability, a correlation probability relative value between the first abnormal business event and each second abnormal business event to obtain a plurality of correlation probability relative values; arranging, in descending order of the correlation probability relative values, the alarm events corresponding to the second abnormal business events that correspond to the correlation probability relative values, to obtain a plurality of target alarm events; and selecting, from the plurality of target alarm events, the index data corresponding to the target alarm event with the largest correlation probability relative value as the abnormal index data.
An abnormal business event may be an event corresponding to an abnormal condition of a business. Multiple abnormal conditions may occur while the service corresponding to that business is executed, and each abnormal condition may generate an alarm event, so one abnormal business event may correspond to multiple alarm events.
The first probability may be a conditional probability, and the correlation probability relative value is obtained according to the Bayesian formula, expressed as P(X) × P(Y/X) = P(Y) × P(X/Y), where X represents an alarm event, Y represents an abnormal business event, P(X) is the probability that event X occurs within the preset time period, and P(Y) is the probability that event Y occurs within the preset time period. The correlation probability relative value is expressed as P(X/Y)/P(X); the greater it is, the greater the probability that the abnormal business event was caused by the alarm event.
The above steps are performed after the plurality of alarm events are determined, in order to determine which of the alarm events corresponding to an abnormal business event caused that abnormal business event. In addition, the abnormal index data determined by performing root cause positioning on the index data corresponding to each target alarm event is referred to as first abnormal index data, and the abnormal index data obtained through the above steps is referred to as second abnormal index data. If both the first and second abnormal index data are obtained, the first abnormal index data is selected as the real abnormal index data; if only the second abnormal index data is obtained, the second abnormal index data is selected as the real abnormal index data.
The electronic equipment can respectively store the alarm event and the abnormal business event which occur as historical alarm event data and historical abnormal business event data into a historical alarm event database and a historical abnormal business event database.
The historical alarm event database includes a plurality of alarm events, and the historical abnormal business event database includes a plurality of abnormal business events; the plurality of alarm events in the historical alarm event database that correspond to one abnormal business event in the historical abnormal business event database may be obtained through a second preset time period.
In a specific implementation, let the abnormal business event be Y and an alarm event be X. First, the alarm events occurring within the second preset time period before event Y occurs are counted, and the conditional probability P(X/Y) of each alarm event, i.e. the first probability, is calculated. Then the probability that event X occurs in any preset time period, i.e. the second probability, is determined from the stored historical alarm event data and historical abnormal business event data. The correlation probability relative value between each alarm event and the abnormal business event is calculated according to the Bayesian formula, the alarm events are sorted in descending order of their correlation probability relative values, and the index data corresponding to the alarm event with the largest correlation probability relative value is selected as the abnormal index data.
In this example, the association probability value can be calculated by using the algorithm of the present application to determine the association relationship between the alarm event and the abnormal service event, and further obtain the abnormal index data, which is beneficial to quickly determining the abnormal index data.
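The ranking by correlation probability relative value P(X/Y)/P(X) can be sketched from historical counts. The representation of history as a list of time windows, each holding the set of alarm events seen and a flag for whether the abnormal business event Y occurred, is an assumption made for illustration, as are the function name and the alarm names.

```python
from typing import Dict, List, Set, Tuple

Window = Tuple[Set[str], bool]  # (alarm events in the window, did Y occur?)


def correlation_relative_values(windows: List[Window]) -> Dict[str, float]:
    """Return P(X/Y) / P(X) for every alarm event X seen in the history."""
    y_windows = [alarms for alarms, y_occurred in windows if y_occurred]
    if not y_windows:
        raise ValueError("history contains no occurrence of the abnormal business event")
    total = len(windows)
    all_alarms: Set[str] = set().union(*(alarms for alarms, _ in windows))
    values: Dict[str, float] = {}
    for x in all_alarms:
        p_x = sum(1 for alarms, _ in windows if x in alarms) / total
        p_x_given_y = sum(1 for alarms in y_windows if x in alarms) / len(y_windows)
        values[x] = p_x_given_y / p_x
    return values


# Hypothetical history of 8 windows.
history: List[Window] = [
    ({"cpu_high"}, False),
    ({"cpu_high", "db_slow"}, True),
    ({"cpu_high"}, False),
    ({"db_slow"}, True),
    ({"cpu_high"}, False),
    (set(), False),
    ({"cpu_high", "db_slow"}, True),
    ({"cpu_high"}, False),
]
values = correlation_relative_values(history)
ranked = sorted(values, key=values.get, reverse=True)
print(ranked)  # ['db_slow', 'cpu_high']
```

Here `db_slow` occurs in every window in which Y occurred but in only 3 of 8 windows overall, giving a relative value of 8/3, while the frequent `cpu_high` alarm scores 8/9 and is ranked lower.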
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in the figure, the electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the one or more programs include instructions for performing the following steps:
determining a plurality of alarm events;
acquiring index data corresponding to each alarm event in the plurality of alarm events, wherein the index data comprises system response time data corresponding to each alarm event;
generating a system response time sequence according to the system response time data corresponding to each alarm event;
correcting the system response time data corresponding to each alarm event according to the baselines corresponding to the alarm events to obtain a target system response time sequence corresponding to the system response time sequence;
performing mutation detection on the target system response time sequence to screen out alarm events corresponding to the target system response time sequence without data mutation, so as to obtain a plurality of target alarm events;
and carrying out root cause positioning on the index data corresponding to each target alarm event, and determining the abnormal index data corresponding to the plurality of target alarm events.
It can be seen that, in this embodiment of the present application, the electronic device may determine a plurality of alarm events and acquire index data corresponding to each alarm event in the plurality of alarm events, where the index data includes system response time data corresponding to each alarm event. The electronic device generates a system response time sequence according to the system response time data corresponding to each alarm event, and corrects the system response time data corresponding to each alarm event according to the baselines corresponding to the plurality of alarm events to obtain a target system response time sequence corresponding to the system response time sequence. It then performs mutation detection on the target system response time sequence to screen out alarm events corresponding to a target system response time sequence in which no data mutation occurs, obtaining a plurality of target alarm events, and finally performs root cause positioning on the index data corresponding to each target alarm event to determine the abnormal index data corresponding to the plurality of target alarm events. In this way, correction and mutation detection of the index data can eliminate system false alarms and prepare the data for subsequent root cause positioning, which helps improve the accuracy and efficiency of abnormal data determination.
In one possible example, in determining a plurality of alert events, the program includes instructions for:
acquiring first system response time data corresponding to each event in a plurality of events in every preset period;
comparing the first system response time data corresponding to each event with the baseline;
and taking each event whose first system response time data exceeds the baseline a preset number of times as an alarm event, to obtain the plurality of alarm events.
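As a non-limiting illustration, the baseline comparison above can be sketched in Python as follows. The sketch assumes that the baseline consists of an upper and a lower baseline value per sampling moment, that "exceeding" means rising above the upper baseline or falling below the lower baseline, and that an event becomes an alarm event when the count is greater than the preset number of times. The function names, parameters, and sample values are hypothetical.

```python
def count_exceedances(response_times, upper, lower):
    """Count samples above the upper baseline or below the lower baseline."""
    return sum(
        1
        for t, hi, lo in zip(response_times, upper, lower)
        if t > hi or t < lo
    )


def select_alarm_events(events, upper, lower, preset_times):
    """Return the names of events that exceed the baseline more than preset_times."""
    return [
        name
        for name, times in events.items()
        if count_exceedances(times, upper, lower) > preset_times
    ]


# Hypothetical response times (ms) for two events over five sampling moments.
events = {
    "login": [100, 120, 300, 310, 305],
    "order": [100, 110, 105, 120, 115],
}
upper = [200] * 5
lower = [50] * 5
alarm_events = select_alarm_events(events, upper, lower, preset_times=2)
```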
In one possible example, the baseline includes an upper baseline and a lower baseline; in terms of correcting the system response time data corresponding to each alarm event according to the baselines corresponding to the plurality of alarm events, the program includes instructions for performing the following steps:
determining a baseline average sequence of the baselines according to the upper baselines and the lower baselines, wherein the baseline average sequence corresponds to i baseline values, i is a positive integer greater than 1;
calculating the difference value between each numerical value in the system response time sequence and the corresponding one of the i baseline values to obtain i difference values, wherein the i difference values form a difference value sequence;
and correcting the system response time data corresponding to each alarm event according to the difference value sequence.
In one possible example, in modifying the system response time data corresponding to each alarm event according to the sequence of differences, the program includes instructions for:
performing outlier filtering on the i differences in the difference sequence to obtain h target differences;
determining the average value of the h target difference values to obtain a target change average value, wherein h is a positive integer greater than 1;
and determining a difference value between each numerical value in the system response time sequence and the target change mean value to obtain the target system response time sequence, wherein the target system response time sequence corresponds to i target numerical values.
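As a non-limiting illustration, the correction steps above can be sketched in Python as follows. The baseline average sequence is taken as the midpoint of the upper and lower baselines. The outlier filtering method is not specified in this passage, so the sketch assumes a median absolute deviation (MAD) rule with a hypothetical threshold `k`; the function name and sample values are likewise hypothetical.

```python
from statistics import mean, median


def correct_series(values, upper, lower, k=3.0):
    """Return the target (corrected) system response time sequence."""
    # Baseline average sequence: midpoint of upper and lower baselines.
    baseline_mean = [(u + l) / 2 for u, l in zip(upper, lower)]
    # Difference sequence between each value and its baseline average.
    diffs = [v - b for v, b in zip(values, baseline_mean)]

    # Outlier filtering (assumed rule): keep differences within k * MAD
    # of the median; if MAD is zero, keep every difference.
    med = median(diffs)
    mad = median(abs(d - med) for d in diffs)
    kept = [d for d in diffs if mad == 0 or abs(d - med) <= k * mad]

    # Target change mean over the h remaining differences.
    target_change_mean = mean(kept)
    # Subtract the target change mean from every original value.
    return [v - target_change_mean for v in values]


values = [110, 112, 108, 111, 500]
corrected = correct_series(values, upper=[120] * 5, lower=[80] * 5)
```

In this sample the difference 400 is filtered out, the target change mean is 10.25, and every value in the sequence, including the spike, is shifted down by that amount, so a genuine spike remains visible after correction.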
In one possible example, before the mutation detection is performed on the target system response time sequence to screen out the alarm events corresponding to the target system response time sequence in which no data mutation occurs, to obtain a plurality of target alarm events, the program includes instructions for performing the following steps:
taking a time point k as a reference, obtaining a first target system response time sequence before the time point k and a second target system response time sequence after the time point k, wherein the first target system response time sequence comprises n₁ first target values, the second target system response time sequence comprises n₂ second target values, and n₁ and n₂ are positive integers greater than or equal to 1;
determining, according to the n₁ first target values and the n₂ second target values, a first mean value, a second mean value, a first variance and a second variance corresponding to the first target system response time sequence and the second target system response time sequence respectively, wherein the first mean value and the first variance are the mean value and the variance of the first target system response time sequence, and the second mean value and the second variance are the mean value and the variance of the second target system response time sequence;
determining a pre-contrast value according to the first mean value, the second mean value, the first variance and the second variance;
determining a target critical value;
and if the absolute value of the pre-contrast value is larger than the target critical value, determining that a mutation has occurred in the target system response time sequence.
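As a non-limiting illustration, the mutation detection steps above can be sketched in Python as follows. This passage does not give the formula of the pre-contrast value, so the sketch assumes a sliding t-test statistic with pooled variance; any formula given elsewhere in the present application takes precedence. The function names are hypothetical, and the critical value 2.447 (the two-sided 5% Student t value for 6 degrees of freedom) is only an illustrative choice.

```python
from math import copysign, inf, sqrt
from statistics import mean, pvariance


def pre_contrast_value(before, after):
    """Assumed statistic: difference of means scaled by the pooled deviation."""
    n1, n2 = len(before), len(after)
    if n1 < 1 or n2 < 1 or n1 + n2 <= 2:
        raise ValueError("need at least one value per side and more than two in total")
    m1, m2 = mean(before), mean(after)          # first and second mean
    v1, v2 = pvariance(before), pvariance(after)  # first and second variance
    pooled = sqrt((n1 * v1 + n2 * v2) / (n1 + n2 - 2))
    if pooled == 0:
        # Both halves are constant: any difference in means is a clear shift.
        return 0.0 if m1 == m2 else copysign(inf, m1 - m2)
    return (m1 - m2) / (pooled * sqrt(1 / n1 + 1 / n2))


def has_mutation(series, k, critical):
    """Split the series at time point k and compare |statistic| with the critical value."""
    return abs(pre_contrast_value(series[:k], series[k:])) > critical


shifted = [100, 101, 99, 100, 150, 151, 149, 150]
steady = [100, 101, 99, 100, 100, 101, 99, 100]
shifted_flag = has_mutation(shifted, k=4, critical=2.447)
steady_flag = has_mutation(steady, k=4, critical=2.447)
```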
In one possible example, in performing root cause localization on the index data corresponding to each of the target alarm events, determining abnormal index data corresponding to the plurality of target alarm events, the program includes instructions for:
acquiring the back-end index data of a back-end platform corresponding to each target alarm event;
clustering the index data corresponding to each target alarm event and the corresponding back end index data by using a clustering algorithm to obtain a plurality of clustering results;
and determining, as the abnormal index data, the index data of the target alarm event corresponding to a clustering result that contains an outlier among the plurality of clustering results.
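As a non-limiting illustration, the outlier-based root cause positioning above can be sketched in Python as follows. This passage does not name the clustering algorithm, so the sketch assumes a simplified density rule in the spirit of DBSCAN noise points: each series is reduced to two features (mean and range), and a series with too few neighbours within a distance `eps` is treated as an outlier. The function name, features, thresholds, and sample data are all hypothetical.

```python
from math import dist


def find_outlier_series(series_by_name, eps, min_neighbors):
    """Return names of series whose feature point has fewer than min_neighbors neighbours."""
    # Reduce each response time series to a (mean, range) feature point.
    features = {
        name: (sum(s) / len(s), max(s) - min(s))
        for name, s in series_by_name.items()
    }
    outliers = []
    for name, point in features.items():
        neighbors = sum(
            1
            for other, other_point in features.items()
            if other != name and dist(point, other_point) <= eps
        )
        if neighbors < min_neighbors:
            outliers.append(name)
    return outliers


# Hypothetical front-end series plus three back-end service series (ms).
series = {
    "front_end": [100, 102, 101],
    "service_a": [101, 103, 102],
    "service_b": [99, 101, 100],
    "service_c": [400, 450, 500],
}
abnormal = find_outlier_series(series, eps=5.0, min_neighbors=1)
```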
In one possible example, after determining the plurality of alert events, the program includes instructions for:
determining an abnormal service event corresponding to each alarm event in the plurality of alarm events to obtain a plurality of abnormal service events, wherein each abnormal service event corresponds to at least one alarm event;
determining a first probability corresponding to each alarm event corresponding to a first abnormal service event according to the plurality of alarm events, wherein the first abnormal service event is any one abnormal service event in the plurality of abnormal service events;
determining a second probability of occurrence of each second abnormal business event in a plurality of second abnormal business events to obtain a plurality of second probabilities, wherein the second abnormal business event is any abnormal business event except the first abnormal business event in the plurality of abnormal business events;
determining a correlation probability relative value between the first abnormal business event and each second abnormal business event according to the first probability and each second probability to obtain a plurality of correlation probability relative values;
according to the magnitude relation of the correlation probability relative values, arranging alarm events corresponding to the second abnormal business events corresponding to the correlation probability relative values from large to small to obtain a plurality of target alarm events;
and selecting index data corresponding to the target alarm event corresponding to the maximum correlation probability relative value from the plurality of target alarm events as the abnormal index data.
The foregoing description of the embodiments of the present application has been presented primarily in terms of a method-side implementation. It will be appreciated that the server, in order to implement the above-described functions, includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied as hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application may divide the functional units of the server according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated in one processing unit. The integrated units may be implemented in hardware or in software functional units. It should be noted that, in the embodiment of the present application, the division of the units is schematic, which is merely a logic function division, and other division manners may be implemented in actual practice.
In the case where each functional module is divided according to its corresponding function, fig. 4 shows a schematic diagram of an abnormal data determination apparatus. As shown in fig. 4, the abnormal data determination apparatus 400 is applied to an electronic device and may include: a determining unit 401, an acquiring unit 402, a generating unit 403, a correcting unit 404, a mutation detecting unit 405 and a root cause positioning unit 406, wherein,
the determining unit 401 is configured to determine a plurality of alarm events;
the acquiring unit 402 is configured to acquire index data corresponding to each alarm event in the plurality of alarm events, wherein the index data comprises system response time data corresponding to each alarm event;
the generating unit 403 is configured to generate a system response time sequence according to the system response time data corresponding to each alarm event;
the correcting unit 404 is configured to correct the system response time data corresponding to each alarm event according to the baselines corresponding to the plurality of alarm events, so as to obtain a target system response time sequence corresponding to the system response time sequence;
the mutation detection unit 405 is configured to perform mutation detection on the target system response time sequence to screen out alarm events corresponding to the target system response time sequence in which no data mutation occurs, so as to obtain a plurality of target alarm events;
the root cause positioning unit 406 is configured to perform root cause positioning on the index data corresponding to each of the target alarm events, and determine abnormal index data corresponding to the plurality of target alarm events.
It can be seen that, in this embodiment of the present application, the electronic device may determine a plurality of alarm events and acquire index data corresponding to each alarm event in the plurality of alarm events, where the index data includes system response time data corresponding to each alarm event. The electronic device generates a system response time sequence according to the system response time data corresponding to each alarm event, and corrects the system response time data corresponding to each alarm event according to the baselines corresponding to the plurality of alarm events to obtain a target system response time sequence corresponding to the system response time sequence. It then performs mutation detection on the target system response time sequence to screen out alarm events corresponding to a target system response time sequence in which no data mutation occurs, obtaining a plurality of target alarm events, and finally performs root cause positioning on the index data corresponding to each target alarm event to determine the abnormal index data corresponding to the plurality of target alarm events. In this way, correction and mutation detection of the index data can eliminate system false alarms and prepare the data for subsequent root cause positioning, which helps improve the accuracy and efficiency of abnormal data determination.
In one possible example, in determining a plurality of alarm events, the determining unit 401 is specifically configured to:
acquiring first system response time data corresponding to each event in a plurality of events in a preset period;
comparing the first system response time data corresponding to each event with the base line, wherein the base line is a base line corresponding to the preset period selected from preset standard base lines;
and taking each event whose first system response time data exceeds the baseline a preset number of times as an alarm event, to obtain the plurality of alarm events.
In one possible example, the baseline includes an upper baseline and a lower baseline; in terms of correcting the system response time data corresponding to each alarm event according to the baselines corresponding to the plurality of alarm events, the correcting unit 404 is specifically configured to:
determining a baseline average sequence of the baselines according to the upper baselines and the lower baselines, wherein the baseline average sequence corresponds to i baseline values, i is a positive integer greater than 1;
calculating the difference value between each numerical value in the system response time sequence and the corresponding one of the i baseline values to obtain i difference values, wherein the i difference values form a difference value sequence;
and correcting the system response time data corresponding to each alarm event according to the difference value sequence.
In one possible example, in correcting the system response time data corresponding to each alarm event according to the difference sequence, the correction unit 404 is specifically configured to:
performing outlier filtering on the i differences in the difference sequence to obtain h target differences;
determining the average value of the h target difference values to obtain a target change average value, wherein h is a positive integer greater than 1;
and determining a difference value between each numerical value in the system response time sequence and the target change mean value to obtain the target system response time sequence, wherein the target system response time sequence corresponds to i target numerical values.
In one possible example, before the mutation detection is performed on the target system response time sequence to screen out the alarm events corresponding to the target system response time sequence in which no data mutation occurs, so as to obtain a plurality of target alarm events, the mutation detection unit 405 is specifically configured to:
taking a time point k as a reference, obtaining a first target system response time sequence before the time point k and a second target system response time sequence after the time point k, wherein the first target system response time sequence comprises n₁ first target values, the second target system response time sequence comprises n₂ second target values, and n₁ and n₂ are positive integers greater than or equal to 1;
determining, according to the n₁ first target values and the n₂ second target values, a first mean value, a second mean value, a first variance and a second variance corresponding to the first target system response time sequence and the second target system response time sequence respectively, wherein the first mean value and the first variance are the mean value and the variance of the first target system response time sequence, and the second mean value and the second variance are the mean value and the variance of the second target system response time sequence;
determining a pre-contrast value according to the first mean value, the second mean value, the first variance and the second variance;
determining a target critical value;
and if the absolute value of the pre-contrast value is larger than the target critical value, determining that a mutation has occurred in the target system response time sequence.
In one possible example, in performing root cause positioning on the index data corresponding to each of the target alarm events, determining abnormal index data corresponding to the plurality of target alarm events, the root cause positioning unit 406 is specifically configured to:
acquiring the back-end index data of a back-end platform corresponding to each target alarm event;
clustering the index data corresponding to each target alarm event and the corresponding back end index data by using a clustering algorithm to obtain a plurality of clustering results;
and determining, as the abnormal index data, the index data of the target alarm event corresponding to a clustering result that contains an outlier among the plurality of clustering results.
In one possible example, after determining a plurality of alarm events, the determining unit 401 is specifically configured to:
determining a service event corresponding to each alarm event in the plurality of alarm events to obtain a plurality of service events, wherein each service event corresponds to at least one alarm event;
determining a first probability corresponding to each alarm event corresponding to a first service event according to the plurality of alarm events, wherein the first service event is any one service event in the plurality of service events;
determining a second probability of occurrence of each second business event in a plurality of second business events to obtain a plurality of second probabilities, wherein the second business event is any business event except the first business event in the plurality of business events;
determining a correlation probability relative value between the first business event and each second business event according to the first probability and each second probability to obtain a plurality of correlation probability relative values;
according to the magnitude relation of the correlation probability relative values, arranging alarm events corresponding to the second business events corresponding to the correlation probability relative values from large to small to obtain a plurality of target alarm events;
and selecting index data corresponding to the target alarm event corresponding to the maximum correlation probability relative value from the plurality of target alarm events as the abnormal index data.
It should be noted that all relevant contents of each step in the above method embodiment may be referred to in the functional description of the corresponding functional module, and are not repeated here.
The electronic device provided in this embodiment is configured to execute the abnormal data determining method, so that the same effects as those of the implementation method can be achieved.
The embodiment of the application also provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, where the computer program causes a computer to execute part or all of the steps of any one of the methods described in the embodiments of the method, where the computer includes a server.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, such as the above-described division of units, merely a division of logic functions, and there may be additional manners of dividing in actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, or may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program codes.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program that instructs associated hardware, and the program may be stored in a computer readable memory, which may include: flash disk, read-only memory, random access memory, magnetic or optical disk, etc.
The embodiments of the present application have been described in detail above, and specific examples are used herein to illustrate the principles and implementations of the present application; the above description of the embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope in accordance with the ideas of the present application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (5)

1. An anomaly data determination method applied to an electronic device, wherein the electronic device is a server in communication with a transaction system, and the method comprises the following steps:
determining a plurality of service alarm events, including a baseline comparison operation as in steps a-c:
step a, acquiring system response time data of each event in a plurality of events within a preset period, wherein the event is a service event of a service type, and the service type comprises at least one of the following: the system response time data of a single event comprises a plurality of system response times counted by the single event which occurs for a plurality of times in the preset period;
step b, comparing the system response time data corresponding to each event with a baseline corresponding to the system response time data, wherein the baseline comprises an upper baseline and a lower baseline, the upper baseline and the lower baseline can correspond to a plurality of baseline values, a single baseline value can correspond to one moment, and the baseline corresponding to the system response time data is used for representing the following states: the baseline required by the system response time of the x-th occurrence of each event to execute the comparison is the baseline corresponding to the moment of the x-th occurrence, x is any one of a plurality of times of occurrence of each event in the preset period, and the baseline is obtained through historical experience learning;
step c, selecting the event with the sum of times exceeding the upper baseline and times exceeding the lower baseline in the system response time data being more than the preset times as the service alarm event, and obtaining a plurality of service alarm events;
generating a system response time sequence according to the system response time data corresponding to each service alarm event in the service alarm events, including: arranging a plurality of system response times of each service alarm event according to the sequence of the occurrence time to obtain a system response time sequence;
correcting the system response time data corresponding to each service alarm event according to the baselines corresponding to the service alarm events to obtain a target system response time sequence corresponding to the system response time sequence, including the following steps d-i:
step d, for each service alarm event in the plurality of service alarm events, performing the following correction operation to obtain a plurality of target system response time sequences corresponding to the plurality of service alarm events one by one:
step e, determining a baseline average value sequence of a plurality of baseline values in a baseline corresponding to a system response time sequence of a currently processed service alarm event according to an upper baseline value and a lower baseline value of the plurality of baseline values in the baseline, wherein the baseline average value sequence corresponds to a plurality of baseline average values corresponding to a plurality of system response times in the system response time sequence one by one;
step f, calculating the difference value between each system response time in the system response time sequence and the corresponding baseline mean value to obtain a difference value sequence;
step g, filtering abnormal values of a plurality of differences in the difference sequence to obtain filtered h target differences;
step h, determining the average value of the h target difference values to obtain a target change average value, wherein h is a positive integer greater than 1;
step i, calculating the difference value between the system response time in the system response time sequence and the target change mean value to obtain each corrected system response time, and further obtaining a target system response time sequence;
aiming at a plurality of target system response time sequences of the plurality of service alarm events, executing the baseline comparison operation, screening out target system response time sequences which do not exceed the baseline for the preset times, and obtaining at least two residual target system response time sequences;
performing mutation detection on the at least two target system response time sequences to screen out service alarm events corresponding to the target system response time sequences without data mutation, so as to obtain a plurality of target service alarm events;
and carrying out root cause positioning on index data corresponding to each target service alarm event, and determining abnormal index data corresponding to the plurality of target service alarm events, wherein the index data comprises a target system response time sequence corresponding to each target service alarm event and a system response time sequence corresponding to a corresponding back-end platform, the back-end platform is a platform to which a service which is pulled or called by the transaction system for responding to a calling instruction, the calling instruction is an instruction which is generated by the transaction system for responding to a request instruction of the service event, and a single system response time sequence corresponding to the back-end platform is a sequence which is obtained by arranging a plurality of system response times of the single service in the preset period according to the sequence of occurrence times.
2. The method of claim 1, wherein the performing root cause positioning on the index data corresponding to each target service alarm event, determining abnormal index data corresponding to the plurality of target service alarm events, comprises:
acquiring the back-end index data of a back-end platform corresponding to each target service alarm event;
clustering the target system response time sequence corresponding to each target service alarm event and the system response time sequence corresponding to the corresponding back-end platform by using a clustering algorithm to obtain a plurality of clustering results;
and determining a reference system response time sequence corresponding to a target service alarm event corresponding to a clustering result with an outlier in the plurality of clustering results as the abnormal index data, wherein the reference system response time sequence comprises at least one of the target system response time sequence and a system response time sequence corresponding to the back-end platform.
3. An anomaly data determination device for use with an electronic device that is a server in communication with a transaction system, the anomaly data determination device comprising:
a determining unit, configured to determine a plurality of service alarm events, including a baseline comparison operation as in steps a-c:
step a, acquiring system response time data of each event in a plurality of events within a preset period, wherein the event is a service event of a service type, and the service type comprises at least one of the following: the system response time data of a single event comprises a plurality of system response times counted by the single event which occurs for a plurality of times in the preset period;
step b, comparing the system response time data corresponding to each event with a baseline corresponding to the system response time data, wherein the baseline comprises an upper baseline and a lower baseline, the upper baseline and the lower baseline can correspond to a plurality of baseline values, a single baseline value can correspond to one moment, and the baseline corresponding to the system response time data is used for representing the following states: the baseline required by the system response time of the x-th occurrence of each event for executing the comparison is the baseline corresponding to the moment of the x-th occurrence, and x is any one of a plurality of times of occurrence of each event in the preset period;
step c, selecting the event with the sum of times exceeding the upper baseline and times exceeding the lower baseline in the system response time data being more than the preset times as the service alarm event, and obtaining a plurality of service alarm events;
a generating unit, configured to generate a system response time sequence from the system response time data corresponding to each service alarm event, including: arranging the plurality of system response times of each service alarm event in order of occurrence time to obtain the system response time sequence;
a correction unit, configured to correct the system response time data corresponding to each service alarm event according to the baselines corresponding to the plurality of service alarm events, so as to obtain a target system response time sequence corresponding to each system response time sequence, including the following steps:
step d, for each service alarm event in the plurality of service alarm events, performing the following correction operation to obtain a plurality of target system response time sequences corresponding to the plurality of service alarm events one by one:
step e, determining a baseline average value sequence of a plurality of baseline values in a baseline corresponding to a system response time sequence of a currently processed service alarm event according to an upper baseline value and a lower baseline value of the plurality of baseline values in the baseline, wherein the baseline average value sequence corresponds to a plurality of baseline average values corresponding to a plurality of system response times in the system response time sequence one by one;
step f, calculating the difference value between each system response time in the system response time sequence and the corresponding baseline mean value to obtain a difference value sequence;
step g, filtering abnormal values of a plurality of differences in the difference sequence to obtain filtered h target differences;
step h, determining the average value of the h target difference values to obtain a target change average value, wherein h is a positive integer greater than 1;
step i, calculating the difference value between the system response time in the system response time sequence and the target change mean value to correct each system response time, thereby obtaining a target system response time sequence;
a baseline comparison unit, configured to perform the baseline comparison operation on the plurality of target system response time sequences of the plurality of service alarm events, and to screen out the target system response time sequences that do not actually exceed the baseline more than the preset number of times, thereby obtaining at least two remaining target system response time sequences;
a mutation detection unit, configured to perform mutation detection on the at least two target system response time sequences, so as to screen out the service alarm events whose target system response time sequences show no abrupt data change, thereby obtaining a plurality of target service alarm events;
a root cause positioning unit, configured to perform root cause positioning on the index data corresponding to each target service alarm event and to determine the abnormal index data corresponding to the plurality of target service alarm events, wherein the index data comprises the target system response time sequence corresponding to each target service alarm event and the system response time sequence corresponding to the corresponding back-end platform, the back-end platform is the platform to which a service belongs, the service being pulled or invoked by the transaction system in response to a calling instruction, the calling instruction is an instruction generated by the transaction system in response to a request instruction of the service event, and a single system response time sequence corresponding to the back-end platform is a sequence obtained by arranging, in order of occurrence time, the plurality of system response times of a single service that is invoked multiple times within the preset period.
4. An electronic device comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of claim 1 or 2.
5. A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to claim 1 or 2.
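The baseline comparison of steps b and c and the correction of steps e to i in claim 3 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation: all function names are hypothetical, "exceeding the lower baseline" is read as falling below it, the baseline mean of step e is taken as the midpoint of the upper and lower baseline values, and the abnormal-value filter of step g is assumed to be a median absolute deviation (MAD) rule, because the claims do not name a specific filter.

```python
from statistics import mean, median


def count_baseline_exceedances(times, upper, lower):
    """Steps b-c: count samples outside the baseline valid at the same moment.

    Assumption: "exceeding the lower baseline" means falling below it.
    """
    return sum(1 for t, u, l in zip(times, upper, lower) if t > u or t < l)


def is_alarm_event(times, upper, lower, preset_count):
    """Step c: alarm when the exceedance count is greater than the preset count."""
    return count_baseline_exceedances(times, upper, lower) > preset_count


def filter_outliers(diffs, k=3.0):
    """Step g (assumed rule): drop differences more than k MADs from the median."""
    med = median(diffs)
    mad = median(abs(d - med) for d in diffs)
    if mad == 0:
        return list(diffs)  # no spread to judge against, keep everything
    return [d for d in diffs if abs(d - med) <= k * mad]


def correct_sequence(times, upper, lower):
    """Steps e-i: shift the sequence by the mean of its filtered baseline offsets."""
    # Step e (assumed): baseline mean = midpoint of upper and lower values.
    baseline_means = [(u + l) / 2 for u, l in zip(upper, lower)]
    # Step f: difference between each response time and its baseline mean.
    diffs = [t - m for t, m in zip(times, baseline_means)]
    # Step g: remove abnormal differences.
    kept = filter_outliers(diffs)
    # Step h: target change mean over the remaining differences.
    target_change_mean = mean(kept)
    # Step i: subtract the target change mean from every response time.
    return [t - target_change_mean for t in times]


# Made-up response times (ms) with a constant upward shift and one spike.
times = [120, 125, 118, 122, 300, 121]
upper = [110] * 6
lower = [90] * 6

corrected = correct_sequence(times, upper, lower)
print(is_alarm_event(times, upper, lower, preset_count=3))
print(is_alarm_event(corrected, upper, lower, preset_count=3))
```

In this made-up example every raw sample lies above the upper baseline, so the event qualifies as a service alarm event. After correction only the spike remains outside the baseline, so the repeated baseline comparison performed by the baseline comparison unit would screen the sequence out.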

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211010956.0A CN115081969B (en) 2022-08-23 2022-08-23 Abnormal data determination method and related device

Publications (2)

Publication Number Publication Date
CN115081969A CN115081969A (en) 2022-09-20
CN115081969B CN115081969B (en) 2023-05-09

Family

ID=83245015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211010956.0A Active CN115081969B (en) 2022-08-23 2022-08-23 Abnormal data determination method and related device

Country Status (1)

Country Link
CN (1) CN115081969B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115499246B (en) * 2022-11-15 2023-04-07 阿里云计算有限公司 Abnormal event processing and detecting method and processing system
CN116743637B (en) * 2023-08-15 2023-11-21 中移(苏州)软件技术有限公司 Abnormal flow detection method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269189A (en) * 2017-07-05 2018-07-10 中国中投证券有限责任公司 Achievement data monitoring method, device, storage medium and computer equipment
CN111506478A (en) * 2020-04-17 2020-08-07 上海浩方信息技术有限公司 Method for realizing alarm management control based on artificial intelligence

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9865101B2 (en) * 2015-10-30 2018-01-09 Wipro Limited Methods for detecting one or more aircraft anomalies and devices thereof
US10510078B2 (en) * 2015-11-24 2019-12-17 Vesta Corporation Anomaly detection in groups of transactions
CN109542740B (en) * 2017-09-22 2022-05-27 阿里巴巴集团控股有限公司 Abnormality detection method and apparatus
US10855548B2 (en) * 2019-02-15 2020-12-01 Oracle International Corporation Systems and methods for automatically detecting, summarizing, and responding to anomalies
CN110287078B (en) * 2019-04-12 2024-01-23 上海新炬网络技术有限公司 Abnormality detection and alarm method based on zabbix performance baseline
CN112395120A (en) * 2019-08-14 2021-02-23 阿里巴巴集团控股有限公司 Abnormal point detection method, device, equipment and storage medium
US11526422B2 (en) * 2019-11-18 2022-12-13 Bmc Software, Inc. System and method for troubleshooting abnormal behavior of an application
JP7393244B2 (en) * 2020-02-25 2023-12-06 株式会社日立製作所 Time series data prediction device and time series data prediction method
CN111931860B (en) * 2020-09-01 2021-02-09 腾讯科技(深圳)有限公司 Abnormal data detection method, device, equipment and storage medium
CN112463834B (en) * 2020-12-02 2024-08-27 中国建设银行股份有限公司 Method and device for automatically realizing root cause analysis in stream processing and electronic equipment

Also Published As

Publication number Publication date
CN115081969A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN115081969B (en) Abnormal data determination method and related device
CN109587001B (en) Performance index abnormality detection method and device
CN113556258B (en) Anomaly detection method and device
US9354968B2 (en) Systems and methods for data quality control and cleansing
CN112188531B (en) Abnormality detection method, abnormality detection device, electronic apparatus, and computer storage medium
CN110457175B (en) Service data processing method and device, electronic equipment and medium
CN110933115B (en) Analysis object behavior abnormity detection method and device based on dynamic session
US8710976B2 (en) Automated incorporation of expert feedback into a monitoring system
CN113778802B (en) Abnormality prediction method and device
CN112148768A (en) Index time series abnormity detection method, system and storage medium
CN113392893B (en) Method, device, storage medium and computer program product for locating business fault
CN115796708B (en) Big data intelligent quality inspection method, system and medium for engineering construction
CN107071788B (en) Spectrum sensing method and device in cognitive wireless network
CN111176953A (en) Anomaly detection and model training method thereof, computer equipment and storage medium
CN113723861A (en) Abnormal electricity consumption behavior detection method and device, computer equipment and storage medium
CN112380073B (en) Fault position detection method and device and readable storage medium
CN114547145A (en) Method, system, storage medium and equipment for detecting time sequence data abnormity
CN116743637B (en) Abnormal flow detection method and device, electronic equipment and storage medium
CN111259877B (en) Method, device and equipment for judging traffic abnormal scene and storage medium
CN113010394A (en) Machine room fault detection method for data center
CN115588439B (en) Fault detection method and device of voiceprint acquisition device based on deep learning
CN110688273B (en) Classification model monitoring method and device, terminal and computer storage medium
CN111555917A (en) Alarm information processing method and device based on cloud platform
CN115225455B (en) Abnormal device detection method and device, electronic device and storage medium
CN111258788B (en) Disk failure prediction method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant