WO2021185182A1 - Anomaly detection method and apparatus - Google Patents


Info

Publication number
WO2021185182A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
boundary
period
value
index value
Prior art date
Application number
PCT/CN2021/080564
Other languages
English (en)
Chinese (zh)
Inventor
卢冠男
朱红燕
莫林林
孙芮
李冕正
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021185182A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Definitions

  • This application relates to the technical field of operation and maintenance of financial technology (Fintech), and in particular to a method and device for anomaly detection.
  • Computers can directly handle most business tasks, such as detecting whether wealth management products are in a normal transaction state.
  • Transaction status can be monitored through indicators such as transaction volume and transaction delay. When an anomaly occurs in a monitored indicator, an alarm is generated so that the operation and maintenance personnel learn that the transaction is abnormal and can restore the normal state of the transaction by repairing abnormal transaction equipment or abnormal transaction programs, or by setting the permissions of accounts making malicious purchases.
  • The release and promotion effect of a product can also be judged from anomalies in the monitored transaction volume or transaction delay indicators.
  • the embodiment of the present invention provides a method and device for abnormality detection, which can automatically adjust the abnormality threshold and improve the accuracy of abnormality detection on the basis of occupying a small memory.
  • an embodiment of the present invention provides an abnormality detection method, the method including:
  • For the data collected in the first major cycle, determine at least one index value of the index to be detected for each minor cycle in the first major cycle; for the at least one index value of each minor cycle, determine the boundary index value of the index to be detected in that minor cycle according to a preset boundary rule; determine the boundary index value of the index to be detected in the first major cycle from the boundary index values of the minor cycles, according to the same boundary rule;
  • determine the credible boundary index value of the first major cycle according to the boundary index value of the first major cycle; the credible boundary index value of the first major cycle is used as the detection threshold for anomaly detection on the data collected in the second major cycle; the second major cycle is a cycle after the first major cycle.
  • The first major cycle includes multiple small cycles. The data in the first major cycle is collected to determine at least one index value for each small cycle in the first major cycle; for the index values of each small cycle, the boundary index value of that small cycle is obtained according to the preset boundary rule.
  • In this way the index values of each small period are screened by the boundary rule, so that the boundary index value obtained for each small period is more accurate.
  • The boundary index value of the first major period is then obtained by applying the boundary rule again to the boundary index values of the small periods of the first major period, so that the boundary index value of the first major period is more accurate.
  • The credible boundary index value of the first major cycle is obtained from the boundary index value of the first major cycle and is used as the detection threshold for data generated later. In other words, the boundary index value of the first major cycle is adjusted to obtain the credible boundary index value, so that data collected after the first major cycle can fluctuate within a normal, reasonable range without triggering false anomaly alarms.
  • the adaptive adjustment of the detection threshold is realized, the accuracy of the detection threshold is improved, and the accuracy of data anomaly detection is further improved.
  • The boundary rule is: for a group of data, determine the density area around the maximum value in the data; if the number of data in the density area is greater than the density threshold, the maximum value is determined as the boundary value of the group of data; otherwise, the maximum value is deleted from the group of data and the procedure returns to the step of determining the density area around the maximum value in the data; the density threshold is set according to the data volume of the group of data.
  • The boundary value is selected by the boundary rule. Specifically, it is judged whether the number of data in the density area of the maximum value is greater than the density threshold. If it is, the density of data near the maximum value conforms to the pattern of data change, the maximum value is reasonable data, and it can be used as the boundary value. If it is not, the amount of data near the maximum value is abnormally small and does not conform to the pattern of data change; the maximum value is likely to be abnormal data, so it is deleted.
  • The maximum value of the group of data is then re-determined, and it is again judged whether the number of data in the density area of the re-determined maximum value is greater than the density threshold. This repeats until the number of data in the density area of the determined maximum value is greater than the density threshold; that maximum value is the boundary value. This improves the accuracy of the boundary value and prevents the subsequent detection threshold calculation from losing accuracy because the boundary value is abnormal data.
  • The density area is determined as follows:
  • determine the number of partitions of the group of data according to the maximum and minimum values of the group of data; determine the area radius according to the maximum and minimum values of the group of data and the number of partitions;
  • the density area is the area centered on the maximum value with the area radius as its radius.
  • Determining the number of partitions from the maximum and minimum values of a group of data, and the area radius from the maximum, the minimum and the number of partitions, lets the area radius divide the density area more accurately. The density area can then accurately characterize the data distribution near the maximum value, which makes it possible to judge whether the data near the maximum value is abnormal, whether the maximum value conforms to the pattern of data change, and whether the maximum value should be used in the subsequent detection threshold calculation. This increases the accuracy of the detection threshold calculation.
  • determining the credible boundary index value of the first major period according to the boundary index value of the first major period includes:
  • In the formula for the credible boundary index value: up_boundary is the credible boundary index value of the first major cycle; up_p is the boundary index value of the first major cycle; K is the fluctuation coefficient; eps is the area radius, determined according to the maximum value and minimum value of the boundary index values of the small cycles in the first major cycle and the number of partitions; the number of partitions is determined according to the number of data of the boundary index values of the small periods in the first major period; base is determined according to the maximum value and the minimum value.
  • The credible boundary index value is equal to the sum of the boundary index value, twice the area radius, and base.
  • Adding twice the area radius to the boundary index value sets a reasonable fluctuation interval for subsequent data on top of the boundary index value. The base term additionally tolerates a small amount of incidental data beyond that interval, which reduces the false alarm rate while preserving anomaly detection accuracy.
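  • Read together with the symbol definitions (up_boundary, up_p, K, eps, base) and the statement that the credible boundary equals the boundary index value plus twice the area radius plus base, the relation can be written as below. This is a reconstruction from the surrounding text, not the patent's verbatim formula; reading "twice" as the fluctuation coefficient K = 2 is an assumption.

```latex
\[
\text{up\_boundary} = \text{up\_p} + K \cdot \text{eps} + \text{base}, \qquad K = 2
\]
```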
  • The method also includes:
  • if the boundary value cannot be determined according to the boundary rule, the maximum value in the group of data is used as the boundary value of the group of data.
  • Using the maximum value of the group of data as the boundary value prevents the detection threshold from being incalculable because the boundary value is empty.
  • the density threshold is set according to the data volume of the set of data, including:
  • the density threshold is determined by the amount of data, which can increase the rationality and accuracy of the density threshold.
  • each small period in the first large period is the same time period.
  • an embodiment of the present invention provides an abnormality detection device, the device includes:
  • An acquisition unit configured to determine at least one index value of each small period of the indicator to be detected in the first large period for the collected data in the first large period;
  • the processing unit is configured to determine the boundary index value of the indicator to be detected in each small period for at least one index value of each small period according to preset boundary rules; according to the boundary index value of each small period, according to the The boundary rule, determining the boundary index value of the index to be detected in the first major period;
  • the processing unit is further configured to determine the credible boundary index value of the first major cycle according to the boundary index value of the first major cycle; the credible boundary index value of the first major cycle is used as the detection threshold for anomaly detection on data collected in a second large period; the second large period is a period after the first large period.
  • an embodiment of the present application further provides a computing device, including: a memory, configured to store program instructions; and a processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program, the method of the first aspect.
  • embodiments of the present application also provide a computer-readable non-volatile storage medium, including computer-readable instructions. When a computer reads and executes the computer-readable instructions, the computer performs the method of the first aspect.
  • FIG. 1 is a schematic diagram of an anomaly detection architecture provided by an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of an abnormality detection method provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a table of average delay data collected in a small period according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a curve of average delay data collected in a small period according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of yet another anomaly detection method provided by an embodiment of the present invention.
  • Fig. 6 is a schematic diagram of an abnormality detection device provided by an embodiment of the present invention.
  • Fig. 1 is an anomaly detection system architecture provided by an embodiment of the present invention.
  • the data collection module 101 sends the collected data of each small cycle, or the collected data of the first large cycle, to the calculation module 102; the calculation module 102 calculates the boundary index value of each small cycle from the data of each small cycle sent by the data collection module 101, then calculates the boundary index value of the first major cycle from the boundary index values of the small cycles in the first major cycle, and further calculates the credible boundary index value of the first major cycle.
  • the calculation module 102 sends the obtained credible boundary index value to the detection module 103, and the detection module 103 uses the credible boundary index value as the detection threshold to detect whether the data in the second large cycle is abnormal.
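  • The role of the detection module can be illustrated with a minimal sketch. The function name `detect_anomalies`, the sample format and the threshold value are hypothetical, and the sketch assumes the credible boundary is an upper boundary, so a value counts as abnormal only when it exceeds the threshold.

```python
from typing import Iterable, List, Tuple


def detect_anomalies(
    samples: Iterable[Tuple[str, float]], upper_threshold: float
) -> List[Tuple[str, float]]:
    """Return the (label, value) samples whose value exceeds the threshold.

    Assumption: the credible boundary is an upper boundary, so a value is
    treated as abnormal when it is strictly greater than the threshold.
    """
    return [(label, value) for label, value in samples if value > upper_threshold]


# Hypothetical average-delay samples (ms) from the second major cycle.
second_cycle = [("00:01", 38.0), ("00:02", 41.5), ("00:03", 97.0)]
alerts = detect_anomalies(second_cycle, upper_threshold=65.0)
```

  • A lower credible boundary would use the mirrored comparison (value below the threshold).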
  • the embodiment of the present application provides a method flow of anomaly detection, as shown in FIG. 2, including:
  • Step 201 With regard to the collected data in the first major cycle, determine at least one index value of each minor cycle of the index to be detected in the first major cycle;
  • the first major cycle can be ten days, one week, four days, three days, etc.
  • the minor cycle can be six hours, three hours, one hour, etc.
  • the first major cycle and the minor cycle can be set according to the patterns of past data and specific needs, and are not specifically limited.
  • the index to be detected is the index that needs to be detected, for example, the operation indexes of a chip lithography machine: resolution, alignment accuracy, etc.;
  • or the transaction indicators of a bank's newly launched wealth management products: transaction volume, transaction delay, etc.
  • the collected data is the data generated for the indicators to be detected. For example, the resolution and alignment accuracy of the chip lithography equipment need to be collected to determine whether the accuracy of the lithography equipment is qualified.
  • the index value is a value that reflects the operating level, such as the maximum value and the minimum value of the index data in a small period.
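  • As a rough illustration of step 201, collected samples can be bucketed into small periods and reduced to index values. The sample format (timestamp in seconds, value), the one-hour period and the choice of (max, min) as index values are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def index_values_per_period(
    samples: Iterable[Tuple[int, float]], period_seconds: int = 3600
) -> Dict[int, Tuple[float, float]]:
    """Map each small-period number to the (max, min) of its samples."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in samples:
        # Integer division assigns the sample to its small period.
        buckets[timestamp // period_seconds].append(value)
    return {period: (max(vals), min(vals)) for period, vals in buckets.items()}


# Hypothetical samples: (seconds since start of the major cycle, value).
samples = [(0, 12.0), (1800, 30.0), (3600, 25.0), (7199, 18.0)]
per_period = index_values_per_period(samples)
```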
  • Step 202 For at least one indicator value of each small period, determine the boundary indicator value of the indicator to be detected in each small period according to a preset boundary rule;
  • the boundary index value may be the upper boundary index value, lower boundary index value, discrete boundary, convergence boundary, conversion boundary, etc. of the group of index data to be detected collected in the corresponding small period.
  • the specific type of boundary index value is determined by the data characteristics and change patterns of the index to be detected, and is not specifically limited.
  • the preset boundary rule may be: for a group of data, determine the density area around the maximum value in the data; if the number of data in the density area is greater than the density threshold, the maximum value is determined as the boundary value of the group of data; otherwise, the maximum value is deleted from the group of data and the procedure returns to the step of determining the density area around the maximum value in the data; the density threshold is set according to the data volume of the group of data.
  • alternatively, the preset boundary rule may be: for a group of data, determine the density area around the minimum value in the data; if the number of data in the density area is greater than the density threshold, the minimum value is determined as the boundary value of the group of data; otherwise, the minimum value is deleted from the group of data and the procedure returns to the step of determining the density area around the minimum value in the data; the density threshold is set according to the data volume of the group of data.
  • the preset boundary rule is the rule for obtaining the boundary value. From the group of data whose boundary value is needed, determine the maximum value and the density area of the maximum value. If the number of data in the density area of the maximum value is greater than the density threshold, the maximum value is the boundary value of the group of data. Otherwise, the maximum value is deleted from the group of data, the maximum value of the remaining data is determined, and it is judged whether the number of data in the density area of this second maximum value is greater than the density threshold. If it is, the second maximum value is the boundary value of the group of data; otherwise, the second maximum value is also deleted, a third maximum value is determined from the remaining data, and the judgment is repeated. This continues until the number of data in the density area of the determined maximum value is greater than the density threshold; that final maximum value is used as the boundary value, that is, the boundary index value.
  • the same procedure applies to the minimum value. From the group of data whose boundary value is needed, determine the minimum value and the density area of the minimum value. If the number of data in the density area of the minimum value is greater than the density threshold, the minimum value is the boundary value of the group of data. Otherwise, the minimum value is deleted from the group of data, the minimum value of the remaining data is determined, and the judgment is repeated for the second minimum value, then the third, and so on.
  • this continues until the number of data in the density area of the determined minimum value is greater than the density threshold; that final minimum value is used as the boundary value, that is, the boundary index value.
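  • The iterative rule described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: `radius_fn` and `threshold_fn` are hypothetical placeholders for the area radius and density threshold calculations, the natural-log threshold and the sample delays are invented, and excluding the extreme value itself from the neighbour count is an assumption.

```python
import math
from typing import Callable, List, Sequence


def find_boundary(
    values: Sequence[float],
    radius_fn: Callable[[float, float], float],
    threshold_fn: Callable[[int], float],
    upper: bool = True,
) -> float:
    """Drop sparse extremes until the extreme has enough neighbours."""
    if not values:
        raise ValueError("values must not be empty")
    data: List[float] = sorted(values)
    fallback = data[-1] if upper else data[0]
    while data:
        extreme = data[-1] if upper else data[0]
        # The radius is recomputed from the current maximum and minimum.
        eps = radius_fn(data[-1], data[0])
        # Count the other points inside [extreme - eps, extreme + eps];
        # the extreme itself is excluded (an assumption of this sketch).
        neighbours = sum(1 for v in data if abs(v - extreme) <= eps) - 1
        if neighbours > threshold_fn(len(data)):
            return extreme
        # Sparse neighbourhood: treat the extreme as abnormal and delete it.
        if upper:
            data.pop()
        else:
            data.pop(0)
    # No candidate qualified: fall back to the original extreme value.
    return fallback


# Hypothetical per-minute average delays (ms) for one small period.
delays = [10, 12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 36, 40, 40, 60]
upper_boundary = find_boundary(delays, lambda hi, lo: (hi - lo) / 4, math.log)
lower_boundary = find_boundary(delays, lambda hi, lo: (hi - lo) / 4, math.log, upper=False)
```

  • With these invented inputs the isolated 60 is deleted and 40 is returned as the upper boundary, while 10 is kept as the lower boundary. The fallback branch mirrors the case where no boundary can be computed and the maximum of the group is used instead.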
  • the density threshold is set according to the data volume of the set of data, including:
  • the corresponding function can be set according to data characteristics of the collected data, such as the rate of change of density and the variance of the data.
  • for example, the density threshold may be determined by a power function or by a logarithmic function.
  • the values of the relevant coefficients are set according to actual needs, and the calculation method of the density threshold is not specifically limited.
  • the density area is determined by the following methods, including: determining the number of partitions of the group of data according to the maximum and minimum values of the group of data; determining the area according to the maximum and minimum values of the group of data and the number of partitions Radius; taking the maximum value as the center and the area radius as the radius to determine the density area.
  • the characteristics of the index data can be analyzed from historical index data of the same type, or of a different type but with the same pattern of change, and a corresponding operation on the maximum value and the minimum value yields the number of partitions.
  • the area radius determined from the number of partitions and the maximum and minimum values of the group of data can then divide the density area more accurately, so that the number of data in the density area of an index value accurately indicates whether the index value is abnormal, and the reasonableness of the index value can be judged accurately.
  • FIG. 3 is a schematic diagram of a table of average delay data collected in a small period according to an embodiment of the present invention. As shown in FIG. 3, the transactions that occur within each minute are collected in real time, and the average delay of the transactions within each minute is calculated. In the shaded part, the maximum average delay is 60ms and the minimum average delay is 10ms. The average of the maximum value and the minimum value is 35ms, and the number of partitions can be determined by the following formula:
  • block_edge = [(-1,4),(50,4),(200,6),(500,8),(1000,10),(2000,14),(mean+1,14)]
  • Here mean is the average of the maximum value and the minimum value, and mean+1 is the value looked up in the list. (-1,4),(50,4) means that when mean+1 is greater than -1 and less than 50, the number of partitions is 4. (50,4),(200,6) means that when mean+1 is greater than 50 and less than 200, the number of partitions is 4. (200,6),(500,8) means that when mean+1 is greater than 200 and less than 500, the number of partitions is 6, and so on. (2000,14),(mean+1,14) means that when mean+1 is greater than 2000, the number of partitions is 14. For an average value of 35, mean+1 is 36, so the number of partitions is 4.
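  • The block_edge lookup can be sketched as a step function. The names `BLOCK_EDGE` and `partition_count` are hypothetical; the edge values are taken from the list in the text, the open-ended last entry is modelled by letting the final edge extend upward without limit, and the behaviour exactly at an edge (for example mean+1 equal to 50) is not specified in the text, so the strict comparison here is an assumption.

```python
from typing import List, Tuple

# (edge, partitions) pairs from the text; the trailing (mean+1, 14) entry is
# modelled by letting the last edge apply to every larger value.
BLOCK_EDGE: List[Tuple[float, int]] = [
    (-1, 4), (50, 4), (200, 6), (500, 8), (1000, 10), (2000, 14),
]


def partition_count(mean: float) -> int:
    """Return the partition count of the last edge lying below mean + 1."""
    key = mean + 1
    count = BLOCK_EDGE[0][1]
    for edge, partitions in BLOCK_EDGE:
        if key > edge:  # strict comparison: an assumption at exact edges
            count = partitions
    return count
```

  • For the example in the text, `partition_count(35)` looks up 36 and yields 4 partitions.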
  • the area radius is determined according to the maximum and minimum values of the group of data and the number of partitions.
  • the difference between the maximum value and the minimum value is divided by the number of partitions to get the area radius; here the area radius is (60 - 10) / 4 = 12.5.
  • the density area is therefore the interval of average delay [47.5, 72.5], that is, [60 - 12.5, 60 + 12.5].
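  • The arithmetic above can be checked with a short sketch. The function names `area_radius` and `density_area` are hypothetical; the inputs are the 60ms maximum, 10ms minimum and 4 partitions from the example.

```python
from typing import Tuple


def area_radius(maximum: float, minimum: float, partitions: int) -> float:
    """Area radius: the data range divided by the number of partitions."""
    return (maximum - minimum) / partitions


def density_area(center: float, radius: float) -> Tuple[float, float]:
    """Closed interval centered on the candidate boundary value."""
    return (center - radius, center + radius)


eps = area_radius(60.0, 10.0, 4)   # radius for the example data
area = density_area(60.0, eps)     # interval around the 60ms maximum
```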
  • FIG. 4 is a schematic diagram of the curve of the average delay data collected in a small period according to an embodiment of the present invention.
  • as the curve shows, only the maximum average delay of 60ms itself lies between the baseline at 72.5 and the baseline at 47.5, that is, the number of data in the density area is 0.
  • the maximum average delay of 60ms is therefore deleted from the data group of this small period, and the maximum average delay is re-determined to be 40ms.
  • the minimum average delay is still 10ms, and the average of the maximum and minimum average delay is 25ms.
  • the number of partitions determined by the block_edge formula is 4, so the area radius is (40 - 10) / 4 = 7.5.
  • the density area is the interval of average delay [32.5, 47.5].
  • the data in this density area are the average delays of 40ms, 36ms and 35ms, that is, the number of data in the density area is 3.
  • the density threshold is:
  • the number of data 3 in the density area is greater than the density threshold 2.7708520116421, and the average delay 40ms is the boundary index value. It can also be called the upper boundary index value.
  • the density threshold can be calculated according to the number of data without the maximum deleted.
  • eps can be calculated using the deleted maximum value and the minimum value in the group of data.
  • the calculation method of the density threshold and the area radius is not specifically limited.
  • the operation and maintenance personnel can quickly identify whether the transaction delay data generated in the first major cycle can be used by setting a preset ratio. For example, if the ratio of the time in a week during which no transaction is generated to the time during which transactions are generated is less than 15%, the boundary index value of the transaction delay can be calculated from that week's transaction delay data. Alternatively, certain time periods of each day in a week may contain very little data, for example at night or in the early morning when the number of transactions is small; in that case the preset ratio of no-transaction time to transaction time can be set according to users' schedules and habits to determine whether the transaction delay data can be used to calculate the boundary index value.
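  • The preset-ratio check can be expressed as a small predicate. The function name, the use of minutes as the unit and the handling of a cycle with no transaction time at all are assumptions of this sketch; the 15% default comes from the example in the text.

```python
def usable_for_boundary(
    idle_minutes: float, active_minutes: float, max_idle_ratio: float = 0.15
) -> bool:
    """True when (no-transaction time / transaction time) is below the ratio."""
    if active_minutes <= 0:
        # Assumption: a cycle without any transaction time is not usable.
        return False
    return idle_minutes / active_minutes < max_idle_ratio


# A week has 10080 minutes; both splits below are hypothetical.
mostly_active = usable_for_boundary(idle_minutes=1000, active_minutes=9080)
mostly_idle = usable_for_boundary(idle_minutes=3000, active_minutes=7080)
```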
  • the maximum value in the group of data is taken as the boundary value of the group of data. That is to say, when the number of data in the small period is small or the data characteristics are special, and the boundary value cannot be calculated according to the preset boundary rules, the maximum value in the group of data is used as the boundary value of the group of data.
  • Step 203 Determine the boundary index value of the index to be detected in the first major period according to the boundary index value of each small period according to the boundary rule;
  • In step 202 the boundary index value of each small period has been calculated. The boundary index values of the small periods in the first large period are taken, and the boundary index value of the indicator to be detected in the first large period is calculated according to the boundary rule.
  • the boundary rule may be that the boundary value of a group of data is to determine the density area from the maximum value in the data; if the number of data in the density area is greater than the density threshold, the maximum value is determined as the group The boundary value of the data; otherwise, the maximum value is deleted from the group of data, and the step of determining the density area from the maximum value in the data is returned; the density threshold is set according to the data volume of the group of data.
  • the preset boundary rule may be that the boundary value of a set of data is to determine the density area from the minimum value in the data; if the number of data in the density area is greater than the density threshold, the minimum value is determined as the The boundary value of the group of data; otherwise, the minimum value is deleted from the group of data, and the step of determining the density area from the minimum value in the data is returned; the density threshold is set according to the data volume of the group of data.
  • a group of data here is the set of boundary index values of the small periods in the first large period; the small periods may be all small periods in the first large period. For example, if the first major cycle is the seven days of a week and the small period is one hour, then the first major cycle includes 7 × 24 small periods, corresponding to 168 boundary index values. These 168 boundary index values form one group of data, and the boundary index value of the first major period is calculated from them.
  • alternatively, the small periods in the first large period may be the same time period. For example, the first major cycle is the seven days of a week, and the small period is one hour.
  • the first major cycle then includes 7 small periods at 0:00-1:00, 7 small periods at 1:00-2:00, 7 small periods at 2:00-3:00, and so on up to 7 small periods at 23:00-24:00. The boundary index values of the 7 small periods at 0:00-1:00 form one group of data, the boundary index values of the 7 small periods at 1:00-2:00 form one group of data, the boundary index values of the 7 small periods at 2:00-3:00 form one group of data, and so on up to the 7 small periods at 23:00-24:00.
  • from these groups, the boundary index value corresponding to 0:00-1:00 in the first major period, the boundary index value corresponding to 1:00-2:00 in the first major period, and so on up to the boundary index value corresponding to 23:00-24:00 in the first major period are determined.
  • in this way, a boundary index value of the first major cycle is obtained for each time period of the day. If analysis determines that a given time period of the day has almost no transactions, or a very small transaction volume, the boundary index value for the small periods of that time period need not be calculated.
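  • Grouping the small-period boundary index values by time of day can be sketched as below. The record format (day, hour, boundary value) and the synthetic values are hypothetical; only the 7-day week and 1-hour small period come from the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_hour(
    boundaries: List[Tuple[int, int, float]]
) -> Dict[int, List[float]]:
    """Group (day, hour, boundary_value) records by hour of day."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for _day, hour, value in boundaries:
        groups[hour].append(value)
    return dict(groups)


# Hypothetical week: 7 days x 24 hourly small periods = 168 boundary values.
week = [(day, hour, 40.0 + day) for day in range(7) for hour in range(24)]
groups = group_by_hour(week)
```

  • Each of the 24 groups then holds 7 values, and the boundary rule can be applied to each group separately to get a per-hour boundary index value for the first major cycle.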
  • the method of determining the boundary value from a group of data has been described in detail in step 202. Here the group of data of the small period in step 202 is replaced with the group of data of the first large period described above, and the same calculation determines the boundary index value of the first large period; the details are not repeated.
  • the function for calculating the density threshold can be determined from data characteristics such as the number of small cycles in the first major cycle and the variance of the data.
  • the density threshold is determined by a power function, or the density threshold is determined by a logarithmic function.
  • the function of the density threshold calculated in the first large period may be the same as or different from the function of the density threshold calculated in the small period.
  • the calculation method of the density threshold is not specifically limited.
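
As a rough illustration of the two function families mentioned above, the sketch below maps the number of data in a group to a density threshold. The function names, the exponent 0.5, the base-2 logarithm, and the floor of 1 are all assumptions made for this example; the text does not fix any of them.

```python
import math


def density_threshold_power(n: int, exponent: float = 0.5) -> int:
    """Power-function variant; the exponent 0.5 is an assumed value."""
    return max(1, math.floor(n ** exponent))


def density_threshold_log(n: int) -> int:
    """Logarithmic variant; base 2 is an assumed choice."""
    return max(1, math.floor(math.log2(n))) if n > 0 else 1


# 7 = boundary values of one hour across a week; 168 = every hour of a week.
for n in (7, 168):
    print(n, density_threshold_power(n), density_threshold_log(n))
```

Either variant grows slowly with the amount of data, so a small group (for example 7 values) still needs only a couple of neighbours near the maximum to accept it.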
  • the density area is determined as follows: determining the number of partitions of the group of data according to the maximum and minimum values of the group of data; determining the area radius according to the maximum value, the minimum value, and the number of partitions; and taking the maximum value as the center and the area radius as the radius to determine the density area.
  • the method of determining the density area based on a set of data has been described in detail in step 202. Here, the set of data of the small period in step 202 is replaced with the set of data formed by the boundary index values of the small periods in the first large period, and the same calculation determines the density area of the first major period; the details are not repeated here.
  • for example, the boundary index value of a small period (say, the small period from 0:00 to 1:00 on Monday of the week) is 40 ms. If the small periods all cover the same time period, the boundary index value of the small period for 0:00-1:00 on Tuesday of the week is obtained in the same way; the boundary index value for 0:00-1:00 on Wednesday is 45 ms, on Thursday 47.5 ms, on Friday 48 ms, on Saturday 48.5 ms, and on Sunday 55 ms. In this group of data, the maximum value is 55 ms, the minimum value is 40 ms, and the average value is 47.5 ms.
  • after a maximum value has been deleted, the density threshold can be recalculated from the number of remaining data.
  • the area radius eps can likewise be recalculated from the maximum value and the minimum value of the group of data after the deletion.
  • the calculation method of the density threshold and the area radius is not specifically limited.
  • the maximum value in the group of data is taken as the boundary value of the group of data. That is to say, when the number of data in the first major period is small or the data characteristics are special, and the boundary value cannot be calculated according to the preset boundary rule, the maximum value in the group of data is used as the boundary value of the group of data.
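
The boundary rule and its fallback can be sketched as follows. This is only a minimal illustration under stated assumptions: the fixed partition count `n_partitions = 4`, the logarithmic `density_threshold`, the radius `eps = (max - min) / n_partitions`, and counting the maximum itself as a member of its density area are all choices made for the example, since the text leaves these calculations open.

```python
import math
from typing import Sequence


def density_threshold(n: int) -> int:
    # Assumed logarithmic form; the text does not fix the function.
    return max(1, math.floor(math.log2(n))) if n > 0 else 1


def boundary_value(values: Sequence[float], n_partitions: int = 4) -> float:
    """Return the boundary value of one group of data.

    Starting from the maximum, accept it if enough data lie within
    eps of it; otherwise delete it and try the next maximum. If no
    maximum is ever accepted, fall back to the original maximum.
    """
    if len(values) == 0:
        raise ValueError("need at least one value")
    data = sorted(values)
    fallback = data[-1]
    while data:
        lo, hi = data[0], data[-1]
        eps = (hi - lo) / n_partitions  # assumed radius formula
        in_area = sum(1 for v in data if hi - eps <= v <= hi + eps)
        if in_area > density_threshold(len(data)):
            return hi
        data.pop()  # delete the current maximum and retry
    return fallback


# The six values stated in the example above (Tuesday's value is not given).
print(boundary_value([40, 45, 47.5, 48, 48.5, 55]))  # 48.5
print(boundary_value([1, 100]))                      # 100 (fallback)
```

With these assumed parameters the isolated 55 ms reading is discarded because nothing else lies within 3.75 ms of it, and 48.5 ms is accepted because three values fall inside its density area.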
  • Step 204 Determine the credible boundary index value of the first major period according to the boundary index value of the first major period
  • the boundary index value of the first major cycle can be used directly as the credible boundary index value of the first major cycle, or some simple calculations can be performed on the basis of the boundary index value of the first major cycle to obtain the credible boundary index value, which increases its reliability.
  • determining the credible boundary index value of the first major period according to the boundary index value of the first major period includes:
  • up_boundary is the credible boundary index value of the first major cycle;
  • up_p is the boundary index value of the first major cycle;
  • K is the fluctuation coefficient;
  • eps is the area radius, determined according to the maximum value and the minimum value of the boundary index values of the small cycles in the first major cycle and the number of partitions;
  • the number of partitions is determined according to the number of data of the boundary index values of the small periods in the first major period;
  • base is determined according to the maximum value and the minimum value.
  • the coefficient 0.1 can be adjusted by engineers and technicians according to their analysis; it is given here only as one possible implementation.
  • Step 205 The credible boundary index value of the first major period is used as a detection threshold for abnormal detection of the collected data in the second major period; the second major period is the period after the first major period.
  • the second major cycle may be a major cycle immediately after the first major cycle, for example, the first week of January is the first major cycle, and the second week of January is the second major cycle.
  • the second major cycle may be a major cycle that is not immediately after the first major cycle, for example, the first week of January is the first major cycle, and the fourth week of January is the second major cycle.
  • the second major cycle can be a major cycle that overlaps with the first major cycle. For example, Monday to Sunday of the first week of January is the first major cycle, and Tuesday to Sunday of the first week of January plus Monday of the second week of January is the second major cycle.
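
The three placements of the second major cycle (adjacent, later, overlapping) differ only in where the window starts. The short sketch below makes that concrete; the helper name `major_cycle` and the calendar dates are hypothetical and chosen only so that the first cycle starts on a Monday.

```python
from datetime import date, timedelta
from typing import Tuple


def major_cycle(start: date, days: int = 7) -> Tuple[date, date]:
    """Return the inclusive (first day, last day) of a major cycle."""
    return start, start + timedelta(days=days - 1)


first = major_cycle(date(2021, 1, 4))            # Monday to Sunday
adjacent = major_cycle(first[1] + timedelta(days=1))
overlapping = major_cycle(first[0] + timedelta(days=1))

print(first, adjacent, overlapping)
```

Sliding the start by one day, as in `overlapping`, lets the threshold be refreshed daily while still being learned from a full week of data.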
  • the credible boundary index value determined from the 168 data in the first major cycle can be used to detect the data generated at each moment (ms, s, min, etc.) in the second major cycle.
  • the credible boundary index value determined from the 7 data of the corresponding time period in the first major cycle can be used to detect the data generated at each moment (ms, s, min, etc.) of the corresponding time period in the second major cycle.
  • if the data in the corresponding time period of the second major period is greater than the credible boundary index value, the data can be determined to be abnormal and an alarm is triggered.
  • the credible boundary index of a given time period on a given day in the first major cycle can be used to detect the data generated in the corresponding time period on the corresponding day of the second major cycle. For example, if the credible boundary index for 0:00-1:00 on the first day of the first major cycle is a, it can be used to detect the data generated at 0:00-1:00 on the first day of the second major cycle; if the credible boundary index for 0:00-1:00 on the second day of the first major cycle is b, it can be used to detect the data generated at 0:00-1:00 on the second day of the second major cycle.
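
A per-day, per-hour lookup of this kind can be sketched as below. The dictionary layout keyed by `(weekday, hour)`, the function name `is_abnormal`, and the sample thresholds are assumptions for illustration; skipping slots without a learned threshold is likewise an assumed policy for time periods whose boundary index value was not calculated.

```python
from datetime import datetime
from typing import Dict, Tuple


def is_abnormal(ts: datetime, value: float,
                thresholds: Dict[Tuple[int, int], float]) -> bool:
    """Compare a new reading with the threshold learned for the same
    weekday and hour in the first major cycle."""
    limit = thresholds.get((ts.weekday(), ts.hour))
    if limit is None:
        # No threshold was learned for this slot (assumed: do not alarm).
        return False
    return value > limit


# a = 50 ms for Monday 0:00-1:00, b = 60 ms for Tuesday 0:00-1:00 (made up).
thresholds = {(0, 0): 50.0, (1, 0): 60.0}
print(is_abnormal(datetime(2021, 1, 11, 0, 30), 55.0, thresholds))  # Monday
print(is_abnormal(datetime(2021, 1, 12, 0, 30), 55.0, thresholds))  # Tuesday
```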
  • the cycle length of the first major cycle and the cycle length of the second major cycle can be the same or different. For example, the first major cycle is two days and the second major cycle is one day, or the first major cycle is one day and the second major cycle is two days.
  • the data can be updated promptly by shortening the cycle lengths of the first and second major cycles; for example, the first major cycle is one day and the second major cycle is one day, and the credible boundary index value of a given time period in the first major cycle is used to detect the data generated in the corresponding time period of the second major cycle.
  • neither the time period of the first major cycle used to determine the credible boundary index value nor the time period of the second major cycle in which the credible boundary index value is used for detection is specifically limited; both can be adjusted flexibly according to the characteristics of the data fluctuations or as needed.
  • the first major cycle includes multiple small cycles, and the data in the first major cycle is collected to determine at least one index value for each small cycle in the first major cycle; for the index values corresponding to each small cycle, the boundary index value corresponding to that small period is obtained according to the preset boundary rule.
  • in this way, the index values of each small period are screened by the boundary rule to obtain the boundary index value corresponding to each small period, which makes the boundary index value of each small period more accurate.
  • the boundary index value of the first major period is then obtained by applying the boundary rule again to the boundary index values of the small periods of the first major period, which makes the boundary index value of the first major period more accurate.
  • the credible boundary index value of the first major cycle is obtained according to the boundary index value of the first major cycle and is used as the detection threshold for data generated afterwards; in this way, the boundary index value of the first major cycle is adjusted to obtain the credible boundary index value, so that data collected after the first major cycle can fluctuate within a normal, reasonable range without triggering false anomaly alarms.
  • the adaptive adjustment of the detection threshold is thereby realized, the accuracy of the detection threshold is improved, and the accuracy of anomaly detection is further improved.
  • the embodiment of the present application provides yet another method flow of anomaly detection, as shown in FIG. 5, including:
  • Step 501 Set the first large period and each small period.
  • setting the first major period includes setting period size, period start time, period end time and other related parameters.
  • Setting each small period includes setting period size, period start time and period end time and other related parameters.
  • the small periods may be all the small periods in the first large period, or the small periods in the first large period that cover the same time period, and so on. The relationship between the small periods and the first large period can be set flexibly.
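
The two ways of relating small periods to the first large period can be illustrated with a small grouping sketch. It assumes one-hour small periods and samples given as `(timestamp, value)` pairs; the function names and sample data are hypothetical, and `max` is used only as a stand-in for the boundary rule.

```python
from collections import defaultdict
from datetime import date, datetime


def split_into_small_periods(samples):
    """Group (timestamp, value) samples into one-hour small periods."""
    periods = defaultdict(list)
    for ts, value in samples:
        periods[(ts.date(), ts.hour)].append(value)
    return dict(periods)


def regroup_by_hour(per_period_boundary):
    """Collect the boundary values of the same hour across all days."""
    by_hour = defaultdict(list)
    for (_day, hour), boundary in sorted(per_period_boundary.items()):
        by_hour[hour].append(boundary)
    return dict(by_hour)


samples = [
    (datetime(2021, 1, 4, 0, 5), 40.0),
    (datetime(2021, 1, 4, 0, 50), 38.0),
    (datetime(2021, 1, 4, 1, 0), 30.0),
    (datetime(2021, 1, 5, 0, 10), 45.0),
]
periods = split_into_small_periods(samples)
# max() is only a stand-in for the boundary rule here.
boundaries = {key: max(vals) for key, vals in periods.items()}
by_hour = regroup_by_hour(boundaries)
print(periods[(date(2021, 1, 4), 0)], by_hour)
```

Using every key of `boundaries` as one group corresponds to "all small periods" (168 values for a week), while `by_hour` corresponds to the same time period across days (7 values per hour for a week).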
  • Step 502 Collect data in each small period.
  • determine the product or equipment whose data needs to be detected, determine the index to be detected for that product or equipment, and collect data for that index.
  • Step 503 According to the number of data in each small period, respectively calculate the density threshold corresponding to each small period.
  • Step 504 Determine the maximum value and the minimum value in the data of each small period respectively.
  • Step 505 Determine the number of partitions corresponding to the data in each small period according to the maximum value and the minimum value in the data of each small period, determine the area radius of each small period according to the maximum value, minimum value, and number of partitions of that small period, and determine the density area of each small period according to its maximum value and area radius.
  • Step 506 Determine the number of data in the density area corresponding to the maximum value of each small period.
  • Step 507 Determine whether the number of data in the density area of each small period is greater than the density threshold. If not, proceed to step 508: delete from the small period the maximum value whose density area does not contain more data than the density threshold, re-determine the maximum value, and repeat steps 505, 506, and 507 until the number of data in the density area of the small period is greater than the density threshold. If yes, go to step 509.
  • Step 509 Use the maximum value corresponding to the density area when the number of data in the density area of the small period is greater than the density threshold as the boundary index value of the small period, and similarly obtain the boundary index value of each small period.
  • Step 510 Determine the number of the boundary index values of the small period in the first large period, and determine the density threshold.
  • Step 511 Determine the maximum value and the minimum value of the boundary index value of the small period in the first large period.
  • Step 512 Determine the number of partitions corresponding to the small period boundary index value data in the first major period according to the maximum value and the minimum value, determine the area radius according to the maximum value, minimum value and the number of partitions, and then determine the first major period according to the area radius The density area.
  • Step 513 Determine the number of data in the density area of the first major period.
  • Step 514 Determine whether the number of data in the density area of the first major period is greater than the density threshold of the first major period. If not, proceed to step 515: delete from the data group of the first major period the maximum value whose density area does not contain more data than the density threshold, re-determine the maximum value, and repeat steps 511, 512, and 513 until the number of data in the density area of the first major period is greater than the density threshold. If yes, go to step 516.
  • Step 516 When the number of data in the density area of the first major period is greater than the density threshold, the maximum value of the density area corresponding to the first major period is used as the boundary index value of the first major period.
  • Step 517 Obtain the credible boundary index value of the first major cycle according to the boundary index value of the first major cycle.
  • Step 518 Monitor and detect the data of the second major cycle according to the credible boundary index value of the first major cycle, and detect whether the data of the second major cycle is abnormal.
  • the order of step 501 and step 502 is not fixed; step 502 can be performed first, and then step 501.
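
The overall shape of the flow in FIG. 5 (small-period boundaries, then a major-period boundary, then a credible boundary used as the detection threshold) can be sketched as a two-level pipeline. This is a structural illustration only: the boundary rule is passed in as a function, Python's built-in `max` is used as a stand-in for it, and the additive `margin` is a hypothetical placeholder for the "simple calculations" that turn the boundary index value into the credible one. With `margin=0.0` the boundary index value is used directly.

```python
from typing import Callable, Dict, List, Sequence

Rule = Callable[[Sequence[float]], float]


def major_boundary(small_periods: Dict[str, List[float]], rule: Rule) -> float:
    """Steps 503-516: apply the rule per small period, then once more
    to the resulting small-period boundary values."""
    small_boundaries = [rule(vals) for vals in small_periods.values() if vals]
    return rule(small_boundaries)


def credible_boundary(boundary: float, margin: float = 0.0) -> float:
    """Step 517: hypothetical adjustment; not the formula of the text."""
    return boundary + margin


def abnormal(values: Sequence[float], threshold: float) -> List[float]:
    """Step 518: readings above the threshold are treated as abnormal."""
    return [v for v in values if v > threshold]


first_cycle = {"mon": [30.0, 40.0, 35.0], "tue": [42.0, 45.0], "wed": [44.0, 20.0]}
up_p = major_boundary(first_cycle, rule=max)      # stand-in rule
threshold = credible_boundary(up_p, margin=5.0)   # made-up margin
alerts = abnormal([48.0, 51.0, 70.0], threshold)
print(up_p, threshold, alerts)
```

Passing the rule as a parameter keeps the two-level structure separate from the choice of density-based rule, which can then be swapped in without changing the pipeline.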
  • FIG. 6 is a schematic diagram of an abnormality detection device provided by an embodiment of the application, as shown in FIG. 6, including:
  • the collection unit 601 is configured to determine at least one index value of each small period of the index to be detected in the first large period for the collected data in the first large period;
  • the processing unit 602 is configured to determine, for the at least one index value of each small period, the boundary index value of the index to be detected in each small period according to the preset boundary rule; and to determine, from the boundary index values of the small periods and according to the boundary rule, the boundary index value of the index to be detected in the first major period;
  • the processing unit 602 is further configured to determine the credible boundary index value of the first major cycle according to the boundary index value of the first major cycle; the credible boundary index value of the first major cycle is used as a detection threshold for abnormality detection of the data collected in the second major period; the second major cycle is a cycle after the first major cycle.
  • the boundary rule is that, for a set of data, the boundary value is obtained by determining the density area from the maximum value in the data; if the number of data in the density area is greater than the density threshold, the maximum value is determined as the boundary value of the group of data; otherwise, the maximum value is deleted from the group of data, and the process returns to the step of determining the density area from the maximum value in the data; the density threshold is set according to the data volume of the group of data.
  • the density area is determined in the following ways, including:
  • the density area is determined.
  • processing unit 602 is specifically configured to calculate by the following formula:
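  • up_boundary is the credible boundary index value of the first major cycle;
  • up_p is the boundary index value of the first major cycle;
  • K is the fluctuation coefficient;
  • eps is the area radius, determined according to the maximum value and the minimum value of the boundary index values of the small cycles in the first major cycle and the number of partitions; the number of partitions is determined according to the number of data of the boundary index values of the small periods in the first major period; base is determined according to the maximum value and the minimum value.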
  • processing unit 602 is further configured to: when the boundary value cannot be determined according to the preset boundary rule, use the maximum value in the group of data as the boundary value of the group of data.
  • the density threshold is set according to the data volume of the set of data, including:
  • each small period in the first large period is the same time period.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device; the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Abstract

Disclosed are an anomaly detection method and apparatus. The method comprises: for data collected in a first major cycle, determining at least one index value of an index to be detected in each small cycle of the first major cycle; for the at least one index value in each small cycle, determining a boundary index value of the index in each small cycle according to a preset boundary rule; determining a boundary index value of the index in the first major cycle according to the boundary rule and on the basis of the boundary index value in each small cycle; determining a credible boundary index value in the first major cycle according to the boundary index value in the first major cycle; and taking the credible boundary index value in the first major cycle as a detection threshold for performing anomaly detection on data collected in a second major cycle, the second major cycle being the cycle after the first major cycle. The method enables adaptive adjustment of the detection threshold and improves the accuracy of both the detection threshold and anomaly detection.
PCT/CN2021/080564 2020-03-19 2021-03-12 Anomaly detection method and apparatus WO2021185182A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010196303.0A CN111400141B (zh) 2020-03-19 2020-03-19 Anomaly detection method and apparatus
CN202010196303.0 2020-03-19

Publications (1)

Publication Number Publication Date
WO2021185182A1 (fr) 2021-09-23

Family

ID=71432698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080564 WO2021185182A1 (fr) 2020-03-19 2021-03-12 Anomaly detection method and apparatus

Country Status (2)

Country Link
CN (1) CN111400141B (fr)
WO (1) WO2021185182A1 (fr)

Also Published As

Publication number Publication date
CN111400141A (zh) 2020-07-10
CN111400141B (zh) 2021-11-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21771452

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21771452

Country of ref document: EP

Kind code of ref document: A1