CN112905419A - Index data monitoring threshold range determination method and device and readable storage medium - Google Patents

Index data monitoring threshold range determination method and device and readable storage medium

Info

Publication number
CN112905419A
Authority
CN
China
Prior art keywords
threshold
probability
index data
threshold range
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110232174.0A
Other languages
Chinese (zh)
Other versions
CN112905419B (en)
Inventor
潘建宁
郑健彦
郭销淳
毛茂德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202110232174.0A priority Critical patent/CN112905419B/en
Publication of CN112905419A publication Critical patent/CN112905419A/en
Application granted granted Critical
Publication of CN112905419B publication Critical patent/CN112905419B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents

Abstract

The embodiment of the application provides a method and a device for determining an index data monitoring threshold range and a readable storage medium, and relates to the technical field of data monitoring, wherein the method for determining the index data monitoring threshold range comprises the following steps: the method comprises the steps of calculating the respective probabilities of a plurality of index parameters included in index data in a preset effective threshold range, and calculating the monitoring threshold range of the index data according to the respective probabilities of the index parameters in the effective threshold range and a preset standard threshold probability.

Description

Index data monitoring threshold range determination method and device and readable storage medium
Technical Field
The application relates to the technical field of data monitoring, in particular to a method and a device for determining an index data monitoring threshold range and a readable storage medium.
Background
At present, when an enterprise monitors data in various scenarios, the state of the data is often determined by setting a threshold range. Examples include interface success rate, interface delay, the number of online users of a website, and equipment-related figures such as CPU usage and memory usage. Data that is too low or too high needs to be detected accurately so that the enterprise can respond in time and make a corresponding decision. The prior art usually adopts either a static threshold anomaly detection scheme or a dynamic threshold anomaly detection scheme. For static threshold anomaly detection, as user requirements grow, equipment functions increase, and the production environments of service scenarios diversify, selecting a proper threshold for each index becomes tedious; even for the same index, the threshold must be adjusted continuously to follow changes in the index. Dynamic threshold anomaly detection based on the normal distribution avoids setting a static threshold, but the strong assumption that the data obeys a normal distribution limits its range of application.
In view of this, it is necessary for those skilled in the art to provide a threshold range determination scheme with higher adaptability.
Disclosure of Invention
The application provides a method and a device for determining an index data monitoring threshold range and a readable storage medium.
The embodiment of the application can be realized as follows:
in a first aspect, the present application provides a method for determining an index data monitoring threshold range, including:
calculating respective probabilities of a plurality of index parameters included in the index data within a preset effective threshold range, wherein the index parameters form the index data according to time sequence characteristics, and the index data represents the service data that the user needs to monitor;
and calculating to obtain the monitoring threshold range of the index data according to the respective probabilities of the index parameters in the effective threshold range and the preset standard threshold probability.
In an alternative embodiment, calculating respective probabilities of a plurality of index parameters included in the index data within a preset effective threshold range includes:
calculating a Gaussian mixture parameter of the index data based on a hidden variable estimation algorithm;
determining a Gaussian mixture distribution probability density function of the index data according to the Gaussian mixture parameters;
and calculating the probability of each index parameter in the effective threshold range according to the Gaussian mixture distribution probability density function.
In an optional embodiment, the calculating, according to respective probabilities of the plurality of index parameters within the effective threshold range and a preset standard threshold probability, a monitoring threshold range of the index data includes:
constructing a reference threshold range model according to the respective probabilities of the index parameters in the effective threshold range, the mean value of the index parameters and the standard deviations of the index parameters, wherein the reference threshold range model is used for calculating the probability of the index data in the reference threshold range, the upper bound of the reference threshold range is the same as that of the effective threshold range, and the lower bound of the reference threshold range is negative infinity;
and calculating to obtain the monitoring threshold range of the index data according to the reference threshold range model and the standard threshold probability.
In an optional embodiment, the calculating the monitoring threshold range of the index data according to the reference threshold range model and the standard threshold probability includes:
calculating according to a reference threshold range model to obtain a first to-be-determined reference threshold probability and a second to-be-determined reference threshold probability, wherein the first to-be-determined reference threshold probability represents the probability of the index data in the first to-be-determined reference range, and the second to-be-determined reference threshold probability represents the probability of the index data in the second to-be-determined reference range;
determining an upper bound threshold of the index data according to the first to-be-determined reference threshold probability and the standard threshold probability, wherein the standard threshold probability represents the probability of the index data in a standard reference range;
determining a lower bound threshold of the index data according to the second to-be-determined reference threshold probability and the standard threshold probability;
and taking the upper bound threshold value and the lower bound threshold value as a monitoring threshold range of the index data.
In an alternative embodiment, the calculating the first to-be-determined reference threshold probability according to the reference threshold range model includes:
calculating to obtain a binary search initial value according to the mean value and the standard deviation of the index data;
and substituting the binary search initial value into a reference threshold range model to calculate to obtain a first to-be-determined reference threshold probability.
In an optional embodiment, determining an upper threshold of the index data according to the first to-be-determined reference threshold probability and the standard threshold probability includes:
judging whether the difference value between the first to-be-determined reference threshold value probability and the standard threshold value probability is smaller than a preset difference value threshold value or not;
if so, taking the first to-be-determined reference threshold probability as an upper bound threshold of the index data;
and if not, performing binary search on the first to-be-determined reference threshold probability until the difference value between the first to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference threshold, and taking the first to-be-determined reference threshold probability as an upper bound threshold of the index data.
In an optional embodiment, the calculating the second reference threshold probability to be determined according to the reference threshold range model includes:
calculating to obtain a binary search initial value according to the mean value and the standard deviation of the index data;
and substituting the binary search initial value into the adjusted reference threshold range model to calculate to obtain a second to-be-determined reference threshold probability.
In an optional embodiment, determining a lower bound threshold of the index data according to the second to-be-determined reference threshold probability and the standard threshold probability includes:
judging whether the difference value between the second to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference value threshold value or not;
if so, taking the second to-be-determined reference threshold probability as a lower bound threshold of the index data;
and if not, performing binary search on the second to-be-determined reference threshold probability until the difference value between the second to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference threshold, and taking the second to-be-determined reference threshold probability as a lower bound threshold of the index data.
In a second aspect, the present application provides an index data monitoring threshold range determination apparatus, including:
the calculation module is used for calculating the respective probabilities of a plurality of index parameters included in the index data within a preset effective threshold range, wherein the index parameters form the index data according to time sequence characteristics, and the index data represents the service data that the user needs to monitor;
and the determining module is used for calculating the monitoring threshold range of the index data according to the respective probabilities of the index parameters in the effective threshold range and the preset standard threshold probability.
In a third aspect, the present application provides a computer device, where the computer device includes a processor and a non-volatile memory storing computer instructions, and when the computer instructions are executed by the processor, the computer device executes the method for determining the indicator data monitoring threshold range according to any one of the foregoing embodiments.
In a fourth aspect, the present application provides a readable storage medium that includes a computer program; when the computer program runs, it controls the computer device on which the readable storage medium resides to execute the method for determining the index data monitoring threshold range according to any one of the foregoing embodiments.
The beneficial effects of the embodiments of the application include, for example: with the method, the device and the readable storage medium for determining the index data monitoring threshold range, the respective probabilities, within a preset effective threshold range, of a plurality of index parameters included in the index data are calculated, where the index parameters form the index data according to time sequence characteristics and the index data represents the service data that the user needs to monitor; the monitoring threshold range of the index data is then calculated according to those probabilities and a preset standard threshold probability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic diagram of the domain name request amount of users accessing a live broadcast platform according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a fitting result of threshold range monitoring based on k-sigma according to an embodiment of the present application;
Fig. 3 is a schematic block diagram of a structure of an index data monitoring threshold range determination system according to an embodiment of the present application;
Fig. 4 is a schematic block diagram of another structure of an index data monitoring threshold range determination system according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of the steps of a method for determining a monitoring threshold range of index data according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of further steps of a method for determining a monitoring threshold range of index data according to an embodiment of the present application;
Fig. 7 is a schematic flowchart of further steps of a method for determining a monitoring threshold range of index data according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a fitting result of the index data monitoring threshold range determination method provided in an embodiment of the present application;
Fig. 9 is a schematic block diagram of a structure of an index data monitoring threshold range determining apparatus according to an embodiment of the present application;
Fig. 10 is a schematic block diagram of a structure of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present application may be combined with each other without conflict.
At present, in order to clearly plan the strategy of each service, an enterprise generally monitors the index data of the relevant service, and the basis for monitoring is generally determined according to a set threshold range, and a corresponding strategy is provided when the index data is too high or too low. In the prior art, two schemes exist for determining the threshold range of the index data.
The first is static threshold anomaly detection, which typically sets a fixed threshold based on experience and considers an index abnormal when its value exceeds (or falls below) the threshold. However, a production environment contains thousands of indexes, and selecting an appropriate threshold for each one is tedious. Moreover, even for the same index, the threshold must be adjusted continuously to follow changes in the index.
The second is dynamic threshold anomaly detection, such as k-sigma anomaly detection. The conventional dynamic threshold scheme assumes that the data obeys a normal distribution and calculates the mean and the standard deviation (sigma) of the data. When an index value lies within k standard deviations of the mean, the index is considered normal; otherwise it is considered abnormal.
For example, refer to Fig. 1 and Fig. 2 together. Fig. 1 shows the domain name request amount of users accessing a live broadcast platform. The abscissa is time in minutes (data within 10000 minutes is collected) and the ordinate is the domain name request amount; the normal data range (Threshold) is known to be less than "150000". The k-sigma anomaly detection may be used for the calculation. Referring to Fig. 2, the area under the probability density curve over a given interval is the probability that the index value falls in that interval. The fitting result calculated with k-sigma anomaly detection does not truly reflect the probability of the index data: the normal data range obtained is "100000", far less than the real normal data range "150000", because the provided index data, namely the domain name request amount, is multimodal. This method avoids setting a static threshold, but the strong assumption that the data obeys a (unimodal) normal distribution limits its range of application, so it cannot reflect the real situation in all scenarios and the upper bound of the finally determined threshold range is inaccurate.
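The k-sigma rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `k_sigma_bounds` and `is_abnormal`, the choice k = 3, and the bimodal sample are hypothetical and do not come from the application.

```python
import statistics

def k_sigma_bounds(values, k=3.0):
    """Return (mean - k*sigma, mean + k*sigma) for a list of index values."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return mu - k * sigma, mu + k * sigma

def is_abnormal(value, bounds):
    """An index value is abnormal when it falls outside the k-sigma band."""
    lower, upper = bounds
    return value < lower or value > upper

# Hypothetical bimodal series standing in for a multimodal index.
samples = [100.0] * 50 + [200.0] * 50
bounds = k_sigma_bounds(samples, k=3.0)
print(bounds, is_abnormal(350.0, bounds), is_abnormal(150.0, bounds))
```

Note that the band is centred on the overall mean, which for a bimodal series lies between the two modes where little data actually occurs; this is one way to see why a single-Gaussian assumption can misplace the bounds.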
To solve the above problem, refer to Fig. 3, which shows an index data monitoring threshold range determination system provided in an embodiment of the present application. The system may include a computer device 100 and a plurality of terminal devices 200. A user performs a service operation through a terminal device 200, and the computer device 100 may monitor the related service according to index data related to that service operation.
Referring also to Fig. 4, which shows another structure of the index data monitoring threshold range determination system provided in an embodiment of the present application, the system may further include a server 300 that communicates with the plurality of terminal devices 200. Because a huge amount of data may put heavy pressure on the server 300, the computer device 100 may be separately connected to the server 300 to process the service data received by the server 300 and to perform threshold monitoring on the relevant index data. In other embodiments of the present application, the system may also be composed of more or fewer parts, which is not limited herein.
Referring to Fig. 5, which is a schematic flowchart of the steps of a method for determining an index data monitoring threshold range provided in an embodiment of the present application, the method is implemented by the computer device 100 in Fig. 3 and is described in detail below.
Step 201, calculating respective probabilities of a plurality of index parameters included in the index data within a preset effective threshold range.
The index parameters form the index data according to time sequence characteristics, and the index data represents the service data that the user needs to monitor.
Step 202, calculating to obtain a monitoring threshold range of the index data according to respective probabilities of the index parameters in the effective threshold range and a preset standard threshold probability.
As described above, the index data is composed of a plurality of index parameters with time-series characteristics, for example, when the index data is the domain name request amount of the live broadcast platform, the plurality of index parameters are the number of domain name requests made by the user through the terminal device 200 in a preset time period. The index data may also be the number of online users of the live broadcast platform, and the index parameters are the number of users logging in the live broadcast platform through the terminal device 200 and watching online in a preset time period. The standard threshold probability can be predetermined and used for representing the normal probability of the index data, and the monitoring threshold range of the index data can be determined on the basis of the standard threshold probability and the probability of each index parameter in the effective threshold range.
Through the steps, the monitoring threshold range of the index data is determined on the basis of determining the respective probabilities of the index parameters in the preset effective threshold range, so that the probability distribution of each index parameter can be truly reflected, and the threshold range of the index data can be more adaptively determined no matter whether the index data is multimodal data or unimodal data.
On this basis, and referring to Fig. 6, step 201 may be implemented as in the following detailed example, which describes the scheme provided by the present application more clearly.
Substep 201-1, calculating the Gaussian mixture parameters of the index data based on a hidden variable estimation algorithm.
Substep 201-2, determining a Gaussian mixture distribution probability density function of the index data according to the Gaussian mixture parameters.
Substep 201-3, calculating the probability of each index parameter within the effective threshold range according to the Gaussian mixture distribution probability density function.
It should be noted that the probability of each index parameter within the effective threshold range in the embodiments of the present application may be calculated from a Gaussian mixture distribution probability density function. To obtain the Gaussian mixture distribution probability density function corresponding to the index data, the EM (Expectation-Maximization) algorithm, referred to here as the hidden variable estimation algorithm, is used to estimate the Gaussian mixture parameters of the index data.
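As an illustration of the estimation step, the following is a minimal sketch of EM for a one-dimensional Gaussian mixture, written with the standard library only. It is not the application's implementation: the function `fit_gmm_em`, the quantile-based initialisation, the fixed iteration count, and the synthetic two-cluster sample are all assumptions made for the example.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of a 1-D normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_em(data, k, iters=60):
    """Estimate weights, means and variances of a 1-D Gaussian mixture by EM."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: spread the initial means over the data quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability (responsibility) of each hidden component.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate the mixture parameters from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing component
    return weights, means, variances

# Hypothetical bimodal sample: two clusters around 0 and 10.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(10.0, 1.0) for _ in range(300)]
weights, means, variances = fit_gmm_em(data, k=2)
print(weights, means, variances)
```

In practice a library routine with convergence checks and multiple restarts would usually be preferred; the loop above only shows how the E-step and M-step alternate.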
Optionally, reference may be made to the following Gaussian mixture model:
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
where $p(x)$ is the Gaussian mixture distribution probability density function; $K$ is the number of components of the Gaussian mixture model, $K = 4$; $\pi_k$, $\mu_k$ and $\Sigma_k$ are the Gaussian mixture parameters of the index data, obtained by estimating the Gaussian mixture model with the EM algorithm; $N$ is the number of numerical integration cells, $N = 10000$; $x$ is the index data, $x = (x_1, x_2, x_3, \ldots, x_n)$, and $x_n$ is the $n$-th index parameter.
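To make the mixture density concrete, the sketch below evaluates a one-dimensional mixture with K = 4 components. The parameter values in `WEIGHTS`, `MEANS` and `SIGMAS` are hypothetical, chosen only for illustration, and the component spread is given as a standard deviation rather than a covariance.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sigmas):
    """p(x) = sum over k of pi_k * N(x | mu_k, sigma_k^2) for a 1-D mixture."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

# Hypothetical parameters for K = 4 components (not taken from the source).
WEIGHTS = [0.4, 0.3, 0.2, 0.1]
MEANS = [20.0, 50.0, 80.0, 120.0]
SIGMAS = [5.0, 8.0, 6.0, 10.0]

density_at_mode = mixture_pdf(20.0, WEIGHTS, MEANS, SIGMAS)
density_in_gap = mixture_pdf(35.0, WEIGHTS, MEANS, SIGMAS)
print(density_at_mode, density_in_gap)
```

The density is higher near a component mean than in the gap between two components, which is the multimodal shape a single normal distribution cannot represent.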
To describe the scheme more clearly, consider the foregoing example again: the index data is the domain name request amount for the live broadcast platform, and p(x) is the Gaussian mixture distribution probability density function of the domain name request amount. Here N corresponds to time: as before, data within 10000 minutes is taken, so N = 10000 means that each interval is 1 minute.
In addition, referring to Fig. 7, as an alternative embodiment, the aforementioned step 202 can be implemented by the following steps.
Substep 202-1, constructing a reference threshold range model according to the respective probabilities of the plurality of index parameters within the effective threshold range, the mean of the plurality of index parameters, and the standard deviation of the plurality of index parameters.
The reference threshold range model is used for calculating the probability of the index data in the reference threshold range, the upper bound of the reference threshold range is the same as the upper bound of the effective threshold range, and the lower bound of the reference threshold range is negative infinity.
Substep 202-2, calculating the monitoring threshold range of the index data according to the reference threshold range model and the standard threshold probability.
As described above, the respective probabilities of the index parameters within the effective threshold range are determined by the Gaussian mixture distribution probability density function $p(x)$. Accordingly, $p(x)$ can be numerically integrated to obtain those probabilities, with reference to the following formula:
[Formula image BDA0002958940620000121: numerical integration of $p(x)$ over $[a, b]$]
where $p(x \mid a \le x \le b)$ is the probability of each of the plurality of index parameters within the effective threshold range; the interval $[a, b]$ is the effective threshold range; $t_i$ is the index data interval number $i$,
[Formula image BDA0002958940620000122: definition of $t_i$]
$i = 0, 1, 2, \ldots, N$; and $f(t_i)$ is the numerical integration value of the index parameter corresponding to index data interval number $i$.
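The numerical integration step can be sketched as follows. The exact quadrature rule is given only in a formula image, so this example assumes a trapezoidal rule on evenly spaced nodes t_i = a + i(b - a)/N; that choice, the helper `interval_probability`, and the two-component mixture parameters are assumptions for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sigmas):
    """1-D Gaussian mixture density."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def interval_probability(pdf, a, b, n=10000):
    """Approximate the probability mass of pdf on [a, b] with n trapezoid cells."""
    h = (b - a) / n
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, n):
        total += pdf(a + i * h)  # interior nodes t_i = a + i*h (assumed spacing)
    return total * h

def example_pdf(x):
    """Hypothetical two-component mixture used only for this example."""
    return mixture_pdf(x, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])

prob = interval_probability(example_pdf, -8.0, 5.0)
print(prob)
```

With N = 10000 cells, the discretisation error for a smooth density like this one is small compared with the tolerance ε typically used in the later threshold search.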
Based on the above, a reference threshold range model can be constructed according to the mean value of the index parameters and the standard deviation of the index parameters:
$$p(x \mid x \le b) = p(x \mid x < \mu - 10\sigma) + p(x \mid \mu - 10\sigma \le x \le b) \approx p(x \mid \mu - 10\sigma \le x \le b)$$
where μ is the mean of the index data and σ is the standard deviation of the index data.
In combination with the foregoing example, that is, when the index data is the domain name request amount of the user accessing the live broadcast platform, the valid threshold range [ a, b ] may be set to [0, 17500], μ is the mean value of all domain name request amounts, and σ is the standard deviation of all domain name request amounts.
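The approximation behind the reference threshold range model, replacing the lower bound of negative infinity by μ - 10σ, can be checked with a short sketch. The mixture parameters, the function `reference_probability`, and the trapezoidal rule are assumptions for illustration; μ and σ here are the mean and standard deviation of the hypothetical mixture.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def example_pdf(x):
    """Hypothetical mixture: weights 0.5/0.5, means 0 and 10, std 1 each."""
    return 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, 10.0, 1.0)

def reference_probability(pdf, b, mu, sigma, n=10000):
    """p(x <= b) approximated by integrating pdf over [mu - 10*sigma, b]."""
    a = mu - 10.0 * sigma
    if b <= a:
        return 0.0  # the mass below mu - 10*sigma is treated as negligible
    h = (b - a) / n
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, n):
        total += pdf(a + i * h)
    return total * h

# Mean and standard deviation of the hypothetical mixture above.
MU = 5.0
SIGMA = math.sqrt(26.0)  # variance = 0.5*(1 + 0) + 0.5*(1 + 100) - 5**2

approx = reference_probability(example_pdf, 12.0, MU, SIGMA)
print(approx)
```

For this example the result agrees closely with the exact cumulative probability, because the density below μ - 10σ is vanishingly small.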
In order to more clearly describe the scheme provided by the present application, the foregoing sub-step 202-2 can be implemented by the following specific embodiments.
(1) Calculating a first to-be-determined reference threshold probability and a second to-be-determined reference threshold probability according to the reference threshold range model.
The first to-be-determined reference threshold probability characterizes the probability of the index data in the first to-be-determined reference range, and the second to-be-determined reference threshold probability characterizes the probability of the index data in the second to-be-determined reference range.
(2) Determining an upper bound threshold of the index data according to the first to-be-determined reference threshold probability and the standard threshold probability.
The standard threshold probability characterizes the probability of the index data being in a standard reference range.
(3) Determining a lower bound threshold of the index data according to the second to-be-determined reference threshold probability and the standard threshold probability.
(4) Taking the upper bound threshold and the lower bound threshold as the monitoring threshold range of the index data.
On this basis, the standard threshold probability characterizes the probability of the index data being in the standard reference range, that is, the preset probability that the data is normal. For example, if the probability that the data is normal is 99.96%, the standard reference range characterizes the first 99.96% of the data among all index parameters as normal.
Part (1) of the foregoing sub-step 202-2 can be implemented in the following manner.
(I) Calculating a binary search initial value according to the mean value and the standard deviation of the index data.
(II) Substituting the binary search initial value into the reference threshold range model to calculate the first to-be-determined reference threshold probability.
To describe the scheme more clearly, the threshold range in this scheme may be determined by binary search. On this basis, a binary search initial value can be calculated from the mean value and the standard deviation of the index data: $l = \mu - 10\sigma$, $u = \mu + 10\sigma$, $m = (l + u)/2$, where $l$ and $u$ are binary search coefficients and $m$ is the binary search initial value; the initial iteration number $t$ may be set to 0. The binary search initial value $m$ may be substituted into the reference threshold range model to calculate the first to-be-determined reference threshold probability; optionally, refer to the formula:
$$p_{m1} = p(x \mid x \le m)$$
where $p_{m1}$ is the first to-be-determined reference threshold probability, and $p(x \mid x \le m)$ is calculated with the aforementioned reference threshold range model.
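The initialisation above can be sketched as follows. The values of `MU` and `SIGMA` are hypothetical, and `normal_cdf` is only a stand-in for the reference threshold range model so that the example stays self-contained; a mixture-based model would be substituted in the actual scheme.

```python
import math

def initial_search_state(mu, sigma):
    """Return (l, u, m, t): search coefficients, initial value, iteration count."""
    l = mu - 10.0 * sigma
    u = mu + 10.0 * sigma
    m = (l + u) / 2.0
    return l, u, m, 0

def normal_cdf(x, mu, sigma):
    """Stand-in for the reference threshold range model p(x | x <= m)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

MU, SIGMA = 120.0, 15.0  # hypothetical mean and standard deviation of the index data
l, u, m, t = initial_search_state(MU, SIGMA)
p_m1 = normal_cdf(m, MU, SIGMA)
print(l, u, m, t, p_m1)
```

Because l and u are symmetric about the mean, the initial value m coincides with μ, so the search always starts at the centre of the data.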
On this basis, part (2) of the foregoing sub-step 202-2 can be implemented in the following manner.
(I) Judging whether the difference between the first to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference threshold.
(II) If so, taking the first to-be-determined reference threshold probability as the upper bound threshold of the index data.
(III) If not, performing binary search on the first to-be-determined reference threshold probability until the difference between the first to-be-determined reference threshold probability and the standard threshold probability is smaller than the preset difference threshold, and then taking the first to-be-determined reference threshold probability as the upper bound threshold of the index data.
On the basis of the foregoing, a preset condition may be set: $|p(x \mid x \le T_u) - p_0| \le \epsilon$, where $T_u$ is the upper bound threshold, $p_0$ is the standard threshold probability, and $\epsilon$ is the preset difference threshold. It should be understood that $T_u = p_{m1}$ may be considered to hold when the above preset condition is met; that is, it can be determined whether the difference between the first to-be-determined reference threshold probability and the standard threshold probability is smaller than the preset difference threshold, i.e., $|p_{m1} - p_0| < \epsilon$.
If $|p_{m1} - p_0| < \epsilon$ holds, the first to-be-determined reference threshold probability $p_{m1}$ calculated at this time is used as the upper bound threshold $T_u$ of the index data.
If $|p_{m1} - p_0| < \epsilon$ does not hold, a binary search can be performed for the first to-be-determined reference threshold probability in the following manner:
If $p_{m1} > p_0$, set $u = m$ and $m = (l + u)/2$, and determine whether $|p_{m1} - p_0| < \epsilon$ is satisfied; if not, continue the iterative computation.
If $p_{m1} < p_0$, set $l = m$ and $m = (l + u)/2$, and determine whether $|p_{m1} - p_0| < \epsilon$ is satisfied; if not, continue the iterative computation.
It should be understood that the upper bound threshold is determined by binary search; because the search is adaptive, the method can be applied to scenarios with very large data volumes, such as big data.
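The upper bound search described above might be sketched as follows. This is an illustrative reading, not the claimed implementation: the function names are hypothetical, the model is assumed to be a Gaussian mixture cumulative distribution function, an iteration cap is included as a safeguard, and the sketch returns the search point m together with the probability reached there, which is an interpretive assumption about what serves as the threshold.

```python
import math


def gmm_cdf(x, weights, means, stds):
    """Return P(X <= x) for a one-dimensional Gaussian mixture."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - mu_k) / (sd_k * math.sqrt(2.0))))
        for w, mu_k, sd_k in zip(weights, means, stds)
    )


def search_upper_threshold(cdf, mu, sigma, p0, eps=1e-6, max_iter=50):
    """Bisect on m until |cdf(m) - p0| < eps or max_iter is reached."""
    l, u = mu - 10.0 * sigma, mu + 10.0 * sigma
    m = (l + u) / 2.0
    for _ in range(max_iter):
        p_m1 = cdf(m)
        if abs(p_m1 - p0) < eps:
            break
        if p_m1 > p0:
            u = m  # probability too large: move the upper coefficient down
        else:
            l = m  # probability too small: move the lower coefficient up
        m = (l + u) / 2.0
    return m, cdf(m)


def std_normal_cdf(x):
    """Single-component mixture with mean 0 and standard deviation 1."""
    return gmm_cdf(x, [1.0], [0.0], [1.0])


# Example with the 99.96% standard threshold probability from the text.
T_u, p_at_T_u = search_upper_threshold(std_normal_cdf, 0.0, 1.0, p0=0.9996)
```

Each iteration halves the interval, so the number of model evaluations grows only logarithmically with the required precision and does not depend on the number of index parameters.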
Accordingly, part (1) of the aforementioned sub-step 202-2 may further be implemented in the following manner.
And (III) calculating to obtain a binary search initial value according to the mean value and the standard deviation of the index data.
And (IV) substituting the binary search initial value into the adjusted reference threshold range model to calculate to obtain a second to-be-determined reference threshold probability.
Similarly, the binary search initial value may be obtained in the same manner as described above, and it should be understood that, since the second to-be-determined reference threshold probability corresponding to the lower bound threshold is calculated, substituting the binary search initial value into the adjusted reference threshold range model may be represented as:
$p_{m2} = 1 - p(x \mid x \le m)$
where $p_{m2}$ is the second to-be-determined reference threshold probability. For the specific calculation principle, refer to the aforementioned calculation of the first to-be-determined reference threshold probability.
On this basis, as for the part (3) in the foregoing sub-step 202-2, implementation can be performed in the following manner.
And (I) judging whether the difference value between the second to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference value threshold.
And (II) if so, taking the second to-be-determined reference threshold probability as a lower bound threshold of the index data.
And (III) if not, performing binary search on the second to-be-determined reference threshold probability until the difference value between the second to-be-determined reference threshold probability and the standard threshold probability is smaller than the preset difference threshold, and taking the second to-be-determined reference threshold probability as a lower bound threshold of the index data.
Accordingly, a preset condition may be set: $|p(x \mid x > T_l) - p_0| \le \epsilon$, where $T_l$ is the lower bound threshold of the index data. It should be understood that when the above preset condition is satisfied, $T_l = p_{m2}$ may be considered to hold; that is, it can be determined whether the difference between the second to-be-determined reference threshold probability and the standard threshold probability is smaller than the preset difference threshold, i.e., $|p_{m2} - p_0| < \epsilon$.
If $|p_{m2} - p_0| < \epsilon$ holds, the second to-be-determined reference threshold probability $p_{m2}$ calculated at this time is used as the lower bound threshold $T_l$ of the index data.
If $|p_{m2} - p_0| < \epsilon$ does not hold, a binary search can be performed for the second to-be-determined reference threshold probability in the following manner:
If $p_{m2} > p_0$, set $l = m$ and $m = (l + u)/2$, and determine whether $|p_{m2} - p_0| < \epsilon$ is satisfied; if not, continue the iterative computation.
If $p_{m2} < p_0$, set $u = m$ and $m = (l + u)/2$, and determine whether $|p_{m2} - p_0| < \epsilon$ is satisfied; if not, continue the iterative computation.
It should be understood that, in the foregoing process of calculating the upper bound threshold and the lower bound threshold of the index data, a maximum number of iterations may be set. In the binary search process, if the set condition is still not met when the maximum number of iterations is reached, the current result may be taken as the target result, so as to improve calculation efficiency and reduce the computational load on the computer device 100. For example, in the process of calculating the upper bound threshold of the index data, the maximum number of iterations of the binary search is set to 50; when the 50th iteration is reached, if $|p_{m1} - p_0| < \epsilon$ still does not hold, the current $p_{m1}$ is taken as the upper bound threshold $T_u$ and the binary search process is stopped.
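The lower bound search with an iteration cap could be sketched along the following lines. The sketch assumes the adjusted model is 1 minus the Gaussian mixture cumulative distribution function; the function names and the returned `converged` flag are hypothetical additions for illustration, and returning the search point m as the threshold is an interpretive assumption.

```python
import math


def gmm_cdf(x, weights, means, stds):
    """Return P(X <= x) for a one-dimensional Gaussian mixture."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - mu_k) / (sd_k * math.sqrt(2.0))))
        for w, mu_k, sd_k in zip(weights, means, stds)
    )


def search_lower_threshold(cdf, mu, sigma, p0, eps=1e-6, max_iter=50):
    """Bisect on m until |(1 - cdf(m)) - p0| < eps or max_iter is reached."""
    l, u = mu - 10.0 * sigma, mu + 10.0 * sigma
    m = (l + u) / 2.0
    p_m2 = 1.0 - cdf(m)
    t = 0  # iteration counter
    while abs(p_m2 - p0) >= eps and t < max_iter:
        if p_m2 > p0:
            l = m  # too much mass above m: raise the lower coefficient
        else:
            u = m  # too little mass above m: lower the upper coefficient
        m = (l + u) / 2.0
        p_m2 = 1.0 - cdf(m)
        t += 1
    converged = abs(p_m2 - p0) < eps
    # When the cap is hit, the current result is still returned.
    return m, p_m2, converged


def std_normal_cdf(x):
    """Single-component mixture with mean 0 and standard deviation 1."""
    return gmm_cdf(x, [1.0], [0.0], [1.0])


T_l, p_at_T_l, ok = search_lower_threshold(std_normal_cdf, 0.0, 1.0, p0=0.9996)
# A very small cap shows the fallback: the current result is used as is.
T_l_capped, _, ok_capped = search_lower_threshold(
    std_normal_cdf, 0.0, 1.0, p0=0.9996, max_iter=3
)
```

The update directions are mirrored relative to the upper bound search because 1 minus the cumulative distribution function decreases as m increases.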
To describe the scheme provided by the present application more clearly, please refer to fig. 1 again in combination with fig. 8. Applying the scheme provided by the present application to the domain name request quantity of users accessing the live broadcast platform shown in fig. 1 yields the result in fig. 8. As can be clearly seen from fig. 8, the fitting result is consistent with the real data, and the calculated normal data range is also "150000". Subsequently, reasonable theoretical thresholds, i.e., the upper bound threshold and the lower bound threshold, for the domain name request quantity of users accessing the live broadcast platform can be obtained by using the foregoing scheme.
Through the above scheme, unlike a manually set static threshold, the threshold in the embodiment of the application is calculated from the data distribution; that is, the threshold is adaptive and does not need to be set manually, which solves the problems of the static threshold detection scheme. It should be noted that, in addition to the foregoing scenario, the method for determining the index data monitoring threshold range provided in the embodiment of the present application is also fully applicable to other production environment scenarios, such as CPU utilization monitoring, memory utilization monitoring, and monitoring of the number of users online on a website, which is not limited herein.
An embodiment of the present application provides an index data monitoring threshold range determining device 110, please refer to fig. 9 in combination, where the index data monitoring threshold range determining device 110 includes:
the calculating module 1101 is configured to calculate respective probabilities of a plurality of index parameters included in the index data within a preset effective threshold range, where the plurality of index parameters form the index data according to the time sequence characteristics, and the index data is used to represent service data monitored by user requirements.
The determining module 1102 is configured to calculate a monitoring threshold range of the index data according to respective probabilities of the plurality of index parameters within the effective threshold range and a preset standard threshold probability.
Further, the calculation module 1101 is specifically configured to:
calculating a Gaussian mixture parameter of the index data based on a hidden variable estimation algorithm; determining a Gaussian mixture distribution probability density function of the index data according to the Gaussian mixture parameters; and calculating the probability of each index parameter in the effective threshold range according to the Gaussian mixture distribution probability density function.
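As an illustration of the parameter estimation step, the sketch below reads the hidden variable estimation algorithm as expectation-maximization (EM) for a one-dimensional Gaussian mixture, which is an assumption. The function names, the quantile-based initialization, the fixed iteration count, and the synthetic data are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and std sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_em(data, k=2, n_iter=100):
    """Estimate (weights, means, stds) of a k-component 1-D mixture."""
    xs = sorted(data)
    n = len(xs)
    # Initialization: means at evenly spaced quantiles, shared std.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n) or 1.0
    stds = [sd_all] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and std of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            stds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, stds


# Synthetic index parameters drawn from two hypothetical regimes.
rng = random.Random(0)
samples = [rng.gauss(100.0, 5.0) for _ in range(300)]
samples += [rng.gauss(200.0, 5.0) for _ in range(300)]
weights, means, stds = fit_gmm_em(samples, k=2)
```

The fitted weights, means, and standard deviations define the mixture probability density function from which the probabilities within the effective threshold range are then calculated.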
Further, the determining module 1102 is specifically configured to:
constructing a reference threshold range model according to the respective probabilities of the index parameters in the effective threshold range, the mean value of the index parameters and the standard deviations of the index parameters, wherein the reference threshold range model is used for calculating the probability of the index data in the reference threshold range, the upper bound of the reference threshold range is the same as that of the effective threshold range, and the lower bound of the reference threshold range is negative infinity; and calculating to obtain the monitoring threshold range of the index data according to the reference threshold range model and the standard threshold probability.
Further, the determining module 1102 is further specifically configured to:
calculating according to a reference threshold range model to obtain a first to-be-determined reference threshold probability and a second to-be-determined reference threshold probability, wherein the first to-be-determined reference threshold probability represents the probability of the index data in the first to-be-determined reference range, and the second to-be-determined reference threshold probability represents the probability of the index data in the second to-be-determined reference range; determining an upper bound threshold of the index data according to the first to-be-determined reference threshold probability and the standard threshold probability, wherein the standard threshold probability represents the probability of the index data in a standard reference range; determining a lower bound threshold of the index data according to the second to-be-determined reference threshold probability and the standard threshold probability; and taking the upper bound threshold value and the lower bound threshold value as a monitoring threshold range of the index data.
Further, the determining module 1102 is further specifically configured to:
calculating to obtain a binary search initial value according to the mean value and the standard deviation of the index data; and substituting the binary search initial value into a reference threshold range model to calculate to obtain a first to-be-determined reference threshold probability.
Further, the determining module 1102 is further specifically configured to:
judging whether the difference value between the first to-be-determined reference threshold value probability and the standard threshold value probability is smaller than a preset difference value threshold value or not; if so, taking the first to-be-determined reference threshold probability as an upper bound threshold of the index data; and if not, performing binary search on the first to-be-determined reference threshold probability until the difference value between the first to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference threshold, and taking the first to-be-determined reference threshold probability as an upper bound threshold of the index data.
Further, the determining module 1102 is further specifically configured to:
calculating to obtain a binary search initial value according to the mean value and the standard deviation of the index data; and substituting the binary search initial value into the adjusted reference threshold range model to calculate to obtain a second to-be-determined reference threshold probability.
Further, the determining module 1102 is further specifically configured to:
judging whether the difference value between the second to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference value threshold value or not; if so, taking the second to-be-determined reference threshold probability as a lower bound threshold of the index data; and if not, performing binary search on the second to-be-determined reference threshold probability until the difference value between the second to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference threshold, and taking the second to-be-determined reference threshold probability as a lower bound threshold of the index data.
The embodiment of the present application provides a computer device 100, where the computer device 100 includes a processor and a non-volatile memory storing computer instructions, and when the computer instructions are executed by the processor, the computer device 100 executes the above method for determining the indicator data monitoring threshold range. As shown in fig. 10, fig. 10 is a block diagram of a computer device 100 according to an embodiment of the present application. The computer apparatus 100 includes an index data monitoring threshold range determination device 110, a memory 111, a processor 112, and a communication unit 113.
To facilitate the transfer or interaction of data, the memory 111, the processor 112 and the communication unit 113 are electrically connected to each other, directly or indirectly. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The index data monitoring threshold range determining device 110 includes at least one software functional module, which can be stored in the memory 111 in the form of software or firmware, or solidified in an Operating System (OS) of the computer device 100. The processor 112 is used for executing executable modules stored in the memory 111, such as the software functional modules and computer programs included in the index data monitoring threshold range determining device 110.
The embodiment of the present application provides a readable storage medium, where the readable storage medium includes a computer program, and when the computer program runs, the computer device 100 where the readable storage medium is located is controlled to execute the foregoing method for determining the indicator data monitoring threshold range.
In summary, the embodiments of the present application provide a method and an apparatus for determining an index data monitoring threshold range, and a readable storage medium, which are based on a gaussian mixture model, and have strong distribution fitting capability and a wider application range; the adaptive threshold avoids tedious manual threshold setting; numerical integration solves the calculation problem of the cumulative distribution function; the binary search greatly improves the search speed of the threshold value, and makes the large-scale use of the algorithm possible. The method is applied to real-time anomaly detection of hundreds of key business indexes in a company, avoids complicated manual threshold setting, can timely reflect index distribution change due to regular automatic updating of the threshold, achieves detection accuracy and recall rate of more than 85%, and ensures system stability.
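The numerical integration mentioned in the summary can be illustrated with a short sketch that approximates the mixture cumulative distribution function by the trapezoidal rule. The choice of the trapezoidal rule, the truncation of the lower limit at 10 standard deviations below the lowest component mean, the step count, and all names and parameters are assumptions made for illustration.

```python
import math


def gmm_pdf(x, weights, means, stds):
    """Probability density of a one-dimensional Gaussian mixture."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def gmm_cdf_numeric(x, weights, means, stds, steps=20000):
    """Approximate P(X <= x) by trapezoidal integration of the density.

    Negative infinity is replaced by a finite lower limit far in the
    left tail, where the density is negligible.
    """
    lower = min(m - 10.0 * s for m, s in zip(means, stds))
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (gmm_pdf(lower, weights, means, stds)
                   + gmm_pdf(x, weights, means, stds))
    for i in range(1, steps):
        total += gmm_pdf(lower + i * h, weights, means, stds)
    return total * h


def gmm_cdf_exact(x, weights, means, stds):
    """Closed-form reference value via the error function."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, stds)
    )


# Hypothetical mixture parameters, chosen only for illustration.
weights, means, stds = [0.6, 0.4], [100.0, 140.0], [10.0, 15.0]
approx = gmm_cdf_numeric(120.0, weights, means, stds)
exact = gmm_cdf_exact(120.0, weights, means, stds)
```

Comparing `approx` with `exact` gives a quick check that the step count is fine enough for the precision the binary search requires.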
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method for determining an index data monitoring threshold range is characterized by comprising the following steps:
calculating respective probabilities of a plurality of index parameters included in the index data within a preset effective threshold range, wherein the index parameters form the index data according to time sequence characteristics, and the index data are used for representing service data monitored by user demands;
and calculating to obtain the monitoring threshold range of the index data according to the respective probabilities of the index parameters in the effective threshold range and a preset standard threshold probability.
2. The method of claim 1, wherein calculating the respective probabilities of the plurality of metric parameters included in the metric data within the preset valid threshold comprises:
calculating a Gaussian mixture parameter of the index data based on a hidden variable estimation algorithm;
determining a Gaussian mixture distribution probability density function of the index data according to the Gaussian mixture parameters;
and calculating the probability of each index parameter in the effective threshold range according to the Gaussian mixture distribution probability density function.
3. The method according to claim 1, wherein the calculating a monitoring threshold range of the index data according to the respective probabilities of the index parameters within the effective threshold range and a preset standard threshold probability includes:
constructing a reference threshold range model according to respective probabilities of the index parameters within the effective threshold range, a mean value of the index parameters and standard deviations of the index parameters, wherein the reference threshold range model is used for calculating the probability of the index data within a reference threshold range, an upper bound of the reference threshold range is the same as an upper bound of the effective threshold range, and a lower bound of the reference threshold range is minus infinity;
and calculating to obtain the monitoring threshold range of the index data according to the reference threshold range model and the standard threshold probability.
4. The method of claim 3, wherein calculating a monitoring threshold range for the metric data based on the reference threshold range model and the standard threshold probability comprises:
calculating a first to-be-determined reference threshold probability and a second to-be-determined reference threshold probability according to the reference threshold range model, wherein the first to-be-determined reference threshold probability represents the probability of the index data in the first to-be-determined reference range, and the second to-be-determined reference threshold probability represents the probability of the index data in the second to-be-determined reference range;
determining an upper bound threshold of the index data according to the first to-be-determined reference threshold probability and a standard threshold probability, wherein the standard threshold probability represents the probability of the index data in a standard reference range;
determining a lower bound threshold of the index data according to the second to-be-determined reference threshold probability and the standard threshold probability;
and taking the upper bound threshold and the lower bound threshold as a monitoring threshold range of the index data.
5. The method of claim 4, wherein calculating the first reference threshold probability to be determined according to the reference threshold range model comprises:
calculating to obtain a binary search initial value according to the mean value and the standard deviation of the index data;
and substituting the binary search initial value into the reference threshold range model to calculate to obtain the first to-be-determined reference threshold probability.
6. The method of claim 5, wherein determining the upper bound threshold for the metric data based on the first to-be-determined reference threshold probability and a standard threshold probability comprises:
judging whether the difference value between the first to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference value threshold value or not;
if so, taking the first to-be-determined reference threshold probability as an upper bound threshold of the index data;
if not, performing binary search on the first to-be-determined reference threshold probability until the difference value between the first to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference threshold, and taking the first to-be-determined reference threshold probability as an upper bound threshold of the index data.
7. The method of claim 4, wherein calculating a second reference threshold probability to be determined according to the reference threshold range model comprises:
calculating to obtain a binary search initial value according to the mean value and the standard deviation of the index data;
and substituting the binary search initial value into the adjusted reference threshold range model to calculate to obtain the second to-be-determined reference threshold probability.
8. The method of claim 5, wherein determining the lower bound threshold for the indicator data based on the second reference threshold probability to be determined and a standard threshold probability comprises:
judging whether the difference value between the second to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference value threshold value or not;
if so, taking the second to-be-determined reference threshold probability as a lower bound threshold of the index data;
and if not, performing binary search on the first to-be-determined reference threshold probability until the difference value between the second to-be-determined reference threshold probability and the standard threshold probability is smaller than a preset difference threshold, and taking the second to-be-determined reference threshold probability as a lower bound threshold of the index data.
9. An index data monitoring threshold range determination device, comprising:
the calculation module is used for calculating the respective probabilities of a plurality of index parameters included in the index data within a preset effective threshold range, the index parameters form the index data according to the time sequence characteristics, and the index data is used for representing the service data monitored by the user demand;
and the determining module is used for calculating the monitoring threshold range of the index data according to the respective probabilities of the index parameters in the effective threshold range and a preset standard threshold probability.
10. A computer device comprising a processor and a non-volatile memory having computer instructions stored thereon, wherein the computer instructions, when executed by the processor, cause the computer device to perform the metric data monitoring threshold range determination method of any one of claims 1-8.
11. A readable storage medium, characterized in that the readable storage medium comprises a computer program which, when running, controls a computer device in which the readable storage medium is located to perform the indicator data monitoring threshold range determination method according to any one of claims 1 to 8.
CN202110232174.0A 2021-03-02 2021-03-02 Index data monitoring threshold range determining method and device and readable storage medium Active CN112905419B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110232174.0A CN112905419B (en) 2021-03-02 2021-03-02 Index data monitoring threshold range determining method and device and readable storage medium

Publications (2)

Publication Number Publication Date
CN112905419A true CN112905419A (en) 2021-06-04
CN112905419B CN112905419B (en) 2022-11-15

Family

ID=76108642

Country Status (1)

Country Link
CN (1) CN112905419B (en)

Also Published As

Publication number Publication date
CN112905419B (en) 2022-11-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant