WO2024066720A1 - Indicator threshold determination method and apparatus, storage medium, and electronic apparatus - Google Patents

Indicator threshold determination method and apparatus, storage medium, and electronic apparatus

Info

Publication number
WO2024066720A1
Authority
WO
WIPO (PCT)
Prior art keywords
indicator
threshold
data
coordinate
value
Prior art date
Application number
PCT/CN2023/110331
Other languages
French (fr)
Chinese (zh)
Inventor
杨伟伟
冯媛
邵敏峰
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2024066720A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Definitions

  • the present disclosure relates to the field of big data and artificial intelligence technology, and in particular, to a method, device, storage medium and electronic device for determining an indicator threshold.
  • a large number of indicator thresholds need to be set in a targeted manner in scenarios such as anomaly detection, root cause analysis, data prediction, alarm management, intelligent recovery, and perception evaluation.
  • wireless network operators set thresholds for service indicators mainly based on fixed empirical thresholds or on dynamic thresholds obtained from relatively complex statistical distributions. Even when dynamic thresholds based on mathematical methods such as statistical distribution are used, the threshold solution problem is merely converted into a threshold setting problem in another dimension. This makes it difficult to measure the pros and cons of service indicators accurately and objectively, and therefore difficult to guide network O&M and analysis effectively and to maximize data value.
  • the embodiments of the present disclosure provide a method, device, storage medium and electronic device for determining an indicator threshold, so as to at least solve the problem of how to determine the indicator threshold.
  • a method for determining an indicator threshold comprising: obtaining aggregated indicator data corresponding to a target indicator; determining an indicator data set from the aggregated indicator data, sorting first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of clustered groups, and fitting the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object; and determining an indicator threshold from a set of intersection coordinates of the piecewise functions according to the indicator bias of the target indicator.
  • a device for determining an indicator threshold comprising: an acquisition module configured to acquire aggregated indicator data corresponding to a target indicator; a first determination module configured to determine an indicator data set from the aggregated indicator data, sort the first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain a plurality of clustered groups, and fit the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents the indicator data of the same monitored object; and a second determination module configured to determine an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
  • a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned method for determining the indicator threshold value when running.
  • an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the method for determining the indicator threshold through the computer program.
  • aggregated indicator data corresponding to a target indicator is obtained; an indicator data set is determined from the aggregated indicator data, first indicator data in the indicator data set is sorted to obtain second indicator data, the second indicator data is clustered to obtain a plurality of clustered groups, and the indicator data of each of the plurality of groups is fitted to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object; an indicator threshold is determined from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator, thereby solving the technical problem of how to determine the indicator threshold.
  • FIG. 1 is a hardware structure block diagram of a computer terminal for a method for determining an indicator threshold according to an embodiment of the present disclosure.
  • FIG. 2 is a flow chart of a method for determining an indicator threshold according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of two-dimensional discrete points according to an embodiment of the present disclosure.
  • FIG. 4 is a structural block diagram of a device for determining an indicator threshold according to an embodiment of the present disclosure.
  • FIG. 1 is a hardware structure block diagram of a computer terminal for the method for determining the indicator threshold of the embodiment of the present disclosure.
  • the computer terminal may include one or more processors 202 (only one is shown in FIG. 1) and a memory 204 configured to store data; the processor 202 may include, but is not limited to, a microprocessor unit (MPU) or a programmable logic device (PLD).
  • the above-mentioned computer terminal may also include a transmission device 206 configured to provide a communication function and an input and output device 208.
  • the structure shown in FIG. 1 is only for illustration and does not limit the structure of the above-mentioned computer terminal.
  • the computer terminal may also include more or fewer components than those shown in FIG. 1, or have a different configuration that provides the same functions as, or more functions than, those shown in FIG. 1.
  • the memory 204 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the method for determining the indicator threshold in the embodiment of the present disclosure. The processor 202 executes various functional applications and data processing by running the computer programs stored in the memory 204, that is, it implements the above method.
  • the memory 204 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 204 may further include a memory remotely arranged relative to the processor 202, and these remote memories may be connected to the computer terminal via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the transmission device 206 is configured to receive or send data via a network.
  • Specific examples of the above-mentioned network may include a wireless network provided by a communication provider of a computer terminal.
  • the transmission device 206 includes a network adapter (Network Interface Controller, referred to as NIC), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 206 can be a radio frequency (Radio Frequency, referred to as RF) module, which is configured to communicate with the Internet wirelessly.
  • TCP: Transmission Control Protocol
  • RTT: Round-Trip Time (round-trip delay)
  • CPU: Central Processing Unit
  • FIG. 2 is a flow chart of a method for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG. 2, the steps of the method include:
  • Step S202 obtaining aggregated indicator data corresponding to the target indicator.
  • Step S204 determining an indicator data set from the aggregated indicator data, sorting the first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of clustered groups, and fitting the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents the indicator data of the same monitored object.
  • clustering algorithms for clustering the second indicator data may include, but are not limited to, the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, and hierarchical clustering.
  • Step S206 determining an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
  • the disclosed embodiment obtains aggregated indicator data corresponding to the target indicator; determines an indicator data set from the aggregated indicator data, sorts the first indicator data in the indicator data set to obtain second indicator data, clusters the second indicator data to obtain a plurality of clustered groups, and fits the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents the indicator data of the same monitored object; determines an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator, thereby solving the problem of how to determine the indicator threshold.
  • the following implementation steps are proposed: determine the pre-set monitoring dimension, the monitoring object of the monitoring dimension, the indicator category of the target indicator, the initial indicator data under the indicator category, and the time aggregation granularity corresponding to the target indicator; determine the indicator data to be aggregated according to the pre-set monitoring dimension, the monitoring object of the monitoring dimension, the indicator category of the target indicator, and the initial indicator data under the indicator category; aggregate the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator.
  • a technical solution for aggregating the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator, which specifically includes: obtaining a first time granularity of the indicator data to be aggregated; when it is determined that the first time granularity is smaller than the time aggregation granularity, obtaining the first indicator data of the indicator data to be aggregated within the first time granularity, and aggregating multiple first time granularities into the time aggregation granularity; aggregating multiple first indicator data within the multiple first time granularities into first aggregated indicator data within the time aggregation granularity, and determining the first aggregated indicator data as the aggregated indicator data corresponding to the target indicator.
  • first indicator data of the indicator data to be aggregated within the first time granularity can be obtained, and the first indicator data can be determined as the aggregated indicator data corresponding to the target indicator.
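The aggregation step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `aggregate`, the use of seconds as the time unit, and the choice of the arithmetic mean as the aggregation function are all assumptions, since the text does not say how values inside one aggregation window are combined.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, first_granularity_s, target_granularity_s, agg=mean):
    """Aggregate (timestamp_s, value) records into target-granularity windows.

    Assumption: when the source granularity is not finer than the target
    granularity, the records are used as they are (only sorted by time).
    """
    records = sorted(records)
    if first_granularity_s >= target_granularity_s:
        return records
    buckets = defaultdict(list)
    for ts, value in records:
        # Start of the aggregation window that contains this timestamp.
        bucket_start = ts - ts % target_granularity_s
        buckets[bucket_start].append(value)
    return [(start, agg(values)) for start, values in sorted(buckets.items())]


# Hypothetical 15-minute samples: four in the first hour, two in the second.
samples = [(0, 10.0), (900, 20.0), (1800, 30.0), (2700, 40.0),
           (3600, 50.0), (4500, 70.0)]
hourly = aggregate(samples, first_granularity_s=900, target_granularity_s=3600)
print(hourly)  # [(0, 25.0), (3600, 60.0)]
```

Passing a different `agg` (for example `max` or `sum`) changes how the finer-grained values are combined, which may suit indicators such as peak usage or traffic volume better than the mean.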
  • before clustering the second indicator data, the second indicator data can be further standardized to obtain a plurality of standardized indicator values, wherein each standardized indicator value corresponds to a sorting number; for each standardized indicator value, the sorting number corresponding to the standardized indicator value is determined as the horizontal coordinate, and the standardized indicator value is determined as the vertical coordinate, to obtain the coordinate point corresponding to the standardized indicator value; the coordinate slopes between every two adjacent coordinate points are determined to obtain a plurality of coordinate slopes, and for each of the plurality of coordinate slopes, a smoothing value of the coordinate slope is determined to obtain a plurality of smoothing values; third indicator data is determined based on the plurality of smoothing values, and the third indicator data is determined as the updated second indicator data.
  • the above-mentioned standardization processing may include normalization processing.
  • the second indicator data may be standardized by using a normalization processing method to compress the range to within the range of [0, 1], so as to standardize the data and improve the data processing efficiency.
  • the following technical solution is proposed: clustering the multiple coordinate slopes according to a preset clustering algorithm to obtain multiple groups of slope values; for each group of slope values, determining the mean of the coordinate slopes of each group of slope values as the smoothing value of the coordinate slope of each group of slope values.
  • the above-mentioned preset clustering algorithms may include the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, hierarchical clustering, etc., and the present disclosure does not limit this.
  • a technical solution is also proposed, and the specific steps include: when it is determined that there is a target group slope value among the multiple groups of slope values, the smoothed value of the coordinate slope of the adjacent group slope value adjacent to the target group slope value is determined as the smoothed value of the coordinate slope within the target group slope value, or the smoothed value of the coordinate slope within the target group slope value is determined according to a preset smoothing value, wherein the number of coordinate slopes within the target group slope value is different from the number of coordinate slopes within each group of slope values.
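The slope smoothing described above (cluster the coordinate slopes, then use each group's mean as the smoothing value) can be sketched as below. The sketch assumes K-means as the preset clustering algorithm and uses a tiny pure-Python one-dimensional variant with evenly spread starting centres; the function names and the sample slopes are hypothetical, and the special handling of a target group described in the preceding paragraph is not covered.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=50):
    """Tiny 1-D k-means; returns one cluster label per value."""
    lo, hi = min(values), max(values)
    # Deterministic start: k centres spread evenly over the value range.
    centres = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centre if a cluster received no members.
            new_centres.append(mean(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def smooth_by_group_mean(slopes, k):
    """Replace each slope by the mean slope of the cluster it belongs to."""
    labels = kmeans_1d(slopes, k)
    group_mean = {
        lab: mean(s for s, l in zip(slopes, labels) if l == lab)
        for lab in set(labels)
    }
    return [group_mean[lab] for lab in labels]


# Hypothetical coordinate slopes forming two visibly different groups.
slopes = [0.045, 0.047, 0.046, 0.012, 0.011, 0.013]
smoothed = smooth_by_group_mean(slopes, k=2)
print(smoothed)  # first three values near 0.046, last three near 0.012
```

A library implementation of any of the listed algorithms could replace `kmeans_1d`; only the "one label per slope" contract matters for the smoothing step.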
  • multiple groups of means can be obtained according to the mean of the indicator data of each group, and a mean set can be determined according to the multiple groups of means, wherein the mean set includes the means corresponding to each piecewise function; the intersection coordinates of the intersection coordinate set of the piecewise function are determined, the left derivative and the right derivative corresponding to the intersection coordinates are determined, and the first mean of the first piecewise function corresponding to the left derivative in the mean set and the second mean of the second piecewise function corresponding to the right derivative in the mean set are determined; based on the first mean and the second mean, it is determined whether to retain the intersection coordinates within the indicator threshold set.
  • the following technical solution is proposed to illustrate the implementation process of determining whether to retain the intersection coordinates within the indicator threshold set based on the first mean and the second mean: determine a first absolute distance value between the intersection coordinates and the origin coordinates; determine first coordinate information corresponding to the first mean based on the first absolute distance value and the first mean, wherein the first coordinate information represents the independent variable value of the first piecewise function; determine second coordinate information corresponding to the second mean based on the first absolute distance value and the second mean, wherein the second coordinate information represents the independent variable value of the second piecewise function; when it is determined that the first coordinate information is the same as the second coordinate information, do not retain the intersection coordinates within the indicator threshold set; when it is determined that the first coordinate information is different from the second coordinate information, retain the intersection coordinates within the indicator threshold set.
  • a technical solution for implementing the above step S206 of determining the indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator, specifically including: determining the set of non-differentiable points of the piecewise function and the coordinate points whose second-order derivatives are target values; determining the indicator threshold set based on the set of non-differentiable points, the coordinate points whose second-order derivatives are target values, and the intersection coordinate set of the piecewise function; and determining the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator.
  • target value may be, for example, 0, but is not limited thereto.
  • the process of determining an indicator threshold from an indicator threshold set according to the indicator bias of the target indicator can be implemented in a variety of ways, specifically including: Method 1, when it is determined that the indicator bias of the target indicator is negative, if the type of the indicator threshold is determined to be an alarm threshold, the maximum value in the indicator threshold set is determined as the indicator threshold; if the type of the indicator threshold is determined to be an optimal threshold, the minimum value in the indicator threshold set is determined as the indicator threshold.
  • Method 2, when it is determined that the indicator bias of the target indicator is positive, if the type of the indicator threshold is determined to be an alarm threshold, the minimum value in the indicator threshold set is determined as the indicator threshold; if the type of the indicator threshold is determined to be an optimal threshold, the maximum value in the indicator threshold set is determined as the indicator threshold.
  • the alarm threshold can be understood as a threshold when the performance corresponding to the indicator data of the target indicator is poor.
  • the alarm threshold of the CPU usage is set to 80%, at which time the CPU occupies more resources and the performance is poor.
  • the optimal threshold can be understood as a threshold when the performance corresponding to the indicator data of the target indicator is better.
  • the optimal threshold of the network delay is set to 10%, at which time the network delay is small and the performance is better. In particular, for multiple optimal thresholds, the smallest one is selected as the optimal delay.
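The selection rule for the two bias cases can be condensed into one small function. This is a sketch of the rule as stated in the text; the function name, the string labels "negative"/"positive" and "alarm"/"optimal", and the plain list of candidate values are illustrative assumptions.

```python
def select_threshold(threshold_set, bias, kind):
    """Pick one threshold from the candidate set.

    bias: "negative" (larger values mean worse performance, e.g. CPU usage)
          or "positive" (larger values mean better performance).
    kind: "alarm" (poor-performance threshold) or "optimal".
    """
    if bias not in ("negative", "positive"):
        raise ValueError("bias must be 'negative' or 'positive'")
    if kind not in ("alarm", "optimal"):
        raise ValueError("kind must be 'alarm' or 'optimal'")
    # Negative bias + alarm and positive bias + optimal both take the maximum;
    # the two remaining combinations take the minimum.
    take_max = (bias == "negative") == (kind == "alarm")
    return max(threshold_set) if take_max else min(threshold_set)


candidates = [0.3, 0.5, 0.8]  # hypothetical normalized candidate thresholds
print(select_threshold(candidates, "negative", "alarm"))    # 0.8
print(select_threshold(candidates, "negative", "optimal"))  # 0.3
```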
  • the monitoring dimensions and indicators are determined, and the key KPI indicator system for operation and maintenance scenarios {KPI1, KPI2, ..., KPIn} is constructed; then, according to the actual real-time requirements of business operation and maintenance, the time aggregation granularity is determined, and time-granularity indicator aggregation is performed on each dimension-KPI data to construct five-tuple data of {dimension (i.e. monitoring dimension), object (i.e. monitoring object), time granularity (i.e. time aggregation granularity), indicator (i.e. indicator category of target indicator), data (i.e. initial indicator data under indicator category)}; in addition, the business indicator configuration information needs to specify the bias of the indicator and the normal range of the indicator.
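One possible in-memory shape for the five-tuple and the business indicator configuration is sketched below. The class names, field names, and sample values are hypothetical; the text only fixes which five pieces of information are recorded and that the configuration carries the indicator bias and normal range.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndicatorRecord:
    """Five-tuple {dimension, object, time granularity, indicator, data}."""
    dimension: str         # monitoring dimension, e.g. "server" or "cell"
    obj: str               # monitored object within that dimension
    time_granularity: str  # time aggregation granularity, e.g. "1h"
    indicator: str         # indicator category of the target indicator
    data: List[float]      # indicator data under that category


@dataclass
class IndicatorConfig:
    """Business indicator configuration: bias and normal range."""
    indicator: str
    bias: str                           # "negative" or "positive"
    normal_min: Optional[float] = None  # None means unbounded below
    normal_max: Optional[float] = None  # None means unbounded above


# Made-up sample values for illustration only.
record = IndicatorRecord("server", "server A", "1h", "CPU usage (%)",
                         [35.2, 41.0, 90.7])
config = IndicatorConfig("CPU usage (%)", bias="negative",
                         normal_min=0.0, normal_max=100.0)
print(record.indicator, config.bias)
```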
  • Step a construct a single-dimensional single-object or multi-object KPI indicator data set (the object range selection depends on the actual application scenario), sort the data values of the time series indicators (from small to large or from large to small), and obtain a two-dimensional sequence of ID values and KPI indicator values {i: Value_i}, i ∈ [1, N], where N is the number of samples in the data set, and the ID value is the serial number of the sorted KPI indicator value, starting from 1 and increasing in steps of 1.
  • the two-dimensional sequence can be expressed as a two-dimensional discrete point image with the sample ID and the KPI indicator value as the coordinate axes.
  • each coordinate point takes the ID as the horizontal coordinate and the KPI indicator value as the vertical coordinate.
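Building the sorted two-dimensional sequence is straightforward; a minimal sketch follows. The function name `ranked_sequence` and the sample values are assumptions made for illustration.

```python
def ranked_sequence(values, descending=False):
    """Sort indicator values and number them from 1, giving {i: Value_i}."""
    ordered = sorted(values, reverse=descending)
    return {i: v for i, v in enumerate(ordered, start=1)}


# Hypothetical hourly CPU usage values (%) pooled from the selected objects.
seq = ranked_sequence([62.0, 15.5, 90.0, 33.1])
points = list(seq.items())  # (ID, value) pairs: ID on the x-axis, value on the y-axis
print(points)  # [(1, 15.5), (2, 33.1), (3, 62.0), (4, 90.0)]
```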
  • Step b Considering that the KPI indicator value may fluctuate greatly, in order to facilitate subsequent processing, the indicator is first standardized.
  • the KPI indicator value is standardized by a normalization processing method, and the range is compressed to the range of [0, 1].
  • clustering algorithms can avoid excessive parameter settings in the process of building the algorithm model, and can classify data with differences in the sequence, making it easier to obtain turning points later.
  • the present disclosure does not specifically limit the type of clustering algorithms.
  • Step d For the G clustering result sets of step c, curve fitting is performed respectively to obtain a piecewise function f(x) having G fitting functions.
  • the curve fitting method can quickly obtain an approximate piecewise function, which provides an effective way to solve the turning point in the subsequent automatic threshold calculation process.
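The per-group fitting can be sketched with an ordinary least-squares straight line for each cluster. The text does not state which curve family is fitted, so the linear model, the function names, and the sample points below are assumptions.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx  # requires at least two distinct x values
    b = mean_y - a * mean_x
    return a, b


def fit_groups(groups):
    """Fit one line per clustered group; the list of lines forms the piecewise f(x)."""
    return [fit_line(group) for group in groups]


# Two hypothetical groups of (ID, normalized value) points.
groups = [
    [(1, 0.10), (2, 0.20), (3, 0.30)],
    [(4, 0.35), (5, 0.40), (6, 0.45)],
]
pieces = fit_groups(groups)
print(pieces)  # slopes near 0.1 and 0.05, intercepts near 0.0 and 0.15
```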
  • Step a Solve the intersection coordinate information of each pair of adjacent piecewise fitting functions. From the intersection coordinates, on the interval x ∈ [1, N], solve the set C of non-differentiable points of f(x) and the coordinate points where the second-order derivative is 0 to form the threshold set T. For each x ∈ C, calculate the left derivative k1' and the right derivative k2', and calculate the absolute distance between each of k1' and k2' and every k ∈ K to determine the group to which each derivative belongs. If the left and right derivatives belong to the same group, the point is not the turning point being sought, and the point is removed from the threshold set T.
  • Step b Combine the indicator bias information provided in the business indicator configuration information in step 1 and take the maximum or minimum value in the threshold set T as the threshold solution.
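For straight-line pieces, the intersection of two adjacent fitting functions has a closed form. The sketch below assumes linear pieces given as (slope, intercept) pairs; the function names and all line coefficients are hypothetical, chosen here so that the intersections fall at x = 10, 30 and 40.

```python
def intersection(line1, line2):
    """Intersection of y = a1*x + b1 and y = a2*x + b2, or None if parallel."""
    a1, b1 = line1
    a2, b2 = line2
    if a1 == a2:
        return None
    x = (b2 - b1) / (a1 - a2)
    return x, a1 * x + b1


def adjacent_intersections(lines):
    """Intersections of each pair of neighbouring pieces, in order."""
    found = (intersection(left, right) for left, right in zip(lines, lines[1:]))
    return [p for p in found if p is not None]


# Hypothetical fitted pieces as (slope, intercept).
lines = [(0.04, -0.1), (0.01, 0.2), (0.03, -0.4), (0.005, 0.6)]
candidates = adjacent_intersections(lines)
print(candidates)  # approximately (10, 0.3), (30, 0.5), (40, 0.8)
```

For non-linear pieces, the same step would need a numerical root finder on the difference of the two fitted functions instead of the closed form.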
  • the monitoring dimension of this embodiment is the server
  • the objects are server A, server B, and server C
  • the indicators are CPU usage (%), memory usage (%), disk usage (%), and network rate (kbps).
  • the above indicators are constructed into a key KPI indicator system for server equipment operation and maintenance scenarios;
  • the time aggregation granularity is determined to be 1 hour. Taking the server's CPU utilization rate (%) as the target indicator, the indicator is aggregated at the time granularity to construct five-tuple data of {dimension, object, time granularity, indicator, data}; the business indicator configuration information specifies that the bias of the CPU utilization rate (%) indicator is negative, and the normal range of the indicator is 0 to 100.
  • Table 1 Index data record table
  • mapping function is solved according to the discrete sample points.
  • taking server A and server B as objects, select their corresponding indicator data sets, sort the indicator data in the set according to the values, obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, 48], of ID values and CPU usage (%) values, and construct a two-dimensional discrete point image with the sample ID as the horizontal axis and the CPU usage (%) as the vertical axis.
  • the CPU usage (%) indicator value is normalized.
  • the minimum-maximum scaling method can be used to compress the indicator range to the range of [0, 1].
  • the normalization function is as follows:
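The formula itself does not appear in this text. Assuming the standard minimum-maximum scaling named in the preceding line, x' = (x - min) / (max - min), a minimal sketch of the normalization and of the reverse mapping used later to recover a threshold in original units could look like this. The function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values into [0, 1]; also return the min and max for later inversion."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values equal, map everything to 0.0.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(norm_value, lo, hi):
    """Map a normalized value back to the original indicator scale."""
    return norm_value * (hi - lo) + lo


cpu_usage = [20.0, 40.0, 60.0, 100.0]  # made-up CPU usage values (%)
normalized, lo, hi = min_max_normalize(cpu_usage)
print(normalized)                 # [0.0, 0.25, 0.5, 1.0]
print(denormalize(0.5, lo, hi))   # 60.0
```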
  • the two-dimensional sequence discrete points within the above two-dimensional sequence can be understood as coordinate points corresponding to the above standardized index values.
  • the slope between two consecutive points is calculated in segments (equivalent to the above-mentioned coordinate slope), and the slope is smoothed by the three-point mean slope k’.
  • the three-point mean smoothed slope cannot be calculated for the last two points, and the three-point mean smoothed slope of the previous point can be used instead, to obtain a new sequence of ID values and three-point mean smoothed slopes {i: k'_i}, i ∈ [1, 48].
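One reading of the three-point mean slope, offered here as an assumption, is that the smoothed slope at point i averages the two consecutive slopes spanned by points i, i+1 and i+2. Under that reading the value is undefined for the last two points, which matches the text, and those positions reuse the last computable value. The function name and sample values are hypothetical, and unit spacing between IDs is assumed.

```python
def three_point_mean_slopes(ys):
    """Smoothed slope per point for values at IDs 1..N (unit spacing).

    Assumed definition: k'_i = mean of slope(i, i+1) and slope(i+1, i+2).
    The last two points reuse the smoothed slope of the previous point.
    """
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three points")
    smoothed = []
    for i in range(n - 2):
        k1 = ys[i + 1] - ys[i]      # slope between points i and i+1
        k2 = ys[i + 2] - ys[i + 1]  # slope between points i+1 and i+2
        smoothed.append((k1 + k2) / 2)
    # Fill the last two positions with the last computable value.
    smoothed.extend([smoothed[-1]] * 2)
    return smoothed


ys = [0.0, 0.1, 0.2, 0.4, 1.0]  # made-up normalized, sorted indicator values
k_smooth = three_point_mean_slopes(ys)
print(k_smooth)  # approximately [0.1, 0.15, 0.4, 0.4, 0.4]
```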
  • the threshold is automatically learned.
  • solve the intersection coordinate set X = {(10, 0.3), (30, 0.5), (40, 0.8)} of each pair of adjacent piecewise fitting functions.
  • from the intersection coordinates, on the interval x ∈ [1, 48], solve the set C of non-differentiable points of f(x) and the coordinate points where the second-order derivative is 0.
  • the threshold set T is formed.
  • the sets C and T are also {(10, 0.3), (30, 0.5), (40, 0.8)}.
  • the left derivative k1' and the right derivative k2' are calculated respectively.
  • the absolute distances to each k ∈ K are calculated for k1' and k2' respectively.
  • here k takes the mean of the smoothed slopes of the groups corresponding to the piecewise functions on the left and right sides of the coordinate point. The group to which each derivative belongs is determined according to the calculated absolute distance. For the coordinate point (10, 0.3), the left derivative k1' and the right derivative k2' are 0.04 and 0.01 respectively. The means of the smoothed slopes of the groups corresponding to the piecewise functions on the left and right sides of the coordinate point (10, 0.3) are 0.046 and 0.012 respectively. According to the absolute distance calculation, the left derivative k1' and the right derivative k2' belong to different groups, so the point is not removed from the threshold set T. The same applies to the other coordinate points, and the final threshold set T is {(10, 0.3), (30, 0.5), (40, 0.8)}.
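The group-membership check for the point (10, 0.3) can be reproduced with the numbers given above. The sketch assigns each one-sided derivative to the group whose mean smoothed slope is nearest in absolute distance; the function names are hypothetical.

```python
def nearest_group(derivative, group_means):
    """Index of the group whose mean smoothed slope is closest to the derivative."""
    return min(range(len(group_means)),
               key=lambda j: abs(derivative - group_means[j]))


def is_turning_point(left_derivative, right_derivative, group_means):
    """Keep the point only if its two one-sided derivatives fall in different groups."""
    return (nearest_group(left_derivative, group_means)
            != nearest_group(right_derivative, group_means))


# Values from the example: k1' = 0.04, k2' = 0.01, group means 0.046 and 0.012.
keep = is_turning_point(0.04, 0.01, [0.046, 0.012])
print(keep)  # True: |0.04 - 0.046| < |0.04 - 0.012| and |0.01 - 0.012| < |0.01 - 0.046|
```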
  • the usage scenario is to find the CPU usage rate (%) at which an alarm needs to be issued, that is, the poor-performance threshold, so the maximum value in the threshold set T, 0.8, is selected as the reference value for generating the threshold. Reversing the normalization for this value gives a CPU usage rate (%) indicator value of 90.72, which is the required threshold solution.
  • the monitoring dimension of this embodiment is the cell
  • the objects are cell 622001, cell 622002, cell 622003, ..., cell 622099
  • the indicators are TCP connection success rate (%), TCP retransmission rate (%), TCP disorder rate (%), TCP average RTT delay (ms);
  • the time aggregation granularity is determined to be 1 hour.
  • the indicator is aggregated at the time granularity to construct the five-tuple data of {dimension, object, time granularity, indicator, data}; the business indicator configuration information specifies that the bias of the TCP average RTT delay (ms) indicator is negative, and the normal range of the indicator is greater than or equal to 0.
  • mapping function is solved according to the discrete sample points.
  • 50 cells including cell 622001, cell 622002, cell 622003, ..., cell 622050, are determined as objects, and their corresponding indicator data sets are selected.
  • the indicator data in the set are sorted by value to obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, 1200], that pairs each ID value with a TCP average RTT delay (ms) value.
  • a two-dimensional discrete point image is constructed with the horizontal axis as sample ID and the vertical axis as TCP average RTT delay (ms).
  • the indicator value of TCP average RTT delay (ms) is normalized, and the minimum-maximum scaling method can be used to compress the indicator range to the range of [0, 1].
  • the threshold is automatically learned.
  • the minimum value of the smooth slope of the three-point mean in the threshold set T, 0.123, is selected as the reference value for generating the threshold. Mapping this value back gives the TCP average RTT delay (ms) indicator value before normalization, 2.5, which is the required threshold.
  • the indicators are first classified by their unilaterality into positive and negative indicators. Secondly, the indicator data is preprocessed and the rate of change of the curve is calculated. Then, a machine learning algorithm is used to train the indicator data model. Finally, threshold learning is transformed into the problem of solving for turning points according to the unilaterality of the indicator, so that thresholds for different types of indicators are generated intelligently at lower cost and with higher accuracy.
  • the problem that the threshold or the converted threshold needs to be manually set can be solved more thoroughly.
  • the technical solution disclosed in the present invention has better applicability and accuracy, and provides a strong guarantee for the operation and maintenance support and operation analysis of mobile operators, which not only helps mobile operators to perform operation and maintenance support and operation analysis more accurately, but also greatly saves labor costs.
  • the present disclosure relates to the field of big data and artificial intelligence technology, and in particular to the field of communication big data and engineering operation and maintenance in the Internet, the Internet of Things, etc., where a large number of indicator thresholds need to be set in a targeted manner, such as the operation and maintenance support and operation analysis of mobile operators, such as anomaly detection, root cause analysis, data prediction, alarm management, intelligent recovery and perception evaluation.
  • indicator thresholds are widely used at present, and they are mainly set based on either a fixed empirical threshold of the indicator or a dynamic threshold obtained from a relatively complex statistical distribution.
  • the present disclosure converts the calculation problem of the indicator threshold into an image solving problem, and combines artificial intelligence algorithms for training and prediction.
  • the artificial intelligence algorithms used also have more choices in practical applications, such as neural network, clustering, classification and other algorithms.
  • the constructed model has good accuracy and broad application prospects, which provides a premise for the precision and intelligence of mobile operator engineering operation and maintenance, and also clarifies the direction for reducing labor costs.
  • This disclosure is aimed at the field of operation and maintenance, especially large and complex architecture systems. It includes IT equipment operation and maintenance based on underlying monitoring indicators and business system operation and maintenance based on model-based KPI/KQI indicators. By collecting and cleaning key system indicators, building monitoring dimension models and automatically learning indicator thresholds, it is possible to identify faults or risks in the system, thereby facilitating network optimization personnel to handle or avoid faults in advance.
  • a device for determining an indicator threshold is also provided, which is used to implement the above embodiments and preferred implementation methods.
  • the term “module” may be a combination of software and/or hardware that implements a predetermined function.
  • although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG4 is a structural block diagram of a device for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG4 , the device for determining an indicator threshold includes:
  • An acquisition module 42 is configured to acquire aggregated indicator data corresponding to a target indicator
  • a first determination module 44 is configured to determine an indicator data set from the aggregated indicator data, sort the first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain a plurality of clustered groups, and fit the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object;
  • clustering algorithms for clustering the second indicator data may include the Kmeans clustering algorithm, the DBSCAN density-based spatial clustering algorithm, the spectral clustering algorithm, the GMM Gaussian mixture model clustering algorithm, the MeanShift mean shift clustering algorithm, hierarchical clustering, etc., but are not limited to these.
  • the second determination module 46 is configured to determine an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
  • aggregated indicator data corresponding to the target indicator is obtained; an indicator data set is determined from the aggregated indicator data, the first indicator data in the indicator data set is sorted to obtain second indicator data, the second indicator data is clustered to obtain multiple clustered groups, and the indicator data of each group of the multiple groups is fitted to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents the indicator data of the same monitored object; an indicator threshold is determined from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator, thereby solving the problem of how to determine the indicator threshold.
  • the acquisition module 42 is further configured to: determine a preset monitoring dimension, a monitoring object of the monitoring dimension, an indicator category of the target indicator, the initial indicator data under the indicator category, and a time aggregation granularity corresponding to the target indicator; determine the indicator data to be aggregated according to the preset monitoring dimension, the monitoring object of the monitoring dimension, the indicator category of the target indicator, and the initial indicator data under the indicator category; aggregate the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator.
  • the acquisition module 42 is further configured to: acquire the first time granularity of the indicator data to be aggregated; when determining that the first time granularity is smaller than the time aggregation granularity, acquire the first indicator data of the indicator data to be aggregated within the first time granularity, and aggregate multiple first time granularities into the time aggregation granularity; aggregate multiple first indicator data within the multiple first time granularities into first aggregate indicator data within the time aggregation granularity, and determine the first aggregate indicator data as the aggregate indicator data corresponding to the target indicator.
  • the above-mentioned acquisition module 42 is also configured to: when it is determined that the first time granularity is equal to the time aggregation granularity, obtain the first indicator data of the indicator data to be aggregated within the first time granularity, and determine the first indicator data as the aggregation indicator data corresponding to the target indicator.
  • the acquisition module 42 is further configured as follows: before clustering the second indicator data, the second indicator data is standardized to obtain a plurality of standardized indicator values, wherein each standardized indicator value corresponds to a sorting number; for each standardized indicator value, the sorting number corresponding to the standardized indicator value is determined as the horizontal coordinate, and the standardized indicator value is determined as the vertical coordinate to obtain the coordinate point corresponding to the standardized indicator value; the coordinate slopes between two adjacent coordinate points are determined to obtain a plurality of coordinate slopes, and for each of the plurality of coordinate slopes, a smoothing value of each coordinate slope is determined to obtain a plurality of smoothing values; the third indicator data is determined according to the plurality of smoothing values, and the third indicator data is determined as the updated second indicator data.
  • the above-mentioned standardization processing may include normalization processing.
  • the second indicator data may be standardized by using a normalization processing method to compress the range to within the range of [0, 1], so as to standardize the data and improve the data processing efficiency.
  • the acquisition module 42 is further configured to: in the process of determining the smoothing value of each coordinate slope in the multiple coordinate slopes to obtain multiple smoothing values, cluster the multiple coordinate slopes according to a preset clustering algorithm to obtain multiple groups of slope values; for each group of slope values, determine the mean of the coordinate slopes of each group of slope values as the smoothing value of the coordinate slope of each group of slope values.
  • the above-mentioned preset clustering algorithms may include Kmeans clustering algorithm, DBSCAN-density-based spatial clustering algorithm, spectral clustering algorithm, GMM-Gaussian mixture model clustering algorithm, MeanShift-mean shift clustering algorithm, hierarchical clustering, etc., and the present disclosure does not limit this.
  • the acquisition module 42 is further configured to: when it is determined that there is a target group slope value among the multiple groups of slope values, determine the smoothed value of the coordinate slope of the adjacent group slope value adjacent to the target group slope value as the smoothed value of the coordinate slope within the target group slope value, or determine the smoothed value of the coordinate slope within the target group slope value according to a preset smoothing value, wherein the number of coordinate slopes within the target group slope value is different from the number of coordinate slopes within each group of slope values.
  • the first determination module 44 is further configured to: obtain multiple groups of means according to the mean of the indicator data of each group, and determine a mean set according to the multiple groups of means, wherein the mean set includes the means corresponding to each piecewise function; determine the intersection coordinates of the intersection coordinate set of the piecewise function, determine the left derivative and the right derivative corresponding to the intersection coordinates, and determine the first mean of the first piecewise function corresponding to the left derivative in the mean set and the second mean of the second piecewise function corresponding to the right derivative in the mean set; determine whether to retain the intersection coordinates within the indicator threshold set based on the first mean and the second mean.
  • the first determination module 44 is further configured to: determine a first absolute distance value between the intersection coordinates and the origin coordinates; determine first coordinate information corresponding to the first mean according to the first absolute distance value and the first mean, wherein the first coordinate information represents the independent variable value of the first piecewise function; determine second coordinate information corresponding to the second mean according to the first absolute distance value and the second mean, wherein the second coordinate information represents the independent variable value of the second piecewise function; when it is determined that the first coordinate information is the same as the second coordinate information, remove the intersection coordinates from the indicator threshold set; when it is determined that the first coordinate information is different from the second coordinate information, retain the intersection coordinates within the indicator threshold set.
  • the second determination module 46 is further configured to: determine the set of non-differentiable points of the piecewise function and the coordinate points where the second-order derivative is the target value; determine the indicator threshold set based on the set of non-differentiable points, the coordinate points where the second-order derivative is the target value, and the coordinate set of the intersection of the piecewise function; and determine the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator.
  • the second determination module 46 is further configured as follows: when it is determined that the indicator bias of the target indicator is negative, if the type of the indicator threshold is determined to be an alarm threshold, the maximum value in the indicator threshold set is determined as the indicator threshold; if the type of the indicator threshold is determined to be a preferential threshold, the minimum value in the indicator threshold set is determined as the indicator threshold.
  • the above-mentioned second determination module 46 is also configured as follows: when it is determined that the indicator bias of the target indicator is positive, if the type of the indicator threshold is determined to be an alarm threshold, the minimum value in the indicator threshold set is determined as the indicator threshold; if the type of the indicator threshold is determined to be a preferential threshold, the maximum value in the indicator threshold set is determined as the indicator threshold.
  • the present disclosure proposes a threshold intelligent learning and operation and maintenance device based on curve image calculation, which can solve the core problem of replacement of threshold self-learning in the industry (i.e., converting one threshold automatic learning process into the threshold setting of another threshold), and truly achieve automatic identification and operation and maintenance of thresholds without human intervention.
  • the indicator data of the monitoring object based on the granularity of the actual application scenario is obtained; the constructed data is displayed in a sequence graphical manner, the distribution data is converted into a curve image, and by solving the turning point of the distribution, combined with the actual indicator business characteristics, the indicator threshold self-learning function is further realized.
  • a first aspect of the present disclosure provides a model building unit based on collected indicators, which is configured to implement functions such as data cleaning, aggregate model description, and core business indicator configuration item description.
  • the second aspect of the present disclosure provides a method for solving a mapping function based on discrete sample points, the method comprising: converting the time series data after model construction into the image representation required for solving the threshold in the present disclosure. It should be noted that the image is not actually drawn here, but the converted data sequence can express the characteristics of the image; through clustering and image fitting algorithms, a mapping function of the image based on the sample sequence is obtained.
  • the third aspect of the present disclosure provides a threshold automatic learning calculation method based on curve image calculation, the method comprising: solving the slope change rate of the function curve obtained in the above steps, combining the business characteristics of the indicator, to obtain the threshold intelligent recognition result.
  • the fourth aspect of the present disclosure provides a threshold automatic learning device based on curve image calculation, the device comprising: a real-time data aggregation module, configured to perform real-time data cleaning and indicator aggregation for key KPI indicators of each entity node of the multi-dimensional system of the operation and maintenance system; a threshold intelligent identification module, configured to execute the method in the above steps.
  • a fifth aspect of the present disclosure provides an electronic device, the electronic device comprising a computer processor and a memory: the computer memory is configured to store a computer program;
  • the processor is configured to implement the functions implemented by the model building unit described in the first aspect above, and to execute a method for solving a mapping function based on discrete sample points described in the second aspect above and a threshold automatic learning calculation method based on curve image calculation described in the third aspect.
  • the method according to the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course can also be implemented by hardware, but in many cases the former is a better implementation method.
  • the technical solution of the present disclosure, or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods of each embodiment of the present disclosure.
  • the above-mentioned computer-readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk or an optical disk, and other media that can store computer programs.
  • An embodiment of the present disclosure further provides an electronic device, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • the processor may be configured to perform the following steps through a computer program:
  • the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
  • modules or steps of the present disclosure can be implemented by a general computing device, they can be concentrated on a single computing device, or distributed on a network composed of multiple computing devices, they can be implemented by a program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases, the steps shown or described can be executed in a different order than here, or they can be made into individual integrated circuit modules, or multiple modules or steps therein can be made into a single integrated circuit module for implementation.
  • the present disclosure is not limited to any specific combination of hardware and software.
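The selection rules described above for the second determination module, together with the reverse mapping from a normalized reference value to the original indicator value, can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the function names `select_threshold` and `denormalize`, the string labels for the bias and the threshold type, and the assumption that min-max scaling was the normalization used are all introduced here for illustration only.

```python
from typing import Sequence


def select_threshold(candidates: Sequence[float], bias: str, kind: str) -> float:
    """Pick one value from the candidate threshold set.

    bias: "negative" or "positive" (hypothetical labels for the indicator bias).
    kind: "alarm" or "preferential" (hypothetical labels for the threshold type).
    """
    if bias not in ("negative", "positive"):
        raise ValueError(f"unknown bias: {bias!r}")
    if kind not in ("alarm", "preferential"):
        raise ValueError(f"unknown threshold type: {kind!r}")
    if not candidates:
        raise ValueError("candidate set is empty")
    # negative + alarm -> max, negative + preferential -> min,
    # positive + alarm -> min, positive + preferential -> max.
    pick_max = (bias == "negative") == (kind == "alarm")
    return max(candidates) if pick_max else min(candidates)


def denormalize(scaled: float, lo: float, hi: float) -> float:
    """Invert min-max scaling; lo and hi are the sample minimum and maximum."""
    return scaled * (hi - lo) + lo
```

For example, `select_threshold([0.3, 0.5, 0.8], "negative", "alarm")` picks 0.8. The value in original units then depends on the minimum and maximum of the sample passed to `denormalize`, which are not restated here.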

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure provide an indicator threshold determination method and apparatus, a storage medium, and an electronic apparatus. The method comprises: acquiring aggregated indicator data corresponding to a target indicator; determining an indicator data set from the aggregated indicator data, sorting first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of clustered groups, and fitting the indicator data in each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein a same indicator data set represents indicator data of a same monitoring object; and determining an indicator threshold from an intersection coordinate set of the piecewise function according to an indicator bias of the target indicator. The technical solution solves the problem of how to determine an indicator threshold in the related technologies.

Description

Method, device, storage medium and electronic device for determining an indicator threshold
This disclosure claims priority to the Chinese patent application filed with the China Patent Office on September 30, 2022, with application number 202211225305.3 and the invention title "Method, device, storage medium and electronic device for determining an indicator threshold", the entire contents of which are incorporated into this disclosure by reference.
Technical Field
The present disclosure relates to the field of big data and artificial intelligence technology, and in particular to a method, device, storage medium and electronic device for determining an indicator threshold.
Background
With the advent of the Internet of Everything era, devices such as sensors, smartphones, wearable devices, and smart home appliances will become part of the Internet of Everything. These devices often generate massive amounts of data during operation, and wireless network operators, while processing the data rapidly, usually mine the value of the effective data and apply it to operation and maintenance support and operation analysis. Mobile Internet services are rich and diverse, and different services place different requirements on network performance. Therefore, when collecting data for operation and maintenance support and operation analysis, thresholds often need to be set according to the characteristics of the service and the actual condition of the indicators, so that evaluation criteria for the indicators can be constructed flexibly.
Many operation and maintenance analysis scenarios require threshold setting, such as anomaly detection, root cause analysis, data prediction, alarm management, intelligent recovery, and perception evaluation. In the past, wireless network operators set thresholds for service indicators mainly based on fixed empirical thresholds or on dynamic thresholds obtained from relatively complex statistical distributions. Even a dynamic threshold based on mathematical methods such as statistical distribution merely converts the threshold-solving problem into a threshold-setting problem in another dimension. This makes it difficult to measure the quality of service indicators accurately and objectively, and therefore difficult to guide network operation and maintenance and analysis effectively and to maximize the value of the data.
Therefore, no effective solution has yet been proposed in the related technologies for the problem of how to determine an indicator threshold.
It is therefore necessary to improve the related technologies to overcome the above defects.
Summary
The embodiments of the present disclosure provide a method, device, storage medium and electronic device for determining an indicator threshold, so as to at least solve the problem of how to determine an indicator threshold.
According to one aspect of the embodiments of the present disclosure, a method for determining an indicator threshold is provided, comprising: acquiring aggregated indicator data corresponding to a target indicator; determining an indicator data set from the aggregated indicator data, sorting the first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of clustered groups, and fitting the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object; and determining an indicator threshold from the intersection coordinate set of the piecewise functions according to the indicator bias of the target indicator.
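The sorting step of this method, combined with the normalization and slope smoothing described in the embodiments, can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, min-max scaling is assumed as the normalization, the sorting number is assumed to advance by 1 between neighbouring samples, and the three-point mean is assumed to shrink its window at the two ends of the sequence.

```python
from typing import List, Sequence


def sorted_normalized(values: Sequence[float]) -> List[float]:
    """Sort ascending, then min-max scale the values into [0, 1]."""
    if not values:
        raise ValueError("no indicator values given")
    ordered = sorted(values)
    lo, hi = ordered[0], ordered[-1]
    if hi == lo:
        # All samples identical: the scaled curve is flat.
        return [0.0] * len(ordered)
    return [(v - lo) / (hi - lo) for v in ordered]


def adjacent_slopes(ys: Sequence[float]) -> List[float]:
    """Slope between neighbouring points (i, y_i) and (i + 1, y_{i+1})."""
    # The horizontal step is 1, so the slope equals the difference in y.
    return [b - a for a, b in zip(ys, ys[1:])]


def three_point_mean(slopes: Sequence[float]) -> List[float]:
    """Smooth each slope with the mean of itself and its available neighbours."""
    smoothed = []
    for i in range(len(slopes)):
        window = slopes[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

The smoothed slopes are the quantities that a clustering algorithm would then group; the clustering itself is deliberately left out of this sketch.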
According to another aspect of the embodiments of the present disclosure, a device for determining an indicator threshold is also provided, comprising: an acquisition module configured to acquire aggregated indicator data corresponding to a target indicator; a first determination module configured to determine an indicator data set from the aggregated indicator data, sort the first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain a plurality of clustered groups, and fit the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object; and a second determination module configured to determine an indicator threshold from the intersection coordinate set of the piecewise functions according to the indicator bias of the target indicator.
According to another aspect of the embodiments of the present disclosure, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute the above method for determining an indicator threshold when run.
According to another aspect of the embodiments of the present disclosure, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above method for determining an indicator threshold through the computer program.
Through the present disclosure, aggregated indicator data corresponding to a target indicator is acquired; an indicator data set is determined from the aggregated indicator data, the first indicator data in the indicator data set is sorted to obtain second indicator data, the second indicator data is clustered to obtain a plurality of clustered groups, and the indicator data of each of the plurality of groups is fitted to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object; and an indicator threshold is determined from the intersection coordinate set of the piecewise functions according to the indicator bias of the target indicator, thereby solving the technical problem of how to determine an indicator threshold.
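The fitting and intersection steps summarized above can be illustrated with a short sketch. It assumes, for illustration only, that each group is fitted with a straight line by ordinary least squares and that the groups have already been produced by some clustering step and are passed in left to right; the disclosure does not fix the fitting model, and the names `fit_line`, `intersection` and `candidate_intersections` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float]  # (slope k, intercept b) of y = k * x + b


def fit_line(points: Sequence[Point]) -> Line:
    """Ordinary least-squares fit of y = k * x + b to one group of points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values in the group are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    k = sxy / sxx
    return k, mean_y - k * mean_x


def intersection(a: Line, b: Line) -> Optional[Point]:
    """Intersection of two lines, or None when they are parallel."""
    (k1, b1), (k2, b2) = a, b
    if k1 == k2:
        return None
    x = (b2 - b1) / (k1 - k2)
    return x, k1 * x + b1


def candidate_intersections(groups: Sequence[Sequence[Point]]) -> List[Point]:
    """Fit one line per group and intersect each pair of adjacent lines."""
    lines = [fit_line(g) for g in groups]
    found = []
    for left, right in zip(lines, lines[1:]):
        point = intersection(left, right)
        if point is not None:
            found.append(point)
    return found
```

The returned points play the role of the intersection coordinate set; choosing one of them according to the indicator bias is a separate step.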
Brief Description of the Drawings
The drawings described herein are used to provide a further understanding of the present disclosure and constitute a part of the present disclosure. The exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation on the present disclosure. In the drawings:
FIG. 1 is a hardware structure block diagram of a computer terminal for the method for determining an indicator threshold according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for determining an indicator threshold according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of two-dimensional discrete points according to an embodiment of the present disclosure;
FIG. 4 is a structural block diagram of a device for determining an indicator threshold according to an embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the solution of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present disclosure, without creative work, shall fall within the scope of protection of the present disclosure.
It should be noted that the terms "first", "second", etc. in the specification and claims of the present disclosure and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way can be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device.
The method embodiments provided in the embodiments of the present disclosure may be executed in a computer terminal or a similar computing device. Taking execution on a computer terminal as an example, FIG. 1 is a block diagram of the hardware structure of a computer terminal for the method for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG. 1, the computer terminal may include one or more processors 202 (only one is shown in FIG. 1; the processor 202 may include, but is not limited to, a microprocessor unit (MPU) or a programmable logic device (PLD)) and a memory 204 configured to store data. In an exemplary embodiment, the computer terminal may further include a transmission device 206 configured for communication and an input/output device 208. Those of ordinary skill in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the computer terminal. For example, the computer terminal may include more or fewer components than shown in FIG. 1, or may have a different configuration with functions equivalent to or greater than those shown in FIG. 1.
The memory 204 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the method for determining an indicator threshold in the embodiments of the present disclosure. The processor 202 runs the computer programs stored in the memory 204 to execute various functional applications and data processing, that is, to implement the above method. The memory 204 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 204 may further include memory located remotely from the processor 202, and such remote memory may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 206 is configured to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal. In one example, the transmission device 206 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 206 may be a radio frequency (RF) module configured to communicate with the Internet wirelessly.
The technical terms used in the present disclosure are explained as follows:
KQI: Key Quality Indicators, service quality parameters;
KPI: Key Performance Index, key performance indicator;
TCP: Transmission Control Protocol;
RTT: Round-Trip Time, round-trip delay;
CPU: Central Processing Unit.
FIG. 2 is a flowchart of a method for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG. 2, the method includes the following steps:
Step S202: obtain aggregated indicator data corresponding to a target indicator.
Step S204: determine an indicator data set from the aggregated indicator data; sort the first indicator data in the indicator data set to obtain second indicator data; cluster the second indicator data to obtain a plurality of groups; and fit the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein one indicator data set represents the indicator data of one monitored object.
It should be noted that the clustering algorithm used to cluster the second indicator data may include, but is not limited to, K-means clustering, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, and hierarchical clustering.
Step S206: determine an indicator threshold from the set of intersection coordinates of the piecewise functions according to the indicator bias of the target indicator.
In the embodiments of the present disclosure, aggregated indicator data corresponding to a target indicator is obtained; an indicator data set is determined from the aggregated indicator data; first indicator data in the indicator data set is sorted to obtain second indicator data; the second indicator data is clustered to obtain a plurality of groups; and the indicator data of each group is fitted to obtain a piecewise function corresponding to that group, wherein one indicator data set represents the indicator data of one monitored object. An indicator threshold is determined from the set of intersection coordinates of the piecewise functions according to the indicator bias of the target indicator, thereby solving the problem of how to determine an indicator threshold.
In an exemplary embodiment, to better explain how the aggregated indicator data corresponding to the target indicator is obtained in step S202, the following implementation steps are proposed: determine a preset monitoring dimension, the monitored objects of the monitoring dimension, the indicator category of the target indicator, the initial indicator data under the indicator category, and the time aggregation granularity corresponding to the target indicator; determine the indicator data to be aggregated according to the preset monitoring dimension, the monitored objects of the monitoring dimension, the indicator category of the target indicator, and the initial indicator data under the indicator category; and aggregate the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator.
In an exemplary embodiment, a technical solution is proposed for aggregating the indicator data to be aggregated according to the time aggregation granularity corresponding to the target indicator, which specifically includes: obtaining a first time granularity of the indicator data to be aggregated; when the first time granularity is determined to be smaller than the time aggregation granularity, obtaining the first indicator data of the indicator data to be aggregated within each first time granularity, and aggregating a plurality of first time granularities into the time aggregation granularity; aggregating the plurality of first indicator data within the plurality of first time granularities into first aggregated indicator data within the time aggregation granularity; and determining the first aggregated indicator data as the aggregated indicator data corresponding to the target indicator.
In an exemplary embodiment, when the first time granularity is determined to be equal to the time aggregation granularity, the first indicator data of the indicator data to be aggregated within the first time granularity may be obtained and determined directly as the aggregated indicator data corresponding to the target indicator.
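The aggregation described above can be sketched as follows. This is a minimal illustration, assuming that samples arrive at a granularity finer than one hour and that the aggregation function is the arithmetic mean; the disclosure does not specify the aggregation function, and the name `aggregate_hourly` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_hourly(samples):
    """Aggregate (timestamp, value) samples into one value per hour.

    Assumption: the mean is used as the aggregation function; the source
    text does not state which function is applied.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Truncate each timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Return the aggregated values in chronological order.
    return [(hour, mean(values)) for hour, values in sorted(buckets.items())]


# Hypothetical 30-minute samples of a single indicator.
samples = [
    (datetime(2023, 1, 1, 0, 0), 40.0),
    (datetime(2023, 1, 1, 0, 30), 50.0),
    (datetime(2023, 1, 1, 1, 0), 60.0),
]
hourly = aggregate_hourly(samples)
```

If the samples are already at the target granularity, each bucket holds a single value and the data passes through unchanged, which matches the equal-granularity case described above.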
In an exemplary embodiment, before the second indicator data is clustered, the second indicator data may further be standardized to obtain a plurality of standardized indicator values, each of which corresponds to a sorting sequence number. For each standardized indicator value, the corresponding sorting sequence number is taken as the abscissa and the standardized indicator value as the ordinate, giving the coordinate point corresponding to that value. The coordinate slope between each pair of adjacent coordinate points is determined to obtain a plurality of coordinate slopes, and a smoothed value is determined for each of the coordinate slopes to obtain a plurality of smoothed values. Third indicator data is determined according to the plurality of smoothed values and is taken as the updated second indicator data.
It should be noted that the standardization may include normalization. For example, normalizing the second indicator data compresses its range to [0, 1], which standardizes the data and improves processing efficiency.
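The standardization and slope steps can be sketched as follows. This is a minimal illustration, assuming min-max scaling as the normalization (the method named later in Embodiment 1); the function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def adjacent_slopes(points):
    """Slope between each pair of adjacent (x, y) coordinate points."""
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


raw = [20.0, 60.0, 30.0, 100.0]        # hypothetical indicator values
ordered = sorted(raw)                  # sort ascending
normalized = min_max_normalize(ordered)
# Sorting sequence number (starting at 1) is the abscissa,
# the normalized value is the ordinate.
points = list(enumerate(normalized, start=1))
slopes = adjacent_slopes(points)
```

Because the sequence numbers increase by 1, each slope here equals the difference between consecutive normalized values.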
In an exemplary embodiment, the following technical solution is proposed for determining the smoothed value of each of the plurality of coordinate slopes: cluster the plurality of coordinate slopes according to a preset clustering algorithm to obtain a plurality of groups of slope values; and, for each group of slope values, take the mean of the coordinate slopes in the group as the smoothed value of the coordinate slopes in that group.
The preset clustering algorithm may include K-means clustering, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean shift) clustering, hierarchical clustering, and the like; the present disclosure does not limit this.
In an exemplary embodiment, a further technical solution is proposed: when a target group of slope values is determined to exist among the plurality of groups of slope values, the smoothed value of the coordinate slopes of a neighboring group adjacent to the target group is taken as the smoothed value of the coordinate slopes in the target group, or the smoothed value of the coordinate slopes in the target group is determined according to a preset smoothed value, wherein the number of coordinate slopes in the target group differs from the number of coordinate slopes in each of the other groups.
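The group-mean smoothing can be sketched as follows. This is a minimal illustration, assuming the cluster labels have already been produced by whichever clustering algorithm is chosen; `smooth_by_group` and the sample values are hypothetical, and the special handling of a target group is not covered.

```python
from collections import defaultdict
from statistics import mean


def smooth_by_group(slopes, labels):
    """Replace each slope with the mean slope of the group it belongs to.

    `labels[i]` is the cluster label of `slopes[i]`; how the labels are
    obtained (K-means, DBSCAN, ...) is left open, as in the source text.
    """
    groups = defaultdict(list)
    for slope, label in zip(slopes, labels):
        groups[label].append(slope)
    group_mean = {label: mean(vals) for label, vals in groups.items()}
    return [group_mean[label] for label in labels]


slopes = [0.25, 0.75, 2.0, 4.0]   # hypothetical coordinate slopes
labels = [0, 0, 1, 1]             # hypothetical cluster labels
smoothed = smooth_by_group(slopes, labels)
```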
In an exemplary embodiment, after the second indicator data is clustered to obtain a plurality of groups and the indicator data of each group is fitted to obtain the piecewise function corresponding to each group, a plurality of means may further be obtained from the mean of the indicator data of each group, and a mean set may be determined from them, wherein the mean set includes the mean corresponding to each piecewise function. The intersection coordinates in the set of intersection coordinates of the piecewise functions are determined; the left derivative and the right derivative at the intersection coordinates are determined; the first mean, in the mean set, of the first piecewise function corresponding to the left derivative and the second mean, in the mean set, of the second piecewise function corresponding to the right derivative are determined; and whether to retain the intersection coordinates in the indicator threshold set is determined based on the first mean and the second mean.
In an exemplary embodiment, the process of determining, based on the first mean and the second mean, whether to retain the intersection coordinates in the indicator threshold set is described by the following technical solution: determine a first absolute distance value between the intersection coordinates and the origin coordinates; determine first coordinate information corresponding to the first mean according to the first absolute distance value and the first mean, wherein the first coordinate information represents the independent-variable value of the first piecewise function; determine second coordinate information corresponding to the second mean according to the first absolute distance value and the second mean, wherein the second coordinate information represents the independent-variable value of the second piecewise function; when the first coordinate information is determined to be the same as the second coordinate information, remove the intersection coordinates from the indicator threshold set; and when the first coordinate information is determined to be different from the second coordinate information, retain the intersection coordinates in the indicator threshold set.
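One possible reading of this check is sketched below. It follows the automatic threshold calculation described later in this document, where a point whose left and right derivatives belong to the same group is not a turning point and is removed. The function name is hypothetical; the numeric values are those given for the point (10, 0.3) in Embodiment 1.

```python
def keep_intersection(k_left, k_right, mean_left, mean_right):
    """Return True if the intersection should stay in the threshold set.

    k_left / k_right: left and right derivatives at the intersection.
    mean_left / mean_right: smoothed-slope means of the groups fitted by
    the segments on the left and right of the intersection.
    Each derivative is assigned to the group whose mean is nearest in
    absolute distance; the point is kept only if the groups differ.
    """
    def nearest_group(k):
        # 0 = group of the left segment, 1 = group of the right segment.
        return 0 if abs(k - mean_left) <= abs(k - mean_right) else 1

    return nearest_group(k_left) != nearest_group(k_right)


# Values from Embodiment 1 for the point (10, 0.3).
kept = keep_intersection(0.04, 0.01, 0.046, 0.012)
# Hypothetical case: both derivatives are close to the left group's mean.
dropped = keep_intersection(0.045, 0.044, 0.046, 0.012)
```

Restricting the comparison to the two adjacent groups mirrors Embodiment 1, which compares each derivative against the means of the groups on either side of the point.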
In an exemplary embodiment, a technical solution is proposed for implementing step S206, in which the indicator threshold is determined from the set of intersection coordinates of the piecewise functions according to the indicator bias of the target indicator. It specifically includes: determining the set of non-differentiable points of the piecewise function and the coordinate points at which the second derivative equals a target value; determining the indicator threshold set based on the set of non-differentiable points, the coordinate points at which the second derivative equals the target value, and the set of intersection coordinates of the piecewise functions; and determining the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator.
It should be noted that the target value may be, for example, 0, but is not limited thereto.
In an exemplary embodiment, the indicator threshold may be determined from the indicator threshold set according to the indicator bias of the target indicator in several ways, specifically including: Mode 1: when the indicator bias of the target indicator is determined to be negative, if the type of the indicator threshold is an alarm threshold, the maximum value in the indicator threshold set is determined as the indicator threshold; if the type of the indicator threshold is a preferential threshold, the minimum value in the indicator threshold set is determined as the indicator threshold.
Mode 2: when the indicator bias of the target indicator is determined to be positive, if the type of the indicator threshold is an alarm threshold, the minimum value in the indicator threshold set is determined as the indicator threshold; if the type of the indicator threshold is a preferential threshold, the maximum value in the indicator threshold set is determined as the indicator threshold.
Optionally, in the above embodiment, the alarm threshold may be understood as the threshold at which the performance corresponding to the indicator data of the target indicator is poor. For example, if the alarm threshold for CPU usage is set to 80%, the CPU is heavily occupied at that point and performance is poor. The preferential threshold may be understood as the threshold at which the performance corresponding to the indicator data of the target indicator is good. For example, if the preferential threshold for network delay is set to 10%, the network delay is small at that point and performance is good. In particular, when there are multiple preferential thresholds, the smallest one is selected as the preferential delay threshold.
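The two modes above reduce to a small selection rule, sketched here as a minimal illustration. The function name and the string constants are hypothetical; the candidate values are the ordinates of the threshold set from Embodiment 1.

```python
def select_threshold(candidates, bias, threshold_type):
    """Pick one threshold from the candidate set.

    bias: "negative" (larger values mean worse performance) or "positive".
    threshold_type: "alarm" or "preferential".
    Negative + alarm and positive + preferential take the maximum;
    the other two combinations take the minimum.
    """
    if bias not in ("negative", "positive"):
        raise ValueError(f"unknown bias: {bias!r}")
    if threshold_type not in ("alarm", "preferential"):
        raise ValueError(f"unknown threshold type: {threshold_type!r}")
    pick_max = (bias == "negative") == (threshold_type == "alarm")
    return max(candidates) if pick_max else min(candidates)


candidates = [0.3, 0.5, 0.8]
alarm_negative = select_threshold(candidates, "negative", "alarm")
preferential_negative = select_threshold(candidates, "negative", "preferential")
alarm_positive = select_threshold(candidates, "positive", "alarm")
```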
The process of determining the indicator threshold in the embodiments of the present disclosure is further described in detail through the following steps.
(1) Model building unit:
First, based on the actual business scenario of the operation and maintenance application, determine the monitoring dimensions and indicators, and build the key KPI indicator system {KPI1, KPI2, ..., KPIn} for the operation and maintenance scenario. Then, according to the real-time requirements of the actual business operation and maintenance, determine the time aggregation granularity, aggregate each dimension-KPI data by time granularity, and build five-tuple data {dimension (the monitoring dimension), object (the monitored object), time granularity (the time aggregation granularity), indicator (the indicator category of the target indicator), data (the initial indicator data under the indicator category)}. The business indicator configuration information must specify the bias of the indicator and the normal range of the indicator.
(2) Solving the mapping function from discrete sample points:
Step a: Build a KPI indicator data set for a single dimension and a single object or multiple objects (the choice of objects depends on the actual application scenario), and sort the time-series indicator by data value (ascending or descending) to obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, N], combining ID values and KPI indicator values, where N is the number of samples in the data set and the ID value is the sequence number of the KPI indicator after sorting, starting from 1 and increasing in steps of 1.
For easier visualization, the two-dimensional sequence can be represented as a two-dimensional discrete point plot whose axes are the sample ID and the KPI indicator value. As shown in FIG. 3, each coordinate point takes the ID as its abscissa and the KPI indicator value as its ordinate.
Step b: Because KPI indicator values may fluctuate widely, the indicator is first standardized to simplify subsequent processing. The KPI indicator values are standardized by normalization, which compresses their range to [0, 1].
Step c: Calculate, segment by segment, the slope between each two consecutive points, $k = (y_2 - y_1)/(x_2 - x_1)$, and apply three-point mean smoothing to the slopes to obtain the smoothed slope k'. Cluster on the smoothed slope k' and the ID value to obtain G groups, calculate the mean smoothed slope of each group, and denote the resulting means by the set K.
Using a clustering algorithm avoids excessive parameter setting while building the algorithm model, and classifies data in the sequence that differ from one another, which makes it easier to obtain turning points later. The present disclosure does not specifically limit the type of clustering algorithm.
Step d: Perform curve fitting on each of the G cluster groups from step c to obtain a piecewise function f(x) consisting of G fitted functions. Curve fitting quickly yields an approximate piecewise function and provides an effective way to solve for turning points in the subsequent automatic threshold calculation.
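Steps c and d can be sketched as follows. This is a minimal illustration under two assumptions: the three-point mean uses a forward window, with the trailing positions reusing the last computable value (consistent with the note in Embodiment 1 that the last points cannot be smoothed directly), and each group is fitted with a straight line by ordinary least squares (Embodiment 1 uses a linear function; step d allows other curve fits). The function names and sample values are hypothetical.

```python
def three_point_mean(slopes):
    """Smooth slopes with a forward three-point mean.

    Assumption: the window covers positions i, i+1, i+2. The last two
    positions have no full window and reuse the last computed value.
    """
    n = len(slopes)
    if n < 3:
        raise ValueError("need at least three slopes")
    smoothed = [sum(slopes[i:i + 3]) / 3 for i in range(n - 2)]
    smoothed += [smoothed[-1]] * 2
    return smoothed


def fit_line(points):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


smoothed = three_point_mean([3.0, 6.0, 9.0, 12.0])
a, b = fit_line([(1, 1.0), (2, 3.0), (3, 5.0)])  # points on y = 2x - 1
```

Applying `fit_line` once per cluster group yields one (a, b) pair per segment of the piecewise function f(x).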
(3) Automatic threshold calculation:
Step a: Solve for the intersection coordinates of each pair of adjacent fitted segment functions. Then, over the interval x ∈ [1, N], solve for the set C of non-differentiable points of f(x) and the coordinate points at which the second derivative is 0; together with the intersection coordinates, these form the threshold set T. For each x ∈ C, calculate the left derivative k1' and the right derivative k2', calculate the absolute distance between each of k1' and k2' and k ∈ K, and determine which group each derivative belongs to. If the left and right derivatives belong to the same group, the point is not the turning point being sought, and it is removed from the threshold set T.
Step b: Using the indicator bias information provided in the business indicator configuration information of part (1), take the maximum or minimum value in the threshold set T as the threshold.
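When the segments are straight lines, the intersection of two adjacent segments has a closed form. The sketch below is a minimal illustration, assuming each segment is given as a hypothetical (slope, intercept) pair; it is not tied to any particular fitting method.

```python
def intersection(line1, line2):
    """Intersection of y = a1*x + b1 and y = a2*x + b2.

    Each line is a (slope, intercept) pair. Returns (x, y), or None when
    the lines are parallel and therefore have no single intersection.
    """
    a1, b1 = line1
    a2, b2 = line2
    if a1 == a2:
        return None
    x = (b2 - b1) / (a1 - a2)
    return x, a1 * x + b1


# Hypothetical adjacent segments: y = 2x and y = 0.5x + 3.
point = intersection((2.0, 0.0), (0.5, 3.0))
parallel = intersection((1.0, 0.0), (1.0, 2.0))
```

At such an intersection the left and right derivatives are simply the two slopes, so the group membership test in step a can be applied to them directly.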
The method for determining the indicator threshold is further described below with reference to the following embodiments, which use different operation and maintenance scenarios and their corresponding indicator data.
Embodiment 1
Take an IT equipment operation and maintenance scenario based on low-level monitoring indicators as an example. During operation and maintenance, thresholds often need to be set according to the resource usage of servers. When a set threshold is reached, an alarm must be issued and server capacity expansion considered.
For the model building unit, the monitoring dimension of this embodiment is the server; the objects are server A, server B, and server C; and the indicators are CPU usage (%), memory usage (%), disk usage (%), and network rate (kbps). These indicators form the key KPI indicator system for the server equipment operation and maintenance scenario.
According to the real-time requirements of the actual business operation and maintenance, the time aggregation granularity is set to 1 hour. Taking server CPU usage (%) as the target indicator, the indicator is aggregated by time granularity to build {dimension, object, time granularity, indicator, data} five-tuple data. The business indicator configuration information specifies that the bias of the CPU usage (%) indicator is negative and that its normal range is 0 to 100.
The aggregated CPU usage data for the 24 hours of one day, at a granularity of 1 hour, is shown in Table 1:
Table 1: Indicator data record table

Based on the output of the model building unit above, the mapping function is solved from the discrete sample points.
First, from the results of the model building unit, server A and server B are selected as the objects, and their indicator data sets are selected. The indicator data in the sets is sorted by value to obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, 48], combining ID values and CPU usage (%) values, and a two-dimensional discrete point plot is built with the sample ID on the horizontal axis and the CPU usage (%) on the vertical axis.
Secondly, the CPU usage (%) values are normalized. Min-max scaling may be used to compress the indicator range to [0, 1]. The normalization function is:
$$\mathrm{Value}'_i = \frac{\mathrm{Value}_i - \min(\mathrm{Value})}{\max(\mathrm{Value}) - \min(\mathrm{Value})}$$
The discrete points of the above two-dimensional sequence can be understood as the coordinate points corresponding to the standardized indicator values described earlier.
Thirdly, the slope between each two consecutive points (equivalent to the coordinate slope described earlier) is calculated segment by segment, and three-point mean smoothing is applied to obtain the smoothed slope k'. The three-point mean smoothed slope cannot be calculated for the last 2 points, so the three-point mean smoothed slope of the preceding point is used instead. This gives a new sequence {i: k'_i}, i ∈ [1, 48], combining ID values and three-point mean smoothed slopes. DBSCAN clustering is performed on the values in the sequence to obtain 4 groups, and the mean smoothed slope of each group is calculated (equivalent to the process, described earlier, of taking the mean of the coordinate slopes of each group of slope values as the smoothed value of that group), giving the set K = {0.046, 0.012, 0.034, 0.089}.
Finally, the data of each group from the previous step is fitted with a linear (first-degree) function, giving a piecewise function f(x) with 4 segments.
Based on the mapping function solved above, the threshold is learned automatically.
First, the set of intersection coordinates of adjacent fitted segment functions is solved: X = {(10, 0.3), (30, 0.5), (40, 0.8)}. Then, over the interval x ∈ [1, 48], the set C of non-differentiable points of f(x) and the coordinate points at which the second derivative is 0 are solved; together with the intersection coordinates, these form the threshold set T. In this embodiment, the sets C and T are likewise {(10, 0.3), (30, 0.5), (40, 0.8)}. For each x ∈ C, the left derivative k1' and the right derivative k2' are calculated, and the absolute distance between each of them and k ∈ K is calculated, where k is taken from the mean smoothed slopes of the groups corresponding to the segment functions on the left and right sides of the coordinate point. The group membership is determined from the calculated absolute distances. For the coordinate point (10, 0.3), the left derivative k1' and the right derivative k2' are 0.04 and 0.01 respectively, and the mean smoothed slopes of the groups corresponding to the segment functions on its left and right sides are 0.046 and 0.012 respectively. According to the absolute distances, k1' and k2' belong to different groups, so the point is not removed from the threshold set T. The other coordinate points are handled in the same way, and the final threshold set T is {(10, 0.3), (30, 0.5), (40, 0.8)}.
Second, the indicator bias set in the business indicator configuration information of the model building unit is negative, and the usage scenario is to find the CPU usage (%) at which an alarm must be issued, that is, the poor-side threshold. Therefore the maximum value, 0.8, of the three-point mean smoothed slope in the threshold set T is selected as the reference value for generating the threshold. Inverting the normalization for this value gives a CPU usage (%) indicator value of 90.72, which is the required threshold.
Example 2
This example takes a business-system operation and maintenance scenario in which KPI/KQI indicators are built from a model. In the network operation and maintenance of a mobile operator, the KQI indicators of each cell must be monitored, and the perceived quality of a cell is evaluated through its KQIs; this evaluation usually requires KQI indicator thresholds to be set.
For the model building unit, the monitoring dimension of this embodiment is the cell; the objects are cell 622001, cell 622002, cell 622003, ..., cell 622099; and the indicators are TCP connection success rate (%), TCP retransmission rate (%), TCP out-of-order rate (%), and TCP average RTT delay (ms).
According to the real-time requirements of actual business operation and maintenance, the time aggregation granularity is set to 1 hour. Taking the TCP average RTT delay (ms) of a cell as an example, the indicator is aggregated at this time granularity to construct five-tuple data of the form {dimension, object, time granularity, indicator, data}. The business indicator configuration information specifies that the bias of the TCP average RTT delay (ms) indicator is negative and that its normal range is greater than or equal to 0.
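One possible in-memory shape for the five-tuple and the business indicator configuration is sketched below. The class and field names are assumptions made for illustration, the sample value 37.5 is invented, and using the normal range as a validity check is also an assumption rather than something the embodiment states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorRecord:
    """One {dimension, object, time granularity, indicator, data} five-tuple."""
    dimension: str
    obj: str
    granularity: str
    indicator: str
    value: float


@dataclass(frozen=True)
class IndicatorConfig:
    """Business configuration for one indicator (field names are illustrative)."""
    indicator: str
    bias: str         # "negative" means larger values are worse
    min_valid: float  # lower bound of the normal range

    def accepts(self, record: IndicatorRecord) -> bool:
        """True if the record is for this indicator and lies in its normal range."""
        return record.indicator == self.indicator and record.value >= self.min_valid


rtt_config = IndicatorConfig("TCP average RTT delay (ms)", "negative", 0.0)
# The value 37.5 is a made-up sample, not data from the embodiment.
record = IndicatorRecord("cell", "cell 622001", "1h", "TCP average RTT delay (ms)", 37.5)
```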
The TCP average RTT delay (ms) indicator data for the 24 hours of a given day, aggregated at a granularity of 1 hour, is obtained.
Based on the output of the model building unit above, the mapping function is solved from the discrete sample points.
First, from the results of the model building unit, 50 cells (cell 622001, cell 622002, cell 622003, ..., cell 622050) are selected as objects, and their corresponding indicator data sets are selected. The indicator data in the set are sorted by value to obtain a two-dimensional sequence {i: Value_i}, i ∈ [1, 1200] (50 cells × 24 hourly values), pairing each ID with a TCP average RTT delay (ms) value. A two-dimensional scatter plot is constructed with the sample ID on the horizontal axis and the TCP average RTT delay (ms) on the vertical axis.
Second, the TCP average RTT delay (ms) values are normalized; min-max scaling can be used to compress the indicator range to [0, 1].
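Min-max scaling and its inverse, which is what later maps a normalized reference value back to an indicator value, can be sketched as follows. The function names and the four delay values are hypothetical and serve only to illustrate the transformation.

```python
def min_max_scale(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: all samples are equal
    return [(v - lo) / (hi - lo) for v in values]


def inverse_min_max(scaled, lo, hi):
    """Map a normalized value back to the original indicator range."""
    return lo + scaled * (hi - lo)


# Hypothetical delays in ms, sorted by value as in the preceding step.
delays_ms = sorted([12.0, 2.0, 7.0, 4.5])
scaled = min_max_scale(delays_ms)
restored = [inverse_min_max(s, delays_ms[0], delays_ms[-1]) for s in scaled]
```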
Third, the slope between each pair of consecutive points is calculated segment by segment, and three-point mean smoothing is applied to the slopes to obtain the smoothed slope k'. The three-point mean smoothed slope cannot be calculated for the last 2 points, so the three-point mean smoothed slope of the preceding point can be used instead. This gives a new sequence {i: k'_i}, i ∈ [1, 1200], pairing each ID with its three-point mean smoothed slope. Gaussian mixture clustering is performed on the values in the sequence to obtain 13 groups, and the mean smoothed slope of each group is calculated to obtain the set K.
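The slope and smoothing step can be sketched as below. The text does not define the smoothing window exactly, so this sketch assumes one reading that is consistent with "the last 2 points cannot be calculated": the smoothed slope at point i is the mean of the two consecutive slopes spanned by points i, i+1 and i+2. The function name is hypothetical, and the Gaussian mixture clustering that follows is not shown.

```python
def smoothed_slopes(values):
    """Three-point mean smoothed slopes for values sorted ascending at x = 1..n."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three points")
    # Slope between consecutive points; the x spacing is 1, so it is just the difference.
    slopes = [values[i + 1] - values[i] for i in range(n - 1)]
    # Mean slope over each window of three consecutive points.
    smoothed = [(slopes[i] + slopes[i + 1]) / 2 for i in range(n - 2)]
    # The last two points have no full window: reuse the preceding smoothed slope.
    smoothed += [smoothed[-1]] * 2
    return smoothed


k_prime = smoothed_slopes([0.0, 0.1, 0.3, 0.6, 1.0])
```

The result has one smoothed slope per input point, so it can be paired with the sample IDs as the sequence {i: k'_i}.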
Finally, the data of each group from the previous step are fitted with a first-order linear function, a quadratic function, or the like, and the best-fitting curve can be selected as the fitting result, yielding a piecewise function f(x) with 13 segments.
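For the first-order case, fitting each group and locating the point where two adjacent fitted lines meet can be sketched as follows. This is an ordinary least-squares illustration under the assumption that both segments are straight lines; the function names and sample points are hypothetical, and quadratic fitting or model selection is not covered.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def intersection(line1, line2):
    """Intersection (x, y) of two lines given as (a, b), or None if they are parallel."""
    (a1, b1), (a2, b2) = line1, line2
    if a1 == a2:
        return None
    x = (b2 - b1) / (a1 - a2)
    return x, a1 * x + b1


# Two hypothetical adjacent groups: a steep segment followed by a flatter one.
left = fit_line([(0, 1.0), (1, 3.0), (2, 5.0)])    # y = 2x + 1
right = fit_line([(2, 5.0), (4, 6.0), (6, 7.0)])   # y = 0.5x + 4
meet = intersection(left, right)
```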
Based on the mapping function solved above, the threshold is learned automatically.
First, the set X of intersection coordinates of adjacent fitted segments is solved. Then, over the interval x ∈ [1, 1200], the set C of points where f(x) is not differentiable and the coordinate points where the second derivative is 0 are solved; together these form the threshold set T. For each x ∈ C, the left derivative k1' and the right derivative k2' are calculated, and the absolute distance from each of k1' and k2' to k ∈ K is calculated, where k takes the mean smoothed slopes of the groups corresponding to the segments on the left and right sides of the coordinate point. The group to which each derivative belongs is determined from these absolute distances. If both derivatives belong to the same group, the point is removed from the threshold set T.
Second, the indicator bias set in the business indicator configuration information of the model building unit is negative, and the usage scenario is to find the ideal threshold. Therefore the minimum value, 0.123, of the three-point mean smoothed slope in the threshold set T is selected as the reference value for generating the threshold. Inverting the normalization for this value gives a TCP average RTT delay (ms) indicator value of 2.5, which is the required threshold.
Based on the above embodiments, and according to the actual distribution of different types of indicators in real networks (for example, most indicators show a "three-segment" pattern: a small number of "degraded" samples, a large majority of "normal" samples, and a small number of "high-quality" samples), the problem of calculating an indicator threshold can be converted into a problem of solving a curve image.
In the solution process, the one-sidedness of each indicator is first classified to distinguish positive from negative indicators. The indicator data are then preprocessed and the rate of change of the curve is calculated. Next, a machine learning algorithm is used to train a model on the indicator data. Finally, threshold learning is converted into the problem of solving for turning points based on the indicator's one-sidedness, so that thresholds for different types of indicators are generated intelligently at lower cost and with higher accuracy.
Compared with traditional fixed empirical thresholds or relatively complex dynamic thresholds based on statistical distributions, this approach more thoroughly solves the problem that the threshold, or a converted threshold, must be set manually. The technical solution of the present disclosure has better applicability and accuracy and provides strong support for the operation and maintenance support and operational analysis of mobile operators: it helps them perform these tasks more accurately and also greatly reduces labor costs.
In addition, the present disclosure relates to the technical field of big data and artificial intelligence, and in particular to communication big data and its engineering operation and maintenance in areas such as the Internet and the Internet of Things, where a large number of indicator thresholds must be set in a targeted manner. Examples include the operation and maintenance support and operational analysis of mobile operators, in scenarios such as anomaly detection, root cause analysis, data prediction, alarm management, intelligent recovery, and perception evaluation. Methods for setting indicator thresholds are currently widely used, and thresholds are mainly set either as fixed empirical thresholds or as dynamic thresholds derived from relatively complex statistical distributions. With the in-depth development of artificial intelligence technology, models built on artificial intelligence algorithms have better applicability. The present disclosure converts the problem of calculating an indicator threshold into a problem of solving a curve image and combines it with artificial intelligence algorithms for training and prediction. There are many choices of algorithm in practice, such as neural network, clustering, and classification algorithms. The resulting model has good accuracy and broad application prospects, provides a basis for more precise and intelligent engineering operation and maintenance by mobile operators, and points the way to reducing labor costs.
The present disclosure is directed at the field of operation and maintenance, especially for large systems with complex architectures, including IT equipment operation and maintenance based on low-level monitoring indicators and business-system operation and maintenance based on model-built KPI/KQI indicators. By collecting and cleaning key system indicators, building monitoring-dimension models, and automatically learning indicator thresholds, faults or risks that may exist in the system are identified, so that network optimization personnel can handle or avoid faults in advance.
In this embodiment, a device for determining an indicator threshold is also provided. The device is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 4 is a structural block diagram of a device for determining an indicator threshold according to an embodiment of the present disclosure. As shown in FIG. 4, the device for determining an indicator threshold includes:
an acquisition module 42, configured to acquire aggregated indicator data corresponding to a target indicator;
a first determination module 44, configured to determine an indicator data set from the aggregated indicator data, sort first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain a plurality of groups, and fit the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object;
It should be noted that the clustering algorithm used to cluster the second indicator data may include, but is not limited to, the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean-shift) clustering, and hierarchical clustering.
a second determination module 46, configured to determine an indicator threshold from the set of intersection coordinates of the piecewise functions according to the indicator bias of the target indicator.
With the above device, aggregated indicator data corresponding to the target indicator is acquired; an indicator data set is determined from the aggregated indicator data, the first indicator data in the indicator data set is sorted to obtain second indicator data, the second indicator data is clustered to obtain a plurality of groups, and the indicator data of each of the plurality of groups is fitted to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object; and an indicator threshold is determined from the set of intersection coordinates of the piecewise functions according to the indicator bias of the target indicator. This solves the problem of how to determine an indicator threshold.
In an exemplary embodiment, the acquisition module 42 is further configured to: determine a preset monitoring dimension, a monitored object of the monitoring dimension, an indicator category of the target indicator, initial indicator data under the indicator category, and a time aggregation granularity corresponding to the target indicator; determine indicator data to be aggregated according to the preset monitoring dimension, the monitored object of the monitoring dimension, the indicator category of the target indicator, and the initial indicator data under the indicator category; and aggregate the indicator data to be aggregated at the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator.
In an exemplary embodiment, the acquisition module 42 is further configured to: acquire a first time granularity of the indicator data to be aggregated; when it is determined that the first time granularity is smaller than the time aggregation granularity, acquire first indicator data of the indicator data to be aggregated within the first time granularity, and aggregate a plurality of first time granularities into the time aggregation granularity; and aggregate the plurality of first indicator data within the plurality of first time granularities into first aggregated indicator data within the time aggregation granularity, and determine the first aggregated indicator data as the aggregated indicator data corresponding to the target indicator.
In an exemplary embodiment, the acquisition module 42 is further configured to: when it is determined that the first time granularity is equal to the time aggregation granularity, acquire first indicator data of the indicator data to be aggregated within the first time granularity, and determine the first indicator data as the aggregated indicator data corresponding to the target indicator.
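The two granularity cases handled by the acquisition module can be sketched as below. The disclosure does not say which aggregation function is used, so taking the mean of each bucket is an assumption; the function name, the use of timestamps in seconds, and the sample values are likewise hypothetical.

```python
from collections import defaultdict


def aggregate(samples, source_granularity_s, target_granularity_s):
    """Aggregate (timestamp_s, value) samples to the target time granularity.

    Returns {bucket_start_s: aggregated_value}. The mean is an assumed
    aggregation function; the source does not specify one.
    """
    if source_granularity_s > target_granularity_s:
        raise ValueError("source granularity is coarser than the target granularity")
    if source_granularity_s == target_granularity_s:
        # Equal granularity: the data are used as the aggregated data directly.
        return dict(samples)
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % target_granularity_s].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}


# Hypothetical 15-minute samples aggregated to 1 hour.
samples = [(0, 10.0), (900, 20.0), (1800, 30.0), (2700, 40.0), (3600, 50.0)]
hourly = aggregate(samples, 900, 3600)
```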
In an exemplary embodiment, the acquisition module 42 is further configured to: before the second indicator data is clustered, standardize the second indicator data to obtain a plurality of standardized indicator values, wherein each standardized indicator value has a corresponding sort index; for each standardized indicator value, take its sort index as the abscissa and the standardized indicator value as the ordinate to obtain the coordinate point corresponding to that value; determine the coordinate slope between each pair of adjacent coordinate points to obtain a plurality of coordinate slopes, and for each of the plurality of coordinate slopes, determine a smoothed value of that coordinate slope to obtain a plurality of smoothed values; and determine third indicator data according to the plurality of smoothed values, and take the third indicator data as the updated second indicator data.
It should be noted that the above standardization may include normalization. For example, normalizing the second indicator data compresses its range to [0, 1], which standardizes the data and improves data processing efficiency.
In an exemplary embodiment, the acquisition module 42 is further configured to: in the process of determining, for each of the plurality of coordinate slopes, the smoothed value of that coordinate slope to obtain the plurality of smoothed values, cluster the plurality of coordinate slopes according to a preset clustering algorithm to obtain a plurality of groups of slope values; and for each group of slope values, determine the mean of the coordinate slopes in that group as the smoothed value of the coordinate slopes in that group.
The preset clustering algorithm may include the K-means clustering algorithm, DBSCAN (density-based spatial clustering), spectral clustering, GMM (Gaussian mixture model) clustering, MeanShift (mean-shift) clustering, hierarchical clustering, and the like; the present disclosure places no limitation on this.
In an exemplary embodiment, the acquisition module 42 is further configured to: when it is determined that a target group of slope values exists among the plurality of groups of slope values, determine the smoothed value of the coordinate slopes of a neighboring group adjacent to the target group as the smoothed value of the coordinate slopes in the target group, or determine the smoothed value of the coordinate slopes in the target group according to a preset smoothed value, wherein the number of coordinate slopes in the target group is different from the number of coordinate slopes in each group of slope values.
In an exemplary embodiment, the first determination module 44 is further configured to: obtain a plurality of means from the mean of the indicator data of each group, and determine a mean set from the plurality of means, wherein the mean set includes the mean corresponding to each piecewise function; determine the intersection coordinates in the set of intersection coordinates of the piecewise functions, determine the left derivative and the right derivative at the intersection coordinates, and determine a first mean, in the mean set, of the first piecewise function corresponding to the left derivative and a second mean, in the mean set, of the second piecewise function corresponding to the right derivative; and determine, based on the first mean and the second mean, whether to retain the intersection coordinates in the indicator threshold set.
In an exemplary embodiment, the first determination module 44 is further configured to: determine a first absolute distance value between the intersection coordinates and the origin coordinates; determine first coordinate information corresponding to the first mean according to the first absolute distance value and the first mean, wherein the first coordinate information represents the value of the independent variable of the first piecewise function; determine second coordinate information corresponding to the second mean according to the first absolute distance value and the second mean, wherein the second coordinate information represents the value of the independent variable of the second piecewise function; when it is determined that the first coordinate information is the same as the second coordinate information, remove the intersection coordinates from the indicator threshold set; and when it is determined that the first coordinate information is different from the second coordinate information, retain the intersection coordinates in the indicator threshold set.
In an exemplary embodiment, the second determination module 46 is further configured to: determine the set of non-differentiable points of the piecewise function and the coordinate points where the second derivative equals a target value; determine the indicator threshold set based on the set of non-differentiable points, the coordinate points where the second derivative equals the target value, and the set of intersection coordinates of the piecewise functions; and determine the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator.
In an exemplary embodiment, the second determination module 46 is further configured to: when the indicator bias of the target indicator is determined to be negative, if the type of the indicator threshold is determined to be an alarm threshold, determine the maximum value in the indicator threshold set as the indicator threshold; and if the type of the indicator threshold is determined to be a preferred threshold, determine the minimum value in the indicator threshold set as the indicator threshold.
In an exemplary embodiment, the second determination module 46 is further configured to: when the indicator bias of the target indicator is determined to be positive, if the type of the indicator threshold is determined to be an alarm threshold, determine the minimum value in the indicator threshold set as the indicator threshold; and if the type of the indicator threshold is determined to be a preferred threshold, determine the maximum value in the indicator threshold set as the indicator threshold.
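The four selection cases of the second determination module reduce to a single rule, sketched below. The function name and the string labels "negative", "positive", "alarm" and "preferred" are illustrative assumptions; the candidate values are the reference values 0.3, 0.5 and 0.8 from Example 1.

```python
def select_threshold(candidates, bias, threshold_type):
    """Pick the threshold reference value from the candidate set.

    negative bias + alarm     -> maximum
    negative bias + preferred -> minimum
    positive bias + alarm     -> minimum
    positive bias + preferred -> maximum
    """
    if not candidates:
        raise ValueError("the threshold set is empty")
    if bias not in ("negative", "positive"):
        raise ValueError(f"unknown bias: {bias!r}")
    if threshold_type not in ("alarm", "preferred"):
        raise ValueError(f"unknown threshold type: {threshold_type!r}")
    pick_max = (bias == "negative") == (threshold_type == "alarm")
    return max(candidates) if pick_max else min(candidates)


# Example 1: negative bias (higher CPU usage is worse), alarm threshold wanted.
reference = select_threshold([0.3, 0.5, 0.8], "negative", "alarm")
```

For the CPU usage case this selects 0.8, the same reference value the embodiment then maps back to an indicator value by inverting the normalization.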
Further, the present disclosure proposes an intelligent threshold-learning operation and maintenance device based on curve image calculation. It can solve the core "substitution" problem in the field of threshold self-learning (that is, an automatic threshold-learning process merely being converted into the setting of another threshold) and truly achieves automatic threshold identification and operation and maintenance without manual intervention. By collecting and cleaning big-data operation and maintenance indicators and building a model, indicator data of the monitored objects at the granularity of the actual application scenario are obtained. The constructed data are presented as a graphical sequence, the distribution data are converted into a curve image, and the indicator threshold self-learning function is realized by solving for the turning points of the distribution in combination with the actual business characteristics of the indicator.
A first aspect of the present disclosure provides a model building unit based on collected indicators, configured to implement functions such as data cleaning, aggregate model description, and description of core business indicator configuration items.
A second aspect of the present disclosure provides a method for solving a mapping function based on discrete sample points. The method includes: converting the time-series data obtained after model building into the image representation required for solving the threshold in the present disclosure (note that the image is not actually drawn; rather, the converted data sequence expresses the characteristics of the image); and obtaining, through clustering and curve-fitting algorithms, the mapping function of the image over the sample sequence.
A third aspect of the present disclosure provides a threshold automatic-learning calculation method based on curve image calculation. The method includes: solving for the rate of change of the slope of the function curve obtained in the above steps and, in combination with the business characteristics of the indicator, obtaining the intelligent threshold identification result.
A fourth aspect of the present disclosure provides a threshold automatic-learning device based on curve image calculation. The device includes: a real-time data aggregation module, configured to perform real-time data cleaning and indicator aggregation on the key KPI indicators of each entity node of a multi-dimensional operation and maintenance system; and an intelligent threshold identification module, configured to execute the method in the above steps.
A fifth aspect of the present disclosure provides an electronic device including a computer processor and a memory, wherein the memory is configured to store a computer program;
the processor is configured to implement the functions of the model building unit described in the first aspect above, and to execute the method for solving a mapping function based on discrete sample points described in the second aspect and the threshold automatic-learning calculation method based on curve image calculation described in the third aspect.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the embodiments of the present disclosure.
In an exemplary embodiment, the above computer-readable storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media that can store a computer program.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary implementations, which are not repeated here.
An embodiment of the present disclosure further provides an electronic device including a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program:
S1,获取目标指标对应的聚合指标数据;S1, obtain the aggregated indicator data corresponding to the target indicator;
S2,从所述聚合指标数据中确定出指标数据集合,对所述指标数据集合内的第一指标数据进行排序,得到第二指标数据,对所述第二指标数据进行聚类,得到聚类后的多个分组,并对所述多个分组的每个分组的指标数据进行拟合,得到所述每个分组对应的分段函数,其中,同一指标数据集合表示同一监测对象的指标数据;S2, determining an indicator data set from the aggregated indicator data, sorting the first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of clustered groups, and fitting the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitoring object;
S3,根据所述目标指标的指标偏向性从所述分段函数的交点坐标集合内确定出指标阈值。S3, determining an indicator threshold from the intersection coordinate set of the piecewise function according to the indicator bias of the target indicator.
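Steps S2 and S3 can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the text above does not name a clustering or fitting algorithm, so the sketch assumes that consecutive sorted points are grouped while their adjacent slope stays within a tolerance, and that each group is fitted with a least-squares straight line. The function names (`fit_line`, `group_by_slope`, `intersections`, `candidate_thresholds`) and the `tol` parameter are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (rank after sorting, indicator value)
Line = Tuple[float, float]    # (slope, intercept)


def fit_line(points: List[Point]) -> Line:
    """Ordinary least-squares line through at least two points with distinct x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def group_by_slope(points: List[Point], tol: float) -> List[List[Point]]:
    """Assumed stand-in for the clustering step: start a new group whenever the
    slope between neighbouring points departs from the group's first slope by
    more than tol. The boundary point is shared by both neighbouring groups."""
    groups = [[points[0], points[1]]]
    ref = points[1][1] - points[0][1]      # ranks are spaced by 1
    for prev, cur in zip(points[1:], points[2:]):
        slope = cur[1] - prev[1]
        if abs(slope - ref) <= tol:
            groups[-1].append(cur)
        else:
            groups.append([prev, cur])
            ref = slope
    return groups


def intersections(lines: List[Line]) -> List[Point]:
    """Intersection of each pair of adjacent fitted lines (parallel pairs skipped)."""
    result = []
    for (m1, b1), (m2, b2) in zip(lines, lines[1:]):
        if m1 != m2:
            x = (b2 - b1) / (m1 - m2)
            result.append((x, m1 * x + b1))
    return result


def candidate_thresholds(values: List[float], tol: float = 0.5) -> List[Point]:
    """Sort, group, fit, and return the intersection coordinate set."""
    ordered = sorted(values)
    points = [(float(i), float(v)) for i, v in enumerate(ordered)]
    if len(points) < 2:
        return []
    lines = [fit_line(group) for group in group_by_slope(points, tol)]
    return intersections(lines)
```

For the unsorted input `[25, 1, 10, 3, 2, 20, 4, 15, 5]`, the sorted values rise by 1 up to the value 5 and by 5 afterwards, so the sketch fits two lines and returns one intersection, at rank 4 and value 5. The indicator threshold would then be chosen from such intersection values according to the indicator bias.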
In an exemplary embodiment, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
For specific examples of this embodiment, refer to the examples described in the above embodiments and exemplary implementations; they are not repeated here.
Those skilled in the art should understand that the modules or steps of the present disclosure described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network of multiple computing devices, and they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be executed in an order different from the one given here, or the modules or steps can each be made into an individual integrated circuit module, or several of them can be made into a single integrated circuit module. The present disclosure is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present disclosure and are not intended to limit it. For those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the principles of the present disclosure shall fall within its scope of protection.

Claims (15)

  1. A method for determining an indicator threshold, comprising:
    obtaining aggregated indicator data corresponding to a target indicator;
    determining an indicator data set from the aggregated indicator data, sorting the first indicator data in the indicator data set to obtain second indicator data, clustering the second indicator data to obtain a plurality of clustered groups, and fitting the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object;
    determining an indicator threshold from the intersection coordinate set of the piecewise functions according to the indicator bias of the target indicator.
  2. The method for determining an indicator threshold according to claim 1, wherein obtaining the aggregated indicator data corresponding to the target indicator comprises:
    determining a preset monitoring dimension, a monitored object of the monitoring dimension, an indicator category of the target indicator, initial indicator data under the indicator category, and a time aggregation granularity corresponding to the target indicator;
    determining to-be-aggregated indicator data according to the preset monitoring dimension, the monitored object of the monitoring dimension, the indicator category of the target indicator, and the initial indicator data under the indicator category;
    aggregating the to-be-aggregated indicator data according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator.
  3. The method for determining an indicator threshold according to claim 2, wherein aggregating the to-be-aggregated indicator data according to the time aggregation granularity corresponding to the target indicator to obtain the aggregated indicator data corresponding to the target indicator comprises:
    obtaining a first time granularity of the to-be-aggregated indicator data;
    when it is determined that the first time granularity is smaller than the time aggregation granularity, obtaining first indicator data of the to-be-aggregated indicator data within the first time granularity, and aggregating a plurality of first time granularities into the time aggregation granularity;
    aggregating the plurality of first indicator data within the plurality of first time granularities into first aggregated indicator data within the time aggregation granularity, and determining the first aggregated indicator data as the aggregated indicator data corresponding to the target indicator.
  4. The method for determining an indicator threshold according to claim 3, wherein the method further comprises:
    when it is determined that the first time granularity is equal to the time aggregation granularity, obtaining first indicator data of the to-be-aggregated indicator data within the first time granularity, and determining the first indicator data as the aggregated indicator data corresponding to the target indicator.
  5. The method for determining an indicator threshold according to claim 1, wherein, before clustering the second indicator data, the method further comprises:
    standardizing the second indicator data to obtain a plurality of standardized indicator values, wherein each standardized indicator value has a corresponding sorting sequence number;
    for each standardized indicator value, determining the sorting sequence number corresponding to the standardized indicator value as the horizontal coordinate and the standardized indicator value as the vertical coordinate, to obtain the coordinate point corresponding to the standardized indicator value;
    determining the coordinate slope between each pair of adjacent coordinate points to obtain a plurality of coordinate slopes, and, for each of the plurality of coordinate slopes, determining a smoothed value of that coordinate slope to obtain a plurality of smoothed values;
    determining third indicator data according to the plurality of smoothed values, and determining the third indicator data as the updated second indicator data.
  6. The method for determining an indicator threshold according to claim 5, wherein determining, for each of the plurality of coordinate slopes, the smoothed value of that coordinate slope to obtain the plurality of smoothed values comprises:
    clustering the plurality of coordinate slopes according to a preset clustering algorithm to obtain a plurality of groups of slope values;
    for each group of slope values, determining the mean of the coordinate slopes in the group as the smoothed value of the coordinate slopes in that group.
  7. The method for determining an indicator threshold according to claim 6, wherein the method further comprises:
    when it is determined that a target group of slope values exists among the plurality of groups of slope values, determining the smoothed value of the coordinate slopes of a neighboring group of slope values adjacent to the target group as the smoothed value of the coordinate slopes in the target group, or determining the smoothed value of the coordinate slopes in the target group according to a preset smoothed value, wherein the number of coordinate slopes in the target group of slope values differs from the number of coordinate slopes in each of the other groups of slope values.
  8. The method for determining an indicator threshold according to claim 1, wherein, after clustering the second indicator data to obtain the plurality of clustered groups and fitting the indicator data of each of the plurality of groups to obtain the piecewise function corresponding to each group, the method further comprises:
    obtaining a plurality of group means from the mean of the indicator data of each group, and determining a mean set according to the plurality of group means, wherein the mean set includes the mean corresponding to each piecewise function;
    determining an intersection coordinate in the intersection coordinate set of the piecewise functions, determining the left derivative and the right derivative corresponding to the intersection coordinate, and determining a first mean, in the mean set, of the first piecewise function corresponding to the left derivative and a second mean, in the mean set, of the second piecewise function corresponding to the right derivative;
    determining, based on the first mean and the second mean, whether to retain the intersection coordinate in the indicator threshold set.
  9. The method for determining an indicator threshold according to claim 8, wherein determining, based on the first mean and the second mean, whether to retain the intersection coordinate in the indicator threshold set comprises:
    determining a first absolute distance value between the intersection coordinate and the origin coordinate;
    determining first coordinate information corresponding to the first mean according to the first absolute distance value and the first mean, wherein the first coordinate information represents the value of the independent variable of the first piecewise function;
    determining second coordinate information corresponding to the second mean according to the first absolute distance value and the second mean, wherein the second coordinate information represents the value of the independent variable of the second piecewise function;
    when it is determined that the first coordinate information is the same as the second coordinate information, retaining the intersection coordinate in the indicator threshold set;
    when it is determined that the first coordinate information is different from the second coordinate information, retaining the intersection coordinate in the indicator threshold set.
  10. The method for determining an indicator threshold according to claim 1, wherein determining the indicator threshold from the intersection coordinate set of the piecewise functions according to the indicator bias of the target indicator comprises:
    determining a set of non-differentiable points of the piecewise functions and the coordinate points at which the second derivative equals a target value;
    determining the indicator threshold set based on the set of non-differentiable points, the coordinate points at which the second derivative equals the target value, and the intersection coordinate set of the piecewise functions;
    determining the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator.
  11. The method for determining an indicator threshold according to claim 1, wherein determining the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator comprises:
    when it is determined that the indicator bias of the target indicator is negative, if the type of the indicator threshold is determined to be an alarm threshold, determining the maximum value in the indicator threshold set as the indicator threshold;
    if the type of the indicator threshold is determined to be a preferential threshold, determining the minimum value in the indicator threshold set as the indicator threshold.
  12. The method for determining an indicator threshold according to claim 1, wherein determining the indicator threshold from the indicator threshold set according to the indicator bias of the target indicator comprises:
    when it is determined that the indicator bias of the target indicator is positive, if the type of the indicator threshold is determined to be an alarm threshold, determining the minimum value in the indicator threshold set as the indicator threshold;
    if the type of the indicator threshold is determined to be a preferential threshold, determining the maximum value in the indicator threshold set as the indicator threshold.
  13. An apparatus for determining an indicator threshold, comprising:
    an acquisition module, configured to obtain aggregated indicator data corresponding to a target indicator;
    a first determination module, configured to determine an indicator data set from the aggregated indicator data, sort the first indicator data in the indicator data set to obtain second indicator data, cluster the second indicator data to obtain a plurality of clustered groups, and fit the indicator data of each of the plurality of groups to obtain a piecewise function corresponding to each group, wherein the same indicator data set represents indicator data of the same monitored object;
    a second determination module, configured to determine an indicator threshold from the intersection coordinate set of the piecewise functions according to the indicator bias of the target indicator.
  14. A computer-readable storage medium storing a computer program, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 12.
  15. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute, through the computer program, the method according to any one of claims 1 to 12.
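
The selection rule recited in claims 11 and 12 can be summarized in a short Python sketch. It is illustrative only and forms no part of the claims: the function name `select_threshold` and the string labels `"negative"`, `"positive"`, `"alarm"`, and `"preferential"` are hypothetical, and the indicator threshold set is assumed to be a non-empty list of numeric candidate values.

```python
from typing import Sequence


def select_threshold(candidates: Sequence[float], bias: str, kind: str) -> float:
    """Pick one value from the indicator threshold set.

    negative bias + alarm threshold        -> maximum (claim 11)
    negative bias + preferential threshold -> minimum (claim 11)
    positive bias + alarm threshold        -> minimum (claim 12)
    positive bias + preferential threshold -> maximum (claim 12)
    """
    if not candidates:
        raise ValueError("indicator threshold set is empty")
    if bias not in ("negative", "positive"):
        raise ValueError(f"unknown indicator bias: {bias!r}")
    if kind not in ("alarm", "preferential"):
        raise ValueError(f"unknown threshold type: {kind!r}")
    # The maximum is chosen exactly when bias and type "agree":
    # (negative, alarm) or (positive, preferential).
    pick_max = (bias == "negative") == (kind == "alarm")
    return max(candidates) if pick_max else min(candidates)
```

With the candidate set `[0.2, 0.5, 0.9]`, a negative indicator yields 0.9 as the alarm threshold and 0.2 as the preferential threshold, while a positive indicator yields the opposite pair.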
PCT/CN2023/110331 2022-09-30 2023-07-31 Indicator threshold determination method and apparatus, storage medium, and electronic apparatus WO2024066720A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211225305.3A CN117875746A (en) 2022-09-30 2022-09-30 Method and device for determining index threshold, storage medium and electronic device
CN202211225305.3 2022-09-30

Publications (1)

Publication Number Publication Date
WO2024066720A1 true WO2024066720A1 (en) 2024-04-04

Family

ID=90475973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110331 WO2024066720A1 (en) 2022-09-30 2023-07-31 Indicator threshold determination method and apparatus, storage medium, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN117875746A (en)
WO (1) WO2024066720A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150127455A1 (en) * 2013-11-06 2015-05-07 Globys, Inc. Automated entity classification using usage histograms & ensembles
US20150127454A1 (en) * 2013-11-06 2015-05-07 Globys, Inc. Automated entity classification using usage histograms & ensembles
CN110489306A (en) * 2019-08-26 2019-11-22 北京博睿宏远数据科技股份有限公司 A kind of alarm threshold value determines method, apparatus, computer equipment and storage medium
CN111985815A (en) * 2020-08-21 2020-11-24 国网能源研究院有限公司 Method and device for screening energy and power operation evaluation indexes
CN114780371A (en) * 2022-05-10 2022-07-22 平安科技(深圳)有限公司 Pressure measurement index analysis method, device, equipment and medium based on multi-curve fitting

Also Published As

Publication number Publication date
CN117875746A (en) 2024-04-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23869966

Country of ref document: EP

Kind code of ref document: A1