CN115130606A - KPI time sequence detection method and related device - Google Patents

KPI time sequence detection method and related device

Info

Publication number
CN115130606A
Authority
CN
China
Prior art keywords
training set
time sequence
similarity
feature vector
kpi
Prior art date
Legal status
Pending
Application number
CN202210857783.XA
Other languages
Chinese (zh)
Inventor
李亚鹏
张渝
王汪
钱橙
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210857783.XA
Publication of CN115130606A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the present application provide a KPI time series detection method and a related device, which are used to improve the generalization capability of KPI time series detection. The method comprises: acquiring a first KPI time series at a time point to be detected and a set of KPI time series within a preset time period before that time point; extracting data features of the first KPI time series to generate a test feature vector, and extracting data features of the KPI time series set to generate a first training set; acquiring a positive sample set and a negative sample set from the first training set; sampling the positive and negative sample sets to obtain a second training set; training a random forest algorithm with the second training set to obtain an anomaly detection model; inputting the second training set and the test feature vector into the anomaly detection model to calculate a similarity matrix; and determining a first detection result of the first KPI time series according to the similarity matrix and the positive and negative samples. The method and device can be applied to cloud technology, artificial intelligence, and big data.

Description

KPI time sequence detection method and related device
Technical Field
The application relates to the technical field of operation and maintenance monitoring, in particular to a KPI time sequence detection method and a related device.
Background
Key Performance Indicator (KPI) anomaly detection and alerting have long been popular research topics in both academia and industry. In the internet field in particular, fluctuations in data such as product behavior logs, application information, North Star metrics, and feature usage need to be monitored promptly every day, and once a KPI time series behaves abnormally, the relevant personnel must be alerted immediately so that adjustments can be made. KPI time series are metrics that product, operations, and development staff watch constantly, because their fluctuations reveal in time whether the product system, operational strategy, and so on are abnormal.
Currently, KPI time series anomaly detection works as follows: depending on the data characteristics of each KPI time series, a method such as period-over-period or year-over-year comparison is selected. This approach requires an alarm threshold to be set in advance, and an anomaly is identified only when a fluctuation exceeds that threshold.
Determining the threshold relies on operators' long accumulated experience, and thresholds differ from one KPI time series to another, so operations and maintenance staff must deploy a specific algorithm and set a separate alarm threshold for each KPI time series one by one. As a result, current anomaly detection schemes usually work well only for specific types of KPI time series and generalize poorly.
Disclosure of Invention
The embodiments of the present application provide a KPI time series detection method and a related device, which are used to improve the generalization capability of KPI time series detection.
In view of this, one aspect of the present application provides a KPI time series detection method, including: acquiring a first key performance indicator (KPI) time series at a time point to be detected and a set of KPI time series within a preset time period before the time point to be detected; extracting data features of the first KPI time series to generate a test feature vector, and extracting data features of each KPI time series in the set to generate a first training set; acquiring a positive sample set and a negative sample set from the first training set, where the positive sample set comprises KPI time series samples from normal times and the negative sample set comprises KPI time series samples from abnormal times; sampling the positive sample set and the negative sample set to obtain a second training set in which the number of positive samples equals the number of negative samples; training a random forest algorithm with the second training set to obtain an anomaly detection model; inputting the second training set and the test feature vector into the anomaly detection model to calculate a similarity matrix between the second training set and the test feature vector; and determining a first detection result of the first KPI time series according to the similarity matrix and the positive and negative samples in the second training set.
Another aspect of the present application provides a detection apparatus, including: a first acquisition module, configured to acquire a first key performance indicator (KPI) time series at a time point to be detected and a set of KPI time series within a preset time period before the time point to be detected;
a feature extraction module, configured to extract data features of the first KPI time series to generate a test feature vector, and to extract data features of each KPI time series in the KPI time series set to generate a first training set;
a second acquisition module, configured to acquire a positive sample set and a negative sample set from the first training set, where the positive sample set comprises KPI time series samples from normal times and the negative sample set comprises KPI time series samples from abnormal times;
a sampling module, configured to sample the positive sample set and the negative sample set to obtain a second training set in which the number of positive samples equals the number of negative samples;
a training module, configured to train a random forest algorithm with the second training set to obtain an anomaly detection model;
a detection module, configured to input the second training set and the test feature vector into the anomaly detection model to calculate a similarity matrix between the second training set and the test feature vector, and to determine a first detection result of the first KPI time series according to the similarity matrix and the positive and negative samples in the second training set.
In one possible design, the feature extraction module is specifically configured to extract statistical features, fitting features, and original features of the first KPI time series to generate the test feature vector;
and to extract statistical features, fitting features, and original features of each KPI time series in the KPI time series set to generate the first training set.
In one possible design, the sampling module is further configured to sample the positive sample set and the negative sample set to obtain a third training set, in which the number of positive samples equals the number of negative samples;
the training module is further configured to train the random forest algorithm with the third training set to update the anomaly detection model;
the detection module is further configured to input the third training set and the test feature vector into the updated anomaly detection model and output a second detection result of the first KPI time series, the first and second detection results forming a detection result set. These steps are repeated until the number of detection results in the detection result set reaches a preset number, and the final detection result of the first KPI time series is determined from the detection results in the set.
In one possible design, the detection module is specifically configured to:
calculate, according to the similarity matrix, a first similarity for the positive samples in the second training set and a second similarity for the negative samples in the second training set;
determine a first classification result of the test feature vector according to the first similarity and the second similarity;
and determine the first detection result of the first KPI time series according to the first classification result.
In one possible design, the detection module is specifically configured to input the second training set and the test feature vector into the anomaly detection model and calculate a first proportion value, namely the proportion with which the first sample in the second training set and the corresponding sample of the test feature vector fall into the same leaf node of the anomaly detection model; the first proportion value is used as the similarity between these two samples;
input the second training set and the test feature vector into the anomaly detection model and calculate a second proportion value, namely the proportion with which the second sample in the second training set and the corresponding sample of the test feature vector fall into the same leaf node of the anomaly detection model; the second proportion value is used as the similarity between these two samples;
and repeat these steps until the similarity between every sample in the second training set and each sample corresponding to the test feature vector has been calculated, then collect all of these similarities into the similarity matrix.
In one possible design, the detection module is specifically configured to determine, from the similarity matrix, the set of similarities corresponding to the positive samples in the second training set and sum them to obtain the first similarity;
and to determine, from the similarity matrix, the set of similarities corresponding to the negative samples in the second training set and sum them to obtain the second similarity.
In one possible design, the detection module is specifically configured to determine that the first classification result of the test feature vector is normal when the first similarity is greater than the second similarity;
and that the first classification result is abnormal when the first similarity is smaller than the second similarity.
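The similarity comparison in the two designs above can be sketched as follows. This is a minimal Python illustration, not the patent's implementation. It assumes a single test feature vector, so the relevant part of the similarity matrix is one row, and it assumes a hypothetical label encoding of `1` for positive (normal) and `0` for negative (abnormal) samples. The text does not say what happens when the two similarities are equal, so the `"undetermined"` result is an assumption.

```python
from typing import Sequence

NORMAL, ABNORMAL = 1, 0  # hypothetical label encoding for the second training set


def classify(similarities: Sequence[float], labels: Sequence[int]) -> str:
    """similarities[i]: similarity between training sample i and the test feature
    vector (one row of the similarity matrix); labels[i]: that sample's class."""
    if len(similarities) != len(labels):
        raise ValueError("one label per similarity is required")
    # First similarity: sum over the positive (normal) samples.
    first = sum(s for s, y in zip(similarities, labels) if y == NORMAL)
    # Second similarity: sum over the negative (abnormal) samples.
    second = sum(s for s, y in zip(similarities, labels) if y == ABNORMAL)
    if first > second:
        return "normal"
    if first < second:
        return "abnormal"
    return "undetermined"  # tie: not specified in the text; hypothetical handling
```

Because the second training set is balanced, both sums run over the same number of samples, so comparing raw sums is equivalent to comparing mean similarities per class.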
In one possible design, the detection module is specifically configured to obtain a first value and a second value from the detection results in the detection result set, where the first value is the number of detection results indicating that the first KPI time series is normal, and the second value is the number of detection results indicating that it is abnormal;
when the first value is greater than the second value, determine that the first KPI time series is a normal time series;
and when the first value is smaller than the second value, determine that the first KPI time series is an abnormal time series.
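A minimal sketch of the repeated rounds and the final vote follows. `detect_once` is a hypothetical callable standing in for one round of balanced sampling, random forest training, and similarity-based classification. The tie result is an assumption, since the text covers only the strictly-greater and strictly-smaller cases.

```python
from collections import Counter
from typing import Callable, List


def final_detection(detect_once: Callable[[int], str], preset_number: int) -> str:
    """Collect `preset_number` round results, then take a majority vote."""
    results: List[str] = []
    while len(results) < preset_number:
        # Each round would draw a fresh balanced training set, retrain the
        # random forest, and classify the test feature vector again.
        results.append(detect_once(len(results)))
    counts = Counter(results)
    # First value: "normal" results; second value: "abnormal" results.
    first_value, second_value = counts["normal"], counts["abnormal"]
    if first_value > second_value:
        return "normal"
    if first_value < second_value:
        return "abnormal"
    return "undetermined"  # tie handling is not specified in the text
```

Choosing an odd preset number avoids ties whenever every round returns either normal or abnormal.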
Another aspect of the present application provides a computer device, comprising: a memory, a processor, and a bus system;
the memory is configured to store a program;
the processor is configured to execute the program in the memory and, according to instructions in the program code, perform the methods of the above aspects;
the bus system is configured to connect the memory and the processor so that they can communicate.
Another aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform the method of the above-described aspects.
In another aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by the above aspects.
According to the above technical solutions, the embodiments of the present application have the following advantages: sampling from the normal and abnormal sequence samples produces a balanced data set, which is then used as training data for the random forest algorithm. This avoids using the original imbalanced data set directly as the training set, in which case the random forest's own sampling process would further aggravate the imbalance. Balanced sampling therefore helps increase data diversity and the differences between models, improving the generalization performance of the KPI time series anomaly detection algorithm.
Drawings
FIG. 1 is a schematic diagram of an architecture of a system implementing an embodiment of the present application;
FIG. 2 is a schematic diagram of an embodiment of a KPI time sequence detection method in the embodiment of the present application;
FIG. 3 is a schematic view of an embodiment of a detection device in the embodiment of the present application;
FIG. 4 is a schematic view of another embodiment of the detection device in the embodiment of the present application;
FIG. 5 is a schematic view of another embodiment of the detection device in the embodiment of the present application.
Detailed Description
The embodiments of the present application provide a KPI time series detection method and a related device, which are used to improve the generalization capability of KPI time series detection.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For the convenience of understanding, some terms in the embodiments of the present application are explained below:
A KPI is a time series of practical significance obtained by periodic sampling, such as the number of website visits per unit time, a product's daily active users (DAU), the transaction volume per unit time, the transaction amount in a transaction service system, or the page views of a website service system.
Key Performance Indicator (KPI) anomaly detection and alerting have long been popular research topics in both academia and industry. In the internet field in particular, fluctuations in data such as product behavior logs, application information, North Star metrics, and feature usage need to be monitored promptly every day, and once a KPI time series behaves abnormally, the relevant personnel must be alerted immediately so that adjustments can be made. KPI time series are metrics that product, operations, and development staff watch constantly, because their fluctuations reveal in time whether the product system, operational strategy, and so on are abnormal. Currently, KPI time series anomaly detection works as follows: depending on the data characteristics of each KPI time series, a method such as period-over-period or year-over-year comparison is selected. This approach requires an alarm threshold to be set in advance, and an anomaly is identified only when a fluctuation exceeds that threshold. Determining the threshold relies on operators' long accumulated experience, and thresholds differ from one KPI time series to another, so operations and maintenance staff must deploy a specific algorithm and set a separate alarm threshold for each KPI time series one by one. As a result, current anomaly detection schemes usually work well only for specific types of KPI time series and generalize poorly.
To solve this technical problem, an embodiment of the present application provides a KPI time series detection method, including: acquiring a first key performance indicator (KPI) time series at a time point to be detected and a set of KPI time series within a preset time period before the time point to be detected; extracting data features of the first KPI time series to generate a test feature vector, and extracting data features of each KPI time series in the set to generate a first training set; acquiring a positive sample set and a negative sample set from the first training set, where the positive sample set comprises KPI time series samples from normal times and the negative sample set comprises KPI time series samples from abnormal times; sampling the positive sample set and the negative sample set to obtain a second training set in which the number of positive samples equals the number of negative samples; training a random forest algorithm with the second training set to obtain an anomaly detection model; inputting the second training set and the test feature vector into the anomaly detection model to calculate a similarity matrix between the second training set and the test feature vector; and determining a first detection result of the first KPI time series according to the similarity matrix and the positive and negative samples in the second training set.
The method provided by the present application can be applied to the system architecture shown in FIG. 1. As shown in the figure, the system includes a terminal device and a detection device, where the terminal device may be an enterprise device or a device running enterprise software. The enterprise software may run on the terminal device in a browser or as an independent application (APP); its specific form is not limited herein. The detection device may be independent of the terminal device and exchange data with it as a server; or it may be integrated into the terminal device and exchange data with it as a client; or it may be independent of the terminal device and exchange data with it as another terminal device. When the detection device and the terminal device implement the method provided by the present application, the detection device acquires the KPI time series generated when the terminal device operates as an enterprise device, or when it runs enterprise software, and then detects whether the KPI time series is abnormal. The server involved in the present application may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
The terminal device may be a smart phone, a tablet computer, a notebook computer, a palm computer, a personal computer, a smart television, a smart watch, a vehicle-mounted device, a wearable device, and the like, but is not limited thereto. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein. The number of servers and terminal devices is also not limited. The scheme provided by the application can be independently completed by the terminal device, and can also be completed by the terminal device and the server in a matching manner, which is not specifically limited.
It is understood that specific implementations of the present application involve data such as KPI time series. When the above embodiments are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
With reference to FIG. 2, an embodiment of the KPI time series detection method in the embodiments of the present application includes the following steps:
201. Acquire a first key performance indicator (KPI) time series at a time point to be detected and a set of KPI time series within a preset time period before that time point.
While a device or application is running, the detection device may sample its KPI time series periodically; that is, the detection device acquires the first KPI time series at the time point to be detected and performs detection on this periodically sampled series. In this embodiment, when acquiring the first KPI time series, the detection device also acquires the set of KPI time series within the preset time period before the time point to be detected.
It will be appreciated that the KPI time series set is used to update the anomaly detection model once, after which the first KPI time series is detected with the updated anomaly detection model. The preset time period may be a period close to the time point to be detected. For example, if the detection device needs to detect the KPI time series of application A at a time point on the morning of May 7, the preset time period may be an earlier period that same morning, such as one starting at 9 a.m. on May 7. The KPI time series set may then be the set of KPI time series generated by the application while running during that period.
In an exemplary scenario, the first KPI time series may include logs of application A at the time point to be detected, such as 1,000 visits to application A, 10 million online users of application A, and 1,000 transactions generated in application A.
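Step 201 leaves open how an individual KPI time series sample is cut from the raw KPI stream. The sketch below assumes one plausible reading: each sample is the window of the last `seg_len` values ending at a given timestamp, the first KPI time series is the window ending at the time point to be detected, and the KPI time series set contains one full window per earlier timestamp inside the preset period. `segment_ending_at`, `acquire`, and `seg_len` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Point = Tuple[datetime, float]  # (timestamp, KPI value), assumed sorted by timestamp


def segment_ending_at(points: List[Point], t: datetime, seg_len: int) -> List[float]:
    """Hypothetical segmenting rule: the last `seg_len` values with timestamp <= t."""
    values = [v for ts, v in points if ts <= t]
    return values[-seg_len:]


def acquire(points: List[Point], t_detect: datetime, period: timedelta, seg_len: int):
    """Return the first KPI time series and the KPI time series set (step 201)."""
    # First KPI time series: the window ending at the time point to be detected.
    first = segment_ending_at(points, t_detect, seg_len)
    # KPI time series set: one window per sampled timestamp in [t_detect - period, t_detect).
    start = t_detect - period
    history = []
    for ts, _ in points:
        if start <= ts < t_detect:
            seg = segment_ending_at(points, ts, seg_len)
            if len(seg) == seg_len:  # skip windows truncated at the start of the data
                history.append(seg)
    return first, history
```

With minute-level data, for instance, `period=timedelta(minutes=30)` and `seg_len=5` yield one 5-point window per minute of the preceding half hour.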
202. Extract data features of the first KPI time series to generate a test feature vector, and extract data features of each KPI time series in the KPI time series set to generate a first training set.
The detection device extracts the characteristics of the first KPI time sequence to obtain data characteristics, and generates the test characteristic vector according to the data characteristics. Similarly, the detection device also performs feature extraction on each KPI time sequence in the KPI time sequence set by using the same feature extraction manner to obtain data features, and generates the first training set according to the data features of the KPI time sequence set.
Optionally, to enrich the features of the test feature vector and of the samples in the first training set, the detection device may extract several kinds of features from the first KPI time series and from the KPI time series set, such as statistical features, fitting features, and original features. This yields a first training set in which the temporal features are the independent variables and whether the series is abnormal is the dependent variable, together with a test feature vector whose detection result is not yet known. In this embodiment, statistical features are data features obtained through statistics such as the maximum and minimum of the data. Fitting features are data features obtained by integrating characteristics such as the trend of the data in the KPI time series and its seasonal variation. Original features indicate the values of the data in the KPI time series, or attributes of the users of the application, such as the user's gender, the user's location, and the application's version number.
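The three feature families can be illustrated with the sketch below. The concrete features are assumptions chosen for illustration: maximum, minimum, mean, and population standard deviation as statistical features; a least-squares trend slope and a difference against one season earlier as fitting features; and the latest raw value as the original feature. User attributes such as gender, location, or app version could be appended as encoded columns, which the sketch omits. As the step requires, the same extraction is applied to the first KPI time series and to every series in the set; `extract_features`, `build_sets`, and `season` are hypothetical names.

```python
import statistics
from typing import List, Sequence


def extract_features(series: Sequence[float], season: int = 0) -> List[float]:
    """Hypothetical feature set for one KPI time series (step 202)."""
    n = len(series)
    if n == 0:
        raise ValueError("series must be non-empty")
    # Statistical features: max, min, mean, population standard deviation.
    mean = statistics.fmean(series)
    stat = [max(series), min(series), mean, statistics.pstdev(series)]
    # Fitting feature 1: least-squares slope of value against index (trend).
    x_mean = (n - 1) / 2
    denom = sum((i - x_mean) ** 2 for i in range(n))
    slope = (sum((i - x_mean) * (y - mean) for i, y in enumerate(series)) / denom
             if denom else 0.0)
    # Fitting feature 2: change versus one season earlier (0 if unavailable).
    seasonal = series[-1] - series[-1 - season] if 0 < season < n else 0.0
    # Original feature: the latest raw value.
    return stat + [slope, seasonal, series[-1]]


def build_sets(first: Sequence[float], history: List[Sequence[float]], season: int = 0):
    """Return (test feature vector, first training set feature rows)."""
    test_vector = extract_features(first, season)
    first_training_set = [extract_features(s, season) for s in history]
    return test_vector, first_training_set
```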
203. Acquire a positive sample set and a negative sample set from the first training set, where the positive sample set comprises KPI time series samples from normal times and the negative sample set comprises KPI time series samples from abnormal times.
In this embodiment, the detection device may label each KPI time series sample in the first training set, the label indicating whether the sample is a normal or an abnormal time series. Samples labeled as normal are taken as positive samples and samples labeled as abnormal as negative samples; the detection device then collects the positive samples into the positive sample set and the negative samples into the negative sample set.
In an exemplary scheme, the first training set includes 100 samples: 90 positive samples and 10 negative samples. The positive sample set then contains the 90 positive samples and the negative sample set contains the 10 negative samples.
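Step 203 amounts to partitioning the labeled first training set. A minimal sketch, assuming a hypothetical encoding of `1` for a normal (positive) sample and `0` for an abnormal (negative) one:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

NORMAL, ABNORMAL = 1, 0  # hypothetical label encoding


def split_by_label(samples: Sequence[T], labels: Sequence[int]) -> Tuple[List[T], List[T]]:
    """Positive set = samples labeled normal; negative set = samples labeled abnormal."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    positive = [x for x, y in zip(samples, labels) if y == NORMAL]
    negative = [x for x, y in zip(samples, labels) if y == ABNORMAL]
    return positive, negative
```

Applied to the 90/10 example, this returns a positive set of 90 samples and a negative set of 10.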
204. Sample the positive sample set and the negative sample set to obtain a second training set, where the number of positive samples in the second training set equals the number of negative samples.
To keep the training samples balanced, the detection device needs the number of positive samples and the number of negative samples in the training set to be equal. Therefore, this embodiment proposes drawing the same number of samples from the positive sample set and from the negative sample set, and combining the drawn samples into the second training set.
In an exemplary scheme, if the detection device determines that the second training set contains 100 samples, then 50 positive samples are drawn from the positive sample set and 50 negative samples from the negative sample set.
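If the negative set is as small as in the 90/10 example, drawing 50 negative samples is only possible with replacement (or some other form of oversampling); the text does not specify which. The sketch below assumes sampling with replacement via Python's `random.Random.choices`; `balanced_sample` and its label encoding are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_sample(positive: Sequence[T], negative: Sequence[T], total: int,
                    rng: random.Random) -> Tuple[List[T], List[int]]:
    """Draw total/2 samples from each class, with replacement (assumption)."""
    if total % 2 != 0:
        raise ValueError("total must be even so both classes get the same count")
    if not positive or not negative:
        raise ValueError("both sample sets must be non-empty")
    half = total // 2
    # Sampling with replacement lets 50 negatives be drawn from only 10.
    pos = rng.choices(positive, k=half)
    neg = rng.choices(negative, k=half)
    samples = pos + neg
    labels = [1] * half + [0] * half  # 1 = positive (normal), 0 = negative (abnormal)
    return samples, labels
```

Calling it again with the same `rng` produces a different balanced set, which is what later sampling rounds (the third training set, and so on) would need.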
205. Train a random forest algorithm with the second training set to obtain an anomaly detection model.
The detection apparatus passes the sample data in the second training set through the target model corresponding to the random forest algorithm, calculates a loss from the predicted classification results and the true labels of the samples in the second training set, and then adjusts the parameters of the target model according to the loss to obtain the anomaly detection model. The training process of the anomaly detection model in this embodiment is the same as in the prior art and is not described in detail here.
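The text defers to the prior art for training. As one conventional way to grow a random forest, the sketch below builds each tree on a bootstrap sample, considers a random subset of features at each node, and chooses splits by Gini impurity; it also gives every leaf an id, which the leaf-sharing similarity of step 206 relies on. It is a dependency-free illustration with hypothetical names and hyperparameters (`n_trees`, `max_depth`), not the applicant's model.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels: Sequence[int]) -> int:
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_feats, rng, next_id) -> Dict:
    """Grow one tree greedily; every leaf gets an integer id unique within the tree."""
    if depth >= max_depth or len(set(y)) == 1:
        next_id[0] += 1
        return {"leaf": next_id[0], "label": majority(y)}
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no feature can split this node
        next_id[0] += 1
        return {"leaf": next_id[0], "label": majority(y)}
    _, f, t, left, right = best

    def grow(idx: List[int]) -> Dict:
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng, next_id)

    return {"feat": f, "thr": t, "left": grow(left), "right": grow(right)}


def find_leaf(tree: Dict, x: Sequence[float]) -> Dict:
    """Route a sample down one tree and return the leaf it falls in."""
    node = tree
    while "leaf" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node


def train_forest(X, y, n_trees: int = 25, max_depth: int = 4, seed: int = 0) -> List[Dict]:
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))  # sqrt(number of features), a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng, [0]))
    return forest


def predict(forest: List[Dict], x: Sequence[float]) -> int:
    """Majority vote of the trees (1 = normal, 0 = abnormal)."""
    return majority([find_leaf(tree, x)["label"] for tree in forest])


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(50)] + \
    [[10 + data_rng.random(), 10 + data_rng.random()] for _ in range(50)]
y = [1] * 50 + [0] * 50
forest = train_forest(X, y)
print(predict(forest, [0.0, 0.0]), predict(forest, [11.0, 11.0]))  # 1 0
```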
206. Input the second training set and the test feature vector into the anomaly detection model to calculate the similarity matrix between the second training set and the test feature vector.
After training the anomaly detection model on the second training set, the detection apparatus inputs the second training set and the test feature vector into the model together in order to obtain a first detection result for the first KPI time series.
In this embodiment, to ensure an accurate detection result, the detection apparatus uses a similarity-based decision: it inputs the second training set and the test feature vector into the anomaly detection model and calculates the similarity between each sample in the second training set and the test feature vector.
Optionally, the detection apparatus may calculate the similarity matrix as follows. The second training set and the test feature vector are input into the anomaly detection model, and a first proportion value is calculated, namely the proportion of cases in which the first sample in the second training set falls in the same leaf node as the test feature vector; this first proportion value is used as the similarity between the first sample in the second training set and the test feature vector. A second proportion value is calculated in the same way for the second sample in the second training set and is used as its similarity to the test feature vector. These steps are repeated until the similarity between every sample in the second training set and the test feature vector has been calculated, and the similarities are then collected into the similarity matrix.
In an exemplary scheme, assume that the second training set contains sample A and sample B. In the first similarity calculation, sample A falls in the same leaf node of the anomaly detection model as the test feature vector, while sample B falls in a different leaf node. In the second calculation, sample A again shares a leaf node with the test feature vector and sample B does not. In the third calculation, neither sample A nor sample B shares a leaf node with the test feature vector. In the fourth calculation, sample A falls in a different leaf node, while sample B falls in the same leaf node as the test feature vector. The calculation is repeated a predetermined number of times (for example, 100 times). The number of times that sample A falls in the same leaf node as the test feature vector is then counted (assume 45), and the ratio of this count to the total number of calculations (45/100 = 0.45) is used as the similarity between sample A and the test feature vector. The similarity between sample B and the test feature vector is obtained in the same way. By analogy, the similarity between every sample in the second training set and the test feature vector is calculated, and the similarities are collected into the similarity matrix.
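The proportion-of-shared-leaves similarity in step 206 corresponds to the usual random forest proximity measure. The sketch below assumes that each "calculation" in the example corresponds to one tree of the forest and that leaf ids are already available as a trees × samples table (for instance, obtained by routing each sample down each tree); the function name and input layout are hypothetical.

```python
from typing import List, Sequence


def rf_similarity(train_leaves: Sequence[Sequence[int]],
                  test_leaves: Sequence[int]) -> List[float]:
    """Similarity of each training sample to the test feature vector.

    train_leaves[t][i] is the leaf id reached by training sample i in tree t;
    test_leaves[t] is the leaf id reached by the test feature vector in tree t.
    The similarity of sample i is the fraction of trees in which it shares a
    leaf with the test vector; the returned list is the n x 1 similarity matrix.
    """
    n_trees = len(test_leaves)
    if n_trees == 0 or len(train_leaves) != n_trees:
        raise ValueError("expected one row of leaf ids per tree")
    counts = [0] * len(train_leaves[0])
    for row, test_leaf in zip(train_leaves, test_leaves):
        for i, leaf in enumerate(row):
            if leaf == test_leaf:
                counts[i] += 1
    return [c / n_trees for c in counts]


# Example from the text: sample A shares the test vector's leaf in 45 of 100 trees.
train_leaves = [[7 if t < 45 else 3, 5] for t in range(100)]  # columns: A, B
test_leaves = [7] * 100
print(rf_similarity(train_leaves, test_leaves))  # [0.45, 0.0]
```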
207. Determine a first detection result for the first KPI time series according to the similarity matrix and the positive and negative samples in the second training set.
In this embodiment, after calculating the similarity matrix, the detection apparatus may use it to calculate a first similarity between the positive samples in the second training set and the test feature vector and a second similarity between the negative samples in the second training set and the test feature vector; determine a first classification result of the test feature vector from the first and second similarities; and determine the first detection result of the first KPI time series from the first classification result.
Specifically, the detection apparatus may determine from the similarity matrix the set of similarities corresponding to the positive samples in the second training set and sum them to obtain the first similarity, and determine the set of similarities corresponding to the negative samples in the second training set and sum them to obtain the second similarity. In an exemplary scenario, the second training set contains 50 positive samples and 50 negative samples, and the test feature vector corresponds to one KPI time series, so the similarity matrix is a 100 × 1 matrix containing 100 similarity values. The first similarity is then the sum of the 50 similarities of the positive samples, and the second similarity is the sum of the 50 similarities of the negative samples.
It is to be understood that the detection apparatus may also calculate the first and second similarities in other ways, for example by averaging the similarities between the samples of each class and the test feature vector, or by calculating their variance; this is not limited herein.
After obtaining the first and second similarities, the detection apparatus determines the classification result of the test feature vector as follows: if the first similarity is greater than the second similarity, the first classification result of the test feature vector is normal; if the first similarity is smaller than the second similarity, the first classification result is abnormal. For example, if the first similarity (positive samples) is 20 and the second similarity (negative samples) is 21, the test feature vector is classified as abnormal; if the first similarity is 20 and the second similarity is 19, it is classified as normal.
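The summing and comparison rule of step 207 can be sketched as below. The tie case (first similarity equal to the second) is not specified in the text, so the sketch returns `"undecided"` there as an assumption; the function name and the label encoding (1 = positive/normal, 0 = negative/abnormal) are hypothetical.

```python
from typing import Sequence, Tuple


def classify_by_similarity(similarities: Sequence[float],
                           labels: Sequence[int]) -> Tuple[str, float, float]:
    """Sum per-class similarities (label 1 = positive/normal, 0 = negative/abnormal)
    and compare the two sums to classify the test feature vector."""
    if len(similarities) != len(labels):
        raise ValueError("one label per similarity value is required")
    first = sum(s for s, y in zip(similarities, labels) if y == 1)   # positive class
    second = sum(s for s, y in zip(similarities, labels) if y == 0)  # negative class
    if first > second:
        return "normal", first, second
    if first < second:
        return "abnormal", first, second
    return "undecided", first, second  # tie handling is an assumption


labels = [1] * 50 + [0] * 50
print(classify_by_similarity([0.40] * 50 + [0.42] * 50, labels)[0])  # abnormal (about 20 vs 21)
print(classify_by_similarity([0.40] * 50 + [0.38] * 50, labels)[0])  # normal (about 20 vs 19)
```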
In this embodiment, to improve the accuracy of the detection result for the first KPI time series, the detection apparatus may repeat steps 203 to 206 to obtain multiple detection results and derive the final detection result of the first KPI time series from them. Specifically, a first value and a second value are obtained from the detection results in the detection result set, where the first value is the number of detection results indicating that the first KPI time series is a normal KPI time series and the second value is the number of detection results indicating that it is an abnormal KPI time series. If the first value is greater than the second value, the first KPI time series is determined to be a normal time series; if the first value is smaller than the second value, it is determined to be an abnormal time series. For example, if the detection apparatus repeats steps 203 to 206 100 times, it obtains 100 detection results for the first KPI time series, some indicating a normal time series and some indicating an abnormal one. If 65 results indicate a normal time series and 35 indicate an abnormal one, the first KPI time series is determined to be normal.
If 45 results indicate a normal time series and 55 indicate an abnormal one, the first KPI time series is determined to be abnormal.
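The final majority vote over repeated detection results can be sketched as follows. The string labels, the function name, and the behavior on a tie (which the text does not cover) are assumptions.

```python
from collections import Counter
from typing import Iterable


def final_detection(results: Iterable[str]) -> str:
    """Majority vote over repeated detection results ("normal" / "abnormal")."""
    counts = Counter(results)
    first, second = counts["normal"], counts["abnormal"]  # missing keys count as 0
    if first > second:
        return "normal"
    if first < second:
        return "abnormal"
    return "undecided"  # tie handling is not specified in the text (assumption)


print(final_detection(["normal"] * 65 + ["abnormal"] * 35))  # normal
print(final_detection(["normal"] * 45 + ["abnormal"] * 55))  # abnormal
```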
In this embodiment, a balanced data set is generated by sampling the normal and abnormal sequence samples, and the balanced data set is used as training data for the random forest algorithm. This avoids using the original imbalanced data set directly as the training set, in which case the sampling performed within the random forest algorithm could further aggravate the class imbalance; it also helps increase data diversity, enrich the differences among individual models, and improve the generalization performance of the KPI time series anomaly detection algorithm. In addition, statistical features, fitting features, and original features are extracted from the KPI time series. These features reflect the dispersion, trend, correlation between adjacent points, and implicit characteristics of the series, providing effective data features for KPI time series anomaly detection and improving detection accuracy. Furthermore, the random forest similarity matrix is used to measure sample similarity and serves as the base learner; following the idea of ensemble learning, the classification results obtained from multiple similarity matrices are aggregated into the final result, which improves classification accuracy.
The beneficial effects of the method provided by the present application are described below using a specific experiment:
The five comparison algorithms in the experiment are: the fixed-configuration ring-ratio (period-over-period) and same-period comparison algorithms; a threshold algorithm based on Prophet, the time series forecasting tool developed by Facebook; the AnomalyDetection algorithm developed by Twitter; the isolation forest algorithm; and the Local Outlier Factor (LOF) algorithm. The ring-ratio, same-period, and Prophet algorithms require threshold parameters to be set in advance for KPI sequence anomaly detection. The ring-ratio threshold is set to 10%: if the relative difference between the values at time T and time T-1 exceeds 10%, the point is regarded as an anomaly. The same-period threshold is also set to 10%: if the relative difference between the values at time T and time T-t (where t is a period of one day) exceeds 10%, the point is regarded as an anomaly. The Prophet-based threshold is likewise set to 10%: a point is regarded as an anomaly when the relative difference between the Prophet prediction and the actual KPI value exceeds 10%.
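The three threshold baselines can be sketched as relative-change checks. Assumptions: the "difference ratio" is taken as |current − reference| / |reference| for the ring-ratio and same-period rules and as |prediction − actual| / |actual| for the Prophet rule (the text does not state the denominators), points with a zero reference are skipped, and the forecast values are taken as given rather than produced by Prophet here. Function names are hypothetical.

```python
from typing import List, Sequence


def relative_change_anomalies(values: Sequence[float], lag: int,
                              threshold: float = 0.10) -> List[int]:
    """Indices T where |x[T] - x[T - lag]| / |x[T - lag]| exceeds the threshold.

    lag = 1 gives the ring-ratio rule; lag = number of points per day
    (e.g. 1440 for 1-minute data) gives the same-period rule.
    Points whose reference value is zero are skipped.
    """
    flagged = []
    for t in range(lag, len(values)):
        ref = values[t - lag]
        if ref != 0 and abs(values[t] - ref) / abs(ref) > threshold:
            flagged.append(t)
    return flagged


def prediction_anomalies(actual: Sequence[float], predicted: Sequence[float],
                         threshold: float = 0.10) -> List[int]:
    """Prophet-style rule: flag points whose relative error versus the given
    forecast exceeds the threshold."""
    return [t for t, (a, p) in enumerate(zip(actual, predicted))
            if a != 0 and abs(p - a) / abs(a) > threshold]


print(relative_change_anomalies([100, 105, 120, 119], lag=1))  # [2]
print(relative_change_anomalies([100, 200, 112, 205], lag=2))  # [2]
print(prediction_anomalies([100, 100], [105, 115]))  # [1]
```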
A real-world Internet KPI data set was selected from a website. This competition data set contains KPI data from real scenarios collected from several Internet companies; anomalous data points were judged and labeled by professionals, and the data were released after desensitization. Adjacent time points are 1 minute or 5 minutes apart. Three KPI sequences were selected as the evaluation data set to assess how well each algorithm identifies anomalous values. Following the KPI anomaly detection literature from Microsoft and others, three evaluation metrics were selected: accuracy, F1-measure, and F2-measure. The test results are shown in Tables 1, 2 and 3:
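The three evaluation metrics can be computed point-wise as below. This sketch assumes that anomalous points form the positive class when computing precision P and recall R, as is common in KPI anomaly detection (the reverse of the normal-as-positive convention used for training samples above). It uses F-beta = (1 + beta^2) * P * R / (beta^2 * P + R), so F2 weights recall more heavily than precision. The function name is hypothetical.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, F1 and F2 for point-wise anomaly labels (True = anomaly)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    def f_beta(beta: float) -> float:
        b2 = beta * beta
        denom = b2 * precision + recall
        return (1 + b2) * precision * recall / denom if denom else 0.0

    return {"accuracy": (tp + tn) / len(y_true), "f1": f_beta(1.0), "f2": f_beta(2.0)}


scores = evaluate([True, True, True, False], [True, False, False, False])
print({k: round(v, 3) for k, v in scores.items()})
# {'accuracy': 0.5, 'f1': 0.5, 'f2': 0.385}
```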
TABLE 1
Method | Accuracy | F1-measure | F2-measure
Ring-ratio algorithm | 0.07 | 0.08 | 0.18
Same-period comparison algorithm | 0.05 | 0.08 | 0.18
AnomalyDetection algorithm | 0.98 | 0.67 | 0.58
LOF algorithm | 0.77 | 0.26 | 0.46
Isolation forest algorithm | 0.97 | 0.72 | 0.79
Prophet threshold algorithm | 0.88 | 0.4 | 0.61
Algorithm of the present application | 0.99 | 0.92 | 0.88
TABLE 2
Method | Accuracy | F1-measure | F2-measure
Ring-ratio algorithm | 0.24 | 0.39 | 0.61
Same-period comparison algorithm | 0.24 | 0.39 | 0.61
AnomalyDetection algorithm | 0.80 | 0.43 | 0.35
LOF algorithm | 0.73 | 0.43 | 0.43
Isolation forest algorithm | 0.75 | 0.36 | 0.26
Prophet threshold algorithm | 0.7 | 0.45 | 0.49
Algorithm of the present application | 0.73 | 0.58 | 0.68
TABLE 3
Method | Accuracy | F1-measure | F2-measure
Ring-ratio algorithm | 0.14 | 0.25 | 0.46
Same-period comparison algorithm | 0.14 | 0.25 | 0.46
AnomalyDetection algorithm | 0.86 | 0.06 | 0.04
LOF algorithm | 0.66 | 0.39 | 0.54
Isolation forest algorithm | 0.76 | 0.38 | 0.44
Prophet threshold algorithm | 0.56 | 0.29 | 0.42
Algorithm of the present application | 0.91 | 0.60 | 0.51
In Tables 1, 2 and 3, bold numbers indicate the best-performing algorithm for each metric. Taken together, the three tables show that the algorithm provided by the present application achieves the highest F1-measure on all three evaluation sequences and generally outperforms the other five commonly used algorithms in detection and identification, although it is not the best on every individual metric (for example, accuracy in Table 2 and F2-measure in Table 3).
Referring to fig. 3, fig. 3 is a schematic diagram of an embodiment of the detection apparatus in the embodiments of the present application. The detection apparatus 20 includes:
a first obtaining module 201, configured to obtain a first key performance indicator (KPI) time series at a time point to be detected and a set of KPI time series within a preset time period before that time point;
a feature extraction module 202, configured to extract data features of the first KPI time series to generate a test feature vector, and to extract data features of each KPI time series in the KPI time series set to generate a first training set;
a second obtaining module 203, configured to obtain a positive sample set and a negative sample set from the first training set, where the positive sample set contains KPI time series samples at normal moments and the negative sample set contains KPI time series samples at abnormal moments;
a sampling module 204, configured to sample the positive sample set and the negative sample set to obtain a second training set in which the number of positive samples equals the number of negative samples;
a training module 205, configured to train a random forest algorithm with the second training set to obtain an anomaly detection model; and
a detection module 206, configured to input the second training set and the test feature vector into the anomaly detection model to calculate the similarity matrix between the second training set and the test feature vector, and to determine a first detection result for the first KPI time series according to the similarity matrix and the positive and negative samples in the second training set.
This embodiment of the present application provides a detection apparatus. With this apparatus, the normal and abnormal sequence samples are sampled to generate a balanced data set, which is used as training data for the random forest algorithm. This avoids using the original imbalanced data set directly as the training set, in which case the sampling performed within the random forest algorithm could further aggravate the class imbalance; it also helps increase data diversity, enrich the differences among individual models, and improve the generalization performance of the KPI time series anomaly detection algorithm.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another embodiment of the detection apparatus 20 provided in the embodiments of the present application,
the feature extraction module 202 is specifically configured to extract statistical features, fitting features, and original features of the first KPI time series to generate the test feature vector;
and to extract statistical features, fitting features, and original features of each KPI time series in the KPI time series set to generate the first training set.
This embodiment of the present application provides a detection apparatus. With this apparatus, statistical features, fitting features, and original features are extracted from the KPI time series. These features reflect the dispersion, trend, correlation between adjacent points, and implicit characteristics of the series, providing effective data features for KPI time series anomaly detection and improving detection accuracy.
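The source does not list the individual features, so the following is only an illustrative guess at one feature of each kind: statistical features (mean, standard deviation, minimum, maximum) for dispersion, a least-squares slope as a fitting feature for trend, lag-1 autocorrelation for the correlation between adjacent points, and the last `n_raw` raw values as original features. All of these choices, including `n_raw` and the function name, are assumptions.

```python
import statistics
from typing import List, Sequence


def extract_features(series: Sequence[float], n_raw: int = 5) -> List[float]:
    """Build a feature vector from one KPI time series (hypothetical feature set)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)

    # Fitting feature: slope of the least-squares line through (index, value).
    x_mean = (n - 1) / 2
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sum((i - x_mean) * (v - mean) for i, v in enumerate(series)) / sxx

    # Correlation between adjacent points: lag-1 autocorrelation (0 for a flat series).
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    lag1 = sum(a * b for a, b in zip(dev, dev[1:])) / denom if denom else 0.0

    statistical = [mean, std, min(series), max(series)]
    original = [float(v) for v in series[-n_raw:]]
    return statistical + [slope, lag1] + original


feats = extract_features([1, 2, 3, 4, 5, 6])
print(feats[4], feats[-5:])  # 1.0 [2.0, 3.0, 4.0, 5.0, 6.0]
```

Applying the same function to the first KPI time series and to each series in the KPI time series set yields the test feature vector and the rows of the first training set, respectively.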
Optionally, on the basis of the embodiment corresponding to fig. 3, in another embodiment of the detection apparatus 20 provided in the embodiments of the present application, the sampling module 204 is further configured to sample the positive sample set and the negative sample set to obtain a third training set in which the number of positive samples equals the number of negative samples;
the training module 205 is further configured to train the random forest algorithm with the third training set to update the anomaly detection model;
the detection module 206 is further configured to input the third training set and the test feature vector into the updated anomaly detection model and output a second detection result for the first KPI time series, the first and second detection results forming a detection result set; these steps are repeated until the number of detection results in the detection result set reaches a preset number, and the final detection result of the first KPI time series is determined from the detection results in the set.
This embodiment of the present application provides a detection apparatus. With this apparatus, independent repeated sampling of the normal and abnormal sequence samples generates balanced data sets, which are used as training data for the random forest algorithm. This avoids using the original imbalanced data set directly as the training set, in which case the sampling performed within the random forest algorithm could further aggravate the class imbalance; it also helps increase data diversity, enrich the differences among individual models, and improve the generalization performance of the KPI time series anomaly detection algorithm.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another embodiment of the detection apparatus 20 provided in the embodiments of the present application,
the detection module 206 is specifically configured to calculate, according to the similarity matrix, a first similarity between the positive samples in the second training set and the test feature vector and a second similarity between the negative samples in the second training set and the test feature vector;
determine a first classification result of the test feature vector according to the first similarity and the second similarity;
and determine the first detection result of the first KPI time series according to the first classification result.
This embodiment of the present application provides a detection apparatus. With this apparatus, the random forest similarity matrix is used to measure sample similarity and serves as the base learner; following the idea of ensemble learning, the classification results of multiple similarity matrices are aggregated into the final result, which improves classification accuracy.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another embodiment of the detection apparatus 20 provided in the embodiments of the present application,
the detection module 206 is specifically configured to input the second training set and the test feature vector into the anomaly detection model and calculate a first proportion value, namely the proportion of cases in which the first sample in the second training set falls in the same leaf node of the anomaly detection model as the test feature vector, the first proportion value being used as the similarity between the first sample in the second training set and the test feature vector;
input the second training set and the test feature vector into the anomaly detection model and calculate a second proportion value in the same way for the second sample in the second training set, the second proportion value being used as the similarity between the second sample in the second training set and the test feature vector;
and repeat these steps until the similarity between every sample in the second training set and the test feature vector has been calculated, then collect the similarities into the similarity matrix.
This embodiment of the present application provides a detection apparatus. With this apparatus, following the idea of ensemble learning, the similarities between the training samples and the test sample are aggregated into the similarity matrix, which improves classification accuracy.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another embodiment of the detection apparatus 20 provided in the embodiments of the present application, the detection module 206 is specifically configured to determine, from the similarity matrix, the set of similarities corresponding to the positive samples in the second training set and sum the similarities in that set to obtain the first similarity;
and to determine, from the similarity matrix, the set of similarities corresponding to the negative samples in the second training set and sum the similarities in that set to obtain the second similarity.
This embodiment of the present application provides a detection apparatus. With this apparatus, the first similarity and the second similarity are obtained by summing the similarities of the positive samples and of the negative samples, respectively, which gives a concrete way to compute them from the similarity matrix and improves the feasibility and operability of the scheme.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another embodiment of the detection apparatus 20 provided in the embodiments of the present application, the detection module 206 is specifically configured to determine that the first classification result of the test feature vector is normal when the first similarity is greater than the second similarity;
and to determine that the first classification result of the test feature vector is abnormal when the first similarity is smaller than the second similarity.
This embodiment of the present application provides a detection apparatus. With this apparatus, the classification result is determined from the similarities, which improves the accuracy of the classification result.
Optionally, on the basis of the embodiment corresponding to fig. 3, in another embodiment of the detection apparatus 20 provided in the embodiments of the present application,
the detection module 206 is specifically configured to obtain a first value and a second value from the detection results in the detection result set, where the first value is the number of detection results indicating that the first KPI time series is a normal KPI time series and the second value is the number of detection results indicating that the first KPI time series is an abnormal KPI time series;
determine that the first KPI time series is a normal time series when the first value is greater than the second value;
and determine that the first KPI time series is an abnormal time series when the first value is smaller than the second value.
This embodiment of the present application provides a detection apparatus. With this apparatus, following the idea of ensemble learning, the detection results obtained from multiple balanced training sets are aggregated by majority vote into the final result, which improves the accuracy of the classification result.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a server provided in an embodiment of the present application. The server 300 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing an application program 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and execute, on the server 300, the series of instruction operations in the storage medium 330.
The server 300 may further include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
The steps performed by the detection apparatus in the foregoing embodiments may be based on the server structure shown in fig. 4.
Referring to fig. 5, for ease of description, only the parts related to the embodiments of the present application are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of the present application. In the embodiments of the present application, a terminal device is used as an example:
fig. 5 is a block diagram of a partial structure of a smartphone serving as the terminal device provided in an embodiment of the present application. Referring to fig. 5, the smartphone includes: a radio frequency (RF) circuit 410, a memory 420, an input unit 430, a display unit 440, a sensor 450, an audio circuit 460, a wireless fidelity (WiFi) module 470, a processor 480, and a power supply 490. Those skilled in the art will appreciate that the smartphone structure shown in fig. 5 does not constitute a limitation; the smartphone may include more or fewer components than shown, combine some components, or use a different arrangement of components.
Each component of the smartphone is described below with reference to fig. 5:
The RF circuit 410 may be used to receive and send signals during information transmission and reception or during a call. In particular, after receiving downlink information from a base station, it passes the information to the processor 480 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 410 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. The RF circuit 410 may also communicate with networks and other devices through wireless communication, which may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, and Short Message Service (SMS).
The memory 420 may be used to store software programs and modules, and the processor 480 runs the software programs and modules stored in the memory 420 to perform the various functional applications and data processing of the smartphone. The memory 420 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the smartphone (such as audio data or a phone book). In addition, the memory 420 may include high-speed random access memory and may further include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the smartphone. Specifically, the input unit 430 may include a touch panel 431 and other input devices 432. The touch panel 431, also called a touch screen, may collect touch operations performed by the user on or near it (for example, operations performed with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Optionally, the touch panel 431 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 480, and receives and executes commands sent by the processor 480. In addition, the touch panel 431 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 431, the input unit 430 may include other input devices 432, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, a joystick, and the like.
The display unit 440 may be used to display information input by the user or provided to the user, as well as the various menus of the smartphone. The display unit 440 may include a display panel 441, which may optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 431 may cover the display panel 441; when the touch panel 431 detects a touch operation on or near it, it transmits the operation to the processor 480 to determine the type of touch event, and the processor 480 then provides corresponding visual output on the display panel 441 according to that type. Although in fig. 5 the touch panel 431 and the display panel 441 are two independent components that implement the input and output functions of the smartphone, in some embodiments they may be integrated to implement both functions.
The smartphone may also include at least one sensor 450, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel 441 according to the ambient light, and a proximity sensor, which may turn off the display panel 441 and/or the backlight when the smartphone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the attitude of the smartphone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer and tap detection). Other sensors that may be configured on the smartphone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described further here.
The audio circuitry 460, a speaker 461, and a microphone 462 may provide an audio interface between the user and the smartphone. The audio circuitry 460 may convert received audio data into an electrical signal and transmit it to the speaker 461, which converts it into a sound signal for output. Conversely, the microphone 462 converts collected sound signals into electrical signals, which the audio circuitry 460 receives and converts into audio data; the audio data is then output to the processor 480 for processing and afterwards either sent through the RF circuitry 410 to, for example, another smartphone, or output to the memory 420 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 470, the smartphone can help the user send and receive e-mail, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 5 shows the WiFi module 470, it is not an essential component of the smartphone and may be omitted as needed without changing the essence of the invention.
The processor 480 is the control center of the smartphone. It connects all parts of the smartphone through various interfaces and lines, and performs the functions of the smartphone and processes data by running or executing the software programs and/or modules stored in the memory 420 and invoking the data stored in the memory 420, thereby monitoring the smartphone as a whole. Optionally, the processor 480 may include one or more processing units; optionally, the processor 480 may integrate an application processor, which mainly handles the operating system, the user interface, application programs, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 480.
The smartphone also includes a power supply 490 (such as a battery) that supplies power to the components. Optionally, the power supply may be logically connected to the processor 480 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the smartphone may further include a camera, a Bluetooth module, and the like, which are not described here.
The steps performed by the detection device in the above embodiments may be based on the terminal device structure shown in fig. 5.
Embodiments of the present application also provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method described in the foregoing embodiments.
Embodiments of the present application also provide a computer program product including a program, which, when run on a computer, causes the computer to perform the methods described in the foregoing embodiments.
Those skilled in the art will clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, apparatuses, and units described above; details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (11)

1. A KPI time sequence detection method is characterized by comprising the following steps:
acquiring a first key performance indicator (KPI) time sequence of a time point to be detected and a KPI time sequence set within a preset time period before the time point to be detected;
extracting data features of the first KPI time sequence to generate a test feature vector, and extracting data features of each KPI time sequence in the KPI time sequence set to generate a first training set;
acquiring a positive type sample set and a negative type sample set in the first training set, wherein the positive type sample set comprises KPI time sequence samples at normal time, and the negative type sample set comprises KPI time sequence samples at abnormal time;
sampling the positive type sample set and the negative type sample set to obtain a second training set, wherein the number of the positive type samples in the second training set is the same as that of the negative type samples;
training a random forest algorithm by using the second training set to obtain an anomaly detection model;
inputting the second training set and the test feature vector into the anomaly detection model to calculate a similarity matrix corresponding to the second training set and the test feature vector;
and determining a first detection result of the first KPI time sequence according to the similarity matrix and the positive samples in the second training set and the negative samples in the second training set.
2. The method of claim 1, wherein the extracting data features of the first KPI time sequence to generate a test feature vector comprises:
extracting statistical features, fitting features and original features of the first KPI time sequence to generate the test feature vector;
the extracting the data features of each KPI time sequence in the KPI time sequence set to generate a first training set comprises:
and extracting the statistical characteristics, fitting characteristics and original characteristics of each KPI time sequence in the KPI time sequence set to generate the first training set.
3. The method of claim 1, wherein after inputting the second training set and the test feature vector into the anomaly detection model to output a first detection result for the first KPI time series, the method further comprises:
sampling the positive type sample set and the negative type sample set to obtain a third training set, wherein the number of the positive type samples in the third training set is the same as that of the negative type samples;
training a random forest algorithm by using the third training set to update the anomaly detection model;
inputting the third training set and the test feature vector into the updated anomaly detection model to output a second detection result of the first KPI time sequence, wherein the first detection result and the second detection result form a detection result set;
and so on, until the number of the detection results in the detection result set reaches a preset number, determining the final detection result of the first KPI time sequence according to the detection results in the detection result set.
4. The method according to any of claims 1-3, wherein said determining a first detection result for the first KPI time series from the similarity matrix and positive class samples in the second training set and negative class samples in the second training set comprises:
respectively calculating, according to the similarity matrix, a first similarity between the positive samples in the second training set and the test feature vector and a second similarity between the negative samples in the second training set and the test feature vector;
determining a first classification result of the test feature vector according to the first similarity and the second similarity;
and determining a first detection result of the first KPI time sequence according to the first classification result.
5. The method of any of claims 1-3, wherein the inputting the second training set and the test feature vector into the anomaly detection model to calculate a similarity matrix corresponding to the second training set and the test feature vector comprises:
inputting the second training set and the test feature vector into the anomaly detection model to calculate a first proportion value of a first sample in the second training set and a first sample corresponding to the test feature vector falling in a same leaf node in the anomaly detection model, wherein the first proportion value is used as the similarity of the first sample in the second training set and the first sample corresponding to the test feature vector;
inputting the second training set and the test feature vector into the anomaly detection model to calculate a second proportion value of a second sample in the second training set and a second sample corresponding to the test feature vector falling in a same leaf node in the anomaly detection model, wherein the second proportion value is used as the similarity of the second sample in the second training set and the second sample corresponding to the test feature vector;
and repeating the steps until the similarity between each sample in the second training set and each sample corresponding to the test feature vector is obtained through traversal calculation, and summarizing the similarity between each sample in the second training set and each sample corresponding to the test feature vector to obtain the similarity matrix.
6. The method of claim 4, wherein the calculating the first similarity between the positive samples in the second training set and the test feature vector and the second similarity between the negative samples in the second training set and the test feature vector according to the similarity matrix comprises:
determining a similarity set corresponding to a positive sample in the second training set from the similarity matrix, and summing all similarities in the similarity set corresponding to the positive sample to obtain the first similarity;
and determining a similarity set corresponding to the negative type sample in the second training set from the similarity matrix, and summing all the similarities in the similarity set corresponding to the negative type sample to obtain the second similarity.
7. The method of claim 4, wherein determining the first classification result for the test feature vector based on the first similarity and the second similarity comprises:
when the first similarity is larger than the second similarity, determining that a first classification result of the test feature vector is normal;
and when the first similarity is smaller than the second similarity, determining that a first classification result of the test feature vector is abnormal.
8. The method according to claim 3, wherein said determining a final detection result for the first KPI time series according to the detection results in the set of detection results comprises:
acquiring a first numerical value and a second numerical value according to detection results in the detection result set, wherein the first numerical value is the number of detection results indicating that the first KPI time sequence is a normal KPI time sequence, and the second numerical value is the number of detection results indicating that the first KPI time sequence is an abnormal KPI time sequence;
when the first value is larger than the second value, determining the first KPI time series as a normal time series;
and when the first numerical value is smaller than the second numerical value, determining that the first KPI time series is an abnormal time series.
9. A detection device, comprising:
a first acquisition module, used for acquiring a first key performance indicator (KPI) time sequence of a time point to be detected and a KPI time sequence set within a preset time period before the time point to be detected;
the feature extraction module is used for extracting the data features of the first KPI time sequence to generate a test feature vector, and extracting the data features of each KPI time sequence in the KPI time sequence set to generate a first training set;
the second acquisition module is used for acquiring a positive type sample set and a negative type sample set in the first training set, wherein the positive type sample set comprises KPI time sequence samples at normal time, and the negative type sample set comprises KPI time sequence samples at abnormal time;
the sampling module is used for sampling the positive type sample set and the negative type sample set to obtain a second training set, and the number of the positive type samples in the second training set is the same as that of the negative type samples;
the training module is used for training a random forest algorithm by using the second training set to obtain an anomaly detection model;
the detection module is used for inputting the second training set and the test feature vector into the anomaly detection model to calculate a similarity matrix corresponding to the second training set and the test feature vector; and determining a first detection result of the first KPI time sequence according to the similarity matrix and the positive samples in the second training set and the negative samples in the second training set.
10. A computer device, comprising: a memory, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory and for performing the method of any one of claims 1 to 8 according to instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
11. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 8.
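Claim 2 names three families of features (statistical, fitting and original) but does not define them. The sketch below shows one possible reading, with every choice an assumption rather than the patent's definition: mean, population standard deviation, minimum and maximum as statistical features; the slope and intercept of an ordinary least-squares line, plus the residual of the latest point, as fitting features; and the last `k` raw values as original features. The function name `extract_features` and the parameter `k` are hypothetical.

```python
import statistics
from typing import List, Sequence


def extract_features(series: Sequence[float], k: int = 3) -> List[float]:
    """Concatenate statistical, fitting and original features of one KPI series.

    All feature choices are illustrative assumptions, not the claimed definitions.
    """
    values = [float(v) for v in series]
    if k < 1 or len(values) < max(2, k):
        raise ValueError("need k >= 1 and at least max(2, k) points")
    xs = range(len(values))
    mean = statistics.fmean(values)
    # Statistical features.
    statistical = [mean, statistics.pstdev(values), min(values), max(values)]
    # Fitting features: least-squares line y = slope * x + intercept over x = 0..n-1.
    x_mean = (len(values) - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean - slope * x_mean
    last_residual = values[-1] - (slope * (len(values) - 1) + intercept)
    fitting = [slope, intercept, last_residual]
    # Original features: the k most recent raw values.
    original = values[-k:]
    return statistical + fitting + original
```

For example, `extract_features([1.0, 2.0, 3.0, 4.0])` returns ten numbers: mean 2.5, standard deviation about 1.118, minimum 1.0, maximum 4.0, slope 1.0, intercept 1.0, last residual 0.0, and the raw values 2.0, 3.0, 4.0. Applying the same function to the first KPI time sequence and to every sequence in the set keeps the test feature vector and the training vectors the same length.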
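Claims 4 to 7 turn the trained forest into a similarity-based classifier. Claim 5 takes as the similarity of a training sample and a test sample the proportion with which they fall into the same leaf node, read here as the fraction of trees in which both land in the same leaf (the usual random-forest proximity). Claim 6 sums these similarities per class, and claim 7 labels the test vector by the larger sum. This dependency-free sketch assumes the leaf ids are already known for every sample and tree; with scikit-learn they could come from `RandomForestClassifier.apply`, which returns an array of shape `(n_samples, n_estimators)`. The function names and the "undecided" tie result are hypothetical.

```python
from typing import List, Sequence

LeafIds = Sequence[Sequence[int]]  # one row per sample, one leaf id per tree


def proximity_matrix(train_leaves: LeafIds, test_leaves: LeafIds) -> List[List[float]]:
    """sim[i][j] = share of trees in which training sample i and test sample j
    land in the same leaf node (one reading of claim 5)."""
    if not train_leaves or not test_leaves:
        raise ValueError("need at least one training and one test sample")
    n_trees = len(train_leaves[0])
    rows = list(train_leaves) + list(test_leaves)
    if n_trees == 0 or any(len(row) != n_trees for row in rows):
        raise ValueError("every sample needs one leaf id per tree")
    return [
        [sum(a == b for a, b in zip(tr, te)) / n_trees for te in test_leaves]
        for tr in train_leaves
    ]


def classify_column(similarities: Sequence[float], labels: Sequence[int]) -> str:
    """Sum similarities per class and compare the sums (claims 6 and 7).

    labels: 1 = positive (normal) sample, 0 = negative (abnormal) sample.
    """
    if len(similarities) != len(labels):
        raise ValueError("one similarity per training sample is required")
    first = sum(s for s, y in zip(similarities, labels) if y == 1)
    second = sum(s for s, y in zip(similarities, labels) if y == 0)
    if first > second:
        return "normal"
    if first < second:
        return "abnormal"
    return "undecided"  # equal sums: not covered by the claims (assumption)


# Leaf ids of 4 balanced training samples and 1 test sample in a 4-tree forest.
train = [[3, 5, 2, 7], [3, 5, 4, 7], [8, 1, 4, 6], [9, 1, 4, 6]]
labels = [1, 1, 0, 0]
test = [[3, 5, 4, 6]]
sim = proximity_matrix(train, test)
column = [row[0] for row in sim]
print(sim)                              # [[0.5], [0.75], [0.5], [0.5]]
print(classify_column(column, labels))  # normal
```

Because claim 1 makes both classes the same size in the second training set, comparing the two sums gives the same answer as comparing the per-class mean similarities.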
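Claims 1, 3 and 8 wrap a single detection pass in repeated class-balanced resampling followed by a majority vote. The sketch below shows only that control flow. Random undersampling of the larger class, the tie branch and all names are assumptions, and `nearest_mean_stub` is a deliberately simple stand-in (not a random forest and not the claimed classifier) used only so the example runs without third-party libraries.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical signature for one detection pass: (samples, labels, test) -> label.
DetectOnce = Callable[[List[Vector], List[int], Vector], str]


def balanced_sample(positives: Sequence[Vector], negatives: Sequence[Vector],
                    rng: random.Random) -> Tuple[List[Vector], List[int]]:
    """Equal-size training set by random undersampling (assumed strategy).
    Labels: 1 = positive (normal), 0 = negative (abnormal)."""
    n = min(len(positives), len(negatives))
    if n == 0:
        raise ValueError("both classes need at least one sample")
    samples = rng.sample(list(positives), n) + rng.sample(list(negatives), n)
    return samples, [1] * n + [0] * n


def ensemble_detect(positives: Sequence[Vector], negatives: Sequence[Vector],
                    test_vector: Vector, detect_once: DetectOnce,
                    preset_number: int, seed: int = 0) -> Tuple[str, List[str]]:
    """Resample and detect preset_number times (claim 3), then vote (claim 8)."""
    rng = random.Random(seed)
    results: List[str] = []
    for _ in range(preset_number):
        samples, labels = balanced_sample(positives, negatives, rng)
        results.append(detect_once(samples, labels, test_vector))
    counts = Counter(results)
    if counts["normal"] > counts["abnormal"]:
        return "normal", results
    if counts["normal"] < counts["abnormal"]:
        return "abnormal", results
    return "undecided", results  # tie: not covered by claim 8 (assumption)


def nearest_mean_stub(samples: List[Vector], labels: List[int], test: Vector) -> str:
    """Stand-in detector: nearest class mean on the first feature."""
    pos = [s[0] for s, y in zip(samples, labels) if y == 1]
    neg = [s[0] for s, y in zip(samples, labels) if y == 0]
    d_pos = abs(test[0] - sum(pos) / len(pos))
    d_neg = abs(test[0] - sum(neg) / len(neg))
    return "normal" if d_pos <= d_neg else "abnormal"


normal = [[float(v)] for v in range(10)]  # toy normal-time samples
abnormal = [[50.0], [60.0], [55.0]]       # toy abnormal-time samples
verdict, rounds = ensemble_detect(normal, abnormal, [4.0], nearest_mean_stub,
                                  preset_number=5)
print(verdict, len(rounds))  # normal 5
```

Choosing an odd preset number of rounds avoids ties whenever every round returns either "normal" or "abnormal", as the stub here always does.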

Priority Applications (1)

Application Number: CN202210857783.XA (published as CN115130606A)
Priority Date: 2022-07-20
Filing Date: 2022-07-20
Title: KPI time sequence detection method and related device

Publications (1)

Publication Number: CN115130606A
Publication Date: 2022-09-30

Family

ID=83383550

Family Applications (1)

Application Number: CN202210857783.XA (published as CN115130606A)
Status: Pending
Priority Date: 2022-07-20
Filing Date: 2022-07-20
Title: KPI time sequence detection method and related device

Country Status (1)

Country Link
CN (1) CN115130606A (en)

Similar Documents

Publication Title
CN111078479B (en) Memory detection model training method, memory detection method and device
US20160241589A1 (en) Method and apparatus for identifying malicious website
CN105867751B (en) Operation information processing method and device
CN110334124B (en) Compression algorithm selection method, device and equipment
CN107171894A (en) The method of terminal device, distributed high in the clouds detecting system and pattern detection
CN112540996A (en) Service data verification method and device, electronic equipment and storage medium
CN111222563A (en) Model training method, data acquisition method and related device
CN115904950A (en) Test case generation method, device, equipment and storage medium
CN113190646B (en) User name sample labeling method and device, electronic equipment and storage medium
CN110796552A (en) Risk prompting method and device
CN114840565A (en) Sampling query method, device, electronic equipment and computer readable storage medium
CN110659179A (en) Method and device for evaluating system running condition and electronic equipment
CN114282169A (en) Abnormal data detection method and related device
CN106484688B (en) Data processing method and system
CN109450853B (en) Malicious website determination method and device, terminal and server
CN109657469B (en) Script detection method and device
CN112182461A (en) Method and device for calculating webpage sensitivity
CN115130606A (en) KPI time sequence detection method and related device
CN115145910A (en) Protocol data management method and related device
CN112711516A (en) Data processing method and related device
CN112053216A (en) Risk management method of financial product and related device
CN117692898B (en) Supervision and early warning method and system with automatic risk identification function
CN109168154B (en) User behavior information collection method and device and mobile terminal
CN113094577B (en) Information display method, related equipment and storage medium
CN110390549B (en) Registration small number identification method, device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination