CN112181792A - Method, system and related assembly for abnormal marking of time sequence data - Google Patents


Info

Publication number
CN112181792A
Authority
CN
China
Prior art keywords
data
machine learning
labeling
abnormal
utilization rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010993829.1A
Other languages
Chinese (zh)
Inventor
苏海明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010993829.1A priority Critical patent/CN112181792A/en
Publication of CN112181792A publication Critical patent/CN112181792A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a method, a system, and related components for anomaly labeling of time series data. In the method, performance data of a running system are collected; the performance data are then judged by an unsupervised machine learning algorithm combined with a supervised machine learning model. This avoids the missed or erroneous labels caused by human factors and can improve the accuracy and efficiency of anomaly labeling of time series data.

Description

Method, system and related assembly for abnormal marking of time sequence data
Technical Field
The present application relates to the field of data processing and statistics, and in particular to a method, a system, and related components for anomaly labeling of time series data.
Background
With the continuous development of science and technology, electronic technology has advanced rapidly and the variety of electronic products keeps growing. Through many kinds of electronic equipment, people enjoy the conveniences that this development brings.
In an electronic product such as a server, CPU and memory data generally need to be monitored, because these data reflect the operating state of the server to some extent. Processing the data reveals abnormal behavior during operation, that is, it supports anomaly diagnosis of the server. To find the characteristics of the monitored data, the data therefore need to be labeled.
Data labeling is currently done mostly by hand, which is tedious. Because different people perceive the same data differently, their labels may disagree, and inaccurate labeling results caused by such disagreement are unavoidable.
Disclosure of Invention
The invention provides a method, a system, and related components for anomaly labeling of time series data, which aim to solve, or partially solve, the technical problem of inaccurate labeling results caused by the existing labeling approach.
To solve the above technical problem, the present invention provides a method for anomaly labeling of time series data, wherein the method comprises:
collecting performance data of system operation;
performing anomaly labeling on the performance data by using an unsupervised machine learning algorithm to obtain first anomaly labeling data;
and judging the first anomaly labeling data by using a supervised machine learning model to obtain second anomaly labeling data, wherein the second anomaly labeling data are a subset of the first anomaly labeling data.
Preferably, the collecting performance data of system operation specifically includes:
collecting the performance data according to collection index items, wherein the index items include: CPU utilization, CPU user utilization, memory utilization, and disk utilization.
Preferably, the performing anomaly labeling on the performance data by using an unsupervised machine learning algorithm to obtain first anomaly labeling data specifically includes:
performing anomaly labeling on the performance data by using at least two unsupervised machine learning algorithms, and taking the union of the anomaly labeling data obtained by the at least two unsupervised machine learning algorithms to obtain the first anomaly labeling data.
Preferably, after the first anomaly labeling data are judged by using the supervised machine learning model to obtain the second anomaly labeling data, the method further includes:
correcting the second anomaly labeling data to obtain correction data;
and performing model optimization on the supervised machine learning model by using the correction data.
The invention discloses a system for abnormal labeling of time series data, which comprises:
the acquisition module is used for acquiring performance data of system operation;
the first processing module is used for carrying out abnormity marking on the performance data by using an unsupervised machine learning algorithm to obtain first abnormity marking data;
and the second processing module is used for judging the first abnormal labeling data by utilizing a supervised machine learning model to obtain second abnormal labeling data, and the second abnormal labeling data belongs to the first abnormal labeling data.
Preferably, the acquisition module is specifically configured to acquire the performance data according to an acquisition index item; wherein the index items include: cpu utilization rate, cpu user utilization rate, memory utilization rate and disk utilization rate.
Preferably, the first processing module is specifically configured to perform anomaly labeling on the performance data by using at least two unsupervised machine learning algorithms, and obtain a union set of anomaly labeling data obtained by the at least two unsupervised machine learning algorithms to obtain the first anomaly labeling data.
Preferably, the system further comprises:
the correction module is used for correcting the second abnormal labeling data to obtain corrected data;
and the model optimization module is used for performing model optimization on the supervised machine learning model by using the correction data.
The invention discloses a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The invention discloses a computer device, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the method when executing the program.
One or more technical solutions of the invention provide the following beneficial effects or advantages:
The invention discloses a method, a system, and related components for anomaly labeling of time series data. In the method, performance data of a running system are collected; the performance data are then judged by an unsupervised machine learning algorithm combined with a supervised machine learning model. This avoids the missed or erroneous labels caused by human factors and can improve the accuracy and efficiency of anomaly labeling of time series data.
The foregoing is only an overview of the technical solutions of the present invention. Specific embodiments are described below so that the technical means of the invention can be understood more clearly and its objects, features, and advantages become more apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a diagram illustrating an implementation process of a method for exception marking of time series data according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a system for exception marking of time-series data according to an embodiment of the present invention.
Detailed Description
In order to make the present application more clearly understood by those skilled in the art to which the present application pertains, the following detailed description of the present application is made with reference to the accompanying drawings by way of specific embodiments.
3-sigma criterion: also known as the empirical rule. If the data follow a normal distribution, the probability that a value falls within (μ - 3σ, μ + 3σ) is approximately 0.9973, where μ denotes the mean and σ denotes the standard deviation.
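The coverage figure for the 3-sigma interval can be checked numerically, and the criterion itself is short to express in code. The following is a minimal sketch using only the Python standard library; the function names and the sample series are illustrative assumptions and do not come from the application.

```python
import math
import statistics


def three_sigma_coverage(k: float = 3.0) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


def three_sigma_outliers(values):
    """Return the indices of points lying more than 3 sigma away from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # constant series: no point can be flagged
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]


coverage = three_sigma_coverage()

# Made-up CPU utilization series: flat at 20 percent with one spike at the end.
series = [20.0] * 59 + [95.0]
outliers = three_sigma_outliers(series)
```

Here `coverage` evaluates to roughly 0.9973, and only the final spike in `series` is flagged.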
Boosting: an ensemble method in supervised learning that reduces bias.
LightGBM: a gradient boosting framework that uses decision-tree-based learning algorithms.
An embodiment of the invention provides a method for anomaly labeling of time series data. To reduce the burden of manual labeling, the method labels automatically: it combines unsupervised and supervised machine learning algorithms to label the performance data of a system, which improves the accuracy and efficiency of anomaly labeling of time series data.
FIG. 1 shows the implementation process of the method for anomaly labeling of time series data according to the embodiment of the present invention. The method specifically comprises the following steps:
Step 101: collect performance data of system operation.
Specifically, performance data are collected according to collection index items, which include CPU utilization, CPU user utilization, memory utilization, and disk utilization. For example, in system performance monitoring of an OpenStack cloud platform, the Telegraf tool may be used to collect performance data of system operation. The collected data mainly cover the real-time usage of index items such as CPU utilization, CPU user utilization, memory utilization, and disk utilization. The default collection period is 60 s, although other values may be used. The collected data are stored in an InfluxDB time series database, from which they can later be retrieved for anomaly point labeling.
Step 102: perform anomaly labeling on the performance data by using an unsupervised machine learning algorithm to obtain first anomaly labeling data.
Before anomaly labeling, a calibration window needs to be determined. For example, with a window size of 180 data points and the default 60 s collection period, each calibration window covers 3 hours of data. During anomaly labeling, the data are divided by calibration window and labeled window by window.
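The window arithmetic can be illustrated with a short sketch: 180 samples at one sample per 60 s span 3 hours. The code below assumes consecutive, non-overlapping windows, which the text does not state explicitly; the helper name `split_into_windows` is hypothetical.

```python
PERIOD_SECONDS = 60   # default collection period from the description
WINDOW_SIZE = 180     # number of samples per calibration window


def split_into_windows(points, window_size=WINDOW_SIZE):
    """Split samples into consecutive, non-overlapping windows (an assumption)."""
    return [points[i:i + window_size] for i in range(0, len(points), window_size)]


# Duration covered by one window, in hours.
window_hours = PERIOD_SECONDS * WINDOW_SIZE / 3600

# One day of one-minute samples (placeholder values) yields 1440 points.
one_day = list(range(24 * 60))
windows = split_into_windows(one_day)
```

With these numbers a full day of data splits into eight windows of three hours each.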
The unsupervised machine learning algorithms used for anomaly labeling include, but are not limited to, the 3-sigma criterion, linear regression, and isolation forest.
To avoid missing abnormal data, the scheme may label the performance data with at least two unsupervised machine learning algorithms and take the union of their results as the first anomaly labeling data. If none of the unsupervised machine learning algorithms detects an abnormal data point, the data are considered to contain no anomalies and are skipped.
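A minimal sketch of this first stage follows, assuming two simple detectors written with the Python standard library: a 3-sigma check on raw values and a check on residuals from a least-squares line. The thresholds, function names, and sample data are illustrative assumptions; the application does not specify how each algorithm is parameterized.

```python
import statistics


def sigma_detector(values, k=3.0):
    """Flag indices whose value is more than k standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mu) > k * sigma}


def regression_detector(values, k=3.0):
    """Fit a least-squares line over the sample index and flag large residuals."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(values)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    if sxx == 0:
        return set()
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / sxx
    residuals = [y - (y_mean + slope * (x - x_mean)) for x, y in enumerate(values)]
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return set()
    return {i for i, r in enumerate(residuals) if abs(r) > k * spread}


def first_stage(values, detectors):
    """Union of the indices flagged by all detectors; an empty set means skip."""
    flagged = set()
    for detect in detectors:
        flagged |= detect(values)
    return flagged


# Steadily rising series with one off-trend point at index 5 (made-up data).
trend = [float(i) for i in range(60)]
trend[5] = 40.0
candidates = first_stage(trend, [sigma_detector, regression_detector])
```

In this example the off-trend point stays inside the global 3-sigma band, so only the regression detector flags it; taking the union keeps it as a candidate, which is the reason the text gives for combining algorithms.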
Step 103: judge the first anomaly labeling data by using a supervised machine learning model to obtain second anomaly labeling data.
Specifically, the supervised machine learning model judges the first anomaly labeling data detected by the unsupervised machine learning algorithms. In general, supervised machine learning has a lower false detection rate than unsupervised machine learning. In this embodiment, the supervised machine learning model is a LightGBM model, which performs the final anomaly labeling. The resulting second anomaly labeling data are a subset of the first anomaly labeling data, so the number of second labels is less than or equal to the number of first labels.
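The subset relationship follows directly from using the supervised model as a filter over the first-stage candidates. The sketch below shows only that filtering structure; `toy_judge` is a hypothetical stand-in for a trained classifier (the embodiment uses LightGBM, which is not called here), and the feature layout is an assumption.

```python
def second_stage(candidates, features, judge):
    """Keep only the first-stage candidates that the supervised judge confirms."""
    return {i for i in candidates if judge(features[i])}


def toy_judge(feature_row):
    """Hypothetical stand-in for a trained model's prediction on one point."""
    return feature_row["cpu"] > 90.0


# Made-up feature rows keyed by sample index.
features = {3: {"cpu": 97.0}, 8: {"cpu": 55.0}, 12: {"cpu": 92.5}}
first = {3, 8, 12}
second = second_stage(first, features, toy_judge)
```

Because `second_stage` can only drop candidates and never adds new ones, `second` is always a subset of `first`.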
The data calibration of this embodiment thus borrows the idea of Boosting. In the unsupervised stage, the abnormal data detected by several algorithms are combined by union to avoid missed anomalies, and the detected abnormal data are then passed to the LightGBM model for the final decision.
After labeling, all pre-labeled second anomaly labeling data are stored in a MySQL database. The stored data are the labeling results of the supervised machine learning model; for each abnormal point, the time point, the abnormal value, the data curve label, and similar fields are written to the MySQL database.
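One possible shape for the stored records is sketched below. To keep the example self-contained it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL; the table name, column names, and sample rows are all hypothetical and are not taken from the application.

```python
import sqlite3

# In-memory SQLite database used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE anomaly_label (
        curve_label TEXT    NOT NULL,
        ts          INTEGER NOT NULL,
        value       REAL    NOT NULL,
        corrected   INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (curve_label, ts)
    )
    """
)

# Each row: data curve label, time point (epoch seconds), abnormal value.
rows = [
    ("cpu_util", 1600650000, 97.0),
    ("cpu_util", 1600650540, 92.5),
]
conn.executemany(
    "INSERT INTO anomaly_label (curve_label, ts, value) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

stored = conn.execute("SELECT COUNT(*) FROM anomaly_label").fetchone()[0]
```

The `corrected` column is an assumed convenience for marking rows that were later changed by hand.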
As an alternative embodiment, after the first anomaly labeling data are judged by the supervised machine learning model to obtain the second anomaly labeling data, the second anomaly labeling data are corrected to obtain correction data. In machine learning, whether supervised or unsupervised, results are never one hundred percent accurate, so the machine labels may be corrected manually. The invention provides an interactive interface that loads the original data together with the pre-labeled data; an operator then corrects the pre-labeled data by hand and fixes any labeling errors. The correction data are also stored in the MySQL database.
The supervised machine learning model is then optimized with the correction data. Because the optimized model labels more accurately, it replaces the previous supervised machine learning model when judging the first anomaly labeling data. Note that the first anomaly labeling data here are the data labeled by the unsupervised machine learning algorithms for the next calibration window.
To raise the quality of machine-labeled data and reduce the manual workload in calibration, LightGBM is optimized with the manually corrected data, which amounts to retraining. As the data volume grows, the trained model covers more data patterns and becomes more capable, and the machine labeling results improve.
The model can therefore be optimized after each round of labeling and the optimized model used to label the next batch of data, so that its labeling capability and accuracy improve step by step. Replacing manual labeling with automatic labeling also greatly improves labeling efficiency.
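The correct-then-retrain loop can be sketched with a deliberately tiny model. In the code below a single learned cut-off value stands in for the supervised model (the embodiment retrains LightGBM instead); `fit_threshold` and the sample data are hypothetical and serve only to show how a manual correction changes the model used for the next window.

```python
def fit_threshold(samples):
    """Toy 'training': choose the cut-off that classifies the most samples correctly.

    samples is a list of (value, is_anomaly) pairs; a value is predicted
    anomalous when it is greater than or equal to the cut-off.
    """
    cuts = sorted({v for v, _ in samples})
    best_cut, best_correct = cuts[0], -1
    for cut in cuts:
        correct = sum((v >= cut) == label for v, label in samples)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


# Initial labeled data (made up): only the 95.0 reading is an anomaly.
training = [(50.0, False), (60.0, False), (95.0, True)]
initial_cut = fit_threshold(training)

# The model missed an 80.0 reading; an operator corrects that label by hand.
corrections = [(80.0, True)]

# Retrain on the original data plus the corrections for the next window.
updated_cut = fit_threshold(training + corrections)
```

After retraining, the cut-off moves from 95.0 down to 80.0, so a similar reading in the next calibration window would be confirmed as an anomaly.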
Therefore, the method combines unsupervised and supervised machine learning algorithms to perform exception labeling on the stored performance data under the OpenStack cloud computing framework, and can improve the accuracy and efficiency of exception labeling of the time series data.
Based on the same inventive concept, the following embodiments describe a system for exception marking of time series data, referring to fig. 2, including:
the acquisition module 201 is used for acquiring performance data of system operation;
the first processing module 202 is configured to perform exception labeling on the performance data by using an unsupervised machine learning algorithm to obtain first exception labeling data;
the second processing module 203 is configured to determine the first abnormal labeling data by using a supervised machine learning model to obtain second abnormal labeling data, where the second abnormal labeling data is subordinate to the first abnormal labeling data.
As an optional embodiment, the acquisition module 201 is specifically configured to acquire performance data according to an acquisition index item; wherein, the index item includes: cpu utilization rate, cpu user utilization rate, memory utilization rate and disk utilization rate.
As an optional embodiment, the first processing module 202 is specifically configured to perform anomaly labeling on the performance data by using at least two unsupervised machine learning algorithms, and obtain a union set of anomaly labeling data obtained by the at least two unsupervised machine learning algorithms, so as to obtain first anomaly labeling data.
As an alternative embodiment, the system further comprises:
the correction module is used for correcting the second abnormal labeling data to obtain corrected data;
and the model optimization module is used for performing model optimization on the supervised machine learning model by using the correction data.
As an alternative embodiment, the system further comprises a storage module, configured to store the performance data in the InfluxDB time series database and to store the second abnormal labeling data in the MySQL database.
Based on the same inventive concept as in the previous embodiments, the embodiments of the present invention further provide a related component, in particular, a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of any of the foregoing methods.
Based on the same inventive concept as in the previous embodiments, an embodiment of the present invention further provides a related component, specifically a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of any one of the foregoing methods.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for exception marking of time series data, the method comprising:
collecting performance data of system operation;
carrying out abnormity marking on the performance data by using an unsupervised machine learning algorithm to obtain first abnormity marking data;
and judging the first abnormal labeling data by using a supervised machine learning model to obtain second abnormal labeling data, wherein the second abnormal labeling data is subordinate to the first abnormal labeling data.
2. The method of claim 1, wherein the collecting performance data of the system operation specifically comprises:
collecting the performance data according to a collection index item; wherein the index items include: cpu utilization rate, cpu user utilization rate, memory utilization rate and disk utilization rate.
3. The method of claim 1, wherein the performing anomaly labeling on the performance data by using an unsupervised machine learning algorithm to obtain first anomaly labeled data specifically comprises:
and carrying out abnormity labeling on the performance data by utilizing at least two unsupervised machine learning algorithms, and merging the abnormity labeled data obtained by the at least two unsupervised machine learning algorithms to obtain the first abnormity labeled data.
4. The method of claim 1, wherein after determining the first anomaly marking data using the supervised machine learning model and obtaining the second anomaly marking data, the method further comprises:
correcting the second abnormal labeling data to obtain corrected data;
and performing model optimization on the supervised machine learning model by using the correction data.
5. A system for exception tagging of time series data, comprising:
the acquisition module is used for acquiring performance data of system operation;
the first processing module is used for carrying out abnormity marking on the performance data by using an unsupervised machine learning algorithm to obtain first abnormity marking data;
and the second processing module is used for judging the first abnormal labeling data by utilizing a supervised machine learning model to obtain second abnormal labeling data, and the second abnormal labeling data belongs to the first abnormal labeling data.
6. The system of claim 5, wherein the collection module is specifically configured to collect the performance data according to a collection index item; wherein the index items include: cpu utilization rate, cpu user utilization rate, memory utilization rate and disk utilization rate.
7. The system of claim 5, wherein the first processing module is specifically configured to perform anomaly labeling on the performance data by using at least two unsupervised machine learning algorithms, and obtain the first anomaly labeling data by merging sets of anomaly labeling data obtained by the at least two unsupervised machine learning algorithms respectively.
8. The system of claim 5, wherein the system further comprises:
the correction module is used for correcting the second abnormal labeling data to obtain corrected data;
and the model optimization module is used for performing model optimization on the supervised machine learning model by using the correction data.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 4 when executing the program.
CN202010993829.1A 2020-09-21 2020-09-21 Method, system and related assembly for abnormal marking of time sequence data Withdrawn CN112181792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010993829.1A CN112181792A (en) 2020-09-21 2020-09-21 Method, system and related assembly for abnormal marking of time sequence data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010993829.1A CN112181792A (en) 2020-09-21 2020-09-21 Method, system and related assembly for abnormal marking of time sequence data

Publications (1)

Publication Number Publication Date
CN112181792A true CN112181792A (en) 2021-01-05

Family

ID=73955623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010993829.1A Withdrawn CN112181792A (en) 2020-09-21 2020-09-21 Method, system and related assembly for abnormal marking of time sequence data

Country Status (1)

Country Link
CN (1) CN112181792A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553239A (en) * 2021-07-27 2021-10-26 重庆紫光华山智安科技有限公司 Abnormal data detection method and related device
CN116956282A (en) * 2023-06-07 2023-10-27 广州天懋信息系统股份有限公司 Abnormality detection system based on network asset memory time sequence multi-feature data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111352971A (en) * 2020-02-28 2020-06-30 中国工商银行股份有限公司 Bank system monitoring data anomaly detection method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111352971A (en) * 2020-02-28 2020-06-30 中国工商银行股份有限公司 Bank system monitoring data anomaly detection method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553239A (en) * 2021-07-27 2021-10-26 重庆紫光华山智安科技有限公司 Abnormal data detection method and related device
CN113553239B (en) * 2021-07-27 2023-02-28 重庆紫光华山智安科技有限公司 Abnormal data detection method and related device
CN116956282A (en) * 2023-06-07 2023-10-27 广州天懋信息系统股份有限公司 Abnormality detection system based on network asset memory time sequence multi-feature data
CN116956282B (en) * 2023-06-07 2024-02-06 广州天懋信息系统股份有限公司 Abnormality detection system based on network asset memory time sequence multi-feature data

Similar Documents

Publication Publication Date Title
CN111459778B (en) Operation and maintenance system abnormal index detection model optimization method, device and storage medium
CN111641519B (en) Abnormal root cause positioning method, device and storage medium
US20190155672A1 (en) Real-time anomaly detection and correlation of time-series data
CN109241997B (en) Method and device for generating training set
CN112181792A (en) Method, system and related assembly for abnormal marking of time sequence data
CN115698882A (en) Abnormal modulation cause identification device, abnormal modulation cause identification method, and abnormal modulation cause identification program
CN109308225B (en) Virtual machine abnormality detection method, device, equipment and storage medium
US20190265088A1 (en) System analysis method, system analysis apparatus, and program
US20180307218A1 (en) System and method for allocating machine behavioral models
US20210374634A1 (en) Work efficiency evaluation method, work efficiency evaluation apparatus, and program
WO2021241580A1 (en) Abnormality/irregularity cause identifying apparatus, abnormality/irregularity cause identifying method, and abnormality/irregularity cause identifying program
CN114357858B (en) Equipment degradation analysis method and system based on multitask learning model
CN110232130B (en) Metadata management pedigree generation method, apparatus, computer device and storage medium
CN110543869A (en) Ball screw service life prediction method and device, computer equipment and storage medium
US20230229136A1 (en) Abnormal irregularity cause identifying device, abnormal irregularity cause identifying method, and abnormal irregularity cause identifying program
CN117193088B (en) Industrial equipment monitoring method and device and server
CN117453763A (en) Data processing method, recording medium and system for dam safety monitoring
CN117592656A (en) Carbon footprint monitoring method and system based on carbon data accounting
CN106021115A (en) Non-supervision defect prediction method based on probabilities
CN117591860A (en) Data anomaly detection method and device
CN116230586B (en) Commonality analysis method and terminal of wafer manufacturing machine unit
CN116910499A (en) System state monitoring method and device, electronic equipment and readable storage medium
US20220405161A1 (en) Data selection assist device and data selection assist method
CN114528906A (en) Fault diagnosis method, device, equipment and medium for rotary machine
CN109165108B (en) Failure data reduction method and test method for software reliability accelerated test

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20210105