CN116244139A - Alarm self-healing method and system based on time sequence data - Google Patents

Alarm self-healing method and system based on time sequence data

Info

Publication number
CN116244139A
Authority
CN
China
Prior art keywords
alarm
time
self
healing
time sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211672565.5A
Other languages
Chinese (zh)
Inventor
王伟斌
李超德
刘宁
陈传凯
杨小华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xinshu Technology Co ltd
Original Assignee
Beijing Xinshu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-12-24
Filing date
2022-12-24
Publication date
2023-06-09
Application filed by Beijing Xinshu Technology Co ltd
Priority to CN202211672565.5A
Publication of CN116244139A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an alarm self-healing method and system based on time series data, applied mainly in the field of databases. To monitor the key performance state of a running database, the values of its performance indicators are collected periodically, forming one time series for each key performance indicator. Each time series is analyzed so that an abnormality in one or more indicators is detected promptly and an alarm is raised. After the alarm, the indicator value continues to be monitored and the alarm state is updated promptly.

Description

Alarm self-healing method and system based on time sequence data
Technical Field
The invention relates to the field of databases, and in particular to an alarm self-healing method and system based on time series data.
Background
Time series data is a sequence of values of the same indicator recorded in chronological order. Such data is now very common in economics, biology, information technology and other fields. Time series data often conceals regularities in how the data changes over time; if these regularities can be discovered and exploited, they can effectively promote development in many fields. For example, a composite stock price index, common in economics, is a time series that reflects the overall trend of share prices.
In the database field, many indicators reflect the performance state of a database. If the values of certain key performance indicators are collected on schedule, multiple time series are obtained, each reflecting the trend of one indicator. Based on the collected indicator values, abnormal database states can be detected promptly, and an alarm can be raised as soon as one occurs. Many methods and systems for alarming on abnormal database performance states already exist. However, once an abnormality has occurred and alarms have been generated, these methods lack any management of those alarms.
Disclosure of Invention
To solve the above technical problem in the prior art, the invention discloses an alarm self-healing method based on time series data, comprising the following steps:
(1) Data acquisition: collect key indicator values and store them in a local database;
specifically, K indicators in the target database system need to be monitored, yielding K time series denoted $S_1, S_2, \dots, S_K$; the values of the i-th time series at times $1, 2, \dots, t$ are denoted $s_{i1}, s_{i2}, \dots, s_{it}$, where $1 \le i \le K$; values are collected at a fixed time interval, and the collection interval is specified by the user;
(2) Abnormality detection: analyze each time series using machine learning or artificial intelligence to find abnormalities in the series promptly;
(3) Alarming: when a time series is found to be abnormal, raise an alarm in the system promptly and notify the user by e-mail or mobile phone text message;
(4) Self-healing: after the system raises an alarm, continue monitoring the corresponding performance indicator according to a formulated strategy, and cancel the alarm if the indicator value is found to be normal; otherwise, keep the alarm, modify the strategy, and continue monitoring the alarmed indicator according to the modified strategy.
Specifically, after the i-th indicator becomes abnormal at time t and triggers an alarm, the m-th self-healing detection is performed at time t+n; if that detection finds that the i-th indicator value has not returned to normal, the next self-healing detection is performed at time $t_{new} = t + n + f(m, l_i)$, where $f(m, l_i)$ is the time-calculation function and $l_i$ is the alarm level of the i-th indicator; alarm levels are serious, general and slight, with values 3, 2 and 1 respectively; $f(m, l_i) = \mathrm{roundup}(\min(2^m, 1000)/l_i)$, where roundup() rounds up to the nearest integer and min takes the minimum.
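As a short worked illustration of the formula above (the specific values of m are chosen here only as examples; roundup is written as a ceiling), the delay grows roughly exponentially with m, is shorter for more serious alarms, and stops growing once $2^m$ exceeds 1000, which first happens at m = 10:

```latex
\begin{aligned}
l_i = 3:&\quad f(1,3)=\lceil 2/3\rceil = 1,\; f(2,3)=\lceil 4/3\rceil = 2,\; f(3,3)=\lceil 8/3\rceil = 3,\; f(4,3)=\lceil 16/3\rceil = 6\\
l_i = 1:&\quad f(1,1)=2,\; f(2,1)=4,\; f(3,1)=8,\; f(4,1)=16\\
m \ge 10:&\quad \min(2^m,1000)=1000 \;\Rightarrow\; f(m,3)=334,\; f(m,2)=500,\; f(m,1)=1000
\end{aligned}
```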
To implement the method, the invention also discloses an alarm self-healing system based on time series data, comprising a data acquisition module, an abnormality detection module, an alarm module and a self-healing module. Specifically:
(1) The data acquisition module collects key indicator values and stores them in the local database;
(2) The abnormality detection module analyzes each time series using machine learning or artificial intelligence and finds abnormalities in the series promptly;
(3) The alarm module raises an alarm in the system promptly when a time series is found to be abnormal, and notifies the user by e-mail or mobile phone text message;
(4) The self-healing module, after the system raises an alarm, continues monitoring the corresponding performance indicator according to a formulated strategy and cancels the alarm if the indicator value is found to be normal; otherwise, it keeps the alarm, modifies the strategy, and continues monitoring the alarmed indicator according to the modified strategy.
The alarm self-healing method and system disclosed by the invention are applied mainly in the field of databases. To monitor the key performance state of a running database, the values of its performance indicators are collected periodically, forming one time series for each key performance indicator. Each time series is analyzed so that an abnormality in one or more indicators is detected promptly and an alarm is raised. After the alarm, the indicator value continues to be monitored and the alarm state is updated promptly.
Drawings
FIG. 1 shows the steps of the process of the present invention.
FIG. 2 is a schematic diagram of the system of the present invention.
Detailed Description
In the invention, suppose K indicators in a target database system are to be monitored (K is a positive integer, i.e. at least 1). K time series are obtained correspondingly, denoted $S_1, S_2, \dots, S_K$. The values of the i-th time series at times $1, 2, \dots, t$ are denoted $s_{i1}, s_{i2}, \dots, s_{it}$, where $1 \le i \le K$ and t is a time index referring to the t-th collection of the performance indicator. Collections are made at a fixed time interval, which can be specified by the user.
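The notation above can be pictured as one list per indicator, indexed by collection round. The following is a minimal sketch only; the class name `IndicatorStore`, its methods, and the indicator names are hypothetical and do not come from the patent, which stores the values in a local database rather than in memory.

```python
class IndicatorStore:
    """Holds one time series per monitored indicator (S_1 ... S_K in the text)."""

    def __init__(self, indicator_names):
        # One list per indicator; position t-1 holds the value s_it
        # obtained in the t-th collection round.
        self.series = {name: [] for name in indicator_names}

    def append_sample(self, sample):
        """Append one collection round: a mapping of indicator name -> value."""
        missing = set(self.series) - set(sample)
        if missing:
            raise ValueError(f"missing indicators: {sorted(missing)}")
        for name in self.series:
            self.series[name].append(sample[name])

    def value_at(self, name, t):
        """Return s_it for the 1-based collection index t."""
        if t < 1:
            raise IndexError("t is 1-based")
        return self.series[name][t - 1]


# Hypothetical usage with K = 2 indicators.
store = IndicatorStore(["connections", "cpu_load"])
store.append_sample({"connections": 100, "cpu_load": 0.4})
store.append_sample({"connections": 98, "cpu_load": 0.5})
print(store.value_at("connections", 2))
```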
During data acquisition, newly collected values are appended to each time series as time passes. While the time series is being updated, the abnormality detection module performs abnormality detection on it using an LSTM (Long Short-Term Memory) method. If the i-th indicator is abnormal at some time (say time t), control passes immediately to the alarm module, which raises an alarm and notifies the user by e-mail, mobile phone text message or similar means.
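To make the detection step concrete without reproducing the LSTM model, which the text does not specify in detail, the sketch below swaps in a much simpler statistical check: a value is flagged when it lies more than k sample standard deviations from the mean of recent normal values. The function name, the threshold `k=3.0` and the `min_points` parameter are assumptions for illustration only.

```python
from statistics import mean, stdev


def is_anomalous(history, value, k=3.0, min_points=3):
    """Flag `value` if it deviates from the mean of `history` by more than
    k sample standard deviations.

    This is a simple statistical stand-in, NOT the LSTM detector named in the text.
    """
    if len(history) < max(min_points, 2):
        return False  # too little data to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # flat history: any change counts as a deviation
    return abs(value - mu) > k * sigma


# Baseline taken from the connection-count values used later in the text.
baseline = [100, 98, 96, 102]
print(is_anomalous(baseline, 300))  # True
print(is_anomalous(baseline, 100))  # False
```

Keeping only values judged normal in `history` is one possible design choice; it prevents an ongoing abnormality from being absorbed into the baseline.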
After the alarm module raises an alarm, data acquisition continues, so the time series goes on recording indicator data after time t. At this point the self-healing module starts to work and performs self-healing monitoring on the indicator that generated the alarm. To keep the system efficient, a self-healing strategy is formulated so that the system avoids running abnormality detection every time a single data point is collected.
After the i-th indicator becomes abnormal at time t and triggers an alarm, the value of the indicator still has to be collected and each checked value judged as normal or abnormal. If the value is normal, the alarm is canceled and the self-healing process ends; if the value is still abnormal, the time at which a later collected value will be judged must be calculated, and the indicator is checked again at that time.
Suppose the i-th indicator becomes abnormal at time t and triggers an alarm, the m-th self-healing detection is performed at time t+n, and that detection finds that the i-th indicator has not returned to normal. The time of the next self-healing detection is then calculated as:
$$t_{new} = t + n + f(m, l_i),$$
where $f(m, l_i)$ is the time-calculation function and $l_i$ is the alarm level of the i-th indicator; the levels serious, general and slight correspond to the values 3, 2 and 1 respectively. $f(m, l_i)$ is calculated as:
$$f(m, l_i) = \mathrm{roundup}\left(\frac{\min(2^m, 1000)}{l_i}\right),$$
where roundup() rounds up to the nearest integer and min takes the minimum.
Suppose the database connection count indicator has the values 100, 98, 96, 102 and 300 at times 1, 2, 3, 4 and 5 respectively. The abnormality detection module finds that the connection count at time 5 is abnormal, and the alarm module triggers an alarm. The alarm is a serious alarm, so the corresponding $l_i$ is 3. The first self-healing detection is then scheduled: m=1 and $f(m, l_i)$ = 1. When the system collects the connection count at time 6, abnormality detection is performed again. If the value at time 6 is 500 and the abnormality detection module finds that it is still abnormal, the second self-healing detection is prepared: m=2 gives $f(m, l_i)$ = 2, so self-healing detection is performed again on the indicator value at time 6+2, i.e. time 8. Suppose the value at time 8 is 100 and the abnormality detection module finds it normal; the state of the indicator is then marked as normal and the corresponding alarm is canceled.
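The worked example can be replayed in a few lines. This sketch follows the example's own convention, in which m counts the upcoming detection starting from 1; the function `self_heal_schedule`, the fixed threshold of 200 used to decide abnormality, and the `max_checks` guard are all hypothetical stand-ins for the detection module and are not part of the patent.

```python
def recheck_delay(m, level):
    """f(m, l_i) = roundup(min(2**m, 1000) / l_i)."""
    return -(-min(2 ** m, 1000) // level)


def self_heal_schedule(alarm_time, level, is_abnormal_at, max_checks=50):
    """Return (check_times, healed) for one alarmed indicator.

    is_abnormal_at -- callable taking a time index and returning True
                      while the indicator is still abnormal.
    """
    checks = []
    m = 1
    t = alarm_time + recheck_delay(m, level)
    while len(checks) < max_checks:
        checks.append(t)
        if not is_abnormal_at(t):
            return checks, True  # indicator recovered: cancel the alarm
        m += 1
        t += recheck_delay(m, level)
    return checks, False  # gave up after max_checks detections


# Values from the example; only the checked times 6 and 8 are needed.
observed = {6: 500, 8: 100}
THRESHOLD = 200  # hypothetical stand-in for the detection module

checks, healed = self_heal_schedule(5, 3, lambda t: observed[t] > THRESHOLD)
print(checks, healed)  # [6, 8] True
```

Time 7 is never inspected, which is the point of the strategy: detection runs only at the scheduled times rather than on every collected value.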

Claims (3)

1. An alarm self-healing method based on time series data, characterized by comprising the following steps:
(1) Data acquisition: collect key indicator values and store them in a local database; K indicators in the target database system need to be monitored, yielding K time series denoted $S_1, S_2, \dots, S_K$; the values of the i-th time series at times $1, 2, \dots, t$ are denoted $s_{i1}, s_{i2}, \dots, s_{it}$, where $1 \le i \le K$; values are collected at a fixed time interval, and the collection interval is specified by the user;
(2) Abnormality detection: analyze each time series using machine learning or artificial intelligence to find abnormalities in the series promptly;
(3) Alarming: when a time series is found to be abnormal, raise an alarm in the system promptly and notify the user by e-mail or mobile phone text message;
(4) Self-healing: after the system raises an alarm, continue monitoring the corresponding performance indicator according to a formulated strategy, and cancel the alarm if the indicator value is found to be normal; otherwise, keep the alarm, modify the strategy, and continue monitoring the alarmed indicator according to the modified strategy.
2. The alarm self-healing method based on time series data as claimed in claim 1, wherein in step (4), after the i-th indicator becomes abnormal at time t and triggers an alarm, the m-th self-healing detection is performed at time t+n, and when that detection finds that the i-th indicator has not returned to normal, the next self-healing detection is performed at time $t_{new} = t + n + f(m, l_i)$, where $f(m, l_i)$ is the time-calculation function and $l_i$ is the alarm level of the i-th indicator; alarm levels are serious, general and slight, with values 3, 2 and 1 respectively;
$$f(m, l_i) = \mathrm{roundup}\left(\frac{\min(2^m, 1000)}{l_i}\right),$$
where roundup() rounds up to the nearest integer and min takes the minimum.
3. An alarm self-healing system based on time series data, characterized by comprising a data acquisition module, an abnormality detection module, an alarm module and a self-healing module, specifically:
(1) The data acquisition module collects key indicator values and stores them in the local database;
(2) The abnormality detection module analyzes each time series using machine learning or artificial intelligence and finds abnormalities in the series promptly;
(3) The alarm module raises an alarm in the system promptly when a time series is found to be abnormal, and notifies the user by e-mail or mobile phone text message;
(4) The self-healing module, after the system raises an alarm, continues monitoring the corresponding performance indicator according to a formulated strategy and cancels the alarm if the indicator value is found to be normal; otherwise, it keeps the alarm, modifies the strategy, and continues monitoring the alarmed indicator according to the modified strategy.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211672565.5A CN116244139A (en) 2022-12-24 2022-12-24 Alarm self-healing method and system based on time sequence data

Publications (1)

Publication Number Publication Date
CN116244139A (en) 2023-06-09

Family

ID=86628693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211672565.5A Pending CN116244139A (en) 2022-12-24 2022-12-24 Alarm self-healing method and system based on time sequence data

Country Status (1)

Country Link
CN (1) CN116244139A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178456A (en) * 2020-01-15 2020-05-19 腾讯科技(深圳)有限公司 Abnormal index detection method and device, computer equipment and storage medium
CN111459695A (en) * 2020-03-12 2020-07-28 平安科技(深圳)有限公司 Root cause positioning method and device, computer equipment and storage medium
CN111858231A (en) * 2020-05-11 2020-10-30 北京必示科技有限公司 Single index abnormality detection method based on operation and maintenance monitoring
CN113946492A (en) * 2021-10-19 2022-01-18 平安普惠企业管理有限公司 Intelligent operation and maintenance method, device, equipment and storage medium
CN115454778A (en) * 2022-09-27 2022-12-09 浙江大学 Intelligent monitoring system for abnormal time sequence indexes in large-scale cloud network environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination