CN115269315A - Abnormity detection method, device, equipment and medium - Google Patents

Info

Publication number
CN115269315A
Authority
CN
China
Prior art keywords
data
characteristic data
data set
historical
real
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210838294.XA
Other languages
Chinese (zh)
Inventor
徐修颖
茅逸斐
熊慧君
国欣宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210838294.XA
Publication of CN115269315A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
        • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
            • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
        • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
        • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data

Abstract

The present disclosure provides an anomaly detection method, apparatus, device, and medium, which can be applied in the fields of big data technology and financial technology. The anomaly detection method includes: acquiring real-time system operation data, historical normal-operation feature data and historical abnormal-operation feature data from a database monitoring system; standardizing the real-time system operation data to obtain real-time system operation feature data; constructing a first feature data set, a second feature data set, a third feature data set and a fourth feature data set from the historical normal-operation feature data and the historical abnormal-operation feature data; inputting the real-time system operation feature data, the first feature data set and the second feature data set into a first classifier to obtain a first classification result; inputting the real-time system operation feature data, the third feature data set and the fourth feature data set into a second classifier to obtain a second classification result; and generating an anomaly detection result according to the first classification result and the second classification result.

Description

Abnormity detection method, device, equipment and medium
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, a medium, and a program product for anomaly detection.
Background
At present, evaluating the health of the MySQL databases in a data center is a difficult problem. To some extent, the health of a database system can be represented by a set of indicators. As database systems grow in scale and complexity and monitoring coverage improves, the volume of monitoring data keeps increasing, and operation and maintenance personnel cannot quickly find quality problems in such massive monitoring data. At the present stage, database health evaluation relies mainly on expert experience: highly skilled database experts must evaluate and sort out data quality and fault workflows based on their domain knowledge and working experience in order to establish a dedicated evaluation system. As a result, the manual configuration cost is high and timeliness is poor.
Disclosure of Invention
In view of the foregoing, the present disclosure provides an anomaly detection method, apparatus, device, medium, and program product.
According to an aspect of the present disclosure, there is provided an abnormality detection method including:
acquiring real-time system operation data, historical normal-operation feature data and historical abnormal-operation feature data from a database monitoring system;
standardizing the real-time system operation data to obtain real-time system operation feature data;
constructing a first feature data set, a second feature data set, a third feature data set and a fourth feature data set from the historical normal-operation feature data and the historical abnormal-operation feature data, wherein the first feature data set comprises first neighbor sample data and the historical abnormal-operation feature data, the first neighbor sample data being the data in the historical normal-operation feature data that are neighbors of the historical abnormal-operation feature data; the second feature data set comprises the data remaining after the first neighbor sample data is removed from the historical normal-operation feature data; the third feature data set comprises second neighbor sample data, the second neighbor sample data being the data in the first feature data set that are neighbors of the historical abnormal-operation feature data; and the fourth feature data set comprises the historical abnormal-operation feature data;
inputting the real-time system operation feature data, the first feature data set and the second feature data set into a first classification model to obtain a first classification result;
inputting the real-time system operation feature data, the third feature data set and the fourth feature data set into a second classification model to obtain a second classification result; and
generating an anomaly detection result according to the first classification result and the second classification result.
According to an embodiment of the present disclosure, a training method of a first classification model includes:
inputting the first feature data set and the second feature data set into a first initial classification model for training to obtain a first training classification result;
constructing a first confusion matrix from the first training classification result, wherein the first confusion matrix comprises first classification result data, second classification result data and third classification result data; the first classification result data represents the number of samples of the historical abnormal-operation feature data classified into the first feature data set, the second classification result data represents the number of samples of the historical abnormal-operation feature data classified into the second feature data set, and the third classification result data represents the number of samples of the historical normal-operation feature data classified into the first feature data set;
generating first classification performance index data of the first initial classification model from the first classification result data, the second classification result data and the third classification result data; and
obtaining the trained first classification model when the first classification performance index data satisfies a first preset condition.
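The disclosure does not name the performance index. One common reading treats the first feature data set as the positive class. Under that reading, the three counts are true positives (first classification result data), false negatives (second) and false positives (third). The sketch below assumes that interpretation and derives precision, recall and F1, then checks a hypothetical "first preset condition". The class name `FirstConfusionCounts` and the thresholds `min_recall` and `min_f1` are illustrative placeholders, not values from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class FirstConfusionCounts:
    tp: int  # first classification result data: abnormal samples put into the first set
    fn: int  # second classification result data: abnormal samples put into the second set
    fp: int  # third classification result data: normal samples put into the first set


def performance_indices(c: FirstConfusionCounts) -> dict:
    """Precision, recall and F1 with the first feature data set as the positive class."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def meets_preset_condition(indices: dict, min_recall: float = 0.9, min_f1: float = 0.8) -> bool:
    """Hypothetical 'first preset condition'; both thresholds are placeholders."""
    return indices["recall"] >= min_recall and indices["f1"] >= min_f1


# Example: 45 abnormal samples grouped correctly, 5 missed, 10 normal samples pulled in.
indices = performance_indices(FirstConfusionCounts(tp=45, fn=5, fp=10))
print(indices, meets_preset_condition(indices))
```

Weighting recall in the condition reflects a typical monitoring preference for not missing abnormal samples. The disclosure itself leaves the choice open.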
According to an embodiment of the present disclosure, a training method of a second classification model includes:
inputting the third feature data set and the fourth feature data set into a second initial classification model for training to obtain a second training classification result;
constructing a second confusion matrix from the second training classification result, wherein the second confusion matrix comprises fourth classification result data, fifth classification result data and sixth classification result data; the fourth classification result data represents the number of samples of the historical abnormal-operation feature data classified into the fourth feature data set, the fifth classification result data represents the number of samples of the historical abnormal-operation feature data classified into the third feature data set, and the sixth classification result data represents the number of samples of the historical normal-operation feature data classified into the fourth feature data set;
generating second classification performance index data of the second initial classification model from the fourth classification result data, the fifth classification result data and the sixth classification result data; and
obtaining the trained second classification model when the second classification performance index data satisfies a second preset condition.
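This minimal sketch counts the fourth, fifth and sixth classification result data from the second initial model's training output. It assumes each training sample is represented by a pair: whether the sample comes from the historical abnormal-operation feature data, and the id of the set the model assigned it to. The set ids `THIRD_SET`/`FOURTH_SET` and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

THIRD_SET, FOURTH_SET = 3, 4  # hypothetical ids for the second model's two classes


def second_confusion_counts(results: Iterable[Tuple[bool, int]]) -> Tuple[int, int, int]:
    """Return (fourth, fifth, sixth) classification result data.

    Each result is (is_historical_abnormal, predicted_set) for one training sample.
    """
    counts = Counter(results)
    fourth = counts[(True, FOURTH_SET)]  # abnormal samples classified into the fourth set
    fifth = counts[(True, THIRD_SET)]    # abnormal samples classified into the third set
    sixth = counts[(False, FOURTH_SET)]  # normal samples classified into the fourth set
    return fourth, fifth, sixth


training_results = [(True, 4), (True, 4), (True, 3), (False, 4), (False, 3), (False, 3)]
print(second_confusion_counts(training_results))
```

If the fourth feature data set is treated as the positive class, precision would be fourth / (fourth + sixth) and recall would be fourth / (fourth + fifth). Normal samples classified into the third set (true negatives) are not part of the three counts named in the disclosure.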
According to an embodiment of the present disclosure, standardizing the real-time system operation data to obtain the real-time system operation feature data includes:
classifying the real-time system operation data by type to obtain M real-time system operation data sets, where M is a positive integer;
for each real-time system operation data set, calculating the mean and the standard deviation of the real-time system operation data in that set; and
generating the real-time system operation feature data from the real-time system operation data in the set, the mean and the standard deviation.
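The three steps above amount to a per-type z-score, $z = (x - \mu) / \sigma$. The sketch below assumes each real-time record is a (metric type, value) pair, so each distinct type forms one of the M data sets. It uses the population standard deviation. The disclosure specifies neither the estimator nor how a zero standard deviation is handled; here a constant series maps to 0.0.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def standardize_by_type(records: List[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group records by metric type (the M data sets) and z-score each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for metric_type, value in records:
        groups[metric_type].append(value)

    features: Dict[str, List[float]] = {}
    for metric_type, values in groups.items():
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # population standard deviation (assumption)
        # A constant series carries no spread information; map it to 0.0 (assumption).
        features[metric_type] = [(v - mean) / std if std > 0 else 0.0 for v in values]
    return features


records = [("cpu_pct", 10.0), ("cpu_pct", 20.0), ("cpu_pct", 30.0),
           ("conn_pct", 50.0), ("conn_pct", 50.0)]
print(standardize_by_type(records))
```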
According to an embodiment of the present disclosure, constructing the first, second, third and fourth feature data sets from the historical normal-operation feature data and the historical abnormal-operation feature data includes:
acquiring, based on a K-nearest-neighbor algorithm, first neighbor sample data from the historical normal-operation feature data according to the historical abnormal-operation feature data;
constructing the first feature data set from the first neighbor sample data and the historical abnormal-operation feature data;
removing the first neighbor sample data from the historical normal-operation feature data to obtain the second feature data set;
collecting, based on the K-nearest-neighbor algorithm, second neighbor sample data from the first feature data set according to the historical abnormal-operation feature data;
constructing the third feature data set from the second neighbor sample data; and
constructing the fourth feature data set from the historical abnormal-operation feature data.
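The sketch below shows one possible reading of these construction steps, using brute-force Euclidean k-nearest-neighbor search. Several details are assumptions that the disclosure does not fix:

- the distance metric;
- the neighbor counts `k1` and `k2`;
- taking the union of neighbors across all abnormal samples;
- searching for the second neighbor samples only among the normal members of the first feature data set, so that the third set does not overlap the fourth.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def k_nearest(query: Vector, candidates: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k candidates closest to `query` by Euclidean distance (brute force)."""
    order = sorted(range(len(candidates)), key=lambda i: math.dist(query, candidates[i]))
    return order[:k]


def build_feature_sets(normal: Sequence[Vector], abnormal: Sequence[Vector],
                       k1: int = 3, k2: int = 1):
    # First neighbor samples: union of the k1 nearest normal samples of each abnormal sample.
    first_idx: Set[int] = set()
    for a in abnormal:
        first_idx.update(k_nearest(a, normal, k1))
    first_neighbors = [normal[i] for i in sorted(first_idx)]

    set1 = first_neighbors + list(abnormal)                         # first feature data set
    set2 = [x for i, x in enumerate(normal) if i not in first_idx]  # second feature data set

    # Second neighbor samples (assumption): searched only among the normal members of set1.
    second_idx: Set[int] = set()
    for a in abnormal:
        second_idx.update(k_nearest(a, first_neighbors, k2))
    set3 = [first_neighbors[i] for i in sorted(second_idx)]         # third feature data set
    set4 = list(abnormal)                                           # fourth feature data set
    return set1, set2, set3, set4


normal = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 9.0), (9.2, 9.0)]
abnormal = [(5.02, 5.2)]
for name, s in zip(("set1", "set2", "set3", "set4"), build_feature_sets(normal, abnormal, k1=2)):
    print(name, s)
```

In this toy data, the two normal samples near the abnormal point end up in the first set. The far-away normal samples form the second set, and the single closest normal sample becomes the third set. Each model therefore sees a smaller, better-balanced problem. For large histories, a spatial index such as a KD-tree would replace the brute-force search.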
According to an embodiment of the present disclosure, generating the anomaly detection result according to the first classification result and the second classification result includes:
when the first classification result classifies the operation feature data into the first feature data set and the second classification result classifies it into the fourth feature data set, generating an anomaly detection result indicating that the system is operating abnormally;
when the first classification result classifies the operation feature data into the first feature data set and the second classification result classifies it into the third feature data set, generating an anomaly detection result indicating that the system is operating normally; and
when the first classification result classifies the operation feature data into the second feature data set and the second classification result classifies it into the third feature data set, generating an anomaly detection result indicating that the system is operating normally.
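The three cases can be written as a small decision function. The disclosure does not specify the outcome when the first model selects the second feature data set while the second model selects the fourth. This sketch returns a hypothetical `"undetermined"` value for that case. The integer set ids 1 to 4 are illustrative.

```python
FIRST_SET, SECOND_SET, THIRD_SET, FOURTH_SET = 1, 2, 3, 4  # illustrative set ids


def detection_result(first_result: int, second_result: int) -> str:
    """Combine the two classification results following the three cases in the text."""
    if first_result == FIRST_SET and second_result == FOURTH_SET:
        return "abnormal"
    if first_result in (FIRST_SET, SECOND_SET) and second_result == THIRD_SET:
        return "normal"
    # (second set, fourth set) is not covered by the disclosure; flag it (assumption).
    return "undetermined"


print(detection_result(FIRST_SET, FOURTH_SET))
```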
Another aspect of the present disclosure provides an anomaly detection apparatus comprising an acquisition module, a processing module, a construction module, a first classification module, a second classification module and a generation module. The acquisition module is configured to acquire real-time system operation data, historical normal-operation feature data and historical abnormal-operation feature data from the database monitoring system. The processing module is configured to standardize the real-time system operation data to obtain real-time system operation feature data. The construction module is configured to construct a first feature data set, a second feature data set, a third feature data set and a fourth feature data set from the historical normal-operation feature data and the historical abnormal-operation feature data, wherein the first feature data set comprises first neighbor sample data and the historical abnormal-operation feature data, the first neighbor sample data being the data in the historical normal-operation feature data that are neighbors of the historical abnormal-operation feature data; the second feature data set comprises the data remaining after the first neighbor sample data is removed from the historical normal-operation feature data; the third feature data set comprises second neighbor sample data, the second neighbor sample data being the data in the first feature data set that are neighbors of the historical abnormal-operation feature data; and the fourth feature data set comprises the historical abnormal-operation feature data.
The first classification module is configured to input the real-time system operation feature data, the first feature data set and the second feature data set into the first classification model to obtain a first classification result. The second classification module is configured to input the real-time system operation feature data, the third feature data set and the fourth feature data set into the second classification model to obtain a second classification result. The generation module is configured to generate an anomaly detection result according to the first classification result and the second classification result.
According to an embodiment of the present disclosure, the processing module includes a classification unit, a calculation unit and a first generation unit. The classification unit is configured to classify the real-time system operation data by type to obtain M real-time system operation data sets, where M is a positive integer. The calculation unit is configured to calculate, for each real-time system operation data set, the mean and the standard deviation of the real-time system operation data in that set. The first generation unit is configured to generate the real-time system operation feature data from the real-time system operation data in the set, the mean and the standard deviation.
According to an embodiment of the present disclosure, the construction module includes a first collection unit, a first construction unit, a removal unit, a second collection unit, a second construction unit and a third construction unit. The first collection unit is configured to acquire, based on a K-nearest-neighbor algorithm, first neighbor sample data from the historical normal-operation feature data according to the historical abnormal-operation feature data. The first construction unit is configured to construct the first feature data set from the first neighbor sample data and the historical abnormal-operation feature data. The removal unit is configured to remove the first neighbor sample data from the historical normal-operation feature data to obtain the second feature data set. The second collection unit is configured to collect, based on the K-nearest-neighbor algorithm, second neighbor sample data from the first feature data set according to the historical abnormal-operation feature data. The second construction unit is configured to construct the third feature data set from the second neighbor sample data. The third construction unit is configured to construct the fourth feature data set from the historical abnormal-operation feature data.
According to an embodiment of the present disclosure, the generation module includes a second generation unit, a third generation unit and a fourth generation unit. The second generation unit is configured to generate an anomaly detection result indicating that the system is operating abnormally when the first classification result classifies the operation feature data into the first feature data set and the second classification result classifies it into the fourth feature data set. The third generation unit is configured to generate an anomaly detection result indicating that the system is operating normally when the first classification result classifies the operation feature data into the first feature data set and the second classification result classifies it into the third feature data set. The fourth generation unit is configured to generate an anomaly detection result indicating that the system is operating normally when the first classification result classifies the operation feature data into the second feature data set and the second classification result classifies it into the third feature data set.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described anomaly detection method.
Another aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions, which when executed by a processor, cause the processor to perform the above-described anomaly detection method.
Another aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described anomaly detection method.
According to embodiments of the present disclosure, the first, second, third and fourth feature data sets are constructed from historical abnormal-operation feature data and historical normal-operation feature data. A first classification result is obtained by inputting the real-time system operation feature data, the first feature data set and the second feature data set into the first classification model. A second classification result is obtained by inputting the real-time system operation feature data, the third feature data set and the fourth feature data set into the second classification model. An anomaly detection result is then generated from the two classification results. The first feature data set includes first neighbor sample data drawn from the historical normal-operation feature data, and the third feature data set includes second neighbor sample data drawn from the first feature data set. As a result, anomaly detection on real-time system operation data benefits in three ways: the class balance of the data used by the first and second classification models is improved, the noisy sample data introduced by oversampling is reduced, and the accuracy of classification-based detection of abnormal system operation is improved.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of an anomaly detection method, apparatus, device, medium and program product according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an anomaly detection method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method of training a first classification model according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a flow chart of a method of training a second classification model according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart for obtaining real-time system operational characteristic data according to an embodiment of the present disclosure;
FIG. 6 schematically shows a flow chart of constructing a first feature data set, a second feature data set, a third feature data set and a fourth feature data set according to an embodiment of the disclosure;
FIG. 7 schematically shows a flow chart for generating anomaly detection results according to an embodiment of the present disclosure;
FIG. 8 schematically shows a block diagram of the structure of an anomaly detection apparatus according to an embodiment of the present disclosure; and
FIG. 9 schematically shows a block diagram of an electronic device adapted to implement an anomaly detection method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
It should be noted that the anomaly detection method and apparatus of the present disclosure can be used in the fields of big data technology and financial technology, and can also be used in fields other than finance.
An embodiment of the present disclosure provides an anomaly detection method, including: acquiring real-time system operation data, historical normal-operation feature data and historical abnormal-operation feature data from a database monitoring system; standardizing the real-time system operation data to obtain real-time system operation feature data; constructing a first feature data set, a second feature data set, a third feature data set and a fourth feature data set from the historical normal-operation feature data and the historical abnormal-operation feature data, wherein the first feature data set comprises first neighbor sample data and the historical abnormal-operation feature data, the first neighbor sample data being the data in the historical normal-operation feature data that are neighbors of the historical abnormal-operation feature data; the second feature data set comprises the data remaining after the first neighbor sample data is removed from the historical normal-operation feature data; the third feature data set comprises second neighbor sample data, the second neighbor sample data being the data in the first feature data set that are neighbors of the historical abnormal-operation feature data; and the fourth feature data set comprises the historical abnormal-operation feature data; inputting the real-time system operation feature data, the first feature data set and the second feature data set into a first classification model to obtain a first classification result; inputting the real-time system operation feature data, the third feature data set and the fourth feature data set into a second classification model to obtain a second classification result; and generating an anomaly detection result according to the first classification result and the second classification result.
Fig. 1 schematically shows an application scenario of the anomaly detection method according to an embodiment of the present disclosure.
As shown in Fig. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients and social platform software (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The backend management server may analyze and process the received data such as the user request, and feed back a processing result (for example, a web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the anomaly detection method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the abnormality detection apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 105. The anomaly detection method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the abnormality detection apparatus provided in the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
The abnormality detection method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 7 based on the scenario described in fig. 1.
Fig. 2 schematically shows a flow chart of an anomaly detection method according to an embodiment of the present disclosure.
As shown in fig. 2, the abnormality detecting method of this embodiment includes operations S210 to S260.
In operation S210, system operation real-time data, system operation normal history feature data, and system operation abnormal history feature data are acquired from the database monitoring system.
According to an embodiment of the present disclosure, the real-time system operation data may include operation data that reflects the health of the database system, for example: CPU usage; connection usage, which may reflect the ratio of the number of threads connected to MySQL to the connection limit; disk bandwidth usage, which may reflect how busy the disk is for the MySQL instance; the number of statements waiting due to lock contention in InnoDB; and concurrency usage, which may reflect the ratio of the number of threads executing concurrently in the database to the total limit.
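The sketch below illustrates how some of these indicators might be derived from values a monitoring system could read with `SHOW GLOBAL STATUS` and `SHOW GLOBAL VARIABLES`:

- connection usage, from MySQL's `Threads_connected` status variable and the `max_connections` system variable;
- concurrency usage, from `Threads_running`;
- lock contention, read from `Innodb_row_lock_current_waits` (the number of row locks currently being waited for), used as a proxy for statements waiting on locks.

The concurrency limit is a hypothetical parameter, because the disclosure does not say which configured limit is meant. CPU and disk bandwidth usage come from OS-level monitoring and are omitted.

```python
from typing import Dict, Mapping


def health_indicators(status: Mapping[str, str], variables: Mapping[str, str],
                      concurrency_limit: int) -> Dict[str, float]:
    """Derive some indicators from SHOW GLOBAL STATUS / SHOW GLOBAL VARIABLES values.

    `concurrency_limit` is a hypothetical parameter: the disclosure does not say
    which configured limit the concurrency usage is measured against.
    """
    threads_connected = int(status["Threads_connected"])
    threads_running = int(status["Threads_running"])
    max_connections = int(variables["max_connections"])
    return {
        # Connection usage: connected threads as a percentage of the connection limit.
        "connection_usage_pct": 100.0 * threads_connected / max_connections,
        # Concurrency usage: running (non-sleeping) threads as a percentage of the limit.
        "concurrency_usage_pct": 100.0 * threads_running / concurrency_limit,
        # Row locks currently being waited for, used as a proxy for lock-contention waits.
        "innodb_row_lock_waits": float(status["Innodb_row_lock_current_waits"]),
    }


status = {"Threads_connected": "150", "Threads_running": "8",
          "Innodb_row_lock_current_waits": "2"}
variables = {"max_connections": "1000"}
print(health_indicators(status, variables, concurrency_limit=64))
```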
In operation S220, the real-time system operation data is standardized to obtain real-time system operation characteristic data.
According to an embodiment of the present disclosure, the real-time system operation data span many metric types, and the magnitudes and units of the different types differ. Standardizing the different types of real-time system operation data therefore yields real-time system operation feature data on a common scale. The standardization may be performed in various ways, such as dimensionless scaling, normalization, or centering.
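The passage lists several standardization options without fixing one. As a minimal sketch, assuming each metric's readings are available as a list of numbers, min-max scaling (one common form of dimensionless scaling) and mean-centering could look like the following; the function names and the sample readings are hypothetical.

```python
import numpy as np

def min_max_scale(values):
    """Rescale one metric to [0, 1]; a constant metric maps to all zeros."""
    x = np.asarray(values, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span

def mean_center(values):
    """Subtract the mean so the metric is centred on zero; units are unchanged."""
    x = np.asarray(values, dtype=float)
    return x - x.mean()

# Hypothetical readings for two metrics with very different magnitudes.
cpu_usage = [20.0, 40.0, 60.0, 80.0]   # percent
lock_waits = [0, 5, 10, 15]            # statements waiting on InnoDB locks
scaled_cpu = min_max_scale(cpu_usage)
centred_waits = mean_center(lock_waits)
```

Min-max scaling removes both offset and unit, while centering only removes the offset; either is applied per metric type, never across types.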
In operation S230, a first feature data set, a second feature data set, a third feature data set, and a fourth feature data set are constructed according to the historical system normal-operation feature data and the historical system abnormal-operation feature data. The first feature data set includes first neighbor sample data and the historical system abnormal-operation feature data, where the first neighbor sample data are the samples in the historical normal-operation feature data that neighbor the historical abnormal-operation feature data. The second feature data set includes the data remaining after the first neighbor sample data are removed from the historical system normal-operation feature data. The third feature data set includes second neighbor sample data, which are the samples in the first feature data set that neighbor the historical abnormal-operation feature data. The fourth feature data set includes the historical system abnormal-operation feature data.
According to an embodiment of the present disclosure, within the historical system operation feature data, the historical system abnormal-operation feature data are the minority-class samples and the historical system normal-operation feature data are the majority-class samples. The historical abnormal-operation feature data may include n samples, for example (a_1, a_2, a_3, ..., a_n), and the historical normal-operation feature data may include m samples, for example (b_1, b_2, b_3, ..., b_m), where m is greater than n.
According to an embodiment of the present disclosure, K1 samples that neighbor the historical system abnormal-operation feature data may be collected from the historical system normal-operation feature data based on a K-nearest-neighbor algorithm to obtain the first neighbor sample data, for example (b_1, b_2, b_3, ..., b_{K1}). The first feature data set may then be (a_1, a_2, a_3, ..., a_n, b_1, b_2, b_3, ..., b_{K1}), the second feature data set may be (b_{K1+1}, b_{K1+2}, b_{K1+3}, ..., b_m), and the fourth feature data set may be (a_1, a_2, a_3, ..., a_n).
According to an embodiment of the present disclosure, K2 samples that neighbor the historical system abnormal-operation feature data may be collected from the first feature data set based on a K-nearest-neighbor algorithm to obtain the second neighbor sample data, for example (b_1, b_2, b_3, ..., b_{K2}). The third feature data set may then be (b_1, b_2, b_3, ..., b_{K2}).
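The text does not say how the K1 and K2 neighbors are chosen. A minimal sketch, assuming Euclidean distance and taking, for every abnormal sample, its `k_first` nearest normal samples (their union is the first neighbor sample data, so K1 is at most n × `k_first`), could build the four sets as follows. It also assumes the second neighbors are searched only among the normal part of the first set, which matches the example where the third set holds only b-samples. All names here are hypothetical.

```python
import numpy as np

def nearest_indices(queries, pool, k):
    """Union of the indices (into `pool`) of the k nearest pool rows,
    by Euclidean distance, for every row of `queries`."""
    d = np.linalg.norm(queries[:, None, :] - pool[None, :, :], axis=2)
    k = min(k, len(pool))
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.unique(nearest.ravel())

def build_feature_sets(normal, abnormal, k_first=3, k_second=1):
    """Return the first to fourth feature data sets (hypothetical helper)."""
    idx1 = nearest_indices(abnormal, normal, k_first)
    first_neighbors = normal[idx1]                   # first neighbor sample data
    set1 = np.vstack([abnormal, first_neighbors])    # anomalies + nearby normals
    set2 = np.delete(normal, idx1, axis=0)           # remaining normal samples
    # Assumption: second neighbors come from the normal part of set1 only.
    idx2 = nearest_indices(abnormal, first_neighbors, k_second)
    set3 = first_neighbors[idx2]                     # second neighbor sample data
    set4 = abnormal.copy()                           # historical anomalies
    return set1, set2, set3, set4

normal = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 10.0]])
abnormal = np.array([[5.02, 5.0]])
set1, set2, set3, set4 = build_feature_sets(normal, abnormal, k_first=2, k_second=1)
```

Here the two normal samples near the anomaly move into the first set, and only the closest of them ends up in the third set. No new samples are synthesized at any step.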
In operation S240, the real-time system operation feature data, the first feature data set, and the second feature data set are input into the first classification model to obtain a first classification result.
According to an embodiment of the present disclosure, the real-time system operation feature data may be represented as (p_1, p_2, p_3, ..., p_i), where i is the number of metrics in the real-time system operation feature data that reflect the health of the database.
According to an embodiment of the present disclosure, the first classification result indicates whether the first classification model, given the real-time system operation feature data, the first feature data set, and the second feature data set, classifies the real-time system operation feature data into the first feature data set or into the second feature data set.
In operation S250, the real-time system operation feature data, the third feature data set, and the fourth feature data set are input into the second classification model to obtain a second classification result.
According to an embodiment of the present disclosure, the second classification result indicates whether the second classification model, given the real-time system operation feature data, the third feature data set, and the fourth feature data set, classifies the real-time system operation feature data into the third feature data set or into the fourth feature data set.
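The disclosure does not name the classifier type used in operations S240 and S250. As a stand-in only, the sketch below uses a 1-nearest-neighbor rule: the real-time feature vector is assigned to whichever of the two sets contains its closest sample. A model trained as in Fig. 3 and Fig. 4 would replace `nearest_set`; the function names and sample sets are hypothetical.

```python
import numpy as np

def nearest_set(sample, set_a, set_b, names):
    """Hypothetical 1-nearest-neighbor stand-in for a trained classifier:
    return names[0] if the closest row lies in set_a, else names[1]."""
    sample = np.asarray(sample, dtype=float)
    dist_a = np.linalg.norm(np.asarray(set_a, dtype=float) - sample, axis=1).min()
    dist_b = np.linalg.norm(np.asarray(set_b, dtype=float) - sample, axis=1).min()
    return names[0] if dist_a <= dist_b else names[1]

def two_stage_classify(sample, set1, set2, set3, set4):
    """First model: set1 vs set2. Second model: set3 vs set4."""
    first = nearest_set(sample, set1, set2, ("first", "second"))
    second = nearest_set(sample, set3, set4, ("third", "fourth"))
    return first, second

s1 = np.array([[5.0, 5.0], [5.1, 5.0], [5.02, 5.0]])   # anomalies + near normals
s2 = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0]])  # remaining normals
s3 = np.array([[5.0, 5.0]])                            # second neighbors
s4 = np.array([[5.02, 5.0]])                           # anomalies
result = two_stage_classify([5.03, 5.0], s1, s2, s3, s4)
```

Both models see the same real-time vector; they differ only in which pair of sets they discriminate between.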
In operation S260, an abnormality detection result is generated according to the first classification result and the second classification result.
According to an embodiment of the present disclosure, suppose, for example, that the first classification result is that the real-time system operation feature data are classified into the first feature data set, which indicates that the data belong to the minority-class data set, and that the second classification result is that the data are classified into the fourth feature data set, which indicates that the data are abnormal-operation feature data. The generated anomaly detection result may then indicate that the current database system is operating abnormally.
According to an embodiment of the present disclosure, the first, second, third, and fourth feature data sets are constructed from the historical abnormal-operation feature data and the historical normal-operation feature data. The real-time system operation feature data, the first feature data set, and the second feature data set are input into the first classification model to obtain the first classification result; the real-time system operation feature data, the third feature data set, and the fourth feature data set are input into the second classification model to obtain the second classification result; and the anomaly detection result is generated from the two classification results. Because the first feature data set includes first neighbor sample data collected from the historical normal-operation feature data, and the third feature data set includes second neighbor sample data collected from the first feature data set, the class balance seen by the first and second classification models is improved when anomaly detection is performed on real-time system operation data. At the same time, the noise samples that oversampling tends to introduce are reduced, which improves the accuracy of classification-based detection of system operation anomalies.
Fig. 3 schematically shows a flow chart of a training method of a first classification model according to an embodiment of the disclosure.
As shown in fig. 3, the training method of the first classification model of this embodiment includes operations S310 to S340.
In operation S310, the first feature data set and the second feature data set are input into the first initial classification model for training, so as to obtain a first training classification result.
In operation S320, a first confusion matrix is constructed according to the first training classification result, wherein the first confusion matrix includes first classification result data, second classification result data, and third classification result data; the first classification result data characterizes a number of samples of the historical abnormal-operation feature data classified into the first feature data set, the second classification result data characterizes a number of samples of the historical abnormal-operation feature data classified into the second feature data set, and the third classification result data characterizes a number of samples of the historical normal-operation feature data classified into the first feature data set.
According to an embodiment of the present disclosure, conventional classification problems usually use accuracy as the evaluation metric of model performance, but accuracy is not a suitable metric for imbalanced classification tasks. For example, consider a data set in which 99% of the samples belong to the majority class and only 1% belong to the minority class. A classifier that assigns every sample to the majority class reaches 99% accuracy, yet it correctly identifies 0% of the minority-class samples. Therefore, in the embodiment of the present disclosure, the classification performance on the minority class is used as the evaluation metric when training the classification models.
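The 99%/1% example can be checked numerically. This small sketch scores a degenerate classifier that always predicts the majority class; the label strings and helper names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of `positive` samples that were predicted as `positive`."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total

# 99 majority (normal) samples, 1 minority (abnormal) sample,
# and a classifier that always predicts the majority class.
y_true = ["normal"] * 99 + ["abnormal"]
y_pred = ["normal"] * 100
acc = accuracy(y_true, y_pred)                        # 0.99
minority_recall = recall(y_true, y_pred, "abnormal")  # 0.0
```

Accuracy looks excellent while the one case that matters, the anomaly, is never detected.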
In operation S330, first classification performance index data of the first initial classification model is generated according to the first classification result data, the second classification result data, and the third classification result data.
According to an embodiment of the present disclosure, the first classification performance index data $F_1$ may be expressed by formula (1):
$$F_1 = \frac{2 \times \mathrm{Recall1} \times \mathrm{Precision1}}{\mathrm{Recall1} + \mathrm{Precision1}} \tag{1}$$
where $F_1$ represents the first classification performance index, Recall1 represents the first recall, and Precision1 represents the first precision.
According to an embodiment of the present disclosure, the first recall Recall1 represents the proportion of minority-class samples that are correctly classified as the minority class, and may be expressed by formula (2):
$$\mathrm{Recall1} = \frac{TP1}{TP1 + FN1} \tag{2}$$
where TP1 represents the first classification result data and FN1 represents the second classification result data (historical abnormal-operation samples classified into the second feature data set).
According to an embodiment of the present disclosure, the first precision Precision1 represents the proportion of samples classified as the minority class that actually belong to the minority class, and may be expressed by formula (3):
$$\mathrm{Precision1} = \frac{TP1}{TP1 + FP1} \tag{3}$$
where TP1 represents the first classification result data and FP1 represents the third classification result data (historical normal-operation samples classified into the first feature data set).
In operation S340, under the condition that the first classification performance index data satisfies the first preset condition, a trained first classification model is obtained.
According to an embodiment of the present disclosure, to verify the stability of the first classification model, its performance may further be evaluated by five-fold cross-validation. For example, the original training data set may be shuffled and split into 5 groups, each with the same class distribution as the original data. Each group in turn serves as the validation set while the remaining four groups serve as the training set: the model is fitted on the training set and evaluated on the validation set, the evaluation score is kept, and the model is discarded. The first preset condition may be that the evaluation score is the highest, in which case the trained first classification model is the classification model with the highest performance evaluation score.
According to an embodiment of the present disclosure, a confusion matrix of the training classification results is constructed and the evaluation metric is determined from the precision and recall on the minority class, so that a first classification model with accurate and stable classification results can be obtained even for training data sets that contain few system abnormal samples.
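A minimal sketch of the five-fold procedure, assuming that "the same data distribution" means stratified folds (each class shuffled and dealt round-robin into the groups) and that candidate models are compared by mean validation score. The `fit` and `score` callables and all function names are hypothetical placeholders for whatever model and metric (for example $F_1$) are used.

```python
import numpy as np

def stratified_folds(labels, n_folds=5, seed=0):
    """Shuffle the indices of each class, then deal them round-robin into
    n_folds groups so every group keeps roughly the original class ratio."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(int(i))
    return [np.array(sorted(f), dtype=int) for f in folds]

def cross_validate(X, y, fit, score, n_folds=5, seed=0):
    """Mean validation score over the folds; each fitted model is discarded."""
    folds = stratified_folds(y, n_folds, seed)
    scores = []
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        model = fit(X[train_idx], y[train_idx])               # fit on the other groups
        scores.append(score(model, X[val_idx], y[val_idx]))  # keep only the score
    return float(np.mean(scores))

def select_best(candidates, X, y, score):
    """candidates: dict name -> fit function. Returns the highest-scoring name."""
    results = {name: cross_validate(X, y, fit, score) for name, fit in candidates.items()}
    return max(results, key=results.get), results

X = np.arange(25, dtype=float).reshape(-1, 1)
y = np.array([0] * 20 + [1] * 5)   # 20 normal, 5 abnormal samples
folds = stratified_folds(y)        # each fold: 4 normal + 1 abnormal
```

Stratification matters here because with few anomalies a plain random split could leave a validation fold with no abnormal sample at all, making recall undefined.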
FIG. 4 schematically shows a flow chart of a training method of a second classification model according to an embodiment of the disclosure.
As shown in fig. 4, the training method of the second classification model of this embodiment includes operations S410 to S440.
In operation S410, the third feature data set and the fourth feature data set are input into the second initial classification model for training, so as to obtain a second training classification result.
In operation S420, a second confusion matrix is constructed according to the second training classification result, where the second confusion matrix includes fourth classification result data, fifth classification result data, and sixth classification result data; the fourth classification result data represents the number of samples of the historical abnormal-operation feature data classified into the fourth feature data set, the fifth classification result data represents the number of samples of the historical abnormal-operation feature data classified into the third feature data set, and the sixth classification result data represents the number of samples of the historical normal-operation feature data classified into the fourth feature data set.
In operation S430, second classification performance index data of the second initial classification model is generated according to the fourth classification result data, the fifth classification result data, and the sixth classification result data.
According to an embodiment of the present disclosure, the second classification performance index data $F_2$ may be expressed by formula (4):
$$F_2 = \frac{2 \times \mathrm{Recall2} \times \mathrm{Precision2}}{\mathrm{Recall2} + \mathrm{Precision2}} \tag{4}$$
where $F_2$ represents the second classification performance index, Recall2 represents the second recall, and Precision2 represents the second precision.
According to an embodiment of the present disclosure, the second recall Recall2 represents the proportion of minority-class samples that are correctly classified as the minority class, and may be expressed by formula (5):
$$\mathrm{Recall2} = \frac{TP2}{TP2 + FN2} \tag{5}$$
where TP2 represents the fourth classification result data and FN2 represents the fifth classification result data (historical abnormal-operation samples classified into the third feature data set).
According to an embodiment of the present disclosure, the second precision Precision2 represents the proportion of samples classified as the minority class that actually belong to the minority class, and may be expressed by formula (6):
$$\mathrm{Precision2} = \frac{TP2}{TP2 + FP2} \tag{6}$$
where TP2 represents the fourth classification result data and FP2 represents the sixth classification result data (historical normal-operation samples classified into the fourth feature data set).
In operation S440, a trained second classification model is obtained when the second classification performance index data satisfies a second preset condition.
According to an embodiment of the present disclosure, in order to verify the stability of the second classification model, a five-fold cross-validation method may be employed to further evaluate the performance of the second classification model. The specific evaluation step is the same as the process of evaluating the first classification model, and is not described herein again. The second preset condition may be that the evaluation score is highest.
According to an embodiment of the present disclosure, a confusion matrix of the training classification results is constructed and the evaluation metric is determined from the precision and recall on the minority class, so that a second classification model with accurate and stable classification results can be obtained even for training data sets that contain few system abnormal samples.
Fig. 5 schematically illustrates a flow chart for obtaining real-time system operational characteristic data according to an embodiment of the present disclosure.
As shown in fig. 5, the method of extracting real-time system operation feature data of this embodiment includes operations S510 to S530.
In operation S510, the real-time system operation data is classified according to the type of the real-time system operation data to obtain M real-time system operation data sets, where M is a positive integer.
According to an embodiment of the present disclosure, the real-time system operation data may include: CPU utilization rate, connection number utilization rate, disk bandwidth utilization rate and concurrent number utilization rate. According to the type of the real-time system operation data, 4 real-time system operation data sets can be obtained: CPU usage data set, connection number usage data set, disk bandwidth utilization data set, and concurrent number usage data set.
In operation S520, for each real-time system operation data set, an average value of the real-time system operation data in the real-time system operation data set and a standard deviation of the real-time system operation data are calculated.
According to an embodiment of the present disclosure, taking the CPU usage data set as an example, the CPU usage data set may be represented as (x_1, x_2, x_3, ..., x_n).
In operation S530, real-time system operation characteristic data is generated according to the real-time system operation data in the real-time system operation data set and the mean value and the standard deviation.
According to an embodiment of the present disclosure, the real-time system operation characteristic data may be as shown in equation (7):
$$Z = \frac{x - \mu}{\sigma} \tag{7}$$
wherein Z represents the real-time system operation characteristic data, x represents the real-time system operation data, mu represents the average value of all data in the real-time system operation data set, and sigma represents the standard deviation of all data in the real-time system operation data set.
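Operations S510 to S530 can be sketched in a few lines, assuming the readings arrive as (metric name, value) pairs and that σ is the population standard deviation (the text does not say which estimator is used). The metric names and helper functions are hypothetical.

```python
from collections import defaultdict

import numpy as np

def group_by_type(readings):
    """S510 (sketch): split (metric_name, value) readings into one data set per type."""
    groups = defaultdict(list)
    for metric, value in readings:
        groups[metric].append(value)
    return dict(groups)

def zscore(values):
    """S520-S530 (sketch): Z = (x - mu) / sigma for every reading of one metric.
    Assumes the population standard deviation (ddof=0); a constant metric maps to zeros."""
    x = np.asarray(values, dtype=float)
    mu, sigma = x.mean(), x.std()
    return np.zeros_like(x) if sigma == 0 else (x - mu) / sigma

readings = (
    [("cpu_usage", v) for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
    + [("connection_usage", 50.0)] * 3
)
features = {metric: zscore(values) for metric, values in group_by_type(readings).items()}
```

Each metric type gets its own μ and σ, so a percentage metric and a count metric end up on the same zero-mean, unit-variance scale.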
According to an embodiment of the present disclosure, the mean and standard deviation of each type of real-time system operation data are calculated and used to convert the different types of data into standardized feature data before they are used with the classification models, which reduces the influence of differences in magnitude and unit across data types on the model parameters.
Fig. 6 schematically shows a flow chart for constructing a first feature data set, a second feature data set, a third feature data set and a fourth feature data set according to an embodiment of the present disclosure.
As shown in fig. 6, the method of constructing a feature data set of this embodiment includes operations S610 to S660.
In operation S610, first neighbor sample data is collected from the historical operating normal feature data based on the K neighbor algorithm according to the historical system operating abnormal feature data.
In operation S620, a first feature data set is constructed according to the first neighbor sample data and the historical system operation abnormal feature data.
In operation S630, the first neighbor sample data in the history system operation normal feature data is removed, and a second feature data set is obtained.
In operation S640, second neighbor sample data is collected from the first feature data set according to the historical system operating abnormal feature data based on the K neighbor algorithm.
In operation S650, a third feature data set is constructed according to the second neighbor sample data.
In operation S660, a fourth feature data set is constructed according to the historical system operation abnormal feature data.
According to an embodiment of the present disclosure, the historical system abnormal-operation feature data may include (a_1, a_2, a_3, ..., a_n). Based on a K-nearest-neighbor algorithm, K1 samples that neighbor the historical system abnormal-operation feature data are collected from the historical system normal-operation feature data to obtain the first neighbor sample data, for example (b_1, b_2, b_3, ..., b_{K1}). The first feature data set may then be (a_1, a_2, a_3, ..., a_n, b_1, b_2, b_3, ..., b_{K1}), the second feature data set may be (b_{K1+1}, b_{K1+2}, b_{K1+3}, ..., b_m), and the fourth feature data set may be (a_1, a_2, a_3, ..., a_n).
According to an embodiment of the present disclosure, K2 samples that neighbor the historical system abnormal-operation feature data may be collected from the first feature data set based on a K-nearest-neighbor algorithm to obtain the second neighbor sample data, for example (b_1, b_2, b_3, ..., b_{K2}). The third feature data set may then be (b_1, b_2, b_3, ..., b_{K2}).
According to an embodiment of the present disclosure, collecting the first neighbor sample data from the historical system normal-operation feature data and combining them with the historical system abnormal-operation feature data into the first feature data set avoids introducing the noise samples that random sampling can bring in, which would make classification harder. Collecting the second neighbor sample data from the first feature data set then improves the class balance of the sample data without synthesizing new samples, which improves classification performance.
Fig. 7 schematically shows a flow chart for generating an anomaly detection result according to an embodiment of the present disclosure.
As shown in fig. 7, the method of generating an abnormality detection result of this embodiment includes operations S710 to S730.
In operation S710, in a case that the first classification result is that the real-time system operation feature data is classified into the first feature data set, and the second classification result is that the real-time system operation feature data is classified into the fourth feature data set, an abnormality detection result is generated as a system operation abnormality.
In operation S720, in a case that the first classification result is that the real-time system operation feature data is classified into the first feature data set, and the second classification result is that the real-time system operation feature data is classified into the third feature data set, the abnormality detection result is generated as that the system is normally operated.
In operation S730, in a case that the first classification result is that the real-time system operation feature data is classified into the second feature data set, and the second classification result is that the real-time system operation feature data is classified into the third feature data set, the abnormality detection result is generated as that the system is normally operated.
According to an embodiment of the present disclosure, when the first classification model classifies the real-time system operation feature data into the first feature data set and the second classification model classifies them into the fourth feature data set, the system operation is determined to be abnormal. In all other cases, the system is determined to be operating normally.
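The decision rule of operations S710 to S730 reduces to a single condition. In this sketch the set labels and result strings are hypothetical; the combination "second set, fourth set", which Fig. 7 does not list, falls under "all other cases" and is therefore treated as normal.

```python
def anomaly_detection_result(first_result, second_result):
    """Combine the two classification results (sketch of S710-S730).
    first_result: 'first' or 'second' feature data set chosen by the first model.
    second_result: 'third' or 'fourth' feature data set chosen by the second model."""
    if first_result not in ("first", "second") or second_result not in ("third", "fourth"):
        raise ValueError("unexpected classification result")
    # Only this combination signals an anomaly; every other one counts as normal.
    if first_result == "first" and second_result == "fourth":
        return "system operation abnormal"
    return "system operating normally"
```

Requiring both models to agree keeps a single borderline call from the first model from raising an alarm on its own.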
According to the embodiment of the disclosure, the abnormal detection result is generated according to the first classification result and the second classification result, so that the abnormal condition of the database can be automatically and accurately detected, the manual intervention is reduced, and the working pressure of operation and maintenance personnel is reduced.
Based on the anomaly detection method, the disclosure also provides an anomaly detection device. The apparatus will be described in detail below with reference to fig. 8.
Fig. 8 schematically shows a block diagram of the structure of an abnormality detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the abnormality detection apparatus 800 of this embodiment includes an acquisition module 810, a processing module 820, a construction module 830, a first classification module 840, a second classification module 850, and a generation module 860.
The obtaining module 810 is configured to obtain real-time system operation data, historical system operation normal characteristic data, and historical system operation abnormal characteristic data from the database monitoring system. In an embodiment, the obtaining module 810 may be configured to perform the operation S210 described above, which is not described herein again.
The processing module 820 is used for standardizing the real-time system operation data to obtain real-time system operation characteristic data. In an embodiment, the processing module 820 may be configured to perform the operation S220 described above, which is not described herein again.
The building module 830 is configured to build a first feature data set, a second feature data set, a third feature data set, and a fourth feature data set according to the historical system normal operation feature data and the historical system abnormal operation feature data; the first characteristic data set comprises first neighbor sample data and historical system operation abnormal characteristic data, and the first neighbor sample data represents data which are in neighbor with the historical operation abnormal characteristic data in the historical operation normal characteristic data; the second characteristic data set comprises data left after the first neighbor sample data is removed from the historical system normal operation characteristic data; the third feature data set comprises second neighbor sample data; the fourth feature data set comprises historical system operational anomaly feature data, and the second neighbor sample data characterizes data in the first feature data set that is neighbor to the historical operational anomaly feature data. In an embodiment, the building module 830 may be configured to perform the operation S230 described above, and is not described herein again.
The first classification module 840 is configured to input the real-time system operation feature data, the first feature data set, and the second feature data set into the first classifier to obtain a first classification result. The first classification module 840 may be configured to perform the operation S240 described above, which is not described herein again.
The second classification module 850 is configured to input the real-time system operation feature data, the third feature data set, and the fourth feature data set into the second classifier, so as to obtain a second classification result. The second classification module 850 may be configured to perform the operation S250 described above, and will not be described herein again.
The generating module 860 is configured to generate an anomaly detection result according to the first classification result and the second classification result. The generating module 860 may be configured to perform the operation S260 described above, and is not described herein again.
According to an embodiment of the present disclosure, a processing module includes a classification unit, a calculation unit, and a first generation unit. The classification unit is used for classifying the real-time system operation data according to the type of the real-time system operation data to obtain M real-time system operation data sets, wherein M is a positive integer. And the calculating unit is used for calculating the average value of the real-time system operation data in the real-time system operation data set and the standard deviation of the real-time system operation data aiming at each real-time system operation data set. And the first generation unit is used for generating the real-time system operation characteristic data according to the real-time system operation data in the real-time system operation data set, the average value and the standard deviation.
According to an embodiment of the present disclosure, a building module includes a first collecting unit, a first building unit, a removing unit, a second collecting unit, a second building unit, and a third building unit. The first acquisition unit is used for acquiring first neighbor sample data from historical operating normal characteristic data according to historical system operating abnormal characteristic data based on a K neighbor algorithm. And the first construction unit is used for constructing a first characteristic data set according to the first neighbor sample data and the historical system operation abnormal characteristic data. And the removing unit is used for removing the first neighbor sample data in the normal operating characteristic data of the historical system to obtain a second characteristic data set. And the second acquisition unit is used for acquiring second neighbor sample data from the first feature data set according to the abnormal feature data of the historical system operation based on the K neighbor algorithm. And the second construction unit is used for constructing a third feature data set according to the second neighbor sample data. And the third construction unit is used for constructing a fourth characteristic data set according to the abnormal operation characteristic data of the historical system.
According to an embodiment of the present disclosure, the generation module includes a second generation unit, a third generation unit, and a fourth generation unit. And the second generation unit is used for generating an abnormal detection result as a system operation abnormity under the condition that the first classification result is that the operation characteristic data is classified into the first characteristic data set, and the second classification result is that the operation characteristic data is classified into the fourth characteristic data set. And the third generation unit is used for generating an abnormal detection result as that the system normally operates under the condition that the first classification result is that the operation characteristic data is classified into the first characteristic data set and the second classification result is that the operation characteristic data is classified into the third characteristic data set. And the fourth generation unit is used for generating an abnormal detection result as that the system normally operates under the condition that the first classification result is that the operating characteristic data is classified into the second characteristic data set, and the second classification result is that the operating characteristic data is classified into the third characteristic data set.
According to an embodiment of the present disclosure, any multiple of the obtaining module 810, the processing module 820, the constructing module 830, the first classifying module 840, the second classifying module 850, and the generating module 860 may be combined into one module to be implemented, or any one of the modules may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 810, the processing module 820, the constructing module 830, the first classifying module 840, the second classifying module 850, and the generating module 860 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or by any other reasonable manner of integrating or packaging a circuit, or by any one of three implementations of software, hardware, and firmware, or by any suitable combination of any several of them. Alternatively, at least one of the obtaining module 810, the processing module 820, the constructing module 830, the first classifying module 840, the second classifying module 850 and the generating module 860 may be at least partially implemented as a computer program module which, when executed, may perform a corresponding function.
Fig. 9 schematically shows a block diagram of an electronic device adapted to implement an anomaly detection method according to an embodiment of the present disclosure.
As shown in fig. 9, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 can include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or related chipset(s) and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker and the like; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it can be installed into the storage portion 908 as needed.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the method provided by the embodiment of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed as a signal over a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The program code contained in the computer program may be transmitted using any suitable network medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
According to embodiments of the present disclosure, the program code of the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, the "C" language and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on the user device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the present disclosure. In particular, such combinations and/or integrations may be made without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. An anomaly detection method comprising:
acquiring real-time system operation data, historical system normal-operation feature data and historical system abnormal-operation feature data from a database monitoring system;
standardizing the real-time system operation data to obtain real-time system operation feature data;
constructing a first feature data set, a second feature data set, a third feature data set and a fourth feature data set according to the historical system normal-operation feature data and the historical system abnormal-operation feature data; wherein the first feature data set comprises first neighbor sample data and the historical system abnormal-operation feature data, the first neighbor sample data being the data in the historical system normal-operation feature data that neighbor the historical system abnormal-operation feature data; the second feature data set comprises the data remaining after the first neighbor sample data are removed from the historical system normal-operation feature data; the third feature data set comprises second neighbor sample data, the second neighbor sample data being the data in the first feature data set that neighbor the historical system abnormal-operation feature data; and the fourth feature data set comprises the historical system abnormal-operation feature data;
inputting the real-time system operation feature data, the first feature data set and the second feature data set into a first classification model to obtain a first classification result;
inputting the real-time system operation feature data, the third feature data set and the fourth feature data set into a second classification model to obtain a second classification result; and
generating an anomaly detection result according to the first classification result and the second classification result.
2. The method of claim 1, wherein the training method of the first classification model comprises:
inputting the first feature data set and the second feature data set into a first initial classification model for training to obtain a first training classification result;
constructing a first confusion matrix according to the first training classification result, wherein the first confusion matrix comprises first classification result data, second classification result data and third classification result data; the first classification result data represent the number of samples of the historical system abnormal-operation feature data classified into the first feature data set, the second classification result data represent the number of samples of the historical system abnormal-operation feature data classified into the second feature data set, and the third classification result data represent the number of samples of the historical system normal-operation feature data classified into the first feature data set;
generating first classification performance index data of the first initial classification model according to the first classification result data, the second classification result data and the third classification result data; and
obtaining the trained first classification model when the first classification performance index data satisfy a first preset condition.
3. The method of claim 1, wherein the training method of the second classification model comprises:
inputting the third feature data set and the fourth feature data set into a second initial classification model for training to obtain a second training classification result;
constructing a second confusion matrix according to the second training classification result, wherein the second confusion matrix comprises fourth classification result data, fifth classification result data and sixth classification result data; the fourth classification result data represent the number of samples of the historical system abnormal-operation feature data classified into the fourth feature data set, the fifth classification result data represent the number of samples of the historical system abnormal-operation feature data classified into the third feature data set, and the sixth classification result data represent the number of samples of the historical system normal-operation feature data classified into the fourth feature data set;
generating second classification performance index data of the second initial classification model according to the fourth classification result data, the fifth classification result data and the sixth classification result data; and
obtaining the trained second classification model when the second classification performance index data satisfy a second preset condition.
4. The method of claim 1, wherein standardizing the real-time system operation data to obtain the real-time system operation feature data comprises:
classifying the real-time system operation data by data type to obtain M real-time system operation data sets, where M is a positive integer;
calculating, for each real-time system operation data set, the mean and the standard deviation of the real-time system operation data in that set; and
generating the real-time system operation feature data according to the real-time system operation data in each real-time system operation data set and the corresponding mean and standard deviation.
5. The method of claim 1, wherein constructing the first feature data set, the second feature data set, the third feature data set and the fourth feature data set according to the historical system normal-operation feature data and the historical system abnormal-operation feature data comprises:
acquiring, based on a K-nearest-neighbor algorithm, the first neighbor sample data from the historical system normal-operation feature data according to the historical system abnormal-operation feature data;
constructing the first feature data set according to the first neighbor sample data and the historical system abnormal-operation feature data;
removing the first neighbor sample data from the historical system normal-operation feature data to obtain the second feature data set;
acquiring, based on the K-nearest-neighbor algorithm, the second neighbor sample data from the first feature data set according to the historical system abnormal-operation feature data;
constructing the third feature data set according to the second neighbor sample data; and
constructing the fourth feature data set according to the historical system abnormal-operation feature data.
6. The method of claim 1, wherein generating the anomaly detection result according to the first classification result and the second classification result comprises:
generating an anomaly detection result indicating abnormal system operation when the first classification result is that the real-time system operation feature data are classified into the first feature data set and the second classification result is that the real-time system operation feature data are classified into the fourth feature data set;
generating an anomaly detection result indicating normal system operation when the first classification result is that the real-time system operation feature data are classified into the first feature data set and the second classification result is that the real-time system operation feature data are classified into the third feature data set; and
generating an anomaly detection result indicating normal system operation when the first classification result is that the real-time system operation feature data are classified into the second feature data set and the second classification result is that the real-time system operation feature data are classified into the third feature data set.
7. An anomaly detection device, comprising:
an acquisition module configured to acquire real-time system operation data, historical system normal-operation feature data and historical system abnormal-operation feature data from a database monitoring system;
a processing module configured to standardize the real-time system operation data to obtain real-time system operation feature data;
a construction module configured to construct a first feature data set, a second feature data set, a third feature data set and a fourth feature data set according to the historical system normal-operation feature data and the historical system abnormal-operation feature data; wherein the first feature data set comprises first neighbor sample data and the historical system abnormal-operation feature data, the first neighbor sample data being the data in the historical system normal-operation feature data that neighbor the historical system abnormal-operation feature data; the second feature data set comprises the data remaining after the first neighbor sample data are removed from the historical system normal-operation feature data; the third feature data set comprises second neighbor sample data, the second neighbor sample data being the data in the first feature data set that neighbor the historical system abnormal-operation feature data; and the fourth feature data set comprises the historical system abnormal-operation feature data;
a first classification module configured to input the real-time system operation data, the first feature data set and the second feature data set into a first classification model to obtain a first classification result;
a second classification module configured to input the real-time system operation data, the third feature data set and the fourth feature data set into a second classification model to obtain a second classification result; and
a generating module configured to generate an anomaly detection result according to the first classification result and the second classification result.
8. An electronic device, comprising:
one or more processors;
a storage device to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-6.
9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, carries out the method according to any one of claims 1 to 6.
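Claim 2 names three counts from the first confusion matrix but does not say which "classification performance index data" are computed or what the "first preset condition" is. If the first feature data set is treated as the positive class, the first, second and third classification result data play the roles of true positives, false negatives and false positives. The sketch below assumes precision, recall and F1 as the index and uses illustrative thresholds; `ConfusionCounts`, `performance_index` and `meets_preset_condition` are hypothetical names, not from the patent.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from claim 2, read with the first feature data set as the positive class."""
    tp: int  # first classification result data: abnormal samples classified into the first set
    fn: int  # second classification result data: abnormal samples classified into the second set
    fp: int  # third classification result data: normal samples classified into the first set


def performance_index(c: ConfusionCounts) -> dict:
    """Precision, recall and F1 (an assumed choice of performance index data)."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def meets_preset_condition(index: dict, min_recall: float = 0.95,
                           min_precision: float = 0.5) -> bool:
    """Hypothetical preset condition; the thresholds are illustrative only."""
    return index["recall"] >= min_recall and index["precision"] >= min_precision
```

For example, `performance_index(ConfusionCounts(tp=48, fn=2, fp=12))` gives a precision of 48/60 = 0.8 and a recall of 48/50 = 0.96. The same helpers would apply to claim 3 by passing the fourth, fifth and sixth classification result data as `tp`, `fn` and `fp`. Favouring recall in the condition reflects the intent of the first model, which is to avoid letting anomalies fall into the second (normal-only) set; that weighting is an assumption.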
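Claim 4 states only that the feature data are generated from each value, the mean of its group and the standard deviation of its group. The usual reading is a per-type z-score, (x - mean) / std. The sketch below assumes that reading, uses the population standard deviation, and maps a constant group (standard deviation 0) to zeros; the function name and the metric names in the usage note are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def standardize_by_type(records: List[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group real-time readings by data type and z-score each group.

    records: (data_type, value) pairs.
    Returns {data_type: standardized values in their original order}.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for data_type, value in records:
        groups[data_type].append(value)  # one group per data type (the M data sets)

    features: Dict[str, List[float]] = {}
    for data_type, values in groups.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation (an assumption)
        if sigma == 0:
            # constant group: no spread to scale by, so every value maps to 0
            features[data_type] = [0.0 for _ in values]
        else:
            features[data_type] = [(v - mu) / sigma for v in values]
    return features
```

With `[("cpu_util", 40.0), ("cpu_util", 60.0), ("sessions", 100.0), ("sessions", 100.0)]`, the two `cpu_util` readings have mean 50 and population standard deviation 10, so they standardize to -1.0 and 1.0, while the constant `sessions` group becomes `[0.0, 0.0]`.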
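Claim 5 builds the four feature data sets with a K-nearest-neighbor search but does not fix K, the distance metric, or how the neighbor lists of several anomalies are merged. The sketch below assumes Euclidean distance and takes the union of each anomaly's k nearest neighbors. Because every anomaly in the first feature data set would otherwise be its own nearest neighbor, it also assumes that the second neighbor sample data are searched for only among the normal members of the first set. `k1`, `k2`, `knn_indices` and `build_feature_sets` are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def knn_indices(query: Vector, pool: List[Vector], k: int) -> List[int]:
    """Indices of the k points in pool closest to query (Euclidean distance)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(query, pool[i]))
    return order[:k]


def build_feature_sets(normal: List[Vector], abnormal: List[Vector],
                       k1: int = 3, k2: int = 1
                       ) -> Tuple[List[Vector], List[Vector], List[Vector], List[Vector]]:
    # First neighbor sample data: union of the k1 nearest normal samples of each anomaly
    first_nb: Set[int] = set()
    for a in abnormal:
        first_nb.update(knn_indices(a, normal, k1))
    first_neighbors = [normal[i] for i in sorted(first_nb)]

    # First set: first neighbors plus anomalies; second set: the remaining normal samples
    set1 = first_neighbors + list(abnormal)
    set2 = [x for i, x in enumerate(normal) if i not in first_nb]

    # Second neighbor sample data, searched (by assumption) among the normal
    # members of the first set only
    second_nb: Set[int] = set()
    for a in abnormal:
        second_nb.update(knn_indices(a, first_neighbors, k2))
    set3 = [first_neighbors[i] for i in sorted(second_nb)]

    # Fourth set: the historical abnormal-operation feature data themselves
    set4 = list(abnormal)
    return set1, set2, set3, set4
```

The effect is a coarse-to-fine split: the first model separates the anomaly neighborhood from clearly normal data, and the second model separates the anomalies from the normal samples closest to them.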
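Claim 6 lists three combinations of the two classification results. The sketch below is a direct encoding of those three cases. The string labels and the function name `fuse` are hypothetical. The fourth combination (second feature data set from the first model, fourth feature data set from the second model) is not addressed by claim 6, so the sketch returns `None` for it rather than guessing.

```python
from typing import Optional

# Hypothetical labels for the feature data set a classifier assigns the real-time sample to
FIRST, SECOND, THIRD, FOURTH = "first", "second", "third", "fourth"


def fuse(first_result: str, second_result: str) -> Optional[str]:
    """Combine the two classification results following the three cases of claim 6.

    Returns "abnormal", "normal", or None for the combination claim 6 does not cover.
    """
    if first_result == FIRST and second_result == FOURTH:
        return "abnormal"
    if first_result == FIRST and second_result == THIRD:
        return "normal"
    if first_result == SECOND and second_result == THIRD:
        return "normal"
    # (SECOND, FOURTH) is not specified in claim 6; left undecided here
    return None
```

Only the agreement of both models on the anomaly side (first set, then fourth set) yields an abnormal result, which is how the two-stage design is meant to suppress false alarms.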
CN202210838294.XA 2022-07-14 2022-07-14 Abnormity detection method, device, equipment and medium Pending CN115269315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210838294.XA CN115269315A (en) 2022-07-14 2022-07-14 Abnormity detection method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210838294.XA CN115269315A (en) 2022-07-14 2022-07-14 Abnormity detection method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115269315A (en) 2022-11-01

Family

ID=83765088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210838294.XA Pending CN115269315A (en) 2022-07-14 2022-07-14 Abnormity detection method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115269315A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115856514A (en) * 2023-02-28 2023-03-28 宝胜高压电缆有限公司 Intelligent operation abnormity monitoring and positioning method and system for polypropylene cable

Similar Documents

Publication Publication Date Title
US11275642B2 (en) Tuning context-aware rule engine for anomaly detection
CN111290924B (en) Monitoring method and device and electronic equipment
US9811391B1 (en) Load balancing and conflict processing in workflow with task dependencies
CN107392259B (en) Method and device for constructing unbalanced sample classification model
CN114298221A (en) Fault determination method and device, electronic equipment and computer readable storage medium
US11645540B2 (en) Deep graph de-noise by differentiable ranking
CN115913710A (en) Abnormality detection method, apparatus, device and storage medium
US10853130B1 (en) Load balancing and conflict processing in workflow with task dependencies
CN111581258A (en) Safety data analysis method, device, system, equipment and storage medium
CN115269315A (en) Abnormity detection method, device, equipment and medium
CN114202256A (en) Architecture upgrading early warning method and device, intelligent terminal and readable storage medium
CN111582649B (en) Risk assessment method and device based on user APP single-heat coding and electronic equipment
CN116225848A (en) Log monitoring method, device, equipment and medium
CN113791897B (en) Method and system for displaying server baseline detection report of rural telecommunication system
CN115204733A (en) Data auditing method and device, electronic equipment and storage medium
CN113656391A (en) Data detection method and device, storage medium and electronic equipment
CN113052509A (en) Model evaluation method, model evaluation apparatus, electronic device, and storage medium
CN113032237A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112860652A (en) Operation state prediction method and device and electronic equipment
CN112579429A (en) Problem positioning method and device
US20210397538A1 (en) Diagnosing application problems by learning from fault injections
CN115292146B (en) System capacity estimation method, system, equipment and storage medium
CN115981970B (en) Fortune dimension analysis method, device, equipment and medium
CN116450465B (en) Data processing method, device, equipment and medium
CN111274088B (en) Real-time monitoring method, device, medium and electronic equipment for big data platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination