CN112906727A - Method and system for real-time online detection of virtual machine state - Google Patents


Info

Publication number
CN112906727A
Authority
CN
China
Prior art keywords
data
model
detection result
kpi
data set
Prior art date
Legal status
Pending
Application number
CN201911226077.XA
Other languages
Chinese (zh)
Inventor
杜璟彦
李伟泽
李祥
Current Assignee
Tianyi Cloud Technology Co Ltd
Original Assignee
China Telecom Corp Ltd
Priority date
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN201911226077.XA
Publication of CN112906727A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/301 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is a virtual computing platform, e.g. logically partitioned systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The present disclosure provides a method and system for real-time online detection of virtual machine state. Trained one-class support vector machine (OCSVM) model parameters, deep autoencoder network (DAEN) model parameters and support vector machine (SVM) classification model parameters are read from a training model database to initialize the OCSVM model, the DAEN model and the SVM classification model. Key performance indicator (KPI) data of a virtual machine is acquired online. Misuse detection is performed on the KPI data with the OCSVM model to obtain a first detection result, and anomaly detection is performed on the KPI data with the DAEN model to obtain a second detection result. If the first detection result is consistent with the second detection result, either result is output as the detection result of the KPI data; if they are inconsistent, the SVM classification model classifies the first and second detection results to output the detection result of the KPI data. The method and system can detect the state of a virtual machine online and effectively reduce both the missed-detection rate and the false-alarm rate.

Description

Method and system for real-time online detection of virtual machine state
Technical Field
The present disclosure relates to the field of cloud computing, and in particular, to a method and system for real-time online detection of a virtual machine state.
Background
Cloud computing provides services to users through virtualization technologies. The virtual machine is the carrier of cloud computing services; an abnormal virtual machine state not only degrades the user experience but can also stop a service system from running, causing losses that are difficult to measure. Detecting abnormal states of running virtual machines in time and prompting operation and maintenance personnel to take necessary measures is therefore an important means of keeping all service systems on a cloud computing platform running stably.
Conventional detection methods are classified into anomaly detection and misuse detection. Anomaly detection first defines normal behavior and treats all other behavior as anomalous. Misuse detection first defines abnormal behavior and treats all other behavior as normal. Anomaly detection can detect unknown abnormal behavior, but if not all normal behavior can be defined, its false-alarm rate is high. Misuse detection has a low false-alarm rate and a high detection speed, but it cannot find unknown abnormal behavior, so its missed-detection rate is high.
In a cloud computing environment, online monitoring data on virtual machine state has three main characteristics: 1) the data is large in volume and diverse in type; 2) manual labeling is costly, so little sample data is labeled as normal or abnormal and most data is unlabeled; 3) the data distribution is imbalanced: most collected monitoring data is normal, and abnormal samples are few. To monitor virtual machine state in a cloud environment, the anomaly detection method currently adopted mainly collects index data of the virtual machine in a normal state at fixed time intervals, trains a normal behavior model of the virtual machine offline with a long short-term memory (LSTM) neural network, and deploys the trained model on a server for online real-time detection. However, it is difficult for this method to capture all normal behavior states of the virtual machine accurately, so its false-alarm rate is high; moreover, the small amount of abnormal data produced while the virtual machine runs is not fully used, which reduces detection efficiency.
Disclosure of Invention
The present disclosure provides a scheme for efficiently detecting the state of a virtual machine online in real time, so as to effectively reduce both the missed-detection rate and the false-alarm rate.
According to a first aspect of the embodiments of the present disclosure, a method for real-time online detection of a virtual machine state is provided, including: reading trained one-class support vector machine (OCSVM) model parameters, deep autoencoder network (DAEN) model parameters and support vector machine (SVM) classification model parameters from a training model database to complete the initialization of the OCSVM model, the DAEN model and the SVM classification model; acquiring key performance indicator (KPI) data of the virtual machine online; performing misuse detection on the KPI data by using the OCSVM model to obtain a first detection result; performing anomaly detection on the KPI data by using the DAEN model to obtain a second detection result; determining whether the first detection result is consistent with the second detection result; if the first detection result is consistent with the second detection result, outputting the first detection result or the second detection result as the detection result of the KPI data; and if the first detection result is inconsistent with the second detection result, classifying the first detection result and the second detection result by using the SVM classification model to output the detection result of the KPI data.
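The decision-fusion logic of this first aspect can be sketched as below. The three callables stand in for the trained models; reducing each detector to a numeric score (the OCSVM difference value, and the DAEN squared error minus its threshold) and passing both scores to the SVM mirrors how the SVM training data is built in the embodiments, but the function names and sign conventions are assumptions, not the disclosure's interface.

```python
from typing import Callable, Sequence

NORMAL, ABNORMAL = 0, 1


def detect_vm_state(
    kpi: Sequence[float],
    ocsvm_score: Callable[[Sequence[float]], float],
    daen_margin: Callable[[Sequence[float]], float],
    svm_classify: Callable[[Sequence[float]], int],
) -> int:
    """Fuse misuse detection (OCSVM) and anomaly detection (DAEN)."""
    # Assumed convention: a positive score means the sample lies inside the
    # OCSVM boundary learned from abnormal data, i.e. a known anomaly.
    score = ocsvm_score(kpi)
    first = ABNORMAL if score > 0 else NORMAL

    # Assumed convention: margin = squared error - threshold, so a positive
    # margin means the DAEN reconstructs the sample poorly.
    margin = daen_margin(kpi)
    second = ABNORMAL if margin > 0 else NORMAL

    if first == second:
        return first  # consistent results: output either one
    # Inconsistent results: the SVM classifier arbitrates on both scores.
    return svm_classify([score, margin])
```

With real models, `ocsvm_score` would wrap the OCSVM decision function, `daen_margin` the DAEN reconstruction error, and `svm_classify` the trained SVM's prediction.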
In some embodiments, performing misuse detection on the KPI data by using the OCSVM model includes: processing the KPI data by using the OCSVM model to determine the difference value between the KPI data and the decision boundary of the OCSVM model; determining, according to the difference value, whether the KPI data is located inside the decision boundary; if the KPI data is located inside the decision boundary, outputting a first detection result indicating that the KPI data is abnormal; and if the KPI data is located outside the decision boundary, outputting a first detection result indicating that the KPI data is normal.
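A minimal sketch of this misuse-detection rule, assuming scikit-learn's `OneClassSVM` as the OCSVM implementation (the disclosure mentions Spark as one option) and made-up two-feature KPI samples. In scikit-learn, `decision_function` returns a signed distance that is positive inside the learned boundary, which is the convention assumed here for the "difference value". Because the model is fitted on abnormal samples only, "inside" means "resembles a known anomaly".

```python
import numpy as np
from sklearn.svm import OneClassSVM

NORMAL, ABNORMAL = 0, 1

# Hypothetical abnormal KPI samples [cpu_usage_pct, memory_usage_pct]
# clustered around one known fault pattern (CPU pegged, memory nearly full).
rng = np.random.default_rng(0)
abnormal_samples = rng.normal(loc=[95.0, 90.0], scale=2.0, size=(200, 2))

# The OCSVM learns a boundary around *abnormal* behavior only.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(abnormal_samples)


def misuse_detect(kpi: np.ndarray) -> tuple[int, float]:
    """Return (first detection result, signed distance to the boundary)."""
    score = float(ocsvm.decision_function(kpi.reshape(1, -1))[0])
    # Positive score: inside the boundary, so the sample looks abnormal.
    return (ABNORMAL if score > 0 else NORMAL), score
```

In practice the KPI features would usually be scaled first, since they mix percentages, byte counts and rates.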
In some embodiments, performing anomaly detection on the KPI data by using the DAEN model includes: processing the KPI data by using the DAEN model to calculate a squared error value of the KPI data; determining whether the squared error value of the KPI data is smaller than a preset squared error threshold; if the squared error value is smaller than the preset squared error threshold, outputting a second detection result indicating that the KPI data is normal; and if the squared error value is greater than the preset squared error threshold, outputting a second detection result indicating that the KPI data is abnormal.
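The squared-error rule can be sketched as follows. A real DAEN is a deep network (the disclosure mentions TensorFlow); to keep the example small and self-contained, a linear autoencoder fitted in closed form with PCA stands in for it. The class name, the NumPy stand-in and the choice to treat an error exactly equal to the threshold as abnormal are all assumptions.

```python
import numpy as np

NORMAL, ABNORMAL = 0, 1


class LinearAutoencoder:
    """Stand-in for the DAEN: a linear autoencoder fitted with PCA."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, x: np.ndarray) -> "LinearAutoencoder":
        # Train on normal data only, as the DAEN is in the disclosure.
        self.mean_ = x.mean(axis=0)
        # Rows of vt are principal directions; keep the leading ones.
        _, _, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def squared_error(self, x: np.ndarray) -> np.ndarray:
        centered = x - self.mean_
        codes = centered @ self.components_.T  # encode
        recon = codes @ self.components_       # decode
        return ((centered - recon) ** 2).sum(axis=1)


def anomaly_detect(
    model: LinearAutoencoder, kpi: np.ndarray, threshold: float
) -> tuple[int, float]:
    """Return (second detection result, squared error)."""
    err = float(model.squared_error(kpi.reshape(1, -1))[0])
    # Below the threshold: reconstructed well, so it looks like normal data.
    # An error exactly equal to the threshold is treated as abnormal here.
    return (NORMAL if err < threshold else ABNORMAL), err
```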
In some embodiments, the KPI data includes at least one of: CPU load, CPU usage, total number of processes in the running state, memory occupied by processes, total physical memory, available physical memory, network card outflow rate, network card inflow rate, disk read rate, disk write rate, and the used space and free space of a file system partition.
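One way to hold a single KPI sample in code, assuming one numeric field per KPI listed above. The class name, field names, units and ordering are illustrative, not taken from the disclosure; a fixed order matters because the OCSVM, DAEN and SVM models must all see the same feature layout.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class VmKpiSample:
    """One KPI sample; field names and units are illustrative assumptions."""

    cpu_load: float               # e.g. 1-minute load average
    cpu_usage_pct: float
    running_processes: int
    process_memory_bytes: int
    total_memory_bytes: int
    available_memory_bytes: int
    nic_out_bytes_per_s: float
    nic_in_bytes_per_s: float
    disk_read_per_s: float
    disk_write_per_s: float
    fs_used_bytes: int
    fs_free_bytes: int

    def to_vector(self) -> list[float]:
        # Field declaration order defines the feature order for every model.
        return [float(v) for v in astuple(self)]
```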
In some embodiments, historical KPI monitoring data of the virtual machine is obtained, where the historical monitoring data includes an unlabeled data set and a labeled data set, and the amount of unlabeled data is greater than the amount of labeled data; the data in the unlabeled data set is processed by using an isolation forest algorithm to form a marked abnormal data set and a marked normal data set; based on the marked abnormal data set, an abnormal behavior model is trained by using an OCSVM algorithm and adjusted by using the labeled data set to obtain the OCSVM model, and the OCSVM model parameters are stored in the training model database; based on the marked normal data set, a normal behavior model is trained by using a DAEN algorithm and adjusted by using the labeled data set to obtain the DAEN model, and the DAEN model parameters are stored in the training model database; the OCSVM model is used to calculate the difference value between each item of data in the labeled data set and the decision boundary of the OCSVM model, and a first operation data set is generated from the calculated difference values; the DAEN model is used to calculate the squared error value of each item of data in the labeled data set, the difference between each squared error value and the preset squared error threshold is calculated, and a second operation data set is generated from these differences; the first operation data set and the second operation data set are merged, and the corresponding category labels from the labeled data set are added to obtain a third operation data set; and binary classification training is performed on the third operation data set by using an SVM classification algorithm to obtain the SVM classification model, and the SVM classification model parameters are stored in the training model database.
In some embodiments, the labeled data set contains equal proportions of data labeled as normal and data labeled as abnormal.
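The disclosure does not say how the equal proportions are obtained. A simple approach, assumed here, is to downsample the majority class at random; `balance_by_downsampling` is a hypothetical helper, not part of the disclosure.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def balance_by_downsampling(
    samples: Sequence[T], labels: Sequence[int], seed: int = 0
) -> tuple[list[T], list[int]]:
    """Randomly downsample each class to the size of the smallest class."""
    by_label: dict[int, list[T]] = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # seeded for reproducible splits
    out_x: list[T] = []
    out_y: list[int] = []
    for y, group in sorted(by_label.items()):
        for x in rng.sample(group, n):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Downsampling discards normal samples; given how scarce abnormal data is, oversampling the abnormal class would be a reasonable alternative.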
According to a second aspect of the embodiments of the present disclosure, a system for real-time online detection of a virtual machine state is provided, including: a model initialization module configured to read trained one-class support vector machine (OCSVM) model parameters, deep autoencoder network (DAEN) model parameters and support vector machine (SVM) classification model parameters from a training model database to complete the initialization of the OCSVM model, the DAEN model and the SVM classification model; and an online detection module configured to acquire key performance indicator (KPI) data of the virtual machine online, perform misuse detection on the KPI data by using the OCSVM model to obtain a first detection result, perform anomaly detection on the KPI data by using the DAEN model to obtain a second detection result, and determine whether the first detection result is consistent with the second detection result; if they are consistent, the online detection module outputs the first detection result or the second detection result as the detection result of the KPI data, and if they are inconsistent, it classifies the first detection result and the second detection result by using the SVM classification model to output the detection result of the KPI data.
In some embodiments, the online detection module is configured to process the KPI data by using the OCSVM model to determine the difference value between the KPI data and the decision boundary of the OCSVM model, determine according to the difference value whether the KPI data is located inside the decision boundary, output a first detection result indicating that the KPI data is abnormal if the KPI data is located inside the decision boundary, and output a first detection result indicating that the KPI data is normal if the KPI data is located outside the decision boundary.
In some embodiments, the online detection module is configured to process the KPI data by using the DAEN model to calculate a squared error value of the KPI data, determine whether the squared error value is smaller than a preset squared error threshold, output a second detection result indicating that the KPI data is normal if the squared error value is smaller than the preset squared error threshold, and output a second detection result indicating that the KPI data is abnormal if the squared error value is larger than the preset squared error threshold.
In some embodiments, the KPI data includes at least one of: CPU load, CPU usage, total number of processes in the running state, memory occupied by processes, total physical memory, available physical memory, network card outflow rate, network card inflow rate, disk read rate, disk write rate, and the used space and free space of a file system partition.
In some embodiments, the system further includes a training module configured to: acquire historical KPI monitoring data of the virtual machine, where the historical monitoring data includes an unlabeled data set and a labeled data set, and the amount of unlabeled data is greater than the amount of labeled data; process the data in the unlabeled data set by using an isolation forest algorithm to form a marked abnormal data set and a marked normal data set; based on the marked abnormal data set, train an abnormal behavior model by using an OCSVM algorithm and adjust it by using the labeled data set to obtain the OCSVM model, and store the OCSVM model parameters in the training model database; based on the marked normal data set, train a normal behavior model by using a DAEN algorithm and adjust it by using the labeled data set to obtain the DAEN model, and store the DAEN model parameters in the training model database; use the OCSVM model to calculate the difference value between each item of data in the labeled data set and the decision boundary of the OCSVM model, and generate a first operation data set from the calculated difference values; use the DAEN model to calculate the squared error value of each item of data in the labeled data set, calculate the difference between each squared error value and the preset squared error threshold, and generate a second operation data set from these differences; merge the first operation data set and the second operation data set, and add the corresponding category labels from the labeled data set to obtain a third operation data set; and perform binary classification training on the third operation data set by using an SVM classification algorithm to obtain the SVM classification model, and store the SVM classification model parameters in the training model database.
In some embodiments, the labeled data set contains equal proportions of data labeled as normal and data labeled as abnormal.
According to a third aspect of the embodiments of the present disclosure, there is provided a system for real-time online detection of a virtual machine state, including: a memory configured to store instructions; a processor coupled to the memory, the processor configured to perform a method implementing any of the embodiments described above based on instructions stored by the memory.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided that stores computer instructions which, when executed by a processor, implement the method according to any of the embodiments described above.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram of a method for real-time online detection of virtual machine state according to one embodiment of the present disclosure;
FIG. 2 is a flow diagram of a method for real-time online detection of virtual machine state according to another embodiment of the present disclosure;
FIG. 3 is a block diagram of a system for real-time online detection of virtual machine state according to one embodiment of the present disclosure;
FIG. 4 is a schematic block diagram of a system for real-time online detection of virtual machine state, according to another embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a system for real-time online detection of virtual machine state according to another embodiment of the present disclosure.
It should be understood that the dimensions of the various parts shown in the figures are not drawn to scale. Further, the same or similar reference numerals denote the same or similar components.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. The description of the exemplary embodiments is merely illustrative and is in no way intended to limit the disclosure, its application, or uses. The present disclosure may be embodied in many different forms and is not limited to the embodiments described herein. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that: the relative arrangement of parts and steps, the composition of materials and values set forth in these embodiments are to be construed as illustrative only and not as limiting unless otherwise specifically stated.
The use of the word "comprising" or "comprises" and the like in this disclosure means that the elements listed before the word encompass the elements listed after the word and do not exclude the possibility that other elements may also be encompassed.
All terms (including technical or scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs unless specifically defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
FIG. 1 is a flow diagram of a method for real-time online detection of virtual machine state according to one embodiment of the present disclosure. In some embodiments, the following method steps for real-time online detection of virtual machine state are performed by a system for real-time online detection of virtual machine state.
In step 101, the trained OCSVM (One-Class Support Vector Machine) model parameters, DAEN (Deep AutoEncoder Network) model parameters and SVM (Support Vector Machine) classification model parameters are read from the training model database to complete the initialization of the OCSVM model, the DAEN model and the SVM classification model.
In some embodiments, the training model database may be MySQL or another type of database.
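A sketch of storing and reading model parameters, using Python's built-in `sqlite3` as a stand-in for MySQL so that the example runs anywhere. The `model_params` table, its columns, the model names and the JSON encoding are hypothetical; with MySQL, the driver and the upsert syntax (for example `ON DUPLICATE KEY UPDATE` instead of SQLite's `INSERT OR REPLACE`) would differ.

```python
import json
import sqlite3

MODEL_NAMES = ("ocsvm", "daen", "svm")  # hypothetical keys for the 3 models


def save_params(conn: sqlite3.Connection, model_name: str, params: dict) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_params "
        "(name TEXT PRIMARY KEY, params TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO model_params (name, params) VALUES (?, ?)",
        (model_name, json.dumps(params)),
    )
    conn.commit()


def load_params(conn: sqlite3.Connection, model_name: str) -> dict:
    row = conn.execute(
        "SELECT params FROM model_params WHERE name = ?", (model_name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no parameters stored for {model_name!r}")
    return json.loads(row[0])


def read_all_params(conn: sqlite3.Connection) -> dict[str, dict]:
    """Step 101: fetch every model's parameters before initialization."""
    return {name: load_params(conn, name) for name in MODEL_NAMES}
```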
In step 102, virtual machine KPI (Key Performance Indicator) data is obtained online.
In some embodiments, the KPI data includes at least one of: CPU (Central Processing Unit) load, CPU usage, total number of processes in the running state, memory occupied by processes, total physical memory, available physical memory, network card outflow rate, network card inflow rate, disk read rate, disk write rate, and the used space and free space of a file system partition.
In some embodiments, KPI data may be collected by deploying a Zabbix Agent on a target virtual machine.
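As an illustration, the KPIs above could map to standard Zabbix agent item keys roughly as follows. The mapping, the interface name `eth0`, the device `sda` and the mount point `/` are assumptions to verify against the Zabbix version and the target virtual machine; counters such as `net.if.in` are usually converted to per-second rates with Zabbix's "Change per second" preprocessing step.

```python
# Hypothetical KPI -> Zabbix agent item key mapping; eth0, sda and "/" are
# placeholders for the real interface, disk device and mount point.
ZABBIX_KPI_ITEMS: dict[str, str] = {
    "cpu_load": "system.cpu.load[all,avg1]",
    "cpu_usage": "system.cpu.util[,user]",  # user-mode share only
    "running_processes": "proc.num[,,run]",
    "process_memory": "proc.mem",
    "total_memory": "vm.memory.size[total]",
    "available_memory": "vm.memory.size[available]",
    "nic_out": "net.if.out[eth0]",
    "nic_in": "net.if.in[eth0]",
    "disk_read": "vfs.dev.read[sda,sps]",
    "disk_write": "vfs.dev.write[sda,sps]",
    "fs_used": "vfs.fs.size[/,used]",
    "fs_free": "vfs.fs.size[/,free]",
}


def kpi_vector(latest: dict[str, float]) -> list[float]:
    """Order the latest item values (keyed by item key) into a KPI vector."""
    missing = [key for key in ZABBIX_KPI_ITEMS.values() if key not in latest]
    if missing:
        raise KeyError(f"missing Zabbix items: {missing}")
    return [float(latest[key]) for key in ZABBIX_KPI_ITEMS.values()]
```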
In step 103, misuse detection is performed on the KPI data by using the OCSVM model to obtain a first detection result.
In some embodiments, the KPI data is processed by using the OCSVM model to determine the difference value between the KPI data and the decision boundary of the OCSVM model, and whether the KPI data is located inside the decision boundary is determined based on the difference value. If the KPI data is located inside the decision boundary, the output first detection result indicates that the KPI data is abnormal; if the KPI data is located outside the decision boundary, the output first detection result indicates that the KPI data is normal.
In step 104, anomaly detection is performed on the KPI data by using the DAEN model to obtain a second detection result.
In some embodiments, the KPI data is processed by using the DAEN model to calculate a squared error value of the KPI data, and whether the squared error value is smaller than a preset squared error threshold is determined. If the squared error value is smaller than the preset squared error threshold, the output second detection result indicates that the KPI data is normal; if it is greater than the preset squared error threshold, the output second detection result indicates that the KPI data is abnormal.
In step 105, it is determined whether the first detection result and the second detection result match.
If the first detection result is consistent with the second detection result, step 106 is executed; if the first and second detection results are inconsistent, step 107 is executed.
In step 106, the first detection result or the second detection result is output as a detection result of the KPI data.
In step 107, a classification judgment is performed on the first detection result and the second detection result by using an SVM classification model to output a detection result of the KPI data.
It should be noted that, because SVM classification itself is not the inventive point of the present disclosure, it is not described in detail here.
In the method for real-time online detection of virtual machine state provided by the above embodiment of the present disclosure, online data is examined with the trained OCSVM model, DAEN model and SVM classification model, so that the state of the virtual machine can be detected efficiently online in real time and abnormalities during operation can be found in time, even when the total data volume is large and little labeled data is available.
FIG. 2 is a flowchart illustrating a method for real-time online detection of virtual machine state according to another embodiment of the present disclosure. In some embodiments, the following method steps for real-time online detection of virtual machine state are performed by a system for real-time online detection of virtual machine state.
In step 201, historical monitoring data of the KPI of the virtual machine is obtained, where the historical monitoring data includes an unlabeled data set and a labeled data set, and an unlabeled data amount is greater than a labeled data amount.
In some embodiments, the labeled data set Labeled_Dataset contains equal proportions of data labeled as normal and data labeled as abnormal.
In step 202, the data in the unlabeled data set is processed by using the isolation forest algorithm to form a marked abnormal data set and a marked normal data set.
For example, the marked abnormal data set, composed of data marked as abnormal, is denoted IF_Abnormal_Dataset, and the marked normal data set, composed of data marked as normal, is denoted IF_Normal_Dataset.
It should be noted that, when the isolation forest algorithm processes the large amount of unlabeled data, points that are sparsely distributed and far from high-density populations are marked as abnormal, and all other points are marked as normal.
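A minimal sketch of this marking step, assuming scikit-learn's `IsolationForest` (the disclosure mentions Spark as one implementation option). `split_unlabeled` is a hypothetical helper, and the `contamination` value, which fixes the expected share of points marked abnormal, is an assumed tuning knob not specified in the disclosure.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def split_unlabeled(
    x: np.ndarray, contamination: float = 0.05, seed: int = 0
) -> tuple[np.ndarray, np.ndarray]:
    """Mark unlabeled KPI rows; returns (IF_Abnormal_Dataset, IF_Normal_Dataset)."""
    forest = IsolationForest(contamination=contamination, random_state=seed)
    # fit_predict returns -1 for easily isolated (sparse, distant) points
    # and 1 for points inside dense regions.
    flags = forest.fit_predict(x)
    return x[flags == -1], x[flags == 1]
```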
In step 203, based on the marked abnormal data set IF_Abnormal_Dataset, an abnormal behavior model is trained by using the OCSVM algorithm and adjusted by using the labeled data set Labeled_Dataset to obtain the OCSVM model, and the OCSVM model parameters are stored in the training model database.
The OCSVM model has a decision boundary, and the output value of the model is the difference value between a data point and the decision boundary. Because the boundary is learned from abnormal data, a data point is determined to be abnormal when it lies inside the decision boundary and normal when it lies outside it.
In step 204, based on the marked normal data set IF_Normal_Dataset, a normal behavior model is trained by using the DAEN algorithm and adjusted by using the labeled data set Labeled_Dataset to obtain the DAEN model, and the DAEN model parameters are stored in the training model database.
The output of the DAEN model is a squared error value. Because the model is trained on normal data, it reconstructs normal inputs with small error. The DAEN model therefore has a squared error threshold: input data whose squared error is below the threshold is determined to be normal, and input data whose squared error exceeds the threshold is determined to be abnormal.
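The disclosure does not say how the threshold is chosen when the normal behavior model is adjusted with Labeled_Dataset. One assumed approach is to pick the value that best separates the labeled squared errors; `tune_threshold` is a hypothetical helper that uses accuracy as the criterion and midpoints between sorted errors as candidates.

```python
from typing import Sequence

NORMAL, ABNORMAL = 0, 1


def tune_threshold(errors: Sequence[float], labels: Sequence[int]) -> float:
    """Choose the squared-error threshold that best fits the labeled data.

    Rule assumed throughout: error >= threshold -> ABNORMAL.
    """
    pairs = sorted(zip(errors, labels))
    values = [e for e, _ in pairs]
    # Midpoints between consecutive sorted errors; fall back to the raw
    # values if there is only one labeled sample.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values

    def accuracy(threshold: float) -> float:
        hits = sum((e >= threshold) == (y == ABNORMAL) for e, y in pairs)
        return hits / len(pairs)

    return max(candidates, key=accuracy)
```

Because the labeled data set is balanced, plain accuracy is a reasonable criterion here; on imbalanced data a recall-weighted score would be safer.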
In step 205, the OCSVM model is used to calculate the difference value between each item of data in the labeled data set Labeled_Dataset and the decision boundary of the OCSVM model, and a first operation data set OCSVM_Dataset is generated from the calculated difference values.
In step 206, the DAEN model is used to calculate the squared error value of each item of data in the labeled data set Labeled_Dataset, the difference between each squared error value and the preset squared error threshold is calculated, and a second operation data set DAEN_Dataset is generated from these differences.
In step 207, the first operation data set OCSVM_Dataset and the second operation data set DAEN_Dataset are merged, and the corresponding category labels from the labeled data set Labeled_Dataset are added to obtain a third operation data set Dataset.
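Steps 205 to 207 amount to building one feature row per labeled sample. A sketch, assuming the OCSVM difference values and DAEN squared errors have already been computed for Labeled_Dataset in the same row order; `build_fusion_dataset` is a hypothetical helper.

```python
from typing import Sequence


def build_fusion_dataset(
    ocsvm_distances: Sequence[float],  # OCSVM_Dataset (step 205)
    squared_errors: Sequence[float],   # raw DAEN squared errors
    threshold: float,                  # preset squared error threshold
    labels: Sequence[int],             # category labels from Labeled_Dataset
) -> list[tuple[float, float, int]]:
    """Merge OCSVM_Dataset and DAEN_Dataset row-wise and attach labels."""
    if not (len(ocsvm_distances) == len(squared_errors) == len(labels)):
        raise ValueError("all inputs must describe the same labeled rows")
    daen_dataset = [e - threshold for e in squared_errors]  # step 206
    # Step 207: one (distance, error margin, label) row per labeled sample.
    return list(zip(ocsvm_distances, daen_dataset, labels))
```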
In step 208, binary classification training is performed on the third operation data set Dataset by using the SVM classification algorithm to obtain the SVM classification model, and the SVM classification model parameters are stored in the training model database.
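Step 208 can be sketched with scikit-learn's `SVC`, assuming two features per row (OCSVM difference value, DAEN squared error minus threshold) as built in step 207. The synthetic rows and hyperparameters below are made up for illustration; the disclosure names Spark as one implementation option.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical rows of the third operation data set:
# [OCSVM difference value, DAEN squared error - threshold].
normal_rows = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(100, 2))
abnormal_rows = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(100, 2))
features = np.vstack([normal_rows, abnormal_rows])
labels = np.array([0] * 100 + [1] * 100)  # 0 = normal, 1 = abnormal

# Binary SVM that arbitrates when the two detectors disagree online.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(features, labels)
```

Online, a disagreeing pair of scores would be passed to `svm.predict` as a single two-feature row.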
In some embodiments, the isolation forest algorithm, the OCSVM algorithm and the SVM classification algorithm may be implemented using Spark, and the DAEN algorithm may be implemented using TensorFlow.
Fig. 3 is a schematic structural diagram of a system for real-time online detection of virtual machine states according to one embodiment of the present disclosure. As shown in FIG. 3, the system for real-time online detection of virtual machine state includes a model initialization module 31 and an online detection module 32.
The model initialization module 31 is configured to read the trained OCSVM model parameters, DAEN model parameters, and SVM classification model parameters from the training model database to complete the initialization of the OCSVM model, DAEN model, and SVM classification model.
In some embodiments, the training model database may be MySQL or another type of database.
The online detection module 32 is configured to acquire virtual machine KPI data online, perform misuse detection on the KPI data by using an OCSVM model to obtain a first detection result, perform anomaly detection on the KPI data by using a DAEN model to obtain a second detection result, determine whether the first detection result is consistent with the second detection result, output the first detection result or the second detection result as a detection result of the KPI data if the first detection result is consistent with the second detection result, and perform classification determination on the first detection result and the second detection result by using an SVM classification model to output the detection result of the KPI data if the first detection result is inconsistent with the second detection result.
In some embodiments, the KPI data includes at least one of: CPU load, CPU usage, total number of processes in the running state, memory occupied by processes, total physical memory, available physical memory, network card outflow rate, network card inflow rate, disk read rate, disk write rate, and the used space and free space of a file system partition.
In some embodiments, KPI data may be collected by deploying a Zabbix Agent on a target virtual machine.
In some embodiments, the online detection module 32 is configured to process the KPI data by using the OCSVM model to determine the difference value between the KPI data and the decision boundary of the OCSVM model, determine according to the difference value whether the KPI data is located inside the decision boundary, output a first detection result indicating that the KPI data is abnormal if the KPI data is located inside the decision boundary, and output a first detection result indicating that the KPI data is normal if the KPI data is located outside the decision boundary.
In some embodiments, the online detection module 32 is configured to process the KPI data by using the DAEN model to calculate a squared error value of the KPI data, determine whether the squared error value is smaller than a preset squared error threshold, output a second detection result indicating that the KPI data is normal if the squared error value is smaller than the preset squared error threshold, and output a second detection result indicating that the KPI data is abnormal if the squared error value is larger than the preset squared error threshold.
Fig. 4 is a schematic structural diagram of a system for real-time online detection of virtual machine states according to another embodiment of the present disclosure. Fig. 4 differs from fig. 3 in that in the embodiment shown in fig. 4 the system further comprises a training module 33.
The training module 33 is configured to obtain historical KPI monitoring data of the virtual machine. The historical monitoring data includes an unlabeled data set and a labeled data set, and the amount of unlabeled data is greater than the amount of labeled data.
In some embodiments, in the labeled data set Labeled_Dataset, the data labeled as normal and the data labeled as abnormal are evenly distributed (balanced).
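The disclosure does not state how this balance is achieved. One possible approach, assuming enough samples of each class, is to randomly downsample every class to the size of the smallest one. The helper below is a hypothetical sketch of that approach.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def balance_by_downsampling(
    samples: Sequence, labels: Sequence[Hashable], seed: int = 0
) -> List[Tuple[object, Hashable]]:
    """Return (sample, label) pairs with every class cut to the minority-class size."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, list] = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n = min(len(group) for group in by_class.values())
    balanced = []
    for label in sorted(by_class):
        # Draw n samples without replacement from each class.
        balanced.extend((s, label) for s in rng.sample(by_class[label], n))
    rng.shuffle(balanced)
    return balanced
```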
The training module 33 processes the data in the unlabeled data set by using an isolation forest algorithm to form an abnormal data set and a normal data set. Based on the abnormal data set, it trains an abnormal behavior model with the OCSVM algorithm. It then fine-tunes this model with the labeled data set to obtain the OCSVM model and stores the OCSVM model parameters in the training model database. Based on the normal data set, it trains a normal behavior model with the DAEN algorithm. It then fine-tunes this model with the labeled data set to obtain the DAEN model and stores the DAEN model parameters in the training model database.

The training module 33 then uses the OCSVM model to calculate the difference value from each sample in the labeled data set to the decision boundary of the OCSVM model, and generates a first operational data set from the calculated difference values. It uses the DAEN model to calculate the squared error value of each sample in the labeled data set and the difference between each squared error value and the preset squared error threshold, and generates a second operational data set from these differences. The first and second operational data sets are merged, and the corresponding category labels from the labeled data set are attached to obtain a third operational data set. Finally, binary classification training is performed on the third operational data set with an SVM classification algorithm to obtain the SVM classification model, and the SVM classification model parameters are stored in the training model database.
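The isolation forest labeling step can be sketched in plain Python. The sketch follows the standard isolation forest construction: random axis-aligned splits, path lengths normalized by c(n), and an anomaly score of 2^(−E[h(x)]/c(ψ)). The disclosure gives no parameters, so the tree count, the subsample size ψ, and the `contamination` fraction used to cut scores into an abnormal set and a normal set are all assumptions. A production system would more likely rely on a library implementation; the disclosure mentions Spark for this step.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def avg_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) via Euler's constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_itree(points: List[Point], depth: int, max_depth: int, rng: random.Random):
    """Split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo, hi = min(p[dim] for p in points), max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_itree(left, depth + 1, max_depth, rng),
            build_itree(right, depth + 1, max_depth, rng))


def path_length(x: Point, node, depth: int = 0) -> float:
    if node[0] == "leaf":
        # Unresolved leaves add the expected depth of a subtree of that size.
        return depth + avg_path_length(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(data: List[Point], n_trees: int = 100,
                     sample_size: int = 256, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 1
    trees = [build_itree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path_length(psi)
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        # Scores close to 1 indicate points that are isolated unusually quickly.
        scores.append(2.0 ** (-mean_h / norm) if norm > 0 else 0.5)
    return scores


def split_by_isolation(data: List[Point], contamination: float = 0.05,
                       **kwargs) -> Tuple[List[Point], List[Point]]:
    """Label the top `contamination` fraction of scores abnormal, the rest normal."""
    scores = isolation_scores(data, **kwargs)
    n_abnormal = max(1, round(contamination * len(data)))
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    abnormal_idx = set(ranked[:n_abnormal])
    abnormal = [data[i] for i in range(len(data)) if i in abnormal_idx]
    normal = [data[i] for i in range(len(data)) if i not in abnormal_idx]
    return abnormal, normal
```

Because abnormal samples are rare in the unlabeled data, they tend to be separated after only a few random splits. This is why the shortest average path lengths are taken as the abnormal candidates.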
It should be noted here that the OCSVM model has a decision boundary, and the model output value is a difference value from a data point to the decision boundary. Data points are determined to be abnormal when they are within the decision boundary and normal when they are outside the decision boundary.
In addition, the output value of the DAEN model is a squared error value. The DAEN model has a squared error threshold. Input data are determined to be normal when their computed value is below the threshold, and abnormal when it exceeds the threshold.
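Given these two output conventions, the first, second and third operational data sets described for the training module can be assembled and used to fit a binary SVM. The sketch below uses a linear SVM trained with Pegasos-style stochastic subgradient descent as a simple stand-in. The disclosure does not specify the kernel or solver. The callables `ocsvm_difference` and `daen_squared_error` are hypothetical hooks for the trained models.

```python
import random
from typing import Callable, List, Sequence, Tuple


def build_third_dataset(
    labeled: Sequence[Tuple[Sequence[float], int]],
    ocsvm_difference: Callable[[Sequence[float]], float],
    daen_squared_error: Callable[[Sequence[float]], float],
    threshold: float,
) -> List[Tuple[List[float], int]]:
    third = []
    for x, label in labeled:
        first = ocsvm_difference(x)                 # first operational data
        second = daen_squared_error(x) - threshold  # second operational data
        third.append(([first, second], label))      # merged, with category label
    return third


def train_linear_svm(dataset, lam: float = 0.01, epochs: int = 300,
                     seed: int = 0) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    A constant 1.0 feature is appended so the last weight acts as the bias.
    """
    rng = random.Random(seed)
    data = [(list(f) + [1.0], 1.0 if y == 1 else -1.0) for f, y in dataset]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]  # hinge-loss step
    return w


def svm_predict(w: Sequence[float], features: Sequence[float]) -> int:
    """Return 1 (abnormal) or 0 (normal) from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, list(features) + [1.0]))
    return 1 if score >= 0 else 0
```

A linear model on two features keeps the arbitration step cheap at detection time. A kernel SVM could be substituted if the two scores are not linearly separable.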
In some embodiments, the isolation forest algorithm, the OCSVM algorithm and the SVM classification algorithm may be implemented using Spark, and the DAEN algorithm may be implemented using TensorFlow.
Fig. 5 is a schematic structural diagram of a system for real-time online detection of virtual machine states according to another embodiment of the present disclosure. As shown in fig. 5, the system includes a memory 51 and a processor 52.
The memory 51 is used to store instructions. The processor 52 is coupled to the memory 51 and is configured to perform, based on the instructions stored in the memory 51, the method of any embodiment shown in fig. 1 or fig. 2.
As shown in fig. 5, the system further comprises a communication interface 53 for exchanging information with other devices. The system also comprises a bus 54, through which the processor 52, the communication interface 53 and the memory 51 communicate with one another.
The memory 51 may include a random access memory (RAM) or a non-volatile memory (NVM), such as at least one disk storage. The memory 51 may also be a memory array. The memory 51 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules.
Further, the processor 52 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium. The computer-readable storage medium stores computer instructions, which when executed by the processor implement a method according to any one of fig. 1 and 2.
Implementing the scheme of the present disclosure provides the following beneficial effects:
1) Because abnormal samples in unlabeled data are far fewer than normal samples, the data are labeled with the unsupervised isolation forest algorithm. This is fast and effective, greatly reduces the manual labeling workload of operation and maintenance personnel, and lowers cost;
2) for the small sample of abnormal data, the OCSVM algorithm is used to train an abnormal behavior model for misuse detection, which reduces the false alarm rate;
3) for the large sample of normal data, the DAEN algorithm is used to train a normal behavior model for anomaly detection, which reduces the missed detection rate;
4) after the models are trained on the large amount of data labeled by the isolation forest algorithm, a small amount of manually labeled sample data is used to fine-tune them, which improves detection accuracy;
5) the OCSVM and DAEN algorithms process online data in parallel. When their judgments agree, the detection result is output directly; only when they disagree is the SVM classification algorithm used for further judgment. This shortens computation time, guarantees real-time performance, and improves detection accuracy.
In some embodiments, the functional modules described above may be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof, for performing the functions described in this disclosure.
The embodiments of the present disclosure have thus been described in detail. Some details well known in the art are not described, in order to avoid obscuring the concepts of the present disclosure. From the foregoing description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be understood by those skilled in the art that various changes may be made in the above embodiments or equivalents may be substituted for elements thereof without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (14)

1. A method for real-time online detection of virtual machine state, comprising:
reading trained parameters of a one-class support vector machine (OCSVM) model, a deep autoencoder network (DAEN) model and a support vector machine (SVM) classification model from a training model database to complete initialization of the OCSVM model, the DAEN model and the SVM classification model;
acquiring key performance indicator (KPI) data of the virtual machine online;
performing misuse detection on the KPI data by using the OCSVM model to obtain a first detection result, and performing anomaly detection on the KPI data by using the DAEN model to obtain a second detection result;
judging whether the first detection result is consistent with the second detection result;
if the first detection result is consistent with the second detection result, outputting the first detection result or the second detection result as a detection result of the KPI data;
if the first detection result is inconsistent with the second detection result, performing classification determination on the first detection result and the second detection result by using the SVM classification model to output the detection result of the KPI data.
2. The method of claim 1, wherein performing misuse detection on the KPI data by using the OCSVM model comprises:
processing the KPI data by using the OCSVM model to determine a difference value between the KPI data and the decision boundary of the OCSVM model;
determining, according to the difference value, whether the KPI data is located inside the decision boundary;
if the KPI data is located inside the decision boundary, outputting a first detection result indicating that the KPI data is abnormal;
and if the KPI data is located outside the decision boundary, outputting a first detection result indicating that the KPI data is normal.
3. The method of claim 1, wherein performing anomaly detection on the KPI data by using the DAEN model comprises:
processing the KPI data by using the DAEN model to calculate a squared error value of the KPI data;
determining whether the squared error value of the KPI data is smaller than a preset squared error threshold;
if the squared error value of the KPI data is smaller than the preset squared error threshold, outputting a second detection result indicating that the KPI data is normal;
and if the squared error value of the KPI data is greater than the preset squared error threshold, outputting a second detection result indicating that the KPI data is abnormal.
4. The method of claim 1, wherein,
the KPI information comprises at least one of CPU load, CPU utilization, total number of processes in the running state, memory occupied by processes, total physical memory, available physical memory, network card outbound rate, network card inbound rate, disk read rate, disk write rate, and used space and free space of the file system partition.
5. The method of any of claims 1-4, further comprising:
acquiring historical KPI monitoring data of the virtual machine, wherein the historical monitoring data comprises an unlabeled data set and a labeled data set, and the amount of unlabeled data is greater than the amount of labeled data;
processing the data in the unlabeled data set by using an isolation forest algorithm to form an abnormal data set and a normal data set;
training an abnormal behavior model with an OCSVM algorithm based on the abnormal data set, fine-tuning the abnormal behavior model with the labeled data set to obtain the OCSVM model, and storing OCSVM model parameters in the training model database;
training a normal behavior model with a DAEN algorithm based on the normal data set, fine-tuning the normal behavior model with the labeled data set to obtain the DAEN model, and storing DAEN model parameters in the training model database;
calculating, by using the OCSVM model, a difference value from each sample in the labeled data set to the decision boundary of the OCSVM model, and generating a first operational data set from the calculated difference values; calculating, by using the DAEN model, a squared error value of each sample in the labeled data set and a difference between each squared error value and a preset squared error threshold, and generating a second operational data set from the calculated differences;
merging the first operational data set and the second operational data set, and adding the corresponding category labels from the labeled data set to obtain a third operational data set;
and performing binary classification training on the third operational data set by using an SVM classification algorithm to obtain the SVM classification model, and storing SVM classification model parameters in the training model database.
6. The method of claim 5, wherein,
in the labeled dataset, the data labeled as normal and the data labeled as abnormal are evenly distributed.
7. A system for real-time online detection of virtual machine state, comprising:
the model initialization module, configured to read trained one-class support vector machine (OCSVM) model parameters, deep autoencoder network (DAEN) model parameters and support vector machine (SVM) classification model parameters from a training model database to complete initialization of the OCSVM model, the DAEN model and the SVM classification model;
the online detection module is configured to acquire key performance indicator KPI data of the virtual machine online, perform misuse detection on the KPI data by using an OCSVM model to obtain a first detection result, perform anomaly detection on the KPI data by using a DAEN model to obtain a second detection result, determine whether the first detection result is consistent with the second detection result, output the first detection result or the second detection result as the detection result of the KPI data if the first detection result is consistent with the second detection result, and perform classification determination on the first detection result and the second detection result by using an SVM classification model to output the detection result of the KPI data if the first detection result is inconsistent with the second detection result.
8. The system of claim 7, wherein,
the online detection module is configured to process the KPI data by using the OCSVM model to determine a difference value between the KPI data and the decision boundary of the OCSVM model, determine according to the difference value whether the KPI data is located inside the decision boundary, output a first detection result indicating that the KPI data is abnormal if the KPI data is located inside the decision boundary, and output a first detection result indicating that the KPI data is normal if the KPI data is located outside the decision boundary.
9. The system of claim 7, wherein,
the online detection module is configured to process the KPI data by using the DAEN model to calculate a squared error value of the KPI data, determine whether the squared error value is smaller than a preset squared error threshold, output a second detection result indicating that the KPI data is normal if the squared error value is smaller than the preset squared error threshold, and output a second detection result indicating that the KPI data is abnormal if the squared error value is larger than the preset squared error threshold.
10. The system of claim 7, wherein,
the KPI information comprises at least one of CPU load, CPU utilization, total number of processes in the running state, memory occupied by processes, total physical memory, available physical memory, network card outbound rate, network card inbound rate, disk read rate, disk write rate, and used space and free space of the file system partition.
11. The system of any of claims 7-10, further comprising:
the training module, configured to: acquire historical KPI monitoring data of the virtual machine, wherein the historical monitoring data comprises an unlabeled data set and a labeled data set, and the amount of unlabeled data is greater than the amount of labeled data; process the data in the unlabeled data set by using an isolation forest algorithm to form an abnormal data set and a normal data set; train an abnormal behavior model with an OCSVM algorithm based on the abnormal data set, fine-tune the abnormal behavior model with the labeled data set to obtain the OCSVM model, and store OCSVM model parameters in the training model database; train a normal behavior model with a DAEN algorithm based on the normal data set, fine-tune the normal behavior model with the labeled data set to obtain the DAEN model, and store DAEN model parameters in the training model database; calculate, by using the OCSVM model, a difference value from each sample in the labeled data set to the decision boundary of the OCSVM model, and generate a first operational data set from the calculated difference values; calculate, by using the DAEN model, a squared error value of each sample in the labeled data set and a difference between each squared error value and a preset squared error threshold, and generate a second operational data set from the calculated differences; merge the first operational data set and the second operational data set, and add the corresponding category labels from the labeled data set to obtain a third operational data set; and perform binary classification training on the third operational data set by using an SVM classification algorithm to obtain the SVM classification model, and store SVM classification model parameters in the training model database.
12. The system of claim 11, wherein,
in the labeled dataset, the data labeled as normal and the data labeled as abnormal are evenly distributed.
13. A system for real-time online detection of virtual machine state, comprising:
a memory configured to store instructions;
a processor coupled to the memory, the processor configured to perform the method of any of claims 1-6 based on the instructions stored in the memory.
14. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method of any one of claims 1-6.
CN201911226077.XA 2019-12-04 2019-12-04 Method and system for real-time online detection of virtual machine state Pending CN112906727A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911226077.XA CN112906727A (en) 2019-12-04 2019-12-04 Method and system for real-time online detection of virtual machine state

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911226077.XA CN112906727A (en) 2019-12-04 2019-12-04 Method and system for real-time online detection of virtual machine state

Publications (1)

Publication Number Publication Date
CN112906727A true CN112906727A (en) 2021-06-04

Family

ID=76104528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911226077.XA Pending CN112906727A (en) 2019-12-04 2019-12-04 Method and system for real-time online detection of virtual machine state

Country Status (1)

Country Link
CN (1) CN112906727A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10049302B1 (en) * 2017-07-17 2018-08-14 Sas Institute Inc. Classification system training
US20180238951A1 (en) * 2016-09-07 2018-08-23 Jiangnan University Decision Tree SVM Fault Diagnosis Method of Photovoltaic Diode-Clamped Three-Level Inverter
CN109324604A (en) * 2018-11-29 2019-02-12 中南大学 (Central South University) Comprehensive fault analysis method for intelligent trains based on source signals
CN109582793A (en) * 2018-11-23 2019-04-05 深圳前海微众银行股份有限公司 (Shenzhen Qianhai WeBank Co., Ltd.) Model training method, customer service system, data labeling system, and readable storage medium
CN109960753A (en) * 2019-02-13 2019-07-02 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.) Detection method, device, storage medium and server for users of Internet-access devices
KR20190081408A (en) * 2017-12-29 2019-07-09 이화여자대학교 산학협력단 (Ewha Womans University Industry-Academic Cooperation Foundation) System and method for detecting network intrusion, computer readable medium for performing the method
CN110139315A (en) * 2019-04-26 2019-08-16 东南大学 (Southeast University) Wireless network fault detection method based on self-learning
US20190304849A1 (en) * 2018-03-27 2019-10-03 Streammosaic, Inc. Selective inclusion/exclusion of semiconductor chips in accelerated failure tests
CN110443274A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.) Anomaly detection method and device, computer equipment, and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
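ZHOU YING: "Automatic Detection of Photovoltaic Module Cells using Multi-Channel Convolutional Neural Network", 2018 Chinese Automation Congress, 24 January 2019 (2019-01-24) *
佟国峰; 李勇; 丁伟利; 岳晓阳 (TONG Guofeng; LI Yong; DING Weili; YUE Xiaoyang): "A review of change detection algorithms for remote sensing images" (遥感影像变化检测算法综述), Journal of Image and Graphics (中国图象图形学报), no. 12, 16 December 2015 (2015-12-16) *
李宁 (LI Ning): "Research on Internet abnormal traffic detection based on big data" (基于大数据的互联网异常流量检测研究), Journal of Chengdu Technological University (成都工业学院学报), 15 December 2018 (2018-12-15) *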

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116661954A (en) * 2023-07-21 2023-08-29 苏州浪潮智能科技有限公司 (Suzhou Inspur Intelligent Technology Co., Ltd.) Virtual machine abnormality prediction method, device, communication equipment and storage medium
CN116661954B (en) * 2023-07-21 2023-11-03 苏州浪潮智能科技有限公司 (Suzhou Inspur Intelligent Technology Co., Ltd.) Virtual machine abnormality prediction method, device, communication equipment and storage medium

Similar Documents

Publication Publication Date Title
US10216558B1 (en) Predicting drive failures
US11048729B2 (en) Cluster evaluation in unsupervised learning of continuous data
US20190095266A1 (en) Detection of Misbehaving Components for Large Scale Distributed Systems
CN102541667B (en) Method and system using hashing function to distinguish random and repeat errors in a memory system
CN110164501B (en) Hard disk detection method, device, storage medium and equipment
US11657121B2 (en) Abnormality detection device, abnormality detection method and computer readable medium
Huong et al. Federated learning-based explainable anomaly detection for industrial control systems
CN111767957B (en) Log abnormality detection method and device, storage medium and electronic equipment
JP6871877B2 (en) Information processing equipment, information processing methods and computer programs
CN103034567B Apparatus and method for finding and repairing corrupt data
CN115248757A (en) Hard disk health assessment method and storage device
CN113837596A (en) Fault determination method and device, electronic equipment and storage medium
CN114994543A (en) Energy storage power station battery fault diagnosis method and device and storage medium
CN109240867A (en) Hard disk failure prediction technique
CN112906727A (en) Method and system for real-time online detection of virtual machine state
CN106485526A (en) A kind of diagnostic method of data mining model and device
CN114416467A (en) Anomaly detection method and device
JP7359206B2 (en) Learning devices, learning methods, and programs
CN115981911A (en) Memory failure prediction method, electronic device and computer-readable storage medium
CN113296951A (en) Resource allocation scheme determination method and equipment
CN113551156A (en) Pipeline state monitoring method and device based on deep learning and storage medium
CN112990329A System anomaly diagnosis method and device
US20190236268A1 (en) Behavior determining method, behavior determining apparatus, and non-transitory computer readable medium
Chou et al. Economic design of variable sampling intervals charts with B&L switching rule
CN112395179B (en) Model training method, disk prediction method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220127

Address after: Room 205-32, Floor 2, Building 2, No. 1 and No. 3 Qinglong Hutong A, Dongcheng District, Beijing 100007

Applicant after: Tianyiyun Technology Co.,Ltd.

Address before: No.31, Financial Street, Xicheng District, Beijing, 100033

Applicant before: CHINA TELECOM Corp.,Ltd.
