WO2024001254A1 - Server anomaly detection method and apparatus, device, and readable storage medium

Publication number
WO2024001254A1
WO2024001254A1 PCT/CN2023/078528 CN2023078528W WO2024001254A1 WO 2024001254 A1 WO2024001254 A1 WO 2024001254A1 CN 2023078528 W CN2023078528 W CN 2023078528W WO 2024001254 A1 WO2024001254 A1 WO 2024001254A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
server system
system data
server
anomaly detection
Prior art date
Application number
PCT/CN2023/078528
Other languages
French (fr)
Chinese (zh)
Inventor
邹德强
满宏涛
Original Assignee
苏州元脑智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2024001254A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H05 ELECTRIC TECHNIQUES NOT OTHERWISE PROVIDED FOR
    • H05K PRINTED CIRCUITS; CASINGS OR CONSTRUCTIONAL DETAILS OF ELECTRIC APPARATUS; MANUFACTURE OF ASSEMBLAGES OF ELECTRICAL COMPONENTS
    • H05K7/00 Constructional details common to different types of electric apparatus
    • H05K7/20 Modifications to facilitate cooling, ventilating, or heating
    • H05K7/20709 Modifications to facilitate cooling, ventilating, or heating for server racks or cabinets; for data centers, e.g. 19-inch computer racks
    • H05K7/20718 Forced ventilation of a gaseous coolant
    • H ELECTRICITY
    • H05 ELECTRIC TECHNIQUES NOT OTHERWISE PROVIDED FOR
    • H05K PRINTED CIRCUITS; CASINGS OR CONSTRUCTIONAL DETAILS OF ELECTRIC APPARATUS; MANUFACTURE OF ASSEMBLAGES OF ELECTRICAL COMPONENTS
    • H05K7/00 Constructional details common to different types of electric apparatus
    • H05K7/20 Modifications to facilitate cooling, ventilating, or heating
    • H05K7/20709 Modifications to facilitate cooling, ventilating, or heating for server racks or cabinets; for data centers, e.g. 19-inch computer racks
    • H05K7/20836 Thermal management, e.g. server temperature control

Definitions

  • This application relates to the technical fields of artificial intelligence and anomaly detection, and in particular to a server anomaly detection method, device, equipment and non-volatile readable storage medium.
  • Anomaly detection is the detection of illogical abnormal data in the data set, that is, outliers, inconsistencies, and special points. It is suitable for system health detection, sensor network event detection, fault detection, etc., to ensure the normal operation of the system ecosystem.
  • Anomaly detection is one of the applications of machine learning.
  • Its algorithms are based on probability statistics, nearest neighbors, clustering, and similar principles. There are many classic and derivative algorithms, which can be divided into supervised learning, unsupervised learning, semi-supervised learning, and so on.
  • BMC: Baseboard Management Controller
  • A BMC generally uses thresholds as the judgment condition when monitoring a server system: when a temperature exceeds its threshold, fans are used to lower the temperature and keep the system healthy. However, this reactive response lags slightly behind the event, and high-temperature damage to components is irreversible and shortens component life. When a major system risk occurs in the server, fan cooling has little effect, and the result can be standby, a crash, or other adverse consequences. Without a reasonable response and adjustment, this can lead to file loss and similar situations, causing significant economic losses and creating hidden dangers for production safety. In the pre-researched BMC solutions, traditional machine-learning-based anomaly detection, especially distance-based detection, is prone to an explosion in the amount of computation.
  • The purpose of this application is to provide a server anomaly detection method.
  • Through dual-end collaborative anomaly detection, this method can allocate computing resources rationally, prevent an explosion in the amount of computation, improve detection efficiency, and effectively avoid the drawbacks of high-load computation such as distance-based anomaly detection.
  • Another purpose of this application is to provide a server anomaly detection device, equipment and non-volatile readable storage medium.
  • a server anomaly detection method including:
  • the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model are combined to perform superimposed anomaly detection on each server system data.
  • the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model are combined to perform superimposed anomaly detection on each server system data, including:
  • when abnormal data is detected in the server system data based on the average path lengths, the method further includes:
  • the first abnormality detection result is fed back to the baseboard management controller, so that the baseboard management controller controls the fan to cool down the corresponding system component.
  • after superimposed anomaly detection is performed for each server system data in combination with the normal probability threshold, the abnormal probability threshold, and the normal probability and abnormal probability corresponding to that server system data, the method further includes:
  • the server abnormality maintenance operation is performed based on the first abnormality detection result and the second abnormality detection result.
  • performing server abnormality maintenance operations in combination with the first anomaly detection result and the second anomaly detection result includes:
  • a disk archiving instruction is sent to the baseboard management controller, so that the baseboard management controller performs a disk archiving operation and sends an abnormality detection report to the superior;
  • a fan control instruction is sent to the baseboard management controller, so that the baseboard management controller controls the fan to cool down the corresponding system components;
  • the first abnormality detection result is that abnormal data exists
  • the second abnormality detection result is that there is server system data with a normal probability within the normal probability threshold and an abnormality probability within the abnormality probability threshold
  • binary tree construction is performed based on each feature data, including:
  • Each distributed computing structural unit in the baseboard management controller is used to construct a preset number of binary trees in parallel based on each characteristic data.
  • each normal data and each abnormal data obtained by the remote end's offloading of each server system data are obtained, including:
  • each normal data and each abnormal data obtained by remotely diverting each server system data are obtained.
  • after each server system data is received, the method further includes:
  • after the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model are combined to perform superimposed anomaly detection on each server system data, the method further includes:
  • feature extraction is performed on each server system data, including:
  • calculating the average path length corresponding to each server system data in a binary tree group composed of binary trees includes:
  • in a binary tree group composed of binary trees, for each server system data, calculate the distance from the leaf node where the server system data is located in each binary tree to the root node, and obtain the path length of the server system data on each binary tree;
  • each normal data and each abnormal data obtained by the remote end's offloading of each server system data are obtained, including:
  • each normal data and each abnormal data obtained by the remote end diverting the data of each server system are obtained.
  • a server anomaly detection device including:
  • Data receiving module used to receive data from each server system
  • the feature extraction module is used to extract features from each server system data and obtain each feature data
  • the binary tree building module is used to construct binary trees based on each feature data to obtain each binary tree;
  • the path length calculation module is used to calculate the average path length corresponding to each server system data in the binary tree group composed of each binary tree;
  • the data acquisition module is used to obtain the normal data and the abnormal data obtained by remotely diverting the data of each server system when abnormal data is detected in the data of each server system based on the average path length;
  • the model building module is used to establish a first multivariate Gaussian distribution model based on each normal data, and establish a second multivariate Gaussian distribution model based on each abnormal data;
  • the superposition anomaly detection module is used to perform superposition anomaly detection on each server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model.
  • a server anomaly detection device including:
  • Memory used to store computer programs
  • the processor is used to implement the steps of the previous server anomaly detection method when executing the computer program.
  • a non-volatile readable storage medium. A computer program is stored on the non-volatile readable storage medium; when the computer program is executed by a processor, the steps of the previous server anomaly detection method are implemented.
  • the server anomaly detection method receives each server system data; performs feature extraction on each server system data to obtain each feature data; constructs binary trees based on each feature data to obtain each binary tree; calculates the average path length corresponding to each server system data in the binary tree group composed of the binary trees; when abnormal data is detected in the server system data based on the average path lengths, obtains the normal data and the abnormal data obtained by the remote end splitting each server system data; establishes the first multivariate Gaussian distribution model based on the normal data and the second multivariate Gaussian distribution model based on the abnormal data; and combines the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model to perform superimposed anomaly detection on each server system data.
  • Each binary tree is constructed based on the extracted feature data, the average path length corresponding to each server system data in the binary tree group composed of the binary trees is calculated, and initial anomaly detection is performed on each server system data based on the average path lengths.
  • The server system data is divided into normal data and abnormal data, and a multivariate Gaussian distribution model is established for each, so that superimposed anomaly detection is performed on each server system data at the remote end.
  • Near-end anomaly detection has the characteristics of edge computing: it omits the data transmission process and responds faster.
  • When the near end detects an abnormality in the server system data, the system components can be protected promptly, as they begin to heat up or before they do, which prevents high-temperature damage to the components and keeps the system in its optimal working state with efficient output.
  • the remote end uses a multivariate Gaussian distribution model to perform global anomaly detection, which is triggered by the near-end anomaly detection and performs superimposed anomaly detection to predict major risks such as server standby and crash, so that maintenance measures can be taken in advance.
  • computing resources can be scientifically allocated to prevent the explosion of calculations, improve detection efficiency, and effectively avoid the disadvantages of high-load computing such as distance-based anomaly detection.
  • this application also provides server anomaly detection devices, equipment and non-volatile readable storage media corresponding to the above-mentioned server anomaly detection method, which have the above technical effects and will not be described again here.
  • Figure 1 is an implementation flow chart of the server anomaly detection method in the embodiment of the present application.
  • Figure 2 is another implementation flow chart of the server anomaly detection method in the embodiment of the present application.
  • Figure 3 is a structural block diagram of a server anomaly detection device in an embodiment of the present application.
  • Figure 4 is a structural block diagram of a server anomaly detection device in an embodiment of the present application.
  • Figure 5 is a schematic structural diagram of a server anomaly detection device provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an embodiment of a non-volatile readable storage medium provided by an embodiment of the present application.
  • Figure 1 is an implementation flow chart of a server anomaly detection method in an embodiment of the present application.
  • the method may include the following steps:
  • S101 Receive system data from each server.
  • server system data corresponding to each system component will be generated, and the baseboard management controller receives each server system data.
  • S102 Perform feature extraction on each server system data to obtain each feature data.
  • Characteristic data can include CPU temperature, voltage, memory usage, CPU load, network traffic, etc.
  • the method may further include the following steps:
  • feature extraction of each server system data may include the following steps:
  • the baseboard management controller includes a temporary storage module integrated inside the chip. After receiving the data of each server system, the baseboard management controller can store the data of each server system in the temporary storage module.
  • The temporary storage module can be set up as a storage unit with queue semantics, that is, first in, first out, and is used to temporarily store server system data. Once the temporary storage module is saturated, data is stored in a sliding manner: one unit of data $x^{*}$ slides in from the left end, and one unit of data slides out from the right end. The newly slid-in unit is marked as the data point to be detected, $x^{*}$. There is a data collection phase at the start; once the temporary storage module is saturated, the edge-end (i.e., near-end) anomaly detection environment is ready. Assume, for example, that the server system generates status information, i.e. one unit of data, every 15 minutes; the temporary storage module then slides in one unit of data each time.
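The sliding behaviour of the temporary storage module can be sketched as a fixed-capacity FIFO buffer. This is a minimal illustration only: the class name `TemporaryStore`, its method names, and the capacity used below are hypothetical and do not come from the application.

```python
from collections import deque


class TemporaryStore:
    """Fixed-capacity FIFO buffer (hypothetical name; capacity must be positive)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def is_saturated(self):
        # Detection is only ready once the initial collection phase has filled the buffer.
        return len(self.buffer) == self.buffer.maxlen

    def slide_in(self, unit):
        """Store the newest unit x*; return the oldest unit if it slides out."""
        evicted = self.buffer[0] if self.is_saturated() else None
        self.buffer.append(unit)  # a full deque drops its oldest item automatically
        return evicted

    def newest(self):
        """The data point to be detected, x*."""
        return self.buffer[-1]

    def drop_newest(self):
        """Remove x* directly, e.g. when it has been judged abnormal."""
        return self.buffer.pop()


store = TemporaryStore(capacity=3)
for unit in (1, 2, 3):
    store.slide_in(unit)
evicted = store.slide_in(4)  # buffer was full, so the oldest unit slides out
```

A `deque` with `maxlen` is used here because it gives the first-in, first-out eviction for free; which end counts as "left" or "right" is only a matter of presentation.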
  • feature extraction of each server system data may include the following steps:
  • Step 1 Randomly select a preset number of server system data from each server system data
  • Step 2 Extract features from the selected server system data.
  • By randomly selecting part of the server system data for feature extraction, and selecting part of the extracted features for binary tree construction, the method ensures diversity of the server system data across the trees while also reducing memory consumption and avoiding the curse of dimensionality.
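The two random selections described above can be sketched as follows. The record layout (one dict per unit of data), the feature names, and the function name `subsample` are assumptions made for illustration.

```python
import random


def subsample(records, n_samples, n_features, rng=None):
    """Pick a random subset of records, then a random subset of their features.

    `records` is assumed to be a non-empty list of dicts sharing the same keys.
    """
    rng = rng or random.Random()
    rows = rng.sample(records, min(n_samples, len(records)))
    feature_names = sorted(rows[0].keys())
    chosen = rng.sample(feature_names, min(n_features, len(feature_names)))
    # Each returned row keeps only the chosen features, in a fixed order.
    return [[row[name] for name in chosen] for row in rows], chosen


# Hypothetical status records with three features each.
records = [{"cpu_temp": 40 + i, "voltage": 12.0, "cpu_load": 0.1 * i} for i in range(10)]
data, features = subsample(records, n_samples=4, n_features=2, rng=random.Random(0))
```

Calling `subsample` once per tree gives each tree its own view of the data, which is where the diversity across trees comes from.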
  • S103 Construct a binary tree based on each feature data to obtain each binary tree.
  • a binary tree is constructed based on each feature data.
  • the bagging method can be used to construct a binary tree to obtain each binary tree.
  • Termination conditions can include:
  • S104 Calculate the average path length corresponding to each server system data in the binary tree group composed of each binary tree.
  • the average path length corresponding to each server system data in the binary tree group composed of each binary tree is calculated.
  • step S104 may include the following steps:
  • Step 1 In the binary tree group composed of each binary tree, for each server system data, calculate the distance from the leaf node where the server system data is located in each binary tree to the root node, and obtain the path length of the server system data on each binary tree;
  • Step 2 Calculate the average path length on each binary tree to obtain the average path length corresponding to the server system data.
  • When calculating the average path length corresponding to each server system data, first calculate, for each server system data, the distance from its leaf node to the root node in each binary tree; this gives the path length h(x) of the server system data on each binary tree. Then average the path lengths h(x) over the binary trees to obtain the average path length E[h(x)] corresponding to the server system data.
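The path length h(x) and its average E[h(x)] can be sketched as below. The tree-building rule is an assumption: the sketch uses isolation-forest-style random axis-aligned splits, which the text does not spell out, and all function names are hypothetical.

```python
import random


def build_tree(points, rng, depth=0, max_depth=8):
    """Recursively split points with random axis-aligned cuts (assumed scheme)."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}  # leaf
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    low, high = min(values), max(values)
    if low == high:
        return {"size": len(points)}  # no spread on this feature; stop early
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(tree, x):
    """h(x): number of edges between the root and the leaf that x falls into."""
    length = 0
    while "feature" in tree:
        tree = tree["left"] if x[tree["feature"]] < tree["split"] else tree["right"]
        length += 1
    return length


def average_path_length(trees, x):
    """E[h(x)]: mean of h(x) over every tree in the group."""
    return sum(path_length(tree, x) for tree in trees) / len(trees)


rng = random.Random(0)
inliers = [(0.1 * (i % 8), 0.1 * (i // 8)) for i in range(64)]
outlier = (100.0, 100.0)
trees = [build_tree(inliers + [outlier], rng) for _ in range(50)]
```

In this toy data set the far-away point is usually cut off by the first split, so its average path length is much shorter than that of a point inside the grid.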
  • S105 When abnormal data is detected in each server system data based on each average path length, obtain the normal data and abnormal data obtained by the remote end's offloading of each server system data.
  • After the average path length corresponding to each server system data in the binary tree group composed of the binary trees is calculated, whether abnormal data exists in the server system data is determined based on the average path lengths.
  • When the server system data is sent to the near end, the same server system data is also sent to the remote end (such as a cloud platform), and the remote end splits the server system data into normal data and abnormal data.
  • When abnormal data is detected, remote anomaly detection is triggered, and the normal data and abnormal data obtained by the remote end splitting each server system data are obtained.
  • step S105 may include the following steps:
  • Step 1 Calculate the anomaly score of each server system data in the binary tree group based on the average path length
  • Step 2 When it is detected that abnormal data exists in each server system data according to each abnormality score, obtain each normal data and each abnormal data obtained by shunting each server system data from the remote end.
  • the anomaly score of each server system data in the binary tree group can be calculated based on each average path length.
  • each normal data and each abnormal data obtained by remotely diverting each server system data are obtained.
  • The anomaly score maps the notion of abnormality onto the interval [0, 1]. The server system data $x^{(*)}$ to be detected is determined to be abnormal if and only if its anomaly score $s(x^{(*)}, n)$ exceeds the preset score threshold.
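The mapping from average path length to a score is not shown in this text. A minimal sketch, assuming the standard isolation forest score s(x, n) = 2^(-E[h(x)] / c(n)), where c(n) is the usual normalizing average path length for n samples, looks like this. The function names and the threshold value 0.6 are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, used to approximate harmonic numbers


def normalizing_length(n):
    """c(n) = 2 * H(n - 1) - 2 * (n - 1) / n, with H(i) approximated by ln(i) + gamma."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(avg_path_length, n):
    """Map E[h(x)] to a score in (0, 1]; shorter paths give scores closer to 1."""
    return 2.0 ** (-avg_path_length / normalizing_length(n))


def is_abnormal(avg_path_length, n, score_threshold=0.6):
    """Flag x as abnormal when its score exceeds the (assumed) threshold."""
    return anomaly_score(avg_path_length, n) > score_threshold


short_path_score = anomaly_score(1.0, 256)
long_path_score = anomaly_score(10.0, 256)
```

Under this assumed definition, a point whose average path length equals c(n) scores exactly 0.5, and points that are isolated after very few splits score close to 1.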
  • step S105 may include the following steps:
  • each normal data and each abnormal data obtained by the remote end diverting the data of each server system are obtained.
  • The average path length E[h(x)] of abnormal data is short, because abnormal data is easy to split off.
  • An abnormal path length threshold $h_a$ can be set in advance. When an average path length smaller than the preset abnormal path length threshold exists, for example when the server system data $x^{(*)}$ has an average path length $E[h(x^{*})] < h_a$,
  • the sample $x^{(*)}$ is judged to be abnormal. In this case, the normal data and abnormal data obtained by the remote end splitting each server system data are obtained.
  • S106 Establish a first multivariate Gaussian distribution model based on each normal data, and establish a second multivariate Gaussian distribution model based on each abnormal data.
  • a first multivariate Gaussian distribution model is established based on the normal data
  • a second multivariate Gaussian distribution model is established based on the abnormal data.
  • the first multivariate Gaussian distribution model $p_1(x)$ of the normal data can be obtained;
  • the second multivariate Gaussian distribution model $p_2(x)$, the probability model of the abnormal data, can be obtained.
  • the first multivariate Gaussian distribution model established based on each normal data and the second multivariate Gaussian distribution model established based on each abnormal data are obtained.
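The density formulas are not shown in this text. A minimal pure-Python sketch, assuming each model is the standard multivariate Gaussian density with mean and covariance estimated from the corresponding data set (maximum-likelihood estimates, dividing by the sample count), is given below. Function names are hypothetical, and the covariance is assumed to be non-singular.

```python
import math


def fit_gaussian(samples):
    """Estimate the mean vector and covariance matrix (maximum likelihood)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L * L^T = a; a must be positive definite."""
    d = len(a)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = math.sqrt(s) if i == j else s / lower[j][j]
    return lower


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density p(x) for the given mean and covariance."""
    d = len(mean)
    lower = cholesky(cov)
    diff = [x[i] - mean[i] for i in range(d)]
    z = []  # forward substitution: solve lower * z = diff
    for i in range(d):
        z.append((diff[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i])
    mahalanobis_sq = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(d))
    return math.exp(-0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis_sq))


normal_samples = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mean_1, cov_1 = fit_gaussian(normal_samples)   # parameters of p_1(x)
p1_at_mean = gaussian_pdf([1.0, 1.0], mean_1, cov_1)
```

The same two functions would be applied to the abnormal data to obtain p_2(x). The Cholesky factor is used so that the determinant and the quadratic form come from one decomposition instead of an explicit matrix inverse.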
  • S107 Combine the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model to perform superimposed anomaly detection on each server system data.
  • the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model are combined to perform superimposed anomaly detection on each server system data.
  • the method may also include the following steps: when abnormal data exists in each server system data, removing the abnormal data in the temporary storage module.
  • When the data point $x^{*}$ to be detected is abnormal, it does not slide through the data flow in the temporary storage module but is eliminated directly. This achieves the separation of normal data and abnormal data.
  • Each binary tree is constructed based on the extracted feature data, the average path length corresponding to each server system data in the binary tree group composed of the binary trees is calculated, and initial anomaly detection is performed on each server system data based on the average path lengths.
  • The server system data is divided into normal data and abnormal data, and a multivariate Gaussian distribution model is established for each, so that superimposed anomaly detection is performed on each server system data at the remote end.
  • Near-end anomaly detection has the characteristics of edge computing: it omits the data transmission process and responds faster.
  • When the near end detects an abnormality in the server system data, the system components can be protected promptly, as they begin to heat up or before they do, which prevents high-temperature damage to the components and keeps the system in its optimal working state with efficient output.
  • the remote end uses a multivariate Gaussian distribution model to perform global anomaly detection, which is triggered by the near-end anomaly detection and performs superimposed anomaly detection to predict major risks such as server standby and crash, so that maintenance measures can be taken in advance.
  • computing resources can be scientifically allocated to prevent the explosion of calculations, improve detection efficiency, and effectively avoid the disadvantages of high-load computing such as distance-based anomaly detection.
  • Figure 2 is another implementation flow chart of the server anomaly detection method in the embodiment of the present application.
  • the method may include the following steps:
  • S201 Receive system data from each server.
  • S202 Perform feature extraction on each server system data to obtain each feature data.
  • S203 Construct a binary tree based on each feature data to obtain each binary tree.
  • constructing a binary tree based on each feature data may include the following steps:
  • Each distributed computing structural unit in the baseboard management controller is used to construct a preset number of binary trees in parallel based on each characteristic data.
  • The distributed computing structural units in the baseboard management controller are used to construct a preset number of binary trees in parallel based on each characteristic data. Constructing the binary trees in parallel across these units greatly improves the efficiency of binary tree construction.
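Parallel construction of a preset number of trees can be sketched with a worker pool standing in for the distributed computing structural units. This is only an analogy under stated assumptions: the thread pool, the random-split tree builder, the per-tree seeding, and all names are illustrative, and threads in CPython do not give true CPU parallelism for this kind of work.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def build_tree(points, rng, depth=0, max_depth=6):
    """Random axis-aligned splitting (assumed isolation-tree-style scheme)."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    low, high = min(values), max(values)
    if low == high:
        return {"size": len(points)}
    split = rng.uniform(low, high)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([p for p in points if p[feature] < split], rng, depth + 1, max_depth),
        "right": build_tree([p for p in points if p[feature] >= split], rng, depth + 1, max_depth),
    }


def build_forest_parallel(points, n_trees, n_workers=4, base_seed=0):
    """Build n_trees independent trees, one task per tree."""
    def build_one(index):
        # Each tree gets its own seeded generator so results are reproducible.
        return build_tree(points, random.Random(base_seed + index))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(build_one, range(n_trees)))


points = [(float(i % 5), float(i // 5)) for i in range(25)]
forest = build_forest_parallel(points, n_trees=8)
```

Because the trees are built independently of one another, the work splits cleanly into one task per tree, which is what makes this step suitable for parallel units.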
  • An attention mechanism is added to the construction process of the binary tree, which only cares about the segmentation of the point x * to be detected. Therefore, the binary tree does not need to segment all data points and can be stopped in advance to improve efficiency.
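One way to read this early stopping is that only the partition containing $x^{*}$ is followed, and splitting stops as soon as $x^{*}$ is alone. The sketch below assumes random axis-aligned splits and uses hypothetical names; it returns the path length of the target point without building the rest of the tree.

```python
import random


def isolation_depth(points, target, rng, max_depth=32):
    """Number of random splits needed to isolate `target` among `points`.

    Only the side of each split that contains the target is kept, so the
    other points are never partitioned further.
    """
    current = list(points)
    depth = 0
    while len(current) > 1 and depth < max_depth:
        # Features on which the remaining points still differ.
        spread = [f for f in range(len(target))
                  if min(p[f] for p in current) != max(p[f] for p in current)]
        if not spread:
            break  # remaining points are identical; nothing left to split
        feature = rng.choice(spread)
        values = [p[feature] for p in current]
        split = rng.uniform(min(values), max(values))
        target_goes_left = target[feature] < split
        current = [p for p in current if (p[feature] < split) == target_goes_left]
        depth += 1
    return depth


rng = random.Random(1)
inliers = [(0.1 * (i % 8), 0.1 * (i // 8)) for i in range(64)]
outlier = (100.0, 100.0)
data = inliers + [outlier]
outlier_depth = sum(isolation_depth(data, outlier, rng) for _ in range(50)) / 50
inlier_depth = sum(isolation_depth(data, inliers[27], rng) for _ in range(50)) / 50
```

Each call does work proportional to the points that remain on the target's side, which shrinks quickly for an outlier.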
  • S204 Calculate the average path length corresponding to each server system data in the binary tree group composed of each binary tree.
  • When abnormal data is detected in the server system data according to the average path lengths, the first abnormality detection result is obtained.
  • the first abnormality detection result may include the specific component in which the abnormality occurs.
  • the first abnormality detection result is fed back to the baseboard management controller.
  • The baseboard management controller can parse out which system component is abnormal and then control the fan to cool down that component, so that when the near end detects (or predicts) an abnormality in the server system data, the system components are protected at the start of heating (or before the temperature rises), which prevents high-temperature damage to the components and keeps the system in its optimal working state with efficient output.
  • S207 Obtain the normal data and abnormal data obtained by the remote end splitting the data of each server system.
  • S208 Establish a first multivariate Gaussian distribution model based on each normal data, and establish a second multivariate Gaussian distribution model based on each abnormal data.
  • S209 Use the first multivariate Gaussian distribution model to calculate the normal probability corresponding to each server system data, and use the second multivariate Gaussian distribution model to calculate the abnormal probability corresponding to each server system data.
  • The first multivariate Gaussian distribution model is used to calculate the normal probability corresponding to each server system data, and the second multivariate Gaussian distribution model is used to calculate the abnormal probability corresponding to each server system data.
  • S210 Obtain the preset normal probability threshold and abnormal probability threshold, and for each server system data, perform superimposed abnormality detection based on the normal probability threshold, abnormal probability threshold and the normal probability and abnormal probability corresponding to the server system data.
  • A normal probability threshold and an abnormal probability threshold (subscripted n and a, respectively) can be set. For the server system data to be detected, if and only if $p_1(x^{(*)})$ is below the normal probability threshold and $p_2(x^{(*)})$ is below the abnormal probability threshold, the model determines that an abnormality has occurred (or is about to occur) in the server, feeds this back to the baseboard management controller so that it seals the disk, and sends a report to the superior, so that operators can plan their work reasonably and protect the integrity of that work.
  • a second abnormality detection result obtained by superimposed abnormality detection is obtained. That is, by comparing the normal probability corresponding to the server system data with the normal probability threshold, and comparing the abnormal probability corresponding to the server system data with the abnormal probability threshold, the second abnormality detection result is obtained through the two comparison results.
  • S212 Perform server abnormality maintenance operations based on the first abnormality detection result and the second abnormality detection result.
  • a server anomaly maintenance operation is performed based on the first anomaly detection result and the second anomaly detection result.
  • step S212 may include the following steps:
  • Step 1 When the first anomaly detection result is that abnormal data exists, and the second anomaly detection result is that there is server system data whose normal probability is not within the normal probability threshold range and whose abnormal probability is within the abnormal probability threshold range, send a disk archiving instruction to the baseboard management controller, so that the baseboard management controller performs a disk archiving operation and sends an abnormality detection report to the superior;
  • Step 2 When the first abnormality detection result is that abnormal data exists and the second abnormality detection result is that there is no server system data with an abnormal probability within the abnormal probability threshold range, send a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool down the corresponding system components;
  • Step 3 When the first abnormality detection result is that abnormal data exists, and the second abnormality detection result is that there is server system data with a normal probability within the normal probability threshold range and an abnormal probability within the abnormal probability threshold range, send a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool down the corresponding system components.
  • the abnormal probability value is less than ⁇ a , that is, when E[h(x * )] ⁇ h a &p 1 (x (*) ) ⁇ n &p 2 (x (*) ) ⁇ a or s(x (*) , n)> ⁇ &p 1 (x (*) ) ⁇ n &p 2 (x (*) ) ⁇ a
  • a disk sealing instruction is sent to the baseboard management controller.
  • the baseboard management controller performs the disk sealing operation according to the disk sealing instruction and sends an abnormality detection report to the superior.
  • the first anomaly detection result is that there is abnormal data and the second anomaly detection result is that there is no server system data with an abnormal probability within the abnormal probability threshold, that is, the normal probability value is greater than or equal to ⁇ n as the normal probability threshold range, and the abnormal probability value Less than ⁇ a is the abnormal probability threshold range
  • E[h(x * )] ⁇ h a &p 2 (x (*) )> ⁇ a or s(x (*) , n)> ⁇ &p 2 (x (*) )> ⁇ a it indicates that there is a minor abnormality in the system component, and a fan control instruction is sent to the baseboard management controller.
  • the baseboard management controller controls the fan to cool down the corresponding system component according to the fan control instruction.
  • the second anomaly detection result is that there is server system data with a normal probability within the normal probability threshold and an abnormal probability within the abnormal probability threshold, where a normal probability value greater than or equal to $\varepsilon_n$ lies within the normal probability threshold range and an abnormal probability value less than $\varepsilon_a$ lies within the abnormal probability threshold range.
  • When $E[h(x^{*})] < h_a \land p_1(x^{(*)}) \ge \varepsilon_n \land p_2(x^{(*)}) < \varepsilon_a$, or when $s(x^{(*)}, n)$ exceeds its preset score threshold and $p_1(x^{(*)}) \ge \varepsilon_n \land p_2(x^{(*)}) < \varepsilon_a$, it indicates that there is a minor abnormality in the system component, and a fan control instruction is sent to the baseboard management controller so that the baseboard management controller controls the fan to cool down the corresponding system components.
  • the threshold $\varepsilon$ is set, and if and only if $p(x^{(*)}) < \varepsilon$, the server system data $x^{(*)}$ is determined to be abnormal.
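The threshold rule above (a sample is abnormal exactly when its density under the fitted model falls below a preset threshold) can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the names `fit_gaussian`, `density`, `is_abnormal` and `epsilon` are hypothetical, and the maximum-likelihood covariance estimate (dividing by the sample count) is an assumption.

```python
import math

def fit_gaussian(samples):
    """Estimate mean vector and covariance matrix (assumed ML estimate, divide by m)."""
    m, d = len(samples), len(samples[0])
    mu = [sum(x[j] for x in samples) / m for j in range(d)]
    sigma = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / m
              for j in range(d)] for i in range(d)]
    return mu, sigma

def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def density(x, mu, sigma):
    """Multivariate Gaussian density p(x) for mean mu and covariance sigma."""
    d = len(mu)
    low = cholesky(sigma)
    diff = [x[i] - mu[i] for i in range(d)]
    # Forward substitution: solve low * z = diff; sum(z^2) is the squared Mahalanobis distance.
    z = []
    for i in range(d):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    log_p = -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))
    return math.exp(log_p)

def is_abnormal(x, mu, sigma, epsilon):
    """Apply the rule: abnormal if and only if p(x) < epsilon."""
    return density(x, mu, sigma) < epsilon
```

The covariance matrix must be positive definite for the factorisation to succeed, so in practice degenerate features (constant columns) would need to be dropped or regularised first.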
  • this application also provides a server anomaly detection device.
  • the server anomaly detection device described below and the server anomaly detection method described above can be mutually referenced.
  • Figure 3 is a structural block diagram of a server anomaly detection device in an embodiment of the present application.
  • the device may include:
  • the data receiving module 31 is used to receive each server system data;
  • the feature extraction module 32 is used to extract features from each server system data to obtain each feature data;
  • the binary tree construction module 33 is used to construct a binary tree based on each feature data to obtain each binary tree;
  • the path length calculation module 34 is used to calculate the average path length corresponding to each server system data in the binary tree group composed of each binary tree;
  • the data acquisition module 35 is used to acquire the normal data and the abnormal data obtained by the remote end shunting the data of each server system when abnormal data is detected in each server system data according to each average path length;
  • the model building module 36 is used to establish a first multivariate Gaussian distribution model based on each normal data, and establish a second multivariate Gaussian distribution model based on each abnormal data;
  • the superimposed anomaly detection module 37 is used to perform superimposed anomaly detection on each server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model.
  • each binary tree is constructed based on the extracted feature data, the average path length corresponding to each server system data in the binary tree group composed of the binary trees is calculated, and initial anomaly detection is performed on each server system data based on each average path length.
  • the server system data is divided into normal data and abnormal data, and a multivariate Gaussian distribution model is established on the normal data and on the abnormal data respectively, so that superimposed anomaly detection is performed on each server system data at the remote end.
  • Near-end anomaly detection has the characteristics of edge computing: it omits the data transmission process and responds faster.
  • When the near end detects an abnormality in the server system data, it can protect the system components promptly, at the very start of heating or before any heating occurs, to prevent high-temperature damage to the components. It can also keep the system in its optimal working state with efficient output.
  • the remote end uses a multivariate Gaussian distribution model to perform global anomaly detection, which is triggered by the near-end anomaly detection and acts as superimposed anomaly detection to predict major risks such as server standby and crashes, so that maintenance measures can be taken in advance.
  • Through dual-end collaborative anomaly detection, computing resources can be allocated rationally, preventing an explosion in computation, improving detection efficiency, and effectively avoiding the drawbacks of high-load computation such as typical distance-based anomaly detection.
  • the superimposed anomaly detection module 37 includes:
  • the probability calculation submodule is used to calculate the normal probability corresponding to each server system data using the first multivariate Gaussian distribution model, and calculate the abnormal probability corresponding to each server system data using the second multivariate Gaussian distribution model;
  • the superimposed anomaly detection submodule is used to obtain the preset normal probability threshold and abnormal probability threshold and, for each server system data, perform superimposed anomaly detection by combining the normal probability threshold, the abnormal probability threshold, and the normal probability and abnormal probability corresponding to that server system data.
  • the device may further include:
  • a first result obtaining module configured to obtain a first abnormality detection result when abnormal data is detected in each server system data according to each average path length;
  • the component cooling module is used to feed back the first abnormality detection result to the baseboard management controller, so that the baseboard management controller controls the fan to perform a cooling operation on the corresponding system component.
  • the device may further include:
  • a second result obtaining module, configured to obtain the second anomaly detection result produced by the superimposed anomaly detection after, for each server system data, superimposed anomaly detection has been performed by combining the normal probability threshold, the abnormal probability threshold, and the normal probability and abnormal probability corresponding to the server system data;
  • the server abnormality maintenance module is used to perform server abnormality maintenance operations based on the first abnormality detection result and the second abnormality detection result.
  • the server exception maintenance module includes:
  • the disk sealing and report sending submodule is used to, when the first anomaly detection result is that abnormal data exists and the second anomaly detection result is that there is server system data whose normal probability is not within the normal probability threshold and whose abnormal probability is within the abnormal probability threshold, send a disk sealing instruction to the baseboard management controller so that the baseboard management controller performs a disk sealing operation and sends an abnormality detection report to the superior;
  • the first component cooling submodule is used to send a fan control instruction to the baseboard management controller when the first abnormality detection result is that abnormal data exists and the second abnormality detection result is that there is no server system data with an abnormality probability within the abnormality probability threshold, so that the baseboard management controller controls the fan to cool down the corresponding system components;
  • the second component cooling submodule is used to send a fan control instruction to the baseboard management controller when the first abnormality detection result is that abnormal data exists and the second abnormality detection result is that there is server system data with a normal probability within the normal probability threshold and an abnormality probability within the abnormality probability threshold, so that the baseboard management controller controls the fan to cool down the corresponding system components.
  • the data acquisition module 35 includes:
  • the anomaly score calculation submodule is used to calculate the anomaly score of each server system data in the binary tree group based on each average path length;
  • the data acquisition sub-module is used to obtain the normal data and abnormal data obtained by remote-end shunting of each server system data when abnormal data is detected in each server system data based on each abnormality score.
  • the device may further include:
  • the data storage module is used to store the data of each server system in a temporary storage module with queue attributes after receiving the data of each server system;
  • the feature extraction module 32 is specifically a module that obtains each server system data from the temporary storage module and performs feature extraction on each server system data.
  • the device may further include:
  • the data elimination module is used to, after superimposed anomaly detection has been performed on each server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model, remove the abnormal data from the temporary storage module when abnormal data exists in the server system data.
  • the feature extraction module 32 includes:
  • the data selection submodule is used to randomly select a preset number of server system data from each server system data;
  • the feature extraction submodule is used to extract features from the selected server system data.
  • the path length calculation module 34 includes:
  • the path length calculation submodule is used to, in the binary tree group composed of the binary trees, calculate for each server system data the distance from the leaf node where the server system data is located in each binary tree to the root node, obtaining the path length of the server system data on each binary tree;
  • the average calculation submodule is used to average the path lengths on each binary tree to obtain the average path length corresponding to the server system data.
  • the data acquisition module 35 is specifically a module that, when it is determined that there is an average path length smaller than the preset abnormal path length threshold, acquires each normal data and each abnormal data obtained by the remote end shunting the data of each server system.
  • Figure 4 is a schematic diagram of the server anomaly detection device provided by this application.
  • the device may include:
  • a memory 332, used to store a computer program;
  • the processor 322 is configured to implement the steps of the server anomaly detection method of the above method embodiment when executing the computer program.
  • Figure 5 is a schematic diagram of the specific structure of a server anomaly detection device provided in this embodiment.
  • the server anomaly detection device may vary greatly due to different configurations or performance, and may include one or more processors (central processing unit, CPU) 322 and a memory 332.
  • the memory 332 stores one or more computer applications 342 or data 344. The memory 332 may be short-term storage or persistent storage.
  • the program stored in the memory 332 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data processing device.
  • the processor 322 may be configured to communicate with the memory 332 and execute a series of instruction operations in the memory 332 on the server anomaly detection device 301 .
  • the server anomaly detection device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input and output interfaces 358, and/or, one or more operating systems 341.
  • the steps in the server anomaly detection method described above can be implemented by the structure of the server anomaly detection device.
  • the present application also provides a non-volatile readable storage medium.
  • a computer program is stored on the non-volatile readable storage medium.
  • When the computer program is executed by a processor, the following steps are implemented:
  • receive each server system data; perform feature extraction on each server system data to obtain each feature data; construct a binary tree based on each feature data to obtain each binary tree; calculate the average path length corresponding to each server system data in the binary tree group composed of each binary tree; when abnormal data is detected in each server system data according to each average path length, obtain each normal data and each abnormal data obtained by the remote end shunting each server system data; establish a first multivariate Gaussian distribution model based on each normal data, and establish a second multivariate Gaussian distribution model based on each abnormal data; combine the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model to perform superimposed anomaly detection on each server system data.
  • the non-volatile readable storage medium can include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Thermal Sciences (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application discloses a server anomaly detection method, comprising: performing feature extraction on pieces of received server system data; constructing binary trees according to the pieces of extracted feature data; calculating the average path length respectively corresponding to each piece of server system data in the binary tree group obtained by construction; when the presence of anomalous data is detected in the pieces of server system data according to the average path lengths, acquiring normal data and anomalous data obtained by remotely shunting the pieces of server system data; establishing a first multivariate Gaussian distribution model on the basis of the pieces of normal data, and establishing a second multivariate Gaussian distribution model on the basis of the pieces of anomalous data; and performing overlay anomaly detection on the pieces of server system data in combination with the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model. The present application improves detection efficiency and effectively mitigates the drawbacks generally associated with high computational loads such as distance-based anomaly detection. The present application further discloses an apparatus, a device, and a storage medium which have corresponding technical effects.

Description

A server anomaly detection method, device, equipment and readable storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the China Patent Office on June 28, 2022, with application number 202210738323.5 and titled "A server anomaly detection method, device, equipment and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the technical fields of artificial intelligence and anomaly detection, and in particular to a server anomaly detection method, device, equipment and non-volatile readable storage medium.
Background
Anomaly detection identifies data in a data set that does not conform to expected patterns, namely outliers, inconsistent points, and special points. It is applicable to system health monitoring, sensor network event detection, fault detection, and similar tasks, and safeguards the normal operation of the system ecosystem. Anomaly detection is one application of machine learning. Broadly, its algorithms are based on probability and statistics, on nearest neighbors, on clustering, and so on. There are many classic algorithms and derivatives, which can further be divided into supervised learning, unsupervised learning, and semi-supervised learning.
The BMC (Baseboard Management Controller) is the "chief steward" of the entire server system. It provides a range of monitoring and control functions, using sensors to monitor system component temperature, humidity, voltage, fans, power supply, communication parameters, operating system functions, and so on, and making suitable adjustments to keep the system healthy. BMC offers a rich set of solutions: by combining in-band and out-of-band server monitoring, it can retrieve the status information of any system, such as CPU (Central Processing Unit) load, memory usage, network traffic, and the number of sector disk channels.
At present, when monitoring a server system, the BMC generally uses thresholds as the judgment condition: once the temperature exceeds a threshold, fans are used to bring the temperature down and keep the system healthy. However, this reflex-like response lags slightly, and high-temperature damage to components is irreversible and shortens component life. When a major system risk occurs on the server, fan cooling has little effect. If adverse outcomes such as standby or a crash are not met with reasonable responses and adjustments, files may be lost, causing significant economic losses and creating hidden dangers for production safety. In pre-research BMC solutions, traditional machine-learning-based anomaly detection, especially distance-based detection, is prone to an explosion in computation.
Summary of the invention
The purpose of this application is to provide a server anomaly detection method which, through dual-end collaborative anomaly detection, can allocate computing resources rationally, prevent an explosion in computation, improve detection efficiency, and effectively avoid the drawbacks of high-load computation such as typical distance-based anomaly detection. Another purpose of this application is to provide a server anomaly detection device, equipment and non-volatile readable storage medium.
To solve the above technical problems, this application provides the following technical solutions:
A server anomaly detection method, comprising:
receiving pieces of server system data;
performing feature extraction on the pieces of server system data to obtain pieces of feature data;
constructing binary trees from the pieces of feature data to obtain the binary trees;
calculating the average path length corresponding to each piece of server system data in the binary tree group composed of the binary trees;
when abnormal data is detected among the pieces of server system data according to the average path lengths, obtaining the normal data and the abnormal data produced by the remote end shunting the pieces of server system data;
establishing a first multivariate Gaussian distribution model based on the normal data, and establishing a second multivariate Gaussian distribution model based on the abnormal data;
performing superimposed anomaly detection on the pieces of server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model.
In some embodiments, performing superimposed anomaly detection on the pieces of server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model comprises:
using the first multivariate Gaussian distribution model to calculate the normal probability corresponding to each piece of server system data, and using the second multivariate Gaussian distribution model to calculate the abnormal probability corresponding to each piece of server system data;
obtaining a preset normal probability threshold and a preset abnormal probability threshold and, for each piece of server system data, performing superimposed anomaly detection by combining the normal probability threshold, the abnormal probability threshold, and the normal probability and abnormal probability corresponding to that piece of server system data.
In some embodiments, when abnormal data is detected among the pieces of server system data according to the average path lengths, the method further comprises:
obtaining a first anomaly detection result;
feeding the first anomaly detection result back to the baseboard management controller, so that the baseboard management controller controls the fan to cool down the corresponding system components.
In some embodiments, after performing, for each piece of server system data, superimposed anomaly detection by combining the normal probability threshold, the abnormal probability threshold, and the normal probability and abnormal probability corresponding to that piece of server system data, the method further comprises:
obtaining a second anomaly detection result produced by the superimposed anomaly detection;
performing a server abnormality maintenance operation by combining the first anomaly detection result and the second anomaly detection result.
In some embodiments, performing a server abnormality maintenance operation by combining the first anomaly detection result and the second anomaly detection result comprises:
when the first anomaly detection result is that abnormal data exists, and the second anomaly detection result is that there is server system data whose normal probability is not within the normal probability threshold and whose abnormal probability is within the abnormal probability threshold, sending a disk sealing instruction to the baseboard management controller, so that the baseboard management controller performs a disk sealing operation and sends an anomaly detection report to the superior;
when the first anomaly detection result is that abnormal data exists and the second anomaly detection result is that there is no server system data whose abnormal probability is within the abnormal probability threshold, sending a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool down the corresponding system components;
when the first anomaly detection result is that abnormal data exists, and the second anomaly detection result is that there is server system data whose normal probability is within the normal probability threshold and whose abnormal probability is within the abnormal probability threshold, sending a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool down the corresponding system components.
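The three maintenance cases above can be sketched as a small decision function. This is a simplified, per-sample illustration under stated assumptions: the source describes the second result in terms of whether such server system data exists, whereas the sketch evaluates a single piece of data; the names `Action` and `maintenance_action` are hypothetical; "within the normal probability threshold" is read as normal probability at or above its threshold, and "within the abnormal probability threshold" as abnormal probability below its threshold; returning no action when the near end found nothing is also an assumption.

```python
from enum import Enum

class Action(Enum):
    NONE = "none"
    FAN_CONTROL = "fan_control"
    SEAL_DISK_AND_REPORT = "seal_disk_and_report"

def maintenance_action(first_abnormal, p_normal, p_abnormal, eps_normal, eps_abnormal):
    """Map the first and second detection results to a BMC instruction."""
    if not first_abnormal:
        # Assumption: the remote check is only triggered by a near-end detection.
        return Action.NONE
    normal_in_range = p_normal >= eps_normal        # normal probability within its threshold
    abnormal_in_range = p_abnormal < eps_abnormal   # abnormal probability within its threshold
    if abnormal_in_range and not normal_in_range:
        # Case 1: seal the disk and report upward.
        return Action.SEAL_DISK_AND_REPORT
    # Cases 2 and 3: minor abnormality, cool the component with the fan.
    return Action.FAN_CONTROL
```

Cases 2 and 3 lead to the same instruction in the text, so the sketch folds them into one branch.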
In some embodiments, constructing binary trees from the pieces of feature data comprises:
using the distributed computing units in the baseboard management controller to construct a preset number of binary trees in parallel from the pieces of feature data.
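A rough sketch of building a preset number of binary trees in parallel follows. Several things here are assumptions rather than the method of this application: the trees are split on a random feature at a random cut value, in the style of isolation trees (the text only says "binary tree"); a thread pool stands in for the distributed computing units of the baseboard management controller; and the names `build_tree`, `build_forest`, `max_depth` and `seed` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def build_tree(points, rng, depth=0, max_depth=8):
    """Recursively split the points on a random feature at a random cut value."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}                 # leaf node
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:                                     # no split possible on this feature
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < cut]
    right = [p for p in points if p[feature] >= cut]
    return {"feature": feature, "cut": cut,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}

def build_forest(points, n_trees, workers=4, seed=0):
    """Build n_trees trees concurrently; each tree gets its own seeded generator."""
    def task(i):
        return build_tree(points, random.Random(seed + i))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, range(n_trees)))
```

Threads in CPython do not speed up CPU-bound work, so the pool only illustrates the structure of distributing one tree per worker; a real deployment would map the tasks onto separate processes or hardware units.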
In some embodiments, when abnormal data is detected among the pieces of server system data according to the average path lengths, obtaining the normal data and the abnormal data produced by the remote end shunting the pieces of server system data comprises:
calculating, from the average path lengths, the anomaly score of each piece of server system data in the binary tree group;
when abnormal data is detected among the pieces of server system data according to the anomaly scores, obtaining the normal data and the abnormal data produced by the remote end shunting the pieces of server system data.
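One common way to turn an average path length into an anomaly score is the normalisation used by isolation forests, sketched below. The passage here does not spell out the score formula, so treating it as the isolation-forest score is an assumption; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def c(n):
    """Average path length of an unsuccessful search in a binary search tree of n points."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n == 2:
        return 1.0
    # The harmonic number H(n - 1) is approximated by ln(n - 1) + Euler's constant.
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    return 2.0 ** (-avg_path_length / c(n))
```

Under this normalisation a sample whose average path length equals `c(n)` scores 0.5, and samples isolated after very few splits score close to 1, which is why a score above a preset threshold can stand in for a short average path length.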
In some embodiments, after receiving the pieces of server system data, the method further comprises:
storing the pieces of server system data in a temporary storage module with queue attributes;
and performing feature extraction on the pieces of server system data comprises:
obtaining the pieces of server system data from the temporary storage module, and performing feature extraction on the pieces of server system data.
In some embodiments, after performing superimposed anomaly detection on the pieces of server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model, the method further comprises:
when abnormal data exists among the pieces of server system data, removing the abnormal data from the temporary storage module.
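The temporary storage module with queue attributes, together with the removal of abnormal data, might look like the sketch below. The class name `TemporaryStore`, the bounded capacity, and the predicate-based removal are all illustrative assumptions; the text only requires first-in, first-out storage and removal of abnormal records.

```python
from collections import deque

class TemporaryStore:
    """FIFO buffer for server system data (capacity bound is an assumption)."""

    def __init__(self, capacity):
        # When full, appending drops the oldest record first.
        self._queue = deque(maxlen=capacity)

    def put(self, record):
        self._queue.append(record)

    def snapshot(self):
        """Return the buffered records, oldest first, for feature extraction."""
        return list(self._queue)

    def remove_abnormal(self, is_abnormal):
        """Drop every record flagged by is_abnormal; return how many were removed."""
        kept = [r for r in self._queue if not is_abnormal(r)]
        removed = len(self._queue) - len(kept)
        self._queue.clear()
        self._queue.extend(kept)
        return removed
```

Removing flagged records keeps later trees and models from being built on data already judged abnormal.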
In some embodiments, performing feature extraction on the pieces of server system data comprises:
randomly selecting a preset number of pieces of server system data from the pieces of server system data;
performing feature extraction on the selected pieces of server system data.
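Random selection of a preset number of records can be sketched with the standard library. The function name `select_subsample`, the optional `seed`, and the choice to return everything when fewer records than requested are available are assumptions for illustration.

```python
import random

def select_subsample(records, preset_count, seed=None):
    """Pick preset_count records uniformly at random, without replacement."""
    rng = random.Random(seed)
    if preset_count >= len(records):
        return list(records)          # not enough data: use all of it
    return rng.sample(records, preset_count)
```

Sampling a fixed number of records bounds the cost of each tree regardless of how much data has accumulated.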
In some embodiments, calculating the average path length corresponding to each piece of server system data in the binary tree group composed of the binary trees comprises:
in the binary tree group composed of the binary trees, calculating, for each piece of server system data, the distance from the leaf node where that piece of server system data is located in each binary tree to the root node, to obtain the path length of that piece of server system data on each binary tree;
averaging the path lengths over the binary trees to obtain the average path length corresponding to that piece of server system data.
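The two steps above, per-tree path length and then the mean over the tree group, can be illustrated as follows. The nested-dictionary tree layout (`feature`, `cut`, `left`, `right` for internal nodes and `size` for leaves) is a hypothetical representation chosen for the example, not a structure given in the text.

```python
def path_length(tree, x):
    """Number of edges from the root to the leaf that x falls into."""
    depth = 0
    node = tree
    while "cut" in node:                      # internal nodes carry a split
        go_left = x[node["feature"]] < node["cut"]
        node = node["left"] if go_left else node["right"]
        depth += 1
    return depth

def average_path_length(trees, x):
    """Mean of the per-tree path lengths for one piece of data."""
    return sum(path_length(t, x) for t in trees) / len(trees)

# Two small hand-built trees used only for illustration.
tree_a = {"feature": 0, "cut": 50.0,
          "left": {"size": 3}, "right": {"size": 1}}
tree_b = {"feature": 1, "cut": 10.0,
          "left": {"feature": 0, "cut": 80.0,
                   "left": {"size": 2}, "right": {"size": 1}},
          "right": {"size": 1}}
```

With these trees, the sample `[90.0, 5.0]` reaches a leaf after one edge in `tree_a` and two edges in `tree_b`, giving an average path length of 1.5.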
In some embodiments, when abnormal data is detected among the pieces of server system data according to the average path lengths, obtaining the normal data and the abnormal data produced by the remote end shunting the pieces of server system data comprises:
when it is determined that there is an average path length smaller than a preset abnormal path length threshold, obtaining the normal data and the abnormal data produced by the remote end shunting the pieces of server system data.
A server anomaly detection device, comprising:
a data receiving module, used to receive pieces of server system data;
a feature extraction module, used to perform feature extraction on the pieces of server system data to obtain pieces of feature data;
a binary tree construction module, used to construct binary trees from the pieces of feature data to obtain the binary trees;
a path length calculation module, used to calculate the average path length corresponding to each piece of server system data in the binary tree group composed of the binary trees;
a data acquisition module, used to obtain, when abnormal data is detected among the pieces of server system data according to the average path lengths, the normal data and the abnormal data produced by the remote end shunting the pieces of server system data;
a model building module, used to establish a first multivariate Gaussian distribution model based on the normal data, and to establish a second multivariate Gaussian distribution model based on the abnormal data;
a superimposed anomaly detection module, used to perform superimposed anomaly detection on the pieces of server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model.
A server anomaly detection equipment, comprising:
a memory, used to store a computer program;
a processor, used to implement the steps of the foregoing server anomaly detection method when executing the computer program.
A non-volatile readable storage medium storing a computer program which, when executed by a processor, implements the steps of the foregoing server anomaly detection method.
本申请所提供的服务器异常检测方法,接收各服务器系统数据;对各服务器系统数据进行特征提取,得到各特征数据;根据各特征数据进行二叉树构建,得到各二叉树;计算在由各二叉树构成的二叉树群中各服务器系统数据分别对应的平均路径长度;当根据各平均路径 长度检测到各服务器系统数据中存在异常数据时,获取远端对各服务器系统数据进行分流得到的各正常数据和各异常数据;基于各正常数据建立第一多元高斯分布模型,并基于各异常数据建立第二多元高斯分布模型;结合第一多元高斯分布模型和第二多元高斯分布模型对各服务器系统数据进行叠加异常检测。The server anomaly detection method provided by this application receives data from each server system; performs feature extraction on each server system data to obtain each feature data; constructs a binary tree based on each feature data to obtain each binary tree; calculates the binary tree composed of each binary tree The average path length corresponding to each server system data in the group; when based on each average path When the length detects abnormal data in the data of each server system, it obtains the normal data and the abnormal data obtained by shunting the data of each server system from the remote end; establishes the first multivariate Gaussian distribution model based on each normal data, and based on each abnormality The second multivariate Gaussian distribution model is established for the data; the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model are combined to perform overlay anomaly detection on each server system data.
As can be seen from the above technical solution, feature extraction is performed at the near end on the received server system data, binary trees are constructed from the extracted feature data, the average path length of each item of server system data in the binary tree group formed by the binary trees is calculated, and initial anomaly detection is performed on the server system data according to the average path lengths. When the remote end receives the server system data, it splits the data in advance into normal data and abnormal data. When the initial anomaly detection at the near end finds abnormal data, the normal data and abnormal data split by the remote end are obtained, and a multivariate Gaussian distribution model is established for each, so that superimposed anomaly detection is performed on the server system data at the remote end. Near-end anomaly detection has the characteristics of edge computing: it omits the data transmission process and responds faster. Once the near end detects an anomaly in the server system data, the affected system components can be protected promptly, at the onset of heating or before heating begins, which prevents high-temperature damage to the components and keeps the system in its optimal working state with efficient output. The remote end uses the multivariate Gaussian distribution models for global anomaly detection; triggered by the near-end anomaly detection, it performs superimposed anomaly detection to anticipate major risks such as server standby or crash, so that maintenance measures can be taken in advance. Through this dual-end collaborative anomaly detection, computing resources can be allocated rationally, an explosion in computation is avoided, detection efficiency is improved, and the drawbacks of high-load computations such as general distance-based anomaly detection are effectively avoided.
Correspondingly, this application also provides a server anomaly detection apparatus, a device, and a non-volatile readable storage medium corresponding to the above server anomaly detection method. They have the above technical effects, which are not described again here.
Description of Drawings
To explain the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flow chart of one implementation of the server anomaly detection method in an embodiment of this application;
Figure 2 is a flow chart of another implementation of the server anomaly detection method in an embodiment of this application;
Figure 3 is a structural block diagram of a server anomaly detection apparatus in an embodiment of this application;
Figure 4 is a structural block diagram of a server anomaly detection device in an embodiment of this application;
Figure 5 is a schematic diagram of a specific structure of a server anomaly detection device provided by an embodiment of this application;
Figure 6 is a schematic diagram of an embodiment of a non-volatile readable storage medium provided by an embodiment of this application.
Detailed Description
To help those skilled in the art better understand the solution of this application, the application is described in further detail below with reference to the drawings and specific implementations. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
Referring to Figure 1, which is a flow chart of one implementation of the server anomaly detection method in an embodiment of this application, the method may include the following steps:
S101: Receive server system data.
While a server is running, server system data corresponding to each system component is generated, and the baseboard management controller receives this server system data.
S102: Perform feature extraction on the server system data to obtain feature data.
After the server system data is received, feature extraction is performed on it to obtain feature data. The feature data may include CPU temperature, voltage, memory usage, CPU load, network traffic, and so on.
In a specific implementation of this application, after step S101, the method may further include the following step:
Storing the server system data in a temporary storage module that has queue properties;
Correspondingly, performing feature extraction on the server system data may include the following step:
Obtaining the server system data from the temporary storage module and performing feature extraction on it.
The baseboard management controller contains a temporary storage module integrated inside the chip. After receiving the server system data, the baseboard management controller can store it in the temporary storage module. The temporary storage module can be configured as a storage unit with queue properties, that is, first in, first out, and is used to store server system data temporarily. Once the temporary storage module is saturated, data is stored in a sliding manner: one unit of data slides in at the left end and one unit of data slides out at the right end, and the newly arrived unit is marked as the data point to be detected, $x^{*}$. There is an initial data collection phase; once the temporary storage module is saturated, the edge-side (near-end) anomaly detection environment is ready. For example, assuming the server system produces one piece of status information, that is, one unit of data, every 15 minutes, the temporary storage module slides in one unit of data each time.
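The sliding behaviour of the temporary storage module can be sketched in Python as follows. This is only an illustration under stated assumptions, not the controller's actual implementation: the class name `TemporaryStore`, its method names, the capacity of 3, and the `cpu_temp` readings are all hypothetical.

```python
from collections import deque


class TemporaryStore:
    """FIFO buffer: once saturated, each new unit pushes the oldest one out."""

    def __init__(self, capacity):
        # A bounded deque discards from the opposite end when it is full.
        self.buffer = deque(maxlen=capacity)

    def slide_in(self, unit):
        """Insert one unit at the left end and return it as the point to detect."""
        self.buffer.appendleft(unit)  # when full, the rightmost unit slides out
        return unit

    def is_ready(self):
        """Near-end detection is ready only once the buffer is saturated."""
        return len(self.buffer) == self.buffer.maxlen


store = TemporaryStore(capacity=3)  # hypothetical capacity
for reading in [{"cpu_temp": 41}, {"cpu_temp": 42}, {"cpu_temp": 43}, {"cpu_temp": 44}]:
    x_star = store.slide_in(reading)  # newest unit is the point to be detected
```

After the fourth reading the oldest unit has slid out, so the buffer holds only the three most recent units.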
In a specific implementation of this application, performing feature extraction on the server system data may include the following steps:
Step 1: Randomly select a preset number of items from the server system data;
Step 2: Perform feature extraction on the selected server system data.
For convenience, the two steps are described together.
After the server system data is received, a preset number of items may first be selected at random from all of the server system data, and feature extraction is performed only on the selected items. Randomly selecting part of the server system data for feature extraction, and then selecting a subset of the extracted features for binary tree construction, ensures the diversity of the server system data on each tree, reduces memory consumption, and avoids the curse of dimensionality. Features may be selected at random, which takes advantage of the speed of random selection, or by means of a kurtosis test, which gives a better feature selection result.
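The subsampling and feature selection described above might look like the following Python sketch. It rests on assumptions that the text does not state: the function names are hypothetical, the kurtosis is the plain fourth standardized moment, and the kurtosis-based variant simply keeps the features with the highest kurtosis.

```python
import random


def kurtosis(values):
    """Population kurtosis (fourth standardized moment); 0.0 for constant data."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 4 for v in values) / n / var ** 2


def subsample(records, count, rng):
    """Randomly pick `count` records without replacement."""
    return rng.sample(records, count)


def select_features(records, names, k, rng=None):
    """Pick k features: at random if rng is given, otherwise by highest kurtosis."""
    if rng is not None:
        return rng.sample(names, k)
    ranked = sorted(names, key=lambda f: kurtosis([r[f] for r in records]), reverse=True)
    return ranked[:k]


# Hypothetical readings: net_mbps has one spike, cpu_temp is evenly spread.
records = [
    {"cpu_temp": 40.0, "net_mbps": 1.0},
    {"cpu_temp": 41.0, "net_mbps": 1.0},
    {"cpu_temp": 42.0, "net_mbps": 1.0},
    {"cpu_temp": 43.0, "net_mbps": 1.0},
    {"cpu_temp": 44.0, "net_mbps": 10.0},
]
chosen = subsample(records, 3, random.Random(0))
by_kurtosis = select_features(records, ["cpu_temp", "net_mbps"], 1)
```

Ranking by kurtosis favours features with heavy tails, which are the ones most likely to expose outliers; random selection skips that computation entirely.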
S103: Construct binary trees from the feature data.
After the feature data has been extracted from the server system data, binary trees are constructed from the feature data; for example, the bagging method can be used to construct the binary trees.
When a binary tree is built, the selected server system data is placed at the root node and one feature is chosen at random from the preselected feature data. A cut point $c$ is generated at random within the current feature, between the minimum and maximum values of that feature. This cut point defines a hyperplane that divides the server system data space into two subspaces: server system data whose value for the feature is less than $c$ goes to the left subtree, and data whose value is greater than or equal to $c$ goes to the right subtree. Each subtree recursively repeats this step to partition the server system data, constructing new subtrees until a termination condition is met.
The termination conditions may include:
(1) the point to be detected has been separated out;
(2) the subtree has reached the height limit $l = \lceil \log_2 n_t \rceil$, where $n_t$ is the total number of preselected items of server system data;
(3) all feature values of the server system data on the subtree are identical;
(4) the subtree cannot be split further.
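The recursive construction and its termination conditions can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the nested-dictionary tree layout, the function name `build_tree`, and the choice to draw the split feature only from features that still vary are assumptions made for the example.

```python
import math
import random


def build_tree(rows, features, rng, depth=0, limit=None):
    """Recursively split `rows` (a list of dicts) into a nested-dict binary tree."""
    if limit is None:
        limit = math.ceil(math.log2(max(len(rows), 2)))  # l = ceiling(log2 n_t)
    # Features that still take more than one value on this subtree.
    splittable = [f for f in features if len({r[f] for r in rows}) > 1]
    # Leaf when a point is isolated (1), the height limit is reached (2),
    # or all feature values are identical so no split is possible (3), (4).
    if len(rows) <= 1 or depth >= limit or not splittable:
        return {"rows": rows}
    feature = rng.choice(splittable)
    values = [r[feature] for r in rows]
    lo, hi = min(values), max(values)
    cut = rng.uniform(lo, hi)  # random cut point between min and max
    if cut == lo:              # a cut equal to the minimum would leave the left side empty
        cut = hi
    left = [r for r in rows if r[feature] < cut]
    right = [r for r in rows if r[feature] >= cut]
    return {
        "feature": feature,
        "cut": cut,
        "left": build_tree(left, features, rng, depth + 1, limit),
        "right": build_tree(right, features, rng, depth + 1, limit),
    }


rng = random.Random(0)
rows = [{"cpu_temp": 40.0 + i, "load": 0.1 * i} for i in range(8)]  # hypothetical data
tree = build_tree(rows, ["cpu_temp", "load"], rng)
```

Because every split leaves at least one row on each side, the recursion always terminates, and no leaf lies deeper than the height limit.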
S104: Calculate the average path length of each item of server system data in the binary tree group formed by the binary trees.
After the binary trees have been constructed, the average path length of each item of server system data in the binary tree group formed by the binary trees is calculated.
In a specific implementation of this application, step S104 may include the following steps:
Step 1: In the binary tree group formed by the binary trees, for each item of server system data, calculate the distance from the leaf node where the item is located in each binary tree to the root node, obtaining the path length of the item on each binary tree;
Step 2: Average the path lengths over the binary trees to obtain the average path length of the item.
For convenience, the two steps are described together.
To calculate the average path length of each item of server system data, the distance from the leaf node where the item is located to the root node is first calculated in each binary tree, giving the path length $h(x)$ of the item on each binary tree. The path lengths $h(x)$ over all binary trees are then averaged to obtain the average path length $E[h(x)]$ of the item.
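As a small worked illustration of $h(x)$ and $E[h(x)]$, the Python sketch below walks a data point down two hand-built trees and averages the depths. The tree layout (nested dictionaries with `feature`, `cut`, `left`, `right`) and the sample values are assumptions made for the example.

```python
def path_length(tree, x):
    """Number of edges from the root to the leaf that x falls into."""
    depth = 0
    node = tree
    while "cut" in node:  # internal nodes carry a feature and a cut point
        node = node["left"] if x[node["feature"]] < node["cut"] else node["right"]
        depth += 1
    return depth


def average_path_length(trees, x):
    """E[h(x)]: the mean of the path lengths of x over all trees in the group."""
    return sum(path_length(t, x) for t in trees) / len(trees)


leaf = {}  # a leaf has no cut point
tree_a = {
    "feature": "cpu_temp", "cut": 80.0,
    "left": {"feature": "load", "cut": 0.5, "left": leaf, "right": leaf},
    "right": leaf,
}
tree_b = {"feature": "load", "cut": 0.9, "left": leaf, "right": leaf}
trees = [tree_a, tree_b]

hot = {"cpu_temp": 95.0, "load": 0.95}     # separated after one split in both trees
typical = {"cpu_temp": 45.0, "load": 0.3}  # needs two splits in tree_a
```

Here `average_path_length(trees, hot)` is 1.0 and `average_path_length(trees, typical)` is 1.5, so the unusual point has the shorter average path.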
S105: When abnormal data is detected in the server system data according to the average path lengths, obtain the normal data and abnormal data into which the remote end has split the server system data.
After the average path length of each item of server system data in the binary tree group has been calculated, whether the server system data contains abnormal data is determined from the average path lengths. When the server system data is sent to the near end, the same data is also sent to the remote end (such as a cloud platform), and the remote end splits it into normal data and abnormal data. When abnormal data is detected in the server system data according to the average path lengths, remote anomaly detection is triggered and the normal data and abnormal data split by the remote end are obtained.
In a specific implementation of this application, step S105 may include the following steps:
Step 1: Calculate the anomaly score of each item of server system data in the binary tree group from its average path length;
Step 2: When abnormal data is detected in the server system data according to the anomaly scores, obtain the normal data and abnormal data into which the remote end has split the server system data.
For convenience, the two steps are described together.
After the average path length of each item of server system data in the binary tree group has been calculated, the anomaly score of each item in the binary tree group can be calculated from its average path length. When abnormal data is detected in the server system data according to the anomaly scores, the normal data and abnormal data split by the remote end are obtained.
The anomaly score can be calculated from the relationship between the anomaly score, the average path length, and the height of the binary tree. Given a data set of $n$ samples, the height of the binary tree is:
$$w(n) = 2H(n-1) - \frac{2(n-1)}{n}$$
where $H(i) = \ln(i) + 0.5772156649$ is the harmonic number, with 0.5772156649 being Euler's constant.
The anomaly score $s(x, n)$ maps the notion of anomaly onto the interval $[0, 1]$.
A threshold $\delta$ is set; $\delta$ and the abnormal path length threshold $h_a$ are in one-to-one correspondence. The server system data to be detected, $x^{*}$, is judged abnormal if and only if $s(x^{*}, n) > \delta$.
In general, when $s(x^{*}, n)$ approaches 1, the server system data $x^{*}$ is judged abnormal, and when $s(x^{*}, n)$ approaches 0, it is judged normal.
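The definition of the score itself is not reproduced in this text. The Python sketch below therefore assumes the standard isolation-forest form $s(x, n) = 2^{-E[h(x)]/w(n)}$, which is consistent with the behaviour described above (short paths give scores near 1), and the matching threshold mapping $\delta = 2^{-h_a/w(n)}$. Both formulas, and all function names, are assumptions rather than the text's own definitions.

```python
import math

EULER_GAMMA = 0.5772156649


def harmonic(i):
    """H(i) = ln(i) + 0.5772156649, as defined in the text."""
    return math.log(i) + EULER_GAMMA


def w(n):
    """w(n) = 2H(n-1) - 2(n-1)/n for a data set of n samples."""
    if n < 2:
        raise ValueError("w(n) needs at least two samples")
    return 2 * harmonic(n - 1) - 2 * (n - 1) / n


def anomaly_score(avg_path, n):
    """Assumed form: s = 2 ** (-E[h(x)] / w(n)); shorter paths score closer to 1."""
    return 2 ** (-avg_path / w(n))


def threshold_from_path(h_a, n):
    """Assumed mapping between the path threshold h_a and the score threshold."""
    return 2 ** (-h_a / w(n))


n = 256                      # hypothetical sample count
delta = threshold_from_path(4.0, n)
short_path_score = anomaly_score(3.0, n)  # path shorter than h_a
long_path_score = anomaly_score(5.0, n)   # path longer than h_a
```

Under these assumptions a path shorter than $h_a$ always yields a score above $\delta$, so the two detection criteria agree.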
In a specific implementation of this application, step S105 may include the following step:
When it is determined that there is an average path length below the preset abnormal path length threshold, obtaining the normal data and abnormal data into which the remote end has split the server system data.
In general, the average path length $E[h(x)]$ of abnormal data is short, because such data is easily separated out. An abnormal path length threshold $h_a$ can be set in advance. When there is an average path length below this threshold, for example when the average path length of the server system data $x^{*}$ satisfies $E[h(x^{*})] \le h_a$, the sample $x^{*}$ is judged abnormal. In that case, the normal data and abnormal data split by the remote end are obtained.
S106: Establish a first multivariate Gaussian distribution model based on the normal data, and a second multivariate Gaussian distribution model based on the abnormal data.
After the normal data and abnormal data split by the remote end have been obtained, the first multivariate Gaussian distribution model is established based on the normal data and the second multivariate Gaussian distribution model is established based on the abnormal data.
When the first multivariate Gaussian distribution model is established, the mean $\mu_1$ and covariance $\Sigma_1$ of the $N_1$ normal data are calculated, which yields the first multivariate Gaussian distribution model $p_1(x)$ for the normal data.
When the second multivariate Gaussian distribution model is established, the mean $\mu_2$ and covariance $\Sigma_2$ of the $N_2$ abnormal data are calculated, which yields the second multivariate Gaussian distribution model $p_2(x)$, the probability model for the abnormal data.
This gives the first multivariate Gaussian distribution model, built from the normal data, and the second multivariate Gaussian distribution model, built from the abnormal data.
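The formulas for the mean, covariance, and density are not reproduced in this text. The Python sketch below assumes the usual maximum-likelihood estimates (dividing by the sample count) and the standard multivariate Gaussian density, evaluated through a Cholesky factorization. The function names and the two-dimensional sample data are hypothetical.

```python
import math


def fit_gaussian(samples):
    """Return (mean, covariance) of a list of equal-length vectors (divides by N)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a; a must be positive definite."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def density(x, mean, cov):
    """Multivariate Gaussian density p(x) for the given mean and covariance."""
    d = len(mean)
    low = cholesky(cov)
    diff = [x[i] - mean[i] for i in range(d)]
    y = []
    for i in range(d):  # forward substitution solves low * y = diff
        y.append((diff[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in y)  # (x - mu)^T cov^-1 (x - mu)
    log_det = 2 * sum(math.log(low[i][i]) for i in range(d))
    return math.exp(-0.5 * (d * math.log(2 * math.pi) + log_det + quad))


normal_data = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # hypothetical
abnormal_data = [[6.0, 5.0], [4.0, 5.0], [5.0, 6.0], [5.0, 4.0]]  # hypothetical
mu1, sigma1 = fit_gaussian(normal_data)
mu2, sigma2 = fit_gaussian(abnormal_data)
x_star = [5.0, 5.0]
p1 = density(x_star, mu1, sigma1)  # normal probability of x_star
p2 = density(x_star, mu2, sigma2)  # abnormal probability of x_star
```

The covariance matrix must be positive definite for the factorization to succeed, so in practice the features should not be perfectly correlated or constant.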
S107: Perform superimposed anomaly detection on the server system data by combining the first and second multivariate Gaussian distribution models.
After the first multivariate Gaussian distribution model has been established from the normal data and the second from the abnormal data, the two models are combined to perform superimposed anomaly detection on the server system data.
In a specific implementation of this application, after step S107, the method may further include the following step: when the server system data contains abnormal data, removing the abnormal data from the temporary storage module. Continuing the example above, when the data point to be detected, $x^{*}$, is abnormal, it does not slide through the data stream in the temporary storage module but is removed directly. This separates normal data from abnormal data.
As can be seen from the above technical solution, feature extraction is performed at the near end on the received server system data, binary trees are constructed from the extracted feature data, the average path length of each item of server system data in the binary tree group formed by the binary trees is calculated, and initial anomaly detection is performed on the server system data according to the average path lengths. When the remote end receives the server system data, it splits the data in advance into normal data and abnormal data. When the initial anomaly detection at the near end finds abnormal data, the normal data and abnormal data split by the remote end are obtained, and a multivariate Gaussian distribution model is established for each, so that superimposed anomaly detection is performed on the server system data at the remote end. Near-end anomaly detection has the characteristics of edge computing: it omits the data transmission process and responds faster. Once the near end detects an anomaly in the server system data, the affected system components can be protected promptly, at the onset of heating or before heating begins, which prevents high-temperature damage to the components and keeps the system in its optimal working state with efficient output. The remote end uses the multivariate Gaussian distribution models for global anomaly detection; triggered by the near-end anomaly detection, it performs superimposed anomaly detection to anticipate major risks such as server standby or crash, so that maintenance measures can be taken in advance. Through this dual-end collaborative anomaly detection, computing resources can be allocated rationally, an explosion in computation is avoided, detection efficiency is improved, and the drawbacks of high-load computations such as general distance-based anomaly detection are effectively avoided.
It should be noted that, based on the above embodiment, the embodiments of this application also provide corresponding improvements. Steps in the following embodiments that are the same as or correspond to steps in the above embodiment may be cross-referenced, as may the corresponding beneficial effects, and they are not described again one by one in the improved embodiments below.
Referring to Figure 2, which is a flow chart of another implementation of the server anomaly detection method in an embodiment of this application, the method may include the following steps:
S201: Receive server system data.
S202: Perform feature extraction on the server system data to obtain feature data.
S203: Construct binary trees from the feature data.
In a specific implementation of this application, constructing binary trees from the feature data may include the following step:
Using the distributed computing units in the baseboard management controller to construct a preset number of binary trees in parallel from the feature data.
The baseboard management controller contains multiple distributed computing units, and the number of binary trees to be built is set in advance. During construction, the distributed computing units in the baseboard management controller build the preset number of binary trees in parallel from the feature data. Building the binary trees in parallel across the distributed computing units greatly improves construction efficiency.
An attention mechanism is added to the construction of the binary trees: only the partitioning of the point to be detected, $x^{*}$, matters, so a binary tree does not need to partition all data points, and construction can stop early, which improves efficiency.
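One way to read the early-stopping idea is that each tree only needs the branch containing $x^{*}$. The Python sketch below follows that reading: it keeps only the side of each split that holds $x^{*}$ and returns the depth at which $x^{*}$ is separated. The function name, the sample data, and this interpretation of the attention mechanism are assumptions.

```python
import math
import random


def isolation_depth(rows, x_star, features, rng):
    """Depth at which x_star is separated, splitting only the side that holds it."""
    limit = math.ceil(math.log2(max(len(rows), 2)))  # height limit
    depth = 0
    current = rows  # always contains x_star
    while len(current) > 1 and depth < limit:
        splittable = [f for f in features if len({r[f] for r in current}) > 1]
        if not splittable:
            break  # all remaining feature values are identical
        feature = rng.choice(splittable)
        values = [r[feature] for r in current]
        lo, hi = min(values), max(values)
        cut = rng.uniform(lo, hi)
        if cut == lo:  # avoid a split that leaves the left side empty
            cut = hi
        if x_star[feature] < cut:
            current = [r for r in current if r[feature] < cut]
        else:
            current = [r for r in current if r[feature] >= cut]
        depth += 1  # the other side is never built
    return depth


rng = random.Random(0)
outlier = {"cpu_temp": 95.0}  # hypothetical overheated reading
rows = [{"cpu_temp": 40.0 + i} for i in range(15)] + [outlier]
depths = [isolation_depth(rows, outlier, ["cpu_temp"], rng) for _ in range(100)]
```

Because each call is independent of the others, calls like these are also natural candidates for the parallel construction described above.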
S204: Calculate the average path length of each item of server system data in the binary tree group formed by the binary trees.
S205: When abnormal data is detected in the server system data according to the average path lengths, obtain a first anomaly detection result.
When abnormal data is detected in the server system data according to the average path lengths, the first anomaly detection result is obtained. The first anomaly detection result may identify which specific component is abnormal.
S206: Feed the first anomaly detection result back to the baseboard management controller, so that the baseboard management controller controls the fan to cool the corresponding system component.
After the first anomaly detection result is obtained, it is fed back to the baseboard management controller. On receiving the result, the baseboard management controller can determine which system component is abnormal and control the fan to cool that component. In this way, once the near end detects (or predicts) an anomaly in the server system data, the component can be protected at the onset of heating (or before heating begins), which prevents high-temperature damage to the component and keeps the system in its optimal working state with efficient output.
S207: Obtain the normal data and abnormal data into which the remote end has split the server system data.
S208: Establish a first multivariate Gaussian distribution model based on the normal data, and a second multivariate Gaussian distribution model based on the abnormal data.
S209: Use the first multivariate Gaussian distribution model to calculate the normal probability of each item of server system data, and use the second multivariate Gaussian distribution model to calculate the abnormal probability of each item.
After the first and second multivariate Gaussian distribution models have been established, the first model is used to calculate the normal probability of each item of server system data and the second model is used to calculate the abnormal probability of each item.
S210: Obtain the preset normal probability threshold and abnormal probability threshold and, for each item of server system data, perform superimposed anomaly detection by combining the two thresholds with the item's normal probability and abnormal probability.
The normal probability threshold and abnormal probability threshold are set in advance. They are obtained and, for each item of server system data, superimposed anomaly detection is performed by combining the normal probability threshold, the abnormal probability threshold, and the item's normal probability and abnormal probability.
Following step S106, thresholds $\epsilon_n$ and $\epsilon_a$ can be set. For the server system data to be detected, if and only if $p_1(x^{*}) < \epsilon_n$ and $p_2(x^{*}) < \epsilon_a$, the model judges that the server is (or is about to become) abnormal, instructs the baseboard management controller to seal the disk, and sends a report to the superior, so that operators can plan their work reasonably and the integrity of the work is ensured.
S211: Obtain the second anomaly detection result produced by the superimposed anomaly detection.
After superimposed anomaly detection has been performed by combining the normal probability threshold, the abnormal probability threshold, and the normal and abnormal probabilities of the server system data, the second anomaly detection result is obtained. That is, the normal probability of the server system data is compared with the normal probability threshold, the abnormal probability is compared with the abnormal probability threshold, and the second anomaly detection result is derived from the two comparisons.
S212: Perform server anomaly maintenance operations based on the first and second anomaly detection results.
After the first and second anomaly detection results have been obtained, server anomaly maintenance operations are performed based on both results.
In a specific implementation of this application, step S212 may include the following steps:
Step 1: When the first anomaly detection result indicates abnormal data, and the second anomaly detection result indicates server system data whose normal probability is outside the normal probability threshold range and whose abnormal probability is within the abnormal probability threshold range, send a disk sealing instruction to the baseboard management controller, so that the baseboard management controller performs the disk sealing operation, and send an anomaly detection report to the superior;
Step 2: When the first anomaly detection result indicates abnormal data and the second anomaly detection result indicates that there is no server system data whose abnormal probability is within the abnormal probability threshold range, send a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool the corresponding system component;
Step 3: When the first anomaly detection result indicates abnormal data, and the second anomaly detection result indicates server system data whose normal probability is within the normal probability threshold range and whose abnormal probability is within the abnormal probability threshold range, send a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool the corresponding system component.
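The three steps above reduce to a small decision function, sketched here in Python. The names `Action` and `maintenance_action` are hypothetical, the threshold ranges follow the text (normal probability at or above $\epsilon_n$, abnormal probability below $\epsilon_a$), and returning no action when the near end finds nothing is an assumption, since the text only describes the case where near-end detection has triggered.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    COOL_WITH_FAN = "fan"
    SEAL_DISK_AND_REPORT = "seal"


def maintenance_action(near_end_abnormal, p_normal, p_abnormal, eps_n, eps_a):
    """Combine the first (near-end) and second (remote) detection results."""
    if not near_end_abnormal:
        return Action.NONE  # assumption: remote detection is never triggered
    normal_in_range = p_normal >= eps_n     # normal probability threshold range
    abnormal_in_range = p_abnormal < eps_a  # abnormal probability threshold range
    if not normal_in_range and abnormal_in_range:
        return Action.SEAL_DISK_AND_REPORT  # step 1: serious abnormality
    return Action.COOL_WITH_FAN             # steps 2 and 3: milder abnormality


# Hypothetical thresholds and probabilities.
serious = maintenance_action(True, 0.001, 0.001, eps_n=0.01, eps_a=0.01)
mild = maintenance_action(True, 0.001, 0.5, eps_n=0.01, eps_a=0.01)
```

Only the combination of a low normal probability and an abnormal probability inside its threshold range leads to disk sealing; every other triggered case falls back to fan cooling.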
为方便描述,可以将上述三个步骤结合起来进行说明。For convenience of description, the above three steps can be combined for explanation.
Take a normal probability value greater than or equal to ε_n as the normal probability threshold range, and an abnormal probability value less than ε_a as the abnormal probability threshold range. When the first anomaly detection result indicates that abnormal data exists, and the second anomaly detection result indicates that there is server system data whose normal probability is not within the normal probability threshold and whose abnormal probability is within the abnormal probability threshold, that is, when (E[h(x^(*))] ≤ h_a and p_1(x^(*)) < ε_n and p_2(x^(*)) < ε_a) or (s(x^(*), n) > δ and p_1(x^(*)) < ε_n and p_2(x^(*)) < ε_a), a relatively serious abnormality exists in a system component. A disk sealing instruction is sent to the baseboard management controller, the baseboard management controller performs the disk sealing operation according to the disk sealing instruction, and an anomaly detection report is sent to the upper level.
With the same threshold ranges, when the first anomaly detection result indicates that abnormal data exists and the second anomaly detection result indicates that there is no server system data whose abnormal probability is within the abnormal probability threshold, that is, when (E[h(x^(*))] ≤ h_a and p_2(x^(*)) > ε_a) or (s(x^(*), n) > δ and p_2(x^(*)) > ε_a), a relatively minor abnormality exists in a system component. A fan control instruction is sent to the baseboard management controller, and the baseboard management controller controls the fan to cool the corresponding system component according to the fan control instruction.
Taking a normal probability value greater than or equal to ε_n as the normal probability threshold range and an abnormal probability value less than ε_a as the abnormal probability threshold range, when the first anomaly detection result indicates that abnormal data exists, and the second anomaly detection result indicates that there is server system data whose normal probability is within the normal probability threshold and whose abnormal probability is within the abnormal probability threshold, that is, when (E[h(x^(*))] ≤ h_a and p_1(x^(*)) ≥ ε_n and p_2(x^(*)) < ε_a) or (s(x^(*), n) > δ and p_1(x^(*)) ≥ ε_n and p_2(x^(*)) < ε_a), a relatively minor abnormality exists in a system component. A fan control instruction is sent to the baseboard management controller, so that the baseboard management controller controls the fan to cool the corresponding system component.
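The three cases can be sketched as a small decision function. This is an illustrative sketch only: the names Action and choose_action are hypothetical and do not appear in the application, and the threshold ranges follow the prose description (the normal probability is within its range when p_1 ≥ ε_n, and the abnormal probability is within its range when p_2 < ε_a).

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the maintenance operations described in the text."""
    NONE = "none"
    FAN_COOLING = "fan_cooling"
    DISK_SEAL_AND_REPORT = "disk_seal_and_report"


def choose_action(first_abnormal, p_normal, p_abnormal, eps_n, eps_a):
    """Map the first and second anomaly detection results to an action.

    first_abnormal: True when the near-end (path-length based) check found abnormal data.
    p_normal, p_abnormal: probabilities from the first and second Gaussian models.
    eps_n, eps_a: preset normal and abnormal probability thresholds.
    """
    if not first_abnormal:
        return Action.NONE
    normal_in_range = p_normal >= eps_n      # normal probability within its threshold range
    abnormal_in_range = p_abnormal < eps_a   # abnormal probability within its threshold range
    if abnormal_in_range and not normal_in_range:
        # Step 1: relatively serious abnormality
        return Action.DISK_SEAL_AND_REPORT
    # Step 2 (abnormal probability not in range) and step 3 (both in range)
    return Action.FAN_COOLING
```

In this sketch, only the combination "normal probability out of range and abnormal probability in range" escalates to disk sealing; every other combination reached after a near-end alarm falls back to fan cooling.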
For engineering application scenarios, the way the model is computed may also be reasonably modified so that the expected effect is achieved at low computational cost. Assuming that the features of the server system data are mutually independent, then:
[formula not reproduced in the source text]
where [symbol not reproduced in the source text] is any feature of the server system data, which gives:
[formula not reproduced in the source text]
Then:
[formula not reproduced in the source text]
where a threshold ε is set, and the server system data x^(*) is judged to be abnormal if and only if p(x^(*)) < ε.
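As a rough illustration of the simplified computation under feature independence, the sketch below fits one univariate Gaussian per feature and multiplies the per-feature densities, so that p(x) is the product over j of N(x_j; μ_j, σ_j²). This per-feature product form is an assumption (it is the usual consequence of independence), not a reproduction of the application's formulas, and the function names are hypothetical.

```python
import math


def fit_independent_gaussian(samples):
    """Estimate a per-feature mean and variance from a list of feature vectors."""
    n = len(samples)
    d = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in samples) / n for j in range(d)
    ]
    return means, variances


def density(x, means, variances):
    """Product of univariate Gaussian densities, one per feature."""
    p = 1.0
    for xj, mu, var in zip(x, means, variances):
        var = max(var, 1e-12)  # assumption: floor the variance to avoid division by zero
        p *= math.exp(-((xj - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return p


def is_abnormal(x, means, variances, eps):
    """x is judged abnormal if and only if p(x) < eps."""
    return density(x, means, variances) < eps
```

Compared with a full covariance matrix, this form needs no matrix inversion, which is the kind of saving the paragraph above alludes to.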
Corresponding to the above method embodiments, the present application further provides a server anomaly detection apparatus. The server anomaly detection apparatus described below and the server anomaly detection method described above may be referred to in correspondence with each other.
Referring to Figure 3, which is a structural block diagram of a server anomaly detection apparatus in an embodiment of the present application, the apparatus may include:
a data receiving module 31, configured to receive the server system data;
a feature extraction module 32, configured to perform feature extraction on the server system data to obtain the feature data;
a binary tree construction module 33, configured to construct binary trees according to the feature data;
a path length calculation module 34, configured to calculate the average path length corresponding to each item of server system data in the binary tree group formed by the binary trees;
a data acquisition module 35, configured to acquire, when abnormal data is detected in the server system data according to the average path lengths, the normal data and the abnormal data obtained by the remote end splitting the server system data;
a model building module 36, configured to establish a first multivariate Gaussian distribution model based on the normal data, and to establish a second multivariate Gaussian distribution model based on the abnormal data;
a superimposed anomaly detection module 37, configured to perform superimposed anomaly detection on the server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model.
As can be seen from the above technical solution, feature extraction is performed at the near end on the received server system data, binary trees are constructed from the extracted feature data, the average path length corresponding to each item of server system data in the binary tree group formed by the binary trees is calculated, and initial anomaly detection is performed on the server system data according to the average path lengths. When the remote end receives the server system data, it splits the data in advance into normal data and abnormal data. When the initial anomaly detection at the near end finds abnormal data, the normal data and abnormal data obtained by the remote end are acquired, and a multivariate Gaussian distribution model is established on each of them, so that superimposed anomaly detection is performed on the server system data at the remote end. Near-end anomaly detection has the characteristics of edge computing: it omits the data transmission process and responds faster. Once the near end detects an abnormality in the server system data, the components can be protected promptly, either just as they begin to heat up or before they heat up, which prevents high-temperature damage and keeps the system in its optimal working state with efficient output. The remote end uses the multivariate Gaussian distribution models for global anomaly detection, which is triggered by the near-end anomaly detection and acts as superimposed anomaly detection to predict major risks such as server hang or crash, so that maintenance measures can be taken in advance. Through this dual-end collaborative anomaly detection, computing resources can be allocated sensibly, an explosion in the amount of computation is prevented, detection efficiency is improved, and the drawbacks of high-load computation, such as that of typical distance-based anomaly detection, are avoided.
In a specific embodiment of the present application, the superimposed anomaly detection module 37 includes:
a probability calculation submodule, configured to calculate the normal probability corresponding to each item of server system data using the first multivariate Gaussian distribution model, and to calculate the abnormal probability corresponding to each item of server system data using the second multivariate Gaussian distribution model;
a superimposed anomaly detection submodule, configured to obtain a preset normal probability threshold and a preset abnormal probability threshold and, for each item of server system data, to perform superimposed anomaly detection by combining the normal probability threshold, the abnormal probability threshold, and the normal probability and abnormal probability corresponding to that server system data.
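The two submodules can be sketched as follows: one multivariate Gaussian is fitted to the normal data and one to the abnormal data, and each record is then evaluated under both. This is a minimal pure-Python sketch under stated assumptions: the function names are hypothetical, the covariance matrices are assumed to be positive definite (so a Cholesky factorization exists), and the threshold ranges follow the description (p_1 ≥ ε_n and p_2 < ε_a).

```python
import math


def fit_mvn(samples):
    """Estimate the mean vector and covariance matrix of a list of feature vectors."""
    n, d = len(samples), len(samples[0])
    mu = [sum(r[j] for r in samples) / n for j in range(d)]
    cov = [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in samples) / n for j in range(d)]
        for i in range(d)
    ]
    return mu, cov


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def mvn_pdf(x, mu, cov):
    """Multivariate Gaussian density evaluated through the Cholesky factor."""
    d = len(mu)
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = []  # forward substitution: solve L y = (x - mu)
    for i in range(d):
        y.append((diff[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in y)  # (x - mu)^T cov^{-1} (x - mu)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return math.exp(-0.5 * (quad + log_det + d * math.log(2.0 * math.pi)))


def superimposed_detection(x, normal_model, abnormal_model, eps_n, eps_a):
    """Evaluate x under both models and compare against the preset thresholds."""
    p1 = mvn_pdf(x, *normal_model)    # normal probability
    p2 = mvn_pdf(x, *abnormal_model)  # abnormal probability
    return {
        "p_normal": p1,
        "p_abnormal": p2,
        "normal_in_range": p1 >= eps_n,
        "abnormal_in_range": p2 < eps_a,
    }
```

The returned flags are the inputs that a later maintenance step would combine with the near-end result.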
In a specific embodiment of the present application, the apparatus may further include:
a first result obtaining module, configured to obtain a first anomaly detection result when abnormal data is detected in the server system data according to the average path lengths;
a component cooling module, configured to feed the first anomaly detection result back to the baseboard management controller, so that the baseboard management controller controls the fan to cool the corresponding system component.
In a specific embodiment of the present application, the apparatus may further include:
a second result obtaining module, configured to obtain the second anomaly detection result produced by the superimposed anomaly detection, after superimposed anomaly detection has been performed for each item of server system data by combining the normal probability threshold, the abnormal probability threshold, and the normal probability and abnormal probability corresponding to that server system data;
a server anomaly maintenance module, configured to perform a server anomaly maintenance operation by combining the first anomaly detection result and the second anomaly detection result.
In a specific embodiment of the present application, the server anomaly maintenance module includes:
a disk sealing and report sending submodule, configured to send, when the first anomaly detection result indicates that abnormal data exists and the second anomaly detection result indicates that there is server system data whose normal probability is not within the normal probability threshold and whose abnormal probability is within the abnormal probability threshold, a disk sealing instruction to the baseboard management controller, so that the baseboard management controller performs a disk sealing operation, and to send an anomaly detection report to the upper level;
a first component cooling submodule, configured to send, when the first anomaly detection result indicates that abnormal data exists and the second anomaly detection result indicates that there is no server system data whose abnormal probability is within the abnormal probability threshold, a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool the corresponding system component;
a second component cooling submodule, configured to send, when the first anomaly detection result indicates that abnormal data exists and the second anomaly detection result indicates that there is server system data whose normal probability is within the normal probability threshold and whose abnormal probability is within the abnormal probability threshold, a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool the corresponding system component.
In a specific embodiment of the present application, the data acquisition module 35 includes:
an anomaly score calculation submodule, configured to calculate the anomaly score of each item of server system data in the binary tree group according to the average path lengths;
a data acquisition submodule, configured to acquire, when abnormal data is detected in the server system data according to the anomaly scores, the normal data and the abnormal data obtained by the remote end splitting the server system data.
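The mapping from average path length to anomaly score is not spelled out in this passage. The sketch below assumes the score commonly used with isolation trees, s(x, n) = 2^(-E[h(x)] / c(n)), where c(n) is the average path length of an unsuccessful binary-search-tree lookup over n samples. Both this formula and the function names are assumptions made for illustration, and the application's own preset mapping may differ.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, used to approximate harmonic numbers


def c(n):
    """Normalizing term: average path length of an unsuccessful BST search over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(avg_path_length, n):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    return 2.0 ** (-avg_path_length / c(n))


def has_abnormal(avg_path_lengths, n, delta):
    """True when any record's score exceeds the preset threshold delta."""
    return any(anomaly_score(h, n) > delta for h in avg_path_lengths)
```

Under this assumed mapping, a record whose average path length equals c(n) scores exactly 0.5, so a threshold δ above 0.5 flags only records that are isolated faster than average.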
In a specific embodiment of the present application, the apparatus may further include:
a data storage module, configured to store the server system data, after it has been received, in a temporary storage module having a queue attribute;
the feature extraction module 32 is specifically a module that obtains the server system data from the temporary storage module and performs feature extraction on the server system data.
In a specific embodiment of the present application, the apparatus may further include:
a data removal module, configured to remove the abnormal data from the temporary storage module when, after superimposed anomaly detection has been performed on the server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model, abnormal data is found in the server system data.
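A temporary store with a queue attribute, together with the removal of abnormal data, can be sketched with a bounded double-ended queue. The class name TemporaryStore, its methods, and the choice to drop the oldest record when the store is full are assumptions made for illustration.

```python
from collections import deque


class TemporaryStore:
    """Bounded FIFO buffer for server system data (hypothetical helper)."""

    def __init__(self, capacity):
        # With maxlen set, appending to a full deque discards the oldest item,
        # which gives sliding-window storage once the store is saturated.
        self._queue = deque(maxlen=capacity)

    def push(self, record):
        self._queue.append(record)

    def snapshot(self):
        """Return the stored records, oldest first, for feature extraction."""
        return list(self._queue)

    def remove_abnormal(self, is_abnormal):
        """Remove records flagged by the predicate and return how many were removed."""
        kept = [r for r in self._queue if not is_abnormal(r)]
        removed = len(self._queue) - len(kept)
        self._queue.clear()
        self._queue.extend(kept)
        return removed
```

Removing flagged records keeps later tree construction from being trained on data that has already been judged abnormal.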
In a specific embodiment of the present application, the feature extraction module 32 includes:
a data selection submodule, configured to randomly select a preset number of items of server system data from the server system data;
a feature extraction submodule, configured to perform feature extraction on the selected server system data.
In a specific embodiment of the present application, the path length calculation module 34 includes:
a path length calculation submodule, configured to calculate, in the binary tree group formed by the binary trees and for each item of server system data, the distance from the leaf node in which the server system data is located in each binary tree to the root node, to obtain the path length of the server system data on each binary tree;
an average calculation submodule, configured to average the path lengths over the binary trees to obtain the average path length corresponding to the server system data.
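The path length and averaging submodules can be illustrated with a small isolation-tree sketch. Records with a feature value below the cut point go to the left subtree and the rest go to the right subtree, as the claims describe. Choosing the feature and the cut point at random, the dictionary-based tree layout, the depth limit, and all function names are assumptions made for illustration.

```python
import random


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random cut point."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}  # leaf node
    j = rng.randrange(len(rows[0]))
    lo = min(r[j] for r in rows)
    hi = max(r[j] for r in rows)
    if lo == hi:
        return {"size": len(rows)}  # feature is constant here, stop splitting
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[j] < cut]    # values below the cut point
    right = [r for r in rows if r[j] >= cut]  # values at or above the cut point
    return {
        "feature": j,
        "cut": cut,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def build_forest(rows, n_trees, max_depth, seed=0):
    """Build the binary tree group; a fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    return [build_tree(rows, 0, max_depth, rng) for _ in range(n_trees)]


def path_length(x, tree):
    """Number of edges from the root to the leaf that x falls into."""
    depth = 0
    while "feature" in tree:
        tree = tree["left"] if x[tree["feature"]] < tree["cut"] else tree["right"]
        depth += 1
    return depth


def average_path_length(x, forest):
    """Mean of the path lengths of x over all trees in the group."""
    return sum(path_length(x, t) for t in forest) / len(forest)
```

A record that lies far from the bulk of the data tends to be separated after only a few splits, so its average path length is short; this is what the comparison against a preset abnormal path length threshold relies on.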
In a specific embodiment of the present application, the data acquisition module 35 is specifically a module that acquires the normal data and the abnormal data obtained by the remote end splitting the server system data when it is determined that there is an average path length smaller than a preset abnormal path length threshold.
Corresponding to the above method embodiments, refer to Figure 4, which is a schematic diagram of the server anomaly detection device provided by the present application. The device may include:
a memory 332, configured to store a computer program;
a processor 322, configured to implement the steps of the server anomaly detection method of the above method embodiments when executing the computer program.
Specifically, refer to Figure 5, which is a schematic diagram of the specific structure of a server anomaly detection device provided in this embodiment. The server anomaly detection device may differ considerably depending on configuration or performance, and may include processors (central processing units, CPU) 322 (for example, one or more processors) and a memory 332, where the memory 332 stores one or more computer applications 342 or data 344. The memory 332 may be transient storage or persistent storage. The program stored in the memory 332 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data processing device. Furthermore, the processor 322 may be configured to communicate with the memory 332 and to execute, on the server anomaly detection device 301, the series of instruction operations in the memory 332. The server anomaly detection device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341.
The steps in the server anomaly detection method described above may be implemented by the structure of the server anomaly detection device.
Corresponding to the above method embodiments, and referring to Figure 6, the present application further provides a non-volatile readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the following steps can be implemented:
receiving the server system data; performing feature extraction on the server system data to obtain the feature data; constructing binary trees according to the feature data; calculating the average path length corresponding to each item of server system data in the binary tree group formed by the binary trees; when abnormal data is detected in the server system data according to the average path lengths, acquiring the normal data and the abnormal data obtained by the remote end splitting the server system data; establishing a first multivariate Gaussian distribution model based on the normal data, and establishing a second multivariate Gaussian distribution model based on the abnormal data; and performing superimposed anomaly detection on the server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model.
The non-volatile readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
For an introduction to the non-volatile readable storage medium provided by the present application, refer to the above method embodiments; details are not repeated here.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Since the apparatus, device, and non-volatile readable storage medium disclosed in the embodiments correspond to the method disclosed in the embodiments, their description is relatively brief; for relevant details, refer to the description of the method.
Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help in understanding the technical solution of the present application and its core idea. It should be noted that a person of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (20)

  1. A server anomaly detection method, characterized by comprising:
    receiving server system data;
    performing feature extraction on the server system data to obtain feature data;
    constructing binary trees according to the feature data;
    calculating the average path length corresponding to each item of the server system data in the binary tree group formed by the binary trees;
    when abnormal data is detected in the server system data according to the average path lengths, acquiring normal data and abnormal data obtained by a remote end splitting the server system data;
    establishing a first multivariate Gaussian distribution model based on the normal data, and establishing a second multivariate Gaussian distribution model based on the abnormal data;
    performing superimposed anomaly detection on the server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model.
  2. The server anomaly detection method according to claim 1, characterized in that performing superimposed anomaly detection on the server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model comprises:
    calculating the normal probability corresponding to each item of the server system data using the first multivariate Gaussian distribution model, and calculating the abnormal probability corresponding to each item of the server system data using the second multivariate Gaussian distribution model;
    obtaining a preset normal probability threshold and a preset abnormal probability threshold, and, for each item of server system data, performing superimposed anomaly detection by combining the normal probability threshold, the abnormal probability threshold, and the normal probability and abnormal probability corresponding to the server system data.
  3. The server anomaly detection method according to claim 2, characterized in that, when abnormal data is detected in the server system data according to the average path lengths, the method further comprises:
    obtaining a first anomaly detection result;
    feeding the first anomaly detection result back to a baseboard management controller, so that the baseboard management controller controls a fan to cool the corresponding system component.
  4. The server anomaly detection method according to claim 3, characterized in that, after performing, for each item of server system data, superimposed anomaly detection by combining the normal probability threshold, the abnormal probability threshold, and the normal probability and abnormal probability corresponding to the server system data, the method further comprises:
    obtaining a second anomaly detection result produced by the superimposed anomaly detection;
    performing a server anomaly maintenance operation by combining the first anomaly detection result and the second anomaly detection result.
  5. The server anomaly detection method according to claim 4, characterized in that performing a server anomaly maintenance operation by combining the first anomaly detection result and the second anomaly detection result comprises:
    when the first anomaly detection result indicates that abnormal data exists, and the second anomaly detection result indicates that there is server system data whose normal probability is not within the normal probability threshold and whose abnormal probability is within the abnormal probability threshold, sending a disk sealing instruction to the baseboard management controller, so that the baseboard management controller performs a disk sealing operation, and sending an anomaly detection report to the upper level;
    when the first anomaly detection result indicates that abnormal data exists and the second anomaly detection result indicates that there is no server system data whose abnormal probability is within the abnormal probability threshold, sending a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool the corresponding system component;
    when the first anomaly detection result indicates that abnormal data exists, and the second anomaly detection result indicates that there is server system data whose normal probability is within the normal probability threshold and whose abnormal probability is within the abnormal probability threshold, sending a fan control instruction to the baseboard management controller, so that the baseboard management controller controls the fan to cool the corresponding system component.
  6. The server anomaly detection method according to claim 1, characterized in that constructing binary trees according to the feature data comprises:
    using the distributed computing structural units in the baseboard management controller to construct a preset number of binary trees in parallel according to the feature data.
  7. The server anomaly detection method according to claim 6, characterized in that the binary tree includes a left subtree and a right subtree, and using the distributed computing structural units in the baseboard management controller to construct a preset number of binary trees in parallel according to the feature data comprises:
    using the distributed computing structural units in the baseboard management controller to divide the server system data into the left subtree or the right subtree according to the feature data, to generate the preset number of binary trees.
  8. The server anomaly detection method according to claim 7, characterized in that using the distributed computing structural units in the baseboard management controller to divide the server system data into the left subtree or the right subtree according to the feature data, to generate the preset number of binary trees, comprises:
    using the distributed computing structural units in the baseboard management controller to determine, from the feature data, current feature data and a cut point corresponding to the current feature data;
    placing server system data smaller than the cut point in the left subtree and server system data greater than or equal to the cut point in the right subtree, until a preset termination condition is met, to generate the preset number of binary trees.
  9. The server anomaly detection method according to any one of claims 1 to 6, characterized in that acquiring, when abnormal data is detected in the server system data according to the average path lengths, the normal data and the abnormal data obtained by the remote end splitting the server system data comprises:
    calculating the anomaly score of each item of the server system data in the binary tree group according to the average path lengths;
    when abnormal data is detected in the server system data according to the anomaly scores, acquiring the normal data and the abnormal data obtained by the remote end splitting the server system data.
  10. The server anomaly detection method according to claim 1, wherein the binary tree group includes a binary tree height, and calculating, according to the average path lengths, the anomaly score in the binary tree group for each item of server system data comprises:
    calculating, using a preset mapping relationship, the anomaly score in the binary tree group of the server system data corresponding to each average path length;
    wherein the preset mapping relationship is the relationship among the average path length, the binary tree height, and the anomaly score.
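The claim does not state the mapping itself. As an illustration only, the sketch below assumes a mapping shaped like the standard isolation forest score, with the binary tree height used as the normalizer: score = 2^(-average path length / tree height). The function name `anomaly_score` and this particular formula are assumptions, not the claimed relationship.

```python
def anomaly_score(avg_path_length, tree_height):
    """Map an average path length to a score in (0, 1]; shorter paths score higher.

    Assumed mapping: 2 ** (-avg_path_length / tree_height).
    """
    if tree_height <= 0:
        raise ValueError("tree_height must be positive")
    return 2.0 ** (-avg_path_length / tree_height)
```

Under this assumed mapping, data isolated close to the root (short average path) scores near 1, and data that reaches the full tree height scores 0.5.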
  11. The server anomaly detection method according to claim 1, wherein, after receiving the server system data, the method further comprises:
    storing the server system data in a temporary storage module having queue attributes;
    and performing feature extraction on the server system data comprises:
    acquiring the server system data from the temporary storage module and performing feature extraction on the server system data.
  12. The server anomaly detection method according to claim 11, wherein storing the server system data in the temporary storage module having queue attributes comprises:
    if the temporary storage module having queue attributes is saturated, storing the server system data in the temporary storage module in a sliding manner.
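One way to read "sliding" storage in a saturated queue is a fixed-capacity buffer that drops its oldest entry when a new one arrives. The sketch below assumes that reading and uses Python's `collections.deque` with `maxlen`; the class name `TempStore` and its methods are hypothetical.

```python
from collections import deque

class TempStore:
    """Fixed-capacity FIFO buffer for incoming server system data."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item when a new one
        # is appended to a full buffer (the assumed sliding behavior).
        self._queue = deque(maxlen=capacity)

    def put(self, sample):
        self._queue.append(sample)

    def snapshot(self):
        """Return the buffered samples, oldest first."""
        return list(self._queue)
```

With a capacity of 3, putting the values 0 through 4 leaves 2, 3, and 4 in the buffer.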
  13. The server anomaly detection method according to claim 11, wherein the feature data includes at least temperature information, voltage information, memory usage, load information, and traffic information, and performing feature extraction on the server system data to obtain the feature data comprises:
    acquiring the server system data from the temporary storage module and performing feature extraction on the server system data to obtain the temperature information, the voltage information, the memory usage, the load information, and the traffic information.
  14. The server anomaly detection method according to claim 10, wherein, after performing superimposed anomaly detection on the server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model, the method further comprises:
    when abnormal data exists in the server system data, removing the abnormal data from the temporary storage module.
  15. The server anomaly detection method according to claim 1, wherein performing feature extraction on the server system data comprises:
    randomly selecting a preset number of items of server system data from the server system data;
    performing feature extraction on the selected server system data.
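Random selection followed by feature extraction can be sketched as below. The record layout (a dict per sample) and the field names in `FEATURES` are hypothetical; they mirror the five kinds of feature data named in claim 13 but are not taken from the source.

```python
import random

# Hypothetical field names for the five features listed in claim 13.
FEATURES = ("temperature", "voltage", "memory_usage", "load", "traffic")

def sample_and_extract(records, n, seed=None):
    """Randomly pick up to n records and return their feature vectors."""
    rng = random.Random(seed)
    chosen = rng.sample(records, min(n, len(records)))  # sampling without replacement
    return [[float(r[name]) for name in FEATURES] for r in chosen]
```

Fields that are not listed in `FEATURES`, such as a host name, are ignored during extraction.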
  16. The server anomaly detection method according to claim 1, wherein calculating the average path length corresponding to each item of server system data in the binary tree group formed by the binary trees comprises:
    in the binary tree group formed by the binary trees, calculating, for each item of server system data, the distance from the leaf node in which the server system data is located to the root node in each binary tree, to obtain the path length of the server system data on each binary tree;
    averaging the path lengths over the binary trees to obtain the average path length corresponding to the server system data.
  17. The server anomaly detection method according to claim 1, wherein, when abnormal data is detected in the server system data according to the average path lengths, acquiring the normal data and the abnormal data obtained by the remote end by separating the server system data comprises:
    when it is determined that an average path length smaller than a preset abnormal path length threshold exists, acquiring the normal data and the abnormal data obtained by the remote end by separating the server system data.
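The path-length averaging of claim 16 and the threshold check of claim 17 can be sketched together. The tree representation (nested dicts with `feature`, `cut`, `left`, `right`, and leaves without a `cut` key) and all function names are assumptions made for this example.

```python
def path_length(tree, row):
    """Count edges from the root to the leaf that row falls into."""
    depth = 0
    node = tree
    while "cut" in node:  # internal nodes carry a cut point; leaves do not
        if row[node["feature"]] < node["cut"]:
            node = node["left"]
        else:
            node = node["right"]
        depth += 1
    return depth

def average_path_length(forest, row):
    """Mean of the per-tree path lengths for one row."""
    return sum(path_length(tree, row) for tree in forest) / len(forest)

def has_anomaly(avg_lengths, threshold):
    """True if any average path length is below the preset threshold."""
    return any(length < threshold for length in avg_lengths)
```

A row that is isolated after few splits has a short average path, so it is the short values, not the long ones, that trigger the check.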
  18. A server anomaly detection apparatus, comprising:
    a data receiving module, configured to receive server system data;
    a feature extraction module, configured to perform feature extraction on the server system data to obtain feature data;
    a binary tree construction module, configured to construct binary trees according to the feature data;
    a path length calculation module, configured to calculate the average path length corresponding to each item of server system data in the binary tree group formed by the binary trees;
    a data acquisition module, configured to, when abnormal data is detected in the server system data according to the average path lengths, acquire the normal data and the abnormal data obtained by the remote end by separating the server system data;
    a model building module, configured to build a first multivariate Gaussian distribution model based on the normal data and a second multivariate Gaussian distribution model based on the abnormal data;
    a superimposed anomaly detection module, configured to perform superimposed anomaly detection on the server system data by combining the first multivariate Gaussian distribution model and the second multivariate Gaussian distribution model.
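The last two modules can be illustrated with a simplified sketch. It makes two assumptions that the claim does not state: each model is a multivariate Gaussian with a diagonal covariance matrix (features treated as independent), and "superimposed" detection means flagging a sample when its density under the abnormal-data model exceeds its density under the normal-data model. All names are hypothetical.

```python
import math
from statistics import fmean, pvariance

def fit_diag_gaussian(rows, eps=1e-6):
    """Fit per-feature means and variances (diagonal covariance assumed)."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) + eps for col in columns]  # eps avoids zero variance
    return means, variances

def log_density(model, row):
    """Log of the Gaussian density of row under a fitted diagonal model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(row, means, variances)
    )

def is_abnormal(normal_model, abnormal_model, row):
    """Assumed decision rule: abnormal if the abnormal model fits the row better."""
    return log_density(abnormal_model, row) > log_density(normal_model, row)
```

A full-covariance model would also capture correlations between features such as temperature and load; the diagonal form is used here only to keep the example dependency-free.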
  19. A server anomaly detection device, comprising:
    a memory, configured to store a computer program;
    a processor, configured to implement, when executing the computer program, the steps of the server anomaly detection method according to any one of claims 1 to 17.
  20. A non-volatile readable storage medium, wherein a computer program is stored on the non-volatile readable storage medium, and the computer program, when executed by a processor, implements the steps of the server anomaly detection method according to any one of claims 1 to 17.
PCT/CN2023/078528 2022-06-28 2023-02-27 Server anomaly detection method and apparatus, device, and readable storage medium WO2024001254A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210738323.5 2022-06-28
CN202210738323.5A CN114826971B (en) 2022-06-28 2022-06-28 Server abnormity detection method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
WO2024001254A1 (en) 2024-01-04

Family

ID=82522604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/078528 WO2024001254A1 (en) 2022-06-28 2023-02-27 Server anomaly detection method and apparatus, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN114826971B (en)
WO (1) WO2024001254A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118012662A (en) * 2024-04-08 2024-05-10 广东琴智科技研究院有限公司 Distributed fault distribution method, intelligent computing cloud operating system and computing platform

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114826971B (en) * 2022-06-28 2022-12-27 苏州浪潮智能科技有限公司 Server abnormity detection method, device, equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666169A (en) * 2020-05-13 2020-09-15 云南电网有限责任公司信息中心 Improved isolated forest algorithm and Gaussian distribution-based combined data anomaly detection method
US20210192586A1 (en) * 2019-12-20 2021-06-24 Cintra Holding US Corp. Systems and Methods for Detecting and Responding to Anomalous Traffic Conditions
CN113887932A (en) * 2021-09-29 2022-01-04 平安医疗健康管理股份有限公司 Operation and maintenance management and control method and device based on artificial intelligence and computer equipment
CN114826971A (en) * 2022-06-28 2022-07-29 苏州浪潮智能科技有限公司 Server abnormity detection method, device, equipment and readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008082B (en) * 2019-03-16 2022-06-17 平安科技(深圳)有限公司 Abnormal task intelligent monitoring method, device, equipment and storage medium
CN113361186B (en) * 2021-04-28 2023-04-07 山东大学 Complete data-based wind turbine generator fault diagnosis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Control Charts and Machine Learning for Anomaly Detection in Manufacturing", 1 January 2022, SPRINGER INTERNATIONAL PUBLISHING, Cham, ISBN: 978-3-030-83819-5, article BARBARIOL TOMMASO, CHIARA FILIPPO DALLA, MARCATO DAVIDE, SUSTO GIAN ANTONIO: "A Review of Tree-Based Approaches for Anomaly Detection", pages: 149 - 185, XP009551291, DOI: 10.1007/978-3-030-83819-5_7 *
"Master's Theses ", 6 June 2020, SHAN DONG UNIVERSITY, China, article ZHANG, QINGFENG: "Anomaly Detection System of Heating Secondary Pipe Network Based on Big Data Platform", XP009551288, DOI: 10.27272/d.cnki.gshdu.2020.005103 *
WANG LIN; YU JUN; SHI LINGPENG; LU SHIDA; PANG HENGMAO; CHEN HAIYANG; MEI ZHU; XU MINGJIE; QIAN LIN: "Anomaly monitoring in high-density data centers based on gaussian distribution anomaly detection algorithm", 2020 IEEE INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL ENGINEERING AND COMPUTER APPLICATIONS( AEECA), IEEE, 25 August 2020 (2020-08-25), pages 836 - 841, XP033835993, DOI: 10.1109/AEECA49918.2020.9213549 *

Also Published As

Publication number Publication date
CN114826971A (en) 2022-07-29
CN114826971B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
WO2024001254A1 (en) Server anomaly detection method and apparatus, device, and readable storage medium
US10452845B2 (en) Generic framework to detect cyber threats in electric power grid
CN108964960B (en) Alarm event processing method and device
CN111131379B (en) Distributed flow acquisition system and edge calculation method
CN107608865B (en) Data storage method and device
WO2020124973A1 (en) Power optimization method and apparatus therefor, and photovoltaic device and photovoltaic system
KR102096466B1 (en) Device and method for remote control and alarm using real time database
CN113724100B (en) Power grid monitoring alarm message processing method of distributed cluster
CN112532435B (en) Operation and maintenance method, operation and maintenance management platform, equipment and medium
CN114095392B (en) Communication power supply monitoring method and system based on Internet of things
CN110838940B (en) Underground cable inspection task configuration method and device
CN116071902B (en) Method, equipment and medium for monitoring power equipment of machine room
US11925825B1 (en) Linkage control system and linkage control method for battery clusters
CN116578177A (en) Intelligent cooling energy-saving method, system, equipment and storage medium
CN116644567A (en) Method, system, equipment and medium for determining key transmission section of power system
CN113570473B (en) Equipment fault monitoring method, device, computer equipment and storage medium
JP6226463B2 (en) Network management system, network device and control device
CN105892387B (en) The automatic reporting device of computer room hidden danger and method based on cross-platform multi-point data acquisition MPCA model
CN114143095A (en) Power distribution terminal DTU intrusion detection method and system based on isolated forest
CN113553588B (en) Terminal software management method
CN110333006B (en) Temperature detection method, device, equipment and storage medium for culture area
CN114339468B (en) Data transmission method and device of unit equipment, computer equipment and storage medium
CN114362980A (en) Protocol hang login account identification method and device, computer equipment and storage medium
CN212115347U (en) Network flow data acquisition system
CN115952413A (en) Abnormal battery box detection method and device based on isolated forest and electronic equipment

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23829446

Country of ref document: EP

Kind code of ref document: A1