CN115496233A - Markov model-based data middlebox operation fault prediction method - Google Patents

Markov model-based data middlebox operation fault prediction method

Info

Publication number
CN115496233A
Authority
CN
China
Prior art keywords
fault
layer
data
maintenance
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210882372.6A
Other languages
Chinese (zh)
Inventor
张倩宜
包永迪
郝美薇
江黛茹
张旭
颜阳
杨丹丹
付嘉鑫
胡博
张驰
申琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Tianjin Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Tianjin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Tianjin Electric Power Co Ltd, Information and Telecommunication Branch of State Grid Tianjin Electric Power Co Ltd
Priority to CN202210882372.6A
Publication of CN115496233A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Educational Administration (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to a Markov model-based method for predicting operation faults of a data middle platform, comprising: step A, quantifying the operating state; step B, formulating a fault diagnosis model; and step C, optimizing the early warning mode. First, a six-layer quantitative evaluation system for the operating state is proposed, covering layers such as the service fault phenomenon and the component fault phenomenon. Then a fault diagnosis model is established on the basis of the state indexes and optimized by means of a hidden Markov model, helping the operation and maintenance personnel of the data middle platform locate faults quickly and improving operation and maintenance efficiency. Finally, an online fault diagnosis and localization system is designed and implemented. Building on the six-layer quantitative evaluation system of the operating state of the Tianjin electric power data middle platform and on its active operation and maintenance mode, the object to be troubleshot in the Tianjin electric power data middle platform can, on the one hand, be located quickly; on the other hand, based on the fault diagnosis model and the active operation and maintenance mode, active early warning of operation risks and rapid fault handling can be achieved for the Tianjin electric power data middle platform.

Description

Markov model-based data middlebox operation fault prediction method
Technical Field
The invention belongs to the field of data analysis and relates to techniques for predicting operation faults of a data middle platform, in particular to a Markov model-based method for predicting operation faults of a data middle platform.
Background
The data middle platform is both a strategic choice and an organizational form: according to the specific business model and organizational structure of an enterprise, and supported by tangible products and implementation methods, it builds a set of mechanisms that continuously turn data into assets and serve the business. A data middle platform generally has four capabilities: data collection and integration, data cleansing and processing, data service visualization, and data value realization.
The operation and maintenance of current data middle platforms faces a series of difficulties, mainly the following: the platform architecture is complex, so routine inspection is complicated and labor-intensive, and system risks are hidden and hard to detect; and, compared with traditional mature software services, fault diagnosis of a data middle platform is difficult and depends heavily on the experience of the operation and maintenance personnel.
Because the data middle platform differs greatly from a traditional B/S-architecture information system in both system architecture and physical deployment, operation and maintenance personnel must be familiar not only with traditional operation and maintenance content such as the host level, middleware and application systems of the data middle platform, but must also learn the routine inspection and use of its various new components. At present, handling operation faults of the data middle platform relies mainly on experienced operation and maintenance personnel, and an effective automated method is lacking.
To address these operation and maintenance pain points of the data middle platform, the invention provides a Markov model-based method for predicting its operation faults. The Markov model used here is a doubly stochastic process consisting of a Markov chain and an observation process. The Markov chain is the first stochastic process: it describes the transitions between different states and is generally described by a transition probability matrix. The observation process is the second stochastic process: it describes the relationship between the state sequence and the observation sequence and is described by an observation probability matrix. By predicting operation faults of the data middle platform with the Markov model, the method can both diagnose faults of the data middle platform quickly and give active early warning of the system state.
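As a minimal sketch of the two stochastic processes described above, the following Python snippet represents a toy model with a transition probability matrix `A` and an observation probability matrix `B`, and evaluates an observation sequence with the standard forward algorithm. The state names, observation symbols and all numeric values are hypothetical illustration values and are not taken from the patent.

```python
# Hypothetical two-state model: hidden component health, observed alarm level.
STATES = ["healthy", "degraded"]
SYMBOLS = ["normal", "alarm"]

PI = [0.9, 0.1]        # initial state distribution
A = [[0.95, 0.05],     # transition matrix: A[i][j] = P(next state j | state i)
     [0.20, 0.80]]
B = [[0.90, 0.10],     # observation matrix: B[i][o] = P(symbol o | state i)
     [0.30, 0.70]]


def forward(observations):
    """Return P(observations) and the state distribution after the last step."""
    obs = [SYMBOLS.index(o) for o in observations]
    n = len(STATES)
    # Initialisation: start distribution weighted by the first observation.
    alpha = [PI[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: propagate through A, then weight by the next observation.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    likelihood = sum(alpha)
    posterior = [a / likelihood for a in alpha]
    return likelihood, posterior


likelihood, posterior = forward(["normal", "alarm", "alarm"])
print(f"P(observations) = {likelihood:.6f}")
print(dict(zip(STATES, (round(p, 3) for p in posterior))))
```

In this toy setting, two consecutive alarms shift most of the probability mass to the hidden "degraded" state, which is the kind of inference an early warning built on such a model would rely on.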
Disclosure of Invention
The invention provides a Markov model-based method for predicting operation faults of a data middle platform, which can both diagnose faults of the data middle platform quickly and give active early warning of the system state.
The technical problem addressed by the invention is solved by the following technical scheme:
A Markov model-based method for predicting operation faults of a data middle platform comprises the following steps:
Step A: quantifying the operating state
Combining the service requirements of power grid marketing, equipment, human resources, and operation and inspection with the application component architecture of the data middle platform, the various faults, component states and fault influence ranges of the data middle platform are sorted out and the basic data are collected; a mathematical model is used to layer and classify the data and to sort out their relationships; and a quantitative evaluation system for data middle platform operation is constructed, so that quantitative data objectively reveal potential system risks and measure the health of the data middle platform. The quantitative evaluation system comprises six layers: 'service fault phenomenon', 'component fault phenomenon', 'key operation index', 'key operation state', 'operation and maintenance object' and 'influence range';
Step B: formulating the fault diagnosis model
To locate the various faults of the data middle platform quickly, a probabilistic fault diagnosis model is constructed from the association relations between the evaluation system data and the components, and its probabilities are verified and optimized on the basis of a Markov chain to improve diagnosis accuracy. After the data middle platform fails, entering the fault data quickly yields the order in which components should be checked, which speeds up the recovery of data services. The probabilities of the associations between layers are defined from the association relations in the six-layer state evaluation system model and from the accumulated operation and maintenance data of the electric power data middle platform. Thus, after a first-layer service fault occurs, the order in which to check the fifth-layer operation and maintenance objects that may have caused it can be obtained quickly from the direction and probability distribution of the associations between layers, and a warning for the corresponding sixth-layer influence range can be issued at the same time. The model algorithm is as follows:
[Formula image RE-GDA0003940247170000021: model formula, not reproduced in the text]
In the formula,
$P(x_i \mid y_j)$ represents the probability that service fault $x_i$ occurs because of an operation problem in operation and maintenance object $y_j$;
$P(x_i \mid c_n)$ represents the probability that service fault $x_i$ occurs because component fault $c_n$ has occurred;
$P(c_n \mid t_k)$ represents the probability that component fault $c_n$ occurs because index $t_k$ is abnormal;
$P(t_k \mid s_m)$ represents the probability that index $t_k$ becomes abnormal because state $s_m$ is abnormal;
$P(s_m \mid y_j)$ represents the probability that state $s_m$ becomes abnormal because of an operation problem in operation and maintenance object $y_j$.
X is a service fault (Fault), C is a component fault (Component Fault), T is an index (Target), S is a state (Status), and Y is an operation and maintenance object. The numbers of service faults, component faults, key operation indexes, key operation states and operation and maintenance objects are m, N, K, L and r respectively, with i = 1…m, n = 1…N, k = 1…K, l = 1…L and j = 1…r. In the model, the association probability of each node is derived from accumulated operation and maintenance data, and a Markov model is introduced to optimize the probabilities of the random state transitions in order to improve the accuracy of the model;
Step C: optimizing the early warning mode
Based on the data middle platform evaluation system and the fault diagnosis model, the single-threshold early warning mode of operation and maintenance is abandoned: abnormal system states are judged dynamically by an algorithm, and alarm thresholds are set with associated dynamic thresholds instead of simple fixed thresholds, which reduces the operation risk of the data middle platform. Based on the hidden Markov fault diagnosis model of the data middle platform, an active operation and maintenance mode is proposed, built from three aspects: active early warning, active learning and active feedback.
Moreover, the first layer, the service fault phenomenon, defines the various service requirements and service incidents of the power grid; the state of this layer is what users of the data middle platform experience most directly, and it is the starting point for operating and maintaining the data middle platform. The second layer, the component fault phenomenon, defines the common faults of each component of the electric power data middle platform; faults in this layer cause first-layer service faults, and this layer is where the operation and maintenance personnel start their platform operation and maintenance. The third layer, the key component indexes, defines the key operation index parameters of each component of the electric power data middle platform; abnormal indexes in this layer cause second-layer component faults, and this layer is the key layer for routine inspection and fault diagnosis. The fourth layer, the key component state values, defines the operating parameters of each underlying component of the electric power data middle platform; the state information of this layer affects the third-layer component indexes, and abnormal states in this layer are the root cause of the operation faults in the layers above. The fifth layer, the operation and maintenance object, defines the underlying operation and maintenance entity of each component of the electric power data middle platform; this layer is the foundation of the data middle platform. The sixth layer, the influence range, has the same meaning as the first-layer service faults but refers to service faults that may be caused after a fifth-layer operation and maintenance object becomes abnormal; this layer is the service range in which active operation and maintenance intervenes in advance.
Moreover, the diagnosis model is optimized in two ways. First, the base probabilities of the association relations in the original model are iterated according to model prediction, verification and feedback. Second, the diagnosis model is verified and its probabilities are optimized on the basis of a hidden Markov chain. When the Markov chain is used to predict faults of the electric power data middle platform, a fault path in the six-layer state evaluation system is first selected, and the fifth-layer operation and maintenance object on the path, i.e. the actual cause of the fault, is defined as fault i. Any path n is then taken as the object of study, and its initial fault probability P is calculated. After a series of condition checks, the comprehensive state transition probability of path n is calculated and substituted into the Markov chain prediction model to obtain the fault probability of the path. By calculation and comparison, the path with the largest fault probability is selected as the next-level fault path of the current path. The prediction process is repeated, each time selecting the path with the largest probability, and in this way the fault probability of every path is analysed and evaluated comprehensively.
In addition, in terms of active early warning, the fault diagnosis model locates faults passively, from top to bottom through the six-layer state evaluation system, whereas in the active operation and maintenance mode the data middle platform is inspected from bottom to top through the same system. When an abnormality is found in a 'key operation state' or 'key operation index' associated with an operation and maintenance object, early warning and intervention take place in advance, so that component faults of the data middle platform, and the service faults they would in turn cause, are avoided and the service quality of the data middle platform is ensured.
In terms of active learning, after a fault has been located with the fault model and troubleshooting is complete, the data samples, influence range, handling method and other information related to the fault are stored in a fault knowledge base and labelled. When the operation indexes of the data middle platform resemble the labelled historical index features, the system can directly match the historical fault, raise an alarm, and retrieve the corresponding handling method to guide fault resolution and recovery. Through active learning, operation and maintenance experience is shared across the whole operation and maintenance team, which keeps operation and maintenance standards uniform.
In terms of active feedback, an online fault diagnosis tool, the 'electric power data middle platform operation and maintenance map', is designed and implemented to assist manual troubleshooting; a knowledge base of fault data features is established; the operation and maintenance efficiency of the data middle platform is optimized; and the operation and maintenance quality of the personnel is improved.
The invention has the following advantages and positive effects:
The Markov model-based method for predicting operation faults of a data middle platform first proposes a six-layer quantitative evaluation system for the operating state, comprising the layers 'service fault phenomenon', 'component fault phenomenon', 'key operation index', 'key operation state', 'operation and maintenance object' and 'influence range'; it then establishes a fault diagnosis model based on the state indexes and optimizes it by means of a hidden Markov model, helping the operation and maintenance personnel of the data middle platform locate faults quickly and improving operation and maintenance efficiency; and finally an online fault diagnosis and localization system is designed and implemented.
The hidden Markov model-based data middle platform fault diagnosis system provided by the invention is built on the six-layer quantitative evaluation system of the operating state of the Tianjin electric power data middle platform and on its active operation and maintenance mode, and has worked well in the operation and maintenance of the Tianjin electric power public data middle platform. On the one hand, the object to be troubleshot in the Tianjin electric power data middle platform can be located quickly; on the other hand, based on the fault diagnosis model and the active operation and maintenance mode, active early warning of operation risks and rapid fault handling can be achieved for the Tianjin electric power data middle platform.
Drawings
FIG. 1 is a flow chart of the Markov model-based data middle platform operation fault prediction method of the present invention;
FIG. 2 is a diagram of a quantitative evaluation system for six-layer operation states according to the present invention;
FIG. 3 is a diagram of a Markov model relationship in accordance with the present invention.
Detailed Description
The present invention is further illustrated by the following specific examples, which are illustrative rather than limiting and do not restrict the scope of the invention.
The invention provides a Markov model-based method for predicting operation faults of a data middle platform, comprising the following steps:
Step A: quantifying the operating state
Combining service requirements such as power grid marketing, equipment, human resources, and operation and inspection with the application component architecture of the data middle platform, the various faults, component states and fault influence ranges of the data middle platform are sorted out and the basic data are collected; a mathematical model is used to layer and classify the data and to sort out their relationships; and a quantitative evaluation system for data middle platform operation is constructed, so that quantitative data objectively reveal potential system risks and measure the health of the data middle platform.
Combining the actual digital application services of the power grid with the technical architecture of the Tianjin electric power data middle platform, a six-layer quantitative evaluation system for the operating state of the data middle platform is constructed, consisting of the service fault phenomenon, the component fault phenomenon, the key operation index, the key operation state, the operation and maintenance object and the influence range, as shown in FIG. 2.
First layer: the service fault phenomenon defines the various service requirements and service incidents of the power grid. The state of this layer is what users of the data middle platform experience most directly, and it is the starting point for operating and maintaining the data middle platform.
Second layer: the component fault phenomenon defines the common faults of each component of the electric power data middle platform. Faults in this layer cause first-layer service faults, and this layer is where the operation and maintenance personnel of the data middle platform start their platform operation and maintenance.
Third layer: the key component indexes define the key operation index parameters of each component of the Tianjin electric power data middle platform. Abnormal indexes in this layer cause second-layer component faults, and this layer is the key layer for routine inspection and fault diagnosis.
Fourth layer: the key component state values define the operating parameters of each underlying component of the electric power data middle platform. The state information of this layer affects the third-layer component indexes, and abnormal states in this layer are the root cause of the operation faults in the layers above.
Fifth layer: the operation and maintenance object defines the underlying operation and maintenance entity of each component of the electric power data middle platform. This layer is the foundation of the data middle platform.
Sixth layer: the influence range has the same meaning as the first-layer service faults, but refers to service faults that have not yet occurred and that may be caused after a fifth-layer operation and maintenance object becomes abnormal. This layer is the service range in which active operation and maintenance intervenes in advance.
Step B: formulating the fault diagnosis model
To locate the various faults of the data middle platform quickly, a probabilistic fault diagnosis model is constructed from the association relations between the evaluation system data and the components, and its probabilities are verified and optimized on the basis of a Markov chain to improve diagnosis accuracy. After the data middle platform fails, entering the fault data quickly yields the order in which components should be checked, which speeds up the recovery of data services.
The probabilities of the associations between layers are defined from the association relations in the six-layer state evaluation system model and from the accumulated operation and maintenance data of the electric power data middle platform. Thus, after a first-layer service fault occurs, the order in which to check the fifth-layer operation and maintenance objects that may have caused it can be obtained quickly from the direction and probability distribution of the associations between layers, and a warning for the corresponding sixth-layer influence range can be issued at the same time. The model algorithm is summarized as follows:
[Formula image RE-GDA0003940247170000051: model formula, not reproduced in the text]
In the formula,
$P(x_i \mid y_j)$ represents the probability that service fault $x_i$ occurs because of an operation problem in operation and maintenance object $y_j$;
$P(x_i \mid c_n)$ represents the probability that service fault $x_i$ occurs because component fault $c_n$ has occurred;
$P(c_n \mid t_k)$ represents the probability that component fault $c_n$ occurs because index $t_k$ is abnormal;
$P(t_k \mid s_m)$ represents the probability that index $t_k$ becomes abnormal because state $s_m$ is abnormal;
$P(s_m \mid y_j)$ represents the probability that state $s_m$ becomes abnormal because of an operation problem in operation and maintenance object $y_j$.
X is a service fault (Fault), C is a component fault (Component Fault), T is an index (Target), S is a state (Status), and Y is an operation and maintenance object. The numbers of service faults, component faults, key operation indexes, key operation states and operation and maintenance objects are m, N, K, L and r respectively, with i = 1…m, n = 1…N, k = 1…K, l = 1…L and j = 1…r.
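One plausible reading of the five probabilities defined above, stated here as an assumption because the formula itself survives only as an image reference, is that the first probability is obtained by chaining the four layer-to-layer probabilities and summing over the intermediate nodes. This treats each layer as depending only on the layer directly below it:

```latex
P(x_i \mid y_j) \;=\; \sum_{n}\sum_{k}\sum_{m}
  P(x_i \mid c_n)\, P(c_n \mid t_k)\, P(t_k \mid s_m)\, P(s_m \mid y_j)
```

Here $n$, $k$ and $m$ run over the component faults, key operation indexes and key operation states respectively, following the subscripts used in the definitions.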
In the model, the association probability of each node is derived from accumulated operation and maintenance data. To improve the accuracy of the model, a Markov model is introduced to optimize the probabilities of the random state transitions. The principle of the Markov model is illustrated in FIG. 3.
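The top-down use of the layer association probabilities can be sketched as follows. This is a minimal illustration, not the patented algorithm: it assumes that a chained score is formed by multiplying the probabilities along each path and summing over paths, and that all operation and maintenance objects are equally likely a priori, so that ranking by the score gives the checking order. All node names and numbers are hypothetical.

```python
# Hypothetical association tables between adjacent layers.
# Each key is (upper-layer node, lower-layer node); each value is
# P(upper node abnormal | lower node abnormal).
P_X_GIVEN_C = {("report_timeout", "query_engine_down"): 0.8,
               ("report_timeout", "sync_job_failed"): 0.3}
P_C_GIVEN_T = {("query_engine_down", "cpu_usage_high"): 0.6,
               ("query_engine_down", "queue_backlog"): 0.2,
               ("sync_job_failed", "queue_backlog"): 0.7}
P_T_GIVEN_S = {("cpu_usage_high", "node_overloaded"): 0.9,
               ("queue_backlog", "node_overloaded"): 0.4,
               ("queue_backlog", "disk_full"): 0.8}
P_S_GIVEN_Y = {("node_overloaded", "compute_cluster"): 0.5,
               ("disk_full", "storage_service"): 0.6,
               ("node_overloaded", "storage_service"): 0.1}


def chain(upper_given_mid, mid_given_lower):
    """Compose two layer tables by summing over the shared middle node."""
    result = {}
    for (upper, mid), p_um in upper_given_mid.items():
        for (mid2, lower), p_ml in mid_given_lower.items():
            if mid == mid2:
                key = (upper, lower)
                result[key] = result.get(key, 0.0) + p_um * p_ml
    return result


def rank_objects(service_fault):
    """Return (object, score) pairs for one service fault, highest score first."""
    x_given_t = chain(P_X_GIVEN_C, P_C_GIVEN_T)
    x_given_s = chain(x_given_t, P_T_GIVEN_S)
    x_given_y = chain(x_given_s, P_S_GIVEN_Y)
    scores = {y: p for (x, y), p in x_given_y.items() if x == service_fault}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_objects("report_timeout")
for obj, score in ranking:
    print(f"{obj}: {score:.4f}")
```

With these illustrative tables the compute cluster is checked before the storage service. In practice the tables would come from the accumulated operation and maintenance data mentioned in the text.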
The diagnosis model is optimized in two ways. First, the base probabilities of the association relations in the original model are iterated according to model prediction and verification feedback. Second, the diagnosis model is verified and its probabilities are optimized on the basis of a hidden Markov chain.
When the Markov model is used to predict faults of the electric power data middle platform, a fault path in the six-layer state evaluation system is first selected, and the fifth-layer operation and maintenance object on the path, i.e. the actual cause of the fault, is defined as fault i. Any path n is then taken as the object of study, and its initial fault probability P is calculated. After a series of condition checks, the comprehensive state transition probability of path n is calculated. The result is then substituted into the Markov chain prediction model to obtain the fault probability of the path. By calculation and comparison, the path with the largest fault probability is selected as the next-level fault path of the current path. The prediction process is repeated, each time selecting the path with the largest probability, and in this way the fault probability of every path is analysed and evaluated comprehensively.
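The repeated "pick the most probable next-level path" procedure can be sketched with an ordinary Markov chain over fault paths. The path names and the transition matrix below are hypothetical, and the condition checks that the text says precede the transition probability calculation are not modelled; the matrix is simply given.

```python
PATHS = ["path_a", "path_b", "path_c"]  # hypothetical fault paths

# Hypothetical transition matrix:
# TRANSITION[i][j] = P(next-level fault on path j | current fault on path i)
TRANSITION = [
    [0.1, 0.6, 0.3],
    [0.2, 0.1, 0.7],
    [0.5, 0.4, 0.1],
]


def step(distribution):
    """One Markov chain step: multiply the row vector by TRANSITION."""
    n = len(PATHS)
    return [sum(distribution[i] * TRANSITION[i][j] for i in range(n))
            for j in range(n)]


def predict_fault_sequence(start, levels):
    """Greedily follow the most probable next-level fault path."""
    current = PATHS.index(start)
    sequence = [start]
    for _ in range(levels):
        distribution = [0.0] * len(PATHS)
        distribution[current] = 1.0  # the current path is known to have failed
        next_probs = step(distribution)
        current = max(range(len(PATHS)), key=lambda j: next_probs[j])
        sequence.append(PATHS[current])
    return sequence


# Fault probabilities one step after an uncertain initial state.
print([round(p, 2) for p in step([0.5, 0.3, 0.2])])
# Most probable chain of fault paths starting from path_a.
print(predict_fault_sequence("path_a", 3))
```

The greedy choice keeps only the single most probable successor at each level; keeping the full distribution returned by `step` instead would allow all paths to be compared after several levels.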
Step C: early warning mode optimization
Based on the data middle platform evaluation system and the fault diagnosis model, the single-threshold early warning mode of operation and maintenance is abandoned: abnormal system states are judged dynamically by an algorithm, and alarm thresholds are set with associated dynamic thresholds instead of simple fixed thresholds, which reduces the operation risk of the data middle platform.
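The difference between a fixed threshold and a dynamic one can be illustrated with a rolling-statistics rule. The patent does not specify how its dynamic threshold is computed, so the rule below (mean plus `k` standard deviations of a sliding window) is a stand-in chosen for illustration; the function names, the window size, `k` and the sample series are all assumptions.

```python
from statistics import mean, pstdev


def static_alarms(values, limit):
    """Positions whose value exceeds one fixed limit."""
    return [i for i, v in enumerate(values) if v > limit]


def dynamic_alarms(values, window=5, k=3.0):
    """Positions whose value exceeds mean + k * std of the preceding window."""
    alarms = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if values[i] > threshold:
            alarms.append(i)
    return alarms


# Hypothetical CPU usage samples (percent) with one jump at position 6.
cpu = [40, 42, 41, 43, 42, 41, 60, 42, 43, 41]
print(static_alarms(cpu, limit=80))
print(dynamic_alarms(cpu))
```

The jump to 60 stays below a fixed limit of 80 and raises no static alarm, while the dynamic rule flags it because it departs sharply from the recent behaviour of the same index.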
Based on the hidden Markov fault diagnosis model of the data middle platform, an active operation and maintenance mode for the data middle platform is proposed, built from three aspects: active early warning, active learning and active feedback.
In terms of active early warning, the fault diagnosis model locates faults passively, from top to bottom through the six-layer state evaluation system, whereas in the active operation and maintenance mode the data middle platform is inspected from bottom to top through the same system. When an abnormality is found in a 'key operation state' or 'key operation index' associated with an operation and maintenance object, early warning and intervention take place in advance, so that component faults of the data middle platform, and the service faults they would in turn cause, are avoided and the service quality of the data middle platform is ensured.
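The bottom-up inspection can be sketched as a graph walk from an abnormal lower-layer node up to the service faults it could eventually cause, which corresponds to the sixth-layer influence range. The edge table and node names below are hypothetical, and probabilities are left out to keep the sketch short.

```python
from collections import deque

# Hypothetical "lower node -> upper nodes it can affect" associations.
AFFECTS = {
    "storage_service": ["disk_full"],      # O&M object -> key operation state
    "disk_full": ["queue_backlog"],        # state -> key operation index
    "queue_backlog": ["sync_job_failed"],  # index -> component fault
    "sync_job_failed": ["report_timeout", "dashboard_stale"],  # -> service faults
}
SERVICE_FAULTS = {"report_timeout", "dashboard_stale"}


def influence_range(abnormal_node):
    """Breadth-first walk upward; return service faults that could follow."""
    seen = {abnormal_node}
    queue = deque([abnormal_node])
    while queue:
        node = queue.popleft()
        for upper in AFFECTS.get(node, []):
            if upper not in seen:
                seen.add(upper)
                queue.append(upper)
    return sorted(seen & SERVICE_FAULTS)


print(influence_range("disk_full"))
```

A warning raised at the moment `disk_full` is detected can therefore name the services at risk before any component or service fault has actually occurred.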
In terms of active learning, after a fault has been located with the fault model and troubleshooting is complete, the data samples, influence range, handling method and other information related to the fault are stored in a fault knowledge base and labelled. When the operation indexes of the data middle platform resemble the labelled historical indexes, the system can directly match the historical fault, raise an alarm, and retrieve the corresponding handling method to guide fault resolution and recovery. Through active learning, operation and maintenance experience is shared across the whole operation and maintenance team, which keeps operation and maintenance standards uniform.
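Matching current indexes against labelled history can be sketched as a nearest-neighbour lookup. The text does not say how similarity is measured, so cosine similarity with a cut-off is used here purely as an assumption; the knowledge base records, feature vectors and the `min_similarity` value are hypothetical.

```python
import math

# Hypothetical labelled history: index feature vector, fault label, handling method.
KNOWLEDGE_BASE = [
    {"features": [0.95, 0.20, 0.10], "fault": "disk_full",
     "handling": "clean up expired partitions"},
    {"features": [0.30, 0.90, 0.85], "fault": "node_overloaded",
     "handling": "scale out the compute pool"},
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_fault(features, min_similarity=0.95):
    """Return (best record, similarity), or (None, similarity) below the cut-off."""
    best = max(KNOWLEDGE_BASE, key=lambda rec: cosine(features, rec["features"]))
    score = cosine(features, best["features"])
    return (best, score) if score >= min_similarity else (None, score)


record, score = match_fault([0.90, 0.25, 0.12])
if record is not None:
    print(f"matched {record['fault']} ({score:.3f}): {record['handling']}")
```

A vector that resembles no stored fault closely enough returns `None`, in which case the fault would go through normal diagnosis and then be added to the knowledge base with its label.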
In the aspect of active feedback, an 'operation and maintenance map in power data' (an online fault diagnosis tool) is designed and realized, manual fault troubleshooting is assisted, a fault data characteristic knowledge base is established, operation and maintenance efficiency in data is optimized, and operation and maintenance quality of operation and maintenance personnel is improved.
The invention relates to a Markov model-based method for predicting operation faults of a data middle platform. First, a six-layer quantitative evaluation system for the operation state of the data middle platform is proposed, consisting of 'service fault phenomenon', 'component fault phenomenon', 'key operation index', 'key operation state', 'operation and maintenance object', and 'influence range'. Next, a fault diagnosis model is built on these state indices and optimized by means of a hidden Markov model, helping data middle platform operation and maintenance personnel locate faults quickly and improving operation and maintenance efficiency. Finally, an online fault diagnosis and location system is designed and implemented.
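The fault diagnosis model links adjacent layers by association probabilities. The model formula itself is published only as an image, so the following is a sketch under a stated assumption: the score relating a service fault to an operation and maintenance object is taken as the sum, over all intermediate component faults, indices, and states, of the product of the adjacent-layer probabilities. All numbers and object names are hypothetical, and the results are used only as ranking scores.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical association probabilities; rows index the effect, columns the cause.
P_X_GIVEN_C = [[0.8, 0.1]]                        # 1 service fault, 2 component faults
P_C_GIVEN_T = [[0.7, 0.2], [0.1, 0.6]]            # 2 component faults, 2 indices
P_T_GIVEN_S = [[0.9, 0.3], [0.2, 0.5]]            # 2 indices, 2 states
P_S_GIVEN_Y = [[0.6, 0.1, 0.3], [0.2, 0.7, 0.4]]  # 2 states, 3 O&M objects

def service_fault_scores():
    """Chain the four layers; entry [i][j] scores object j as the cause of service fault i."""
    chained = matmul(P_X_GIVEN_C, P_C_GIVEN_T)
    chained = matmul(chained, P_T_GIVEN_S)
    return matmul(chained, P_S_GIVEN_Y)

def troubleshooting_order(service_index, object_names):
    """Rank operation and maintenance objects from most to least likely cause."""
    scores = service_fault_scores()[service_index]
    ranked = sorted(zip(object_names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked]

OBJECTS = ["database", "message_queue", "compute_cluster"]  # hypothetical names
print(troubleshooting_order(0, OBJECTS))
```

Expressing each layer as a matrix keeps the chain easy to update: when accumulated operation data changes one layer's probabilities, only that matrix is replaced and the ranking is recomputed.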
The Markov model-based data middle platform fault diagnosis system is built on the six-layer quantitative evaluation system for the operation state of the Tianjin electric power data middle platform and on the active operation and maintenance mode, and it has worked well in the operation and maintenance of the Tianjin electric power data middle platform. On the one hand, the object to be troubleshot in the Tianjin electric power data middle platform can be located quickly; on the other hand, based on the fault diagnosis model and the active operation and maintenance mode, operating risks of the Tianjin electric power data middle platform can be warned of in advance and faults handled rapidly.
Although the embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that: various substitutions, alterations and modifications are possible without departing from the spirit and scope of this disclosure and appended claims, and accordingly, the scope of this disclosure is not limited to the embodiments disclosed.

Claims (6)

1. A Markov model-based data middlebox operation fault prediction method, characterized by comprising the following steps:
step A: quantifying operating conditions
Combining the service requirements of power grid marketing, equipment, human resources, and operation and inspection with the application component architecture of the data middle platform, the faults, component states, and fault influence ranges of the data middle platform are sorted out and the basic data collected; data layering, classification, and relationship mapping are completed using a mathematical model; and a quantitative evaluation system for data middle platform operation is constructed, which uses quantitative data to objectively flag potential system risks and measure the health of the data middle platform, wherein the quantitative evaluation system comprises six layers for quantitatively evaluating the operation state of the data middle platform: 'service fault phenomenon', 'component fault phenomenon', 'key operation index', 'key operation state', 'operation and maintenance object', and 'influence range';
step B: fault diagnosis model formulation
To locate the various faults of the data middle platform quickly, a probabilistic mathematical model for fault diagnosis is constructed from the association relations between the evaluation system data and the components, and its probabilities are verified and optimized based on a Markov chain to improve diagnostic accuracy; after the data middle platform fails, fault data is input and the order in which components should be checked can be screened out quickly, which speeds up the recovery of data services; the probabilities of the associations between layers are defined from the association relations in the six-layer state evaluation system model and from the accumulated operation and maintenance data of the electric power data middle platform, so that after a first-layer service fault occurs, the troubleshooting order of the fifth-layer operation and maintenance objects that may have caused the fault can be screened out quickly according to the association directions and probability distribution between layers, and at the same time an early warning can be issued for the corresponding sixth-layer influence range; the model algorithm is as follows:
[Formula image: Figure RE-FDA0003940247160000011]
in the formula:
[Figure RE-FDA0003940247160000012] represents the probability that service fault $x_i$ occurs because of an operation problem in operation and maintenance object $y_j$;
[Figure RE-FDA0003940247160000013] represents the probability that component fault $c_n$ causes service fault $x_i$;
[Figure RE-FDA0003940247160000014] represents the probability that an abnormality in index $t_k$ causes component fault $c_n$;
[Figure RE-FDA0003940247160000015] represents the probability that an abnormality in state $S_m$ causes index $t_k$ to become abnormal;
[Figure RE-FDA0003940247160000016] represents the probability that an operation problem in operation and maintenance object $y_j$ causes state $S_m$ to become abnormal;
X is a service fault (Fault), C is a component fault (Component Fault), T is an index (Target), S is a state (Status), and Y is an operation and maintenance object; the numbers of service faults, component faults, key operation indices, key operation states, and operation and maintenance objects are m, N, K, L, and r respectively, with $i = 1, \ldots, m$; $n = 1, \ldots, N$; $k = 1, \ldots, K$; $l = 1, \ldots, L$; $j = 1, \ldots, r$; in the model, the association probability of each node is accumulated from operation data, and a Markov model is introduced to optimize the random state transition probabilities in order to improve the accuracy of the model;
step C: early warning mode optimization
Based on the data middle platform evaluation system and the fault diagnosis model, the single-threshold early-warning operation and maintenance mode is abandoned; the abnormal state of the system is judged dynamically by an algorithm, and alarm thresholds are set using associated dynamic thresholds instead of simple fixed thresholds, which reduces the operating risk of the data middle platform; and, based on the hidden Markov fault diagnosis model of the data middle platform, an active operation and maintenance mode of the data middle platform is provided, constructed from three aspects: active early warning, active learning, and active feedback.
2. The Markov model-based data middlebox operation fault prediction method of claim 1, wherein: the first layer, service fault phenomenon, defines the various service requirements and service incidents of the power grid; its state is what users of the data middle platform experience most directly, and it is the starting point of data middle platform operation and of operation and maintenance; the second layer, component fault phenomenon, defines the common faults of each component of the electric power data middle platform; the various faults of this layer can in turn cause first-layer service faults, and this layer is the starting point from which data middle platform operation and maintenance personnel maintain the platform; the third layer, key component index, defines the key operation index parameters of each component of the electric power data middle platform; abnormal indices in this layer cause second-layer component faults, and this layer is the key layer for daily inspection and fault diagnosis; the fourth layer, key component state value, defines the operation parameters of each underlying component of the electric power data middle platform; the state information of this layer affects the behavior of the third-layer component indices, and abnormal states in this layer are the root cause of the operation faults in the layers above; the fifth layer, operation and maintenance object, defines the underlying operation and maintenance entities of each component of the electric power data middle platform and forms the foundation of the data middle platform; the sixth layer, influence range, has the same meaning as the first-layer services, except that it covers service faults that have not yet occurred but may be caused once a fifth-layer operation and maintenance object becomes abnormal; the sixth layer is the range of services in which active operation and maintenance intervenes in advance.
3. The Markov model-based data middlebox operation fault prediction method of claim 1, wherein: the diagnosis model is optimized in two respects: first, the base probabilities of the association relations in the original model are iterated according to model prediction, verification, and feedback; second, the diagnosis model is verified and its probabilities optimized based on a hidden Markov chain; when the Markov chain is used to predict faults of the electric power data middle platform, a fault path in the six-layer state evaluation system is selected, and the fifth-layer operation and maintenance object on the path, that is, the actual cause of the fault, is defined as fault i; any path n is then selected as the object of study, and the probability P of an initial fault on path n is calculated; after a series of condition judgments, the comprehensive state transition probability of path n is calculated and substituted into the Markov chain prediction model to obtain the fault probability of the path; by calculation and comparison, the path with the largest fault probability is selected as the next-stage fault path of the current path; the prediction process is repeated, each time selecting the path with the largest probability, and in this way the fault probability of each path is comprehensively analyzed and evaluated.
4. The Markov model-based data middlebox operation fault prediction method of claim 1, wherein: in terms of active early warning, whereas the fault diagnosis model locates faults passively, from top to bottom, through the six-layer state evaluation system, in the active operation and maintenance mode the data middle platform is inspected from bottom to top through the six-layer state evaluation system, and once an abnormality is found in a 'key operation state' or 'key operation index' associated with an operation and maintenance object, early-warning intervention is carried out in advance, so that component faults of the data middle platform, and the service faults they would in turn cause, are avoided and the service quality of the data middle platform is ensured.
5. The Markov model-based data middlebox operation fault prediction method of claim 1, wherein: in terms of active learning, after a fault has been located with the fault model and troubleshooting is complete, the data samples, influence range, handling method, and other information related to the fault are stored in a fault knowledge base and labeled; when the operation indices of the data middle platform resemble the labeled historical index characteristics, the system can directly match the historical fault to raise an alarm, and match the corresponding handling method to guide resolution and recovery; through this active learning, operation and maintenance experience is shared across the whole operation and maintenance team, which keeps operation and maintenance standards uniform.
6. The Markov model-based data middlebox operation fault prediction method of claim 1, wherein: in terms of active feedback, an online fault diagnosis tool, the electric power data middle platform operation and maintenance map, is designed and implemented to assist manual troubleshooting and to build a knowledge base of fault data characteristics, which improves the operation and maintenance efficiency of the electric power data middle platform as well as the quality of the operation and maintenance personnel's work.
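The repeated path selection recited in claim 3 can be illustrated with a plain (not hidden) Markov chain. This is a sketch under assumptions: the candidate fault paths are treated as chain states, the initial fault probabilities and the transition matrix are given, and the 'series of condition judgments' is not modeled. All values are hypothetical.

```python
def step(distribution, transition):
    """One Markov step: next[j] = sum_i distribution[i] * transition[i][j]."""
    size = len(transition)
    return [sum(distribution[i] * transition[i][j] for i in range(size))
            for j in range(size)]

def predict_fault_paths(initial, transition, steps):
    """Return the index of the most probable fault path after each step."""
    chosen, current = [], initial
    for _ in range(steps):
        current = step(current, transition)
        chosen.append(max(range(len(current)), key=current.__getitem__))
    return chosen

# Hypothetical values for three candidate fault paths
INITIAL = [0.5, 0.3, 0.2]   # initial fault probability of each path
TRANSITION = [              # row i: probabilities of moving from path i to each path
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]
print(predict_fault_paths(INITIAL, TRANSITION, 3))
```

Each row of the transition matrix sums to 1, so the propagated vector remains a probability distribution and the paths can be compared directly at every stage.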
CN202210882372.6A 2022-07-26 2022-07-26 Markov model-based data middlebox operation fault prediction method Pending CN115496233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210882372.6A CN115496233A (en) 2022-07-26 2022-07-26 Markov model-based data middlebox operation fault prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210882372.6A CN115496233A (en) 2022-07-26 2022-07-26 Markov model-based data middlebox operation fault prediction method

Publications (1)

Publication Number Publication Date
CN115496233A true CN115496233A (en) 2022-12-20

Family

ID=84467193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210882372.6A Pending CN115496233A (en) 2022-07-26 2022-07-26 Markov model-based data middlebox operation fault prediction method

Country Status (1)

Country Link
CN (1) CN115496233A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116245357A (en) * 2023-01-31 2023-06-09 南京工大金泓能源科技有限公司 Fault diagnosis method and system for intelligent energy-saving cabinet
CN116245357B (en) * 2023-01-31 2023-09-22 南京工大金泓能源科技有限公司 Fault diagnosis method and system for intelligent energy-saving cabinet
CN118503831A (en) * 2024-07-16 2024-08-16 南通职业大学 Building information model management system
CN118503831B (en) * 2024-07-16 2024-09-20 南通职业大学 Building information model management system

Similar Documents

Publication Publication Date Title
Lv et al. Safety poka yoke in zero-defect manufacturing based on digital twins
CN115496233A (en) Markov model-based data middlebox operation fault prediction method
CN108873859B (en) Bridge type grab ship unloader fault prediction model method based on improved association rule
CN110705710A (en) Knowledge graph-based industrial fault analysis expert system
CN110703057A (en) Power equipment partial discharge diagnosis method based on data enhancement and neural network
CN111124852A (en) Fault prediction method and system based on BMC health management module
CN108920609A (en) Electric power experiment data mining method based on multi-dimensional analysis
CN108398934A (en) The system that a kind of equipment fault for rail traffic monitors
Wenner et al. The concept of digital twin to revolutionise infrastructure maintenance: The pilot project smartBRIDGE Hamburg
CN111915026A (en) Fault processing method and device, electronic equipment and storage medium
JP2024073353A (en) Comprehensive fault diagnosing method for hydroelectric power generation unit
CN111598467A (en) Reliability evaluation method and system for gathering and transportation combined station and key equipment
CN114004262A (en) Gearbox bearing fault detection method and system
CN117557127A (en) Power grid dispatching system supporting platform reliability assessment method, system and storage medium
CN114996110B (en) Deep inspection optimization method and system based on micro-service architecture
Thalmann et al. Cognitive decision support for industrial product life cycles: A position paper
CN118278914A (en) Method for realizing equipment fault rush repair based on GIS (geographic information System)
CN118192908A (en) Order data processing method and device of cloud printer
CN113094826A (en) Task reliability-based remaining life prediction method for multi-state manufacturing system
CN117333038A (en) Economic trend analysis system based on big data
Wenner et al. smartBRIDGE Hamburg: A digital twin to optimise infrastructure maintenance
CN114897262A (en) Rail transit equipment fault prediction method based on deep learning
CN115600695A (en) Fault diagnosis method of metering equipment
CN114779739A (en) Fault monitoring method for industrial process under cloud edge end cooperation based on probability map model
CN115297016A (en) Deep learning-based power network activity evaluation and prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination