CN115496233A - Markov model-based data middlebox operation fault prediction method - Google Patents
- Publication number
- CN115496233A (application CN202210882372.6A)
- Authority
- CN
- China
- Prior art keywords
- fault
- layer
- data
- maintenance
- model
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/20—Administration of product repair or maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- Marketing (AREA)
- Tourism & Hospitality (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Educational Administration (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
The invention relates to a Markov model-based method for predicting operation faults of a data middle platform, comprising the following steps: Step A, quantifying the operating state; Step B, formulating a fault diagnosis model; and Step C, optimizing the early-warning mode. First, a six-layer quantitative evaluation system of the operating state is proposed, covering layers such as 'service fault phenomenon' and 'component fault phenomenon'. Then, a fault diagnosis model is established on the basis of the state indices and optimized by means of a hidden Markov model, helping data middle platform operation and maintenance (O&M) personnel locate faults rapidly and improving O&M efficiency. Finally, an online fault diagnosis and localization system is designed and implemented. Based on the six-layer operating-state quantitative evaluation system and the active O&M mode of the Tianjin Electric Power data middle platform, on the one hand the troubleshooting target of the platform can be located quickly; on the other hand, the fault diagnosis model and the active O&M model enable active early warning of platform operation risks and rapid fault handling.
Description
Technical Field
The invention belongs to the field of data analysis and relates to technology for predicting operation faults of a data middle platform, in particular to a Markov model-based method for predicting data middle platform operation faults.
Background
The data middle platform is a strategic choice and an organizational form. Guided by an enterprise's specific business model and organizational structure, and supported by tangible products and implementation methods, it establishes a set of mechanisms that continuously turn data into assets and put them at the service of the business. A data middle platform generally provides four capabilities: data acquisition and integration, data cleansing and processing, data service visualization, and data value realization.
Current O&M work on the data middle platform faces a series of difficulties, mainly the following: the system architecture is complex, so routine inspection is cumbersome and labor-intensive, and system risks are hidden and hard to detect; and compared with traditional, mature software services, fault diagnosis of the data middle platform is difficult and depends heavily on the experience of O&M personnel.
Because the data middle platform differs greatly from a traditional B/S-architecture information system in both system architecture and physical deployment, O&M personnel must be familiar with traditional O&M content such as the host level, middleware and application systems of the data middle platform, and must also learn how to inspect and use its many new components. At present, troubleshooting of data middle platform operation faults relies mainly on experienced O&M personnel, and an effective automated method is lacking.
To address these pain points in data middle platform O&M, the invention provides a Markov model-based method for predicting data middle platform operation faults. The hidden Markov model is a doubly stochastic process. The first stochastic process is a Markov chain that describes transitions between different states and is generally described by a state transition probability matrix; the second, the observation process, describes the relationship between the state sequence and the observation sequence and is described by an observation probability matrix. By predicting data middle platform operation faults with this model, the method can both diagnose faults of the data middle platform rapidly and provide active early warning of the system state.
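The two matrices can be made concrete with the standard forward algorithm, which computes the probability of an observation sequence under a hidden Markov model. This is a minimal sketch rather than the patent's implementation: the two hidden states ('healthy', 'degraded'), the two observation symbols ('index normal', 'index abnormal') and all numbers are hypothetical.

```python
from typing import List, Sequence


def hmm_forward(
    start: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
    observations: Sequence[int],
) -> float:
    """Return P(observations) under a hidden Markov model (forward algorithm).

    start[s]    : probability that the hidden chain starts in state s
    trans[r][s] : state transition probability matrix (first stochastic process)
    emit[s][o]  : observation probability matrix (second stochastic process)
    """
    if not observations:
        return 1.0  # the empty sequence is observed with certainty
    n_states = len(start)
    # alpha[s] = P(o_1..o_t, hidden state at time t = s)
    alpha: List[float] = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical example: hidden states 0 = healthy, 1 = degraded;
# observations 0 = index normal, 1 = index abnormal.
START = [0.9, 0.1]
TRANS = [[0.95, 0.05], [0.2, 0.8]]
EMIT = [[0.9, 0.1], [0.3, 0.7]]
print(hmm_forward(START, TRANS, EMIT, [0, 1, 1]))
```

Because both matrices are row-stochastic, the probabilities of all observation sequences of a given length sum to 1, which is a quick sanity check on hand-entered matrices.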
Disclosure of Invention
The invention provides a Markov model-based method for predicting data middle platform operation faults that can both diagnose faults of the data middle platform rapidly and provide active early warning of the system state.
The technical problem addressed by the invention is solved by the following technical scheme:
A Markov model-based data middle platform operation fault prediction method comprises the following steps:
Step A: quantifying the operating state
Combining two aspects, namely the service requirements of power grid marketing, equipment, human resources, operation and inspection and related business, and the application component architecture of the data middle platform, the various faults, component states and fault influence ranges of the data middle platform are sorted out and their basic data collected. A mathematical model is used to layer and classify the data and to sort out their relationships, and a quantitative evaluation system for data middle platform operation is constructed, so that quantitative data objectively reveal potential system risks and measure the health state of the platform. This quantitative evaluation system comprises six layers of data middle platform operating state: 'service fault phenomenon', 'component fault phenomenon', 'key operation index', 'key operation state', 'operation and maintenance object' and 'influence range';
Step B: formulating the fault diagnosis model
To locate the various faults of the data middle platform rapidly, a probabilistic mathematical model for fault diagnosis is constructed from the association relationships between the evaluation-system data and the components, and probability verification and optimization are performed on the basis of a Markov chain to improve diagnostic accuracy. After the data middle platform fails, the fault data are input and the order in which components should be checked can be screened quickly, speeding up the recovery of data services. The probabilities of the associations between layers are defined from the association relationships in the six-layer state evaluation system model and from the O&M data accumulated on the power data middle platform. Therefore, after a first-layer service fault occurs, the troubleshooting order of the fifth-layer O&M objects that may have caused it can be screened rapidly according to the direction and probability distribution of the associations between layers, and early warning can be issued at the same time for the corresponding sixth-layer influence range. The model algorithm is as follows:
In the formula, the terms denote, respectively:
the probability that service fault $x_i$ occurs because of an operation problem of O&M object $y_j$;
the probability that service fault $x_i$ occurs because of component fault $c_n$;
the probability that state $S_m$ becomes abnormal because of an operation problem of O&M object $y_j$.
$X$ denotes service faults, $C$ component faults, $T$ key operation indices (Target), $S$ key operation states (Status) and $Y$ O&M objects. The numbers of service faults, component faults, key operation indices, key operation states and O&M objects are $m$, $N$, $K$, $L$ and $r$, respectively, with $i = 1, \dots, m$; $n = 1, \dots, N$; $k = 1, \dots, K$; $l = 1, \dots, L$; $j = 1, \dots, r$. In the model, the association probability of each node is accumulated from O&M data, and to improve the accuracy of the model a Markov model is introduced to optimize the random state transition probabilities;
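The text does not reproduce the formula itself, so the following is only a hypothetical sketch of how a troubleshooting order for an observed service fault $x_i$ could be screened. It assumes, beyond the source, that each association probability is read as a conditional probability P(x_i | y_j) and that a prior probability of each O&M object having an operation problem is available from accumulated O&M data; the O&M objects are then ranked by the Bayes posterior P(y_j | x_i). The function and object names are illustrative.

```python
from typing import Dict, List, Tuple


def troubleshooting_order(
    fault: str,
    p_fault_given_object: Dict[Tuple[str, str], float],
    prior: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank O&M objects y_j by posterior P(y_j | x_i) for an observed service fault x_i.

    p_fault_given_object[(x_i, y_j)] : assumed P(x_i occurs | y_j has an operation problem)
    prior[y_j]                       : assumed P(y_j has an operation problem), from O&M history
    """
    # Unnormalized posterior: likelihood times prior (Bayes' rule).
    scores = {obj: p_fault_given_object.get((fault, obj), 0.0) * p for obj, p in prior.items()}
    total = sum(scores.values())
    if total == 0.0:
        return []  # no O&M object is associated with this fault
    ranked = [(obj, s / total) for obj, s in scores.items()]
    # Check the most probable root cause first.
    ranked.sort(key=lambda kv: kv[1], reverse=True)
    return ranked


# Hypothetical association data
P_X_GIVEN_Y = {
    ("report_timeout", "hdfs_datanode"): 0.6,
    ("report_timeout", "kafka_broker"): 0.3,
    ("report_timeout", "db_host"): 0.1,
}
PRIOR = {"hdfs_datanode": 0.2, "kafka_broker": 0.5, "db_host": 0.3}
for obj, p in troubleshooting_order("report_timeout", P_X_GIVEN_Y, PRIOR):
    print(f"{obj}: {p:.2f}")
```

Ranking by the posterior rather than by P(x_i | y_j) alone lets an object that fails often rank ahead of one that is rarely faulty but strongly linked to the symptom.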
Step C: optimizing the early-warning mode
On the basis of the data middle platform evaluation system and the fault diagnosis model, the single-threshold early-warning O&M mode is abandoned: abnormal system states are judged dynamically by an algorithm, and alarm thresholds are set with associated dynamic thresholds instead of simple fixed thresholds, reducing the operation risk of the data middle platform. On the basis of the hidden Markov data middle platform fault diagnosis model, an active O&M mode for the data middle platform is proposed, built from three aspects: active early warning, active learning and active feedback.
Further, the first layer, 'service fault phenomenon', defines the various service requirements and service incidents of the power grid; the state of this layer is the one most directly experienced by data middle platform users and is the starting point of data middle platform operation and O&M. The second layer, 'component fault phenomenon', defines the common faults of each component of the power data middle platform; faults at this layer can in turn cause first-layer service faults, and this layer is the starting point of platform O&M for data middle platform O&M personnel. The third layer, 'key component index', defines the key operation index parameters of each component of the power data middle platform; abnormal indices at this layer can cause second-layer component faults, and this layer is the key layer for routine inspection and fault diagnosis. The fourth layer, 'key component state value', defines the operation parameters of each underlying component of the power data middle platform; the state information of this layer affects the third-layer component indices, and abnormal states at this layer are the root cause of the various operation faults in the layers above. The fifth layer, 'operation and maintenance object', defines the underlying O&M subjects of each component of the power data middle platform, which form the foundation on which the data middle platform is built. The sixth layer, 'influence range', has the same meaning as the first-layer service faults, but refers to service faults that may be caused after a fifth-layer O&M object becomes abnormal; it is the service scope in which active O&M intervenes in advance.
Further, the power diagnosis model is optimized in two ways. First, the basic probabilities of the associations in the original model are iterated according to model prediction, verification and feedback. Second, the diagnosis model is verified and its probabilities optimized on the basis of a hidden Markov chain. When the Markov chain is used to predict faults of the power data middle platform, a fault path in the six-layer state evaluation system must be selected, and the fifth-layer O&M object in the path, i.e. the actual root-cause subject of the fault, is defined as fault i. Any path n is then taken as the research object, and the probability P of an initial fault on path n is calculated. After a series of conditional judgments, the composite state transition probability of path n is calculated and substituted into the Markov chain prediction model to obtain the fault probability of the path. By calculation and comparison, the path with the largest fault probability is selected as the next-level fault path of the current path. By repeating this prediction process to keep selecting the path with the largest probability, the probability of a fault on each path is comprehensively analyzed and evaluated.
In addition, for active early warning: the fault diagnosis model performs passive, top-down localization based on the six-layer state evaluation system, whereas in the active O&M mode the data middle platform is inspected bottom-up based on the same system. When an abnormality is found in a 'key operation state' or 'key operation index' associated with an O&M object, early-warning intervention is carried out in advance, preventing component faults of the data middle platform and the service faults they would cause, and thereby ensuring the service quality of the data middle platform.
For active learning: based on the fault model's localization, once troubleshooting is complete, the data samples, influence range, handling method and other information related to the fault are stored in a fault knowledge base and labeled. When the running indices of the data middle platform become similar to the features of labeled historical indices, the system can directly match the historical fault, raise an alarm, and match the corresponding handling method to guide the resolution of and recovery from the fault. Through this active learning mode, O&M experience is passed across the whole O&M team, ensuring unified O&M standards.
For active feedback: an online fault diagnosis tool, the power data middle platform O&M map, is designed and implemented to assist manual troubleshooting; a knowledge base of fault data features is established; the O&M efficiency of the data middle platform O&M system is optimized; and the O&M quality of O&M personnel is improved.
The advantages and positive effects of the invention are as follows:
In the Markov model-based method for predicting data middle platform operation faults, a six-layer quantitative evaluation system of the operating state is first proposed, constructing the six data middle platform operating-state layers 'service fault phenomenon', 'component fault phenomenon', 'key operation index', 'key operation state', 'operation and maintenance object' and 'influence range'. Then, a fault diagnosis model is established on the basis of the state indices and optimized by means of a hidden Markov model, helping data middle platform O&M personnel locate faults rapidly and improving O&M efficiency. Finally, an online fault diagnosis and localization system is designed and implemented.
The hidden Markov model-based data middle platform fault diagnosis system provided by the invention builds on the six-layer operating-state quantitative evaluation system and the active O&M mode of the Tianjin Electric Power data middle platform, and has worked well in the O&M of the Tianjin Electric Power public data middle platform. On the one hand, the troubleshooting target on the Tianjin Electric Power data middle platform can be located quickly; on the other hand, the fault diagnosis model and the active O&M model enable active early warning of platform operation risks and rapid fault handling.
Drawings
FIG. 1 is a flow chart of the Markov model-based data middle platform operation fault prediction method of the present invention;
FIG. 2 is a diagram of the six-layer operating-state quantitative evaluation system of the present invention;
FIG. 3 is a diagram of the Markov model relationships of the present invention.
Detailed Description
The present invention is further described by the following specific embodiments, which are illustrative rather than limiting and do not limit the scope of the invention.
The invention provides a Markov model-based data middle platform operation fault prediction method, comprising the following steps:
Step A: quantifying the operating state
Combining the service requirements of power grid marketing, equipment, human resources, operation and inspection and other business with the application component architecture of the data middle platform, the various faults, component states and fault influence ranges of the data middle platform are sorted out and their basic data collected. A mathematical model is used to layer and classify the data and to sort out their relationships, and a quantitative evaluation system for data middle platform operation is constructed, so that quantitative data objectively reveal potential system risks and measure the health state of the platform.
Combining the actual digital grid application services with the technical architecture of the Tianjin Electric Power data middle platform, a six-layer quantitative evaluation system of the data middle platform operating state is constructed, comprising service fault phenomenon, component fault phenomenon, key operation index, key operation state, operation and maintenance object, and influence range, as shown in FIG. 2.
First layer: 'service fault phenomenon' defines the various service requirements and service incidents of the power grid. The state of this layer is the one most directly experienced by data middle platform users and is the starting point of data middle platform operation and O&M.
Second layer: 'component fault phenomenon' defines the common faults of each component of the power data middle platform. Faults at this layer can in turn cause first-layer service faults, and this layer is the starting point of platform O&M for data middle platform O&M personnel.
Third layer: 'key component index' defines the key operation index parameters of each component of the Tianjin Electric Power data middle platform. Abnormal indices at this layer can cause second-layer component faults, and this layer is the key layer for routine inspection and fault diagnosis.
Fourth layer: 'key component state value' defines the operation parameters of each underlying component of the power data middle platform. The state information of this layer affects the third-layer component indices, and abnormal states at this layer are the root cause of the various operation faults in the layers above.
Fifth layer: 'operation and maintenance object' defines the underlying O&M subjects of each component of the power data middle platform, which form the foundation on which the data middle platform is built.
Sixth layer: 'influence range' has the same meaning as the first-layer service faults, but refers to service faults that have not yet occurred and may be caused after a fifth-layer O&M object becomes abnormal. This layer is the service scope in which active O&M intervenes in advance.
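The six layers and their cause-and-effect directions can be captured in a small data structure. This sketch assumes, from the layer descriptions above, that a fifth-layer O&M object affects fourth-layer states and may cause sixth-layer influence-range faults, fourth-layer states affect third-layer indices, third-layer indices cause second-layer component faults, and component faults cause first-layer service faults. The class, constant and node names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Tuple


class Layer(IntEnum):
    SERVICE_FAULT = 1        # service fault phenomenon
    COMPONENT_FAULT = 2      # component fault phenomenon
    KEY_OPERATION_INDEX = 3  # key operation index
    KEY_OPERATION_STATE = 4  # key operation state
    OM_OBJECT = 5            # operation and maintenance object
    INFLUENCE_RANGE = 6      # influence range (services at risk)


# Assumed cause -> effect directions between layers.
ALLOWED: Dict[Layer, Tuple[Layer, ...]] = {
    Layer.OM_OBJECT: (Layer.KEY_OPERATION_STATE, Layer.INFLUENCE_RANGE),
    Layer.KEY_OPERATION_STATE: (Layer.KEY_OPERATION_INDEX,),
    Layer.KEY_OPERATION_INDEX: (Layer.COMPONENT_FAULT,),
    Layer.COMPONENT_FAULT: (Layer.SERVICE_FAULT,),
}


@dataclass
class SixLayerModel:
    layer_of: Dict[str, Layer] = field(default_factory=dict)
    # prob[(cause, effect)] = association probability accumulated from O&M data
    prob: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def add_node(self, name: str, layer: Layer) -> None:
        self.layer_of[name] = layer

    def add_edge(self, cause: str, effect: str, p: float) -> None:
        """Record an association; reject directions that break the layer order."""
        if self.layer_of[effect] not in ALLOWED.get(self.layer_of[cause], ()):
            raise ValueError(f"{cause} -> {effect} does not follow the layer order")
        if not 0.0 <= p <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        self.prob[(cause, effect)] = p

    def effects(self, cause: str) -> List[Tuple[str, float]]:
        """Directly affected nodes (the direction followed by a bottom-up patrol)."""
        return [(e, p) for (c, e), p in self.prob.items() if c == cause]

    def causes(self, effect: str) -> List[Tuple[str, float]]:
        """Direct causes (the direction followed by top-down diagnosis)."""
        return [(c, p) for (c, e), p in self.prob.items() if e == effect]


model = SixLayerModel()
model.add_node("hdfs_datanode", Layer.OM_OBJECT)
model.add_node("disk_usage", Layer.KEY_OPERATION_STATE)
model.add_node("report_query", Layer.INFLUENCE_RANGE)
model.add_edge("hdfs_datanode", "disk_usage", 0.6)
model.add_edge("hdfs_datanode", "report_query", 0.4)
print(model.effects("hdfs_datanode"))
```

Validating the layer order when an edge is added keeps accumulated O&M data from introducing associations that run against the six-layer hierarchy.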
Step B: formulating the fault diagnosis model
To locate the various faults of the data middle platform rapidly, a probabilistic mathematical model for fault diagnosis is constructed from the association relationships between the evaluation-system data and the components, and probability verification and optimization are performed on the basis of a Markov chain to improve diagnostic accuracy. After the data middle platform fails, inputting the fault data allows the order in which components should be checked to be screened quickly, speeding up the recovery of data services.
The probabilities of the associations between layers are defined from the association relationships in the six-layer state evaluation system model and from the O&M data accumulated on the power data middle platform. Therefore, after a first-layer service fault occurs, the troubleshooting order of the fifth-layer O&M objects that may have caused it can be screened rapidly according to the association directions and probability distribution of each layer, and early warning can be issued at the same time for the corresponding sixth-layer influence range. The model algorithm is summarized as follows:
In the formula, the terms denote, respectively:
the probability that service fault $x_i$ occurs because of an operation problem of O&M object $y_j$;
the probability that service fault $x_i$ occurs because of component fault $c_n$;
the probability that state $S_m$ becomes abnormal because of an operation problem of O&M object $y_j$.
$X$ denotes service faults, $C$ component faults, $T$ indices (Target), $S$ states (Status) and $Y$ O&M objects. The numbers of service faults, component faults, key operation indices, key operation states and O&M objects are $m$, $N$, $K$, $L$ and $r$, respectively, with $i = 1, \dots, m$; $n = 1, \dots, N$; $k = 1, \dots, K$; $l = 1, \dots, L$; $j = 1, \dots, r$.
In the model, the association probability of each node is accumulated from O&M data; to improve the accuracy of the model, a Markov model is introduced to optimize the random state transition probabilities. The principle of the Markov model is illustrated schematically in FIG. 3.
The power diagnosis model is optimized in two ways: first, the basic probabilities of the associations in the original model are iterated according to model prediction and verification feedback; second, the diagnosis model is verified and its probabilities optimized on the basis of a hidden Markov chain.
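For the first optimization aspect, iterating a basic association probability with verification feedback, one common choice (assumed here; the patent does not name a specific update rule) is a Beta-Bernoulli estimate: the original model probability seeds the prior, and each verified troubleshooting outcome updates it. The class name and the `strength` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeEstimate:
    """Beta-Bernoulli estimate of one association probability, e.g. P(x_i | y_j).

    alpha counts (pseudo-)observations where the association was confirmed,
    beta counts those where it was not.
    """

    alpha: float = 1.0
    beta: float = 1.0

    @classmethod
    def from_prior(cls, p: float, strength: float = 10.0) -> "EdgeEstimate":
        """Seed with the original model probability p, worth `strength` pseudo-observations."""
        if not 0.0 <= p <= 1.0 or strength <= 0.0:
            raise ValueError("need 0 <= p <= 1 and strength > 0")
        return cls(alpha=p * strength, beta=(1.0 - p) * strength)

    def update(self, confirmed: bool) -> None:
        """Feed back one verified troubleshooting outcome."""
        if confirmed:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        """Current estimate of the association probability (posterior mean)."""
        return self.alpha / (self.alpha + self.beta)


edge = EdgeEstimate.from_prior(0.3, strength=10.0)
for outcome in (True, True, False):  # hypothetical verification results
    edge.update(outcome)
print(round(edge.mean, 3))
```

A larger `strength` makes the original model probability harder to move, which suits associations backed by long O&M history; a smaller one lets recent feedback dominate.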
When the Markov model is used to predict faults of the power data middle platform, a fault path in the six-layer state evaluation system is first selected, and the fifth-layer O&M object in the path, i.e. the actual root-cause subject of the fault, is defined as fault i. Any path n is then taken as the research object, and the probability P of an initial fault on path n is calculated. After a series of conditional judgments, the composite state transition probability of path n is calculated. This result is then substituted into the Markov chain prediction model to calculate the fault probability of the path. By calculation and comparison, the path with the largest fault probability is selected as the next-level fault path of the current path. The prediction process is repeated to keep selecting the path with the largest probability, and in this way the probability of a fault on each path is comprehensively analyzed and evaluated.
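The repeated selection of the most probable next-level path can be sketched as a greedy walk over transition probabilities. This is an assumption-laden illustration: the patent does not specify the 'series of conditional judgments', so here the composite probability of a path is simply the initial fault probability multiplied by the transition probabilities along it, and all node names are hypothetical.

```python
from typing import Dict, List, Tuple

# trans[a][b]: probability of moving from node a to node b at the next level
Transitions = Dict[str, Dict[str, float]]


def greedy_fault_path(
    start: str, p_start: float, trans: Transitions, max_steps: int = 10
) -> Tuple[List[str], float]:
    """Repeatedly choose the most probable next-level node, starting from `start`.

    Returns the node sequence and its composite probability, taken here as
    p_start multiplied by every transition probability along the path.
    max_steps bounds the walk in case the transition table contains a cycle.
    """
    path = [start]
    prob = p_start
    node = start
    for _ in range(max_steps):
        candidates = trans.get(node)
        if not candidates:
            break  # no further level to move to
        node, p = max(candidates.items(), key=lambda kv: kv[1])
        path.append(node)
        prob *= p
    return path, prob


# Hypothetical chain: O&M object -> state -> index -> component fault -> service fault
TRANS: Transitions = {
    "y1": {"s1": 0.7, "s2": 0.3},
    "s1": {"t1": 0.6, "t2": 0.4},
    "t1": {"c1": 0.9},
    "c1": {"x1": 0.8},
}
print(greedy_fault_path("y1", 0.5, TRANS))
```

A greedy walk picks the locally most probable step and does not guarantee the globally most probable path; a Viterbi-style dynamic program would, at somewhat higher cost.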
Step C: early warning mode optimization
On the basis of the data middle platform evaluation system and the fault diagnosis model, the single-threshold early-warning O&M mode is abandoned: abnormal system states are judged dynamically by an algorithm, and alarm thresholds are set with associated dynamic thresholds instead of simple fixed thresholds, reducing the operation risk of the data middle platform.
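The patent does not state which dynamic-threshold algorithm is used. As one hypothetical example of replacing a fixed threshold, the sketch below raises an alarm when a key operation index exceeds the mean plus k standard deviations of its own recent history; the function name, window size and k are illustrative.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List


def dynamic_threshold_alarms(values: Iterable[float], window: int = 20, k: float = 3.0) -> List[int]:
    """Return indices whose value exceeds mean + k * std of the preceding `window` samples.

    No alarm is possible until `window` samples of history have been collected.
    With a perfectly flat history (std = 0), any increase above the mean alarms.
    """
    history: Deque[float] = deque(maxlen=window)
    alarms: List[int] = []
    for i, v in enumerate(values):
        if len(history) == window:
            threshold = mean(history) + k * pstdev(history)
            if v > threshold:
                alarms.append(i)
        history.append(v)  # the oldest sample drops out automatically
    return alarms


SAMPLES = [10.0, 11.0, 10.0, 11.0, 10.0, 11.0, 30.0]  # hypothetical index readings
print(dynamic_threshold_alarms(SAMPLES, window=6, k=3.0))
```

Because an alarming value is appended to the history, a sustained shift gradually raises the threshold; a production version might exclude confirmed anomalies from the window.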
On the basis of the hidden Markov data middle platform fault diagnosis model, an active O&M mode for the data middle platform is proposed, built from three aspects: active early warning, active learning and active feedback.
For active early warning: the fault diagnosis model performs passive, top-down localization based on the six-layer state evaluation system, whereas in the active O&M mode the data middle platform is inspected bottom-up based on the same system. When an abnormality is found in a 'key operation state' or 'key operation index' associated with an O&M object, early-warning intervention is carried out in advance, preventing component faults of the data middle platform and the service faults they would cause, and thereby ensuring the service quality of the data middle platform.
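The bottom-up patrol can be sketched as follows, assuming (hypothetically) that each abnormal fourth-layer key operation state is mapped to the fifth-layer O&M object it belongs to, and that association probabilities from that object to sixth-layer services are available. Services whose probability reaches a cut-off are reported as early warnings before any service fault occurs. All names and the 0.3 cut-off are illustrative.

```python
from typing import Dict, List, Set, Tuple


def proactive_warnings(
    abnormal_states: Set[str],
    state_owner: Dict[str, str],
    influence: Dict[str, Dict[str, float]],
    min_prob: float = 0.3,
) -> List[Tuple[str, str, float]]:
    """Bottom-up patrol: abnormal layer-4 states -> layer-5 O&M objects -> layer-6 services at risk.

    state_owner[s]  : O&M object that key operation state s belongs to
    influence[y][x] : assumed P(service x is affected | O&M object y is abnormal)
    Returns (om_object, service, probability) tuples, highest probability first.
    """
    objects = {state_owner[s] for s in abnormal_states if s in state_owner}
    warnings = [
        (obj, svc, p)
        for obj in objects
        for svc, p in influence.get(obj, {}).items()
        if p >= min_prob  # only warn about reasonably likely service impacts
    ]
    return sorted(warnings, key=lambda w: w[2], reverse=True)


STATE_OWNER = {"disk_usage_high": "hdfs_datanode", "lag_high": "kafka_broker"}
INFLUENCE = {
    "hdfs_datanode": {"report_query": 0.7, "billing_sync": 0.2},
    "kafka_broker": {"realtime_feed": 0.9},
}
print(proactive_warnings({"disk_usage_high"}, STATE_OWNER, INFLUENCE))
```

The same association tables serve both directions: diagnosis walks them from service faults down to O&M objects, while this patrol walks from O&M objects out to the influence range.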
For active learning: based on the fault model's localization, once troubleshooting is complete, the data samples, influence range, handling method and other information related to the fault are stored in a fault knowledge base and then labeled. When the running indices of the data middle platform become similar to labeled historical indices, the system can directly match the historical fault, raise an alarm, and match the corresponding handling method to guide the resolution of and recovery from the fault. Through this active learning mode, O&M experience is passed across the whole O&M team, ensuring unified O&M standards.
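Matching current running indices against labeled historical faults could be done in many ways; the sketch below assumes cosine similarity between feature vectors and a fixed similarity cut-off, neither of which the patent specifies. The record fields mirror the stored items named above (fault label, data sample features, handling method); the class, function and fault names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FaultRecord:
    label: str                  # labeled historical fault
    features: Sequence[float]   # index features captured when the fault occurred
    remedy: str                 # handling method that resolved it


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_history(current: Sequence[float], kb: List[FaultRecord], min_sim: float = 0.95) -> Optional[FaultRecord]:
    """Return the most similar labeled fault if it is similar enough, else None."""
    best = max(kb, key=lambda r: cosine(current, r.features), default=None)
    if best is not None and cosine(current, best.features) >= min_sim:
        return best
    return None


KB = [
    FaultRecord("disk_full", [0.9, 0.1, 0.0], "clean up logs and expand volume"),
    FaultRecord("gc_pause", [0.1, 0.9, 0.2], "tune JVM heap settings"),
]
hit = match_history([0.88, 0.12, 0.01], KB)
print(hit.label if hit else "no match", "->", hit.remedy if hit else "")
```

Cosine similarity ignores overall magnitude, so indices on very different scales (for example latency in milliseconds and CPU usage in percent) should be standardized before matching.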
In terms of active feedback: an online fault diagnosis tool, the 'power data middle platform operation and maintenance map', is designed and implemented. It assists manual troubleshooting and builds a knowledge base of fault data characteristics, optimizing the operation and maintenance efficiency of the data middle platform and improving the work quality of operation and maintenance personnel.
The invention relates to a Markov-model-based method for predicting operation faults of a data middle platform. First, a quantitative evaluation system of the data middle platform's operating state is constructed with six layers: 'service fault phenomenon', 'component fault phenomenon', 'key operation index', 'key operation state', 'operation and maintenance object' and 'influence range'. Next, a fault diagnosis model is established from these state indices and optimized with a hidden Markov model. This helps operation and maintenance personnel of the data middle platform locate faults quickly and improves operation and maintenance efficiency. Finally, an online fault diagnosis and positioning system is designed and implemented.
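The six layers and the causal direction between them, as described for the evaluation system, can be written down as a small data structure. This is an illustrative sketch. The enum, constant and function names are hypothetical.

```python
from enum import IntEnum


class Layer(IntEnum):
    SERVICE_FAULT = 1    # service fault phenomenon
    COMPONENT_FAULT = 2  # component fault phenomenon
    KEY_INDEX = 3        # key operation index
    KEY_STATE = 4        # key operation state
    OM_OBJECT = 5        # operation and maintenance object
    INFLUENCE_RANGE = 6  # influence range (service faults not yet occurred)


# Causal direction read from the layer descriptions: an anomaly in a lower
# layer can produce faults in the layer above it; layer 5 also maps to layer 6.
CAUSES = {
    Layer.OM_OBJECT: [Layer.KEY_STATE, Layer.INFLUENCE_RANGE],
    Layer.KEY_STATE: [Layer.KEY_INDEX],
    Layer.KEY_INDEX: [Layer.COMPONENT_FAULT],
    Layer.COMPONENT_FAULT: [Layer.SERVICE_FAULT],
}

# Passive diagnosis walks top-down from the observed service fault to the
# O&M object; the active patrol walks the same chain bottom-up.
DIAGNOSIS_PATH = [Layer.SERVICE_FAULT, Layer.COMPONENT_FAULT, Layer.KEY_INDEX,
                  Layer.KEY_STATE, Layer.OM_OBJECT]
PATROL_PATH = list(reversed(DIAGNOSIS_PATH))


def downstream(layer):
    """All layers an anomaly in `layer` can propagate to, following CAUSES."""
    seen, stack = [], [layer]
    while stack:
        for nxt in CAUSES.get(stack.pop(), []):
            if nxt not in seen:
                seen.append(nxt)
                stack.append(nxt)
    return seen
```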
The Markov-model-based fault diagnosis system for the data middle platform is built on two foundations: the six-layer operating-state quantitative evaluation system and the active operation and maintenance mode of the Tianjin Electric Power data middle platform. It has worked well in the operation and maintenance of the Tianjin Electric Power public data middle platform. On the one hand, it can quickly locate the subject to be troubleshot in the Tianjin Electric Power data middle platform. On the other hand, the fault diagnosis model and the active operation and maintenance mode enable active early warning of operational risks and rapid fault handling for the Tianjin Electric Power data middle platform.
Although embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various substitutions, alterations and modifications are possible without departing from the spirit and scope of this disclosure and the appended claims. Accordingly, the scope of this disclosure is not limited to the disclosed embodiments.
Claims (6)
1. A Markov-model-based data middle platform operation fault prediction method, characterized by comprising the following steps:
Step A: quantifying operating conditions
Two aspects are combined: the service requirements related to power grid marketing, equipment, human resources, and operation and inspection; and the application component architecture of the data middle platform. On this basis, the various faults, component states and fault influence ranges of the data middle platform are sorted out and basic data is collected. Data layering, classification and relationship sorting are completed with a mathematical model. A quantitative evaluation system for data middle platform operation is then constructed, which uses quantitative data to objectively reveal potential risks of the system and to measure the health state of the data middle platform. The quantitative evaluation system comprises six layers of operating state: 'service fault phenomenon', 'component fault phenomenon', 'key operation index', 'key operation state', 'operation and maintenance object' and 'influence range';
Step B: fault diagnosis model formulation
To locate the various faults of the data middle platform rapidly, a probabilistic mathematical model for fault diagnosis is constructed from the association relationships between the evaluation-system data and the components. Its probabilities are verified and optimized with a Markov chain to improve diagnostic accuracy. After the data middle platform fails, the fault data is input and the order in which components should be troubleshot can be screened rapidly, speeding up the recovery of data services. The probability of each inter-layer association is defined from two sources: the association relationships in the six-layer state evaluation system model, and the operation and maintenance data accumulated on the power data middle platform. Thus, after a first-layer service fault occurs, the troubleshooting order of the fifth-layer operation and maintenance objects that may have caused it can be screened rapidly according to the direction and probability distribution of the inter-layer associations. At the same time, an early warning can be issued for the corresponding sixth-layer influence range. The model algorithm is as follows:
[Formula (I): not reproduced in the source text]
where:
$P(x_i \mid y_j)$ denotes the probability that an operational problem of operation and maintenance object $y_j$ causes service fault $x_i$;
$P(S_l \mid y_j)$ denotes the probability that an operational problem of operation and maintenance object $y_j$ causes an anomaly in state $S_l$;
$x$ denotes a service fault, $C$ a component fault, $T$ a key operation index (target), $S$ a key operation state (status) and $y$ an operation and maintenance object. The numbers of service faults, component faults, key operation indices, key operation states and operation and maintenance objects are $m$, $N$, $K$, $L$ and $r$ respectively, so $i = 1, \dots, m$; $n = 1, \dots, N$; $k = 1, \dots, K$; $l = 1, \dots, L$; $j = 1, \dots, r$. In the model, the association probability of each node is obtained by accumulating operation data. A Markov model is introduced to optimize the random state-transition probabilities and thereby improve the model's accuracy;
Step C: early warning mode optimization
Based on the data middle platform evaluation system and the fault diagnosis model, the single-threshold early warning mode of operation and maintenance is abandoned. The abnormal state of the system is judged dynamically by an algorithm, and the alarm threshold is set with an associated dynamic threshold instead of a simple fixed threshold, reducing the operational risk of the data middle platform. Based on the hidden-Markov-based fault diagnosis model of the data middle platform, an active operation and maintenance mode is provided for the data middle platform, constructed from three aspects: active early warning, active learning and active feedback.
2. The Markov-model-based data middle platform operation fault prediction method according to claim 1, wherein: the first layer, service fault phenomenon, defines the various service requirements and service incidents of the power grid; its state is the most direct experience of data middle platform users and the starting point of data middle platform operation and maintenance. The second layer, component fault phenomenon, defines common faults of each component of the power data middle platform; faults at this layer can cause first-layer service faults, and this layer is where the data middle platform's operation and maintenance personnel begin platform operation and maintenance. The third layer, key operation index, defines the key operating index parameters of each component of the power data middle platform; abnormal indices at this layer can cause second-layer component faults, and this layer is the key layer for daily inspection and fault diagnosis. The fourth layer, key operation state, defines the operating parameters of each underlying component of the power data middle platform; the state information at this layer affects the third-layer component indices, and state anomalies at this layer are the root cause of the various operation faults in the layers above. The fifth layer, operation and maintenance object, defines the underlying operation and maintenance subject of each component of the power data middle platform and is the foundation of the data middle platform. The sixth layer, influence range, has the same meaning as the first (service) layer but covers service faults that have not yet occurred, namely those that may be caused after a fifth-layer operation and maintenance object becomes abnormal; the sixth layer is the service range in which active operation and maintenance intervenes in advance.
3. The Markov-model-based data middle platform operation fault prediction method according to claim 1, wherein: the diagnosis model of the power data middle platform is optimized in two respects. First, the base probabilities of the association relationships in the original model are iterated according to model prediction, verification and feedback. Second, the diagnosis model is verified and its probabilities are optimized with a hidden Markov chain. When the Markov chain is used to predict faults of the power data middle platform, a fault path in the six-layer state evaluation system is selected, and the fifth-layer operation and maintenance object on the path, i.e. the true causal subject of the fault, is defined as fault i. Any path n is then taken as the object of study, and the following steps are performed. The probability P that the initial fault lies on path n is calculated. After a series of conditional judgments, the comprehensive state-transition probability of path n is calculated. This result is substituted into the Markov chain prediction model to obtain the fault probability of the path. By calculation and comparison, the path with the largest fault probability is selected as the next-stage fault path of the current path. This prediction process is repeated, each time selecting the path with the largest probability, so that the fault probability of every path is comprehensively analyzed and evaluated.
4. The Markov-model-based data middle platform operation fault prediction method according to claim 1, wherein: in terms of active early warning, the fault diagnosis model locates faults passively from top to bottom through the six-layer state evaluation system, so the active operation and maintenance mode patrols the data middle platform from bottom to top through the same six layers. Once an anomaly is found in the 'key operation state' and 'key operation index' associated with an operation and maintenance object, early warning intervention is performed in advance. This prevents faults in components of the data middle platform and the service faults they would cause, and thereby ensures the service quality of the data middle platform.
5. The Markov-model-based data middle platform operation fault prediction method according to claim 1, wherein: in terms of active learning, after a fault located by the fault model has been troubleshot, the data samples, influence range, handling method and other information related to the fault are stored in a fault knowledge base and labeled. When the operation indices of the data middle platform are similar to the labeled historical index characteristics, the system can directly match the historical fault and raise an alarm. It can also match the corresponding handling method to guide the resolution and recovery of the fault. Through this active learning mode, operation and maintenance experience can be passed across the entire operation and maintenance team, ensuring unified operation and maintenance standards.
6. The Markov-model-based data middle platform operation fault prediction method according to claim 1, wherein: in terms of active feedback, an online fault diagnosis tool, the 'power data middle platform operation and maintenance map', is designed and implemented. It assists manual troubleshooting and builds a knowledge base of fault data characteristics, optimizing the operation and maintenance efficiency of the power data middle platform while improving the work quality of operation and maintenance personnel.
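The formula of step B in claim 1 is not reproduced in the source. The following is therefore only a sketch under stated assumptions, not the patented model:

- Each inter-layer association is treated as a conditional-probability matrix.
- Each layer depends only on the layer directly below it, so the probability that object $y_j$ causes service fault $x_i$ becomes a sum over intermediate nodes, i.e. a matrix product.
- The troubleshooting order is ranked with Bayes' rule, using a prior over operation and maintenance objects.
- The path search of claim 3 is read as a greedy walk over a row-stochastic transition matrix.

All names and toy numbers are hypothetical.

```python
# Assumptions (not from the patent): probability matrices between layers,
# first-order dependence between adjacent layers, Bayes-rule ranking.


def matmul(a, b):
    """Plain-Python matrix product (a is n x t, b is t x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def service_given_object(p_x_c, p_c_t, p_t_s, p_s_y):
    """Matrix of P(x_i | y_j): rows are service faults, columns O&M objects.

    Each argument is P(upper-layer node | lower-layer node), rows indexing the
    upper layer and columns the lower layer (each column sums to 1).
    """
    return matmul(matmul(matmul(p_x_c, p_c_t), p_t_s), p_s_y)


def troubleshooting_order(p_x_y, prior_y, i):
    """Rank O&M objects for observed service fault i by posterior P(y_j | x_i)."""
    joint = [p_x_y[i][j] * prior_y[j] for j in range(len(prior_y))]
    total = sum(joint)
    posterior = [p / total for p in joint]
    order = sorted(range(len(prior_y)), key=lambda j: posterior[j], reverse=True)
    return order, posterior


def greedy_fault_path(start, initial_prob, transition, steps):
    """Repeatedly move to the most probable next state (claim-3 reading).

    transition[a][b] is the probability of moving from state a to state b
    (rows sum to 1). Returns the path and initial_prob times the product of
    the transition probabilities along it.
    """
    path, prob = [start], initial_prob
    for _ in range(steps):
        row = transition[path[-1]]
        nxt = max(range(len(row)), key=row.__getitem__)
        prob *= row[nxt]
        path.append(nxt)
    return path, prob


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    p_s_y = [[0.9, 0.2], [0.1, 0.8]]  # toy P(S_l | y_j)
    p_x_c = [[0.7, 0.1], [0.3, 0.9]]  # toy P(x_i | C_n)
    p_x_y = service_given_object(p_x_c, identity, identity, p_s_y)
    print(troubleshooting_order(p_x_y, [0.5, 0.5], 0)[0])  # [0, 1]
    chain = [[0.1, 0.6, 0.3], [0.2, 0.2, 0.6], [0.5, 0.25, 0.25]]
    print(greedy_fault_path(0, 0.5, chain, 2)[0])  # [0, 1, 2]
```

Greedy selection keeps the sketch close to the claim's "select the path with the largest probability, then repeat". It does not guarantee the globally most probable path over several steps. For that, a Viterbi-style dynamic program would be the usual choice.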
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210882372.6A CN115496233A (en) | 2022-07-26 | 2022-07-26 | Markov model-based data middlebox operation fault prediction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210882372.6A CN115496233A (en) | 2022-07-26 | 2022-07-26 | Markov model-based data middlebox operation fault prediction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115496233A | 2022-12-20 |
Family
ID=84467193
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210882372.6A Pending CN115496233A (en) | 2022-07-26 | 2022-07-26 | Markov model-based data middlebox operation fault prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115496233A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116245357A (en) * | 2023-01-31 | 2023-06-09 | 南京工大金泓能源科技有限公司 | Fault diagnosis method and system for intelligent energy-saving cabinet |
CN116245357B (en) * | 2023-01-31 | 2023-09-22 | 南京工大金泓能源科技有限公司 | Fault diagnosis method and system for intelligent energy-saving cabinet |
CN118503831A (en) * | 2024-07-16 | 2024-08-16 | 南通职业大学 | Building information model management system |
CN118503831B (en) * | 2024-07-16 | 2024-09-20 | 南通职业大学 | Building information model management system |
Similar Documents
Publication | Title |
---|---|
Lv et al. | Safety poka yoke in zero-defect manufacturing based on digital twins | |
CN115496233A (en) | Markov model-based data middlebox operation fault prediction method | |
CN108873859B (en) | Bridge type grab ship unloader fault prediction model method based on improved association rule | |
CN110705710A (en) | Knowledge graph-based industrial fault analysis expert system | |
CN110703057A (en) | Power equipment partial discharge diagnosis method based on data enhancement and neural network | |
CN111124852A (en) | Fault prediction method and system based on BMC health management module | |
CN108920609A (en) | Electric power experiment data mining method based on multi-dimensional analysis | |
CN108398934A (en) | The system that a kind of equipment fault for rail traffic monitors | |
Wenner et al. | The concept of digital twin to revolutionise infrastructure maintenance: The pilot project smartBRIDGE Hamburg | |
CN111915026A (en) | Fault processing method and device, electronic equipment and storage medium | |
JP2024073353A (en) | Comprehensive fault diagnosing method for hydroelectric power generation unit | |
CN111598467A (en) | Reliability evaluation method and system for gathering and transportation combined station and key equipment | |
CN114004262A (en) | Gearbox bearing fault detection method and system | |
CN117557127A (en) | Power grid dispatching system supporting platform reliability assessment method, system and storage medium | |
CN114996110B (en) | Deep inspection optimization method and system based on micro-service architecture | |
Thalmann et al. | Cognitive decision support for industrial product life cycles: A position paper | |
CN118278914A (en) | Method for realizing equipment fault rush repair based on GIS (geographic information System) | |
CN118192908A (en) | Order data processing method and device of cloud printer | |
CN113094826A (en) | Task reliability-based remaining life prediction method for multi-state manufacturing system | |
CN117333038A (en) | Economic trend analysis system based on big data | |
Wenner et al. | smartBRIDGE Hamburg: A digital twin to optimise infrastructure maintenance | |
CN114897262A (en) | Rail transit equipment fault prediction method based on deep learning | |
CN115600695A (en) | Fault diagnosis method of metering equipment | |
CN114779739A (en) | Fault monitoring method for industrial process under cloud edge end cooperation based on probability map model | |
CN115297016A (en) | Deep learning-based power network activity evaluation and prediction method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |