Power communication equipment state prediction method based on improved decision tree
Technical Field
The invention relates to the technical field of state analysis of power equipment, in particular to a prediction method of a power communication equipment state based on an improved decision tree.
Background
With the continuous expansion of the smart grid in China, the power communication network that carries operation management and production scheduling services has grown rapidly in both the number of communication devices and network coverage, and the role of power communication as the basic support for the comprehensive services of the power grid has become increasingly prominent. Further improving the operation and maintenance level and the quality assurance of the power communication network is therefore very important for the safe operation of both the power communication network and the smart grid. At present, operation and maintenance of the power communication network mainly consists of handling faults after the fact, based on real-time alarm information from communication equipment. This passive, reactive mode falls far short of the requirements for online, intelligent development of the communication network and can hardly support the rapid development of the smart grid. To improve the efficiency of communication operation and maintenance and to achieve lean management of the power communication network, massive historical and real-time data, such as historical defect and overhaul records and current performance and state values, need to be integrated by means of information technology. Data mining can then be used to predict and analyze the service life of communication equipment based on its operating state, providing the communication network with a means of proactive maintenance and relieving the problems caused by a shortage of operation and maintenance personnel, expanding network coverage, and a growing number of devices.
Predictive analysis of equipment operating states using data mining techniques has become a development trend in network operation and maintenance research. The decision tree learning algorithm offers high classification speed and simple implementation, and has become one of the most widely used state prediction algorithms. In practical application scenarios, however, the classical decision tree learning algorithm has disadvantages such as an inherent multi-value bias and low computational efficiency, and needs to be further improved to suit the practical requirements of the power communication network.
The classical decision tree learning algorithm is one of the most commonly used algorithms in data mining and is generally used to classify and predict unknown data. Since the 1960s, decision tree learning has been widely applied in rule extraction, data classification, predictive analysis, and other fields. In particular, after J. R. Quinlan introduced the concept of entropy from Shannon's information theory and proposed the ID3 (Iterative Dichotomiser 3) algorithm, its concise and efficient decision selection process led decision tree learning to be applied and developed in many emerging application fields.
The ID3 decision tree algorithm does not revisit test attributes that have already been selected. It builds the entire decision tree by a top-down greedy search with a depth-first strategy over all test attributes. The core idea is that, at each node of the decision tree, the largest reduction in information entropy is used as the criterion for choosing the test attribute of the current node; that is, the test attribute that has not yet been used for splitting and has the highest information gain is selected as the splitting attribute. The search continues until a decision tree that perfectly classifies the training samples is obtained. The main algorithm is as follows:
Let $S$ be a sample data set that can be divided into $n$ classes $C_i$ ($i = 1, 2, \ldots, n$), where $s_i$ is the number of samples belonging to class $C_i$. The information entropy corresponding to the $n$ classes into which the set $S$ is divided is:
$$H(S) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{1}$$
In formula (1), $p_i = s_i / |S|$ is the probability that a sample in the set $S$ belongs to the $i$-th class $C_i$.
Assume that the set of all mutually exclusive values of the test attribute $A$ is $X_A$, and that $S_v$ is the subset of the sample data set $S$ whose value of test attribute $A$ is $v$, namely $S_v = \{ s \in S \mid A(s) = v \}$. After test attribute $A$ is selected, the classification entropy of the sample set $S_v$ on each branch node is $H(S_v)$. The information entropy resulting from the selection of $A$ is defined as the weighted average of the entropies of the subsets $S_v$, where the weight of each subset is the proportion of $S_v$ in the original sample set $S$. The information entropy obtained from knowing test attribute $A$ is:
$$H(S, A) = \sum_{v \in X_A} \frac{|S_v|}{|S|} H(S_v) \tag{2}$$
In formula (2), $H(S_v)$ is the information entropy of the sample subset $S_v$.
The information gain $\mathrm{Gain}(S, A)$ of test attribute $A$ for data set $S$ is:
$$\mathrm{Gain}(S, A) = H(S) - H(S, A) \tag{3}$$
$\mathrm{Gain}(S, A)$ is the expected reduction in information entropy obtained by knowing the value of test attribute $A$. The larger $\mathrm{Gain}(S, A)$ is, the more information the selection of test attribute $A$ provides for classifying the sample data set, and the better the classification effect.
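As a minimal sketch of formulas (1) to (3), the following Python computes the entropy of a list of class labels, the weighted entropy after splitting on one attribute, and the resulting information gain. The function names, the attribute name `optical_power`, and the four toy samples are assumptions made for illustration; they do not come from the data of the invention.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(S) of a list of class labels, in bits (formula 1)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def conditional_entropy(rows, attr, target):
    """H(S, A): entropy of each subset S_v weighted by |S_v| / |S| (formula 2)."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr], []).append(row[target])
    total = len(rows)
    return sum(len(s) / total * entropy(s) for s in subsets.values())


def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting on attr (formula 3)."""
    labels = [row[target] for row in rows]
    return entropy(labels) - conditional_entropy(rows, attr, target)


# Hypothetical toy samples, for illustration only.
samples = [
    {"optical_power": "abnormal", "state": "fault"},
    {"optical_power": "abnormal", "state": "fault"},
    {"optical_power": "normal", "state": "ok"},
    {"optical_power": "normal", "state": "ok"},
]
gain = information_gain(samples, "optical_power", "state")
```

In this toy set the attribute separates the two classes completely, so the gain equals the full entropy of 1 bit; an attribute that left every subset evenly mixed would yield a gain of 0.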
Compared with other classification algorithms such as statistical models, neural networks, and genetic algorithms, the ID3 decision tree learning algorithm performs inductive learning from examples and is simple and intuitive to implement, fast at classification, and produces trees of minimum average depth. However, it also has drawbacks such as low tree-generation efficiency, an inherent multi-value bias, and the ability to test only a single attribute at each node. In the power communication network, the operating state values of different communication devices may be strongly or weakly correlated, and the network topology is complex.
Therefore, to meet the actual operation and maintenance management needs of the power communication network, it is necessary to develop a method for predicting the state of power communication equipment based on an improved decision tree. The traditional algorithm is improved and applied to the state prediction analysis of power communication equipment, providing the operation and maintenance of the power communication network with a method for predicting equipment states in advance.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for predicting the state of power communication equipment based on an improved decision tree, so that potential fault hazards are predicted and identified and the results are provided to operation and maintenance personnel for maintenance in advance.
In order to solve the technical problems, the invention adopts the technical scheme that: the method for predicting the state of the power communication equipment based on the improved decision tree specifically comprises the following steps:
(1) collecting inherent data of the power communication equipment and storing the collected inherent data into a database; simultaneously collecting real-time monitoring information of the power communication equipment;
(2) performing data mining and analysis on the inherent data collected in step (1), constructing a multi-variable predictive analysis decision tree, and obtaining the associated parameters and related values of the power communication equipment and the monitoring points, as well as the abnormal characteristic values of the power communication equipment;
(3) analyzing the running state of the power communication equipment by combining the associated parameters and related values of the power communication equipment and the monitoring points, and the abnormal characteristic values of the power communication equipment, with the real-time monitoring information of the power communication equipment, thereby obtaining fault prediction and maintenance guidance for the power communication equipment.
By adopting this technical scheme, historical operation and maintenance information of the power communication equipment is extracted; the improved multi-variable predictive analysis decision tree is used to mine the characteristic values of the equipment under abnormal conditions and to analyze the associated parameter values of the equipment and the correlations among them; potential fault hazards are then predicted and identified in combination with the equipment operating data currently monitored and collected in real time; and the results are finally provided to operation and maintenance personnel for maintenance in advance.
As a preferred embodiment of the present invention, the data specific to the power communication equipment collected in the step (1) includes history information of the equipment, equipment inspection information, and equipment defect information.
As a preferred technical solution of the present invention, the specific method in step (2) for constructing the multi-variable predictive analysis decision tree by performing data mining and analysis on the inherent data collected in step (1) is as follows:
S21, defining a decision table information system $S = \langle U, R, V, F \rangle$, where the universe $U$ is a non-empty finite set of objects and $R$ is the set of all attributes; $R$ is divided into a test attribute set $A$ and a decision attribute set $D$;
S22, starting from the core of the test attribute set $A$ relative to the decision attribute set $D$, gradually constructing a complete multi-variable predictive analysis decision tree.
As a preferred embodiment of the present invention, the relationship between $R$ and the test attribute set $A$ and the decision attribute set $D$ in step S21 is: $R = A \cup D$ and $A \cap D = \emptyset$; $V = \bigcup_{r \in R} V_r$, where $V_r$ is the value range of attribute $r$; and the information function is $F: U \times R \to V$. Building on an in-depth study of the ID3 decision tree algorithm, rough set theory is used to perform attribute reduction, core computation, generalization, and similar operations on the decision table, and a simple and efficient multi-variable decision tree is constructed. This effectively avoids the inherent defects of the ID3 decision tree, reduces computational complexity, and improves the efficiency of predictive analysis, giving the method considerable practical value and application prospects.
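One possible in-memory representation of the decision table information system of step S21 is sketched below. The attribute names and the four objects are hypothetical placeholders; only the roles of U, R, A, D, V_r, and F follow the definition in the text.

```python
# Hypothetical decision table; attribute names and values are illustrative only.
TABLE = {
    1: {"bit_error": 1, "optical_power": 0, "fault_type": "I"},
    2: {"bit_error": 1, "optical_power": 1, "fault_type": "II"},
    3: {"bit_error": 0, "optical_power": 0, "fault_type": "I"},
    4: {"bit_error": 0, "optical_power": 1, "fault_type": "II"},
}

U = set(TABLE)                        # universe of objects
A = {"bit_error", "optical_power"}    # test attribute set
D = {"fault_type"}                    # decision attribute set
R = A | D                             # all attributes; A and D are disjoint


def F(x, r):
    """Information function F: U x R -> V, a plain table lookup here."""
    return TABLE[x][r]


# V_r is the value range of attribute r; V is the union of all V_r.
V_r = {r: {F(x, r) for x in U} for r in R}
V = set().union(*V_r.values())
```

Keeping F as a function rather than exposing the table directly mirrors the formal definition and leaves room to swap the storage (for example, a database query) without changing the code that builds the tree.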
As a preferred embodiment of the present invention, the specific method of step S22 is:
S221, constructing a decision table according to the sample data set;
S222, calculating the core of the test attribute set $A$ relative to the decision attribute set $D$, denoted $\mathrm{core}_D(A) = \{a_1, a_2, \ldots, a_k\}$; if $\mathrm{core}_D(A) = \emptyset$, go to step S223, otherwise go to step S224;
S223, using the ID3 decision tree algorithm to select an optimal attribute as the check attribute of the node;
S224, setting the conjunctive normal form $P = a_1 \wedge a_2 \wedge \cdots \wedge a_k$, calculating the generalization $\mathrm{GEN}_D(P)$ of $P$ relative to the decision attribute $D$, and using it as the test attribute of the root node of the decision tree;
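S225, calculating, for each attribute in the remaining conditional attribute set $A \setminus \mathrm{core}_D(A)$ of the current sample data set, its roughness relative to the decision attribute set $D$, and selecting the attribute with the minimum roughness as the optimal check attribute of the node; the roughness calculation formula is as follows:
$$\rho_{C_i}(X) = 1 - \alpha_{C_i}(X) = 1 - \frac{|\underline{C_i}(X)|}{|\overline{C_i}(X)|} = \frac{|bn_{C_i}(X)|}{|\overline{C_i}(X)|} \tag{4}$$
where $\rho_{C_i}(X)$ is the roughness;
$\alpha_{C_i}(X) = |\underline{C_i}(X)| / |\overline{C_i}(X)|$ is the approximation accuracy;
$\underline{C_i}(X) = \{x \in U \mid [x]_{C_i} \subseteq X\}$ is the lower approximation set of $X$, with $[x]_{C_i}$ the equivalence class of $x$ under attribute $C_i$;
$\overline{C_i}(X) = \{x \in U \mid [x]_{C_i} \cap X \neq \emptyset\}$ is the upper approximation set of $X$;
$bn_{C_i}(X) = \overline{C_i}(X) \setminus \underline{C_i}(X)$ is the boundary region, that is, the difference between the upper and lower approximation sets of $X$: the set of objects for which, given the knowledge $R$, it cannot be determined whether they belong to $X$.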
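The roughness used in step S225 can be sketched as follows, assuming the standard rough-set definitions of lower approximation, upper approximation, and approximation accuracy (roughness = 1 - accuracy). The helper names and the four-object table are hypothetical and serve only to make the calculation concrete.

```python
def equivalence_classes(table, attrs):
    """Group objects that share the same values on attrs."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return list(classes.values())


def approximations(table, attrs, X):
    """Return the (lower, upper) approximation sets of X under attrs."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attrs):
        if block <= X:      # block lies entirely inside X
            lower |= block
        if block & X:       # block overlaps X
            upper |= block
    return lower, upper


def roughness(table, attrs, X):
    """Roughness = 1 - |lower| / |upper|; X must be non-empty."""
    lower, upper = approximations(table, attrs, X)
    if not upper:
        raise ValueError("X must contain at least one object of the table")
    return 1 - len(lower) / len(upper)


# Hypothetical table: attribute "a" alone cannot separate objects 3 and 4.
table = {
    1: {"a": 0, "b": 0},
    2: {"a": 0, "b": 1},
    3: {"a": 1, "b": 0},
    4: {"a": 1, "b": 1},
}
X = {1, 2, 3}
rough_a = roughness(table, ["a"], X)         # lower {1, 2}, upper {1, 2, 3, 4}
rough_ab = roughness(table, ["a", "b"], X)   # every object is its own class
```

A roughness of 0 means the attribute set describes X exactly, which is why the attribute with the smallest roughness is preferred as the check attribute.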
By adopting this technical scheme, the traditional ID3 decision tree learning algorithm is improved by means of roughness calculation: minimum roughness replaces the reduction in information entropy as the criterion for determining the check attribute for classification, which effectively strengthens the structural association among different attributes and improves the structure of the generated decision tree. Therefore, when the operating state attribute values of power communication equipment are strongly correlated and the state prediction analysis is performed on conflict-free data, the improved decision tree algorithm yields a more optimized solution with a relatively small computational workload.
As a preferred technical scheme of the invention, the test attribute set in the multi-variable predictive analysis decision tree comprises error code abnormality, optical power abnormality, drift abnormality, optical fiber abnormality, bias current abnormality, jitter current abnormality, machine room temperature abnormality and machine room power supply abnormality.
Drawings
Embodiments of the invention are described in further detail below with reference to the accompanying drawings:
Fig. 1 is a flow diagram of a method for predicting the state of a power communication device based on an improved decision tree;
Fig. 2 is a structural diagram of the improved decision tree obtained by the method for predicting the state of a power communication device based on an improved decision tree.
Detailed Description
As shown in Fig. 1, the method for predicting the state of the power communication device based on the improved decision tree specifically comprises the following steps:
(1) Collecting inherent data of the power communication equipment and storing the collected inherent data into a database; simultaneously collecting real-time monitoring information of the power communication equipment; the inherent data of the power communication equipment collected in the step (1) comprises history information, equipment maintenance information and equipment defect information of the equipment;
(2) Performing data mining and analysis on the inherent data collected in step (1), constructing a multi-variable predictive analysis decision tree, and obtaining the associated parameters and related values of the power communication equipment and the monitoring points, as well as the abnormal characteristic values of the power communication equipment;
The specific method in step (2) for constructing the multi-variable predictive analysis decision tree by performing data mining and analysis on the inherent data collected in step (1) is as follows:
S21, defining a decision table information system $S = \langle U, R, V, F \rangle$, where the universe $U$ is a non-empty finite set of objects and $R$ is the set of all attributes; $R$ is divided into a test attribute set $A$ and a decision attribute set $D$;
S22, starting from the core of the test attribute set $A$ relative to the decision attribute set $D$, gradually constructing a complete multi-variable predictive analysis decision tree; in step S21, the relationship between $R$ and the test attribute set $A$ and the decision attribute set $D$ is: $R = A \cup D$ and $A \cap D = \emptyset$; $V = \bigcup_{r \in R} V_r$, where $V_r$ is the value range of attribute $r$; and the information function is $F: U \times R \to V$;
The specific method of step S22 is as follows:
S221, constructing a decision table according to the sample data set;
S222, calculating the core of the test attribute set $A$ relative to the decision attribute set $D$, denoted $\mathrm{core}_D(A) = \{a_1, a_2, \ldots, a_k\}$; if $\mathrm{core}_D(A) = \emptyset$, go to step S223, otherwise go to step S224;
S223, using the ID3 decision tree algorithm to select an optimal attribute as the check attribute of the node;
S224, setting the conjunctive normal form $P = a_1 \wedge a_2 \wedge \cdots \wedge a_k$, calculating the generalization $\mathrm{GEN}_D(P)$ of $P$ relative to the decision attribute $D$, and using it as the check attribute of the root node of the decision tree;
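S225, calculating, for each attribute in the remaining conditional attribute set $A \setminus \mathrm{core}_D(A)$ of the current sample data set, its roughness relative to the decision attribute set $D$, and selecting the attribute with the minimum roughness as the optimal check attribute of the node; the roughness calculation formula is as follows:
$$\rho_{C_i}(X) = 1 - \alpha_{C_i}(X) = 1 - \frac{|\underline{C_i}(X)|}{|\overline{C_i}(X)|} = \frac{|bn_{C_i}(X)|}{|\overline{C_i}(X)|} \tag{4}$$
where $\rho_{C_i}(X)$ is the roughness;
$\alpha_{C_i}(X) = |\underline{C_i}(X)| / |\overline{C_i}(X)|$ is the approximation accuracy;
$\underline{C_i}(X) = \{x \in U \mid [x]_{C_i} \subseteq X\}$ is the lower approximation set of $X$, with $[x]_{C_i}$ the equivalence class of $x$ under attribute $C_i$;
$\overline{C_i}(X) = \{x \in U \mid [x]_{C_i} \cap X \neq \emptyset\}$ is the upper approximation set of $X$;
$bn_{C_i}(X) = \overline{C_i}(X) \setminus \underline{C_i}(X)$ is the boundary region, that is, the difference between the upper and lower approximation sets of $X$: the set of objects for which, given the knowledge $R$, it cannot be determined whether they belong to $X$;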
(3) Analyzing the running state of the electric power communication equipment by combining the relevant parameters and relevant values of the electric power communication equipment and monitoring points and the abnormal characteristic value of the electric power communication equipment with the real-time monitoring information of the electric power communication equipment, thereby obtaining fault prediction and maintenance guidance of the electric power communication equipment;
The test attribute set in the multi-variable prediction analysis decision tree comprises error code abnormality, optical power abnormality, drift abnormality, optical fiber abnormality, bias current abnormality, jitter current abnormality, machine room temperature abnormality and machine room power supply abnormality.
A specific application of the algorithm for constructing the multi-variable predictive analysis decision tree in the method for predicting the state of the power communication equipment based on the improved decision tree is as follows:
First, a decision table is constructed from the collected test data samples of the relevant operating states of the power communication equipment, as shown in Table 1.
Here, the universe $U$ corresponds to the set $\{1, 2, \ldots, 8\}$ of collected test data samples; the test attribute set $A$ corresponds to the set $\{A_1, A_2, \ldots, A_7\}$ of the 7 classes of test features in the test data samples; and the set of abnormality types of the test data samples corresponding to the decision attribute $D$ is {I, II, ...
TABLE 1 decision table corresponding to communication device states
Second, the core of the test attribute set $A$ relative to the decision attribute $D$ is calculated. The positive region is $\mathrm{pos}_{IND(A)}(D) = \{1, 2, 3, 4, 5, 6, 7, 8\} = U$.
The necessity of each test attribute $A_i$ ($i = 1, 2, \ldots, 7$) in $A$ with respect to the decision attribute $D$ is then judged: if $\mathrm{pos}_{IND(A - \{A_i\})}(D) = \mathrm{pos}_{IND(A)}(D)$, then $A_i$ is not necessary; otherwise $A_i$ is necessary.
According to the calculation on Table 1, $A_2, A_3, A_4, A_5$ are not necessary in the test attribute set $A$ for the decision attribute $D$, whereas $A_1, A_6, A_7$ are necessary; that is, $\mathrm{core}_D(A) = \{A_1, A_6, A_7\}$.
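The necessity test and the core can be sketched as below. Because the contents of Table 1 are not reproduced in the text, the code runs on a small hypothetical table with three test attributes, and the function names are likewise assumptions. An attribute is placed in the core when removing it changes the positive region.

```python
def blocks(table, attrs):
    """Equivalence classes of the objects under the given attributes."""
    groups = {}
    for obj, row in table.items():
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return list(groups.values())


def positive_region(table, cond_attrs, dec_attr):
    """Objects whose class under cond_attrs maps to a single decision value."""
    pos = set()
    for block in blocks(table, cond_attrs):
        if len({table[obj][dec_attr] for obj in block}) == 1:
            pos |= block
    return pos


def core(table, cond_attrs, dec_attr):
    """Attributes whose removal shrinks the positive region."""
    full = positive_region(table, cond_attrs, dec_attr)
    result = []
    for attr in cond_attrs:
        rest = [a for a in cond_attrs if a != attr]
        if positive_region(table, rest, dec_attr) != full:
            result.append(attr)
    return result


# Hypothetical decision table (NOT Table 1 of the text).
table = {
    1: {"A1": 0, "A2": 0, "A3": 0, "d": "I"},
    2: {"A1": 1, "A2": 0, "A3": 0, "d": "II"},
    3: {"A1": 0, "A2": 1, "A3": 1, "d": "I"},
    4: {"A1": 1, "A2": 1, "A3": 1, "d": "II"},
}
attrs = ["A1", "A2", "A3"]
core_attrs = core(table, attrs, "d")
```

In this toy table only A1 determines the decision, so dropping A2 or A3 leaves the positive region unchanged while dropping A1 empties it.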
Then, the conjunctive normal form $P = A_1 \wedge A_6 \wedge A_7$ is set, and the equivalence class partition on the universe $U$ induced by the generalization of $P$ relative to the decision attribute $D$ is calculated, giving $U/\mathrm{ind}(P) = \{\{1\}, \{3\}, \{4\}, \{5\}, \{7\}, \{8\}, \{2, 6\}\}$. Since the generalization $\mathrm{GEN}_D(P)$ allows the test attributes $A$ and the decision attribute $D$ to be partitioned into unique equivalence mappings, $\mathrm{GEN}_D(P)$ can be used as the root node of the decision tree.
Finally, the check attributes of the decision tree nodes at the next level are calculated in sequence until the test attribute set is empty. For the decision attribute $D' = \{2, 6\}$, the roughness of $A_2, A_3, A_4, A_5$ with respect to $D'$ is calculated according to equation (4); the results are shown in Table 2.
TABLE 2 Roughness table for decision attribute $D'$
As can be seen from Table 2, the minimum roughness is attained by test attributes $A_3$ and $A_5$; thus either $A_3$ or $A_5$ can be used as the check attribute of $D'$.
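The last two steps of the walk-through, partitioning the samples by the core attributes and then picking the attribute of minimum roughness, can be sketched as follows. All numbers are hypothetical placeholders: neither Table 1 nor Table 2 is reproduced in the text, so the attribute values were chosen only so that the partition matches the one stated above, and the roughness values only so that A3 and A5 tie for the minimum.

```python
# Hypothetical values of (A1, A6, A7) per sample; NOT taken from Table 1.
core_values = {
    1: (0, 0, 0),
    2: (0, 0, 1),
    3: (0, 1, 0),
    4: (0, 1, 1),
    5: (1, 0, 0),
    6: (0, 0, 1),   # same triple as sample 2, so 2 and 6 share a class
    7: (1, 0, 1),
    8: (1, 1, 0),
}


def partition(values):
    """Equivalence classes of samples with identical attribute tuples."""
    classes = {}
    for sample, key in values.items():
        classes.setdefault(key, []).append(sample)
    # Singletons first, then larger classes, each in ascending sample order.
    return sorted(classes.values(), key=lambda c: (len(c), c))


def argmin_with_ties(scores, tol=1e-12):
    """All names whose score equals the minimum, within a small tolerance."""
    best = min(scores.values())
    return [name for name, s in scores.items() if abs(s - best) <= tol]


classes = partition(core_values)
# Classes with more than one sample are the ones the text goes on to split.
multi_sample = [c for c in classes if len(c) > 1]

# Hypothetical roughness values for D'; NOT taken from Table 2.
roughness_for_d_prime = {"A2": 0.5, "A3": 0.0, "A4": 0.5, "A5": 0.0}
candidates = argmin_with_ties(roughness_for_d_prime)
```

Returning every tied attribute rather than a single one reflects the statement that either attribute is acceptable; a concrete implementation would still need a deterministic tie-breaking rule, which the text does not specify.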
Using minimum roughness as the criterion, the check attributes of the nodes at each level are successively selected from the remaining test attribute set, and the improved decision tree structure finally obtained is shown in Fig. 2. Determining the classification check attribute by calculating roughness effectively overcomes the defects of the traditional decision tree learning algorithm, properly handles classification problems involving uncertain, multivariable, and incomplete data, and optimizes and simplifies the decision tree structure. Abnormal operating states of equipment in the power communication network manifest in various ways, and their mechanisms are complex and changeable; the improved decision tree provided by the invention can learn rules from sample data and is self-organizing and adaptive. As running state information continues to be collected in the actual environment, the amount of available sample data keeps growing and erroneous samples are gradually swamped by the mass of correct samples, so that the constructed decision tree becomes increasingly accurate. Meanwhile, the introduction of rough set theory allows better handling of sample data with different kinds of characteristic values, such as continuous and numerical quantities, collected in an actual production environment, yielding a simple and fast classification method for predictive analysis.
While the embodiments of the present invention have been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.