CN111242204A

CN111242204A - Operation and maintenance management and control platform fault feature extraction method

Info

Publication number: CN111242204A
Application number: CN202010015277.7A
Authority: CN
Inventors: 姜涛; 曹杰; 王蕾; 薄小永; 曲朝阳; 薛凯; 于建友; 吕洪波; 胡可为; 徐鹏程; 于成立; 周玉光
Original assignee: Taipingwan Power Station State Grid Northeast Branch Department Lyuyuan Hydroelectric Co; Northeast Dianli University; State Grid Jilin Electric Power Corp; Information and Telecommunication Branch of State Grid East Inner Mongolia Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Current assignee: Taipingwan Power Station State Grid Northeast Branch Department Lyuyuan Hydroelectric Co; State Grid Jilin Electric Power Corp; Northeast Electric Power University; Information and Telecommunication Branch of State Grid East Inner Mongolia Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Priority date: 2020-01-07
Filing date: 2020-01-07
Publication date: 2020-06-05

Abstract

A fault feature extraction method for an operation and maintenance management and control platform comprises two steps: principal component analysis (PCA) feature extraction and secondary feature selection. PCA feature extraction converts high-dimensional samples into low-dimensional samples. This reduces both the feature dimension and the redundancy among feature attributes while retaining the main classification information, which greatly lowers the computational complexity of the classifier and shortens training time. Because a secondary feature selection function is embedded in the feature extraction process, the evaluation results of correlation-based feature selection are ranked with a heuristic sequential backward search strategy to determine the key features of the feature subset. The selected features therefore have maximum relevance and minimum redundancy: their correlation with the class attribute is maximized and the redundancy among them is reduced, which markedly improves the classification accuracy of management and control faults. The method is scientifically sound, broadly applicable, and can be used on a wide range of fault classification management and control platforms.

Description

Operation and maintenance management and control platform fault feature extraction method

Technical Field

The invention relates to the technical field of fault feature extraction for operation and maintenance management and control of information systems, and in particular to a fault feature extraction method for an operation and maintenance management and control platform.

Background

An information system management and control platform monitors hardware devices and software applications remotely and in real time to obtain information such as the system's running status and trends. The platform monitors devices over a network, and data transmitted over a network usually imprints characteristic features on the data stream; these features are an important basis for identifying the data. Monitoring with management and control equipment collects a large amount of fault information, and feature extraction and selection techniques are the basis for classifying and identifying this information, because they can pick out the key monitoring features in an environment with many attributes and high redundancy.

The intelligent management and control platform of an information system aims to strengthen centralized management and unified monitoring of the system, monitor network and security equipment across the whole network, provide accurate fault diagnoses and handling suggestions, and improve the ability and efficiency of personnel in resolving faults. To achieve this, feature extraction and selection techniques are used to determine the key features for monitoring fault data: each fault type may contain many features, and the features most representative of that fault type are selected. The advantage of these techniques is that, when identifying and classifying fault types, they greatly improve identification accuracy while reducing data redundancy, and, compared with other techniques, they select the key features that represent the fault types most accurately.

With feature extraction and selection, fault types are effectively identified and classified, so that faults are analyzed and handled quickly and efficiently, administrators are alerted in time, and 24-hour unattended continuous monitoring is achieved.

Fault data of an operation and maintenance management and control platform have many features and are therefore high-dimensional. Fault types are classified automatically from a subset of these features, but some features in the fault data contribute little to the classification result. Moreover, correlation and redundancy among features incur large time and space overhead during classification and degrade the classification result. Redundant features in high-dimensional data strongly affect classifier performance, especially for standard supervised learning algorithms whose decision function uses all data features. For supervised classifiers, therefore, extracting or selecting features from the original data before classification reduces data redundancy and can effectively improve the classifier's generalization ability. At present, the number of fault statistical features used for fault classification on management and control platforms can reach several hundred. To improve the efficiency and accuracy of the classification algorithm and to reduce both the size of the original data and the redundancy among features, feature selection and extraction must be applied to the original high-dimensional features.
Feature selection chooses from the original features an optimal feature subset that represents the distribution of the original data as fully as possible. Feature extraction uses a mapping to transform high-dimensional samples into low-dimensional samples, producing a new combination of sample features; this combination not only has lower dimensionality but, because it is derived by mapping, can still fully represent the original features.
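To make the distinction between the two operations concrete, the following minimal sketch applies each to a single sample. The function names and the projection matrix `W` are hypothetical illustrations, not part of the described method: selection keeps some original attributes unchanged, while extraction builds new features as linear combinations of all attributes.

```python
from typing import List, Sequence


def select_features(x: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Feature selection: keep a subset of the original attributes, values unchanged."""
    return [x[i] for i in indices]


def extract_features(x: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """Feature extraction: each new feature is a linear combination (one row of W) of all attributes."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


sample = [3.0, 4.0, 0.5]
print(select_features(sample, [0, 2]))                               # [3.0, 0.5]
print(extract_features(sample, [[0.5, 0.5, 0.0], [0.0, 0.0, 2.0]]))  # [3.5, 1.0]
```

PCA, used in step 1) below, is an instance of the second kind, with the rows of `W` taken from eigenvectors of the covariance matrix.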

Disclosure of Invention

The invention aims to solve the following problem: when the similarity dependence among data is strong, feature selection alone does not sufficiently remove redundant information. It provides a fault feature extraction method for an operation and maintenance management and control platform that is scientifically sound and broadly applicable, removes data redundancy more effectively, and achieves better classification accuracy once the feature subset has been determined.

The object of the invention is achieved by the following technical solution: a fault feature extraction method for an operation and maintenance management and control platform, comprising the following steps:

1) principal component analysis feature extraction

Principal component analysis (PCA) transforms the sample space. By projection it determines the directions of maximum variance of all original feature vectors and extracts features along these discriminative projection directions. After the projection transformation, the original samples become low-dimensional samples that are as widely dispersed as possible, while the differences in the original high-dimensional sample space before the transformation are preserved. Suppose the original high-dimensional space contains $N$ samples, with $X \subset \mathbb{R}^n$ and each sample $X_i = [x_{i1}, \dots, x_{in}]^T \in \mathbb{R}^n$. Let the mean vector be $M$; the corresponding feature vector is $X_i = [x_{1i}, \dots, x_{ni}] \in \mathbb{R}^n$, and the corresponding covariance matrix is given by formula (1).

The variance of the sample distribution along each eigenvector is the corresponding eigenvalue of the covariance matrix in formula (1). Diagonalizing this covariance matrix yields the orthogonal matrix of formula (2), which is denoted $Q$, where $M$ is the dimension of $Q$. From $Q$, PCA obtains the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ and the corresponding orthonormal eigenvectors $v_1, v_2, \dots, v_n$. From these eigenvalues and eigenvectors, the orthonormal eigenvectors $u_1, u_2, \dots, u_d$ of the covariance matrix $S$ are obtained as in equation (3), where $u_1, u_2, \dots, u_d$ correspond to the $d$ largest nonzero eigenvalues of $S$.

The cumulative contribution threshold $t$ is set to 95%: $d$ is the smallest number for which $\sum_{i=1}^{d} \lambda_i / \sum_{i=1}^{n} \lambda_i \ge t$, so the principal components on the first $d$ axes account for 95% of the original data. Any sample $x_i$ is then mapped into the reduced low-dimensional sample space $U = \{u_1, u_2, \dots, u_d\}$, where its feature is $y_i = (u_1, u_2, \dots, u_d)^T x_i$ and $y_i$ is the corresponding sample point in the low-dimensional space. After this PCA transformation, the transformed samples retain a 95% cumulative contribution rate of the principal components, while the dimension of the space is reduced from $n$ to $d$ ($d < n$). This greatly reduces the dimensionality and accomplishes feature extraction;
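Formulas (1) to (3) are not reproduced in this text. Assuming they follow the standard PCA definitions (the $1/N$ normalization of the covariance in particular is an assumption), the relations used in this step can be written as:

```latex
\begin{aligned}
S &= \frac{1}{N}\sum_{i=1}^{N}(X_i - M)(X_i - M)^{T} \\
Q^{T} S Q &= \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n), \qquad Q = [v_1, v_2, \dots, v_n], \quad Q^{T}Q = I \\
d &= \min\Big\{ k : \frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{n}\lambda_i} \ge t \Big\}, \qquad U = [u_1, u_2, \dots, u_d] = [v_1, v_2, \dots, v_d] \\
y_i &= U^{T} x_i \in \mathbb{R}^{d}
\end{aligned}
```

Here $v_k$ is the eigenvector belonging to $\lambda_k$, so the retained axes are those of the $d$ largest eigenvalues, as the text requires.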

2) secondary feature selection

After PCA feature extraction, a secondary feature selection algorithm is embedded to further obtain the optimal feature subset and the key features of the PCA low-dimensional space. The algorithm is based on filter-type correlation-based feature selection (CFS). When evaluating the correlation of sample features, it adopts a heuristic sequential backward search strategy and determines the optimal feature subset by ranking the features by correlation.

CFS is a filter-type feature selection algorithm that uses feature correlation as its evaluation criterion. Under the chosen search strategy, it aims to reduce the redundancy among attributes and to increase the correlation between the attribute features and the class attribute, thereby screening out highly redundant attributes and attributes irrelevant to the class. Formula (4) is used as the evaluation criterion. The evaluation value of a feature subset S containing k features is denoted $M_s$, the mean correlation between the feature attributes and the class is denoted $\bar{r}_{cf}$, and the mean correlation between attributes is denoted $\bar{r}_{ff}$.

As formula (4) shows, the candidate feature subset determined by CFS gives the feature attributes maximum relevance and minimum redundancy: the correlation between the attribute features and the class attribute is maximized, and the redundancy among attributes is reduced. In other words, the evaluation value $M_s$ in formula (4) increases as the mean feature-class correlation $\bar{r}_{cf}$ becomes larger and as the mean inter-attribute correlation $\bar{r}_{ff}$ becomes smaller.
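Formula (4) is likewise not reproduced. Assuming it is the standard CFS merit, $M_s = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}}$, where $\bar{r}_{cf}$ is the mean feature-class correlation and $\bar{r}_{ff}$ the mean inter-feature correlation, the following sketch computes it from a list of feature-class correlations and a feature-feature correlation matrix. The function name `cfs_merit` is hypothetical. The correlations are assumed to be non-negative, for example symmetric uncertainty values in [0, 1].

```python
import math
from typing import List, Sequence


def cfs_merit(r_cf: Sequence[float], r_ff: List[List[float]]) -> float:
    """Merit M_s of a k-feature subset in the standard CFS form (assumed for formula (4)).

    r_cf[i]    : non-negative correlation between feature i and the class
    r_ff[i][j] : non-negative correlation between features i and j (diagonal ignored)
    """
    k = len(r_cf)
    if k == 0:
        raise ValueError("empty feature subset")
    mean_cf = sum(r_cf) / k  # mean feature-class correlation
    pairs = [r_ff[i][j] for i in range(k) for j in range(i + 1, k)]
    mean_ff = sum(pairs) / len(pairs) if pairs else 0.0  # mean inter-feature correlation
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


independent = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
redundant = [[1.0] * 4 for _ in range(4)]
print(cfs_merit([0.5] * 4, independent))  # 1.0
print(cfs_merit([0.5] * 4, redundant))    # 0.5
```

With four features that each correlate 0.5 with the class, the merit falls from 1.0 when the features are mutually uncorrelated to 0.5 when they are perfectly redundant. This is the maximum-relevance, minimum-redundancy behaviour described above.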

In CFS, the correlation between attributes is evaluated with an information gain algorithm. Information gain is a symmetric measure, so when two strongly correlated features, for example $W_i$ and $W_j$, are present in the feature subset $S$, the symmetric uncertainty of formula (5) can be used, where $H(W)$ is the entropy of a feature and $U$ is the correlation between features. Formula (6) is accordingly the evaluation function of a feature subset based on the correlation between attributes. As the evaluation value $H_s$ increases, the correlation between features $W_j$ and $W_i$ in the feature subset $S$ decreases and their correlation with the class attribute increases.
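Formulas (5) and (6) are also missing from this text. The following sketch makes two assumptions about their form. Formula (5) is taken to be the usual symmetric uncertainty, $U(X,Y) = 2\,\frac{H(X) + H(Y) - H(X,Y)}{H(X) + H(Y)}$. Formula (6) is taken to be the subset evaluation used in Hall's CFS, $H_s = \sum_j U(W_j, C) / \sqrt{\sum_i \sum_j U(W_i, W_j)}$, with $C$ the class attribute. Entropies use base-2 logarithms. Features are assumed to be already discretized, since symmetric uncertainty is computed on nominal values. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(W) in bits of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """U(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both sequences constant: treated as uncorrelated (assumption)
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def subset_evaluation(features: List[Sequence[Hashable]], cls: Sequence[Hashable]) -> float:
    """Assumed form of formula (6): class relevance over the root of total inter-feature correlation."""
    num = sum(symmetric_uncertainty(f, cls) for f in features)
    den = math.sqrt(sum(symmetric_uncertainty(a, b) for a in features for b in features))
    return num / den if den > 0 else 0.0


c = [0, 0, 1, 1]      # class attribute
noise = [0, 1, 0, 1]  # feature independent of the class
print(subset_evaluation([c, c], c), subset_evaluation([c, noise], c))
```

A duplicated copy of a fully relevant feature leaves $H_s$ at 1.0, because the redundancy term grows as fast as the relevance term. Adding an irrelevant feature lowers $H_s$ to about 0.707.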

The CFS algorithm is used to embed the secondary feature selection function into PCA. The CFS evaluation results are then computed with the heuristic sequential backward search strategy, and the optimal feature subset is screened out after ranking.

The fault feature extraction method of the invention is a feature extraction method with an embedded secondary feature selection function. PCA feature extraction converts high-dimensional samples into low-dimensional samples, reducing the feature dimension and the redundancy of feature attributes while retaining the main classification information, which greatly lowers the computational complexity of the classifier and shortens training time. Because secondary feature selection is embedded in the extraction process, the evaluation results are ranked by CFS combined with a heuristic sequential backward search strategy to determine the key features of the feature subset. The resulting features have maximum relevance and minimum redundancy: their correlation with the class attribute is maximized and the redundancy among them is reduced, which markedly improves the classification accuracy of management and control faults. The method is scientifically sound, broadly applicable, and can be used on a wide range of fault classification management and control platforms.

Drawings

Fig. 1 is a functional schematic diagram of the fault feature extraction method for an operation and maintenance management and control platform according to the invention;

Fig. 2 is a flow chart of the backward feature search strategy with the embedded secondary feature selection function;

Fig. 3 compares fault classification performance before and after PCA feature extraction alone;

Fig. 4 compares the performance of PCA feature extraction with the embedded secondary feature selection function against a conventional feature extraction method.

Detailed Description

The invention is further illustrated by the following figures and detailed description.

The fault feature extraction method for an operation and maintenance management and control platform according to the invention comprises 1) principal component analysis feature extraction and 2) secondary feature selection, carried out as described above. In the secondary feature selection, the correlation among attributes is evaluated by the symmetric uncertainty of formula (5), and formula (6) serves as the evaluation function of a feature subset.

1. Functional framework of the fault feature extraction method for an operation and maintenance management and control platform (see Fig. 1)

Transforming the sample space by PCA feature extraction removes data redundancy more effectively. The feature extraction process is as follows:
1) PCA feature extraction on the preprocessed data set $S_0$. The covariance matrix $S$ of the high-dimensional sample space $X$ is obtained according to the PCA principle, and the orthogonal matrix $Q$ of $S$ and its eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ are derived. The cumulative contribution threshold $t$ is set according to the practical requirements of management and control fault feature extraction. This yields the orthonormal vectors $u_i$ that meet the threshold and the low-dimensional sample space $U = \{u_1, u_2, \dots, u_d\}$ after feature extraction. The principal component feature of an original sample $x_i$ after the spatial transformation is $(u_1, u_2, \dots, u_d)^T x_i$, and these features form a new candidate feature subset $F_1$.
2) Secondary feature selection. ① When the key features of $F_1$ need to be locked after PCA feature extraction, the process enters the secondary feature selection module. This module applies the CFS algorithm to compute feature correlations on the extracted feature set, so that the feature attributes have maximum relevance and minimum redundancy (the correlation between the attribute features and the class attribute is maximized and the redundancy among attributes is reduced). It then locks the key feature subset $F_2$ of the PCA-extracted features. ② When only fault classification is needed and the key features do not need to be analyzed, this module can be skipped so that management and control faults are classified quickly.
3) On the basis of feature extraction with the embedded adaptive secondary feature selection function, a classifier is trained on the resulting optimal feature data set of management and control faults, and the fault classification results of the platform are obtained on the test set.
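Step 1) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation. The covariance uses $1/N$ normalization. The eigen-decomposition uses the cyclic Jacobi method, a swapped-in choice because the text does not name a solver. The projection follows the text's $y_i = (u_1, \dots, u_d)^T x_i$ without re-centring. The names `covariance`, `jacobi_eigen` and `pca_reduce` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def covariance(samples: Matrix) -> Matrix:
    """Covariance matrix S of N samples in R^n (1/N normalisation is an assumption)."""
    N, n = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / N for j in range(n)]  # mean vector M
    S = [[0.0] * n for _ in range(n)]
    for x in samples:
        dev = [x[j] - mean[j] for j in range(n)]
        for a in range(n):
            for b in range(n):
                S[a][b] += dev[a] * dev[b] / N
    return S


def jacobi_eigen(A: Matrix, tol: float = 1e-20, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi method: eigenvalues of symmetric A and eigenvectors as columns of Q."""
    n = len(A)
    A = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break  # off-diagonal mass is negligible: A is (numerically) diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation chosen so that the rotated A[p][q] becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # Q <- Q J accumulates the eigenvectors
                    qkp, qkq = Q[k][p], Q[k][q]
                    Q[k][p], Q[k][q] = c * qkp - s * qkq, s * qkp + c * qkq
    return [A[i][i] for i in range(n)], Q


def pca_reduce(samples: Matrix, t: float = 0.95) -> Tuple[List[float], Matrix, Matrix]:
    """Keep the fewest principal axes whose cumulative contribution rate reaches t."""
    n = len(samples[0])
    eigvals, Q = jacobi_eigen(covariance(samples))
    order = sorted(range(n), key=lambda i: eigvals[i], reverse=True)
    lam = [eigvals[i] for i in order]  # lambda_1 >= lambda_2 >= ... >= lambda_n
    total, acc, d = sum(lam), 0.0, 0
    while d < n and acc < t * total:  # cumulative contribution still below t
        acc += lam[d]
        d += 1
    U = [[Q[r][i] for r in range(n)] for i in order[:d]]  # u_1, ..., u_d (one per row)
    # As in the text, y_i = (u_1, ..., u_d)^T x_i; samples are not re-centred here.
    Y = [[sum(u[r] * x[r] for r in range(n)) for u in U] for x in samples]
    return lam, U, Y


if __name__ == "__main__":
    lam, U, Y = pca_reduce([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]], t=0.95)
    print(len(U), lam)
```

For the collinear samples (1, 2), (2, 4), (3, 6), (4, 8), all variance lies along (1, 2)/√5, so with t = 0.95 a single principal axis is kept. In practice a library eigensolver such as `numpy.linalg.eigh` would replace the hand-written Jacobi loop.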

2. Algorithm framework of the fault feature extraction method for an operation and maintenance management and control platform

The algorithm applies PCA feature extraction to the original data set to form the feature set $F_1$. It measures the correlation $U(W_j, S)$ between each feature $W_j$ in $F_1$ and the class attribute $S$, sorts the $U$ values in descending order, and calculates the CFS feature entropy evaluation value $H_{s1}$. The search strategy is heuristic sequential backward search, whose flow is shown in Fig. 2. At each step, the feature with the smallest correlation with the class attribute is deleted, and the feature entropy evaluation value $H_{s2}$ after the deletion is recalculated. The evaluation loop runs while $H_s$ is not less than the threshold: if $H_{s2} \ge H_{s1}$, the feature subset $F_1$ is updated; if $H_{s2} < H_{s1}$, $F_1$ is not updated. When $H_s$ falls below the threshold, the loop exits and outputs the optimal feature subset $F_2$. On the basis of PCA feature extraction, the secondary feature selection module thus further locks the key features of the optimal feature subset through CFS. The pseudocode of the secondary feature selection algorithm is as follows:

inputting: feature set F after PCA feature extraction₁And outputting: optimal feature set F₂，

1. Selecting all the features after PCA feature extraction to form a feature subset F₁，

2. Calculating F₁In each characteristic attribute W_jAssociation with class Attribute S U (W)_j,S)，

3. Calculating a characteristic entropy evaluation value H_s，

4. For each feature and class attribute association U (W)_jS) values are arranged in descending order, H_s1←H_s，

5.For H_s1≥δdo，

6. Delete F₁Of the feature, forming a new subset of features F₂Calculating a characteristic entropy evaluation value H_s2，

7.If H_s2≥H_s1，then F₁＝F₂，

8.else，F₁The temperature of the molten steel is not changed,

9.End if，

10.H_s1＝H_s2，

11.End For。
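The backward search can be sketched as below. The sketch makes two interpretive assumptions where the pseudocode leaves details open. First, deletions are tried in ascending order of class correlation U(W_j, S), and each feature is tried once. Second, a rejected deletion keeps the previous evaluation value instead of overwriting H_s1 as step 10 does. The `merit` callable stands for the subset evaluation H_s and `delta` for the threshold δ; the names `backward_search` and `toy_merit` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


def backward_search(features: Sequence[str],
                    relevance: Dict[str, float],
                    merit: Callable[[List[str]], float],
                    delta: float) -> List[str]:
    """Heuristic sequential backward elimination guided by a subset merit H_s."""
    current = list(features)  # F1: all PCA-extracted features
    h1 = merit(current)       # H_s1
    # Try deleting features from least to most class-relevant (ascending U(W_j, S)).
    for f in sorted(current, key=lambda w: relevance[w]):
        if h1 < delta or len(current) <= 1:
            break  # evaluation below the threshold, or nothing left to delete
        trial = [w for w in current if w != f]  # candidate subset F2
        h2 = merit(trial)                       # H_s2
        if h2 >= h1:  # accept the deletion only if the merit does not drop
            current, h1 = trial, h2
    return current


if __name__ == "__main__":
    relevance = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05}

    def toy_merit(subset: List[str]) -> float:  # hypothetical stand-in for H_s
        return sum(relevance[w] for w in subset) / math.sqrt(len(subset))

    print(backward_search(["a", "b", "c", "d"], relevance, toy_merit, delta=0.5))  # ['a', 'b']
```

With this toy merit (sum of relevances divided by the square root of the subset size), removing the two weakest features raises the merit and removing either remaining feature lowers it, so `['a', 'b']` is returned.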

The inventors used the fault feature extraction method of the invention to compare and analyze the fault identification performance of the management and control platform after feature extraction. PCA feature extraction was performed first, with the cumulative contribution rate of the principal components set to 94%: at this threshold (t = 94%) the feature dimension is reduced to 18 and the average fault identification accuracy reaches 98% or more, as shown in Fig. 3. The threshold t determines the cumulative contribution rate of the PCA principal components. A threshold of 100% gives the largest cumulative contribution rate and high identification accuracy, but the feature dimension also rises sharply. A higher threshold t is therefore not necessarily better; classifier performance is optimal only when dimensionality and classification accuracy are balanced. After PCA feature extraction, secondary selection was applied to the 18 extracted features. The results show that features 1, 2, 5, 6, 7 and 12 have the least mutual redundancy and the strongest correlation with the class attribute; after screening, they form the key feature subset. Table 1 gives the cross-validation results of the secondary feature selection, and Fig. 4 compares the performance of the 6-dimensional key feature subset obtained by secondary feature selection. Its average classification accuracy is 96.9%, within 1.1 percentage points of the accuracy obtained with PCA feature extraction alone.
Because the feature dimension is reduced from 18 to 6, it is roughly 65% lower than with PCA dimensionality reduction alone, and the average running time of the classifier model is reduced by 31.3%. During fault classification on the management and control platform, feature extraction and selection can be adapted to specific requirements. When only fault classification is needed, high classification accuracy is required, and the key features do not need to be analyzed, the secondary feature selection module can be skipped and the management and control faults classified directly. When the key features need to be locked and a low feature dimension is required, the secondary feature selection module is entered adaptively to further lock the key features, and the fault classification results of the platform are obtained on the test set. These results demonstrate the feasibility and effectiveness of the proposed fault feature extraction method.

Table 1 PCA-based secondary fault feature selection (ten-fold cross-validation)

Feature dimension after PCA extraction	Folds in which selected, out of 10 (%)
1	9 (90%)
2	10 (100%)
3	5 (50%)
4	4 (40%)
5	10 (100%)
6	10 (100%)
7	9 (90%)
8	7 (70%)
9	1 (10%)
10	0 (0%)
11	1 (10%)
12	10 (100%)
13	0 (0%)
14	0 (0%)
15	0 (0%)
16	0 (0%)
17	0 (0%)
18	0 (0%)
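The key features reported in the text can be recovered from Table 1 with a simple fold-count cutoff. The cutoff of at least 9 of 10 folds is an assumption that happens to match the six features named above; the source does not state how the cutoff was chosen. `FOLDS` and `key_features` are hypothetical names, and the counts are transcribed from the table.

```python
from typing import Dict, List

# Folds (out of 10) in which each PCA-extracted feature was selected, from Table 1.
FOLDS: Dict[int, int] = {1: 9, 2: 10, 3: 5, 4: 4, 5: 10, 6: 10, 7: 9, 8: 7, 9: 1,
                         10: 0, 11: 1, 12: 10, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0}


def key_features(folds: Dict[int, int], min_folds: int = 9) -> List[int]:
    """Feature dimensions selected in at least min_folds cross-validation folds."""
    return sorted(dim for dim, count in folds.items() if count >= min_folds)


print(key_features(FOLDS))  # [1, 2, 5, 6, 7, 12]
```

Lowering the cutoff to 7 folds would also admit feature 8, so the choice of cutoff directly controls the size of the key feature subset.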

In summary, the fault feature extraction method of the invention reduces the feature dimension of each fault sample space, shortens training time, and improves the classification accuracy of the learning classifier. Because PCA feature extraction is performed first, the feature dimension for management and control fault classification is greatly reduced, and so is the computational complexity. The method then performs adaptive secondary feature selection after feature extraction. This overcomes the inability of a single feature extraction method to lock key features, reduces redundancy among features, strengthens the correlation between features and class attributes, and substantially improves fault classification accuracy.

The software routines of the invention are programmed using automation and computer processing techniques well known to those skilled in the art.

The embodiments described here are not exhaustive; simple copies and modifications made by those skilled in the art without inventive effort still fall within the scope of protection claimed by the invention.

Claims

1. A fault feature extraction method for an operation and maintenance management and control platform, characterized by comprising the following steps:

1) principal component analysis feature extraction

Principal component analysis (PCA) transforms the sample space. By projection it determines the projection directions with the largest variance of all original feature vectors and extracts features along these discriminative projection directions. After the projection transformation, the original samples become low-dimensional samples that are as widely dispersed as possible, while the differences in the original high-dimensional sample space before the transformation are preserved. Suppose the original high-dimensional space contains $N$ samples, with $X \subset \mathbb{R}^n$ and each sample $X_i = [x_{i1}, \dots, x_{in}]^T \in \mathbb{R}^n$; let the mean vector be $M$, the corresponding feature vector be $X_i = [x_{1i}, \dots, x_{ni}] \in \mathbb{R}^n$, and the corresponding covariance matrix be given by formula (1);

the orthogonal matrix obtained by diagonalizing the covariance matrix of formula (1) is denoted $Q$;

2) secondary feature selection

After PCA feature extraction, a secondary feature selection algorithm is embedded to further obtain the optimal feature subset and the key features of the PCA low-dimensional space. The algorithm is based on filter-type correlation-based feature selection (CFS), adopts a heuristic sequential backward search strategy when evaluating the correlation of sample features, and determines the optimal feature subset by ranking the features by correlation;

the mean correlation between the feature attributes and the class is denoted $\bar{r}_{cf}$, and the mean correlation between attributes is denoted $\bar{r}_{ff}$;

as shown in formula (4), the candidate feature subset determined by CFS gives the feature attributes maximum relevance and minimum redundancy: the correlation between the attribute features and the class attribute is maximized and the redundancy among attributes is reduced, so that the evaluation value $M_s$ in formula (4) increases as $\bar{r}_{cf}$ becomes larger and as $\bar{r}_{ff}$ becomes smaller.