CN111242204A - Operation and maintenance management and control platform fault feature extraction method - Google Patents

Publication number
CN111242204A
Authority
CN
China
Prior art keywords
feature
attributes
correlation
features
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010015277.7A
Other languages
Chinese (zh)
Inventor
姜涛
曹杰
王蕾
薄小永
曲朝阳
薛凯
于建友
吕洪波
胡可为
徐鹏程
于成立
周玉光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taipingwan Power Station State Grid Northeast Branch Department Lyuyuan Hydroelectric Co
State Grid Jilin Electric Power Corp
Northeast Electric Power University
Information and Telecommunication Branch of State Grid East Inner Mongolia Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Original Assignee
Taipingwan Power Station State Grid Northeast Branch Department Lyuyuan Hydroelectric Co
Northeast Dianli University
State Grid Jilin Electric Power Corp
Information and Telecommunication Branch of State Grid East Inner Mongolia Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taipingwan Power Station State Grid Northeast Branch Department Lyuyuan Hydroelectric Co, Northeast Dianli University, State Grid Jilin Electric Power Corp, Information and Telecommunication Branch of State Grid East Inner Mongolia Electric Power Co Ltd, Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Priority to CN202010015277.7A
Publication of CN111242204A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A fault feature extraction method for an operation and maintenance management and control platform, comprising principal component analysis feature extraction and secondary feature selection. Feature extraction based on principal component analysis converts high-dimensional space samples into low-dimensional space samples, which reduces the redundancy of the feature attributes while reducing the feature dimension, retains the main classification information, greatly reduces the computational complexity of the classifier, and shortens the training time. Because a secondary feature selection function is embedded in the feature extraction process, the evaluation results are ranked using association rule feature selection combined with a heuristic sequential backward search strategy, and the key features of the feature subset are then determined, so that the feature attributes have maximum association and minimum redundancy; that is, the association between the attribute features and the class attribute is maximized, the redundancy between attributes is reduced, and the classification precision of management and control faults is significantly improved. The method is scientific and reasonable, widely applicable, and can be used on various fault classification management and control platforms.

Description

Operation and maintenance management and control platform fault feature extraction method
Technical Field
The invention relates to the technical field of information system operation and maintenance management and control fault feature extraction, in particular to a fault feature extraction method for an operation and maintenance management and control platform.
Background
The information system management and control platform remotely monitors hardware equipment and software applications in real time in order to acquire information such as the running condition and running trends of the system. The platform monitors devices in a network environment, where data transmission usually imprints corresponding features on the data stream, and these features are an important basis for data identification. Monitoring with the management and control equipment collects a large amount of fault information, and feature extraction and selection is the basis for classifying and identifying that information. Feature extraction and selection makes it possible to select the key monitoring features in an environment with many attributes and highly redundant information.
In the intelligent management and control platform of an information system, the goals are to strengthen centralized management and unified monitoring of the system, to monitor network and security equipment across the whole network, to provide accurate fault diagnosis and handling suggestions, and to improve the ability and efficiency of personnel in resolving faults. To achieve these goals, feature extraction and selection is used to determine the key features of the monitored fault data: each fault type may contain many features, and the key features that best represent that fault type are selected. The advantage of feature extraction and selection is that, when fault types are identified and classified, the accuracy of fault identification is greatly improved while data redundancy is reduced. Compared with other techniques, it can select the key features that represent the fault types most accurately.
Through feature extraction and selection, fault types are identified and classified effectively, so that faults are analyzed and handled quickly and efficiently, managers are alerted promptly, and continuous unattended 24-hour monitoring is achieved.
Fault data from the operation and maintenance management and control platform has many features; such data is called high-dimensional data. Fault types are classified automatically from some of the features of the high-dimensional data, but some features in the fault data contribute little to the classification result. In addition, correlation and redundancy among the features incur a large time and space overhead during classification and degrade the fault classification result. Redundant features in high-dimensional data strongly affect classifier performance, especially for standard supervised learning classification algorithms whose decision function uses all data features. Therefore, for a classifier based on supervised learning, extracting or selecting features from the original data before classification reduces data redundancy and can effectively improve the generalization ability of the classifier. At present, the number of statistical fault features used for fault classification on the management and control platform can reach the hundreds. To improve the efficiency and accuracy of the classification algorithm and to reduce both the scale of the original data and the redundancy among features, feature selection and extraction must be performed on the features of the original high-dimensional data.
Feature selection chooses an optimal feature subset from the features of the original data, one that represents the distribution characteristics of the original data to the greatest possible extent. Feature extraction maps high-dimensional data samples to low-dimensional samples through a transformation; the mapping produces a new combination of sample features that both reduces dimensionality and, because it is obtained by a mapping transformation, still fully represents the original features.
Disclosure of Invention
The invention aims to solve the problem that, when the similarity dependence among data is strong, a feature selection method used on its own does not remove redundant information sufficiently. It provides a fault feature extraction method for an operation and maintenance management and control platform that is scientific, reasonable, and widely applicable, removes data redundancy more effectively, and obtains better classification accuracy once the feature subset is determined.
The purpose of the invention is realized by the following technical scheme: a fault feature extraction method for an operation and maintenance management and control platform is characterized by comprising the following contents:
1) Principal component analysis feature extraction
Principal component analysis (PCA) performs a sample space transformation: it determines by projection the direction of maximum variance of all the original feature vectors and extracts features along the discriminant vectors located in that projection direction. After the projection transformation the original samples become low-dimensional samples that are dispersed as much as possible, while the differences of the original high-dimensional sample space before the transformation are preserved. Let the original high-dimensional space contain N samples, $X \in \mathbb{R}^n$, where each sample is $X_i = [x_{i1}, \ldots, x_{in}]^T \in \mathbb{R}^n$. If the mean vector is M, the corresponding feature vector is $X_i = [x_{1i}, \ldots, x_{ni}] \in \mathbb{R}^n$, and the corresponding covariance matrix is given by formula (1),
Figure BDA0002358650860000021
The distribution variance of the samples along a feature vector is the corresponding eigenvalue of the covariance matrix of formula (1). The orthogonal matrix obtained by diagonalizing the covariance matrix of formula (1) is given by formula (2),
Figure BDA0002358650860000022
Q is denoted as
Figure BDA0002358650860000023
where M is the dimension of the orthogonal matrix Q. From Q, PCA derives the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and computes the corresponding orthonormal eigenvectors $v_1, v_2, \ldots, v_n$. The orthonormal eigenvectors $u_1, u_2, \ldots, u_d$ of the covariance matrix S are then obtained from the eigenvalues of Q and the corresponding orthonormal eigenvectors, as in formula (3), where $u_1, u_2, \ldots, u_d$ correspond to the d largest non-zero eigenvalues of S,
Figure BDA0002358650860000024
With the threshold set to t = 95%, the $u_i$ are chosen so that the cumulative contribution rate of the principal components of the space samples on the first d axes reaches 95% of the original data. Thus any sample $x_i$ is mapped into the reduced low-dimensional sample space $U = \{u_1, u_2, \ldots, u_d\}$, and the feature of $x_i$ is $y_i = (u_1, u_2, \ldots, u_d)^T x_i$, so that $y_i$ is the sample point in the low-dimensional space. Through the PCA sample space transformation, the transformed samples represent 95% of the cumulative contribution rate of the principal components, and the dimension of the original space is reduced from n to d, where d is smaller than n, so that the space dimension is greatly reduced and the function of feature extraction is achieved;
2) Secondary feature selection
After PCA feature extraction, a secondary feature selection algorithm is embedded to further obtain the optimal feature subset and the key features of the PCA low-dimensional space. The algorithm is based on filter-type association rule feature selection (CFS); when evaluating the correlation of the sample features it adopts a heuristic sequential backward search strategy and determines the optimal feature subset by ranking the features by correlation.
CFS is a filter-type feature selection algorithm that uses the correlation of the features as its evaluation criterion. Under the corresponding search strategy it aims to reduce the redundancy between attributes and to increase the correlation between the attribute features and the class attribute, thereby screening out highly redundant attributes and attributes irrelevant to the classes. Formula (4) is used as the evaluation criterion. The evaluation of the k features of the feature subset S is denoted $M_s$, the mean correlation between the feature attributes and the classes is denoted
Figure BDA0002358650860000031
and the mean correlation between attributes is denoted
Figure BDA0002358650860000032
As shown in formula (4), the candidate feature subset determined by the association rule feature selection algorithm gives the feature attributes maximum association and minimum redundancy; that is, it maximizes the association between the attribute features and the class attribute and reduces the redundancy between attributes. In other words, the higher the evaluation value $M_s$ in formula (4), the larger the mean correlation between the feature attributes and the classes
Figure BDA0002358650860000033
and the smaller the mean correlation between the attributes
Figure BDA0002358650860000034
Figure BDA0002358650860000035
In association rule feature selection, the correlation between attributes is evaluated with an information gain algorithm, and the information gain calculation is a symmetric measure. Therefore, when two highly correlated features, such as $W_i$ and $W_j$, exist in the feature subset S, the symmetrical uncertainty method of formula (5) can be used, where the entropy of a feature is H(W) and the association between features is U. Formula (6) is then the evaluation function of a feature subset based on the correlation between attributes: when the evaluation value $H_s$ rises, the correlation between the features $W_j$ and $W_i$ in the feature subset S decreases and their correlation with the class attribute increases,
Figure BDA0002358650860000036
Figure BDA0002358650860000037
A secondary feature selection function is embedded in PCA using the CFS algorithm; the CFS evaluation result is then computed under the heuristic sequential backward search strategy, and the optimal feature subset is screened out after ranking.
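Step 1) can be illustrated with a minimal NumPy sketch under stated assumptions: the covariance matrix is normalized by N, the eigenvectors are taken directly from S with `numpy.linalg.eigh` rather than through the intermediate matrix Q of formula (2), and the samples are centered on the mean before projection. The function name `pca_extract` and its parameters are illustrative and do not come from the patent.

```python
import numpy as np


def pca_extract(X, t=0.95):
    """Project X (N samples x n features) onto the first d principal axes.

    d is the smallest number of axes whose cumulative contribution rate
    (share of total variance) reaches the threshold t.
    Returns (Y, U, d): projected samples, projection matrix, retained dimension.
    """
    N, n = X.shape
    M = X.mean(axis=0)                 # mean vector
    Xc = X - M                         # centering (assumption, see lead-in)
    S = Xc.T @ Xc / N                  # covariance matrix, 1/N normalization (assumption)

    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to lambda_1 >= ... >= lambda_n
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]

    contribution = np.cumsum(eigvals) / eigvals.sum()
    d = min(int(np.searchsorted(contribution, t)) + 1, n)

    U = eigvecs[:, :d]                 # columns u_1, ..., u_d
    Y = Xc @ U                         # row i is y_i = U^T (x_i - M)
    return Y, U, d
```

Calling `pca_extract(X, t=0.95)` on an array of fault samples returns the low-dimensional samples together with the retained dimension d; a lower t keeps fewer axes.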
The fault feature extraction method for an operation and maintenance management and control platform is a feature extraction method with an embedded secondary feature selection function. PCA feature extraction converts high-dimensional space samples into low-dimensional space samples, which reduces the redundancy of the feature attributes while reducing the feature dimension, retains the main classification information, greatly reduces the computational complexity of the classifier, and shortens the training time. Because a secondary feature selection function is embedded in the feature extraction process, the evaluation results are ranked using CFS combined with a heuristic sequential backward search strategy, and the key features of the feature subset are then determined, so that the feature attributes have maximum association and minimum redundancy; that is, the association between the attribute features and the class attribute is maximized, the redundancy between attributes is reduced, and the classification precision of management and control faults is significantly improved. The method is scientific and reasonable, widely applicable, and can be used on various fault classification management and control platforms.
Drawings
Fig. 1 is a functional schematic diagram of the fault feature extraction method for an operation and maintenance management and control platform according to the present invention;
Fig. 2 is a flow diagram of the feature backward search strategy with the embedded secondary feature selection function;
Fig. 3 is a comparison of fault classification performance before and after the primary PCA feature extraction;
Fig. 4 is a performance comparison of the PCA feature extraction method with the embedded secondary feature selection function and the conventional feature extraction method.
Detailed Description
The invention is further illustrated by the following figures and detailed description.
The invention relates to a fault feature extraction method for an operation and maintenance management and control platform, which comprises the following contents:
1) Principal component analysis feature extraction
Principal component analysis (PCA) performs a sample space transformation: it determines by projection the direction of maximum variance of all the original feature vectors and extracts features along the discriminant vectors located in that projection direction. After the projection transformation the original samples become low-dimensional samples that are dispersed as much as possible, while the differences of the original high-dimensional sample space before the transformation are preserved. Let the original high-dimensional space contain N samples, $X \in \mathbb{R}^n$, where each sample is $X_i = [x_{i1}, \ldots, x_{in}]^T \in \mathbb{R}^n$. If the mean vector is M, the corresponding feature vector is $X_i = [x_{1i}, \ldots, x_{ni}] \in \mathbb{R}^n$, and the corresponding covariance matrix is given by formula (1),
Figure BDA0002358650860000041
The distribution variance of the samples along a feature vector is the corresponding eigenvalue of the covariance matrix of formula (1). The orthogonal matrix obtained by diagonalizing the covariance matrix of formula (1) is given by formula (2),
Figure BDA0002358650860000042
Q is denoted as
Figure BDA0002358650860000043
where M is the dimension of the orthogonal matrix Q. From Q, PCA derives the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and computes the corresponding orthonormal eigenvectors $v_1, v_2, \ldots, v_n$. The orthonormal eigenvectors $u_1, u_2, \ldots, u_d$ of the covariance matrix S are then obtained from the eigenvalues of Q and the corresponding orthonormal eigenvectors, as in formula (3), where $u_1, u_2, \ldots, u_d$ correspond to the d largest non-zero eigenvalues of S,
Figure BDA0002358650860000051
With the threshold set to t = 95%, the $u_i$ are chosen so that the cumulative contribution rate of the principal components of the space samples on the first d axes reaches 95% of the original data. Thus any sample $x_i$ is mapped into the reduced low-dimensional sample space $U = \{u_1, u_2, \ldots, u_d\}$, and the feature of $x_i$ is $y_i = (u_1, u_2, \ldots, u_d)^T x_i$, so that $y_i$ is the sample point in the low-dimensional space. Through the PCA sample space transformation, the transformed samples represent 95% of the cumulative contribution rate of the principal components, and the dimension of the original space is reduced from n to d, where d is smaller than n, so that the space dimension is greatly reduced and the function of feature extraction is achieved;
2) Secondary feature selection
After PCA feature extraction, a secondary feature selection algorithm is embedded to further obtain the optimal feature subset and the key features of the PCA low-dimensional space. The algorithm is based on filter-type association rule feature selection (CFS); when evaluating the correlation of the sample features it adopts a heuristic sequential backward search strategy and determines the optimal feature subset by ranking the features by correlation.
CFS is a filter-type feature selection algorithm that uses the correlation of the features as its evaluation criterion. Under the corresponding search strategy it aims to reduce the redundancy between attributes and to increase the correlation between the attribute features and the class attribute, thereby screening out highly redundant attributes and attributes irrelevant to the classes. Formula (4) is used as the evaluation criterion. The evaluation of the k features of the feature subset S is denoted $M_s$, the mean correlation between the feature attributes and the classes is denoted
Figure BDA0002358650860000052
and the mean correlation between attributes is denoted
Figure BDA0002358650860000053
As shown in formula (4), the candidate feature subset determined by the association rule feature selection algorithm gives the feature attributes maximum association and minimum redundancy; that is, it maximizes the association between the attribute features and the class attribute and reduces the redundancy between attributes. In other words, the higher the evaluation value $M_s$ in formula (4), the larger the mean correlation between the feature attributes and the classes
Figure BDA0002358650860000054
and the smaller the mean correlation between the attributes
Figure BDA0002358650860000055
Figure BDA0002358650860000056
In association rule feature selection, the correlation between attributes is evaluated with an information gain algorithm, and the information gain calculation is a symmetric measure. Therefore, when two highly correlated features, such as $W_i$ and $W_j$, exist in the feature subset S, the symmetrical uncertainty method of formula (5) can be used, where the association between features is U and the entropy of a feature is H(W). Formula (6) is then the evaluation function of a feature subset based on the correlation between attributes: when the evaluation value $H_s$ rises, the correlation between the features $W_j$ and $W_i$ in the feature subset S decreases and their correlation with the class attribute increases,
Figure BDA0002358650860000061
Figure BDA0002358650860000062
A secondary feature selection function is embedded in PCA using the CFS algorithm; the CFS evaluation result is then computed under the heuristic sequential backward search strategy, and the optimal feature subset is screened out after ranking.
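The evaluation used in step 2) can be sketched as follows. Formulas (4) and (5) appear only as images in this text, so the sketch assumes they take the forms commonly used for CFS: symmetrical uncertainty $U(A, B) = 2\,[H(A) + H(B) - H(A, B)] / [H(A) + H(B)]$ and the merit $M_s = k\,\bar{r}_{cf} / \sqrt{k + k(k-1)\,\bar{r}_{ff}}$. The function names and the requirement that features be discrete (for example, binned PCA components) are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(W), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(a, b):
    """Symmetric, normalized information gain between two discrete variables."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0                      # both variables are constant
    gain = h_a + h_b - entropy(list(zip(a, b)))   # mutual information
    return 2.0 * gain / (h_a + h_b)


def cfs_merit(features, labels):
    """Merit of a feature subset: high class relevance, low mutual redundancy.

    features: list of k sequences (one per feature); labels: class attribute.
    """
    k = len(features)
    r_cf = sum(symmetrical_uncertainty(f, labels) for f in features) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(symmetrical_uncertainty(features[i], features[j])
               for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Under these assumptions, adding an exact copy of a feature leaves the merit unchanged: the higher mean redundancy offsets the extra feature, which is the "maximum association and minimum redundancy" behavior described above.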
1. Functional framework of the operation and maintenance management and control platform fault feature extraction method of the invention (see Fig. 1)
Performing the sample space transformation through PCA feature extraction removes data redundancy more effectively. The feature extraction process is as follows:
1) PCA feature extraction is performed on the preprocessed data set $S_0$. The covariance matrix S of the high-dimensional sample space X is obtained according to the PCA principle; the orthogonal matrix Q of S and its eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ are derived; the threshold t of the cumulative contribution rate is set according to the actual requirements of management and control fault feature extraction, which gives the orthonormal vectors $u_i$ that satisfy the threshold. The low-dimensional sample space after feature extraction is $U = \{u_1, u_2, \ldots, u_d\}$, and the principal component feature of an original sample $x_i$ after the spatial transformation is $(u_1, u_2, \ldots, u_d)^T x_i$, which forms the new candidate feature subset $F_1$.
2) ① When the key features of the feature subset $F_1$ need to be locked after PCA feature extraction of the faults, the secondary feature selection function module is entered. Secondary feature selection uses the association rule feature selection (CFS) algorithm to compute the feature correlation of the extracted feature set, so that the feature attributes have maximum association and minimum redundancy; that is, the association between the attribute features and the class attribute is maximized and the redundancy between attributes is reduced, while the key feature subset $F_2$ after PCA feature extraction is locked. ② When only fault classification is needed and the key features do not need to be analyzed, this function module can be skipped and the management and control faults classified quickly.
3) On the basis of feature extraction with the embedded adaptive secondary feature selection function, the resulting optimal feature data set of the management and control faults is used for training, and the fault classification result of the management and control platform is obtained on the test set.
2. Algorithm framework of the operation and maintenance management and control platform fault feature extraction method of the invention
The algorithm performs feature extraction on the original data set based on principal component analysis to form the feature set $F_1$, measures the association $U(W_j, S)$ between each feature $W_j$ in $F_1$ and the class attribute S, sorts the U values in descending order, and computes the CFS feature entropy evaluation value $H_{s1}$. The search strategy used in the computation is heuristic sequential backward search, whose flow is shown in Fig. 2: each time, the feature with the smallest correlation evaluation value with respect to the class attribute is deleted, and the feature entropy evaluation value $H_{s2}$ after the deletion is computed again. The loop continues while $H_s$ is not less than the threshold: if $H_{s2} \ge H_{s1}$, the feature subset $F_1$ is updated; if $H_{s2} < H_{s1}$, the feature subset $F_1$ is not updated. When $H_s$ falls below the threshold, the loop exits and the optimal feature subset $F_2$ is output. On the basis of PCA feature extraction, the secondary feature selection function module can thus further lock the key features of the optimal feature subset through association rule feature selection. The pseudo code of the secondary feature selection algorithm is as follows:
Input: feature set $F_1$ after PCA feature extraction. Output: optimal feature set $F_2$.
1. Select all the features after PCA feature extraction to form the feature subset $F_1$
2. Compute the association $U(W_j, S)$ between each feature attribute $W_j$ in $F_1$ and the class attribute S
3. Compute the feature entropy evaluation value $H_s$
4. Sort the association values $U(W_j, S)$ of the features with the class attribute in descending order; $H_{s1} \leftarrow H_s$
5. While $H_{s1} \ge \delta$ do
6. Delete from $F_1$ the feature with the smallest association with the class attribute, forming a new feature subset $F_2$; compute the feature entropy evaluation value $H_{s2}$
7. If $H_{s2} \ge H_{s1}$ then $F_1 = F_2$
8. else $F_1$ is unchanged
9. End if
10. $H_{s1} = H_{s2}$
11. End while
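The pseudo code above can be turned into a short runnable sketch. This is one interpretation, not the patent's reference implementation: each feature is tried for deletion once, in ascending order of its association with the class attribute, and $H_{s1}$ is updated only when a deletion is accepted (line 10 of the pseudo code updates it unconditionally), which keeps the loop from retrying the same rejected deletion. The names `backward_select`, `relevance`, and `evaluate` are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def backward_select(
    features: Sequence[Hashable],
    relevance: Dict[Hashable, float],
    evaluate: Callable[[List[Hashable]], float],
    delta: float,
) -> List[Hashable]:
    """Heuristic sequential backward search over a feature subset.

    features:  identifiers of the features in F1 (e.g. PCA dimensions)
    relevance: association U(W_j, S) of each feature with the class attribute
    evaluate:  subset evaluation function H_s (higher is better)
    delta:     threshold below which the search stops
    """
    # Step 4: rank by class association, strongest first.
    current = sorted(features, key=lambda f: relevance[f], reverse=True)
    h_s1 = evaluate(current)

    # Steps 5-11: try deleting the weakest remaining candidates one by one.
    for candidate in sorted(features, key=lambda f: relevance[f]):
        if h_s1 < delta or len(current) <= 1:
            break
        trial = [f for f in current if f != candidate]
        h_s2 = evaluate(trial)
        if h_s2 >= h_s1:               # deletion did not hurt: accept it
            current, h_s1 = trial, h_s2
    return current
```

For example, with `relevance = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05}` and a toy evaluation `sum(relevance) / sqrt(k)`, the search drops `d` and `c` and keeps `a` and `b`, because removing either of the latter lowers the evaluation value.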
The inventors used the operation and maintenance management and control platform fault feature extraction method to compare and analyze the fault recognition performance of the management and control platform after feature extraction. First, feature extraction is performed by PCA, with the cumulative contribution rate of the principal components set to 94%, because at the threshold t = 94% the feature dimension is reduced to 18 and the average fault identification accuracy reaches 98% or more, as shown in Fig. 3. Note that the threshold t determines the cumulative contribution rate of the PCA principal components: although the cumulative contribution rate is largest at a threshold of 100%, and the recognition accuracy is then high, the feature dimension also rises sharply. A higher threshold t is therefore not always better; classifier performance is optimized only when dimensionality and classification accuracy are balanced. After PCA feature extraction, secondary selection is performed on the 18-dimensional features, and the result shows that the features of dimensions 1, 2, 5, 6, 7 and 12 have the least redundancy among themselves and the strongest association with the class attribute. After screening, they form the key feature subset after feature extraction. Table 1 shows the cross-validation result of the secondary feature selection, and the comparison for the 6-dimensional key feature subset obtained by secondary feature selection is shown in Fig. 4, where the average classification accuracy after secondary selection is 96.9%, which differs by less than 1.1% from the classification accuracy obtained with PCA feature extraction alone.
Because the feature dimension is reduced to 6, it is 65 percent lower than the feature dimension obtained with PCA dimension reduction alone, and the execution time of the classifier model is reduced by 31.3% on average. In the fault classification process of the management and control platform, feature extraction and selection can be performed adaptively according to specific requirements. When only fault classification is needed, the requirement on classification precision is high, and key features do not need to be analyzed, the secondary feature selection module can be skipped and the management and control faults classified directly. When the key features need to be locked and the requirement on feature dimension is strict, the secondary feature selection module can be entered adaptively to further lock the key features, while the fault classification result of the management and control platform is obtained on the test set. This demonstrates the feasibility and effectiveness of the operation and maintenance management and control platform fault feature extraction method provided by the invention.
TABLE 1 PCA-based secondary fault feature selection (ten-fold cross-validation)
Feature dimension after PCA extraction    Cross-validation: folds selected (%)
1                                         9 (90%)
2                                         10 (100%)
3                                         5 (50%)
4                                         4 (40%)
5                                         10 (100%)
6                                         10 (100%)
7                                         9 (90%)
8                                         7 (70%)
9                                         1 (10%)
10                                        0 (0%)
11                                        1 (10%)
12                                        10 (100%)
13                                        0 (0%)
14                                        0 (0%)
15                                        0 (0%)
16                                        0 (0%)
17                                        0 (0%)
18                                        0 (0%)
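The link between Table 1 and the six key dimensions named in the text can be checked with a few lines of Python. The cut-off of 9 folds out of 10 is an assumption chosen for illustration; the patent does not state the selection rule, but this cut-off reproduces the dimensions 1, 2, 5, 6, 7 and 12 reported above.

```python
# Table 1: PCA dimension -> number of folds (out of 10) in which it was selected.
folds_selected = {
    1: 9, 2: 10, 3: 5, 4: 4, 5: 10, 6: 10, 7: 9, 8: 7, 9: 1,
    10: 0, 11: 1, 12: 10, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0,
}


def key_features(counts, min_folds=9):
    """Dimensions selected in at least min_folds folds (hypothetical rule)."""
    return sorted(dim for dim, n in counts.items() if n >= min_folds)


print(key_features(folds_selected))  # [1, 2, 5, 6, 7, 12]
```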
In conclusion, the method for extracting the fault features of the operation and maintenance management and control platform reduces the feature dimensions of each fault sample space, shortens the training time and improves the classification precision of the learning classifier. Because the PCA feature extraction is carried out firstly, the feature dimension of the management and control fault classification is greatly reduced, and the calculation complexity is reduced. Meanwhile, the method carries out self-adaptive secondary feature selection after feature extraction, overcomes the problem that a single feature extraction method cannot lock key features, reduces redundancy among the features, enhances the relevance between the features and class attributes, and greatly improves the precision of fault classification.
The software routines of the present invention are programmed according to automated and computer processing techniques, which are well known to those skilled in the art.
The embodiments of the present invention are not exhaustive; simple reproductions and modifications made by those skilled in the art without inventive effort still fall within the scope of protection claimed by the present invention.

Claims (1)

1. A fault feature extraction method for an operation and maintenance management and control platform is characterized by comprising the following contents:
1) principal component analysis feature extraction
Principal Component Analysis (PCA) performs a sample space transformation: by projection it determines the direction along which the variance of all original feature vectors is largest, and locates the discriminant vectors along that projection direction to extract features, so that after the projection transformation the original samples become low-dimensional samples that are dispersed as much as possible while maintaining the differences of the original samples. Before transformation, the original high-dimensional sample space is set to contain N samples, $X \in \mathbb{R}^n$; each sample is $X_i = [x_{i1}, \ldots, x_{in}]^T \in \mathbb{R}^n$; if the mean vector is M, the corresponding feature vector is $X_i = [x_{1i}, \ldots, x_{ni}] \in \mathbb{R}^n$, and the corresponding covariance matrix is formula (1),
Figure FDA0002358650850000011
the variance of the sample distribution along an eigenvector is the corresponding eigenvalue of the covariance matrix of formula (1), and the orthogonal matrix obtained by diagonalizing the covariance matrix of formula (1) is formula (2),
Figure FDA0002358650850000012
Q is denoted as
Figure FDA0002358650850000013
where M is the dimension of the orthogonal matrix Q; based on Q, PCA derives the eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$ of the matrix and calculates the corresponding orthonormal eigenvectors $v_1, v_2, \ldots, v_n$, ordered to match the eigenvalues; from the eigenvalues of the orthogonal matrix Q and the corresponding orthonormal eigenvectors, the orthonormal eigenvectors $u_1, u_2, \ldots, u_d$ of the covariance matrix S are obtained as in formula (3), where the orthonormal eigenvectors $u_1, u_2, \ldots, u_d$ correspond to the d largest non-zero eigenvalues of S,
Figure FDA0002358650850000014
setting t = 95%, the cumulative contribution of the principal components of the spatial samples on the first d axes $u_i$ is 95% of the original data; thus, for any sample $x_i$ mapped to the reduced low-dimensional sample space $U = (u_1, u_2, \ldots, u_d)$, the feature of $x_i$ is $y_i = (u_1, u_2, \ldots, u_d)^T x_i$, and $y_i$ is the sample point in the low-dimensional space; through the spatial sample transformation of PCA, the transformed samples represent a 95% cumulative contribution rate of the principal components, and the original spatial dimension is reduced from n to d, where d is smaller than n, so that the spatial dimension is greatly reduced and the function of feature extraction is achieved;
2) secondary feature selection
After PCA (principal component analysis) feature extraction, a secondary feature selection algorithm is embedded to further obtain the optimal feature subset and the key features of the PCA low-dimensional space, wherein the algorithm is based on filter-type correlation-based feature selection (CFS), adopts a heuristic sequential backward search strategy when evaluating the correlation of the sample features, and determines the optimal feature subset through the correlation ranking of the features,
CFS is a filter-type feature selection algorithm that uses feature correlation as its evaluation criterion; under the corresponding search strategy it aims to reduce the redundancy between attributes and to increase the correlation between the feature attributes and the class attribute, thereby screening out highly redundant attributes and attributes irrelevant to the class; formula (4) is used as the evaluation criterion, and the evaluation of the k features of the feature subset S is denoted $M_s$, in which the mean correlation between the feature attributes and the class is
Figure FDA0002358650850000021
and the mean correlation between attributes is
Figure FDA0002358650850000022
As shown in formula (4), the candidate feature subset determined by the correlation-based feature selection algorithm gives the feature attributes maximum relevance and minimum redundancy, that is, it maximizes the correlation between the attribute features and the class attribute and reduces the redundancy between attributes; in other words, the evaluation value $M_s$ in formula (4) is higher when the mean correlation between the feature attributes and the class
Figure FDA0002358650850000023
is larger and the mean correlation between attributes
Figure FDA0002358650850000024
is smaller,
Figure FDA0002358650850000025
in correlation-based feature selection, the correlation between attributes is evaluated with an information gain algorithm, and information gain is a symmetric measure; therefore, when two highly correlated features, such as $W_i$ and $W_j$, exist in the feature subset S, the symmetric uncertainty method of formula (5) can be used, where the entropy of a feature is H(W) and the feature association is U; formula (6) is thus an evaluation function of a feature subset based on the correlation between attributes: when the evaluation value $H_s$ rises, the correlation between the features $W_j$ and $W_i$ in the feature subset S decreases and their correlation with the class attribute increases,
Figure FDA0002358650850000026
Figure FDA0002358650850000027
and a secondary feature selection function is embedded in the PCA by adopting the CFS algorithm; the evaluation result of the CFS is then calculated under the heuristic sequential backward search strategy, and the optimal feature subset is screened out after sorting.
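The secondary selection step of the claim can be illustrated with the following Python sketch. It rests on several assumptions: the formula images are not reproduced in this text, so the code uses the commonly published CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), and the commonly published symmetric uncertainty, 2·(H(A) + H(B) − H(A, B)) / (H(A) + H(B)), as stand-ins for formulas (4) and (5); the search is a plain greedy backward elimination that drops a feature whenever the merit does not decrease, which may differ from the heuristic used in the patent; features are assumed to be discrete already (continuous PCA outputs would first need discretization); all function names and the toy data are hypothetical.

```python
from itertools import combinations

import numpy as np


def entropy(*cols):
    """Shannon entropy (in bits) of the joint distribution of discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def symmetric_uncertainty(a, b):
    """Information gain between a and b, normalized to the range [0, 1]."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 0.0
    return 2.0 * (ha + hb - entropy(a, b)) / (ha + hb)


def cfs_merit(X, y, subset):
    """CFS merit of a feature subset (standard published form, assumed here)."""
    k = len(subset)
    r_cf = np.mean([symmetric_uncertainty(X[:, j], y) for j in subset])
    pairs = list(combinations(subset, 2))
    r_ff = np.mean([symmetric_uncertainty(X[:, i], X[:, j]) for i, j in pairs]) if pairs else 0.0
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def backward_search(X, y):
    """Greedy backward elimination: remove one feature at a time while
    the merit does not decrease (ties favor the smaller subset)."""
    subset = list(range(X.shape[1]))
    best = cfs_merit(X, y, subset)
    while len(subset) > 1:
        score, drop = max(
            (cfs_merit(X, y, [f for f in subset if f != j]), j) for j in subset
        )
        if score < best:
            break
        best = score
        subset.remove(drop)
    return subset, best


# Toy data: feature 0 determines the class, feature 1 duplicates feature 0
# (redundant), feature 2 is independent of the class (irrelevant).
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([y, y, np.array([0, 1, 0, 1, 0, 1, 0, 1])])
subset, merit = backward_search(X, y)
print(subset, merit)
```

On the toy data the irrelevant feature is removed first because it lowers the mean feature-class correlation, and one of the two duplicated features is removed next because keeping both adds redundancy without raising the merit.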
CN202010015277.7A 2020-01-07 2020-01-07 Operation and maintenance management and control platform fault feature extraction method Pending CN111242204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010015277.7A CN111242204A (en) 2020-01-07 2020-01-07 Operation and maintenance management and control platform fault feature extraction method

Publications (1)

Publication Number Publication Date
CN111242204A true CN111242204A (en) 2020-06-05

Family

ID=70864621

Country Status (1)

Country Link
CN (1) CN111242204A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608004A (en) * 2015-12-17 2016-05-25 云南大学 CS-ANN-based software failure prediction method
CN105703954A (en) * 2016-03-17 2016-06-22 福州大学 Network data flow prediction method based on ARIMA model
CN108319987A (en) * 2018-02-20 2018-07-24 东北电力大学 A kind of filtering based on support vector machines-packaged type combined flow feature selection approach

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
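曹杰 (Cao Jie): "基于SVM的网络流量特征降维与分类方法研究" [Research on SVM-based network traffic feature dimensionality reduction and classification methods], 《中国博士学位论文全文数据库 信息科技辑》 [China Doctoral Dissertations Full-text Database, Information Science and Technology], no. 3, pages 139 - 1 *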

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085619A (en) * 2020-08-10 2020-12-15 国网上海市电力公司 Feature selection method for power distribution network data optimization
CN112633383A (en) * 2020-12-25 2021-04-09 百度在线网络技术(北京)有限公司 Antique identification method and device, electronic equipment and readable medium
CN112633383B (en) * 2020-12-25 2023-08-18 百度在线网络技术(北京)有限公司 Ancient game authentication method and device, electronic equipment and readable medium
CN113128002A (en) * 2021-03-23 2021-07-16 常州匠心独具智能家居股份有限公司 High-dimensional time series modeling method and system for large-scale distributed system
CN118247782A (en) * 2024-01-16 2024-06-25 无锡商业职业技术学院 Intelligent refrigerator control method based on image recognition and intelligent refrigerator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination