CN112085619A - Feature selection method for power distribution network data optimization - Google Patents

Feature selection method for power distribution network data optimization

Info

Publication number
CN112085619A
Authority
CN
China
Prior art keywords
feature
features
fault
distribution network
power distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010797427.4A
Other languages
Chinese (zh)
Inventor
李帆
周蓝波
余捷
侯仲华
贝翔飚
顾珏
宗卫国
徐姗姗
夏子朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Shanghai Electric Power Co Ltd
Original Assignee
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Shanghai Electric Power Co Ltd
Priority to CN202010797427.4A
Publication of CN112085619A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

A feature selection method for power distribution network data optimization, belonging to the field of electric power data analysis and processing. Based on the relevant data sources, certain influencing factors of power distribution network faults are quantified so that they become fault feature variables. The data matrix is preprocessed and discretized according to the average value of each fault expression. The number of output features n is supplied externally by the user, and a category data matrix is then input. Target value A is computed from the correlation of each fault by mutual information maximization, after which the remaining output features are updated iteratively. The model function of target value B is set to the ratio of the feature correlation index to the redundancy index, and this ratio is maximized. The highest-scoring feature vectors are ranked in successive iterations until the size of the screened feature set reaches a predetermined limit, at which point the optimal feature subset is output; otherwise the steps are repeated. The technical scheme effectively reduces the complexity of the feature selection method and thereby improves the accuracy of power distribution network fault data classification.

Description

Feature selection method for power distribution network data optimization
Technical Field
The invention relates to a feature selection method for power distribution network fault data optimization, and in particular to a method for selecting and processing big data using multi-label features with maximum correlation and minimum redundancy. It belongs to the field of electric power data analysis and processing.
Background
In recent years, data mining technology has been developed and widely used in various industries. The data mining technology can intelligently process large-scale data, discover the implicit rules of historical data and predict unknown events. Therefore, the data mining technology can be used for mining the complex relation between the fault and the influence factors of the fault, and further building a model to predict the fault of the power distribution network.
At present, data mining technology is also being adopted in the automation upgrade of power distribution networks. In many practical applications, a data set stored in a power distribution network database has thousands or even tens of thousands of features, but not all of them help reveal the important information hidden in the data.
In a classification problem, the information that determines the class of a sample is contained in its feature vector; the completeness of the sample information and the degree of correlation and redundancy between the features and the class directly determine the classification capability of the learning algorithm. A large number of irrelevant and redundant features not only reduces the classification capability of the learning algorithm but also adds unnecessary computation.
Feature selection selects the features most strongly correlated with the class and removes redundant and invalid features. As the first step of data processing, feature selection reduces the data scale for large data sets and lowers the difficulty of learning the target model; for high-dimensional data, it reduces dimensionality to overcome the curse of dimensionality and helps prevent overfitting. In particular, when learning from high-dimensional data, the difficulty and cost of analysis grow exponentially with the dimension of the data: a complex model must be learned to improve expressive power, and an exponentially growing amount of data is needed to support learning such a model. If the amount of data is too small, the model overfits and generalizes poorly.
Feature selection is one of the most effective means of reducing data dimensionality and improving the generalization capability of learning algorithms, and it is an indispensable part of data preprocessing in pattern recognition. Eliminating features that are irrelevant to the categories mitigates the sensitivity of most learning algorithms to irrelevant and redundant features, so that the algorithms focus on useful features and deep data mining extracts useful information more effectively.
How to reduce the dimensionality of large-scale distribution network data to obtain effective, simplified data is becoming increasingly urgent. Feature selection is a key data analysis method and preprocessing step: before knowledge is mined from the data, an optimal feature subset is selected from the original feature set. This eliminates the interference of data noise, removes redundant and irrelevant features, greatly reduces the complexity of subsequent data processing, shortens the running time, and improves the accuracy and effectiveness of data analysis.
However, finding the optimal feature subset in the huge subset space of the original feature set to represent the data is extremely difficult. Feature extraction generates a small set of new features by merging or transforming the original features, whereas feature selection reduces the spatial dimension by selecting the most significant features. Feature selection methods fall into four categories: filter, wrapper, embedded, and hybrid approaches. The filter method performs a statistical analysis of the feature space to select a discriminative subset of features. A feature selection method should identify and remove as many irrelevant and redundant features as possible; most feature selection methods can effectively remove irrelevant features but cannot handle redundant features.
Because too many input variables reduce the average correct prediction rate of a prediction model, and because the inputs may contain redundant and only weakly correlated feature variables, establishing a method for selecting and processing big data using multi-label features with maximum correlation and minimum redundancy is a technical problem that urgently needs to be solved in practice.
Disclosure of Invention
The invention aims to provide a feature selection method for power distribution network data optimization. The method adopts an improved maximum correlation minimum redundancy feature selection algorithm: it performs correlation analysis on the original feature set, removes irrelevant features, retains strongly correlated features, and measures the classification error rate of the selected features with a classifier.
The technical scheme of the invention is as follows: a feature selection method for power distribution network data optimization, characterized by comprising the following steps:
quantifying, according to the relevant data sources, certain influencing factors of power distribution network faults so that they become fault feature variables;
preprocessing the data matrix and discretizing it according to the average value of each fault expression;
receiving the number of output features n, which is supplied externally by the user, and then inputting the category data matrix;
for the target value A, calculating the correlation of each fault by mutual information maximization, and then iteratively updating the remaining output features;
setting the model function of the target value B to the ratio of the feature correlation index to the redundancy index, and maximizing this ratio;
ranking the highest-scoring feature vectors in successive iterations until the size of the screened feature set reaches the predetermined limit, then outputting the optimal feature subset; otherwise, repeating the above steps.
In the feature selection method, correlation analysis is performed on the original feature set to remove irrelevant features and retain strongly correlated features, and a classifier is used to measure the classification error rate of the selected features. An optimal feature subset with low redundancy among the features and high correlation between the features and the predicted variable can thus be selected, and the introduced weighted correlation coefficient calculation method can measure the degree of correlation among all types of variables.
Further, the category data matrix is C = {1, 2, 3, 4, 5, ..., c}; for the target value A, the correlation of each fault is calculated by mutual information maximization, and the fault feature with the highest correlation score is extracted and added to the final solution set;
the fault correlation is calculated as follows:
Figure BDA0002626171780000031
where D is the mutual information between the features and the categories, c is the category of the data set, and |S| is the number of features in the feature set.
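The formula image is not reproduced in this text. Assuming it follows the usual maximum-correlation criterion of maximum correlation minimum redundancy selection, which is consistent with the symbols D, c and |S| defined above, it would take the form below. The notation I(x_i; c) for the mutual information between feature x_i and category c, and S for the feature set, is an assumption introduced here.

```latex
\max D(S, c), \qquad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c)
```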
Further, after the correlation of each fault has been calculated by mutual information maximization, the remaining output features are iterated over in a loop, in which the redundancy value between the output features and the remaining features is calculated as the average minimum redundancy value;
the feature selection method requires the correlation among the feature attributes to be minimal, namely the minimum redundancy principle, which is expressed by minimizing the mutual information between features as follows:
Figure BDA0002626171780000032
where R is the mutual information between the features;
if the output feature subset contains a plurality of features, the average value over the output feature subset is taken as the redundancy score, calculated as follows:
Figure BDA0002626171780000033
where P is the set of output features, x_l is an output feature vector, and x_i is the i-th feature vector.
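The two formula images above are likewise not reproduced. Under the assumption that they follow the usual minimum-redundancy formulation, and using only the symbols R, P, x_l and x_i defined in the text, they would plausibly read as follows. The notation I(·;·) for mutual information and S for the feature set is an assumption.

```latex
\min R(S), \qquad R = \frac{1}{|S|^{2}} \sum_{x_i, x_j \in S} I(x_i; x_j)

R(x_i) = \frac{1}{|P|} \sum_{x_l \in P} I(x_i; x_l)
```

The second expression is the redundancy score of a single candidate feature x_i, averaged over the features already placed in the output set P.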
Furthermore, the model function of the target value B is set to the ratio of the feature correlation index to the redundancy index and is maximized;
after the two target values of each feature have been calculated, the non-dominant features are determined;
a reference feature is called a non-dominant feature if one of the following conditions is met:
(1) the target value A of the reference feature is greater than or equal to the target values A of all other features, and the target value B of the reference feature is greater than or equal to the target values B of all other features;
(2) the target value A of the reference feature is greater than the target values A of all other features while its target value B is less than the target values B of all other features, or vice versa.
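As a worked form of the ratio described above, and assuming that the correlation index of a candidate feature x_i is its mutual information with the category c and that the redundancy index is its average mutual information with the already selected set P, the two target values could be written as below. Writing A and B as functions of x_i, and the symbol I, are notation introduced here for illustration and are not taken from the original formulas.

```latex
A(x_i) = I(x_i; c), \qquad
B(x_i) = \frac{A(x_i)}{\dfrac{1}{|P|} \sum_{x_l \in P} I(x_i; x_l)}
```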
Further, the feature selection method adds the feature having the largest target value B among the non-dominant features to the output feature set. The remaining output features are searched for by a stepwise incremental method. The highest-scoring feature vectors are ranked in successive iterations until the size of the screened feature set reaches the predetermined limit, at which point the optimal feature subset is output; otherwise, the steps are repeated.
The feature selection method performs correlation analysis on the original feature set, removes irrelevant features, retains strongly correlated features, and measures the classification error rate of the selected features with a classifier, so that an optimal feature subset with low redundancy among the features and high correlation between the features and the predicted variable can be selected. The complexity of the feature selection method is effectively reduced, which improves the accuracy of power distribution network fault data classification.
Compared with the prior art, the invention has the advantages that:
1. The technical scheme performs correlation analysis on the original feature set, removes irrelevant features, retains strongly correlated features, and measures the classification error rate of the selected features with a classifier; through the feature subset model function, an optimal feature subset with low redundancy among the features and high correlation between the features and the predicted variable can be selected;
2. Through feature selection and optimization, the technical scheme finds the most effective features among many features and eliminates redundant and duplicate features, which effectively reduces the complexity of the feature selection method and improves the accuracy of power distribution network fault data classification;
3. The technical scheme adopts an improved maximum correlation minimum redundancy feature selection algorithm and introduces a weighted correlation coefficient calculation method, so that the degree of correlation among all types of variables can be measured; this effectively reduces the complexity of the feature selection method and improves the accuracy of power distribution network fault data classification.
Drawings
FIG. 1 is a schematic block flow diagram of the process of the present invention;
FIG. 2 is a comparison of failure prediction accuracy for the present invention versus an unoptimized method feature set.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in FIG. 1, the technical scheme of the invention comprises the following steps:
quantifying, according to the relevant data sources, certain influencing factors of power distribution network faults so that they become fault feature variables;
preprocessing the data matrix and discretizing it according to the average value of each fault expression;
receiving the number of output features n, which is supplied externally by the user, and then inputting the category data matrix;
for the target value A, calculating the correlation of each fault by mutual information maximization, and then iteratively updating the remaining output features;
setting the model function of the target value B to the ratio of the feature correlation index to the redundancy index, and maximizing this ratio;
ranking the highest-scoring feature vectors in successive iterations until the size of the screened feature set reaches the predetermined limit, then outputting the optimal feature subset; otherwise, repeating the above steps.
Thus, feature selection and optimization find the most effective features among many features and reject redundant features, duplicate features, and the like. This effectively reduces the complexity of the feature selection method and thereby improves the accuracy of power distribution network fault data classification.
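The discretization step is described only as using the average value, so the exact rule is not given. A minimal sketch, assuming each column of the data matrix is binarized against its own mean (values above the mean become 1, all others 0), might look as follows. The function name `discretize_by_mean` and the two-level coding are assumptions made for illustration.

```python
def discretize_by_mean(matrix):
    """Binarize each column of a row-major data matrix against its column mean.

    Assumption: a value strictly above the column mean maps to 1, otherwise 0.
    The source only states that discretization uses the average value.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Column means over all samples (rows).
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[1 if row[j] > means[j] else 0 for j in range(n_cols)]
            for row in matrix]


# Three samples, two features (values are purely illustrative).
samples = [[1.0, 10.0], [3.0, 20.0], [5.0, 60.0]]
discrete = discretize_by_mean(samples)
```

Discrete values are what make the mutual information terms in the later steps computable by simple counting.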
Specifically, the category data matrix in the technical scheme of the invention is C = {1, 2, 3, 4, 5, ..., c}; for the target value A, the correlation of each fault is calculated by mutual information maximization, and the fault feature with the highest correlation score is extracted and added to the final solution set.
The fault correlation is calculated as follows:
Figure BDA0002626171780000051
where D is the mutual information between the features and the categories, c is the category of the data set, and |S| is the number of features in the feature set.
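To make target value A concrete, the sketch below estimates the mutual information between one discretized feature column and the category labels by counting co-occurrences. It assumes a single label vector and natural-log units; the function names `mutual_information` and `relevance_scores` are hypothetical and not taken from the patent.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # I(X;Y) = sum over observed pairs of p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * log(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in count_xy.items())


def relevance_scores(columns, labels):
    """Target value A for every feature column: its mutual information with the labels."""
    return [mutual_information(col, labels) for col in columns]


labels = [0, 0, 1, 1]
columns = [[0, 0, 1, 1],   # perfectly informative about the labels
           [0, 1, 0, 1]]   # independent of the labels
scores = relevance_scores(columns, labels)
```

The feature with the highest score is the one that would seed the final solution set.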
After the correlation of each power distribution network fault has been calculated by mutual information maximization, the remaining output features are iterated over in a loop, and the redundancy value between the output features and the remaining features is calculated as the average minimum redundancy value.
The correlation among the feature attributes is required to be minimal, namely the minimum redundancy principle, which is expressed by minimizing the mutual information between features as follows:
Figure BDA0002626171780000052
where R is the mutual information between the features.
If the output feature subset contains a plurality of features, the average value over the output feature subset is taken as the redundancy score, calculated as follows:
Figure BDA0002626171780000053
where P is the set of output features, x_l is an output feature vector, and x_i is the i-th feature vector.
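The averaging described above can be sketched as follows, assuming the pairwise mutual information between features has already been computed and stored in a symmetric matrix. The name `redundancy_score`, the matrix `pairwise_mi`, and its values are hypothetical and purely illustrative.

```python
def redundancy_score(i, selected, pairwise_mi):
    """Average mutual information between candidate feature i and the selected set P.

    pairwise_mi[i][l] is assumed to hold the mutual information between
    features i and l. An empty selected set gives a score of 0.0.
    """
    if not selected:
        return 0.0
    return sum(pairwise_mi[i][l] for l in selected) / len(selected)


# Illustrative symmetric matrix for three features.
pairwise_mi = [[0.0, 0.8, 0.1],
               [0.8, 0.0, 0.3],
               [0.1, 0.3, 0.0]]
score = redundancy_score(2, [0, 1], pairwise_mi)  # (0.1 + 0.3) / 2
```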
The model function of the target value B is set to the ratio of the feature correlation index to the redundancy index and is maximized. After the two target values of each feature have been calculated, the non-dominant features are determined. A reference feature is called a non-dominant feature if one of the following conditions is met:
(1) The target value A of the reference feature is greater than or equal to the target values A of all other features, and the target value B of the reference feature is greater than or equal to the target values B of all other features.
(2) The target value A of the reference feature is greater than the target values A of all other features while its target value B is less than the target values B of all other features, or vice versa.
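The two conditions above are worded loosely, so the sketch below substitutes the standard Pareto definition as one possible reading: a feature is non-dominant when no other feature is at least as good on both target values and strictly better on one. This interpretation, and the names `dominates` and `non_dominated`, are assumptions rather than the patent's own wording.

```python
def dominates(p, q):
    """True if score pair p = (A, B) Pareto-dominates q (both objectives maximized)."""
    return p[0] >= q[0] and p[1] >= q[1] and (p[0] > q[0] or p[1] > q[1])


def non_dominated(scores):
    """Indices of features whose (A, B) pair is not dominated by any other feature."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s)
                       for j, t in enumerate(scores) if j != i)]


# Illustrative (A, B) pairs for three candidate features.
scores = [(0.9, 1.0), (0.5, 3.0), (0.4, 0.8)]
front = non_dominated(scores)
# The feature with the largest B among the non-dominant ones is chosen next.
chosen = max(front, key=lambda i: scores[i][1])
```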
Among the non-dominant features, the feature having the largest target value B is added to the output feature set. The remaining output features are searched for by a stepwise incremental method. The highest-scoring feature vectors are ranked in successive iterations until the size of the screened feature set reaches the predetermined limit, at which point the optimal feature subset is output; otherwise, the steps are repeated.
Evidently, the technical scheme of the invention aims to find a feature set that maximizes the mutual information between the features and the multiple labels while minimizing the mutual information among the features.
By implementing the technical scheme of the invention, the most relevant feature set with the least redundancy can be found, with the number of output features supplied by the user. The most relevant feature is first added to the empty final feature set, and one further feature is then added incrementally in each iteration; the loop terminates when the size of the final feature set equals the user-specified feature limit.
Compared with the prior art, the technical scheme of the invention overcomes the shortcoming that most feature selection methods can effectively remove irrelevant features but cannot handle redundant features, while retaining advantages such as effectively reducing the complexity of the feature selection method without degrading the generalization performance of the model, thereby improving the accuracy of power distribution network fault data classification.
In the technical scheme of the invention, the pseudocode of the maximum correlation minimum redundancy feature selection algorithm with non-dominant feature selection is as follows.
Figure BDA0002626171780000061
Figure BDA0002626171780000071
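The pseudocode images are not reproduced in this text. The following is a minimal sketch of the loop as the surrounding description suggests it, not a transcription of the patent's pseudocode. It assumes precomputed correlation scores and a pairwise mutual information matrix, uses standard Pareto dominance for the non-dominant test, and adds a small constant `eps` to avoid division by zero; `select_features`, `relevance`, `pairwise_mi` and `eps` are hypothetical names.

```python
def dominates(p, q):
    """True if score pair p = (A, B) Pareto-dominates q (both objectives maximized)."""
    return p[0] >= q[0] and p[1] >= q[1] and (p[0] > q[0] or p[1] > q[1])


def select_features(relevance, pairwise_mi, n_select, eps=1e-12):
    """Greedy selection: seed with the most relevant feature, then grow one at a time.

    relevance[i]      : target value A of feature i (mutual information with the class)
    pairwise_mi[i][l] : mutual information between features i and l
    n_select          : user-supplied number of output features
    """
    n = len(relevance)
    # Seed the empty output set with the feature of highest target value A.
    selected = [max(range(n), key=lambda i: relevance[i])]
    while len(selected) < min(n_select, n):
        remaining = [i for i in range(n) if i not in selected]
        scores = {}
        for i in remaining:
            # Redundancy: average mutual information with the selected features.
            redundancy = sum(pairwise_mi[i][l] for l in selected) / len(selected)
            # Target value B: ratio of correlation to redundancy.
            scores[i] = (relevance[i], relevance[i] / (redundancy + eps))
        # Keep only candidates that no other candidate dominates.
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i])
                            for j in remaining if j != i)]
        # Add the non-dominant feature with the largest target value B.
        selected.append(max(front, key=lambda i: scores[i][1]))
    return selected


# Illustrative inputs: features 0 and 1 are both relevant but highly redundant.
relevance = [0.9, 0.85, 0.5]
pairwise_mi = [[0.0, 0.8, 0.1],
               [0.8, 0.0, 0.1],
               [0.1, 0.1, 0.0]]
chosen = select_features(relevance, pairwise_mi, n_select=2)
```

With these illustrative numbers the second pick is feature 2 rather than feature 1, because feature 1 is almost fully redundant with the seed feature.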
By analyzing the factors that influence power distribution network faults, power distribution network data for 162 feeders were investigated and the data required for feeder fault prediction were extracted. Seventeen features, including distribution transformer capacity, feeder monthly average load, feeder monthly maximum load, fault time, N-N month fault number, monthly average air temperature, monthly average high (low) temperature, monthly thunderstorm day grading, monthly windstorm day grading, fuse average operation time, segmented cable average operation time, load switch average operation time, segmented insulated wire length, branch line average operation time, transformer average operation time, cable length, and number of feeder branch lines, were selected, classified by fault relevance, and ranked into a quality feature set. The resulting fault prediction accuracy was then compared with that obtained under an unoptimized method, as shown in FIG. 2.
As can be seen from FIG. 2, for both feature orderings (the technical scheme of the invention and the unoptimized method), the fault prediction accuracy of the power distribution network improves as the number of feeder fault features gradually increases. With the technical scheme of the invention, the fault prediction accuracy peaks when the number of fault features is 12, then decreases slightly as more features are added, and finally settles near a constant prediction accuracy, where it coincides with the prediction curve of the unoptimized method. This indicates that the full feature set contains 5 redundant features, some of which even have an adverse effect on fault diagnosis and reduce the prediction accuracy; the results in the figure illustrate the necessity of feature optimization. In addition, feeding only the preferred features into the model reduces the amount of data in the library, shortens the training and running time required by the prediction model, and improves the efficiency of fault prediction.
Most current feature selection methods can effectively remove irrelevant features but do not handle redundant features well. For example, the popular ReliefF feature selection strategy randomly samples instances and sets feature weights according to the feature correlation of their nearest neighbors. ReliefF is regarded as one of the most successful strategies in feature selection, yet it can pick out only 3 redundant features from the 17 fault feature classification vectors, clearly fewer than the technical scheme of the invention.
In summary, compared with the prior art, the method effectively reduces the complexity of feature selection; for example, the number of features can be set directly to 12 according to the peak in FIG. 2, which improves the classification accuracy of power distribution network fault data.
The invention can be widely applied to the field of electric power data analysis and processing.

Claims (7)

1. A feature selection method for power distribution network data optimization, characterized by comprising the following steps:
quantifying, according to the relevant data sources, certain influencing factors of power distribution network faults so that they become fault feature variables;
preprocessing the data matrix and discretizing it according to the average value of each fault expression;
receiving the number of output features n, which is supplied externally by the user, and then inputting the category data matrix;
for the target value A, calculating the correlation of each fault by mutual information maximization, and then iteratively updating the remaining output features;
setting the model function of the target value B to the ratio of the feature correlation index to the redundancy index, and maximizing this ratio;
ranking the highest-scoring feature vectors in successive iterations until the size of the screened feature set reaches the predetermined limit, then outputting the optimal feature subset; otherwise, repeating the above steps.
2. The feature selection method for power distribution network data optimization according to claim 1, characterized in that correlation analysis is performed on the original feature set to remove irrelevant features and retain strongly correlated features, and a classifier is used to measure the classification error rate of the selected features, so that an optimal feature subset with low redundancy among the features and high correlation between the features and the predicted variable can be selected, and the introduced weighted correlation coefficient calculation method can measure the degree of correlation among all types of variables.
3. The feature selection method for power distribution network data optimization according to claim 1, wherein the category data matrix is C = {1, 2, 3, 4, 5, ..., c}; for the target value A, the correlation of each fault is calculated by mutual information maximization, and the fault feature with the highest correlation score is extracted and added to the final solution set;
the fault correlation is calculated as follows:
Figure FDA0002626171770000011
where D is the mutual information value between the features and the category, c is the category of the data set, and |S| is the number of features in the feature set.
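As a hedged illustration of target value A, the sketch below estimates mutual information between discrete columns by counting and averages it over a feature set. It assumes discrete (already discretized) features, base-2 logarithms, and the standard maximum-relevance form in which D is the mean mutual information between each feature and the class; the patent's own formula is available only as an image, so this correspondence is an assumption. The names `mutual_information` and `relevance` and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
    return sum((nxy / n) * log2(nxy * n / (px[x] * py[y]))
               for (x, y), nxy in pxy.items())

def relevance(features, labels):
    """Mean mutual information between each feature column and the class labels."""
    return sum(mutual_information(col, labels) for col in features) / len(features)

# Hypothetical toy data: f1 reproduces the labels, f2 is independent of them.
labels = [0, 0, 1, 1]
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
score = relevance([f1, f2], labels)
```

A feature that determines the class completely reaches the entropy of the class, while an independent feature scores zero, so ranking by this value puts the most informative fault feature first.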
4. The feature selection method for power distribution network data optimization according to claim 1, wherein, after the correlation of each fault feature is calculated by mutual information maximization, the remaining output features are determined by loop iteration, in which the redundancy between the output features and the remaining features is calculated as an average minimum redundancy value;
the feature selection method requires the correlation among the feature attributes to be minimal, namely the minimum redundancy principle, which is expressed by minimizing the mutual information between features as follows:
Figure FDA0002626171770000012
where R is the mutual information value between the features;
if the output feature subset contains a plurality of features, the average value over the subset is taken as the redundancy score, calculated as follows:
Figure FDA0002626171770000021
where P is the output feature set, x_l is an output feature vector, and x_i is the i-th feature vector.
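The formula images referenced in claims 3 and 4 are not reproduced in this text. For orientation only, the standard maximum relevance minimum redundancy definitions that match the symbol descriptions above are sketched below. Whether they agree term by term with the patent's figures is an assumption. Here I(·;·) denotes mutual information, S is the feature set, c is the category, and the third line is one plausible reading of the averaged redundancy score of the i-th feature against the output set P.

```latex
\begin{align}
\max D(S, c), \qquad D &= \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c) \\
\min R(S), \qquad R &= \frac{1}{|S|^{2}} \sum_{x_i, x_j \in S} I(x_i; x_j) \\
\mathrm{Red}(x_i) &= \frac{1}{|P|} \sum_{x_l \in P} I(x_i; x_l)
\end{align}
```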
5. The feature selection method for power distribution network data optimization according to claim 1, wherein the model function of target value B is set to the ratio of the feature correlation index to the redundancy index and is maximized;
after the two target values of each feature are calculated, the non-dominated features are determined;
a reference feature is called a non-dominated feature if either of the following conditions is met:
(1) target value A of the reference feature is greater than or equal to target value A of all the other features, and target value B of the reference feature is greater than or equal to target value B of all the other features;
(2) target value A of the reference feature is greater than target value A of all the other features while target value B of the reference feature is less than target value B of all the other features, or vice versa.
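The determination of non-dominated features can be sketched as follows. The sketch interprets the two conditions as ordinary Pareto non-dominance over the pair (target value A, target value B), which is an assumption about the intended meaning rather than a restatement of the claim. The function name `non_dominated` and the score pairs are hypothetical.

```python
def non_dominated(scores):
    """Return the indices of features that no other feature dominates.

    scores is a list of (A, B) pairs. Feature j dominates feature i when j is
    at least as good on both targets and strictly better on at least one.
    """
    result = []
    for i, (a_i, b_i) in enumerate(scores):
        dominated = any(
            a_j >= a_i and b_j >= b_i and (a_j > a_i or b_j > b_i)
            for j, (a_j, b_j) in enumerate(scores) if j != i
        )
        if not dominated:
            result.append(i)
    return result

# Hypothetical (A, B) pairs for four candidate features.
scores = [(0.9, 0.2), (0.5, 0.8), (0.4, 0.7), (0.9, 0.1)]
front = non_dominated(scores)
```

Under this reading, a feature stays in the candidate pool whenever improving on one target would require giving up something on the other.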
6. The feature selection method for power distribution network data optimization according to claim 1, wherein the feature having the maximum target value B among the non-dominated features is added to the output feature set; the remaining output features are searched for by a stepwise incremental method; and in each iteration the feature vector with the highest score is selected until the size of the selected feature set reaches the predetermined limit, whereupon the optimal feature subset is output; otherwise, the above steps are repeated.
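The stepwise incremental search can be sketched as a greedy loop. This is a minimal illustration under several assumptions: features are discrete columns, the first feature is the one with the largest target value A, target value B is relevance divided by average redundancy with a small `eps` guard against division by zero, and ties on B are broken by A (which yields the same result as picking the maximum B among Pareto non-dominated candidates). All names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((nxy / n) * log2(nxy * n / (px[x] * py[y]))
               for (x, y), nxy in pxy.items())

def select_features(features, labels, n, eps=1e-12):
    """Greedily pick up to n feature indices from a list of discrete columns."""
    relevance = [mutual_information(col, labels) for col in features]   # target value A
    selected = [max(range(len(features)), key=lambda i: relevance[i])]  # seed: largest A
    while len(selected) < min(n, len(features)):
        best, best_key = None, None
        for i in range(len(features)):
            if i in selected:
                continue
            # average mutual information with the features already selected
            redundancy = sum(mutual_information(features[i], features[j])
                             for j in selected) / len(selected)
            ratio = relevance[i] / (redundancy + eps)                    # target value B
            key = (ratio, relevance[i])                                  # max B, ties by A
            if best_key is None or key > best_key:
                best, best_key = i, key
        selected.append(best)
    return selected

# Hypothetical toy data: four fault categories encoded by two independent binary flags.
flag_a = [0, 0, 0, 0, 1, 1, 1, 1]
flag_a_copy = list(flag_a)            # fully redundant with flag_a
flag_b = [0, 0, 1, 1, 0, 0, 1, 1]     # independent of flag_a
labels = [0, 0, 1, 1, 2, 2, 3, 3]
chosen = select_features([flag_a, flag_a_copy, flag_b], labels, n=2)
```

In this toy case the loop keeps `flag_a` and then prefers `flag_b` over the duplicate column, because the duplicate adds relevance that is already fully covered by the first selection.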
7. The feature selection method for power distribution network data optimization according to claim 1, characterized in that relevance analysis is performed on the original feature set to remove irrelevant features and retain strongly relevant features, and a classifier is used to measure the classification error rate of the selected features, so as to select an optimal feature subset with low redundancy among the features and high relevance between the features and the predicted variable; the complexity of feature selection is thereby effectively reduced, and the accuracy of power distribution network fault data classification is improved.
CN202010797427.4A 2020-08-10 2020-08-10 Feature selection method for power distribution network data optimization Pending CN112085619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010797427.4A CN112085619A (en) 2020-08-10 2020-08-10 Feature selection method for power distribution network data optimization

Publications (1)

Publication Number Publication Date
CN112085619A (en) 2020-12-15

Family

ID=73735427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010797427.4A Pending CN112085619A (en) 2020-08-10 2020-08-10 Feature selection method for power distribution network data optimization

Country Status (1)

Country Link
CN (1) CN112085619A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318515A (en) * 2014-10-20 2015-01-28 西安电子科技大学 Hyper-spectral image wave band dimension descending method based on NNIA evolutionary algorithm
CN106778832A (en) * 2016-11-28 2017-05-31 华南理工大学 The semi-supervised Ensemble classifier method of high dimensional data based on multiple-objection optimization
CN111242204A (en) * 2020-01-07 2020-06-05 东北电力大学 Operation and maintenance management and control platform fault feature extraction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
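Monalisa Mandal et al.: "An Improved Minimum Redundancy Maximum Relevance Approach for Feature Selection in Gene Expression Data", Procedia Technology, pages 20 - 27 *
Gu Chao et al.: "Feature selection for transformer fault diagnosis based on the maximum relevance minimum redundancy criterion" (基于最大相关最小冗余准则的变压器故障诊断特征选择), Advanced Technology of Electrical Engineering and Energy (电工电能新技术), pages 84 - 88 *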

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076986A (en) * 2021-03-29 2021-07-06 西安交通大学 Photovoltaic fault arc characteristic selection method combining filtering type and packaging type evaluation strategies
CN115144807A (en) * 2022-09-05 2022-10-04 武汉格蓝若智能技术有限公司 Differential noise filtering and current-carrying grading current transformer online evaluation method and device
CN115144807B (en) * 2022-09-05 2022-12-02 武汉格蓝若智能技术有限公司 Differential noise filtering and current-carrying grading current transformer online evaluation method and device
CN116881687A (en) * 2023-06-25 2023-10-13 国网冀北电力有限公司信息通信分公司 Power grid sensitive data identification method and device based on feature extraction
CN116881687B (en) * 2023-06-25 2024-04-05 国网冀北电力有限公司信息通信分公司 Power grid sensitive data identification method and device based on feature extraction
CN118010103A (en) * 2024-04-10 2024-05-10 天津市博川岩土工程有限公司 Intelligent monitoring method and system for equal-thickness cement soil stirring wall in severe cold environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination