CN118152773A - Wind power data dimension reduction method based on paired tag supplementary feature selection - Google Patents
Wind power data dimension reduction method based on paired tag supplementary feature selection
- Publication number
- CN118152773A
- Authority
- CN
- China
- Prior art keywords
- feature
- candidate
- features
- tag
- candidate feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a wind power data dimension reduction method based on paired tag supplementary feature selection. The method first obtains the wind power data features and calculates the total amount of classification information that each candidate feature provides for the tag set. It then calculates the weight with which the candidate feature provides classification information for each pair of tags, the total amount of classification information that the candidate feature provides for all paired tags, and the redundant information between the candidate feature and all selected features. Finally, a feature importance evaluation criterion is proposed and an importance score is calculated for each candidate feature according to that criterion; the index of the candidate feature with the largest importance score is added to the selected feature set, and that candidate feature is deleted from the wind power data feature set. These steps are repeated until the number of features in the selected feature set reaches a preset value, at which point the features in the selected feature set are the selected features. By using these weights to re-evaluate the total amount of classification information that each candidate feature provides for the paired tags, the method quantifies the relationship between candidate features and tags more accurately and improves the accuracy of feature selection.
Description
Technical Field
The invention belongs to the technical field of wind power data processing, and particularly relates to a wind power data dimension reduction method based on paired tag supplementary feature selection.
Background
Feature selection plays a critical role in processing wind power data. In the face of huge data sets, selecting appropriate features can improve the efficiency and accuracy of the prediction model. In wind power data analysis, multi-label feature selection may involve various factors such as wind speed, wind direction, temperature, humidity, mechanical vibration, fan state, power grid data and the like, and by analyzing historical data, features with significant influence on wind power generation can be determined, so that accuracy of a prediction model is improved, and operation and power generation efficiency of a wind power plant are optimized. Accurate feature selection is helpful for establishing a more reliable prediction model, and provides a more reliable decision basis for development and management of the wind power industry.
Common evaluation criteria for multi-label feature selection are based on distance measures, fuzzy set theory, information theory and the like. Criteria based on information theory have the following advantages. First, they are objective and quantifiable: the contribution of a feature to the target variable is estimated mathematically, which makes the feature selection process more rigorous and reliable. Second, they take the correlation among features into account and avoid selecting features that carry redundant information, which improves the generalization ability of the model. Third, they can improve the predictive performance and interpretability of the model while reducing computational complexity, which supports the interpretation of model results. Feature selection based on information theory therefore plays an important role in data analysis and machine learning and provides an effective path for building and optimizing models.
When measuring feature relevance, existing multi-label feature selection methods either ignore the amount of information that a candidate feature provides for paired labels, or treat that amount as the same for every pair of labels. This leads to inaccurate feature evaluation, so the features cannot be selected accurately.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a wind power data dimension reduction method based on paired tag supplementary feature selection.
The invention solves the technical problems by adopting the following technical scheme:
The wind power data dimension reduction method based on paired tag supplementary feature selection is characterized by comprising the following steps:
Step 1: acquiring a wind power data feature set, and initializing the selected feature set;
Step 2: calculating the sum of the mutual information between the candidate feature and all the tags according to formula (1) to obtain the total amount of classification information provided by the candidate feature for the tag set;
$$\mathrm{CIS}(f_k, L) = \sum_{l_i \in L} I(f_k; l_i) \tag{1}$$
wherein $\mathrm{CIS}(f_k, L)$ represents the total amount of classification information provided by candidate feature $f_k$ for tag set $L$, and $I(f_k; l_i)$ represents the mutual information between candidate feature $f_k$ and tag $l_i$;
Step 3: calculating the weight with which the candidate feature provides classification information for the paired tags according to formula (3);
wherein $w$ represents the weight with which candidate feature $f_k$ provides classification information for the paired tags $l_i$ and $l_j$, $I(l_i; l_j; f_k)$ represents the ternary mutual information between candidate feature $f_k$ and the paired tags $l_i$ and $l_j$, and $H(l_i)$ and $H(l_j)$ are the information entropies of tags $l_i$ and $l_j$;
calculating the total amount of classification information provided by the candidate feature for all paired tags according to formula (4);
wherein $\mathrm{CID}(f_k, L)$ represents the total amount of classification information provided by candidate feature $f_k$ for all paired tags in tag set $L$, and $I(l_i, l_j; f_k)$ represents the joint mutual information between candidate feature $f_k$ and the paired tags $l_i$ and $l_j$;
Step 4: calculating the redundant information between the candidate feature and all selected features according to formula (5);
wherein $\mathrm{RI}(f_k, S)$ represents the redundant information between candidate feature $f_k$ and all selected features, $S$ represents the selected feature set, and $I(f_k; f_j)$ represents the mutual information between candidate feature $f_k$ and selected feature $f_j$;
Step 5: calculating an importance score for the candidate feature according to the feature importance evaluation criterion of formula (6);
wherein $J(f_k)$ represents the importance score of candidate feature $f_k$;
adding the index of the candidate feature with the largest importance score to the selected feature set, and deleting that candidate feature from the wind power data feature set; repeating the above steps until the number of features in the selected feature set reaches a preset value, whereupon the features in the selected feature set are the selected features.
Compared with the prior art, the invention has the beneficial effects that:
Existing multi-label feature selection methods are limited in how they measure feature relevance: they either ignore the amount of information a candidate feature provides for paired tags, or treat that amount as the same for every pair of tags. As a result, the importance of candidate features is evaluated inaccurately and the feature selection effect is unsatisfactory. The method uses the weight with which each candidate feature provides classification information for the paired tags to re-evaluate the total amount of classification information that the candidate feature provides for the paired tags. Because it takes the correlation between paired tags and their importance into account, it quantifies the relationship between candidate features and tags more accurately, thereby improving the feature selection effect and model performance. The method helps to identify more accurately the features that play a key role in multi-label prediction, and provides a more reliable basis for multi-label data analysis and predictive modeling.
Drawings
Fig. 1 is an overall flow chart of the present invention.
Detailed Description
The following description of specific embodiments is given by way of illustration only and not by way of limitation of the scope of the application.
The invention provides a wind power data dimension reduction method based on paired tag supplementary feature selection (hereinafter "the method"; see Fig. 1), which comprises the following steps:
Step 1: collecting, from a wind farm, a wind power data feature set comprising information such as wind speed, ambient temperature and turbine nacelle yaw angle, the feature set being denoted $F = \{f_1, f_2, \dots, f_n\}$, wherein $f_1, f_2, \dots, f_n$ represent the features and $n$ represents the number of features; the features in the wind power data feature set $F$ are used as candidate features; initializing the selected feature set $S$, namely emptying it, and setting the number of features to be selected into $S$ as $K$ ($K < n$);
Step 2: calculating the sum of the mutual information between the candidate feature and all the tags in the tag set $L$, the specific formula being:
$$\mathrm{CIS}(f_k, L) = \sum_{l_i \in L} I(f_k; l_i) \tag{1}$$
wherein $\mathrm{CIS}(f_k, L)$ represents the total amount of classification information provided by candidate feature $f_k$ for tag set $L$, and $I(f_k; l_i)$ represents the mutual information between candidate feature $f_k$ and tag $l_i$; the mutual information is calculated as:
$$I(X; Y) = \sum_{s} \sum_{q} p(x_s, y_q) \log \frac{p(x_s, y_q)}{p(x_s)\, p(y_q)} \tag{2}$$
wherein $X$ and $Y$ denote different random variables, $p(x_s, y_q)$ denotes the joint probability distribution of the random variables $X$ and $Y$, and $p(x_s)$ and $p(y_q)$ denote the marginal probability distributions of $X$ and $Y$, respectively;
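As an illustration of Step 2, the following is a minimal sketch of how the mutual information and the quantity CIS might be estimated from samples. It assumes that the wind power features and the tags have already been discretized into a finite number of values (the source does not state how continuous measurements are handled), and it uses base-2 logarithms; the function names `mutual_information` and `cis` are hypothetical and are not taken from the source.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information I(X;Y), in bits, of two discrete 1-D arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    # Map observed values to integer codes so a joint count table can be built.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)          # joint counts
    joint /= joint.sum()                   # joint probabilities p(x_s, y_q)
    px = joint.sum(axis=1, keepdims=True)  # marginal p(x_s)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y_q)
    nz = joint > 0                         # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def cis(feature, labels):
    """Sum of I(f_k; l_i) over the columns l_i of a 2-D tag matrix."""
    labels = np.asarray(labels)
    return sum(mutual_information(feature, labels[:, i])
               for i in range(labels.shape[1]))

# Toy data: the feature determines the first tag and is independent of the second.
feature = [0, 0, 1, 1]
tags = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(cis(feature, tags))
```

In this toy case only the first tag contributes to the sum, since the second tag is statistically independent of the feature.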
Step 3: calculating the total amount of classification information provided by the candidate feature for all paired tags;
Step 3.1: calculating the weight with which candidate feature $f_k$ provides classification information for each pair of tags in the tag set $L$, specifically as follows:
wherein $w$ represents the weight with which candidate feature $f_k$ provides classification information for the paired tags $l_i$ and $l_j$, and $-1 < w < 1$; $I(l_i; l_j; f_k)$ represents the ternary mutual information between candidate feature $f_k$ and the paired tags $l_i$ and $l_j$; $H(l_i)$ and $H(l_j)$ are the information entropies of tags $l_i$ and $l_j$, are used for normalization, and are non-negative;
From information theory, $I(l_i; l_j; f_k) = I(l_i, l_j; f_k) - I(f_k; l_i) - I(f_k; l_j)$, where $I(l_i, l_j; f_k)$ is the joint mutual information between candidate feature $f_k$ and the paired tags $l_i$ and $l_j$, which measures the amount of information the candidate feature provides for the paired tags; $I(f_k; l_i)$ is the mutual information between candidate feature $f_k$ and tag $l_i$, and $I(f_k; l_j)$ is the mutual information between candidate feature $f_k$ and tag $l_j$, which measure the amount of information the candidate feature provides for a single tag. If $0 < w < 1$, then $I(l_i, l_j; f_k) > I(f_k; l_i) + I(f_k; l_j)$, i.e. the candidate feature provides more information for the paired tags jointly than for the two tags taken separately, so the weight of $I(l_i, l_j; f_k)$ in the feature importance evaluation is increased; if $-1 < w < 0$, then $I(l_i, l_j; f_k) < I(f_k; l_i) + I(f_k; l_j)$, i.e. the candidate feature provides less information for the paired tags jointly than for the two tags taken separately, so the weight of $I(l_i, l_j; f_k)$ in the feature importance evaluation is reduced.
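The identity used in Step 3.1 can be checked numerically. The sketch below estimates the ternary mutual information from joint entropies of discrete samples, following the identity stated above. The weight formula itself is not reproduced in this text, so `pair_weight` is only an assumed normalization (division by $H(l_i) + H(l_j)$), chosen because it keeps the value inside $(-1, 1)$; the actual formula (3) may differ. All function names are hypothetical.

```python
import numpy as np

def entropy(*columns):
    """Empirical joint entropy, in bits, of one or more discrete 1-D arrays."""
    data = np.column_stack(columns)
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mi(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def ternary_mi(li, lj, fk):
    """I(l_i; l_j; f_k) = I(l_i, l_j; f_k) - I(f_k; l_i) - I(f_k; l_j)."""
    joint_mi = entropy(li, lj) + entropy(fk) - entropy(li, lj, fk)
    return joint_mi - mi(fk, li) - mi(fk, lj)

def pair_weight(li, lj, fk):
    """ASSUMED normalization, not the source's formula (3)."""
    denom = entropy(li) + entropy(lj)
    if denom <= 0:  # both tags constant: no information to weight
        return 0.0
    return ternary_mi(li, lj, fk) / denom

# A feature equal to the XOR of two independent tags tells nothing about
# either tag alone but determines the pair jointly, so the weight is positive.
li = [0, 0, 1, 1]
lj = [0, 1, 0, 1]
fk = [0, 1, 1, 0]
print(ternary_mi(li, lj, fk), pair_weight(li, lj, fk))
```

Conversely, when the feature and both tags are identical copies of one variable, the joint information is counted twice by the single-tag terms and the ternary mutual information becomes negative.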
Step 3.2: calculating the total amount of classification information provided by the candidate feature for all paired tags, specifically as follows:
wherein $\mathrm{CID}(f_k, L)$ represents the total amount of classification information provided by candidate feature $f_k$ for all paired tags in the tag set $L$;
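The aggregation over tag pairs in Step 3.2 might be sketched as follows. Because formula (4) is not reproduced in this text, the weighting used here, $(1 + w)\, I(l_i, l_j; f_k)$ summed over all unordered pairs, is an assumption: it is merely one form consistent with the statement that a positive $w$ increases and a negative $w$ decreases the contribution of the joint mutual information. The function name `cid` and the dictionary layout are hypothetical.

```python
from itertools import combinations

def cid(joint_mi, weights, n_tags):
    """Weighted sum of joint mutual information over all unordered tag pairs.

    joint_mi[(i, j)] holds I(l_i, l_j; f_k) and weights[(i, j)] holds w for
    the pair (i, j) with i < j. The (1 + w) factor is an ASSUMED form and is
    not taken from the source's formula (4).
    """
    total = 0.0
    for i, j in combinations(range(n_tags), 2):
        total += (1.0 + weights[(i, j)]) * joint_mi[(i, j)]
    return total

# Three tags give three pairs; values below are made up for illustration.
joint_mi = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.2}
weights = {(0, 1): 0.5, (0, 2): 0.0, (1, 2): -0.5}
print(cid(joint_mi, weights, 3))
```

With a weight of zero a pair contributes its joint mutual information unchanged, so this form reduces to an unweighted sum when no pair shows synergy or redundancy.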
Step 4: calculating the redundant information between the candidate feature and all the selected features, the specific formula being:
wherein $\mathrm{RI}(f_k, S)$ represents the redundant information between candidate feature $f_k$ and all selected features in the selected feature set $S$, and $I(f_k; f_j)$ represents the mutual information between candidate feature $f_k$ and selected feature $f_j$;
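A minimal sketch of the redundancy term of Step 4 is given below. It assumes that RI is the plain sum of $I(f_k; f_j)$ over the selected features $f_j$; formula (5) is not reproduced in this text and may include a normalization, for example by the size of $S$. The features are assumed to be discretized, and the function names are hypothetical.

```python
import numpy as np

def entropy(*columns):
    """Empirical joint entropy, in bits, of one or more discrete 1-D arrays."""
    data = np.column_stack(columns)
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def redundancy(X, k, selected):
    """ASSUMED form of RI: sum of I(f_k; f_j) over column indices j in `selected`."""
    X = np.asarray(X)
    return sum(mutual_information(X[:, k], X[:, j]) for j in selected)

# Column 1 duplicates column 0; column 2 is independent of both.
X = [[0, 0, 0],
     [0, 0, 1],
     [1, 1, 0],
     [1, 1, 1]]
print(redundancy(X, 1, [0]), redundancy(X, 2, [0, 1]))
```

A duplicated feature is penalized by its full entropy, while an independent feature incurs no penalty, which is the behaviour the redundancy term is meant to capture.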
Step 5: based on $\mathrm{CIS}(f_k, L)$, $\mathrm{CID}(f_k, L)$, $\mathrm{RI}(f_k, S)$ and the maximum-relevance minimum-redundancy strategy, a feature importance evaluation criterion is proposed, and the importance score of the candidate feature is calculated according to this criterion, specifically as follows:
wherein $J(f_k)$ represents the importance score of candidate feature $f_k$;
evaluating the importance of each candidate feature according to the feature importance evaluation criterion, adding the index of the candidate feature with the largest importance score to the selected feature set $S$, and deleting that candidate feature from the wind power data feature set $F$;
repeating the above steps until the number of features in the selected feature set $S$ reaches the preset number $K$, whereupon the features in the selected feature set $S$ are the finally selected features, completing the wind power data dimension reduction.
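The iterative selection described in Step 5 is a greedy forward search, which can be sketched independently of the exact scoring formula. In the sketch below the importance score is passed in as a callable, because formula (6) is not reproduced in this text; the toy score used in the example (relevance minus summed redundancy) only stands in for $J(f_k)$ and is not the criterion of the invention. The names `greedy_select`, `relevance` and `overlap` are hypothetical.

```python
def greedy_select(n_features, k, score):
    """Greedy forward selection of k feature indices.

    `score(idx, selected)` returns the importance of candidate `idx` given the
    list of already selected indices. At each round the best candidate is
    moved from the candidate pool to the selected set.
    """
    candidates = list(range(n_features))   # feature set F
    selected = []                          # selected feature set S, initially empty
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda idx: score(idx, selected))
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy stand-in for J(f_k): feature 1 is relevant but redundant with feature 0.
relevance = [0.9, 0.8, 0.1, 0.5]
overlap = [[0.0, 0.7, 0.0, 0.0],
           [0.7, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]

def toy_score(idx, selected):
    return relevance[idx] - sum(overlap[idx][j] for j in selected)

print(greedy_select(4, 2, toy_score))  # prints [0, 3]
```

Feature 1 has the second-highest relevance but is passed over in the second round because of its overlap with the already selected feature 0, which illustrates why the score must be re-evaluated after every selection.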
Matters not described in the invention are governed by the prior art.
Claims (1)
1. The wind power data dimension reduction method based on paired tag supplementary feature selection is characterized by comprising the following steps:
Step 1: acquiring a wind power data feature set, and initializing the selected feature set;
Step 2: calculating the sum of the mutual information between the candidate feature and all the tags according to formula (1) to obtain the total amount of classification information provided by the candidate feature for the tag set;
$$\mathrm{CIS}(f_k, L) = \sum_{l_i \in L} I(f_k; l_i) \tag{1}$$
wherein $\mathrm{CIS}(f_k, L)$ represents the total amount of classification information provided by candidate feature $f_k$ for tag set $L$, and $I(f_k; l_i)$ represents the mutual information between candidate feature $f_k$ and tag $l_i$;
Step 3: calculating the weight with which the candidate feature provides classification information for the paired tags according to formula (3);
wherein $w$ represents the weight with which candidate feature $f_k$ provides classification information for the paired tags $l_i$ and $l_j$, $I(l_i; l_j; f_k)$ represents the ternary mutual information between candidate feature $f_k$ and the paired tags $l_i$ and $l_j$, and $H(l_i)$ and $H(l_j)$ are the information entropies of tags $l_i$ and $l_j$;
calculating the total amount of classification information provided by the candidate feature for all paired tags according to formula (4);
wherein $\mathrm{CID}(f_k, L)$ represents the total amount of classification information provided by candidate feature $f_k$ for all paired tags in tag set $L$, and $I(l_i, l_j; f_k)$ represents the joint mutual information between candidate feature $f_k$ and the paired tags $l_i$ and $l_j$;
Step 4: calculating the redundant information between the candidate feature and all selected features according to formula (5);
wherein $\mathrm{RI}(f_k, S)$ represents the redundant information between candidate feature $f_k$ and all selected features, $S$ represents the selected feature set, and $I(f_k; f_j)$ represents the mutual information between candidate feature $f_k$ and selected feature $f_j$;
Step 5: calculating an importance score for the candidate feature according to the feature importance evaluation criterion of formula (6);
wherein $J(f_k)$ represents the importance score of candidate feature $f_k$;
adding the index of the candidate feature with the largest importance score to the selected feature set, and deleting that candidate feature from the wind power data feature set; repeating the above steps until the number of features in the selected feature set reaches a preset value, whereupon the features in the selected feature set are the selected features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410258429.4A CN118152773A (en) | 2024-03-07 | 2024-03-07 | Wind power data dimension reduction method based on paired tag supplementary feature selection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410258429.4A CN118152773A (en) | 2024-03-07 | 2024-03-07 | Wind power data dimension reduction method based on paired tag supplementary feature selection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118152773A (en) | 2024-06-07 |
Family
ID=91294149
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410258429.4A Pending CN118152773A (en) | 2024-03-07 | 2024-03-07 | Wind power data dimension reduction method based on paired tag supplementary feature selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118152773A (en) |
- 2024-03-07: CN application CN202410258429.4A (publication CN118152773A), status: Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||