CN118152773A - Wind power data dimension reduction method based on paired tag supplementary feature selection - Google Patents

Publication number
CN118152773A
Authority
CN
China
Prior art keywords
feature
candidate
features
tag
candidate feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410258429.4A
Other languages
Chinese (zh)
Inventor
张平
王云鹤
王光磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Technology
Original Assignee
Hebei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2024-03-07
Filing date
2024-03-07
Publication date
2024-06-07
Application filed by Hebei University of Technology
Priority to CN202410258429.4A
Publication of CN118152773A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Evolutionary Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a wind power data dimension reduction method based on paired tag supplementary feature selection. The method first obtains the wind power data features and calculates the total amount of classification information that each candidate feature provides for the tag set. It then calculates the weight with which the candidate feature provides classification information for each pair of tags, the total amount of classification information the candidate feature provides for all paired tags, and the redundant information between the candidate feature and all selected features. Finally, a feature importance evaluation criterion is proposed and the importance score of each candidate feature is calculated according to it; the index of the candidate feature with the largest importance score is added to the selected feature set, and that candidate feature is deleted from the wind power data feature set. These steps are repeated until the number of features in the selected feature set reaches a preset value, at which point the features in the selected feature set are the selected features. By using these weights to reevaluate the total amount of classification information each candidate feature provides for the paired tags, the method quantifies the relationship between candidate features and tags more accurately and improves the accuracy of feature selection.

Description

Wind power data dimension reduction method based on paired tag supplementary feature selection
Technical Field
The invention belongs to the technical field of wind power data processing, and particularly relates to a wind power data dimension reduction method based on paired tag supplementary feature selection.
Background
Feature selection plays a critical role in processing wind power data. In the face of huge data sets, selecting appropriate features can improve the efficiency and accuracy of the prediction model. In wind power data analysis, multi-label feature selection may involve various factors such as wind speed, wind direction, temperature, humidity, mechanical vibration, fan state, power grid data and the like, and by analyzing historical data, features with significant influence on wind power generation can be determined, so that accuracy of a prediction model is improved, and operation and power generation efficiency of a wind power plant are optimized. Accurate feature selection is helpful for establishing a more reliable prediction model, and provides a more reliable decision basis for development and management of the wind power industry.
Common evaluation criteria for multi-label feature selection include distance measures, fuzzy set theory, and information theory. Information-theoretic criteria have several advantages. First, they are objective and quantifiable: the contribution of a feature to the target variable is estimated mathematically, which makes the feature selection process more scientific and reliable. Second, they take the correlation among features into account and avoid selecting features that carry redundant information, which improves the generalization ability of the model. Third, feature selection based on information theory can improve the prediction performance and interpretability of the model while reducing computational complexity, and supports the interpretation of model results. Feature selection based on information theory therefore plays an important role in data analysis and machine learning, and provides an effective path for building and optimizing models.
When measuring feature relevance, existing multi-label feature selection methods either ignore the amount of information that a candidate feature provides for paired labels or treat that amount as being the same, which leads to inaccurate feature evaluation and prevents features from being selected accurately.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a wind power data dimension reduction method based on paired tag supplementary feature selection.
The invention solves the technical problem by adopting the following technical scheme:
A wind power data dimension reduction method based on paired tag supplementary feature selection, characterized by comprising the following steps:
Step 1: acquiring a wind power data feature set, and initializing the selected feature set;
Step 2: calculating the sum of mutual information between the candidate feature and all the tags according to formula (1) to obtain the total amount of classification information provided by the candidate feature for the tag set;
$$CIS(f_k, L) = \sum_{l_i \in L} I(f_k; l_i) \tag{1}$$
Wherein CIS(f_k, L) represents the total amount of classification information provided by candidate feature f_k for tag set L, and I(f_k; l_i) represents mutual information between candidate feature f_k and tag l_i;
Step 3: calculating the weight with which the candidate feature provides classification information for the paired tags according to formula (3);
Wherein w represents the weight with which candidate feature f_k provides classification information for the paired tags l_i and l_j, I(l_i; l_j; f_k) represents ternary mutual information between candidate feature f_k and the paired tags l_i and l_j, and H(l_i) and H(l_j) are the information entropies of tags l_i and l_j;
Calculating the total amount of classification information provided by the candidate feature for all pairs of tags according to formula (4);
Wherein CID(f_k, L) represents the total amount of classification information provided by candidate feature f_k for all pairs of tags in tag set L, and I(l_i, l_j; f_k) represents joint mutual information between candidate feature f_k and the paired tags l_i and l_j;
Step 4: calculating redundant information between the candidate feature and all selected features according to formula (5);
Wherein RI(f_k, S) represents redundant information between candidate feature f_k and all selected features, S represents the set of selected features, and I(f_k; f_j) represents mutual information between candidate feature f_k and selected feature f_j;
Step 5: calculating an importance score for the candidate feature according to the feature importance evaluation criterion of formula (6);
Wherein J(f_k) represents the importance score of candidate feature f_k;
adding the index of the candidate feature with the largest importance score into the selected feature set, and deleting the candidate feature from the wind power data feature set; repeating the steps until the number of the features in the selected feature set reaches a preset value, wherein all the features in the selected feature set are the selected features.
Compared with the prior art, the invention has the beneficial effects that:
Existing multi-label feature selection methods are limited in how they measure feature relevance: they either ignore the amount of information that a candidate feature provides for paired labels, or treat that amount as being the same. As a result, the importance of candidate features is evaluated inaccurately and the feature selection results are unsatisfactory. The present method uses the weight with which a candidate feature provides classification information for each pair of tags to reevaluate the total classification information that the feature provides for the paired tags. Because this takes the correlation and importance among paired tags into account, it quantifies the relationship between candidate features and tags more accurately, which improves the feature selection results and the model performance. The method helps to identify more accurately the features that play a key role in multi-label prediction, and provides a more reliable basis for multi-label data analysis and predictive modeling.
Drawings
Fig. 1 is an overall flow chart of the present invention.
Detailed Description
The following description of specific embodiments is given by way of illustration only and not by way of limitation of the scope of the application.
The invention provides a wind power data dimension reduction method (abbreviated as method, see figure 1) based on paired tag supplementary feature selection, which comprises the following steps:
Step 1: Collect from a wind farm a wind power data feature set containing information such as wind speed, ambient temperature and turbine nacelle yaw angle, denoted F = {f_1, f_2, …, f_n}, wherein f_1, f_2, …, f_n represent the features and n represents the number of features; the features in the wind power data feature set F serve as candidate features; initialize the selected feature set S as the empty set, and set the number of features to be selected into S as K (K < n);
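The mutual information used in the later steps sums over discrete values, whereas readings such as wind speed are continuous. The source does not say how this is handled, so the following is only a minimal sketch of Step 1 under the assumption that each continuous feature is discretized by equal-width binning before scoring. The function name `equal_width_bins`, the bin count, and the sample readings are all hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int = 5) -> List[int]:
    """Map continuous readings to integer bin indices 0..n_bins-1 (assumed preprocessing)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; put everything in bin 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical wind-speed readings in m/s (illustrative values only).
wind_speed = [3.1, 4.7, 8.2, 12.5, 6.0, 9.9]
wind_speed_binned = equal_width_bins(wind_speed, n_bins=3)

selected: List[int] = []  # selected feature set S, initialized empty
K = 2                     # preset number of features to select, K < n
```

Other discretization schemes (equal-frequency bins, domain-specific thresholds) would fit the same interface; the choice affects the estimated mutual information and is not specified by the source.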
Step 2: Calculate the sum of the mutual information between the candidate feature and all the tags in the tag set L, using the following formula:
$$CIS(f_k, L) = \sum_{l_i \in L} I(f_k; l_i) \tag{1}$$
Wherein CIS(f_k, L) represents the total amount of classification information provided by candidate feature f_k for tag set L, and I(f_k; l_i) represents the mutual information between candidate feature f_k and tag l_i. The mutual information is calculated as:
$$I(X; Y) = \sum_{s} \sum_{q} p(x_s, y_q) \log \frac{p(x_s, y_q)}{p(x_s)\, p(y_q)} \tag{2}$$
Wherein X and Y denote different random variables, p(x_s, y_q) denotes the joint probability distribution of X and Y, and p(x_s) and p(y_q) denote the marginal probability distributions of X and Y, respectively;
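Step 2 can be illustrated with a short sketch that estimates mutual information from empirical frequencies of discrete columns and sums it over the tags. This is an illustration, not the patent's implementation: the function names `mutual_information` and `cis`, the base-2 logarithm, and the toy columns are assumptions.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) in bits, estimated from empirical frequencies (log base is an assumption)."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts substituted for probabilities.
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def cis(feature: Sequence[Hashable], labels: Sequence[Sequence[Hashable]]) -> float:
    """CIS(f_k, L): sum of I(f_k; l_i) over every tag column l_i in L."""
    return sum(mutual_information(feature, l_i) for l_i in labels)


# Toy data: f_k determines l_1 completely and is independent of l_2.
f_k = [0, 0, 1, 1]
l_1 = [0, 0, 1, 1]
l_2 = [0, 1, 0, 1]
total = cis(f_k, [l_1, l_2])
```

In this toy case the feature contributes one bit for `l_1` and nothing for `l_2`, so the sum reflects only the tag it actually helps to predict.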
Step 3: calculating the total amount of classification information provided by candidate features for all pairs of tags;
Step 3.1: Calculate the weight with which candidate feature f_k provides classification information for each pair of tags in the tag set L, as follows:
Wherein w represents the weight with which candidate feature f_k provides classification information for the paired tags l_i and l_j, with -1 < w < 1; I(l_i; l_j; f_k) represents the ternary mutual information between candidate feature f_k and the paired tags l_i and l_j; H(l_i) and H(l_j) are the information entropies of tags l_i and l_j, are used for normalization, and are non-negative;
From information theory, the ternary mutual information satisfies
$$I(l_i; l_j; f_k) = I(l_i, l_j; f_k) - I(f_k; l_i) - I(f_k; l_j)$$
where I(l_i, l_j; f_k) is the joint mutual information between candidate feature f_k and the paired tags l_i and l_j, which measures the amount of information the candidate feature provides for the paired tags; I(f_k; l_i) is the mutual information between candidate feature f_k and tag l_i, and I(f_k; l_j) is the mutual information between candidate feature f_k and tag l_j, which measure the amount of information the candidate feature provides for a single tag. If 0 < w < 1, then I(l_i, l_j; f_k) > I(f_k; l_i) + I(f_k; l_j), i.e., the candidate feature provides more information for the paired tags jointly than for the two tags separately, so the weight of I(l_i, l_j; f_k) in the feature importance evaluation is increased; if -1 < w < 0, then I(l_i, l_j; f_k) < I(f_k; l_i) + I(f_k; l_j), i.e., the candidate feature provides less information for the paired tags jointly than for the two tags separately, so the weight of I(l_i, l_j; f_k) in the feature importance evaluation is reduced.
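The sign behaviour described in Step 3.1 can be checked numerically. The sketch below computes the ternary mutual information with the identity stated above, using entropies of discrete columns. It deliberately stops short of the weight w itself, because the normalization by H(l_i) and H(l_j) in formula (3) is not reproduced in this text. Function names and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(x):
    """Shannon entropy H(X) in bits from empirical frequencies."""
    n = len(x)
    return -sum((c / n) * log2(c / n) for c in Counter(x).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def ternary_mutual_information(l_i, l_j, f_k):
    """I(l_i; l_j; f_k) = I(l_i, l_j; f_k) - I(f_k; l_i) - I(f_k; l_j)."""
    joint = mutual_information(list(zip(l_i, l_j)), f_k)  # I(l_i, l_j; f_k)
    return joint - mutual_information(f_k, l_i) - mutual_information(f_k, l_j)


l_i = [0, 0, 1, 1]
l_j = [0, 1, 0, 1]

# Case 1: the feature is the XOR of the two tags. It says nothing about either
# tag alone but one full bit about the pair, so the value is positive.
f_xor = [a ^ b for a, b in zip(l_i, l_j)]
positive_case = ternary_mutual_information(l_i, l_j, f_xor)

# Case 2: both tags and the feature are identical copies. The pair adds nothing
# beyond what each tag already gives, so the value is negative.
negative_case = ternary_mutual_information(l_i, list(l_i), list(l_i))
```

Since the entropies used for normalization are non-negative, the sign of w follows the sign of this quantity: the first case corresponds to 0 < w < 1 and the second to -1 < w < 0.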
Step 3.2: the total amount of classification information provided by candidate features for all pairs of tags is calculated as follows:
wherein CID(f_k, L) represents the total amount of classification information provided by the candidate feature f_k for all pairs of tags in the tag set L;
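Formula (4) is not reproduced in this text, so the way the weight w enters the pairwise sum is not known here. The sketch below therefore only shows the structure of Step 3.2: it visits every pair of tags, computes the joint mutual information I(l_i, l_j; f_k), and applies a caller-supplied weighting function. Visiting each unordered pair once, the `weight` callable, and all names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(x):
    """Shannon entropy H(X) in bits from empirical frequencies."""
    n = len(x)
    return -sum((c / n) * log2(c / n) for c in Counter(x).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def cid(f_k, labels, weight):
    """Weighted sum of I(l_i, l_j; f_k) over tag pairs.

    `weight(l_i, l_j, f_k)` is a hypothetical hook standing in for however
    formula (4) combines w with the joint mutual information.
    """
    total = 0.0
    for l_i, l_j in combinations(labels, 2):  # each unordered pair once (assumption)
        joint = mutual_information(list(zip(l_i, l_j)), f_k)
        total += weight(l_i, l_j, f_k) * joint
    return total


labels = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
f_k = [0, 0, 1, 1]

# With a constant weight of 1 the result is the plain sum of joint mutual information.
unweighted = cid(f_k, labels, weight=lambda l_i, l_j, f: 1.0)
```

Passing the weighting in as a function keeps the pair enumeration separate from the unknown formula, so the actual weight can be swapped in once formula (3) and formula (4) are available.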
step 4: the redundant information between the candidate feature and all the selected features is calculated, and the specific calculation formula is as follows:
Wherein RI(f_k, S) represents redundant information between candidate feature f_k and all selected features in the set of selected features S, and I(f_k; f_j) represents mutual information between candidate feature f_k and selected feature f_j;
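Formula (5) is not reproduced in this text. A common reading of "redundant information between the candidate feature and all selected features" is the plain sum of I(f_k; f_j) over the selected set, and the sketch below assumes exactly that; a normalized variant (for example dividing by the size of S) would be equally compatible with the wording. The function names and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """I(X;Y) in bits from empirical frequencies of two discrete columns."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def ri(f_k, selected_columns):
    """RI(f_k, S) as an unnormalized sum of I(f_k; f_j) over f_j in S (assumption)."""
    return sum(mutual_information(f_k, f_j) for f_j in selected_columns)


f_k = [0, 0, 1, 1]
f_a = [0, 0, 1, 1]  # duplicates the candidate: fully redundant
f_b = [0, 1, 0, 1]  # independent of the candidate: no redundancy

ri_first_round = ri(f_k, [])          # S is empty in the first round
ri_later_round = ri(f_k, [f_a, f_b])  # S holds two previously selected features
```

In the first round the selected set is empty, so the redundancy term vanishes and the ranking depends on the relevance terms alone.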
Step 5: Based on CIS(f_k, L), CID(f_k, L), RI(f_k, S) and the maximum-relevance minimum-redundancy strategy, a feature importance evaluation criterion is proposed, and the importance score of the candidate feature is calculated according to this criterion, as follows:
Wherein J(f_k) represents the importance score of the candidate feature f_k;
Evaluating the importance of each candidate feature according to a feature importance evaluation criterion, adding the index of the candidate feature with the largest importance score into the selected feature set S, and deleting the candidate feature from the wind power data feature set F;
Repeat the above steps until the number of features in the selected feature set S reaches the preset number K; the features in the selected feature set S are then the finally selected features, and the wind power data dimension reduction is complete.
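The selection loop itself is a greedy forward search and can be sketched independently of the scoring formula. In the sketch below the score is passed in as a function, because formula (6) is not reproduced in this text; the toy score (relevance minus summed overlap) and its numbers are purely illustrative stand-ins, and the names `greedy_select` and `toy_score` are hypothetical.

```python
from typing import Callable, List


def greedy_select(n_features: int, k: int,
                  score: Callable[[int, List[int]], float]) -> List[int]:
    """Pick k feature indices, one per round, by the highest score J(f_k)."""
    remaining = list(range(n_features))  # indices of candidate features in F
    selected: List[int] = []             # selected feature set S, initially empty
    while remaining and len(selected) < k:
        # Score every remaining candidate against the current S.
        best = max(remaining, key=lambda idx: score(idx, selected))
        selected.append(best)   # add the index of the top-scoring candidate to S
        remaining.remove(best)  # delete that candidate from F
    return selected


# Illustrative stand-in values, not taken from the source.
relevance = [0.9, 0.8, 0.3]
overlap = {(0, 1): 0.7, (0, 2): 0.0, (1, 2): 0.1}


def toy_score(idx: int, selected: List[int]) -> float:
    """Relevance minus total overlap with already selected features."""
    redundancy = sum(overlap[tuple(sorted((idx, j)))] for j in selected)
    return relevance[idx] - redundancy


chosen = greedy_select(n_features=3, k=2, score=toy_score)
```

With these numbers, feature 0 wins the first round on relevance alone; in the second round feature 1 is penalized for overlapping with feature 0, so the less relevant but non-redundant feature 2 is chosen instead. This is the trade-off the maximum-relevance minimum-redundancy strategy is meant to capture.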
Matters not described in the present invention follow the prior art.

Claims (1)

1. The wind power data dimension reduction method based on paired tag supplementary feature selection is characterized by comprising the following steps of:
Step 1: acquiring a wind power data feature set, and initializing the selected feature set;
Step 2: calculating the sum of mutual information between the candidate feature and all the tags according to formula (1) to obtain the total amount of classification information provided by the candidate feature for the tag set;
$$CIS(f_k, L) = \sum_{l_i \in L} I(f_k; l_i) \tag{1}$$
Wherein CIS(f_k, L) represents the total amount of classification information provided by candidate feature f_k for tag set L, and I(f_k; l_i) represents mutual information between candidate feature f_k and tag l_i;
Step 3: calculating the weight with which the candidate feature provides classification information for the paired tags according to formula (3);
Wherein w represents the weight with which candidate feature f_k provides classification information for the paired tags l_i and l_j, I(l_i; l_j; f_k) represents ternary mutual information between candidate feature f_k and the paired tags l_i and l_j, and H(l_i) and H(l_j) are the information entropies of tags l_i and l_j;
Calculating the total amount of classification information provided by the candidate feature for all pairs of tags according to formula (4);
Wherein CID(f_k, L) represents the total amount of classification information provided by candidate feature f_k for all pairs of tags in tag set L, and I(l_i, l_j; f_k) represents joint mutual information between candidate feature f_k and the paired tags l_i and l_j;
Step 4: calculating redundant information between the candidate feature and all selected features according to formula (5);
Wherein RI(f_k, S) represents redundant information between candidate feature f_k and all selected features, S represents the set of selected features, and I(f_k; f_j) represents mutual information between candidate feature f_k and selected feature f_j;
Step 5: calculating an importance score for the candidate feature according to the feature importance evaluation criterion of formula (6);
Wherein J(f_k) represents the importance score of candidate feature f_k;
adding the index of the candidate feature with the largest importance score into the selected feature set, and deleting the candidate feature from the wind power data feature set; repeating the steps until the number of the features in the selected feature set reaches a preset value, wherein all the features in the selected feature set are the selected features.
CN202410258429.4A 2024-03-07 2024-03-07 Wind power data dimension reduction method based on paired tag supplementary feature selection Pending CN118152773A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410258429.4A CN118152773A (en) 2024-03-07 2024-03-07 Wind power data dimension reduction method based on paired tag supplementary feature selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410258429.4A CN118152773A (en) 2024-03-07 2024-03-07 Wind power data dimension reduction method based on paired tag supplementary feature selection

Publications (1)

Publication Number Publication Date
CN118152773A true CN118152773A (en) 2024-06-07

Family

ID=91294149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410258429.4A Pending CN118152773A (en) 2024-03-07 2024-03-07 Wind power data dimension reduction method based on paired tag supplementary feature selection

Country Status (1)

Country Link
CN (1) CN118152773A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination