CN116561554A

CN116561554A - Feature extraction method, system, equipment and medium of boiler soot blower

Info

Publication number: CN116561554A
Application number: CN202310418102.4A
Authority: CN
Inventors: 李德波; 陈拓; 陈智豪; 陈兆立; 金凤雏; 王广雷; 冯永新; 宋景慧
Original assignee: China Southern Power Grid Power Technology Co Ltd
Current assignee: China Southern Power Grid Power Technology Co Ltd
Priority date: 2023-04-18
Filing date: 2023-04-18
Publication date: 2023-08-08

Abstract

The invention discloses a feature extraction method, system, equipment and medium for a boiler soot blower. Feature extraction is performed on the target historical feature data set using a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset while the number of iterations is counted in real time. The prediction precision of the intermediate feature subset is calculated using a random forest algorithm; the prediction precision and the number of iterations are each compared with their corresponding preset thresholds, and the feature set of the boiler soot blower is determined in combination with the feature deletion set. By dynamically adjusting the existing feature data set, the invention improves the feature extraction quality for coal-fired power plant boiler soot blowers, effectively extracts the feature set they require, reduces the training time of the boiler soot blower model, and improves its prediction accuracy.

Description

Feature extraction method, system, equipment and medium of boiler soot blower

Technical Field

The invention relates to the technical field of feature extraction of boiler soot blowers, in particular to a method, a system, equipment and a medium for feature extraction of a boiler soot blower.

Background

The soot blower of a coal-fired power plant boiler is a complex system with many mutually coupled parameters and variables. Coal is a mixture that generally contains minerals and inorganic substances, and as it burns in the boiler, ash and slag gradually deposit on the heating surfaces. Heating surfaces fouled with ash and slag are detrimental to the safe and stable operation of the boiler soot blower. It is therefore necessary to judge ash deposition and slagging in the boiler from the combustion conditions, so as to improve the combustion efficiency of the boiler and reduce the damage that slagging causes. The health state of the boiler heating surfaces can be modeled from data monitored during combustion, but in practical applications the number of variables describing ash accumulation in the boiler system often runs from hundreds to thousands. If too many parameters are used to characterize the ash accumulation state, the computational load of the model becomes excessive and its accuracy is easily degraded by overfitting. Selecting key features with a suitable feature extraction method is therefore particularly important for modeling the ash accumulation state of the boiler.

Feature extraction methods currently used in the industry for coal-fired power plant boiler soot blowers include traditional methods based on human experience and methods based on artificial intelligence. The traditional experience-based method draws on principles such as the furnace heat flux and heat transfer mechanism theory: the main feature parameters for modeling the boiler heating surface are selected manually by analyzing how the parameters related to the boiler combustion process and soot blowing consumption change. Because it relies on human experience, this method struggles to screen features quickly and accurately.

Therefore, artificial intelligence-based feature extraction is generally adopted for feature selection of coal-fired power plant boiler soot blowers. Such methods do not need an accurate physical model relating the features being optimized to the optimization targets, and they handle nonlinear, complex feature extraction problems well by mining the large volume of collected historical unit data. However, existing feature extraction methods for boiler soot blowers do not incorporate an optimization algorithm to obtain an optimal feature subset and do not remove the redundant information in high-dimensional features, so the quality of the extracted features is poor.

Disclosure of Invention

The invention provides a feature extraction method, a system, equipment and a medium of a boiler soot blower, which solve the problems that the existing feature extraction method of the boiler soot blower does not combine with an optimization algorithm to obtain an optimal feature subset, and redundant information of high-dimensional features is not removed, so that the quality of the extracted features is poor.

The invention provides a feature extraction method of a boiler soot blower, which comprises the following steps:

acquiring an initial historical characteristic data set of a boiler soot blower;

performing feature dimension reduction on the initial historical feature data set by adopting a Pearson correlation coefficient to generate a target historical feature data set;

carrying out feature extraction on the target historical feature data set by adopting a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset, and counting the iteration times in real time;

calculating the prediction precision corresponding to the intermediate feature subset by adopting a random forest algorithm;

and comparing the prediction precision and the iteration times with corresponding preset thresholds respectively, and determining a feature set corresponding to the boiler soot blower by combining the feature deletion set.

Optionally, the initial historical feature data set includes a plurality of initial historical feature data; the step of performing feature dimension reduction on the initial historical feature data set by adopting the Pearson correlation coefficient to generate a target historical feature data set comprises the following steps:

calculating the correlation coefficient between each pair of initial historical feature data by adopting the Pearson correlation coefficient;

judging whether the absolute value of the correlation coefficient meets a preset correlation coefficient interval or not;

if yes, selecting one of a plurality of initial historical feature data corresponding to the correlation coefficient as target historical feature data according to a preset feature extraction standard;

if not, taking all initial historical characteristic data corresponding to the correlation coefficient as target historical characteristic data;

and constructing a target historical characteristic data set by adopting all the target historical characteristic data.

Optionally, the step of performing feature extraction on the target historical feature dataset by using a particle swarm algorithm to generate a feature deletion set, an intermediate feature subset and counting the iteration number in real time includes:

initializing the target historical characteristic data set in a population to generate an initial characteristic subset;

calculating the fitness corresponding to each feature in the initial feature subset by adopting a preset fitness formula;

the preset fitness formula is as follows:

wherein fit is the fitness, err is the error rate, dimension is the feature number corresponding to the initial feature subset, and D is the total feature number corresponding to the target historical feature data set;

constructing a feature deletion set from all features whose fitness does not meet the corresponding preset fitness threshold;

and removing the features corresponding to the feature deletion set in the initial feature subset, generating an intermediate feature subset and counting the iteration times in real time.

Optionally, the step of calculating the prediction precision corresponding to the intermediate feature subset by adopting a random forest algorithm includes:

encoding the intermediate feature subset to generate a feature vector subset;

respectively selecting a plurality of characteristic vectors in the characteristic vector sub-sets to construct a plurality of training sets and test sets;

performing model construction by adopting all the training sets and the test sets to generate a random forest model;

and inputting the feature vector subset into the random forest model for data evaluation, and generating the prediction precision corresponding to the intermediate feature subset.

Optionally, the step of constructing a model by using all the training set and the test set to generate a random forest model includes:

respectively carrying out decision tree training on each training set by adopting a recursion splitting method to generate a decision tree corresponding to the training set;

testing each decision tree on its corresponding test set to generate prediction data corresponding to the training set;

judging whether all the prediction data meet a preset accuracy threshold;

if yes, constructing a random forest model by adopting all the decision trees;

if not, returning to the step of respectively selecting a plurality of feature vectors in the feature vector subset to construct a plurality of training sets and test sets, until all the prediction data meet the preset accuracy threshold.

Optionally, the preset thresholds include a preset scoring standard, an iteration number threshold and a fitness threshold; the step of comparing the prediction precision and the number of iterations with the corresponding preset thresholds respectively and determining the feature set corresponding to the boiler soot blower by combining the feature deletion set comprises the following steps:

judging whether the prediction precision meets the preset scoring standard or not;

if yes, the intermediate feature subset is used as a target feature subset;

if not, constructing a target feature subset by adopting the intermediate feature subset and the feature deletion set;

judging whether the iteration times meet the iteration times threshold;

if yes, the target feature subset is used as a feature set corresponding to the boiler soot blower;

if not, determining a feature set corresponding to the boiler soot blower according to the subset fitness corresponding to the target feature subset and the fitness threshold.

Optionally, the step of determining the feature set corresponding to the boiler soot blower according to the subset fitness corresponding to the target feature subset and the fitness threshold value includes:

judging whether the subset fitness corresponding to the target feature subset meets the fitness threshold value or not;

if not, updating the speed and the position of each target feature in the target feature subset to generate a particle feature set;

and taking the particle feature set as the target historical feature data set, and returning to the step of carrying out feature extraction on the target historical feature data set by adopting a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset, and counting the number of iterations in real time.
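The speed and position update in this step can be sketched as follows. The text does not give the update rule, so this minimal sketch assumes the standard binary particle swarm update with a sigmoid transfer function. The function name `update_particle`, the coefficients `w`, `c1`, `c2` and the speed limit `v_max` are illustrative assumptions, not values from the source.

```python
import math
import random


def update_particle(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """Return (new_x, new_v) for one particle of a binary PSO (assumed rule).

    x, pbest, gbest: lists of 0/1 flags, one per feature dimension.
    v: list of real-valued speeds, one per feature dimension.
    """
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, pbest, gbest):
        # Speed update: inertia + pull toward personal best + pull toward global best.
        vj = w * vj + c1 * rng.random() * (pj - xj) + c2 * rng.random() * (gj - xj)
        # Clamp the speed so the sigmoid below does not saturate.
        vj = max(-v_max, min(v_max, vj))
        # Position update: the sigmoid of the speed is the probability of selecting the feature.
        prob = 1.0 / (1.0 + math.exp(-vj))
        new_v.append(vj)
        new_x.append(1 if rng.random() < prob else 0)
    return new_x, new_v


rng = random.Random(0)
x, v = update_particle([1, 0, 1, 0], [0.0] * 4, [1, 1, 0, 0], [1, 0, 0, 1], rng)
```

The updated 0/1 vector marks which features the particle selects in the next round of feature extraction.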

The invention also provides a feature extraction system of the boiler soot blower, which comprises:

the initial historical characteristic data set acquisition module is used for acquiring an initial historical characteristic data set of the boiler soot blower;

the target historical characteristic data set generation module is used for carrying out characteristic dimension reduction on the initial historical characteristic data set by adopting a Pearson correlation coefficient to generate a target historical characteristic data set;

The feature deleting set, the intermediate feature subset and the iteration number generating module are used for carrying out feature extraction on the target historical feature data set by adopting a particle swarm algorithm, generating a feature deleting set and an intermediate feature subset, and counting the iteration number in real time;

the prediction precision calculation module is used for calculating the prediction precision corresponding to the intermediate feature subset by adopting a random forest algorithm;

and the feature set determining module is used for comparing the prediction precision and the iteration times with corresponding preset thresholds respectively and determining a feature set corresponding to the boiler soot blower by combining the feature deletion set.

The invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to carry out the steps of the feature extraction method of a boiler soot blower as described in any one of the above.

The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed, implements the feature extraction method of a boiler soot blower as described in any one of the above.

From the above technical scheme, the invention has the following advantages:

According to the invention, an initial historical feature data set of the boiler soot blower is acquired, and feature dimension reduction is performed on it using the Pearson correlation coefficient to generate a target historical feature data set. Feature extraction is performed on the target historical feature data set using a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset while the number of iterations is counted in real time. The prediction precision of the intermediate feature subset is calculated using a random forest algorithm; the prediction precision and the number of iterations are each compared with their corresponding preset thresholds, and the feature set of the boiler soot blower is determined in combination with the feature deletion set. This solves the technical problem that existing feature extraction methods for boiler soot blowers do not incorporate an optimization algorithm to obtain an optimal feature subset and do not remove the redundant information in high-dimensional features, so that the quality of the extracted features is poor. Correlation analysis with the Pearson correlation coefficient first removes a large share of the highly correlated feature parameters. Features are then extracted with the particle swarm algorithm, and their importance is scored with the random forest algorithm. The existing feature data set is thereby dynamically adjusted, the feature extraction quality for coal-fired power plant boiler soot blowers is improved, the feature set they require can be effectively extracted, the training time of the boiler soot blower model is reduced, and the prediction accuracy of the boiler soot blower model is improved.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained from these drawings without inventive faculty for a person skilled in the art.

FIG. 1 is a flow chart of steps of a method for extracting features of a soot blower of a boiler according to a first embodiment of the present invention;

FIG. 2 is a flow chart of steps of a method for extracting features of a soot blower of a boiler according to a second embodiment of the present invention;

FIG. 3 is a block flow diagram of a method for extracting features of a soot blower of a boiler according to a second embodiment of the present invention;

FIG. 4 is a block diagram of a feature extraction system for a soot blower of a boiler in accordance with a third embodiment of the present invention.

Detailed Description

The embodiment of the invention provides a feature extraction method, a system, equipment and a medium of a boiler soot blower, which are used for solving the technical problems that the existing feature extraction method of the boiler soot blower does not combine an optimization algorithm to obtain an optimal feature subset, and redundant information of high-dimensional features is not removed, so that the quality of the extracted features is poor.

In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1, fig. 1 is a flowchart illustrating a method for extracting features of a soot blower of a boiler according to an embodiment of the present invention.

The first embodiment of the invention provides a feature extraction method of a boiler soot blower, which comprises the following steps:

Step 101, acquiring an initial historical feature data set of the boiler soot blower.

In the embodiment of the invention, the initial historical feature data set of the boiler soot blower is obtained by collecting 15000 samples from a Distributed Control System (DCS). The initial historical feature data set comprises 44 features such as unit load, coal supply, water supply flow, exhaust gas temperature, EF2-layer auxiliary air at corner 1 and F-layer fuel air at corner 1. The controllable variables are: coal supply C, water supply flow F, primary air quantity (A1, A2, A3), secondary air quantity (S1-S22) and primary air pressure (W1, W2). The state variables are: unit load L, oxygen content at the economizer outlet (O1-O7), flue gas oxygen content (S) and flue gas temperature (T1-T6). The output variable is: furnace outlet NOx emissions (NX).

Step 102, performing feature dimension reduction on the initial historical feature data set by adopting the Pearson correlation coefficient to generate a target historical feature data set.

In the embodiment of the invention, the Pearson correlation coefficient is used to calculate the correlation coefficient between each pair of initial historical feature data. Whether the absolute value of the correlation coefficient falls within a preset correlation coefficient interval is judged; if so, one of the initial historical feature data corresponding to that correlation coefficient is selected as target historical feature data according to a preset feature extraction standard. If not, all initial historical feature data corresponding to the correlation coefficient are taken as target historical feature data. Finally, the target historical feature data set is constructed from all target historical feature data.

Step 103, carrying out feature extraction on the target historical feature data set by adopting a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset, and counting the number of iterations in real time.

In the embodiment of the invention, population initialization is performed on the target historical feature data set to generate an initial feature subset. The fitness of each feature in the initial feature subset is calculated using a preset fitness formula, and a feature deletion set is constructed from all features whose fitness does not meet the preset fitness threshold. The features in the feature deletion set are removed from the initial feature subset to generate an intermediate feature subset, and the number of iterations is counted in real time.

Step 104, calculating the prediction precision corresponding to the intermediate feature subset by adopting a random forest algorithm.

In the embodiment of the invention, the intermediate feature subset is encoded to generate a feature vector subset, and a plurality of feature vectors are selected from the feature vector subset to construct a plurality of training sets and test sets. A random forest model is generated by model construction using all training sets and test sets, and the feature vector subset is input into the random forest model for data evaluation to generate the prediction precision corresponding to the intermediate feature subset.
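The construction of several training and test sets can be illustrated with a short sketch. The text does not say how samples are drawn, so this assumes the usual random forest practice of bootstrap sampling, with the samples left out of each draw (the out-of-bag samples) serving as that tree's test set. The function name `bootstrap_split` and the sample and tree counts are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap training set and its out-of-bag test set (assumed scheme).

    Returns two lists of sample indices: the training indices are drawn with
    replacement, and the test indices are those never drawn.
    """
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train)
    test = [i for i in range(n_samples) if i not in drawn]
    return train, test


rng = random.Random(0)
# One (train, test) pair per decision tree; 5 trees and 100 samples are example values.
splits = [bootstrap_split(100, rng) for _ in range(5)]
```

Each pair would then be used to train and test one decision tree before the trees are combined into the forest.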

Step 105, comparing the prediction precision and the number of iterations with the corresponding preset thresholds respectively, and determining a feature set corresponding to the boiler soot blower by combining the feature deletion set.

The preset thresholds comprise a preset scoring standard, an iteration number threshold and a fitness threshold. The preset scoring standard refers to the R² scoring standard. The iteration number threshold is the limit on the number of iterative updates performed by the particle swarm algorithm. The fitness threshold is the critical value that the fitness of each particle in the target feature subset must meet.
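For reference, the R² score (coefficient of determination) used as the scoring standard can be computed as in the sketch below. The function name `r2_score` and the sample values are illustrative only and are not taken from the source.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # predictions match exactly
baseline = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicts the mean
```

A score of 1 means the predictions reproduce the measured values exactly, and a score of 0 means the model does no better than always predicting the mean.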

In the embodiment of the invention, whether the prediction precision meets the preset scoring standard is judged, and if yes, the intermediate feature subset is used as the target feature subset. If not, constructing a target feature subset by adopting the intermediate feature subset and the feature deletion set. Judging whether the iteration times meet the iteration times threshold, if so, taking the target feature subset as a feature set corresponding to the boiler soot blower. If not, determining the feature set corresponding to the boiler soot blower based on the subset fitness and the fitness threshold corresponding to the target feature subset.
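The branching in step 105 can be summarized in a short sketch. It rests on several assumptions: "meets" is read as "greater than or equal to" for both the R² score and the iteration count, the search is taken to stop when the subset fitness meets its threshold (the text only spells out the case where it does not), and the function name `decide_feature_set` and all parameter names are hypothetical.

```python
def decide_feature_set(intermediate, deleted, precision, iterations,
                       fitness_met, score_threshold, max_iterations):
    """Return (target_subset, done) following the step 105 branching.

    done is False when the particle swarm search should update speeds and
    positions and run another round of feature extraction.
    """
    if precision >= score_threshold:
        # Precision is good enough: keep the reduced subset as it is.
        target = list(intermediate)
    else:
        # Precision dropped: add the deleted features back to the subset.
        target = list(intermediate) + [f for f in deleted if f not in intermediate]
    if iterations >= max_iterations:
        return target, True
    # Below the iteration limit: stop only if the subset fitness meets its threshold.
    return target, bool(fitness_met)


subset, done = decide_feature_set(["C", "F"], ["S3"], precision=0.5, iterations=50,
                                  fitness_met=False, score_threshold=0.9,
                                  max_iterations=50)
```

In this example the precision falls short of the scoring standard, so the deleted feature is restored, and the iteration limit has been reached, so the restored subset is returned as the final feature set.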

In the embodiment of the invention, an initial historical feature data set of the boiler soot blower is acquired, and feature dimension reduction is performed on it using the Pearson correlation coefficient to generate a target historical feature data set. Feature extraction is performed on the target historical feature data set using a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset while the number of iterations is counted in real time. The prediction precision of the intermediate feature subset is calculated using a random forest algorithm; the prediction precision and the number of iterations are each compared with their corresponding preset thresholds, and the feature set of the boiler soot blower is determined in combination with the feature deletion set. This solves the technical problem that existing feature extraction methods for boiler soot blowers do not incorporate an optimization algorithm to obtain an optimal feature subset and do not remove the redundant information in high-dimensional features, so that the quality of the extracted features is poor. Correlation analysis with the Pearson correlation coefficient first removes a large share of the highly correlated feature parameters. A PSO-RF based bidirectional feature selection method is then adopted: the particle swarm optimization (PSO) algorithm has good global search capability and is used to search the feature space for an optimal feature subset, while the random forest (RF) algorithm is used to evaluate the quality of feature subsets. The existing feature data set is thereby dynamically adjusted, the feature extraction quality for coal-fired power plant boiler soot blowers is improved, the feature set they require can be effectively extracted, the training time of the boiler soot blower model is reduced, and the prediction accuracy of the boiler soot blower model is improved.

Referring to fig. 2, fig. 2 is a flowchart illustrating a method for extracting features of a soot blower of a boiler according to a second embodiment of the present invention.

The feature extraction method of the soot blower of the boiler provided by the second embodiment of the invention comprises the following steps:

Step 201, acquiring an initial historical feature data set of the boiler soot blower.

In the embodiment of the present invention, the implementation process of step 201 is similar to that of step 101, and will not be repeated here.

Step 202, performing feature dimension reduction on the initial historical feature data set by adopting the Pearson correlation coefficient to generate a target historical feature data set.

Further, the initial set of historical feature data includes a plurality of initial historical feature data, and step 202 may include the sub-steps S11-S15 of:

S11, calculating the correlation coefficient between each pair of initial historical feature data by adopting the Pearson correlation coefficient.

S12, judging whether the absolute value of the correlation coefficient falls within a preset correlation coefficient interval.

S13, if so, selecting one of the initial historical feature data corresponding to the correlation coefficient as target historical feature data according to a preset feature extraction standard.

S14, if not, taking all initial historical feature data corresponding to the correlation coefficient as target historical feature data.

S15, constructing the target historical feature data set from all target historical feature data.

The preset correlation coefficient interval is the interval in which the correlation between initial historical feature data is very strong. A correlation coefficient between 0.0 and 0.2 indicates very weak correlation, 0.2 to 0.4 weak correlation, 0.4 to 0.6 moderate correlation, 0.6 to 0.8 strong correlation, and 0.8 to 1.0 very strong correlation.

The preset feature extraction standard is the rule, set according to actual needs, for choosing which of the two initial historical feature data corresponding to a correlation coefficient to retain when the absolute value of that coefficient falls within the preset correlation coefficient interval. For example, it may be set to retain whichever of the two has the smaller overall correlation coefficient.

In the embodiment of the invention, Pearson correlation coefficient analysis is first used to calculate the degree of correlation between the initial historical feature data, and the initial historical feature data with high correlation coefficients are screened out, yielding a target historical feature data set with lower correlation. 15000 samples are collected from a Distributed Control System (DCS); the initial historical feature data set includes 44 features such as unit load, coal supply, water supply flow, exhaust gas temperature, EF2-layer auxiliary air at corner 1 and F-layer fuel air at corner 1. The controllable variables are: coal supply C, water supply flow F, primary air quantity (A1, A2, A3), secondary air quantity (S1-S22) and primary air pressure (W1, W2). The state variables are: unit load L, oxygen content at the economizer outlet (O1-O7), flue gas oxygen content (S) and flue gas temperature (T1-T6). The output variable is: furnace outlet NOx emissions (NX). A heat map is used to show the strength of the correlation between the initial historical feature data. The correlation coefficient represents the degree of correlation between features; its range is [-1, 1], and its sign indicates the direction of the correlation. The larger the absolute value, the stronger the correlation between the data sets.

Feature dimension reduction on the given initial historical feature data set shows that S1 and S2, S7 and S6, S13 and S14, S17 and S19, and T4, T5 and T6 are very strongly correlated. One variable is retained from each group of very strongly correlated variables, so S1, S7, S13, S17 and T4 are finally retained, and S2, S6, S14, S19, T5 and T6 are removed.
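The Pearson-based screening in sub-steps S11 to S15 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `pearson` and `reduce_by_correlation`, the toy data, and the tie-break that keeps the earlier-listed feature of each strongly correlated pair are assumptions. The text itself allows other retention rules, such as keeping the feature with the smaller overall correlation coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / (sx * sy)


def reduce_by_correlation(columns, threshold=0.8):
    """Drop one feature from every pair whose |r| lies in [threshold, 1.0].

    columns: dict mapping feature name -> list of samples.
    Returns (kept_names, dropped_names).
    """
    names = list(columns)
    dropped = []
    for i, a in enumerate(names):
        if a in dropped:
            continue
        for b in names[i + 1:]:
            if b not in dropped and abs(pearson(columns[a], columns[b])) >= threshold:
                dropped.append(b)  # keep the earlier-listed feature (assumed tie-break)
    kept = [n for n in names if n not in dropped]
    return kept, dropped


# Toy data: S2 is nearly a multiple of S1, T4 is only weakly related to either.
demo = {
    "S1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "S2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "T4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept, dropped = reduce_by_correlation(demo)
```

With the very strong band starting at 0.8, the toy example keeps S1 and T4 and drops S2, mirroring how S2 is removed in favor of S1 in the embodiment.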

Step 203, carrying out feature extraction on the target historical feature data set by adopting a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset, and counting the number of iterations in real time.

Further, step 203 may comprise the following sub-steps S21-S24:

S21, initializing the population of the target historical feature data set to generate an initial feature subset.

S22, calculating the fitness corresponding to each feature in the initial feature subset by adopting a preset fitness formula.

The preset fitness formula is as follows:

wherein fit is the fitness, err is the error rate, dimension is the feature number corresponding to the initial feature subset, and D is the total feature number corresponding to the target historical feature data set.

S23, constructing a feature deletion set by adopting features with all fitness not meeting the corresponding preset fitness threshold.

S24, removing the features corresponding to the feature deletion set in the initial feature subset, generating an intermediate feature subset and counting the iteration times in real time.

The preset fitness threshold value refers to a critical value for screening the optimal features.

In the embodiment of the invention, in the particle swarm algorithm each dimension of a particle can take only the value 1 or 0, where 1 indicates that the feature of that dimension is selected in the current iteration and 0 indicates that it is discarded. The feature selection result of the current iteration is thus obtained for the whole particle population. At the start of the particle swarm algorithm, every particle is assigned an initial value drawn from the interval [0,1]. For convenience of statistics and observation, an initial value greater than 0.5 is set to 1 and an initial value less than or equal to 0.5 is set to 0. The result is recorded as x(i, j), where i denotes the particle in the i-th row and j denotes the j-th feature dimension. Consequently, the features selected at initialization account for about 50% of the original features, which meets general feature selection requirements. That is, x(i, j) = 1 if the initial value is greater than 0.5, and x(i, j) = 0 otherwise.
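The initialization rule can be sketched as follows. The function name `init_population`, the use of a uniform random draw, and the population and feature counts are assumptions made for illustration.

```python
import random


def init_population(n_particles, n_features, seed=None):
    """Build the 0/1 matrix x(i, j): row i is a particle, column j a feature.

    Each entry starts from a value drawn uniformly from [0, 1) and is set to
    1 when that value is greater than 0.5, otherwise to 0.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() > 0.5 else 0 for _ in range(n_features)]
        for _ in range(n_particles)
    ]


x = init_population(n_particles=20, n_features=10, seed=0)
# Share of selected features across the population; close to 0.5 on average.
selected_share = sum(map(sum, x)) / (20 * 10)
```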

In practice, feature selection must consider not only the dimension of the finally selected feature subset but also the accuracy of sample classification, and these two objectives jointly determine how good a particle's selection is. In the established fitness function, the accuracy therefore carries a weight of 0.8 and the feature dimension a weight of 0.2, as shown in the preset fitness formula.
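The preset fitness formula itself is not reproduced in this text. Purely as an illustration of a 0.8/0.2 weighting of accuracy against feature dimension, the sketch below assumes the form fit = 0.8 × (1 − err) + 0.2 × (1 − dimension / D), in which a larger value is better. This specific form, the function name `fitness`, and its parameter names are assumptions and may differ from the formula of the invention.

```python
def fitness(err, dimension, total_features, w_acc=0.8, w_dim=0.2):
    """Hypothetical weighted fitness; larger is better.

    err            -- error rate of the model on the selected features (0..1)
    dimension      -- number of features in the current subset
    total_features -- D, total number of features in the target data set
    The 0.8 / 0.2 weights come from the text; the combination is assumed.
    """
    if not 0.0 <= err <= 1.0:
        raise ValueError("err must lie in [0, 1]")
    if not 0 <= dimension <= total_features or total_features <= 0:
        raise ValueError("dimension must lie in [0, total_features]")
    accuracy_term = 1.0 - err
    compactness_term = 1.0 - dimension / total_features
    return w_acc * accuracy_term + w_dim * compactness_term
```

Under this assumed form, two subsets with the same error rate are ranked by size, so the smaller subset receives the higher fitness.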

In order to avoid biased results caused by particles becoming trapped in a local optimum, random disturbance is applied to the particles during particle selection. Specifically, values of x(i, j) are changed, and the number of changed entries accounts for 5% of the whole matrix. Meanwhile, a random disturbance variable randper and a disturbance counter randcount are set in the iteration process, wherein randper takes the value 0 or 1 and the initial value of randcount is 0. If the generated randper is 1 and the ratio of randcount to the total iteration times is not more than 30%, the random disturbance is carried out; otherwise, it is not carried out. The specific modification formula is as follows:
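The disturbance rule can be sketched as below. The text does not state how a value of x(i, j) is changed, so the sketch assumes that a changed entry is flipped (0 becomes 1 and 1 becomes 0) and that randcount is incremented once per disturbance. The function name `maybe_perturb` and its parameters are hypothetical.

```python
import random


def maybe_perturb(x, randcount, total_iters, rng, flip_share=0.05, max_ratio=0.30):
    """Possibly flip 5% of the entries of the 0/1 matrix x in place.

    Returns the updated disturbance counter. Assumptions: a "change" is a
    bit flip, and the counter grows by one for each disturbance applied.
    """
    randper = rng.randint(0, 1)  # random disturbance variable, 0 or 1
    if randper != 1 or randcount / total_iters > max_ratio:
        return randcount  # no disturbance in this iteration
    n_rows, n_cols = len(x), len(x[0])
    n_flip = max(1, round(flip_share * n_rows * n_cols))
    # Choose distinct cells so that exactly n_flip entries change.
    for cell in rng.sample(range(n_rows * n_cols), n_flip):
        i, j = divmod(cell, n_cols)
        x[i][j] = 1 - x[i][j]
    return randcount + 1
```

Calling `maybe_perturb` once per iteration keeps the number of disturbances bounded by roughly 30% of the total iteration times.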

Therefore, the population is initialized from the target historical feature data set to generate an initial feature subset, and the fitness corresponding to each feature in the initial feature subset is calculated by adopting the preset fitness formula. A feature deletion set is constructed from all features whose fitness does not meet the corresponding preset fitness threshold. The features contained in the feature deletion set are removed from the initial feature subset to generate an intermediate feature subset, and the iteration times are counted in real time. That is, the particle swarm algorithm starts searching from the full feature set and, each time, deletes the features with the lowest importance from the initial feature subset in search of a better feature subset, thereby obtaining the intermediate feature subset.

Step 204, the intermediate feature subset is encoded to generate a feature vector subset.

In an embodiment of the invention, each intermediate feature in the intermediate feature subset is encoded and converted into a feature vector that can be used in a machine learning algorithm. And constructing a feature vector subset by adopting all feature vectors.

Step 205, respectively selecting a plurality of feature vectors in the feature vector subsets, and constructing a plurality of training sets and testing sets.

In the embodiment of the invention, according to the preset extraction requirement, a part of feature vectors are extracted from the feature vector subsets respectively to construct a plurality of training sets, and according to the preset selection requirement, a part of features are randomly selected as candidate features to construct a plurality of test sets.

Step 206, constructing a model by adopting all training sets and test sets to generate a random forest model.

Further, step 206 may include the following substeps S31-S35:

S31, respectively carrying out decision tree training on each training set by adopting a recursive splitting method, and generating decision trees corresponding to the training sets.

S32, testing each decision tree on its corresponding test set to generate prediction data corresponding to the training set.

S33, judging whether all the prediction data meet a preset accuracy threshold.

S34, if so, constructing a random forest model by adopting all decision trees.

S35, if not, returning to the step of respectively selecting a plurality of feature vectors in the feature vector subset and constructing a plurality of training sets and test sets, until all the prediction data meet the preset accuracy threshold.

In the embodiment of the invention, a decision tree is constructed for each sampled training set and test set. A decision tree is typically constructed by recursive splitting: starting from the root node, the optimal feature is recursively selected for splitting until the number of data samples on a leaf node reaches a preset minimum or the depth reaches a preset maximum. Each decision tree is tested on its corresponding test set to generate prediction data corresponding to the training set, and when the prediction data corresponding to all the decision trees meet the preset accuracy threshold, all the decision trees are adopted to construct the random forest model. Otherwise, the method returns to the step of respectively selecting a plurality of feature vectors in the feature vector subset and constructing a plurality of training sets and test sets, until all the prediction data meet the preset accuracy threshold.
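Recursive splitting with the two stopping conditions mentioned above (a minimum sample count and a maximum depth) can be illustrated with a small regression tree. The text does not name a split criterion, so the sketch assumes that the best split minimizes the summed squared error of the two children. The names `sse`, `build_tree`, and `predict_tree`, and the dictionary layout of the tree, are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, targets, depth=0, max_depth=3, min_samples=2):
    """Recursively split until max_depth or min_samples is reached."""
    mean = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) <= min_samples:
        return {"leaf": mean}
    best = None
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= thr]
            right = [i for i, r in enumerate(rows) if r[f] > thr]
            if not left or not right:
                continue  # a split must send samples to both sides
            cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return {"leaf": mean}
    _, f, thr, left, right = best

    def child(idx):
        return build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                          depth + 1, max_depth, min_samples)

    return {"feature": f, "threshold": thr, "left": child(left), "right": child(right)}


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]
```

A random forest in the sense of steps S31-S35 would train one such tree per sampled training set and accept the ensemble only when every tree's test predictions meet the preset accuracy threshold.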

Step 207, inputting the feature vector subset into the random forest model for data evaluation, and generating the prediction precision corresponding to the intermediate feature subset.

In the embodiment of the invention, the feature vector subset is input into the random forest model for data evaluation. For each feature vector in the feature vector subset, the prediction results of the decision trees are integrated by voting, so that the prediction precision corresponding to the intermediate feature subset is obtained, and the quality of the intermediate feature subset is judged by this prediction precision.
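Integration by voting can be sketched as a majority vote over the per-tree predictions, followed by the share of samples on which the vote matches the true label. The function names and the layout of `per_tree_predictions` are assumptions for illustration; ties are resolved in favor of the label seen first, which is an arbitrary choice.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among the trees' votes."""
    return Counter(votes).most_common(1)[0][0]


def forest_accuracy(per_tree_predictions, labels):
    """per_tree_predictions[t][n] is the label tree t assigns to sample n."""
    correct = 0
    for n, truth in enumerate(labels):
        votes = [tree_preds[n] for tree_preds in per_tree_predictions]
        if majority_vote(votes) == truth:
            correct += 1
    return correct / len(labels)
```

Using an odd number of trees for a two-class problem avoids ties altogether.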

Step 208, respectively comparing the prediction precision and the iteration times with corresponding preset thresholds, and determining a feature set corresponding to the boiler soot blower by combining the feature deletion set.

Further, the preset thresholds include a preset scoring standard, an iteration times threshold, and a fitness threshold, and step 208 may include the following substeps S41-S46:

S41, judging whether the prediction precision meets the preset scoring standard.

S42, if yes, taking the intermediate feature subset as a target feature subset.

S43, if not, constructing a target feature subset by adopting the intermediate feature subset and the feature deletion set.

S44, judging whether the iteration times meet the iteration times threshold.

S45, if yes, taking the target feature subset as a feature set corresponding to the boiler soot blower.

S46, if not, determining a feature set corresponding to the boiler soot blower according to the subset fitness corresponding to the target feature subset and the fitness threshold.

In the embodiment of the invention, the R2 scoring standard is adopted: the R2 score is calculated between the predicted values obtained for the intermediate feature subset and the true values, and the curve of the R2 score obtained by the PSO-RF feature selection algorithm against the iteration times can be drawn so as to observe the convergence and stability of the algorithm. When the prediction precision meets the preset scoring standard, the intermediate feature subset is taken as the target feature subset. Otherwise, the newly deleted features are restored, that is, the intermediate feature subset and the feature deletion set are adopted to construct the target feature subset. It is then judged whether the iteration times meet the iteration times threshold, and if so, the target feature subset is taken as the feature set corresponding to the boiler soot blower. Otherwise, the feature set corresponding to the boiler soot blower is determined based on the subset fitness corresponding to the target feature subset and the fitness threshold. On the basis of guaranteeing the importance of the features, adding the dimension of the current feature subset as an evaluation index reduces feature fluctuation and ensures that the selected optimal feature subset has less redundancy without losing prediction precision.
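The R2 score and the keep-or-restore decision of S41-S43 can be sketched as follows. The R2 score is computed with the usual definition 1 − SS_res / SS_tot. Treating "meets the preset scoring standard" as "greater than or equal to a threshold" is an assumption, as are the function names `r2_score` and `choose_target_subset`.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


def choose_target_subset(intermediate, deleted, score, score_threshold):
    """S41-S43: keep the pruned subset if the score is good enough,
    otherwise restore the features that were just deleted."""
    if score >= score_threshold:  # assumption: "meets" means >= threshold
        return list(intermediate)
    return list(intermediate) + list(deleted)
```

Recording `r2_score` once per iteration gives the values needed to draw the convergence curve mentioned above.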

Further, step S46 may include the following sub-steps S461-S464:

S461, judging whether the subset fitness corresponding to the target feature subset meets the fitness threshold.

S462, if yes, taking the target feature subset as a feature set corresponding to the boiler soot blower.

S463, if not, updating the speed and the position of each target feature in the target feature subset to generate a particle feature set.

S464, taking the particle feature set as the target historical feature data set, and returning to the step of carrying out feature extraction on the target historical feature data set by adopting a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset, and counting the iteration times in real time.

Subset fitness is a generic term for fitness corresponding to each target feature in a subset of target features.

In the embodiment of the invention, when the subset fitness corresponding to the target feature subset meets the fitness threshold, the target feature subset is used as the feature set corresponding to the boiler soot blower. Otherwise, the following updating formula is adopted to update the speed and the position of each target feature in the target feature subset, and a particle feature set is generated. The particle feature set is then taken as the target historical feature data set, and the method returns to the step of carrying out feature extraction on the target historical feature data set by adopting a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset, and counting the iteration times in real time.

The particle swarm optimization algorithm is a global random search algorithm based on swarm intelligence, proposed by simulating the migration and clustering behaviors of a flock during foraging. Based on observation of the activity of animal clusters, the particle swarm algorithm uses the information shared among individuals in the cluster so that the motion of the whole cluster evolves from disorder to order in the problem solving space, thereby obtaining an optimal solution. In the particle swarm algorithm, birds are abstracted into individual particles, and each particle has its own position and speed. These two parameters determine the motion of the particle: the position determines the distance from the optimal value, and the speed determines the magnitude of the change in each iteration. For an N-dimensional optimization problem, if the size of the particle swarm is M, the position of particle i (0 < i ≤ M) in the N-dimensional space is expressed as a vector X_i, and its flying speed is expressed as a vector V_i. Each particle has an adaptation value determined by the objective function and knows the best position it has found so far, namely the individual position vector (pbest), as well as its present position. This can be seen as the particle's own flight experience. In addition, each particle knows the best position found so far by all particles in the whole population, namely the population position vector (gbest), which is the best value among all pbest. This can be seen as the experience of the particle's companions. The particles determine their next movement from their own experience and the best experience among their companions.

The particles update the speed and position according to the following formulas, namely, the speed and position of each target feature in the target feature subset are updated by adopting the following formulas, and a particle feature set is generated:

$$V_i(t+1)=\omega V_i(t)+c_1\cdot \mathrm{rand}()\cdot\big(pbest_i(t)-X_i(t)\big)+c_2\cdot \mathrm{rand}()\cdot\big(gbest_i-X_i(t)\big)$$

$$X_i(t+1)=X_i(t)+V_i(t+1)$$

wherein ω represents an inertia factor reflecting the movement habit of the particles, that is, the tendency of a particle to maintain its previous velocity. When ω is larger, the global searching capability of the algorithm is stronger and the local searching capability is weaker; when ω is smaller, the global searching capability is weaker and the local searching capability is stronger. c_1 and c_2 are learning factors, also called acceleration constants, representing the tendency of a particle to learn from its own best experience and from the best of all particles, respectively; by adjusting c_1 and c_2, the exploration and exploitation capabilities of the algorithm can be balanced. rand() represents a random number between 0 and 1, drawn independently for each of the two terms. V_i(t+1) represents the target velocity vector corresponding to the i-th target feature after updating; V_i(t) represents the initial velocity vector corresponding to the i-th target feature at the current time; X_i(t+1) represents the target individual position corresponding to the i-th target feature after updating; X_i(t) represents the initial individual position corresponding to the i-th target feature at the current time; pbest_i(t) represents the individual position vector corresponding to the i-th target feature at the current time; and gbest_i represents the population position vector corresponding to the initial particle population at the current time.
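The velocity and position update can be sketched for a single particle as follows, using the standard particle swarm update with an inertia factor and two learning factors. The default values of `w`, `c1`, and `c2` and the function name `update_particle` are assumptions chosen for illustration. The sketch works on continuous positions; how updated positions are mapped back to 0/1 feature choices is not specified in the text and is left out.

```python
import random


def update_particle(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Return (new_position, new_velocity) for one particle.

    x, v, pbest, gbest are equal-length lists: current position, current
    velocity, the particle's own best position, and the population's best
    position. Two independent random numbers are drawn per dimension.
    """
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, pbest, gbest):
        v_next = (w * vj
                  + c1 * rng.random() * (pj - xj)   # pull toward own best
                  + c2 * rng.random() * (gj - xj))  # pull toward swarm best
        new_v.append(v_next)
        new_x.append(xj + v_next)
    return new_x, new_v
```

A particle that already sits at both its own best and the swarm best with zero velocity does not move, which is a quick sanity check on the update.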

In the embodiment of the invention, as shown in fig. 3, an initial historical feature data set of a boiler soot blower is acquired, and the Pearson correlation coefficient is adopted to perform feature dimension reduction on the initial historical feature data set, so as to generate a target historical feature data set. Feature extraction is carried out on the target historical feature data set by adopting a particle swarm algorithm: the population is initialized, an initial feature subset is generated, the fitness corresponding to each feature in the initial feature subset is calculated, a feature deletion set and an intermediate feature subset are generated, and the iteration times are counted in real time. The intermediate feature subset is encoded to generate a feature vector subset, and a plurality of feature vectors in the feature vector subset are respectively selected to construct a plurality of training sets and test sets. A model is constructed by adopting all training sets and test sets to generate a random forest model. The feature vector subset is input into the random forest model for data evaluation, and the prediction precision corresponding to the intermediate feature subset is generated. The prediction precision and the iteration times are compared with corresponding preset thresholds respectively, and the feature set corresponding to the boiler soot blower is determined by combining the feature deletion set. In this way, correlation analysis with the Pearson correlation coefficient first removes a large part of the correlated feature parameters. The features are then extracted by the particle swarm algorithm, and the importance of the features is scored by the random forest algorithm.

The particle swarm algorithm starts searching from the full feature set and, each time, deletes the features with the lowest importance from the current feature subset in search of a better feature subset. The prediction accuracy of the current feature subset is calculated by the RF algorithm; if the accuracy is reduced, the features that were just deleted are restored, and the loop continues until the set iteration times are reached or the fitness value of the particles reaches a preset threshold value. On the basis of guaranteeing the importance of the features, adding the dimension of the current feature subset as an evaluation index reduces feature fluctuation and ensures that the selected optimal feature subset has less redundancy without losing prediction precision.
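The delete, evaluate, and restore loop described above can be illustrated with a greedy stand-in for the particle swarm search. This is not the PSO-RF procedure itself: the sketch replaces the swarm with a simple backward elimination, takes the importance scores and the evaluation function as inputs, and never retries a feature once it has been restored. The names `backward_select`, `importance`, and `evaluate` are hypothetical.

```python
def backward_select(features, importance, evaluate, max_iters):
    """Greedy backward elimination with restore-on-failure.

    features   -- list of feature names (the full set to start from)
    importance -- dict mapping feature name to an importance score
    evaluate   -- callable returning a score for a list of features
                  (a stand-in for the random forest evaluation)
    """
    current = list(features)
    locked = set()            # features restored after a failed deletion
    best = evaluate(current)
    for _ in range(max_iters):
        candidates = [f for f in current if f not in locked]
        if not candidates or len(current) == 1:
            break
        drop = min(candidates, key=lambda f: importance[f])
        trial = [f for f in current if f != drop]
        score = evaluate(trial)
        if score >= best:     # accuracy did not fall: keep the deletion
            current, best = trial, score
        else:                 # accuracy fell: restore the feature
            locked.add(drop)
    return current, best
```

In the method described above, the stopping rule also includes the fitness threshold of the particles; the sketch stops only on the iteration limit or when no deletable feature remains.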

Referring to fig. 4, fig. 4 is a block diagram of a feature extraction system of a soot blower of a boiler according to a third embodiment of the present invention.

The embodiment of the invention provides a feature extraction system of a boiler soot blower, which comprises the following components:

an initial historical feature data set acquisition module 401 for acquiring an initial historical feature data set of the boiler soot blower.

The target historical feature data set generating module 402 is configured to perform feature dimension reduction on the initial historical feature data set by using pearson correlation coefficients, so as to generate a target historical feature data set.

The feature deletion set, intermediate feature subset and iteration times generation module 403 is configured to perform feature extraction on the target historical feature data set by using a particle swarm algorithm, generate a feature deletion set and an intermediate feature subset, and count the iteration times in real time.

The prediction precision calculation module 404 is configured to calculate the prediction precision corresponding to the intermediate feature subset by using a random forest algorithm.

The feature set determining module 405 is configured to compare the prediction accuracy and the iteration number with corresponding preset thresholds, and determine a feature set corresponding to the boiler soot blower by combining the feature deletion set.

Optionally, the initial historical feature data set includes a plurality of initial historical feature data, and the target historical feature data set generation module 402 includes:

The correlation coefficient calculation module is used for calculating the correlation coefficient corresponding to each initial historical feature data by adopting the Pearson correlation coefficient.

The correlation coefficient judging module is used for judging whether the absolute value of the correlation coefficient meets a preset correlation coefficient interval.

The target historical feature data determining first module is used for, if so, selecting one of the plurality of initial historical feature data corresponding to the correlation coefficient as target historical feature data according to a preset feature extraction standard.

The target historical feature data determining second module is used for, if not, taking all initial historical feature data corresponding to the correlation coefficient as target historical feature data.

The target historical feature data set generation sub-module is used for constructing a target historical feature data set by adopting all target historical feature data.
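The Pearson-based dimension reduction performed by these sub-modules can be sketched as follows. The preset correlation coefficient interval and the preset feature extraction standard are not given in the text, so the sketch assumes a hypothetical threshold of 0.9 on the absolute correlation and keeps the first feature of each highly correlated group. The function names `pearson` and `drop_correlated` are also assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if std_a == 0 or std_b == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (std_a * std_b)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if it is not highly correlated with a kept one.

    columns   -- dict mapping feature name to its list of values
    threshold -- hypothetical bound on |r| above which two features are
                 treated as redundant
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, a column that is an exact multiple of an earlier column has a correlation coefficient of 1 with it and is therefore dropped.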

Optionally, the feature deletion set, the intermediate feature subset, and the iteration number generation module 403 includes:

The initial feature subset generation module is used for carrying out population initialization on the target historical feature data set to generate an initial feature subset.

The fitness calculation module is used for calculating the fitness corresponding to each feature in the initial feature subset by adopting a preset fitness formula.

The preset fitness formula is as follows:

The feature deletion set construction module is used for constructing a feature deletion set from all features whose fitness does not meet the corresponding preset fitness threshold.

The intermediate feature subset and iteration times generation module is used for removing from the initial feature subset the features contained in the feature deletion set, generating an intermediate feature subset, and counting the iteration times in real time.

Optionally, the prediction accuracy calculation module 404 includes:

The feature vector subset generating module is used for encoding the intermediate feature subset to generate a feature vector subset.

The training set and test set construction module is used for respectively selecting a plurality of feature vectors in the feature vector subset to construct a plurality of training sets and test sets.

The random forest model generation module is used for carrying out model construction by adopting all training sets and test sets to generate a random forest model.

The prediction precision calculation sub-module is used for inputting the feature vector subset into the random forest model for data evaluation and generating the prediction precision corresponding to the intermediate feature subset.

Optionally, the random forest model generation module may perform the following steps:

respectively carrying out decision tree training on each training set by adopting a recursive splitting method to generate decision trees corresponding to the training sets;

testing each decision tree on its corresponding test set to generate prediction data corresponding to the training set;

judging whether all the prediction data meet a preset accuracy threshold;

if yes, constructing a random forest model by adopting all decision trees;

if not, returning to the step of respectively selecting a plurality of feature vectors in the feature vector subset and constructing a plurality of training sets and test sets, until all the prediction data meet the preset accuracy threshold.

Optionally, the preset thresholds include a preset scoring standard, an iteration times threshold, and a fitness threshold, and the feature set determining module 405 includes:

The prediction precision judging module is used for judging whether the prediction precision meets the preset scoring standard.

The target feature subset determining first module is used for, if so, taking the intermediate feature subset as the target feature subset.

The target feature subset determining second module is used for, if not, constructing the target feature subset by adopting the intermediate feature subset and the feature deletion set.

The iteration times judging module is used for judging whether the iteration times meet the iteration times threshold.

The feature set determining first sub-module is used for, if so, taking the target feature subset as the feature set corresponding to the boiler soot blower.

The feature set determining second sub-module is used for, if not, determining the feature set corresponding to the boiler soot blower according to the subset fitness corresponding to the target feature subset and the fitness threshold.

Optionally, the feature set determining second sub-module may perform the following steps:

judging whether the subset fitness corresponding to the target feature subset meets the fitness threshold;

if yes, taking the target feature subset as the feature set corresponding to the boiler soot blower;

if not, updating the speed and the position of each target feature in the target feature subset to generate a particle feature set;

taking the particle feature set as the target historical feature data set, and returning to the step of carrying out feature extraction on the target historical feature data set by adopting a particle swarm algorithm, generating a feature deletion set and an intermediate feature subset, and counting the iteration times in real time.

The embodiment of the invention also provides electronic equipment, which comprises: a memory and a processor, the memory storing a computer program; the computer program, when executed by a processor, causes the processor to perform the method of feature extraction of a boiler soot blower of any of the embodiments described above.

The memory may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory has memory space for program code for performing any of the method steps described above. For example, the memory space for the program code may include individual program codes for respectively implementing the various steps of the above method. The program code can be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. The program code may, for example, be compressed in a suitable form. The code, when executed by a computing processing device, causes the computing processing device to perform the steps of the method of feature extraction of a boiler soot blower described above.

The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the method for extracting the characteristics of the boiler soot blower according to any one of the above embodiments.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for extracting characteristics of a boiler soot blower, comprising:

2. The method of feature extraction of a boiler soot blower of claim 1, wherein said initial historical feature data set comprises a plurality of initial historical feature data; the step of performing feature dimension reduction on the initial historical feature data set by adopting the pearson correlation coefficient to generate a target historical feature data set comprises the following steps:

3. The method for extracting features of a boiler soot blower according to claim 1, wherein said step of performing feature extraction on said target historical feature dataset using a particle swarm algorithm to generate a feature deletion set, an intermediate feature subset, and counting the number of iterations in real time comprises:

the preset fitness formula is as follows:

4. The method for extracting features of a boiler soot blower according to claim 1, wherein said step of calculating the prediction accuracy corresponding to said intermediate feature subset using a random forest algorithm comprises:

encoding the intermediate feature subset to generate a feature vector subset;

5. The method of feature extraction for a boiler soot blower of claim 4, wherein said step of modeling using all of said training set and said test set to generate a random forest model comprises:

if yes, constructing a random forest model by adopting all the decision trees;

6. The method for extracting features of a boiler soot blower according to claim 1, wherein said preset threshold values include preset scoring criteria, iteration number threshold values and fitness threshold values; the step of comparing the prediction precision and the iteration times with corresponding preset thresholds respectively and determining the feature set corresponding to the boiler soot blower by combining the feature deletion set comprises the following steps:

if yes, the intermediate feature subset is used as a target feature subset;

judging whether the iteration times meet the iteration times threshold;

7. The method of feature extraction for a boiler soot blower of claim 6, wherein said step of determining a feature set for said boiler soot blower based on a subset fitness for said target feature subset and said fitness threshold comprises:

8. A feature extraction system for a boiler soot blower, comprising:

9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of feature extraction of a boiler soot blower as claimed in any one of claims 1-7.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the method of feature extraction of a boiler soot blower according to any one of claims 1-7.