CN116561569A

CN116561569A - Industrial power load identification method based on EO feature selection and AdaBoost algorithm

Info

Publication number: CN116561569A
Application number: CN202310319690.6A
Authority: CN
Inventors: 姚小康; 周孟然; 刘宇; 汪锟; 王昊男; 朱梓伟
Original assignee: Anhui University of Science and Technology
Current assignee: Anhui University of Science and Technology
Priority date: 2023-03-29
Filing date: 2023-03-29
Publication date: 2023-08-08

Abstract

The invention discloses an industrial power load identification method based on EO feature selection combined with the AdaBoost algorithm. The method comprises: collecting power load parameters of various devices in industrial power scenes, constructing an original power data set, preprocessing the data, and dividing the data set into a training set and a test set at a fixed ratio; extracting time-frequency domain features from the original power data, taking into account the quality of the acquired samples and the complexity of the power load; using EO as a wrapper-based feature selection algorithm and screening out the optimal feature subset through DA; inputting the screened optimal feature subset into an AdaBoost model for training to obtain a trained model; and carrying out a recognition experiment with the test set, outputting the category and accuracy corresponding to the data. The method effectively improves the identification accuracy of power loads in industrial power scenes, and addresses the problem that traditional power load identification models are too complex to be applied in practice in industrial scenes.

Description

Industrial power load identification method based on EO feature selection and AdaBoost algorithm

Technical Field

The invention relates to the field of industrial power load identification, in particular to an industrial power load identification method based on EO feature selection and AdaBoost algorithm.

Background

Energy is an important foundation for the development of human society and a key factor in economic growth and quality of life. Electric energy accounts for a large share of energy consumption and reflects the industrialization level and technological innovation capability of a country or region. Industrial electricity is the main part of electric energy consumption; it directly affects industrial production efficiency and quality, and also bears on environmental protection, energy conservation, and emission reduction. Improving the efficiency of industrial electricity use and optimizing its structure is therefore an important way to promote sustainable economic and social development. Research on power load identification in industrial power scenes helps to optimize equipment configuration and scheduling strategies, reduce wasted power and losses, improve energy utilization, and realize intelligent energy management.

Industrial power load identification refers to the process of analyzing and identifying the characteristics and demands of the electrical equipment of industrial consumers using measurement data from the power system. Its purpose is to optimize the operation and planning of the power system, improve electric energy utilization efficiency, reduce grid losses, and guarantee the reliability and safety of the power supply. Feature selection is an important step in model construction: it can improve the efficiency and interpretability of the model and reduce the risk of overfitting. Drawing on ideas from information theory and genetic algorithms, the equilibrium optimizer (EO) searches for the optimal result by maximizing mutual information and minimizing redundancy. Used as a feature selection algorithm, EO can reduce the number of features while maintaining classification accuracy. To address the deficiencies of traditional classification algorithms, the adaptive boosting (AdaBoost) classifier was proposed. As an ensemble learning algorithm, its core idea is to construct and combine multiple weak classifiers (such as decision trees or neural networks) into one strong classifier, thereby improving classification performance and generalization ability.

Disclosure of Invention

In view of the above technical problems, the invention aims to provide an industrial power load identification method based on EO feature selection and the AdaBoost algorithm. The advantage of the invention is that it can effectively improve the accuracy and robustness of industrial power load identification.

The technical solution adopted to solve the above technical problems is as follows:

An industrial power load identification method based on EO feature selection combined with the AdaBoost algorithm comprises the following steps:

Step one: collecting power load parameters of various devices in an industrial power scene and constructing an original power data set; preprocessing the data, and dividing the data set into a training set and a test set at a fixed ratio;

Step two: extracting time-frequency domain features from the original power data, taking into account the quality of the acquired samples and the complexity of the power load;

Step three: using the equilibrium optimizer (EO) as a wrapper-based feature selection algorithm and screening out the optimal feature subset through discriminant analysis (DA);

Step four: inputting the screened optimal feature subset into an adaptive boosting (AdaBoost) model for training to obtain a trained model; carrying out a recognition experiment with the test set, and outputting the category and accuracy corresponding to the data.

Further, in step one, collecting the power load parameters of various devices in the industrial power scene means measuring and recording indexes such as voltage, current, and power factor of different types of industrial devices during operation, using professional instruments and methods. These parameters reflect the energy consumption status and operating efficiency of the equipment. Samples are taken once per minute.

In step one, preprocessing the power data means processing the raw data collected from the power system so as to improve its quality and usability and to provide effective support for subsequent analysis and application. The preprocessing comprises data cleaning and denoising, specifically detecting and then deleting or correcting abnormal data such as missing values, noise values, and erroneous values, so that the integrity and accuracy of the data are ensured.

Further, in step one, the preprocessed data set is divided into a training set and a test set at a ratio of 9:1.
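The 9:1 division can be illustrated with a minimal sketch. The source does not say whether the split is random or chronological; the shuffled split, the function name `split_train_test`, and the fixed seed below are assumptions made only for illustration.

```python
import random


def split_train_test(samples, train_ratio=0.9, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test).

    Assumption: a random split; the source only states the 9:1 ratio.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Toy usage: 100 placeholder sample indices.
train, test = split_train_test(range(100))
print(len(train), len(test))
```

For time-ordered load records, a chronological split may be preferable to shuffling so that test samples do not precede training samples in time.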

Further, in step two, time-frequency domain feature extraction is a commonly used signal analysis method that can extract useful information such as frequency, amplitude, and phase from the power signal. The time-frequency domain feature extraction comprises the following steps:

(1) A Fourier transform is applied to the preprocessed power data, converting the time-domain signal into a frequency-domain signal to obtain a time-frequency spectrogram. The required frequency content is computed with the fast Fourier transform (FFT), a fast algorithm for evaluating the discrete Fourier transform (DFT) on a computer. The DFT is central to signal processing: correlation, filtering, and spectral estimation of signals can all be implemented with it. The Fourier transform of the discrete-time signal x(n), which is continuous in frequency, is defined as:

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$$

Because $X(e^{j\omega})$ is a continuous function of $\omega$, it cannot be evaluated directly on a computer. The spectrum of x(n) is therefore discretized so that spectral analysis can be carried out numerically. The discrete Fourier transform of a finite-length discrete signal x(n), n = 0, 1, 2, …, N-1, is defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}$$

where $W_N = e^{-j2\pi/N}$ and k = 0, 1, 2, …, N-1. The inverse transform is defined as:

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-nk}, \quad n = 0, 1, 2, \ldots, N-1$$

In matrix form, the discrete Fourier transform is $X = A \cdot x$, where the transformation matrix A is:

$$A = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N^{1} & W_N^{2} & \cdots & W_N^{N-1} \\ 1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)(N-1)} \end{bmatrix}$$

(2) The time-frequency spectrogram is processed by filtering, segmentation, binarization, and similar operations to remove noise and irrelevant information and to highlight the time-frequency characteristics of the signal;

(3) Features such as the time-frequency parameters, shape, and texture of the signal are extracted using statistical methods, geometric methods, pattern recognition methods, and the like; the extracted features constitute a joint feature set.
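The matrix form of the transform in step (1) can be checked numerically with a short sketch. It builds the matrix with entries W_N^(nk), where W_N = exp(-j2π/N), applies it to a toy signal, and recovers the signal with the inverse transform. The function names and the toy signal are illustrative assumptions; a practical implementation would use an FFT routine rather than this O(N²) product.

```python
import cmath


def dft_matrix(N):
    """Return the N x N matrix A with A[k][n] = W_N**(n*k), W_N = exp(-2j*pi/N)."""
    W = cmath.exp(-2j * cmath.pi / N)
    return [[W ** (n * k) for n in range(N)] for k in range(N)]


def dft(x):
    """Forward transform X = A . x as a plain matrix-vector product."""
    A = dft_matrix(len(x))
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def idft(X):
    """Inverse transform x(n) = (1/N) * sum_k X(k) * W_N**(-n*k)."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(X[k] * W ** (-n * k) for k in range(N)) / N for n in range(N)]


# Toy signal (hypothetical values, not data from the source).
x = [1.0, 2.0, 0.0, -1.0]
X = dft(x)
x_back = idft(X)
print([round(abs(v), 6) for v in X])
```

The round trip through `dft` and `idft` should reproduce `x` up to floating-point error, which is a quick sanity check on the sign convention of the exponent.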

Further, in step three, EO feature selection selects a useful feature subset from the high-dimensional data in order to reduce the complexity of the data and improve the performance of the model. The specific flow is as follows:

(1) Initializing a binary matrix as the population, where each row represents an individual and each column represents a feature; 1 means the feature is selected and 0 means it is discarded;

(2) Calculating the fitness value of each individual, i.e. evaluating the contribution of the selected feature subset to the target task according to an evaluation index (such as classification accuracy or information gain);

(3) Sorting by fitness value and determining the current state of the equilibrium pool, i.e. taking the four individuals with the highest fitness, together with their average, as candidate solutions;

(4) Updating the exponential term F, which controls the balance between global search and local search;

(5) Updating the generation rate G, which strengthens the local search capability;

(6) Updating the binary vector of each individual, applying the update operation according to the state of the equilibrium pool, the exponential term, and the generation rate;

(7) Judging whether a stopping condition is met (for example, the maximum number of iterations is reached or the fitness value no longer improves); if so, outputting the optimal solution (i.e. the individual with the highest fitness); otherwise, returning to step (2) to continue iterating.
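The flow in steps (1) to (7) can be sketched as follows. This is a simplified illustration under several assumptions, not the patented implementation: positions are kept as continuous values in [0, 1] and thresholded at 0.5 to obtain the binary feature mask; the update rule follows the commonly published form of the equilibrium optimizer with parameters `a1`, `a2`, and `gp`; a nearest-centroid classifier stands in for the DA-based wrapper evaluation; and only a fixed iteration budget is used as the stopping condition. All names and the toy data are hypothetical.

```python
import math
import random


def centroid_fitness(mask, X, y, penalty=0.01):
    """Stand-in wrapper fitness: nearest-centroid accuracy minus a size penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))

    accuracy = sum(predict(x) == t for x, t in zip(X, y)) / len(y)
    return accuracy - penalty * len(cols) / len(mask)


def binary_eo(fitness, n_features, pop_size=10, max_iter=30,
              a1=2.0, a2=1.0, gp=0.5, seed=0):
    rng = random.Random(seed)

    def to_mask(c):
        return [1 if v > 0.5 else 0 for v in c]

    # (1) Population: one row per individual, one column per feature.
    conc = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    pool = []  # equilibrium pool of (fitness, position) pairs, best first
    for it in range(max_iter):
        # (2) Fitness of every individual.
        scored = [(fitness(to_mask(c)), c[:]) for c in conc]
        # (3) Four best positions seen so far, plus their average.
        pool = sorted(pool + scored, key=lambda p: p[0], reverse=True)[:4]
        average = [sum(p[1][j] for p in pool) / len(pool)
                   for j in range(n_features)]
        candidates = [p[1] for p in pool] + [average]
        t = (1 - it / max_iter) ** (a2 * it / max_iter)
        for c in conc:
            c_eq = rng.choice(candidates)
            r1, r2 = rng.random(), rng.random()
            gcp = 0.5 * r1 if r2 >= gp else 0.0
            for j in range(n_features):
                lam = max(rng.random(), 1e-12)  # avoid division by zero
                r = rng.random()
                # (4) Exponential term F.
                F = a1 * math.copysign(1.0, r - 0.5) * (math.exp(-lam * t) - 1.0)
                # (5) Generation rate G.
                G = gcp * (c_eq[j] - lam * c[j]) * F
                # (6) Position update, clipped back into [0, 1].
                new = c_eq[j] + (c[j] - c_eq[j]) * F + (G / lam) * (1.0 - F)
                c[j] = min(1.0, max(0.0, new))
    # (7) Fixed iteration budget reached: return the best mask seen.
    final = [(fitness(to_mask(c)), c[:]) for c in conc]
    best_fit, best_pos = max(pool + final, key=lambda p: p[0])
    return to_mask(best_pos), best_fit


# Toy data: feature 0 separates the two classes, the other three do not.
X = [[10.0 * (i % 2), float((i // 2) % 3), float((i // 2) % 5),
      float((i // 2) * 7 % 4)] for i in range(20)]
y = [i % 2 for i in range(20)]
mask, score = binary_eo(lambda m: centroid_fitness(m, X, y), n_features=4)
print(mask, round(score, 4))
```

The small size penalty in the fitness is one way to express the stated goal of reducing the number of features while maintaining accuracy; its weight is an arbitrary choice here.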

Further, in step four, the optimal feature subset is input to an adaptive boosting (AdaBoost) model, which iteratively trains a plurality of weak classifiers, assigns each a weight according to its error rate, and finally combines all the weak classifiers into one strong classifier. The basic procedure is as follows:

(1) First, the optimal feature subset is selected from the training data, and this subset is used as input to construct an initial weak classifier;

(2) Then, in each iteration, the weight distribution of the training data is updated according to the classification result of the previous round, so that misclassified samples receive higher weights and correctly classified samples receive lower weights;

(3) A new weak classifier is then trained. The above steps are repeated until the preset number of iterations is reached or the error rate falls to a sufficiently small value;

(4) Finally, all the trained weak classifiers are combined by weighted averaging, with weights determined by the accuracy each showed during training, to obtain the final strong classifier, which is used to classify the various power loads.
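The reweighting and weighted combination in steps (1) to (4) can be illustrated with a minimal sketch of discrete AdaBoost. It makes simplifying assumptions not stated in the source: labels are binary (+1 or -1), the weak classifiers are single-feature threshold stumps, and the classifier weight is alpha = 0.5 * ln((1 - err) / err). Identifying several device types would need a multi-class variant. All names and the toy data are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            for pol in (1, -1):
                pred = [pol if row[j] >= thr else -pol for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] >= thr else -pol


def adaboost_fit(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] < 1e-10:
            break  # a perfect weak classifier ends the loop early
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single stump can separate.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
print(predictions)
```

On this toy set each stump alone misclassifies two samples, while the weighted vote of three stumps classifies every training sample correctly, which is the effect the combination step relies on.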

The industrial power load identification method based on EO feature selection and the AdaBoost algorithm effectively improves the accuracy with which the model identifies power loads in industrial power scenes. Because feature selection is performed by the equilibrium optimizer, the dimensionality of the original power data is greatly reduced, which makes the method better suited to deployment on embedded devices with limited memory and computing resources and to online use in real industrial settings. The method provided by the application can be applied to the accurate classification of industrial power equipment and effectively reduces hardware requirements.

Drawings

FIG. 1 is a flow chart of industrial power load identification based on EO feature selection in combination with AdaBoost algorithm in accordance with an embodiment of the present invention;

FIG. 2 is a graph comparing the load curves of a typical industrial power device before and after preprocessing according to an embodiment of the present invention;

FIG. 3 is a flow chart of EO feature selection in accordance with one embodiment of the invention;

FIG. 4 is a schematic diagram of an AdaBoost model according to an embodiment of the present invention;

FIG. 5 is a confusion matrix for test set power load identification in an embodiment of the invention;

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.

The embodiment of the invention discloses an industrial power load identification method based on EO feature selection and AdaBoost algorithm, as shown in figure 1, comprising the following steps:

Specifically, the power load samples collected in this study all come from real Chinese factories. Indexes such as current, voltage, active power, and reactive power of 9 industrial devices were measured and recorded during operation with professional sampling instruments. The sampling frequency of each instrument is 1/60 Hz. Preprocessing the power data means cleaning, verifying, interpolating, and smoothing the original power data so as to improve its quality and usability. The changes in the active and reactive power curves of a typical industrial power device before and after preprocessing are shown in FIG. 2. The specific preprocessing method for the power load data comprises the following steps:

(1) Data cleaning: deleting or correcting invalid data that would affect the analysis, such as abnormal values, noise values, and missing values;

(2) Data verification: checking whether the data conform to physical and statistical laws, for example whether the voltage and frequency lie within a reasonable range and whether the active and reactive power satisfy the power flow equations;

(3) Data interpolation: filling in missing values in a reasonable way, for example by linear interpolation, spline interpolation, or regression analysis;

(4) Data smoothing: eliminating or reducing short-term fluctuations in the data, for example with moving averages, exponential smoothing, or wavelet transforms.
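Three of these operations can be sketched on a single series: masking out-of-range readings (cleaning and a simple range check), linear interpolation of the resulting gaps, and a centered moving average. The function names, the range limits, and the toy readings are assumptions for illustration; the source does not state which of the listed techniques or which parameters were used.

```python
def mask_out_of_range(series, low, high):
    """Replace readings outside [low, high] (or already missing) with None."""
    return [v if v is not None and low <= v <= high else None for v in series]


def interpolate_missing(series):
    """Fill None gaps linearly; gaps at the edges copy the nearest valid value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid samples")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = series[left] + frac * (series[right] - series[left])
    return out


def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


# Toy active-power readings in kW (hypothetical values, one per minute).
raw = [100.0, None, 104.0, 9999.0, 110.0]
cleaned = mask_out_of_range(raw, 0.0, 500.0)
filled = interpolate_missing(cleaned)
smoothed = moving_average(filled, window=3)
print(filled)
print(smoothed)
```

The order matters: implausible readings are masked first so that they do not leak into the interpolated values or the smoothing window.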

In one embodiment, useful information such as frequency, amplitude, and phase is extracted from the power signal. The specific time-frequency domain feature extraction steps are as follows:

where $W_N = e^{-j2\pi/N}$ and k = 0, 1, 2, …, N-1. The inverse transform is defined as: $$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-nk}, \quad n = 0, 1, 2, \ldots, N-1$$

(3) Features such as the time-frequency parameters, shape, and texture of the signal are extracted from the processed time-frequency spectrogram using statistical methods, geometric methods, pattern recognition methods, and the like, and all the time-frequency domain features are combined to construct a joint feature set.
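As an illustration of statistical feature extraction in the frequency domain, the sketch below computes a one-sided magnitude spectrum and derives a few summary features from it. The particular features (mean and standard deviation of the magnitudes, dominant non-DC bin, spectral centroid) are assumptions chosen for illustration; the source does not enumerate the features it uses. The toy signal is sampled at 1/60 Hz, matching the sampling frequency stated in the embodiment.

```python
import cmath
import math


def magnitude_spectrum(x):
    """One-sided DFT magnitudes |X(k)| for k = 0 .. N//2 (direct O(N^2) sum)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N)
                    for n in range(N)))
            for k in range(N // 2 + 1)]


def spectral_features(x, sample_rate_hz):
    """Return a dict of simple statistical features of the magnitude spectrum."""
    mags = magnitude_spectrum(x)
    N = len(x)
    freqs = [k * sample_rate_hz / N for k in range(len(mags))]
    mean_mag = sum(mags) / len(mags)
    std_mag = math.sqrt(sum((m - mean_mag) ** 2 for m in mags) / len(mags))
    # Dominant component, skipping the DC bin (k = 0).
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    total = sum(mags)
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total if total > 0 else 0.0
    return {
        "mean_magnitude": mean_mag,
        "std_magnitude": std_mag,
        "peak_bin": peak_bin,
        "peak_frequency_hz": freqs[peak_bin],
        "spectral_centroid_hz": centroid,
    }


# Toy load curve: a constant level plus one periodic component (4 cycles in 64 samples).
signal = [5.0 + math.sin(2 * math.pi * 4 * n / 64) for n in range(64)]
features = spectral_features(signal, sample_rate_hz=1 / 60)
print(features["peak_bin"], features["peak_frequency_hz"])
```

Each device's feature dictionary would form one row of the joint feature set that is passed on to feature selection.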

In one embodiment, EO feature selection selects a useful feature subset from the high-dimensional data in order to reduce the complexity of the data and improve the performance of the model. The EO algorithm flow is shown in FIG. 3 and specifically includes the following steps:

In one embodiment, the optimal feature subset is input to an AdaBoost model, which iteratively trains 10 weak classifiers, assigns each a weight according to its error rate, and combines all the weak classifiers into one strong classifier. The AdaBoost model building route is shown in FIG. 4 and basically comprises the following steps:

In one embodiment, the optimal feature subset is input to the AdaBoost model. To show the actual effect of power load identification in the industrial scene more intuitively, a confusion matrix is used to display the classification details, as shown in FIG. 5, from which the identification result for each industrial device can be read directly. For devices of type ETE, 82% are predicted correctly and 18% are mispredicted as CMW. For devices of type CME, 85% are predicted correctly, 5% are mispredicted as ETE, 5% as CMW, and 5% as CCA. For devices of type KHT, 95% are predicted correctly and 5% are mispredicted as CCA. All remaining devices are predicted correctly. The method can therefore accurately identify the type of industrial equipment.
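Percentages of this kind are the rows of a confusion matrix normalized by the number of true samples in each class. The sketch below shows that computation; the labels and predictions are toy placeholders, not the data behind FIG. 5, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """counts[i][j] = number of samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts


def row_normalize(counts):
    """Convert each row of counts to fractions of that true class."""
    rates = []
    for row in counts:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    return rates


# Toy example with placeholder class names.
labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "C", "C"]
counts = confusion_matrix(y_true, y_pred, labels)
rates = row_normalize(counts)
for label, row in zip(labels, rates):
    print(label, [round(v, 2) for v in row])
```

The diagonal of `rates` gives the per-class share of correct predictions, and each off-diagonal entry gives the share of a true class mispredicted as another class.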

The industrial power load identification method based on EO feature selection and the AdaBoost algorithm has been described in detail above. The principles and embodiments of the present invention have been explained with reference to specific examples, which are intended only to illustrate the core concepts of the invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these modifications and adaptations also fall within the scope of the invention as defined in the following claims.

Claims

1. An industrial power load identification method based on EO feature selection and the AdaBoost algorithm, comprising the following steps:

S1, collecting power load parameters of various devices in an industrial power scene and constructing an original power data set; preprocessing the data, and dividing the data set into a training set and a test set at a fixed ratio;

S2, extracting time-frequency domain features from the original power data, taking into account the quality of the acquired samples and the complexity of the power load;

S3, using the equilibrium optimizer (EO) as a wrapper-based feature selection algorithm, and screening out the optimal feature subset through discriminant analysis (DA);

S4, inputting the screened optimal feature subset into an adaptive boosting (AdaBoost) model for training to obtain a trained model; carrying out a recognition experiment with the test set, and outputting the category and accuracy corresponding to the data.

2. The industrial power load identification method based on EO feature selection combined with the AdaBoost algorithm as claimed in claim 1, characterized in that: in step S1, collecting the power load parameters of various devices in the industrial power scene means measuring and recording indexes such as voltage, current, and power factor of different types of industrial devices during operation, using professional instruments and methods. These parameters reflect the energy consumption status and operating efficiency of the equipment. Samples are taken once per minute.

3. The industrial power load identification method based on EO feature selection combined with the AdaBoost algorithm as claimed in claim 1, characterized in that: in step S1, preprocessing the power data means processing the raw data collected from the power system so as to improve its quality and usability and to provide effective support for subsequent analysis and application. The preprocessing comprises data cleaning and denoising, specifically detecting and then deleting or correcting abnormal data such as missing values, noise values, and erroneous values, so that the integrity and accuracy of the data are ensured.

4. The industrial power load identification method based on EO feature selection combined with the AdaBoost algorithm as claimed in claim 1, characterized in that: in step S2, time-frequency domain feature extraction is a commonly used signal analysis method that can extract useful information such as frequency, amplitude, and phase from the power signal. The time-frequency domain feature extraction comprises the following steps:

S21, applying a Fourier transform to the preprocessed power data, converting the time-domain signal into a frequency-domain signal to obtain a time-frequency spectrogram. The required frequency content is computed with the fast Fourier transform (FFT), a fast algorithm for evaluating the discrete Fourier transform (DFT) on a computer. The DFT is central to signal processing: correlation, filtering, and spectral estimation of signals can all be implemented with it. The Fourier transform of the discrete-time signal x(n), which is continuous in frequency, is defined as: $$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$$

S22, preprocessing the time-frequency spectrogram by filtering, segmentation, binarization, and similar operations to remove noise and irrelevant information and to highlight the time-frequency characteristics of the signal;

S23, extracting features such as the time-frequency parameters, shape, and texture of the signal from the preprocessed time-frequency spectrogram using statistical methods, geometric methods, pattern recognition methods, and the like.

5. The industrial power load identification method based on EO feature selection combined with the AdaBoost algorithm as claimed in claim 1, characterized in that: in step S3, EO feature selection selects a useful feature subset from the high-dimensional data in order to reduce the complexity of the data and improve the performance of the model. The specific flow is as follows:

S31, initializing a binary matrix as the population, where each row represents an individual and each column represents a feature; 1 means the feature is selected and 0 means it is discarded;

S32, calculating the fitness value of each individual, i.e. evaluating the contribution of the selected feature subset to the target task according to an evaluation index (such as classification accuracy or information gain);

S33, sorting by fitness value and determining the current state of the equilibrium pool, i.e. taking the four individuals with the highest fitness, together with their average, as candidate solutions;

S34, updating the exponential term F, which controls the balance between global search and local search;

S35, updating the generation rate G, which strengthens the local search capability;

S36, updating the binary vector of each individual, applying the update operation according to the state of the equilibrium pool, the exponential term, and the generation rate;

S37, judging whether a stopping condition is met (for example, the maximum number of iterations is reached or the fitness value no longer improves); if so, outputting the optimal solution (i.e. the individual with the highest fitness); otherwise, returning to step S32 to continue iterating.

6. The industrial power load identification method based on EO feature selection combined with the AdaBoost algorithm as claimed in claim 1, characterized in that: in step S4, the optimal feature subset is input to an adaptive boosting (AdaBoost) model, which iteratively trains a plurality of weak classifiers, assigns each a weight according to its error rate, and finally combines all the weak classifiers into one strong classifier. The basic procedure is as follows:

S41, first selecting the optimal feature subset from the training data, and using this subset as input to construct an initial weak classifier;

S42, in each iteration, updating the weight distribution of the training data according to the classification result of the previous round, so that misclassified samples receive higher weights and correctly classified samples receive lower weights;

S43, training a new weak classifier, and repeating the above steps until the preset number of iterations is reached or the error rate falls to a sufficiently small value;

S44, finally, combining all the trained weak classifiers by weighted averaging, with weights determined by the accuracy each showed during training, to obtain the final strong classifier, and classifying the various power loads with the strong classifier.