CN113947159A - Real-time online monitoring and identification method for electrical load - Google Patents


Info

Publication number: CN113947159A (application CN202111247143.9A; granted as CN113947159B)
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Granted, Active
Inventors: 魏广芬, 赵航, 胡春华, 刘骞
Assignees: Shandong Technology and Business University; Yantai Dongfang Wisdom Electric Co Ltd
Application filed by Shandong Technology and Business University and Yantai Dongfang Wisdom Electric Co Ltd


Classifications

    • G06F 18/2411 — Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F 17/18 — Complex mathematical operations for evaluating statistical data
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 — Combinations of networks
    • G06N 3/084 — Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a real-time online monitoring and identification system for electrical loads, which identifies the type and working state of an electrical load with a Shannon-entropy weighted voting algorithm model. The base-classifier combination of the model is determined as follows. E base classifiers are selected and combined in all possible ways, yielding a number of different algorithm-model combinations; the diversity Ent of each combination is calculated, and the optimal diversity range Ent* is determined. For each algorithm-model combination whose diversity falls within Ent*, the accuracy of its Shannon-entropy weighted vote, denoted ACC_se, is computed. The combination within Ent* whose ACC_se is largest and closest to the midpoint of Ent* is the optimal learner combination, denoted Com, and constitutes the trained Shannon-entropy weighted voting algorithm model. The real-time online monitoring and identification system for electrical loads therefore requires little hardware investment and achieves high identification efficiency and accuracy.

Description

Real-time online monitoring and identification method for electrical load
Technical Field
The invention relates to the field of electric power information acquisition, in particular to a real-time online monitoring and identifying method for an electric load.
Background
With the introduction of the "carbon peaking and carbon neutrality" strategy, environmental protection, low-carbon operation, energy saving and emission reduction have become the norm and the mainstream; in particular, the optimized management of residential electrical loads offers substantial energy-saving and emission-reduction potential.
Obtaining detailed information on household appliance loads through load monitoring not only helps power companies optimize grid operation and management, but also helps residential users manage their electrical equipment efficiently, providing strong support for energy saving and emission reduction. Current load monitoring technology falls into two main categories: Non-Intrusive Load Monitoring (NILM) and Intrusive Load Monitoring (ILM). Traditional, intrusive load monitoring records usage by installing a sensor on each of the user's appliances; it requires adding a module to every device, carries a high economic cost, and is difficult to popularize. By contrast, non-intrusive load monitoring adds no marginal cost: a monitoring device is installed at the smart meter to obtain the energy-consumption data of the whole household, and methods such as event detection and pattern recognition are used to accurately identify the operating state of each household appliance. This approach offers good operability, low implementation cost, high user acceptance, easy popularization and broad prospects.
At present, to meet the requirement of real-time online load identification, a system is needed that can process data in real time: it must receive and store data continuously while displaying waveforms, computing features and applying the loaded model in a display area, so as to judge the appliance category in real time. Multiple threads are therefore required so that the system does not stall, the real-time requirement is met, and the real-time online monitoring and identification tasks are completed.
Compared with a traditional single learning algorithm, ensemble learning has clear advantages; methods such as Boosting, Bagging, Stacking and Voting have been developed. Voting is an ensemble method commonly used in classification problems, and includes absolute-majority voting, relative-majority voting, weighted voting and the like. It effectively combines the predictions of several base classifiers, and its classification and identification performance is outstanding. However, when constructing a strong learner with high accuracy and strong generalization ability from several weak, heterogeneous base classifiers, the following problems arise in applying the Voting algorithm:
(1) The relative-majority voting method simply takes the option with the most votes as the final result; when the majority of classifiers predict incorrectly, the ensemble prediction is also wrong, so the post-ensemble recognition accuracy is low.
(2) The prediction accuracy of the base classifiers differs between scenarios, that is, their prediction results diverge. Different selections of base classifiers therefore yield inconsistent voting results, and sometimes the combined vote of the selected base classifiers is worse than a single learner. How to select the base classifiers in an ensemble method is an urgent problem to be solved.
(3) The weighted voting method assigns a different weight to each base classifier. When appropriate weights are given, the result of weighted voting can be better than that of any single base classifier and also better than that of relative-majority voting; how to determine appropriate weights, however, is itself a problem to be solved.
There is therefore a need for improvements in the prior art.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a real-time online monitoring and identification method for electrical loads.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a real-time online monitoring and identification method for electrical loads comprises the following steps:
s1, receiving an original signal sent by the intelligent electric meter by using the main thread, and analyzing according to a protocol to obtain original data;
s2, storing the original data by the sub thread 1;
s3, the sub-thread 2 displays the original data in real time through a display device;
s4, analyzing the voltage and current data of each period in the original data by the main thread, and extracting characteristic data;
s5, storing the characteristic data by the sub-thread 3, and storing the characteristic data to the local in real time;
s6, calling a display function by a sub-thread 4 to initialize a waveform diagram of the electric appliance to be displayed, wherein the waveform diagram displays the electricity utilization characteristic data of the electric appliance;
s7, the thread 5 reads the real-time feature data and transmits the feature data to the display function in the thread 4, and the feature data in the display is updated;
s8, detecting the event by the thread 4 by using an event detection algorithm, and detecting whether the event occurs according to the real-time characteristic data in the thread 5; the event means that the change value of the characteristic data is higher than a set change threshold value in a set detection period;
s9, when an event occurs, loading the trained Shannon entropy weighting voting algorithm model by using the sub-thread 6;
and S10, reading the real-time characteristic data by the sub-thread 6, and inputting the real-time characteristic data into the Shannon entropy weighting voting algorithm model to realize the online identification of the type and the working state of the electric appliance.
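The thread layout in steps S1-S10 can be sketched with Python's standard threading and queue modules. This is a minimal illustration rather than the patented implementation: the class and method names are invented, the protocol parser is a stub, and only the storage sub-threads (S2 and S5) are modeled; the display, event-detection and identification threads would attach to the same queues in the same way.

```python
import threading
import queue

class LoadMonitorPipeline:
    """Sketch of the thread layout in steps S1-S10 (names are illustrative)."""

    def __init__(self):
        self.raw_q = queue.Queue()    # main thread -> raw-data storage (S2)
        self.feat_q = queue.Queue()   # main thread -> feature storage (S5)
        self.stored_raw = []          # stands in for sub-thread 1's local store
        self.stored_feats = []        # stands in for sub-thread 3's local store
        self._stop = threading.Event()

    def parse_frame(self, frame):
        # S1: decode one protocol frame into (voltage, current) sample pairs
        return [(v, c) for v, c in frame]

    def _drain(self, src, dst):
        # generic storage worker: persist items until told to stop and the queue is empty
        while not self._stop.is_set() or not src.empty():
            try:
                dst.append(src.get(timeout=0.05))
            except queue.Empty:
                pass

    def run(self, frames):
        workers = [
            threading.Thread(target=self._drain, args=(self.raw_q, self.stored_raw)),
            threading.Thread(target=self._drain, args=(self.feat_q, self.stored_feats)),
        ]
        for w in workers:
            w.start()
        for frame in frames:
            samples = self.parse_frame(frame)          # S1: receive and parse
            self.raw_q.put(samples)                    # S2: hand off raw data
            p = sum(v * c for v, c in samples) / len(samples)
            self.feat_q.put({"P": p})                  # S4/S5: extract and store a feature
        self._stop.set()
        for w in workers:
            w.join()
```

The per-cycle active power here is only a stand-in feature; the real system extracts the full feature set described below.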
Compared with the prior art, the invention has the following technical effects:
and the system is ensured not to be jammed by running multiple threads, the real-time requirement is met, and the real-time online monitoring and identification tasks are completed. And realizing the online identification of the category and the working state of the electric appliance by utilizing a Shannon entropy weighting voting algorithm model.
On the basis of the technical scheme, the invention can be further improved as follows.
Preferably, the display device in step S3 presents the original data in a text format.
Preferably, the waveforms in step S6 include voltage waveform, current waveform, active power waveform, reactive power waveform, V-I trace and harmonic amplitude;
Preferably, the feature data in step S4 include the active power (P), reactive power (Q), apparent power (S), power factor (λ) and the (2g−1)-th harmonics, where g is a positive integer: the 1st harmonic (har1), 3rd harmonic (har3), …, 63rd harmonic (har63);
Preferably, the trained Shannon-entropy weighted voting algorithm model used in step S9 is obtained as follows.

A1. The feature data are stored locally to form a feature database. The main thread applies Recursive Feature Elimination (RFE) to seven algorithms: Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbors (KNN), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR) and Back-Propagation Neural Network (BPNN). RFE selects the effective feature data of each algorithm, and the seven algorithms are trained on their effective features to obtain trained models, recorded as base classifiers; the 7 base classifiers are numbered 0 to 6.
A2. The 7 base classifiers are combined, taking every combination of E classifiers for E = 2, …, 7, to obtain 120 different algorithm-model combinations. The diversity Ent of each combination is calculated with formula (1):

Ent = (1/K) · Σ_{k=1}^{K} [1/(m − ⌈m/2⌉)] · min{z(x_k), m − z(x_k)}   (1)

In formula (1), Ent is the diversity and m is the number of base classifiers in the given algorithm-model combination; each combination corresponds to one Ent value. x_k is the k-th sample in the feature data set X, K is the total number of samples in X, and z(x_k) is the number of the m base classifiers that classify sample x_k correctly.
A3. Using the maximum and minimum diversity values, calculate the optimal diversity range Ent* according to formula (3):

Ent* = [(Ent_max + Ent_min)/2 − σ, (Ent_max + Ent_min)/2 + σ]   (3)

In formula (3), σ is the half-width of the interval Ent*; it is an empirically set value, generally between 0 and 1.
For a selected combination of E base classifiers and a classification problem with n classes, the predicted classification probabilities are weighted according to formula (4):

P′(x_k) = [w_i · p_ij(x_k)],  i = 1, …, E,  j = 1, …, n   (4)

In formula (4), p_ij(x_k) is the posterior probability, predicted by the i-th base classifier, that feature-data sample x_k belongs to the j-th class, and w_i is the fusion weight of the i-th base classifier for sample x_k, calculated according to formula (5):

w_i = (1/H_i(x_k)) / Σ_{e=1}^{E} (1/H_e(x_k))   (5)

In formula (5), H_i(x_k) is the Shannon entropy, over all classification classes, of the i-th classifier's prediction for sample x_k, calculated according to formula (6):

H_i(x_k) = − Σ_{j=1}^{n} p_ij(x_k) · log p_ij(x_k)   (6)

where j indexes the classification classes and i is the number of the base classifier. A lower entropy indicates a more confident prediction and yields a larger weight.
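Formulas (4)-(8) can be sketched together for a single sample. The inverse-entropy weighting shown in formula (5) is a reconstruction (the published equation images are not legible here): a classifier with lower Shannon entropy, i.e. a more confident one, receives a larger fusion weight.

```python
import math

def shannon_entropy(p):
    # H_i(x_k): entropy of one classifier's predicted class distribution (formula (6))
    return -sum(q * math.log(q) for q in p if q > 0)

def entropy_weighted_vote(probas):
    """Fuse per-classifier posteriors for one sample (formulas (4)-(8)).

    probas[i][j] is classifier i's posterior for class j. Weights follow
    the inverse-entropy rule assumed in formula (5).
    """
    eps = 1e-12                     # guards against a zero-entropy (one-hot) prediction
    inv = [1.0 / (shannon_entropy(p) + eps) for p in probas]
    total = sum(inv)
    weights = [v / total for v in inv]                     # formula (5)
    n_classes = len(probas[0])
    fused = [sum(w * p[j] for w, p in zip(weights, probas))
             for j in range(n_classes)]                    # formula (7)
    return fused.index(max(fused)), fused                  # formula (8)
```

In the example below, the confident first classifier dominates the vote even though the second disagrees.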
A4. Summing each column of formula (4) realizes the weighted fusion of the class posterior probabilities, giving P″(x_k) as shown in formula (7):

P″(x_k) = [Σ_{i=1}^{E} w_i · p_i1(x_k), Σ_{i=1}^{E} w_i · p_i2(x_k), …, Σ_{i=1}^{E} w_i · p_in(x_k)]   (7)

P″(x_k) is, under the given algorithm-model combination, the probability that sample x_k is judged to belong to each class.
A5. As shown in formula (8), the column index of the element of P″(x_k) with the largest value is the predicted class Class(x_k) of sample x_k:

Class(x_k) = argmax(P″(x_k))   (8)

where argmax returns the index of the largest element in the list. The predictions for all training samples are obtained in this way.
A6. Based on the predictions and the true labels of all training samples, calculate the prediction accuracy ACC_se of each algorithm-model combination:

ACC_se = (TP_se + TN_se) / (TP_se + TN_se + FP_se + FN_se)

TP_se, TN_se, FP_se and FN_se compare the predictions of the se-th combination with the true labels on the training samples: TP_se (true positives) is the number of positive-class samples predicted as positive; TN_se (true negatives) is the number of negative-class samples predicted as negative; FP_se (false positives) is the number of negative-class samples predicted as positive; FN_se (false negatives) is the number of positive-class samples predicted as negative. Here se = 1, …, C1, where C1 is the number of algorithm-model combinations obtained by combining the selected E base classifiers.
A7: for the object belonging to Ent*Combining the algorithm models in the range, finding the value which is the largest and is at Ent*ACC closest to the middle position insideseAnd (4) corresponding algorithm model combination, namely the optimal learner combination which is marked as Com. The best learner combination Com is the trained Shannon entropy weighted voting algorithm model.
Further, the recursive feature elimination method performs multiple rounds of training with a base model: after each round it eliminates the features with the smallest weight coefficients and performs the next round on the reduced feature set, while 10-fold cross-validation on the training set gives the validation accuracy of the current feature combination. After training, the feature combination with the greatest influence on the final prediction is identified. This procedure screens out the feature data most important to each base classification algorithm and improves the accuracy of its predictions.
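This per-algorithm feature selection maps onto scikit-learn's cross-validated recursive feature elimination, assuming that library is available; logistic regression stands in for the seven base models and the data here are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

def select_effective_features(X, y, folds=10):
    """RFE with cross-validation: drop the weakest-weighted features round
    by round, scoring each surviving feature set by k-fold CV accuracy."""
    selector = RFECV(LogisticRegression(max_iter=1000), step=1,
                     cv=folds, scoring="accuracy")
    selector.fit(X, y)
    return selector  # selector.support_ marks the retained features

# synthetic stand-in for the feature database
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=0, random_state=0)
sel = select_effective_features(X, y, folds=5)
```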
Further, the active power (P), reactive power (Q), apparent power (S), power factor (λ) and odd harmonics, i.e. the 1st harmonic (har1), 3rd harmonic (har3), …, 63rd harmonic (har63), are calculated as follows.

First, the effective (RMS) value of the voltage is calculated as in formula (9):

U_rms = √((1/N) · Σ_{d=1}^{N} U_d²)   (9)

where N is the number of data points in the calculation cycle and U_d is the voltage value at a single data point.

The effective value of the current is calculated as in formula (10):

I_rms = √((1/N) · Σ_{d=1}^{N} I_d²)   (10)

where I_d is the current value at a single data point.

The active power P equals the average power over the N test points in the data segment, see formula (11):

P = (1/N) · Σ_{d=1}^{N} U_d · I_d   (11)

The apparent power S equals the product of the voltage RMS value U_rms and the current RMS value I_rms, see formula (13):

S = U_rms × I_rms   (13)

The reactive power Q is calculated according to formula (12):

Q = √(S² − P²)   (12)

The power factor, usually denoted λ, is the ratio of the active power P of the AC circuit to the apparent power S, see formula (14):

λ = P / S   (14)

The collected current data are transformed using formula (15) together with a fast Fourier transform algorithm to obtain the (2n+1)-th harmonics, where n is an integer. The 32 odd harmonics (the fundamental, 3rd, 5th, …, 63rd) are selected as features, denoted Har1, Har3, Har5, …, Har63 respectively; the corresponding harmonic frequencies are 50 Hz, 150 Hz, 250 Hz, …, 3.15 kHz.

i(t) = a_0 + Σ_{h=1}^{∞} [a_h · cos(hωt) + b_h · sin(hωt)]   (15)

In formula (15), ω = 2π/T is the angular frequency of the periodic function and T is the period of the voltage, 0.02 s for China's domestic 50 Hz mains; h is the harmonic order. a_0 is the DC coefficient of the signal, and a_h, b_h are its AC coefficients, obtained by Fourier transform.
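Formulas (9)-(15) amount to a few lines of NumPy per cycle. A sketch, assuming the input holds exactly one 50 Hz cycle of samples and taking the harmonic amplitudes from the FFT bins of the current:

```python
import numpy as np

def cycle_features(u, i, n_harmonics=63):
    """Per-cycle electrical features following formulas (9)-(15).

    u, i: arrays holding exactly one 50 Hz cycle (T = 0.02 s) of voltage
    and current samples. Returns P, Q, S, lambda and the odd current
    harmonic amplitudes Har1, Har3, ..., up to n_harmonics.
    """
    u = np.asarray(u, dtype=float)
    i = np.asarray(i, dtype=float)
    N = len(u)
    u_rms = np.sqrt(np.mean(u ** 2))               # formula (9)
    i_rms = np.sqrt(np.mean(i ** 2))               # formula (10)
    p = float(np.mean(u * i))                      # formula (11)
    s = float(u_rms * i_rms)                       # formula (13)
    q = float(np.sqrt(max(s ** 2 - p ** 2, 0.0)))  # formula (12)
    lam = p / s if s else 0.0                      # formula (14)
    spec = np.fft.rfft(i)
    # harmonic h has amplitude 2|c_h| / N for h >= 1 (formula (15) coefficients)
    harmonics = {f"Har{h}": 2.0 * float(abs(spec[h])) / N
                 for h in range(1, n_harmonics + 1, 2) if h < len(spec)}
    return {"P": p, "Q": q, "S": s, "lambda": lam, **harmonics}
```

For a purely sinusoidal, in-phase load the power factor comes out as 1 and the fundamental amplitude Har1 recovers the current amplitude.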
Further, the optimal model combination obtained for the self-test data set comprises four base classifiers: LDA, NB, DT and LR.

Further, the optimal model combination obtained for WHITED (a worldwide household and industry transient data set with a sampling frequency of 44.1 kHz, containing current and voltage data of more than 130 electrical appliances) comprises four base classifiers: KNN, DT, LR and BPNN.

The beneficial effect of this further scheme is that, once the optimal model combination is obtained, the models in the combination are stored locally so that they can be called at any time while the system is in use. Real-time feature data are input into the stored models, and prediction results are generated in real time.
Drawings
FIG. 1 is a flow diagram of the present invention.
FIG. 2 is a graph of the number of features versus validation-set accuracy for the different algorithms on the measured data set.
FIG. 3 is a graph of the number of features versus validation-set accuracy for the different algorithms on the WHITED data set.
FIG. 4 is a flowchart of finding the optimal learner combination Com.
FIG. 5 is a graph of the relationship between average accuracy and diversity on the measured data set.
FIG. 6 is a partial enlarged view of FIG. 5.
FIG. 7 is a graph of the relationship between average accuracy and diversity on the WHITED data set.
FIG. 8 is a partial enlarged view of FIG. 7.
FIG. 9 is a NILM framework diagram.
FIG. 10 is a diagram of the feature data before and after an event occurs.
FIG. 11 shows the test accuracy of LDA, NB, DT, LR and the present algorithm on the measured data set.
FIG. 12 shows the test accuracy of KNN, DT, LR, BPNN and the present algorithm on the WHITED data set.
FIG. 13 is the test-set confusion matrix for the self-test data set.
FIG. 14 shows the real-time detection result for a hair dryer.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Example 1:
a multithreading-based non-intrusive load data real-time processing and identification method is used for carrying out real-time online monitoring and identification on the type and the running state of an electric appliance by acquiring real-time electric appliance electricity utilization data and comprises the following steps.
And step S1, receiving the original signal sent by the intelligent electric meter by using the main thread serial port, and analyzing according to a protocol to obtain original data.
In step S2, raw data is stored using child thread 1.
Step S3, using sub-thread 2 to display the raw data in real time to a display area of a display device, the display area displaying the raw data in a text format.
And step S4, the main thread analyzes the voltage and current data of each period in the original data, extracts features and obtains feature data.
In step S5, the sub-thread 3 is used to store the feature data, and the feature data is stored locally in real time.
Step S6, using sub thread 4 to call a real-time display function to initialize the oscillogram of the electric appliance to be displayed, wherein the wave form comprises a voltage wave form, a current wave form, an active power wave form, a reactive power wave form, a V-I track and a harmonic amplitude;
In step S7, the signal function of sub-thread 5 transmits the feature data to the real-time display function of step S6, so that the displayed feature data are updated and presented in the form of waveforms.
Step S8, the sub-thread 4 uses an event detection algorithm to detect events, and detects whether an event occurs according to the real-time characteristic data presented in step S7, wherein the event means that the change rate of the characteristic data is higher than a set change threshold value in a set detection period; the characteristic data used herein may be current data, and may also be active power, reactive power, etc.
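A threshold test of this kind might look as follows; the max-minus-min change measure over the detection window is an illustrative choice, not the patent's specific detector.

```python
def detect_event(window, threshold):
    """Minimal event check for step S8: an event is flagged when the spread
    of the monitored feature (here max - min over the detection window)
    exceeds the set change threshold. The windowing choice is illustrative."""
    return (max(window) - min(window)) > threshold
```

In practice the window would slide over the real-time current or power stream, and a detected event triggers the model-loading step S9.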
And step S9, loading the trained Shannon entropy weighting voting algorithm model by using the sub-thread 6, reading the real-time characteristics, inputting the real-time characteristics into the Shannon entropy weighting voting algorithm model, and realizing the online identification of the type and the working state of the electrical appliance.
In the feature extraction described in step S4, the invention extracts the active power (P), reactive power (Q), apparent power (S), power factor (λ) and the odd harmonics, i.e. the 1st harmonic (har1), 3rd harmonic (har3), …, 63rd harmonic (har63), for a total of 36 features in 5 categories.
The load characteristics are further divided into steady-state characteristics and transient characteristics, wherein the steady-state characteristics are electrical quantities of the load in a steady operation state, the transient characteristics are electrical quantities in a transient process, and the transient characteristics can be used for non-intrusive load identification alone or as supplementary characteristics of the steady-state characteristics.
Table 2 shows the typical characteristic data, and the respective advantages and disadvantages.
TABLE 2 common feature types
The active power (P), reactive power (Q), apparent power (S), power factor (λ) and odd harmonics, i.e. the 1st harmonic (har1), 3rd harmonic (har3), …, 63rd harmonic (har63), are calculated as follows.

First, the effective (RMS) value of the voltage is calculated as in formula (9):

U_rms = √((1/N) · Σ_{d=1}^{N} U_d²)   (9)

The effective value of the current is calculated as in formula (10):

I_rms = √((1/N) · Σ_{d=1}^{N} I_d²)   (10)

The active power P equals the average power over the N test points in the data segment, see formula (11):

P = (1/N) · Σ_{d=1}^{N} U_d · I_d   (11)

The sum of the squares of the reactive power Q and the active power P equals the square of the apparent power S, so Q is calculated as in formula (12):

Q = √(S² − P²)   (12)

The apparent power S equals the product of the voltage RMS value U_rms and the current RMS value I_rms, see formula (13):

S = U_rms × I_rms   (13)

The power factor, usually denoted λ, is the ratio of the active power P of the AC circuit to the apparent power S, see formula (14):

λ = P / S   (14)

The collected current data are transformed using formula (15) together with a fast Fourier transform algorithm. In this example the 32 odd harmonics (the fundamental, 3rd, 5th, …, 63rd) are selected as features, denoted Har1, Har3, Har5, …, Har63 respectively; the corresponding harmonic frequencies are 50 Hz, 150 Hz, 250 Hz, …, 3.15 kHz.

i(t) = a_0 + Σ_{h=1}^{∞} [a_h · cos(hωt) + b_h · sin(hωt)]   (15)

In formula (15), ω = 2π/T is the angular frequency of the periodic function and T is the period, here 0.02 s for the 50 Hz mains; h is the harmonic order. a_0 is the DC coefficient and a_h, b_h are the AC coefficients, calculated by Fourier transform.
To obtain the trained Shannon-entropy weighted voting algorithm model loaded in step S9, the method includes the following steps.

Step A1: The feature data are stored locally to form a feature database. Recursive feature elimination is applied to seven algorithms, namely linear discriminant analysis (LDA), naive Bayes (NB), K-nearest neighbors (KNN), decision tree (DT), support vector machine (SVM), logistic regression (LR) and back-propagation neural network (BPNN), to select the effective features of each algorithm. The seven algorithms are trained on these effective features; the trained models are recorded as base classifiers, and the 7 base classifiers are numbered 0 to 6.

The Recursive Feature Elimination (RFE) method performs multiple rounds of training with a base model: after each round it eliminates the features with the smallest weight coefficients and trains the next round on the reduced feature set, while 10-fold cross-validation on the training set gives the validation accuracy of the current feature combination. After training, the feature combination with the greatest influence on the final prediction is identified.
FIG. 2 illustrates the effect of the number of features on validation-set accuracy for each algorithm on the measured data set; FIG. 3 illustrates the same for the WHITED data set.
The most important features can be screened out through this feature optimization, improving the accuracy of the prediction results.
The seven base classifiers are combined, taking every combination of E classifiers for E = 2, …, 7, to obtain 120 different algorithm-model combinations.
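The enumeration of the 120 combinations is a one-liner with itertools; the function below assumes the 0-6 numbering assigned in step A1.

```python
from itertools import combinations

def all_model_combinations(n_classifiers=7, min_size=2):
    """Enumerate every combination of at least two of the numbered base
    classifiers (0..6), matching the 120 combinations of step A2."""
    ids = range(n_classifiers)
    return [c for size in range(min_size, n_classifiers + 1)
            for c in combinations(ids, size)]
```

The count follows from C(7,2) + C(7,3) + … + C(7,7) = 21 + 35 + 35 + 21 + 7 + 1 = 120.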
Step A2: calculating diversity using formula (1) for the 120 algorithm model combinations obtained above;
Figure BDA0003321448600000093
step A3: the average accuracy is calculated by equation (2).
Figure BDA0003321448600000094
To measure ACC in average accuracyiEach basic moldAnd training the model by utilizing the respective optimal characteristic combination, and calculating the accuracy of the test set predicted by the base classifier through a formula (16).
Figure BDA0003321448600000101
Where True (TP) indicates the number of classes for which data carrying Positive class labels are predicted, True Negative (TN) indicates the number of classes for which data carrying Negative class labels are predicted, False Positive (FP) indicates the number of classes for which data carrying Negative class labels are predicted, and False Negative (FN) indicates the number of classes for which data carrying Positive class labels are predicted.
Step A4: calculating the optimal range Ent of the diversity according to the formula (3) by using the maximum value, the minimum value and the midpoint of the diversity*
Figure BDA0003321448600000102
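Steps A2-A4 can be sketched as follows. The exact form of the non-classical entropy in formula (1) is reproduced here on the assumption that it is the standard ensemble "entropy measure" of diversity; `z_counts` plays the role of z(x_k).

```python
import math

def diversity_ent(z_counts, m):
    """Formula (1), assumed standard entropy diversity: z_counts[k] is the
    number of base learners in the combination that classify sample k correctly."""
    K = len(z_counts)
    denom = m - math.ceil(m / 2)
    return sum(min(z, m - z) for z in z_counts) / (K * denom)

def optimal_range(ent_values, sigma):
    """Formula (3): interval of half-width sigma around the Ent midpoint."""
    mid = (max(ent_values) + min(ent_values)) / 2
    return mid - sigma, mid + sigma

print(diversity_ent([3, 3, 3], m=3))     # all base learners agree correctly -> 0.0
print(diversity_ent([2, 1, 2, 1], m=3))  # maximal disagreement -> 1.0
lo, hi = optimal_range([0.495, 0.0078, 0.2], sigma=0.033)
print(round((lo + hi) / 2, 4))           # midpoint -> 0.2514
```

The midpoint 0.2514 matches the Ent* center reported later for the measured data set (Ent_max = 0.495, Ent_min = 0.0078).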
Step A5: perform weighted fusion of the predicted classification probabilities according to formula (4).
The Shannon entropy weighted voting algorithm adopts E classifiers for the weighted fusion of an n-class classification problem. In formula (4), x_k is a sample in the feature data set X, and p_En(x_k) represents the posterior probability with which the E-th classifier predicts x_k as the n-th class. w_i is the fusion weight of the i-th classifier for sample x_k and is calculated according to formula (5); H_i(x_k) in formula (5) represents the Shannon entropy of the i-th classifier over all classification classes and is calculated according to formula (6).
P(x_k) = [ w_i · p_ij(x_k) ],  i = 1, …, E;  j = 1, …, n   (4)

w_i = (1/H_i(x_k)) / Σ_{e=1}^{E} (1/H_e(x_k))   (5)

H_i(x_k) = − Σ_{j=1}^{n} p_ij(x_k) · log p_ij(x_k)   (6)
Step A6: sum each column of formula (4) to realize the weighted fusion of the class posterior probabilities, obtaining P″(x_k) as shown in formula (7); the column index of the largest element of P″(x_k) is the prediction category for sample x_k, as shown in formula (8).
P″(x_k) = [ Σ_{i=1}^{E} w_i p_i1(x_k), …, Σ_{i=1}^{E} w_i p_in(x_k) ]   (7)
Class(xk)=argmax(P″(xk)) (8)
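Steps A5 and A6 can be sketched end to end: each classifier's posterior vector is entropy-weighted (formulas (5)-(6), with the normalized inverse-entropy weight assumed from the description that a lower-entropy, more confident classifier receives a larger weight), the columns are summed (formula (7)), and the argmax gives the class (formula (8)).

```python
import math

def entropy_weighted_vote(posteriors, eps=1e-12):
    """posteriors: one probability vector per base classifier for sample x_k."""
    # formula (6): Shannon entropy of each classifier's prediction
    H = [-sum(p * math.log(p + eps) for p in vec) for vec in posteriors]
    # formula (5), assumed form: normalized inverse entropy as fusion weight
    inv = [1.0 / (h + eps) for h in H]
    w = [v / sum(inv) for v in inv]
    n = len(posteriors[0])
    # formula (7): column-wise weighted sum of the matrix in formula (4)
    fused = [sum(w[i] * posteriors[i][j] for i in range(len(posteriors)))
             for j in range(n)]
    # formula (8): column index of the largest fused posterior
    return fused, max(range(n), key=lambda j: fused[j])

fused, cls = entropy_weighted_vote([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
print(cls)  # -> 0 (the confident, low-entropy classifier dominates)
```

Because the weights are normalized to sum to 1, the fused vector remains a probability distribution, which is what formula (8)'s argmax decision assumes.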
For the combinations falling within the Ent* range, the accuracy of the Shannon entropy weighted voting is calculated and recorded as ACC_se. The combination with the largest ACC_se that lies closest to the midpoint of Ent* is the optimal learner combination, denoted Com, i.e., the Shannon entropy weighted voting algorithm model.
The search for the optimal learner combination Com is shown in FIG. 4: the combinations in the optimal range are found by calculating diversity and determining the optimal diversity range; suppose there are C1 such combinations. The Shannon entropy weighted voting accuracy of these C1 learner combinations is calculated, and the combination Com with the highest accuracy that lies closest to the middle of the optimal range is found.
For the measured data set used in this example (see FIG. 5), the diversity maximum Ent_max and minimum Ent_min are 0.495 and 0.0078, respectively; the center of the viewing range Ent* is 0.2514, and the range increment σ is set to 0.033. The optimal learner combination for the measured data set is [0, 1, 3, 5], where 0, 1, 3, 5 represent LDA, NB, DT and LR, respectively.
When the WHITED data set is used (see FIG. 6), Ent_max and Ent_min are 0.0473 and 0.0013, respectively; the center of the diversity viewing range Ent* is 0.0243, and the range increment σ is set to 0.003. The best combination for the WHITED data set is [2, 3, 5, 6], representing the four base classifiers KNN, DT, LR and BPNN.
The optimal model combination obtained for the self-test data set comprises the four base classifiers LDA, NB, DT and LR, and the optimal combination obtained for WHITED comprises KNN, DT, LR and BPNN. The features extracted by each base classifier on the two data sets, and the optimal number of features used, are shown in Table 1.
TABLE 1 features and number of features extracted by the base classifier on two datasets
[Table 1 is provided as an image in the original publication.]
It can be seen that the obtained optimal combination Com of the classification algorithms is different when different data sets are used, but the corresponding optimal combination can be obtained by using the method.
Example 2:
This example uses the non-intrusive load monitoring (NILM) framework of FIG. 7 to acquire raw data, and uses the self-test data set and the WHITED public data set to test the robustness of the algorithm of the present invention.
In the NILM framework, the power frequency at the smart meter is 50 Hz and the sampling frequency is 6.4 kHz, so one period of data is obtained every 0.02 s, each period containing 128 current and 128 voltage data points. Data can be divided into high-frequency and low-frequency data according to the sampling frequency: low-frequency data is sampled below 1 Hz and high-frequency data above 1 Hz, so the data used in this example is high-frequency data.
According to the multi-thread framework shown in FIG. 1, the method employs multi-thread processing with a main thread and six sub-threads. When multithreading starts, data is sent and processed using signal and slot functions. The main thread receives the instrument's original signal data through a serial port, parses it to obtain the raw data, and starts sub-thread 1 to store the raw data locally in real time.
At the same time, sub-thread 2 is activated, displaying the raw data in the text box of the display device. The original data is transmitted to a protocol analysis module, and then the analyzed voltage and current are transmitted to a feature extraction module to obtain feature data.
Meanwhile, the sub-thread 3 is started to store the obtained real-time characteristic data locally so as to record the characteristics of the equipment in operation.
While the main thread is running, sub-thread 4 is started to display images, and sub-thread 5 captures the latest feature data in real time and sends it to the real-time visualization of sub-thread 4 for display.
In this way, the feature data updated in real time is presented in real time via the display device. In the processing of data by thread 4, the first step is to detect an event and the second step is to start thread 6 and send the latest feature data via signal 6.
And the thread 6 loads a trained algorithm and combines the received latest characteristic data to realize the online identification function of the electric equipment.
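The signal/slot hand-off between threads can be mimicked with the standard library. In the sketch below, a queue stands in for the "signal 6" channel and a trivial threshold rule stands in for the trained model; both are illustrative assumptions, not the patent's implementation.

```python
import queue
import threading

feature_q = queue.Queue()   # stand-in for the signal-6 channel to thread 6
results = []

def thread6():
    """Analog of sub-thread 6: consume feature vectors, emit predictions."""
    while True:
        feats = feature_q.get()
        if feats is None:            # shutdown sentinel
            break
        results.append("ON" if sum(feats) > 1.0 else "OFF")  # toy model

worker = threading.Thread(target=thread6)
worker.start()
for feats in ([0.2, 0.1], [0.9, 0.8]):   # thread 4 emits the latest features
    feature_q.put(feats)
feature_q.put(None)
worker.join()
print(results)  # -> ['OFF', 'ON']
```

The design point is the same as in the patent's framework: classification runs off the main thread, so serial-port parsing is never blocked by model inference.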
The event detection algorithm adopted by the invention is the two-sided cumulative sum algorithm (CUSUM); its principle is shown in FIG. 8. The W1 window is used to calculate the mean μ0 of the sample sequence, while the W2 window is the basis for judging whether an event occurs. When f_k^+ within W2 accumulates gradually and exceeds a certain threshold h, a load event is judged to exist. If no load event is detected, f_k^+ within W2 changes little; W1 and W2 then slide right to a new sampling point and detection continues.
As shown in formula (16), the algorithm has strong resistance to noise interference and high accuracy, and steady-state events can be identified accurately:

f_k^+ = max(0, f_{k−1}^+ + x_k − μ0 − θ)
f_k^− = max(0, f_{k−1}^− + μ0 − x_k − θ)   (16)

where x_k denotes the currently sampled sequence, μ0 is the mean of the sequence, usually known or estimable, and θ is random noise. When the sequence is in a steady state, the statistics f_k^+ and f_k^− fluctuate randomly around 0.
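A minimal two-sided CUSUM detector following the recursions above might look like this (threshold h, drift θ, and the W1/W2 window handling simplified to a single pass over the samples):

```python
def cusum_detect(samples, mu0, theta, h):
    """Return the index at which f_k+ or f_k- first exceeds h, else None."""
    f_pos = f_neg = 0.0
    for k, x in enumerate(samples):
        f_pos = max(0.0, f_pos + x - mu0 - theta)   # upward change statistic
        f_neg = max(0.0, f_neg + mu0 - x - theta)   # downward change statistic
        if f_pos > h or f_neg > h:
            return k
    return None

steady = [1.0] * 50
print(cusum_detect(steady, mu0=1.0, theta=0.1, h=2.0))               # -> None
print(cusum_detect(steady + [2.0] * 10, mu0=1.0, theta=0.1, h=2.0))  # -> 52
```

In steady state both statistics are pulled back toward zero by the drift term θ, which is what gives CUSUM its noise immunity; after the step change at index 50 the positive statistic accumulates 0.9 per sample and crosses h two samples later.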
In this example, measured data for the real appliances and their combination types in Table 3 are collected, and the effect of the method of the present invention is also verified on the WHITED data set.
30 electric appliances with steady-state current waveforms in WHITED are classified and identified, and the robustness of the proposed method is finally verified by comparison with a decision-tree-based AdaBoost algorithm, a random forest algorithm, and a Stacking algorithm based on a multilayer perceptron (MLP).
For the measured data set, 128 data points per period are obtained after the detection system parses the data, and 38 classes of electric appliances are obtained after storage, as shown in Table 3.
TABLE 3 actual tested appliances and their mix situation categories
[Table 3 is provided as an image in the original publication.]
Each appliance was sampled for 10 seconds to obtain 500 periods of voltage and current waveform data, each period containing 128 sample points. Each numbered sample in Table 3 was measured 20 times, and the steady-state voltage and current data were averaged over the 20 measurements to reduce noise. Repeating the measurement in this way yields, for the above 38 appliance combinations, a total of 38 × 500 × 128 current data points and the same number of voltage data points for training, where the data of each period is taken as one training sample.
An additional 1 second of data was collected for testing to ensure that the test data was unseen during training, i.e., the test set contains 38 × 50 × 128 current data points and the same number of voltage data points.
Feature extraction is performed next, and the present invention acquires 36 features in total, which will be described in detail below.
The active power is calculated according to equation (11) as the 33rd feature:

P = (1/N) Σ_{k=1}^{N} u_k · i_k   (11)

where u_k and i_k are the voltage and current samples within one period and N is the number of samples per period.
The reactive power is calculated according to equation (12) as the 34th feature:

Q = sqrt(S² − P²)   (12)
The apparent power S = U_rms × I_rms is calculated according to equation (13) as the 35th feature.
The power factor λ = P/S is calculated according to equation (14) as the 36th feature.
The amplitudes of the odd harmonics are calculated according to equation (15); there are 32 of them in total, serving as the 1st to 32nd features.
The method extracts apparent power, active power, reactive power, power factor, and 32 signal harmonics from each period's acquired current and voltage waveforms as features, converting the raw training set into a 38 × 500 × 36 vector: each test-numbered appliance contributes 500 samples, each containing 36 features. Similarly, the test set is converted into a 38 × 50 × 36 vector, with 50 samples per numbered appliance.
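The 36 features of one period can be computed directly from the sampled waveforms. The sketch below follows equations (11)-(14), with the odd-harmonic amplitudes obtained from a direct DFT (an assumption about the exact form of equation (15)):

```python
import math

def period_features(u, i, g_max=16):
    """P, Q, S, power factor and the first g_max odd-harmonic amplitudes."""
    N = len(u)
    P = sum(uk * ik for uk, ik in zip(u, i)) / N                 # eq. (11)
    Urms = math.sqrt(sum(x * x for x in u) / N)
    Irms = math.sqrt(sum(x * x for x in i) / N)
    S = Urms * Irms                                              # eq. (13)
    Q = math.sqrt(max(S * S - P * P, 0.0))                       # eq. (12)
    lam = P / S                                                  # eq. (14)
    harmonics = []
    for g in range(1, g_max + 1):
        n = 2 * g - 1                                            # odd orders
        re = sum(i[k] * math.cos(2 * math.pi * n * k / N) for k in range(N))
        im = sum(i[k] * math.sin(2 * math.pi * n * k / N) for k in range(N))
        harmonics.append(2.0 * math.hypot(re, im) / N)           # assumed eq. (15)
    return P, Q, S, lam, harmonics

# sanity check: pure sine voltage/current in phase -> P = S, Q = 0, lambda = 1
N = 128
u = [311.0 * math.sin(2 * math.pi * k / N) for k in range(N)]
i = [5.0 * math.sin(2 * math.pi * k / N) for k in range(N)]
P, Q, S, lam, h = period_features(u, i)
print(round(P, 1), round(lam, 3), round(h[0], 3))  # -> 777.5 1.0 5.0
```

For a purely resistive load the fundamental amplitude recovers the 5 A current peak and all higher odd harmonics vanish, as expected.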
All data analyses were run on a Dell Inspiron 7591 computer with an i5-9300H CPU at a base frequency of 2.40 GHz; Python 3.7 was used as the programming environment.
The WHITED public data set, also at 50 Hz power frequency, has a sampling frequency of 44.1 kHz and therefore contains 882 data points per cycle. Following the same principle, data for the 30 appliances in WHITED with steady-state current waveforms are obtained and converted to features. The harmonic features use the first 32 odd harmonics, giving a 30 × 500 × 36 training set vector and, similarly, a 30 × 50 × 36 test set vector.
After the features are stored locally to form a feature database, the invention applies the recursive feature elimination method (RFE) to seven algorithms — linear discriminant analysis (LDA), naive Bayes classification (NB), K-nearest neighbors (KNN), decision tree classification (DT), support vector machine (SVM), logistic regression (LR) and back-propagation neural network (BPNN) — to select the most effective feature data for each algorithm. The seven algorithms are then trained on their respective effective features, yielding trained models: 7 base classifiers numbered 0-6.
Namely the base classifiers: LDA 0, NB 1, KNN 2, DT 3, SVM 4, LR 5 and BPNN 6 (consistent with the optimal combinations [0, 1, 3, 5] = LDA, NB, DT, LR and [2, 3, 5, 6] = KNN, DT, LR, BPNN reported above), where each base classifier is implemented by calling the Scikit-Learn library.
The LDA model uses default parameters, as does the NB model.
The KNN algorithm is a neighbor method and one of the simplest classification algorithms; linear scanning is used, the k value is set to 5, and the leaf-node count threshold of the tree is set to 30.
The SVM algorithm adopts a linear kernel function, and the penalty value, which represents the degree of penalty for misclassified samples, is set to 1.0.
The DT algorithm is a tree structure that uses the Gini index as the feature selection criterion; the tree depth is 150, and the "best" partition criterion is used to find the optimal split point.
The LR model uses default parameters.
The BPNN model includes an input layer, two hidden layers and an output layer. For the measured data set, the input layer has 36 nodes, the two hidden layers are set to 80 nodes, and the 38 output-layer nodes represent the 38 appliance classes in use; ReLU is the activation function, the learning rate is 0.03, training runs for 320 iterations, and Adam is used to optimize the loss. For the WHITED data set, the input and hidden layers are unchanged and the output layer is set to 30, representing the 30 appliance classes selected from WHITED.
Combining the seven models with each other, from pairwise combinations up to all seven, yields 120 different model combinations.
The diversity is calculated using the non-classical entropy formula (1):

Ent = (1/K) Σ_{k=1}^{K} min{z(x_k), m − z(x_k)} / (m − ⌈m/2⌉)   (1)
where m is the number of base classifiers in a given algorithm model combination, K is the total number of samples in the feature data set X, and z(x_k) is the number of base learners in the combination whose prediction for feature data sample x_k agrees with the true result;
The average accuracy is calculated by formula (2):

Avg = (1/m) Σ_{i=1}^{m} ACC_i   (2)
m is the number of base classifiers in a certain algorithm model combination;
ACC_i in the average accuracy is obtained by training each base model with its optimal feature combination and calculating the test-set accuracy through formula (16):

ACC = (TP + TN) / (TP + TN + FP + FN)   (16)
Using the diversity maximum Ent_max and minimum Ent_min, the optimal diversity range Ent* is calculated according to formula (3):

Ent* = [ (Ent_max + Ent_min)/2 − σ, (Ent_max + Ent_min)/2 + σ ]   (3)
The Shannon entropy weighted voting algorithm performs weighted fusion of an n-class classification problem using E classifiers, as shown in formula (4):

P(x_k) = [ w_i · p_ij(x_k) ],  i = 1, …, E;  j = 1, …, n   (4)
where x_k is a sample in the data set X and p_En(x_k) represents the posterior probability that the E-th classifier predicts the n-th class; w_i is the fusion weight of the i-th classifier for sample x_k, as shown in formula (5):
w_i = (1/H_i(x_k)) / Σ_{e=1}^{E} (1/H_e(x_k))   (5)
where, for sample x_k, H_i(x_k) represents the Shannon entropy of the i-th classifier over all classification classes and is computed by formula (6); j denotes the j-th classification class.
H_i(x_k) = − Σ_{j=1}^{n} p_ij(x_k) · log p_ij(x_k)   (6)
Summing the columns of formula (4) realizes the weighted fusion of the class posterior probabilities, giving P″(x_k) as shown in formula (7):

P″(x_k) = [ Σ_{i=1}^{E} w_i p_i1(x_k), …, Σ_{i=1}^{E} w_i p_in(x_k) ]   (7)
The column index of the largest element of P″(x_k) is the prediction category for sample x_k, as shown in formula (8):
Class(xk)=argmax(P″(xk)) (8)。
The search for the optimal learner combination Com is shown in FIG. 4. For the combinations belonging to the Ent* range, the Shannon entropy weighted voting accuracy is calculated and recorded as ACC_se. The combination with the largest ACC_se that lies closest to the middle of Ent* is the optimal learner combination, denoted Com; the specific procedure is shown in Algorithm 1.
[Algorithm 1 is provided as an image in the original publication.]
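Algorithm 1's selection rule — restrict to the Ent* range, then prefer the highest ACC_se and, on ties, the point nearest the range midpoint — can be sketched as follows; the combination triples are illustrative.

```python
def select_optimal(combos, sigma):
    """combos: (name, ent, acc_se) triples for all candidate model combinations."""
    ents = [ent for _, ent, _ in combos]
    mid = (max(ents) + min(ents)) / 2          # midpoint of formula (3)
    in_range = [c for c in combos if mid - sigma <= c[1] <= mid + sigma]
    # highest ACC_se first; ties broken by distance to the midpoint
    return min(in_range, key=lambda c: (-c[2], abs(c[1] - mid)))[0]

combos = [("A", 0.05, 0.990), ("B", 0.25, 0.995),
          ("E", 0.17, 0.995), ("C", 0.30, 0.990)]
print(select_optimal(combos, sigma=0.1))  # -> 'E' (tied accuracy, nearer midpoint)
```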
For the measured data set (see FIG. 5), the diversity maximum Ent_max and minimum Ent_min are 0.495 and 0.0078, respectively; the center of the viewing range Ent* is 0.2514, and the range increment σ is set to 0.033. The optimal learner combination for the measured data set is [0, 1, 3, 5], where 0, 1, 3, 5 represent LDA, NB, DT and LR, respectively.
Correspondingly for the WHITED data set (see FIG. 6), Ent_max and Ent_min are 0.0473 and 0.0013, respectively; the center of the diversity viewing range Ent* is 0.0243, and the range increment σ is set to 0.003. The best combination for the WHITED data set is [2, 3, 5, 6], representing the four base classifiers KNN, DT, LR and BPNN.
In the figures, the ordinate is the average accuracy Avg of a learner combination, the abscissa is its diversity Ent, and each dot is a base classifier combination. The Shannon entropy weighted voting accuracy of each combination is calculated at the same time and indicated by a star. The learner combination to be selected should, within the Ent* range, maintain both a high average test-set accuracy and high diversity, while its Shannon entropy weighted voting accuracy is the highest. The optimal selection therefore generally lies near the middle: the triangle marks the combination with the highest Shannon entropy weighted voting accuracy, and the corresponding diamond is the balance point between average accuracy and diversity, i.e., the sought optimal learner combination.
For the self-test dataset, the average accuracy of the four classifiers LDA, NB, DT, LR and the shannon entropy weighting algorithm on the test set was 98.6%, 97.8%, 53.8% and 99.5%, respectively, see fig. 9.
For the white dataset, the average accuracy of the KNN, DT, LR, BPNN four classifiers and the shannon entropy weighting algorithm on the test set was 98.7%, 99.1%, 95.8%, 98.5% and 100%, respectively, see fig. 10.
For both data sets, each classifier selects its optimal feature combination; Table 2 shows the feature combinations of the four classifiers.
TABLE 2 features and number of features extracted by the base classifier on two datasets
[Table 2 is provided as an image in the original publication.]
The test confusion matrix of the algorithm of the present invention on the measured data set is shown in fig. 11, and it can be seen that 7 samples in the category 26 (desktop + laptop + egg boiler + electric hot wind 2) are predicted to be 19 (desktop + egg boiler + electric hot wind 2). 3 samples in class 27 (desktop + laptop + eggbeater + electric blanket 2) are predicted to be class 16 (desktop + laptop + eggbeater). Looking at the 10 samples tested in the four base classifiers, they were all found to be misclassified and thus still misclassified after integration.
Further comparing the Shannon entropy weighted voting algorithm proposed herein with other common weighting algorithms (see Table 4), the results in Table 5 show that the Shannon entropy weighted voting algorithm has higher accuracy, and its robustness exceeds that of the other weighting methods.
TABLE 4 Four common weighting schemes
[Table 4 is provided as an image in the original publication.]
TABLE 5 comparison of Algorithm accuracy for different weights
[Table 5 is provided as an image in the original publication.]
Further, the Shannon entropy weighted integration algorithm proposed by the method is compared with other ensemble learning algorithms commonly used in the current literature, mainly a decision-tree-based AdaBoost algorithm, a random forest, and a Stacking algorithm with an MLP meta-model. The test accuracy on the measured data set and the WHITED public data set is calculated, and the parameters and accuracy results of each algorithm are shown in Table 6. The Shannon entropy weighted voting algorithm provided by the invention shows a significant improvement in test accuracy on both data sets.
TABLE 6 Multi-model accuracy comparison of measured data
[Table 6 is provided as an image in the original publication.]
Through the above principle, an optimal model combination is obtained for the self-test data set. To persist it, the invention saves the model using Joblib.
For model reading and prediction, the invention uses Joblib to load the model in sub-thread 6 and inputs the real-time updated feature sequence into the model to predict the classification result.
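Joblib's dump/load interface, which the invention uses, mirrors the standard-library `pickle` shown in this sketch; the file name and the model dictionary are illustrative assumptions.

```python
import os
import pickle
import tempfile

model = {"algorithm": "shannon_entropy_vote", "base": ["LDA", "NB", "DT", "LR"]}
path = os.path.join(tempfile.gettempdir(), "nilm_model.pkl")

with open(path, "wb") as f:
    pickle.dump(model, f)      # joblib.dump(model, path) in the real system

with open(path, "rb") as f:    # done once when sub-thread 6 starts up
    loaded = pickle.load(f)    # joblib.load(path)

print(loaded == model)  # -> True
```

Loading the persisted model once at sub-thread start-up, rather than per prediction, is what keeps the per-cycle inference time within the 20 ms budget discussed below.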
Table 7 shows the time required by five time-consuming operations. If a single thread is used, the total time consumed is their sum, i.e., 44 ms.
With multithreading, the total time consumed depends mainly on the main thread and equals the sum of the first two operations, i.e., 6 ms. According to Table 8, for the measured data the four algorithms LDA, NB, DT and LR consume little time in total, while the Shannon-entropy-based weighted voting algorithm has the highest recognition accuracy.
The industrial power frequency in many countries is 50 Hz, so one cycle of data takes 0.02 s; the algorithm must therefore run in less than 20 ms, and the 44 ms single-thread pipeline would certainly time out.
TABLE 7 time spent in time consuming operation
[Table 7 is provided as an image in the original publication.]
TABLE 8 time comparison of five algorithms for single thread and multithreading
[Table 8 is provided as an image in the original publication.]
The real-time detection process is shown in FIG. 12: after an event is detected, the steady-state data analysis identifies the appliance as an electric hair drier, and the visualized data are displayed as an active power waveform.
The multithreading-based non-intrusive load real-time online monitoring and identifying system provided by the invention based on the Shannon entropy weighted voting algorithm can meet the requirements of real-time tasks and has higher identifying precision.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. The real-time online monitoring and identification method for the power load is characterized in that a Shannon entropy weighted voting algorithm model is used for identifying the type and the working state of the power load, and the determination process of a base classifier combination in the Shannon entropy weighted voting algorithm model is as follows:
selecting E base classifiers and combining them in groups of m (2 ≤ m ≤ E), i.e.,

Σ_{m=2}^{E} C(E, m)

combinations, to obtain a plurality of different algorithm model combinations; calculating the diversity Ent of the different algorithm model combinations, and determining the optimal value range Ent* of Ent;
for the algorithm model combinations belonging to the Ent* range, calculating the accuracy of the Shannon entropy weighted voting of each combination, recorded as ACC_se;
the combination within Ent* whose ACC_se is largest and whose diversity is closest to the midpoint of Ent* is the optimal learner combination, denoted Com, yielding the trained Shannon entropy weighted voting algorithm model.
2. The method for real-time online monitoring and identification of electrical loads according to claim 1, wherein the ACC_se calculation process is as follows:
1) according to the selected E base classifiers, for the case of n classification classes, performing weighted fusion of the predicted classification probabilities according to formula (4):
P(x_k) = [ w_i · p_ij(x_k) ],  i = 1, …, E;  j = 1, …, n   (4)
in formula (4), p_En(x_k) represents the posterior probability that the E-th base classifier predicts feature data sample x_k as the n-th class; w_i is the fusion weight of the i-th base classifier for sample x_k, calculated as:
w_i = (1/H_i(x_k)) / Σ_{e=1}^{E} (1/H_e(x_k))   (5)
in formula (5), H_i(x_k) represents the Shannon entropy of the i-th base classifier over all classification classes, calculated for feature data sample x_k as:
H_i(x_k) = − Σ_{j=1}^{n} p_ij(x_k) · log p_ij(x_k)   (6)
i represents the ith base classifier; j represents the jth class;
2) accumulating the columns in formula (4) respectively to realize the weighted fusion of the class posterior probabilities, obtaining P″(x_k) as shown in formula (7):
P″(x_k) = [ Σ_{i=1}^{E} w_i p_i1(x_k), …, Σ_{i=1}^{E} w_i p_in(x_k) ]   (7)
3) the column index of the element of P″(x_k) with the largest value is the prediction class Class(x_k) of feature data sample x_k:
Class(xk)=arg max(P″(xk)) (8)
argmax returns the index corresponding to the element with the largest value in the data list; this yields the prediction result for sample x_k. Following the above steps, the prediction results of all feature data samples are obtained;
4) calculating the prediction accuracy ACC_se of each algorithm model combination based on the prediction results and the true results of all training samples:

ACC_se = (TP_se + TN_se) / (TP_se + TN_se + FP_se + FN_se)
TP_se, TN_se, FN_se and FP_se count the prediction results of the se-th combination on the training samples: TP_se (true positive) is the number of samples whose true result is the positive class that are predicted as the positive class; TN_se (true negative) is the number whose true result is the negative class predicted as the negative class; FP_se (false positive) is the number whose true result is the negative class predicted as the positive class; FN_se (false negative) is the number whose true result is the positive class predicted as the negative class; C1 is the number of algorithm model combinations, Σ_{m=2}^{E} C(E, m), obtained by combining the selected E base classifiers.
3. The real-time online electrical load monitoring and identification method according to claim 1, wherein the calculation process of the diversity Ent of an algorithm model combination and its optimal value range Ent* is as follows:
1) calculating the diversity Ent of each algorithm model combination using formula (1):

Ent = (1/K) Σ_{k=1}^{K} min{z(x_k), m − z(x_k)} / (m − ⌈m/2⌉)   (1)
where m is the number of base classifiers in a given algorithm model combination, K is the total number of samples in the feature data set X, and z(x_k) is the number of base learners in the combination whose prediction for feature data sample x_k agrees with the true result;
2) calculating the optimal diversity range Ent* from the maximum and minimum of the diversity Ent according to formula (3):

Ent* = [ (Ent_max + Ent_min)/2 − σ, (Ent_max + Ent_min)/2 + σ ]   (3)
in formula (3), σ is the half-interval range of Ent*; it is an empirical setting taking a value between 0 and 1.
4. The real-time online monitoring and identification method for the electric load according to any one of claims 1 to 3, characterized by comprising the following steps:
s1, receiving an original signal sent by the intelligent electric meter by using the main thread, and analyzing according to a protocol to obtain original data;
s2, storing the original data by the sub thread 1;
s4, analyzing the original data by the main thread, and extracting characteristic data;
s5, storing the characteristic data to the local in real time by the sub-thread 3 to form a characteristic data set;
s6, the sub-thread 4 calls a display function to initialize the oscillogram of the electric appliance to be displayed;
s7, the sub-thread 5 reads the characteristic data in real time and transmits the characteristic data to the display function in the thread 4 to update the characteristic data in the display;
s8, the sub-thread 4 detects whether an event occurs according to the real-time characteristic data; the event means that the change value of the characteristic data is higher than a set change threshold value in a set detection period;
s9, the sub-thread 6 loads the trained Shannon entropy weighting voting algorithm model, reads real-time characteristic data and inputs the real-time characteristic data into the Shannon entropy weighting voting algorithm model to realize the online identification of the type and the working state of the electric appliance.
5. The real-time online monitoring and identification method for electrical loads according to claim 4, wherein after S2, before S4, there are further steps S3: the sub-thread 2 displays the original data in real time through a display device; the display device displays the raw data in a text format.
6. The real-time online monitoring and identification method for the electrical load according to claim 4, wherein the waveforms in step S6 include voltage waveform, current waveform, active power waveform, reactive power waveform, V-I trace and harmonic amplitude.
7. The real-time online monitoring and identification method for the electrical load according to claim 4, wherein the characteristic data in step S4 includes active power (P), reactive power (Q), apparent power (S), power factor (λ), and the (2g-1)-th harmonics, where g is a positive integer.
8. The method for real-time online monitoring and identification of electrical loads according to claim 4, wherein in step S5, the feature data are stored locally to form a feature database, the main thread uses recursive feature elimination for a plurality of basic classification algorithms to select the feature data effective for each basic classification algorithm, and the basic classification algorithms are trained based on the corresponding effective feature data to obtain a plurality of trained basic classification algorithms, which are called as basic classifiers.
9. The real-time online power load monitoring and identifying method according to claim 8, wherein the basic classification algorithm comprises at least 2 of a linear discriminant analysis algorithm, a naive Bayes classification algorithm, a K-nearest neighbor algorithm, a decision tree classification algorithm, a support vector machine, a logistic regression algorithm, and a back-propagation neural network.
10. The method as claimed in claim 4, wherein the sub-thread 4 detects the occurrence of an event by using a bilateral accumulation sum algorithm.
CN202111247143.9A 2021-10-26 2021-10-26 Real-time on-line monitoring and identifying method for power load Active CN113947159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111247143.9A CN113947159B (en) 2021-10-26 2021-10-26 Real-time on-line monitoring and identifying method for power load


Publications (2)

Publication Number Publication Date
CN113947159A true CN113947159A (en) 2022-01-18
CN113947159B CN113947159B (en) 2023-09-05

Family

ID=79332415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111247143.9A Active CN113947159B (en) 2021-10-26 2021-10-26 Real-time on-line monitoring and identifying method for power load

Country Status (1)

Country Link
CN (1) CN113947159B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117992856A (en) * 2024-04-03 2024-05-07 国网山东省电力公司营销服务中心(计量中心) User electricity behavior analysis method, system, device, medium and program product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2004202199A1 (en) * 1999-04-30 2004-06-17 E. I. Du Pont De Nemours And Company Distributed hierarchical evolutionary modeling and visualization of empirical data
US20060059112A1 (en) * 2004-08-25 2006-03-16 Jie Cheng Machine learning with robust estimation, bayesian classification and model stacking
US8560134B1 (en) * 2010-09-10 2013-10-15 Kwangduk Douglas Lee System and method for electric load recognition from centrally monitored power signal and its application to home energy management
CN109492667A (en) * 2018-10-08 2019-03-19 国网天津市电力公司电力科学研究院 A kind of feature selecting discrimination method for non-intrusive electrical load monitoring
CN111428755A (en) * 2020-02-29 2020-07-17 国网(苏州)城市能源研究院有限责任公司 Non-invasive load monitoring method
AU2020104000A4 (en) * 2020-12-10 2021-02-18 Guangxi University Short-term Load Forecasting Method Based on TCN and IPSO-LSSVM Combined Model

Also Published As

Publication number Publication date
CN113947159B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
Himeur et al. Robust event-based non-intrusive appliance recognition using multi-scale wavelet packet tree and ensemble bagging tree
Kang et al. Household appliance classification using lower odd-numbered harmonics and the bagging decision tree
US9104189B2 (en) Methods and apparatuses for monitoring energy consumption and related operations
Saini et al. Classification of power quality events–a review
Wang et al. Sample efficient home power anomaly detection in real time using semi-supervised learning
CN109145949A (en) Non-intrusive electrical load monitoring and decomposition method and system based on integrated study
CN111830347B (en) Two-stage non-invasive load monitoring method based on event
Ridi et al. ACS-F2—A new database of appliance consumption signatures
CN111639586B (en) Non-invasive load identification model construction method, load identification method and system
Chahine et al. Electric load disaggregation in smart metering using a novel feature extraction method and supervised classification
Botev et al. Detecting non-technical energy losses through structural periodic patterns in AMI data
US20220120586A1 (en) System and method for identifying appliances under recall
CN109359665A (en) A kind of family's electric load recognition methods and device based on support vector machines
Anderson Non-intrusive load monitoring: Disaggregation of energy by unsupervised power consumption clustering
CN114970633B (en) LSTM-based non-invasive electrical appliance identification method, system and equipment
Bilski et al. Generalized algorithm for the non-intrusive identification of electrical appliances in the household
Rehman et al. Comparative evaluation of machine learning models and input feature space for non-intrusive load monitoring
Park et al. Appliance identification algorithm for a non-intrusive home energy monitor using cogent confabulation
Tongta et al. Long short-term memory (LSTM) neural networks applied to energy disaggregation
CN113947159A (en) Real-time online monitoring and identification method for electrical load
Makonin Approaches to non-intrusive load monitoring (NILM) in the home
CN113193654A (en) Event-driven non-intrusive power load monitoring method based on transient and steady state combination characteristics
Paul et al. Residential appliance identification using 1-D convolutional neural network based on multiscale sinusoidal initializers
Figueiredo Contributions to electrical energy disaggregation in a smart home
Liu et al. Detection of electric bicycle indoor charging for electrical safety: A NILM approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant