CN109387712B

CN109387712B - Non-invasive load detection and decomposition method based on state matrix decision tree

Info

Publication number: CN109387712B
Application number: CN201811170715.6A
Authority: CN
Inventors: 苏鹭梅; 郑锐洁; 郑小龙; 朱文婷; 张宝琼; 邓冠森
Original assignee: Xiamen University of Technology
Current assignee: Xiamen University of Technology
Priority date: 2018-10-09
Filing date: 2018-10-09
Publication date: 2021-04-13
Anticipated expiration: 2038-10-09
Also published as: CN109387712A

Abstract

The invention relates to a non-intrusive load detection and decomposition method based on a state matrix decision tree, comprising the following steps: S1, preprocessing the sample data, including data cleaning, data integration and data reduction, to obtain valid sample data; S2, determining the sample period of the data by spectrum analysis; S3, selecting load features using a sequential forward feature selection algorithm and a K-means clustering algorithm, and extracting highly discriminative load features with a time-series feature selection algorithm according to the sample period; S4, building a model that automatically identifies the working state of a single device, based on an improved sliding window bilateral CUSUM event detection algorithm and decision-tree load identification and decomposition, then introducing the state matrix decision tree and building a probability model of the load time-series features, thereby automatically identifying the working states of superimposed devices. The method offers high identification efficiency and good practicability.

Description

Non-invasive load detection and decomposition method based on state matrix decision tree

Technical Field

The invention relates to the field of electric power big data, in particular to a non-invasive load detection and decomposition method based on a state matrix decision tree.

Background

In recent years, automatic power load monitoring and decomposition methods based on measurement and sensing technology have shown clear advantages over manual surveys and have therefore attracted wide attention. They are implemented in two main ways.

The first equips each electric device inside the total load with a sensor that has a digital communication function and collects the electricity consumption information of each device over a communication network; this is called intrusive load monitoring (ILM). The second installs a sensor at the user's service entrance and, by collecting and analyzing the user's total power or total current, monitors the power consumption and operating state of each device or each class of device, and so learns how much electricity each device or class of device in the user's home consumes and in what pattern; this is called non-intrusive load monitoring and decomposition (NILMD).

Electricity consumption analysis and metering based on NILMD takes the electricity consumption information of specific indoor electric devices as the monitoring target. The information obtained is important for optimizing the planning, operation and management of the power company's grid, for reducing users' electricity consumption and electricity bills, and for turning society-wide awareness of ecological civilization into concrete action. Compared with intrusive detection using built-in sensors, non-intrusive residential power load detection and decomposition is currently the most popular low-cost technology for detecting detailed load power consumption.

Existing non-intrusive residential power load detection and decomposition methods are not efficient enough at load identification, and their algorithms are relatively complex.

Disclosure of Invention

The present invention therefore provides a non-intrusive load detection and decomposition method based on a state matrix decision tree to solve the above technical problems. To this end, the invention adopts the following specific scheme:

The non-intrusive load detection and decomposition method based on the state matrix decision tree comprises the following steps:

S1, preprocessing the sample data, including data cleaning, data integration and data reduction, to obtain valid sample data;

S2, determining the sample period of the data by spectrum analysis;

S3, selecting load features using a sequential forward feature selection algorithm and a K-means clustering algorithm, and extracting highly discriminative load features with a time-series feature selection algorithm according to the sample period;

S4, building a model that automatically identifies the working state of a single device, based on the improved sliding window bilateral CUSUM event detection algorithm and decision-tree load identification and decomposition, then introducing the state matrix decision tree and building a probability model of the load time-series features, thereby automatically identifying the working states of superimposed devices.

Further, the data cleaning method is the Grubbs method: the deviation of the sample data from the mean is calculated to determine a "suspect value", the G_i value is calculated, and G_i is compared with the critical value G_P(n) given in the Grubbs table; if G_i is greater than the critical value G_P(n) in the table, the sample is judged to be abnormal.

Further, the data integration method is the correlation coefficient method: a correlation coefficient is obtained from the standard deviations and covariance of the cleaned sample data, and the strength of the relationship between two variables is judged from its value. The correlation coefficient ranges from -1 to 1, where 1 indicates a perfect positive linear correlation between the two variables, -1 indicates a perfect negative linear correlation, and 0 indicates no linear correlation; the closer the value is to 0, the weaker the correlation.

Further, the data reduction method is regression analysis: based on the degree of association among the parameters obtained by data integration, the relationships between variables are refined and fixed, irrelevant variables are removed, the dimensionality of the data samples to be analyzed is reduced, and a reliable model is derived.

Further, the specific process of step S2 is:

The selected feature values are screened and grouped at fixed intervals according to a specific period. The grouping method is to apply a Fourier transform to the time-series feature quantity to obtain an amplitude spectrum, find the dominant frequency component, and take its reciprocal as the period.

Further, the specific process of step S3 is:

S31, determining the optimal feature subset with the sequential forward feature selection algorithm. Suppose k features have already been selected, forming a feature set X_k of size k. The d-k unselected features x_j, j = 1, 2, 3, ..., d-k, are ranked by the value of the criterion J obtained when each is combined with the features already selected. The algorithm starts with an empty feature set and, in each subsequent cycle, selects the best remaining feature from the original feature set and adds it to the set, until the number of features reaches m;

S32, evaluating how well the features separate samples of different classes with the K-means clustering algorithm. Geometrically, the greater the separability between classes, the larger the inter-class distance and the farther apart samples of different classes lie, while the smaller the intra-class distance, the tighter the clustering within a class. Given a sample set, the K-means algorithm divides it into K clusters, each cluster center being the mean of the samples in that cluster. The remaining objects are assigned to the nearest cluster according to their distances to each cluster, the center of each new cluster is then recomputed, and this iterative relocation is repeated so that the sum of the distances between the samples and the center of their cluster decreases until the objective function is minimized, whereby the optimal features are selected;

S33, calculating the operating feature values of the electric devices: invalid periods in the sample data are eliminated, 15 usable periods of data are selected as sample data, the feature values of these 15 periods are calculated, and the feature values are then classified to extract the features with the highest discriminative power.

Further, the specific process of step S4 is:

S41, dividing the devices in the sample data into three attribute levels, high, medium and low, according to the maximum value of the active power of each device;

S42, using the C4.5 decision tree classification algorithm, in which each load is treated as one class, i.e. a leaf node of the decision tree: attribute values are compared at the internal nodes of the tree in a top-down recursive manner, and the branch taken below each node is decided by the attribute value, until each class contains only a single result, i.e. the leaves are pure; load identification is performed with the optimal load feature parameters obtained, and the device to which the current power data corresponds is determined;

S43, introducing the improved sliding window bilateral CUSUM event detection algorithm to identify the steady-state and transient features of active power: an event detection routine continuously tracks the state change of each device at every sampling point and detects whether any load changes state anywhere in the time series, which identifies the load within the time series and determines the operating time of the device at the current moment; load decomposition is then performed to determine which state of which device the data at the current moment corresponds to;

S44, establishing a device power state matrix from the transient and steady-state features of active power: the steady-state and transient features of the device state power are averaged over the training samples and the standard deviation is taken as the fluctuation level; the state matrix decision tree is introduced and a probability model of the load time-series features is built, so that the optimal solution for the devices currently running in superposition is found and automatic identification of the devices is finally achieved.

Further, the state changes of a load in step S43 include switching the load on and off, switching between gears, and changes of operating state.

By adopting the technical scheme, the invention has the beneficial effects that: the method of the invention can improve the load identification efficiency and has better practicability.

Drawings

To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures. Elements in the figures are not drawn to scale and like reference numerals are generally used to indicate like elements.

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a flow chart of feature selection in the method of the present invention;

FIG. 3 is a flow chart of single device identification in the method of the present invention;

FIG. 4 is a flow chart of superimposed device identification in the method of the present invention;

FIG. 5 is a flowchart of a sliding window bilateral CUSUM event detection method in the method of the present invention;

FIG. 6 is a schematic diagram of four stages of event detection in the CUSUM event detection method of FIG. 5;

FIG. 7 is a schematic diagram of power spectrum analysis of a computer.

Detailed Description

The invention will now be further described with reference to the accompanying drawings and detailed description.

Referring to fig. 1, a general flow of the method of an embodiment of the present invention is described. The method mainly comprises the following steps: sample data preprocessing S1, sample data period determination S2, load feature selection and extraction S3 and load identification and decomposition S4. Each step is described in detail below.

Data pre-processing

Data preprocessing is divided into 3 steps: (1) data cleaning: correcting recognizable errors in the data file, handling invalid values, missing values and abnormal values, and checking data consistency; (2) data integration: analyzing the correlation among the data variables; (3) data reduction: reducing the number of variables through dimensionality reduction.

(1) Data cleaning: the Grubbs method is used here to identify the "suspect value", which is removed from the data sample and takes no part in the calculation of the mean.

Step 1: determine the "suspect value". Of the maximum and minimum values, the one that deviates more from the mean is the suspect value.

Step 2: calculate the G_i value:

$$G_i = \frac{|x_i - \bar{x}|}{s}$$

(where i is the rank number of the suspect value, $x_i - \bar{x}$ is the residual, and s is the standard deviation)

Step 3: compare G_i with the critical value G_P(n) given in the Grubbs table; if G_i is greater than the critical value G_P(n) in the table, the measured value is judged to be abnormal and can be eliminated.

(2) Data integration: since a large amount of electric device data is available and some parameters may be highly correlated, the correlation coefficient method is used to reflect the degree of relationship among the variables.

The correlation coefficient is calculated as:

$$r_{xy} = \frac{S_{xy}}{S_x S_y}$$

The sample covariance S_xy is calculated as:

$$S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

The sample standard deviation S_x is calculated as:

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The sample standard deviation S_y is calculated as:

$$S_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

where r_xy denotes the sample correlation coefficient, S_xy the sample covariance, S_x the sample standard deviation of x, and S_y the sample standard deviation of y. The correspondence between r_xy and the degree of correlation is shown in Table 1:

TABLE 1 Reference table of the degree of correlation for the correlation coefficient r_xy

The correlation coefficient takes values between -1 and 1. A value of 1 indicates a perfect positive linear correlation between the two variables, -1 indicates a perfect negative linear correlation, and 0 indicates no linear correlation. The closer the value is to 0, the weaker the correlation.

(3) Data reduction: because the number of data samples to be analyzed is huge, dimensionality reduction is needed; based on the degree of association between parameters obtained by data integration, the relationships between variables are refined and fixed so as to derive a reliable model. Regression analysis is therefore used to remove irrelevant variables and so reduce the dimensionality of the data samples. Taking the current, active power, reactive power, power factor and second-harmonic current of a laser printer and a notebook computer, both devices with continuously varying states, as examples, the results of the regression analysis performed in MATLAB 2016a are shown in Tables 2 and 3:

TABLE 2 correlation between laser printer parameters

TABLE 3 correlation between computer parameters
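
The cleaning and integration steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function names `grubbs_clean` and `sample_correlation` are hypothetical, the critical value G_P(n) is supplied by the caller instead of being read from a real Grubbs table, and repeating the test until no suspect value remains is an assumption (the text describes a single test).

```python
import math
import statistics


def grubbs_clean(samples, critical_value):
    """Drop the suspect value while G_i exceeds the critical value G_P(n).

    `critical_value` is a caller-supplied function n -> G_P(n) that stands in
    for the Grubbs table lookup (hypothetical interface).
    """
    data = list(samples)
    removed = []
    while len(data) > 2:
        mean = statistics.mean(data)
        s = statistics.stdev(data)  # sample standard deviation
        if s == 0:
            break
        # Suspect value: the sample that deviates most from the mean.
        suspect = max(data, key=lambda x: abs(x - mean))
        g = abs(suspect - mean) / s
        if g <= critical_value(len(data)):
            break
        data.remove(suspect)
        removed.append(suspect)
    return data, removed


def sample_correlation(x, y):
    """Pearson coefficient r_xy = S_xy / (S_x * S_y), normalised by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return s_xy / (s_x * s_y)


# Made-up readings with one obvious outlier; the constant 2.0 is a
# placeholder threshold, not a value taken from the Grubbs table.
readings = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 50.0]
kept, removed = grubbs_clean(readings, critical_value=lambda n: 2.0)
r = sample_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

In practice `critical_value` would look up G_P(n) for the chosen confidence level, and pairs of variables whose coefficient is close to 1 or -1 would be candidates for removal in the data reduction step.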

Determining a sample period

The selected feature values are screened and grouped at fixed intervals according to a specific period. The grouping method is to apply a Fourier transform to the time-series feature quantity to obtain an amplitude spectrum, find the dominant frequency component, and take its reciprocal as the period.

The Fourier transform of the periodic discrete-time signal x(nT) can be expressed as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi}{N}kn}, \qquad k = 0, 1, \dots, N-1$$

where x(n), n = 0, 1, …, N-1, is the finite-length discrete signal.

Fig. 7 shows the spectrum analysis for a computer. The period estimated from the raw data is about 400 s, and the frequency of the second-highest amplitude obtained with the algorithm is about 0.0025 Hz, i.e. a period of 1/0.0025 Hz = 400 s, which is consistent. The frequency of the highest amplitude is not used because the data are not strictly periodic, so the highest amplitude occurs near zero frequency, and the frequency of the second-highest amplitude is closer to the data period.
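
As a rough illustration of the period estimate, the sketch below computes the amplitude spectrum with a direct DFT and takes the reciprocal of the strongest non-zero frequency. Skipping the zero-frequency bin is one possible way to mirror the choice of the second-highest amplitude described above; it is an assumption, as are the function names and the synthetic test signal (a 400 s oscillation on a constant offset).

```python
import cmath
import math


def dft_amplitudes(x):
    """Amplitude spectrum |X(k)| for k = 0 .. N/2, by direct summation."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def estimate_period(x, dt):
    """Period in seconds of the strongest spectral component, ignoring k = 0."""
    n = len(x)
    amps = dft_amplitudes(x)
    # Bin 0 carries the mean level of the signal, so it is skipped.
    k_best = max(range(1, len(amps)), key=lambda k: amps[k])
    # Frequency of bin k is k / (n * dt); the period is its reciprocal.
    return n * dt / k_best


# Synthetic signal: 200 samples taken every 10 s, oscillating every 400 s.
dt = 10.0
signal = [100.0 + 20.0 * math.sin(2 * math.pi * (i * dt) / 400.0) for i in range(200)]
period = estimate_period(signal, dt)
```

The direct summation costs O(N²) and is used here only to keep the sketch dependency-free; an FFT routine would normally replace it for long records.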

Feature selection

The load features of electric devices fall mainly into steady-state features and transient features. A steady-state feature is one extracted while the load is at a stable power consumption level; a transient feature is an operating feature extracted at the instant the load is switched on, switched off, or switched between states. The feature selection process is shown in Fig. 2:

(1) The optimal feature subset is determined with the sequential forward feature selection algorithm. Suppose k features have been selected, forming a feature set X_k of size k. The d-k unselected features x_j, j = 1, 2, 3, ..., d-k, are ranked by the value of the criterion J obtained when each is combined with the features already selected. That is, if

$$J(X_k + x_1) \ge J(X_k + x_2) \ge \dots \ge J(X_k + x_{d-k}) \qquad (6)$$

then the feature set selected in the next step is

$$X_{k+1} = X_k + x_1 \qquad (7)$$

The sequential forward feature selection algorithm starts with an empty feature set and, in each subsequent cycle, selects the best remaining feature from the original feature set and adds it to the set, until the number of features reaches m.

(2) The K-means clustering algorithm is used to evaluate how well the features separate samples of different classes. Geometrically, the greater the separability between classes, the larger the inter-class distance and the farther apart samples of different classes lie, while the smaller the intra-class distance, the tighter the clustering within a class.

Given a sample set, the K-means algorithm divides it into K clusters, each cluster center being the mean of the samples in that cluster. The remaining objects are assigned to the nearest cluster according to their distances to each cluster, and the center of each new cluster is then recomputed. This iterative relocation is repeated so that the sum of the distances between the samples and the center of their cluster decreases until the objective function is minimized.
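
The two steps above can be combined in a short sketch: a greedy forward search over features, scored by how well K-means separates the samples on the candidate subset. Everything named here is an assumption made for illustration: the function names, the naive initialisation from the first K points, the particular score (smallest inter-cluster distance divided by mean intra-cluster distance), and the made-up measurements for two devices. The text does not specify the evaluation function J.

```python
import math


def sequential_forward_selection(features, criterion, m):
    """Greedily grow a feature subset until it holds m features.

    `criterion` plays the role of J: it maps a list of feature names to a
    score, higher meaning better.
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < m:
        # Rank the unselected features by J(X_k + x_j) and keep the best one.
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iteration; returns (centers, labels)."""
    centers = [tuple(p) for p in points[:k]]  # naive initialisation (assumption)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


def separation(points, k):
    """Smallest inter-cluster distance over mean intra-cluster distance (k >= 2)."""
    centers, labels = kmeans(points, k)
    intra = sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)
    inter = min(math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return inter / intra if intra > 0 else float("inf")


# Made-up measurements from two devices: active power P and reactive power Q.
data = {"P": [100.0, 2000.0, 110.0, 1990.0], "Q": [5.0, 6.0, 5.5, 6.5]}


def score(subset):
    rows = [tuple(data[f][i] for f in subset) for i in range(4)]
    return separation(rows, 2)


best_single = sequential_forward_selection(["Q", "P"], score, 1)
```

Because the score depends on raw distances, features on very different scales would normally be normalised before being compared; that step is omitted here to keep the sketch short.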

Feature extraction

Since the operating characteristics of some electric devices (e.g., microwave ovens) are more complex than those of others and power features cannot serve as their identification features, load features with higher discriminative power are sought and extracted from the operating sample periods of these devices, in order to overcome the limitation of load identification methods based on power variation. Specifically, 15 usable periods of data are selected as sample data, their feature values are calculated, and the features with the highest discriminative power are selected by classifying the feature values.

Load identification and decomposition

Fig. 3 and 4 show the identification flows for a single device and for superimposed devices, respectively. To identify a single device, the electric devices are first divided into three attribute levels, high, medium and low, according to the maximum value of their active power PC; load identification is then performed with the C4.5 decision tree classification algorithm to determine which device the current power data belongs to. The improved sliding window bilateral CUSUM event detection algorithm is then introduced to identify the steady-state and transient features of active power, determine whether a device is switched on or off at that moment, and obtain the device state over the time period.

Identifying superimposed devices is more complicated than identifying a single device and requires a power state matrix of the electric devices. First, load identification is performed with the decision tree classification algorithm to determine which devices are superimposed in the current data; second, the steady-state and transient features of active power are identified with the improved sliding window bilateral CUSUM event detection algorithm to determine the operating time of the device group at the current moment; then load decomposition is performed to determine which state of which device the data at the current moment corresponds to; finally, the optimal solution of the state matrix decision tree is searched to obtain the real-time power consumption of each device.

Because device identification must first detect when a load is switched on or off, an event-detection-based method is introduced: an event detection routine continuously tracks the state change of each device at every sampling point. The method identifies loads within the time series by detecting whether any load changes state anywhere in the series.

Following the feature selection and feature extraction described above, event detection in the overall mathematical model is carried out on the active power PC in the time series. The classic literature on NILMD systems (Quinlan J R. C4.5: Programs for Machine Learning [J], 1993.) uses a segmentation detection method that divides the time series into steady-state and transient features according to the change in experimentally measured active power.

Device identification falls into two categories: identifying the state of a single device, and identifying and decomposing the superimposed state of several devices. The loads are classified according to their working characteristics, suitable features are selected, and the loads are identified and decomposed with the C4.5 decision tree algorithm.

Improved sliding window bilateral CUSUM event detection algorithm

Consider an active power time series.

Two consecutive sliding windows, W_s (the steady-state mean window) and W_u (the transient mean window), are defined on the time series, with lengths s and u respectively. The mean values A_s and A_u are the averages of the active power samples inside W_s and W_u, respectively.

Two cumulative sums are then defined, one for detecting whether a load is switched on at the current moment (the power increases) and one for detecting whether it is switched off (the power decreases), together with a fluctuation level ε that characterizes the time series in the steady state.

Taking the detection of a switch-on event as an example, the flow of the sliding window bilateral CUSUM event detection method is as follows. When the mean A_u of the transient detection window becomes greater than A_s + ε, the switch-on cumulative sum starts to increase. A threshold K for deciding that an event has occurred must also be set.

To avoid detecting the same switch-on or switch-off event several times because of oscillation in the sequence, a time delay factor d (initial value 0) is introduced. Each time the delay factor is increased by 1, the cumulative sum at that moment is checked. If the check shows that the active power change at that moment was caused by fluctuation, d is reset to 0, which avoids repeated detections caused by fluctuation in the device data. Otherwise d is set to d + 1 and the cumulative sum is recalculated, and this continues until the event condition defined by the threshold K is met.

The time at which the detected event occurred is then given by t - d. The detection flow for a load switch-on event is shown in Fig. 5; the flow for detecting a switch-off event is obtained in the same way.
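
The detection loop can be sketched as follows. Because the update formulas are not reproduced in this text, the sketch falls back on the usual two-sided CUSUM recursion, g⁺ ← max(0, g⁺ + A_u − (A_s + ε)) and g⁻ ← max(0, g⁻ + (A_s − ε) − A_u); this recursion, the way d is counted, the hold-off of s + u samples after each detection, and the name `detect_events` are all assumptions rather than the patented formulation.

```python
def detect_events(power, s, u, eps, K):
    """Return a list of (event_index, direction); +1 is switch-on, -1 is switch-off.

    s, u : lengths of the steady-state window W_s and transient window W_u
    eps  : steady-state fluctuation level
    K    : threshold on the cumulative sum that declares an event
    """
    events = []
    g_pos = g_neg = 0.0  # cumulative sums for power increase / decrease
    d_pos = d_neg = 0    # delay counters: steps since accumulation started
    t = s + u - 1        # index of the newest sample in the transient window
    while t < len(power):
        a_s = sum(power[t - u - s + 1 : t - u + 1]) / s  # mean of W_s
        a_u = sum(power[t - u + 1 : t + 1]) / u          # mean of W_u
        prev_pos, prev_neg = g_pos, g_neg
        g_pos = max(0.0, g_pos + a_u - (a_s + eps))
        g_neg = max(0.0, g_neg + (a_s - eps) - a_u)
        # d grows while a sum keeps accumulating and resets when it returns to 0.
        d_pos = d_pos + 1 if (g_pos > 0 and prev_pos > 0) else 0
        d_neg = d_neg + 1 if (g_neg > 0 and prev_neg > 0) else 0
        if g_pos > K or g_neg > K:
            direction = 1 if g_pos > K else -1
            event_time = t - (d_pos if direction == 1 else d_neg)  # t - d
            events.append((event_time, direction))
            g_pos = g_neg = 0.0
            d_pos = d_neg = 0
            # Hold-off (assumption): resume once both windows have passed the event.
            t = max(t + 1, event_time + s + u)
            continue
        t += 1
    return events


# Synthetic trace: a 100 W load switched on at sample 20 and off at sample 40.
power = [0.0] * 20 + [100.0] * 20 + [0.0] * 20
events = detect_events(power, s=5, u=3, eps=2.0, K=50.0)
```

One consequence of the hold-off is that two events closer together than s + u samples are reported as one; shorter windows reduce that blind interval at the cost of noisier means.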

When the sliding windows of the sliding window bilateral CUSUM event detection program pass over the time at which an event occurs, the process can be divided into 4 stages, as shown in Fig. 6, where P₀ is the active power before the event and ΔP is the difference between the active power after the event and P₀.

a. In the first stage the transient detection window has not yet reached the event, and the values of both windows remain unchanged, i.e. A_u - A_s = 0;

b. In the second stage the event time lies within the transient detection window; A_u changes continuously while A_s does not. Let P₁ = P₀ + ΔP and t_d = t - t₁, with t_d ∈ (1, u). At each moment of this stage, A_s and A_u are

$$A_s = P_0, \qquad A_u = P_0 + \frac{t_d}{u}\,\Delta P$$

c. In the third stage the event time lies within the steady-state mean window; A_u does not change while A_s changes continuously, with (t_d - u) ∈ (1, s-1). At each moment of this stage, A_s and A_u are

$$A_s = P_0 + \frac{t_d - u}{s}\,\Delta P, \qquad A_u = P_1$$

d. In the fourth stage both windows have slid past the event, and A_s and A_u no longer change.

The above calculation and analysis of the threshold K assume devices that switch on instantaneously, but many residential electric devices, such as microwave ovens and printers, do not. To reduce the error rate of event identification, a compromise is adopted: the threshold for deciding that an event has occurred is derived from the maximum and minimum threshold values, K_max and K_min.

From the above derivation, K can be determined once A_s, A_u and the minimum power of the device to be identified are known. The threshold K for deciding that an event has occurred is then:

$$K = \frac{K_{\max} + K_{\min}}{2} \qquad (12)$$

Based on the sliding window bilateral CUSUM event detection method, the active power sequence can be cut into segments at the event detection points. Using the steady-state active power and the steady-state fluctuation level ε, the mean value m of each segment is extracted from its statistical features, each event detection point is paired with the next detection point, and the overall operating time and operating state of a single device are then determined.

Mathematical model based on state matrix decision tree

For data in the superimposed state the load must be identified and decomposed. A device power state matrix is established from the transient and steady-state features of active power: the steady-state and transient features of each device state's power are averaged over the training samples, and their standard deviation is taken as the fluctuation level δ.
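
A minimal sketch of the state matrix idea follows. It stores a mean and a fluctuation level δ for every device state, then decomposes an aggregate reading by exhaustively trying one state per device (including an implicit "off" state at 0 W) and keeping the combination whose summed mean power is closest to the reading. The exhaustive search, the "off" state, the function names and the training numbers are all assumptions made for illustration; the text does not spell out how the optimal solution of the state matrix decision tree is searched.

```python
import itertools
import statistics


def build_state_matrix(training):
    """Map {device: {state: [power samples]}} to {device: {state: (mean, delta)}}."""
    return {
        device: {
            state: (statistics.mean(values), statistics.stdev(values))
            for state, values in states.items()
        }
        for device, states in training.items()
    }


def decompose(total_power, matrix):
    """Pick one state per device so that the summed mean power best matches the total."""
    devices = sorted(matrix)
    options = [
        [("off", 0.0)] + [(state, mean) for state, (mean, _) in sorted(matrix[dev].items())]
        for dev in devices
    ]
    best = min(
        itertools.product(*options),
        key=lambda combo: abs(total_power - sum(mean for _, mean in combo)),
    )
    residual = abs(total_power - sum(mean for _, mean in best))
    return {dev: state for dev, (state, _) in zip(devices, best)}, residual


# Made-up training samples (watts) for two hypothetical devices.
training = {
    "kettle": {"on": [1790.0, 1800.0, 1810.0]},
    "microwave": {"low": [595.0, 600.0, 605.0], "high": [1195.0, 1200.0, 1205.0]},
}
matrix = build_state_matrix(training)
assignment, residual = decompose(2405.0, matrix)
```

The stored δ values are not used in this search; a natural refinement would be to accept a combination only when the residual is within the combined fluctuation level of the chosen states. The search itself grows exponentially with the number of devices, so it suits only small device sets.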

C4.5 decision tree classification algorithm

The decision tree is a classification algorithm from the field of data mining that expresses a mapping between object attributes and object values. Each node in the tree represents an object, each branch represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf. The decision tree classification method works top-down and recursively: attribute values are compared at the internal nodes of the tree, the branch taken below each node is decided by the attribute value, and the conclusion is reached at a leaf node. When a decision tree is applied to load identification, each load can be treated as a class, equivalent to a leaf node of the tree, and classification proceeds through the tree until each class contains only a single result, i.e. the leaf node is pure. The attributes in the decision tree, i.e. the load feature parameters, are the basis for choosing among the downward branches.

Two features are introduced here. The first is the active power feature, i.e. the change in active power when a device transitions to another operating state. The second is the harmonic feature: harmonic data carry characteristics unique to different kinds of appliances and are especially pronounced when a device is switched on or off, so they can be used directly to detect devices being switched on or off. Some states of devices with continuously varying states cannot be distinguished simply by differences in active power, such as printing versus copying on a printer, or restarting versus starting up a notebook computer. Combining the active power and harmonic features identifies the load state more effectively.

The C4.5 decision tree classification algorithm is a supervised classification learning algorithm. Let the sample set be PC, and let the proportion of class-k samples in the sample set be P_k (k = 1, 2, ..., a), where a is the total number of classes in the sample set. The information entropy of the sample set is defined as:

$$\mathrm{Ent}(PC) = -\sum_{k=1}^{a} P_k \log_2 P_k$$

Suppose the sample set is divided according to attribute B. If B has X possible values, X branch nodes are generated, where the x-th (x = 1, 2, ..., X) branch node contains all samples in the sample set whose value on attribute B is B^x, denoted C^x. The information gain obtained by dividing the sample set on attribute B can be defined as:

$$\mathrm{Gain}(PC, B) = \mathrm{Ent}(PC) - \sum_{x=1}^{X} \frac{|C^x|}{|PC|}\,\mathrm{Ent}(C^x)$$

Further, the information gain ratio of attribute B is:

$$\mathrm{Gain\_ratio}(PC, B) = \frac{\mathrm{Gain}(PC, B)}{\mathrm{IV}(B)}, \qquad \mathrm{IV}(B) = -\sum_{x=1}^{X} \frac{|C^x|}{|PC|}\log_2\frac{|C^x|}{|PC|}$$

The gain ratios of the different attributes are calculated with this formula and the attribute with the largest gain ratio is chosen as the splitting attribute for the current split. The gain ratios of the remaining attributes are then calculated in the same way and splitting continues until all device states are distinguished, or until all samples take the same values on all attributes and no further split is possible.
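
The split criterion can be computed directly from its definition. The sketch below evaluates the entropy, information gain and gain ratio for one discrete attribute; the function names and the four-sample data set (an active power level attribute and a hypothetical harmonic flag) are invented for illustration, and a full C4.5 implementation would add recursion, handling of continuous attributes and pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio obtained by splitting `labels` on the attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after it.
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Intrinsic value of the attribute, which penalises many-valued attributes.
    iv = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / iv if iv > 0 else 0.0


# Made-up samples: two attributes describing four observations of two devices.
devices = ["kettle", "kettle", "lamp", "lamp"]
power_level = ["high", "high", "low", "low"]  # separates the devices perfectly
harmonic_flag = ["yes", "no", "yes", "no"]    # carries no information here

ratios = {
    "power_level": gain_ratio(power_level, devices),
    "harmonic_flag": gain_ratio(harmonic_flag, devices),
}
split_attribute = max(ratios, key=ratios.get)
```

Here the active power level would be chosen as the first splitting attribute because it yields pure leaves, which matches the stopping rule described above.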

Experimental testing and results analysis

Analysis of K-means clustering feature selection results based on the sequential feature selection algorithm

Matlab 2016a was used to perform K-means cluster analysis on electrical equipment (incandescent lamps, electric kettles, fans, water dispensers, hair dryers, laser printers, microwave ovens, and the like). The clustering results show that when the step change in active power, or in active and reactive power together, is used as the characteristic quantity, high-power loads with distinct power consumption characteristics, such as electric kettles, hair dryers, fans, and water dispensers, are easy to identify. However, power change alone clearly cannot serve as the identifying feature for all electrical devices. For example, when a multi-state device such as a microwave oven switches among its low, medium, and high heat settings, its active or reactive power does not undergo a simple step change; instead, the power passes through several intermediate values and settles at the level of the corresponding setting only after a period of time. The K-means clustering result for the microwave oven is therefore not ideal.
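
To illustrate why simple step changes cluster well, the following Python sketch runs plain K-means on (active power step, reactive power step) pairs. It is not the Matlab analysis reported above: the function name `kmeans`, the fixed initial centroids, and all numeric values are hypothetical and are not measured data.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain K-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids

# Hypothetical switch-on steps (delta P in W, delta Q in var) for three devices.
steps = [(40, 2), (42, 1), (38, 3),
         (1000, 60), (1010, 62), (990, 58),
         (1800, 5), (1810, 8), (1790, 4)]
assignment, centroids = kmeans(steps, [steps[0], steps[3], steps[6]])
```

Devices whose switching events form tight, well-separated groups like these are recovered cleanly. A device whose power wanders through several intermediate values before settling would scatter its points across clusters, which is the difficulty noted for the microwave oven.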

Analysis of feature extraction results based on time-domain feature selection

A feature extraction method based on time-domain statistical features is adopted. After comparing various time-domain statistical features such as the mean, variance, and skewness, the ratio R of the low value to the high value of the microwave oven's power within one operating period is selected as the operating feature of the microwave oven. The state of the microwave oven is judged by observing how R changes within one period: if R increases, the microwave oven is in a lower gear; if R decreases, it is in a higher gear.
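
One possible reading of the feature R can be sketched in Python as follows. The text does not state how the low and high values are measured, so this sketch assumes that R is the number of low-power samples divided by the number of high-power samples within one operating period; the function name `low_high_ratio`, the threshold, and the sample values are all hypothetical.

```python
def low_high_ratio(power, threshold):
    """Ratio R of low-power samples to high-power samples in one period.

    Assumption: a sample counts as "high" when it is at or above `threshold`.
    """
    high = sum(1 for p in power if p >= threshold)
    low = len(power) - high
    if high == 0:
        raise ValueError("no high-power samples in this period")
    return low / high

# Hypothetical one-period power traces (W), ten samples each.
low_gear = [1200, 1200, 1200, 50, 50, 50, 50, 50, 50, 50]
high_gear = [1200, 1200, 1200, 1200, 1200, 1200, 1200, 1200, 50, 50]

r_low_gear = low_high_ratio(low_gear, threshold=600)
r_high_gear = low_high_ratio(high_gear, threshold=600)
```

Under this assumption a lower gear spends more of each period at low power, so its R is larger, which matches the rule stated above.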

Mathematical model testing and results based on improved event detection

To test the effect of the event-detection-based load identification and decomposition proposed herein, the electrical devices are classified as shown in Tables 4 and 5 below:

Table 4: load classification by state

Table 5: load classification by power size

Let N_g denote the number of events detected by the event detection model, N_l the number of missed (under-recognized) events, N_c the number of spurious (over-recognized) events, and η the detection efficiency of the event detection program, defined as:

The experimental results are shown in Table 6 below:

Table 6: event detection results

Because event detection is based on the steady-state and transient characteristics of active power, some events are missed. First, the characteristics of a missed state are too similar to those of other states; second, the active power of continuously variable devices fluctuates too strongly, which hinders event detection.

Testing and results of load identification and load decomposition with the state matrix decision tree algorithm

1. Establishment of device power state matrix

A device power state matrix is established from the transient and steady-state characteristics of the active power; the result is shown in Table 7 below:

Table 7: device power state matrix

2. Implementation of load recognition algorithms in a group of devices

The number of loads of each type correctly identified by the load identification program is denoted N_T (T = 1, 2, 3, 4, 5), the number of identification errors is denoted N_F, and the identification accuracy is denoted η₁. The results are shown in the following table:

Table 8: load recognition algorithm recognition effect

The load identification results show that the overall identification accuracy of the program reaches 86.36%: the identification rate is 100% for on/off two-state loads and 90% for finite multi-state loads, while the identification rate for continuously variable loads is lower. Analysis suggests two causes. First, the active power values of some states in the device power state matrix are very close in the training data, so the thresholds in the constructed decision tree classification model are overly strict. Second, in the device combination switching experiment, the active power of a continuously variable state changes over time, which reduces identification accuracy.

3. Implementation of load decomposition algorithms in a group of devices

The device combinations are divided into three types: first, two devices superposed; second, three devices superposed; and third, five devices superposed. The experimental results are shown in the following table:

Table 9: load decomposition algorithm recognition effect

While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. The non-intrusive load detection and decomposition method based on the state matrix decision tree is characterized by comprising the following steps:

S2, determining a data sample period by using a spectrum analysis method;

2. The method of claim 1, wherein the method of data cleaning in S1 is the Grubbs method.

3. The method of claim 1, wherein the method of data integration in S1 is a correlation coefficient method.

4. The method of claim 1, wherein the method of data reduction in S1 is regression analysis.

5. The method as claimed in claim 1, wherein step S2 specifically comprises: grouping the screened feature values at fixed intervals according to a specific period, wherein the period is determined by applying a Fourier transform to the time-series feature quantity to obtain an intensity spectrum, finding the maximum frequency component, and taking the reciprocal of that frequency component as the period.

6. The method as claimed in claim 1, wherein the step S3 is specifically performed by:

S31, determining the optimal feature subset according to the sequential forward feature selection algorithm: suppose that k features have been selected to form a feature group X_k of size k; the remaining d-k features X_j (j = 1, 2, 3, …, d-k) are each combined with the features already selected and ranked by the resulting value of the criterion J; the sequential forward feature selection algorithm starts with an empty feature set and, in each subsequent cycle, selects the best feature from the original feature set and adds it to the set until the number of features increases to m;

7. The method as claimed in claim 1, wherein the step S4 is specifically performed by:

8. The method according to claim 7, wherein the state change of the load in step S43 includes input and removal of the load, switching of the shift position, and change of the operation state.