CN109387712B - Non-invasive load detection and decomposition method based on state matrix decision tree

Publication number
CN109387712B
Authority
CN
China
Prior art keywords
load
data
state
equipment
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811170715.6A
Other languages
Chinese (zh)
Other versions
CN109387712A (en
Inventor
苏鹭梅
郑锐洁
郑小龙
朱文婷
张宝琼
邓冠森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University of Technology
Original Assignee
Xiamen University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University of Technology
Priority to CN201811170715.6A
Publication of CN109387712A
Application granted
Publication of CN109387712B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to a non-invasive load detection and decomposition method based on a state matrix decision tree, comprising the following steps: S1, preprocessing the sample data, including data cleaning, data integration and data reduction, to obtain valid sample data; S2, determining the data sample period by spectrum analysis; S3, selecting load features with a sequential forward feature selection algorithm and a K-means clustering algorithm, and extracting highly discriminative load features with a time-series feature selection algorithm according to the sample period; S4, building a model that automatically identifies the working state of a single device, based on an improved sliding window bilateral CUSUM event detection algorithm and decision-tree load identification and decomposition, then introducing the state matrix decision tree on this basis and building a probability model of the load time-series features, thereby automatically identifying the working states of superimposed devices. The method has high identification efficiency and good practicability.

Description

Non-invasive load detection and decomposition method based on state matrix decision tree
Technical Field
The invention relates to the field of electric power big data, in particular to a non-invasive load detection and decomposition method based on a state matrix decision tree.
Background
In recent years, automatic power load monitoring and decomposition based on measurement and sensing technology has attracted wide attention because of its clear advantages over manual surveys. It is implemented in two main ways:
One way is to fit every electrical device inside the total load with a sensor that has a digital communication function and to collect the electricity consumption information of each device over a communication network; this is called intrusive residential load monitoring (ILM). The other way is to install a sensor at the user's entry point to the power grid and, by collecting and analyzing the user's total power or total current, to monitor the power consumption and operating state of each device or each class of device, and thus learn the consumption and usage patterns of the devices in the user's home; this is called non-intrusive load monitoring and decomposition (NILMD).
Electricity consumption analysis and measurement based on NILMD takes the consumption information of specific indoor electrical devices as its monitoring target. The information obtained is important for optimizing the planning, operation and management of the power company's grid, for saving users electricity and electricity charges, and for turning society-wide awareness of ecological civilization into concrete action. Compared with intrusive detection using built-in sensors, non-intrusive residential power load detection and decomposition is currently the most popular low-cost technology for detailed load consumption detection.
Existing non-invasive residential power load detection and decomposition methods are not efficient enough at load identification, and their algorithms are relatively complex.
Disclosure of Invention
The present invention therefore provides a non-intrusive load detection and decomposition method based on a state matrix decision tree to solve the above technical problems. To this end, the invention adopts the following specific scheme:
The non-invasive load detection and decomposition method based on the state matrix decision tree comprises the following steps:
S1, preprocessing the sample data, including data cleaning, data integration and data reduction, to obtain valid sample data;
S2, determining the data sample period by spectrum analysis;
S3, selecting load features with a sequential forward feature selection algorithm and a K-means clustering algorithm, and extracting highly discriminative load features with a time-series feature selection algorithm according to the sample period;
S4, building a model that automatically identifies the working state of a single device, based on an improved sliding window bilateral CUSUM event detection algorithm and decision-tree load identification and decomposition, then introducing the state matrix decision tree on this basis and building a probability model of the load time-series features, thereby automatically identifying the working states of superimposed devices.
Furthermore, the data cleaning method is the Grubbs method: the "suspect value" is determined by calculating the deviation of the sample data from the mean, the $G_i$ value is calculated, and $G_i$ is compared with the critical value $G_P(n)$ given in the Grubbs table; if $G_i$ is greater than $G_P(n)$, the sample data is judged to be abnormal.
Further, the data integration method is the correlation coefficient method: a correlation coefficient is obtained from the standard deviations and covariance of the cleaned sample data, and the strength of the relationship between the variables is judged from its value. The correlation coefficient ranges from -1 to 1, where 1 indicates that the two variables are perfectly positively linearly correlated, -1 that they are perfectly negatively correlated, and 0 that they are uncorrelated; the closer the value is to 0, the weaker the correlation.
Furthermore, the data reduction method is regression analysis: building on the degree of association between parameters obtained from data integration, the relationships between variables are refined and fixed, irrelevant variables are removed, the dimensionality of the analyzed data samples is reduced, and a reliable model is derived.
Further, the specific process of step S2 is:
grouping the screened feature values at intervals of a fixed number of samples according to a specific period, where the period is found by applying a Fourier transform to the time-series feature quantity to obtain an amplitude spectrum, finding the dominant frequency component, and taking its reciprocal as the period.
Further, the specific process of step S3 is:
S31, determining the optimal feature subset with the sequential forward feature selection algorithm: suppose k features have already been selected to form a feature group $X_k$ of size k, and the d-k unselected features $x_j$, j = 1, 2, 3, …, d-k, are ranked by the value of the criterion J obtained when each is combined with the already selected features; the sequential forward feature selection algorithm starts with an empty feature set, and in each subsequent cycle the best remaining feature of the original feature set is selected and added to the set, until the number of features reaches m;
S32, evaluating how well the features separate samples of different classes with the K-means clustering algorithm: geometrically, the greater the separability between classes, the larger the between-class distance and the farther apart samples of different classes lie, while the smaller the within-class distance, the more tightly each class is clustered; given a sample set, the K-means algorithm divides it into K clusters, each cluster center being the mean of the samples in that cluster; every sample is assigned to the cluster with the nearest center, the centers of the new clusters are recomputed, and this iterative relocation is repeated until the objective function, the sum of the distances between all samples and their cluster centers, is minimized, thereby selecting the optimal features;
S33, calculating the operating feature values of the electrical equipment: invalid periods are eliminated from the sample data, 15 valid periods of data are selected as sample data, the feature values of these 15 periods are calculated, and the feature values are then classified to extract the features with the highest discriminative power.
Further, the specific process of step S4 is:
S41, assigning every device in the sample data one of three attributes (high, medium or low) according to the maximum value of its active power;
S42, based on the C4.5 decision tree classification algorithm, treating each load as one class, i.e. a leaf node of the decision tree; attribute values are compared at the internal nodes of the tree in a top-down recursive manner, and the loads are classified by branching downward from each node according to the attribute value until every class contains a single result, i.e. the leaves are pure; load identification is then performed with the optimal load feature parameters obtained, to determine which device the current power data matches;
S43, introducing the improved sliding window bilateral CUSUM event detection algorithm to identify the steady-state and transient characteristics of the active power: an event detection program continuously tracks the change of each device's state at every sampling point and detects whether any load changes state over the whole time series, so that the load is identified in the time series and the operating time of the device at the current moment is determined; load decomposition is then carried out to determine which state of which device the data is in at the current moment;
S44, establishing a device power state matrix from the transient and steady-state characteristics of the active power: the steady-state and transient characteristics of the device state power are averaged over the training samples, and their standard deviation is taken as the fluctuation level; the state matrix decision tree is introduced and a probability model of the load time-series features is built, so that the optimal solution for the currently superimposed operating devices is found and the devices are finally identified automatically.
Further, the state changes of the load in step S43 include switching the load on and off, switching between gear settings, and changes of operating state.
By adopting the technical scheme, the invention has the beneficial effects that: the method of the invention can improve the load identification efficiency and has better practicability.
Drawings
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures. Elements in the figures are not drawn to scale and like reference numerals are generally used to indicate like elements.
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a flow chart of feature selection in the method of the present invention;
FIG. 3 is a flow chart of single device identification in the method of the present invention;
FIG. 4 is a flow chart of superimposed device identification in the method of the present invention;
FIG. 5 is a flowchart of a sliding window bilateral CUSUM event detection method in the method of the present invention;
FIG. 6 is a schematic diagram of four stages of event detection in the CUSUM event detection method of FIG. 5;
FIG. 7 is a schematic diagram of power spectrum analysis of a computer.
Detailed Description
The invention will now be further described with reference to the accompanying drawings and detailed description.
Referring to fig. 1, a general flow of the method of an embodiment of the present invention is described. The method mainly comprises the following steps: sample data preprocessing S1, sample data period determination S2, load feature selection and extraction S3 and load identification and decomposition S4. Each step is described in detail below.
Data pre-processing
The data preprocessing is mainly divided into 3 steps: (1) data cleaning: correcting recognizable errors in the data file, processing invalid values, missing values and abnormal values, and checking data consistency; (2) data integration: analyzing the correlation among the data variables; (3) data reduction: reducing the number of variables through dimension reduction.
(1) Data cleaning: the Grubbs method is used to identify the "suspect value", which is removed from the data sample and does not take part in the calculation of the mean.
The first step: determine the "suspect value", i.e. the maximum or minimum value that deviates most from the mean.
The second step: calculate the $G_i$ value:
$$G_i = \frac{|x_i - \bar{x}|}{s}$$
(where i is the rank number of the suspect value, $x_i - \bar{x}$ is the residual, and s is the standard deviation)
The third step: look up the Grubbs table and compare $G_i$ with the critical value $G_P(n)$ given there; if $G_i$ is greater than $G_P(n)$, the measured value is judged to be abnormal and can be eliminated.
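The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `grubbs_suspect`, the sample readings, and the threshold passed in are all assumptions, and the critical value must in practice be taken from the Grubbs table for the actual sample size and confidence level.

```python
from statistics import mean, stdev

def grubbs_suspect(data, critical_value):
    """Return (index, G, is_outlier) for the value farthest from the mean."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation; assumes the data are not all equal
    # Step 1: the "suspect value" is the sample with the largest absolute residual.
    idx = max(range(len(data)), key=lambda i: abs(data[i] - x_bar))
    # Step 2: G_i = |x_i - mean| / s.
    g = abs(data[idx] - x_bar) / s
    # Step 3: compare with the critical value supplied by the caller.
    return idx, g, g > critical_value

# Hypothetical readings with one obvious spike at the end.
readings = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 15.0]
# 2.0 is a placeholder threshold for illustration only; use the Grubbs table
# entry G_P(n) for the real sample size n and confidence level P.
idx, g, is_outlier = grubbs_suspect(readings, critical_value=2.0)
print(idx, round(g, 3), is_outlier)
```

After a value is rejected, the test is usually repeated on the remaining samples, since one extreme value can mask another.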
(2) Data integration: a large amount of electrical equipment data is available, and some parameters may be highly correlated with one another, so the correlation coefficient method is used to reflect how closely the variables are related.
The correlation coefficient is calculated as:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
The sample covariance $s_{xy}$ is calculated as:
$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
The sample standard deviation $s_x$ is calculated as:
$$s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
The sample standard deviation $s_y$ is calculated as:
$$s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
where $r_{xy}$ denotes the sample correlation coefficient, $s_{xy}$ the sample covariance, $s_x$ the sample standard deviation of x, and $s_y$ the sample standard deviation of y. The degrees of correlation corresponding to $r_{xy}$ are shown in table 1:
TABLE 1 Reference table of the degree of correlation for the correlation coefficient $r_{xy}$
[Table 1 appears as an image in the original publication]
The correlation coefficient takes values between -1 and 1: 1 indicates that the two variables are perfectly positively linearly correlated, -1 that they are perfectly negatively correlated, and 0 that they are uncorrelated. The closer the value is to 0, the weaker the correlation.
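As a small worked example of the sample correlation coefficient, the following Python sketch computes $r_{xy}$ from the sample covariance and the two sample standard deviations. The function name and the current and power values are hypothetical and serve only to show the calculation.

```python
from math import sqrt

def sample_correlation(x, y):
    """Sample correlation coefficient r_xy = s_xy / (s_x * s_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and standard deviations, all with n - 1 in the denominator.
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

current = [0.5, 1.0, 1.5, 2.0, 2.5]          # hypothetical current samples (A)
power = [110.0, 220.0, 330.0, 440.0, 550.0]  # hypothetical active power samples (W)
r = sample_correlation(current, power)
print(round(r, 6))
```

Because the hypothetical power values are an exact multiple of the current values, r comes out at the upper limit of the range; real measurements would fall somewhere inside it.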
(3) Data reduction: because the number of data samples to analyze is huge, dimension reduction is needed. Building on the degree of association between parameters obtained from data integration, the relationships between variables are refined and fixed so that a reliable model can be derived. Regression analysis is therefore used to remove irrelevant variables and reduce the dimensionality of the analyzed data samples. Taking the current, active power, reactive power, power factor and second harmonic current of the laser printer and the notebook computer (both continuously variable state devices) as examples, the results of the regression analysis in MATLAB 2016a are shown in tables 2 and 3:
TABLE 2 Correlation between laser printer parameters
[Table 2 appears as an image in the original publication]
TABLE 3 Correlation between computer parameters
[Table 3 appears as an image in the original publication]
Determining a sample period
The screened feature values are grouped at intervals of a fixed number of samples according to a specific period. The period is found by applying a Fourier transform to the time-series feature quantity to obtain an amplitude spectrum, finding the dominant frequency component, and taking its reciprocal as the period.
The Fourier transform of the periodic discrete-time signal x(nT) can be expressed as:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi}{N}nk}, \quad k = 0, 1, \ldots, N-1$$
where x(n), n = 0, 1, …, N-1, is a finite-length discrete signal.
Fig. 7 shows the spectrum analysis for a computer. The period estimated from the raw data is about 400 s, and the frequency of the second-highest amplitude obtained with the algorithm is about 0.0025 Hz, i.e. a period of 1/0.0025 Hz = 400 s, which is consistent. The frequency of the highest amplitude is not used because the data is not strictly periodic, so the highest amplitude occurs near zero frequency, and the frequency of the second-highest amplitude is closer to the data period.
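The period estimate can be illustrated with a direct evaluation of the discrete Fourier transform in plain Python. This is a sketch under simplifying assumptions: the function name `estimate_period` and the synthetic test signal are made up, and the near-zero peak is avoided simply by skipping the DC bin (k = 0), which is cruder than picking the second-highest amplitude as the text describes.

```python
import cmath
import math

def estimate_period(x, fs):
    """Estimate the period (in seconds) of x, sampled at fs Hz, from its DFT."""
    n = len(x)
    best_k, best_amp = None, -1.0
    # Skip k = 0 (the DC term) and scan only the non-redundant half of the spectrum.
    for k in range(1, n // 2 + 1):
        xk = sum(x[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n))
        if abs(xk) > best_amp:
            best_k, best_amp = k, abs(xk)
    # Bin k corresponds to frequency k * fs / n, so the period is its reciprocal.
    return n / (best_k * fs)

# Synthetic signal: constant offset plus a sine that repeats every 8 samples.
signal = [5.0 + math.sin(2 * math.pi * i / 8) for i in range(64)]
period = estimate_period(signal, fs=1.0)
print(period)
```

The direct sum costs O(N²) operations; for long recordings an FFT routine would normally replace the inner loop.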
Feature selection
The load characteristics of electrical equipment are mainly classified into steady-state characteristics and transient characteristics. A steady-state characteristic is one extracted while the load is at a stable power consumption level; a transient characteristic is one extracted at the moment the load is switched on, switched off, or changes state. The feature selection process is shown in fig. 2:
(1) The optimal feature subset is determined with the sequential forward feature selection algorithm. Suppose k features have already been selected to form a feature group $X_k$ of size k, and the d-k unselected features $x_j$, j = 1, 2, 3, …, d-k, are ranked by the value of the criterion J obtained when each is combined with the already selected features. That is, if
$$J(X_k + x_1) \ge J(X_k + x_2) \ge \cdots \ge J(X_k + x_{d-k}) \tag{6}$$
then the feature group selected in the next step is
$$X_{k+1} = X_k + x_1 \tag{7}$$
The sequential forward feature selection algorithm starts with an empty feature set, and in each subsequent cycle, the best feature in the original feature set is selected and added to the set until the number of features increases to m.
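The greedy loop described above can be sketched as follows. The criterion J is left as a caller-supplied function, since the text does not fix its form; the feature names and scores in the example are hypothetical, and the additive criterion is chosen only to keep the example easy to follow.

```python
def sequential_forward_selection(features, criterion, m):
    """Greedily grow a feature subset until it holds m features."""
    selected = []                # start from the empty feature set
    remaining = list(features)
    while remaining and len(selected) < m:
        # Pick the unselected feature whose addition maximizes J(X_k + x_j).
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)    # X_{k+1} = X_k + x_1
        remaining.remove(best)
    return selected

# Hypothetical per-feature separability scores (illustration only).
scores = {"P": 0.9, "Q": 0.4, "PF": 0.6, "I2": 0.7}

def additive_j(subset):
    # Assumed criterion: sum of individual scores, ignoring feature interactions.
    return sum(scores[f] for f in subset)

chosen = sequential_forward_selection(scores, additive_j, m=2)
print(chosen)
```

With a realistic criterion, such as a class separability measure evaluated on the combined subset, the second pick need not be the feature with the second-best individual score.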
(2) The K-means clustering algorithm is used to evaluate how well the features separate samples of different classes. Geometrically, the greater the separability between classes, the larger the between-class distance and the farther apart samples of different classes lie, while the smaller the within-class distance, the more tightly each class is clustered.
Given a sample set, the K-means algorithm divides it into K clusters, each cluster center being the mean of the samples in that cluster. Every sample is assigned to the cluster with the nearest center, the centers of the new clusters are recomputed, and this iterative relocation is repeated until the objective function, the sum of the distances between all samples and their cluster centers, is minimized.
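The assign-and-recompute iteration can be sketched in plain Python. This is a minimal illustration: the initial centers are passed in by the caller (the text does not say how they are chosen), and the two-dimensional sample points are hypothetical.

```python
from math import dist

def k_means(samples, centers, max_iter=100):
    """Plain K-means: assign to nearest center, recompute means, repeat."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each sample goes to the cluster with the nearest center.
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                  for p in samples]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:       # converged: centers no longer move
            break
        centers = new_centers
    return labels, centers

# Hypothetical feature points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centers)
```

A feature that separates the device classes well yields clusters with large distances between centers and small distances within each cluster, which is the property the selection step looks for.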
Feature extraction
The operating characteristics of some electrical devices (e.g., microwave ovens) are more complex than those of others, and power characteristics alone cannot serve as their identification features. To overcome this limitation of load identification methods based on power variation, load features with higher discriminative power are sought and extracted from the operating sample periods of these devices. Specifically, 15 valid periods of data are selected as sample data, the feature values of these 15 periods are calculated, and the features with the highest discriminative power are selected by classifying the feature values.
Load identification and decomposition
Fig. 3 and fig. 4 show the identification flows for a single device and for superimposed devices, respectively. To identify a single device, the active power PC of the electrical equipment is first divided into three attributes (high, medium and low) according to its maximum value, load identification is performed with the C4.5 decision tree classification algorithm, and the device to which the current power data belongs is determined. The improved sliding window bilateral CUSUM event detection algorithm is then introduced to identify the steady-state and transient characteristics of the active power, determine whether the device is switched on or off at that moment, and obtain the device state over the time period.
Identifying superimposed devices is more complicated than identifying a single device and requires a power state matrix of the electrical equipment. First, load identification is performed with the decision tree classification algorithm to determine which devices are superimposed in the current data; second, the steady-state and transient characteristics of the active power are identified with the improved sliding window bilateral CUSUM event detection algorithm to determine the operating time of the device group at the current moment; then load decomposition is carried out to determine which state of which device the data is in at the current moment; finally, the optimal solution of the state matrix decision tree is searched to obtain the real-time power consumption of each device.
Because identifying a device first requires detecting whether its load is switched on or off, an event-detection-based method is introduced: an event detection program continuously tracks the change of each device's state at every sampling point. The load is identified in the time series by detecting whether any load changes state over the whole series.
Following the feature selection and feature extraction described above, event detection in the overall mathematical model is carried out on the active power PC in the time series. The classic literature on NILMD systems (Quinlan J R. C4.5: Programs for Machine Learning [J], 1993.) uses a segmentation detection method that divides the time series into steady-state and transient parts according to the change in the experimentally acquired active power.
Device identification falls into two categories: identification of the state of a single device, and identification and decomposition of the superimposed states of several devices. The loads are classified according to their working characteristics, suitable features are selected, and the loads are identified and decomposed with the C4.5 decision tree algorithm.
Improved sliding window bilateral CUSUM event detection algorithm
Let the active power time series be
$$P = \{p(t)\}, \quad t = 1, 2, \ldots$$
Two consecutive sliding windows are defined on the time series: $W_s$ (the steady-state mean window) and $W_u$ (the transient mean window), of lengths s and u respectively. Their means $A_s$ and $A_u$ are calculated as:
$$A_s = \frac{1}{s}\sum_{p(i) \in W_s} p(i)$$
$$A_u = \frac{1}{u}\sum_{p(i) \in W_u} p(i)$$
Two cumulative sums, $g_t^+$ and $g_t^-$, are then defined to detect whether a load is switched on (the power increases) or switched off (the power decreases) at the current moment, together with a fluctuation level $\varepsilon$ that characterizes the time series in the steady state. They are calculated as:
$$g_t^+ = \max\left(0,\; g_{t-1}^+ + A_u - (A_s + \varepsilon)\right)$$
$$g_t^- = \max\left(0,\; g_{t-1}^- - A_u + (A_s - \varepsilon)\right)$$
Taking the detection of a switch-on event as an example, the sliding window bilateral CUSUM event detection method proceeds as follows. When the detection-window mean $A_u$ exceeds $A_s + \varepsilon$, $g_t^+$ starts to increase. A threshold K for deciding that an event has occurred must then be set: an event is considered to have occurred when $g_t^+ > K$. To avoid recognizing a single switch-on or switch-off event several times because of oscillation in the sequence, a time delay factor d (initial value 0) is introduced. Each time the delay factor is incremented by 1, $g_t^+$ is compared with $g_{t-1}^+$. If $g_t^+ \le g_{t-1}^+$, the change in active power at that moment is considered to be a fluctuation, and $g_t^+ = 0$ and $d = 0$ are set, so that fluctuations in the device data do not cause repeated event recognition. If $g_t^+ > g_{t-1}^+$, let $d = d + 1$ and calculate $g_{t+1}^+$, and so on until $g_t^+ > K$. The time at which the detected event occurred is then given by $t - d$. The sliding window bilateral CUSUM event detection process for a load switch-on event is shown in fig. 5; the process for detecting a switch-off event is obtained in the same way.
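To make the event detection idea concrete, here is a short Python sketch of a sliding-window two-sided CUSUM detector. It is a simplified illustration and not the improved algorithm of the patent: the function name `detect_events`, the window placement (the transient window covers the latest u samples and the steady-state window the s samples before it), the rule that the delay factor counts consecutive steps with a positive cumulative sum, and the restart of both windows after each detected event are all assumptions made for the example, as are the synthetic power values.

```python
def detect_events(p, s, u, eps, K):
    """Return (time_index, "on" | "off") events found in the power series p."""
    events = []
    g_on = g_off = 0.0   # cumulative sums g+ (power rise) and g- (power drop)
    d_on = d_off = 0     # delay factors: steps since each sum started to grow
    t = s + u - 1        # first index at which both windows are full
    while t < len(p):
        a_u = sum(p[t - u + 1 : t + 1]) / u          # transient window mean A_u
        a_s = sum(p[t - u - s + 1 : t - u + 1]) / s  # steady-state window mean A_s
        g_on = max(0.0, g_on + a_u - (a_s + eps))
        g_off = max(0.0, g_off - a_u + (a_s - eps))
        kind = "on" if g_on > K else "off" if g_off > K else None
        if kind:
            # The event happened d steps before the threshold was crossed.
            t_event = t - (d_on if kind == "on" else d_off)
            events.append((t_event, kind))
            g_on = g_off = 0.0
            d_on = d_off = 0
            # Assumed restart rule: resume once both windows lie after the event.
            t = max(t + 1, t_event + s + u - 1)
            continue
        d_on = d_on + 1 if g_on > 0 else 0
        d_off = d_off + 1 if g_off > 0 else 0
        t += 1
    return events

# Synthetic series: 100 W baseline, a 300 W load on at index 20 and off at index 40.
power = [100.0] * 20 + [300.0] * 20 + [100.0] * 20
events = detect_events(power, s=5, u=3, eps=5.0, K=100.0)
print(events)
```

Restarting the windows after an event is one simple way to keep the stale steady-state window from triggering the same event again; without it, the sum would cross K repeatedly until the steady-state window had moved past the step.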
When the sliding windows of the sliding window bilateral CUSUM event detection program slide across the occurrence time of an event, the process can be divided into 4 stages, as shown in fig. 6, where $P_0$ is the active power before the event and $\Delta P$ is the difference between the active power after the event and $P_0$.
a. In the first stage, the transient detection window has not yet reached the event, and the means of both windows remain unchanged, i.e. $A_u - A_s = 0$;
b. In the second stage, the event occurrence time lies within the transient detection window; $A_u$ changes continuously while $A_s$ does not. Let $P_1 = P_0 + \Delta P$ and $t_d = t - t_1$, with $t_d \in (1, u)$. At each moment in this stage, $A_s$ and $A_u$ are respectively
$$A_s = P_0, \qquad A_u = P_0 + \frac{t_d}{u}\,\Delta P$$
c. In the third stage, the event occurrence time lies within the mean calculation window; $A_u$ is unchanged while $A_s$ changes continuously, with $(t_d - u) \in (1, s-1)$. At each moment in this stage, $A_s$ and $A_u$ are respectively
$$A_s = P_0 + \frac{t_d - u}{s}\,\Delta P, \qquad A_u = P_1$$
d. In the fourth stage, both windows have slid past the event, and neither $A_s$ nor $A_u$ changes any further.
The above calculation and analysis of the threshold K assume devices that switch on instantaneously, but many residential electrical devices, such as microwave ovens and printers, do not. To reduce the error rate of event identification, a compromise is adopted: the threshold for deciding that an event has occurred is derived from the maximum and minimum threshold values, namely, let
Figure BDA0001822324870000123
From the above derivation, K can be determined once $A_s$, $A_u$ and the minimum power of the device to be identified at that moment are known. The threshold K for determining the occurrence of an event is then taken as:
$$K = \frac{K_{\max} + K_{\min}}{2} \tag{12}$$
Based on the sliding window bilateral CUSUM event detection method, the active power sequence can be cut into segments at the event detection points. For the active power with steady-state characteristics, the steady-state fluctuation level ε is introduced, and the mean value m of each segment is extracted as a statistical feature. Pairing the event detection point at the current moment with the detection point at the next moment then gives the overall operating time and operating state of a single device.
Mathematical model based on state matrix decision tree
For superposed-state data, the load must be identified and decomposed. A device power state matrix is established according to the transient and steady-state characteristics of the active power: the steady-state and transient characteristics of each device state's power are averaged over the training samples to obtain the mean
Figure BDA0001822324870000132
and the standard deviation thereof is taken as the fluctuation level δ.
Figure BDA0001822324870000131
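As a rough illustration of how such a matrix could be built from training samples, the following Python sketch averages the steady-state and transient power of each device state and uses the standard deviation as the fluctuation level δ. The dictionary layout, the field names, the sample values, and the choice of the population standard deviation are assumptions made here, not details given in the text.

```python
from statistics import mean, pstdev

def build_state_matrix(training):
    """training maps (device, state) to a list of (steady power, transient power) pairs.

    Returns, for each device state, the mean and fluctuation level (standard
    deviation) of both characteristics. Layout and names are hypothetical.
    """
    matrix = {}
    for key, observations in training.items():
        steady = [s for s, _ in observations]
        transient = [t for _, t in observations]
        matrix[key] = {
            "steady_mean": mean(steady),
            "steady_delta": pstdev(steady),       # fluctuation level of steady power
            "transient_mean": mean(transient),
            "transient_delta": pstdev(transient),
        }
    return matrix

# Hypothetical training samples, in watts.
training = {
    ("kettle", "on"): [(1500, 1495), (1510, 1505), (1490, 1500)],
    ("lamp", "on"): [(40, 41), (42, 40), (38, 39)],
}
state_matrix = build_state_matrix(training)
```

Each row of the resulting matrix can then serve as the reference against which a measured power segment is compared.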
C4.5 decision tree classification algorithm
A decision tree is a classification algorithm in the field of data mining that expresses the mapping between object values and attributes. Each node in the tree represents an object, each branch represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf node. The decision tree classification method works recursively from the top down: it compares attribute values at the internal nodes of the tree, chooses the downward branch according to the attribute value, and reaches a conclusion at a leaf node. When a decision tree is applied to load identification, each load can be regarded as a class, equivalent to a leaf node in the tree, and classification proceeds through the tree until each class contains only a unique result, that is, until the leaf nodes are pure. The attributes in the decision tree, namely the load characteristic parameters, are the basis for choosing among the downward branches.
Two features are introduced here. The first is the active power feature, which is the change in power when a device transitions to another operating state. The second is the harmonic feature: harmonic data carry characteristics unique to each kind of electrical appliance and respond especially clearly when a device is switched on or off, so they can be used directly to detect devices being switched on or off. Some states of continuously variable devices cannot be distinguished simply by a difference in active power, such as printing versus copying on a printer, or restarting versus starting up a notebook computer. Using the active power and harmonic features together identifies the load state more effectively.
The C4.5 decision tree classification algorithm is a supervised classification learning algorithm. Let the sample set be C, and let the proportion of the k-th class of samples in the sample set be Pk (k = 1, 2, ..., a), where a is the total number of classes in the sample set. The information entropy of the sample set is defined as:
Figure BDA0001822324870000141
Assume that the sample set is divided according to attribute B. If attribute B has X possible values, X branch nodes are generated, where the x-th (x = 1, 2, ..., X) branch node contains all samples in the sample set whose value on attribute B is Bx, denoted Cx. The information gain obtained by dividing the sample set by attribute B is defined as follows:
Figure BDA0001822324870000142
Further, the information gain ratio of attribute B is:
Figure BDA0001822324870000143
The gain ratios of the different attributes are calculated according to this formula, and the attribute with the maximum gain ratio is selected as the splitting attribute for the current split. The gain ratios of the remaining attributes are then calculated in the same way and splitting continues until all device states are distinguished, or until all samples take the same value on every attribute so that no further split is possible.
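The attribute selection rule can be illustrated with the textbook C4.5 quantities for discrete attributes (entropy, information gain, and gain ratio). The Python sketch below uses those standard definitions; the function names, the attribute names, and the toy samples are hypothetical, and continuous attributes such as raw power values would first need to be discretized or split on a threshold.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute.

    values[i] is the attribute value of sample i, labels[i] is its class.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # entropy remaining after splitting on the attribute
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)        # entropy of the split itself
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical samples: two candidate attributes for four labelled device states.
labels = ["kettle", "dryer", "lamp", "lamp"]
attributes = {
    "power_level": ["high", "high", "low", "low"],
    "harmonic_class": ["a", "a", "a", "c"],
}
ratios = {name: gain_ratio(vals, labels) for name, vals in attributes.items()}
best = max(ratios, key=ratios.get)
```

In this toy data the power level separates the lamp from the other two devices cleanly, so it has the larger gain ratio and would be chosen for the first split.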
Experimental testing and results analysis
Analysis of K-means clustering feature selection results based on the sequential feature selection algorithm
Matlab 2016a is used to perform K-means cluster analysis on the electric devices (incandescent lamps, hot water kettles, fans, water dispensers, electric hair dryers, laser printers, microwave ovens and the like). The clustering results show that, when the active power or the step change of the active and reactive power is used as the feature quantity, high-power loads with distinct power consumption characteristics, such as hot water kettles, electric hair dryers, fans and water dispensers, are easy to identify. However, using the power change as the identifying feature for all electric devices is clearly not feasible. For example, when a multi-state device such as a microwave oven switches among its low, medium and high heat settings, its active or reactive power does not follow a simple step change; the power passes through several values and settles at the power of the corresponding setting only after some time, so the K-means clustering result for the microwave oven is not ideal.
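The clustering itself was run in Matlab according to the text. Purely as an illustration of the idea, the following Python sketch runs a plain K-means (Lloyd) iteration on hypothetical step changes of active and reactive power. The function name, the data points, and the hand-picked initial centers are assumptions introduced here.

```python
def kmeans(points, centers, iters=100):
    """Plain K-means iteration on tuples of equal length (illustrative sketch)."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # assign each point to the nearest center (squared Euclidean distance)
            j = min(range(len(centers)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # recompute each center as the mean of its cluster
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical (active power step, reactive power step) pairs for three devices.
points = [
    (40, 0), (42, 1), (38, 0),          # lamp-like
    (1500, 5), (1510, 4), (1490, 6),    # kettle-like
    (60, 30), (62, 28), (58, 32),       # fan-like
]
centers, clusters = kmeans(points, centers=[(40, 0), (1500, 5), (60, 30)])
```

Devices whose steps form tight, well separated groups, as in this toy data, cluster cleanly. A device whose power wanders through several values before settling would scatter across clusters, which is the difficulty the text reports for the microwave oven.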
Feature extraction result analysis based on time domain selection features
Time-domain statistical features are used for feature extraction. After comparing various time-domain statistical features such as the mean, variance and skewness, the ratio R of the low value to the high value of the microwave oven's power within one operation period is selected as the operating feature of the microwave oven. The state of the microwave oven is judged from the change of R within one period: if R increases, the microwave oven is at a lower setting; if R decreases, the microwave oven is at a higher setting.
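The text does not spell out how R is computed. The Python sketch below shows one possible reading, under the assumption that R is the number of low-power samples divided by the number of high-power samples in one operation period, with the two levels separated at the midpoint between the minimum and maximum power. The function name and the sample traces are hypothetical.

```python
def low_high_ratio(cycle):
    """Ratio of low-power samples to high-power samples in one period (assumed definition)."""
    threshold = (min(cycle) + max(cycle)) / 2   # assumed split between low and high
    low = sum(1 for p in cycle if p < threshold)
    high = len(cycle) - low
    return low / high if high else float("inf")

# Hypothetical periods of ten samples each, in watts.
high_setting = [1200] * 8 + [50] * 2   # mostly at full power
low_setting = [1200] * 3 + [50] * 7    # mostly idling
r_high = low_high_ratio(high_setting)
r_low = low_high_ratio(low_setting)
```

Under this reading a larger R corresponds to a lower setting, which matches the direction of the rule stated above.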
Mathematical model testing and results based on improved event detection
To test the load identification and decomposition based on event detection proposed herein, the electric devices are classified as shown in Tables 4 and 5 below:
Table 4: Load classification by state
Figure BDA0001822324870000151
Table 5: load classification by power size
Figure BDA0001822324870000152
Let Ng denote the number of events detected by the event detection model, Nl the number of missed events, Nc the number of falsely detected extra events, and η the detection efficiency of the event detection program, defined as:
Figure BDA0001822324870000161
The experimental results are shown in Table 6 below:
Table 6: Event detection results
Figure BDA0001822324870000162
Because event detection is based on the steady-state and transient characteristics of the active power, missed events occur for two reasons: first, the characteristics of the state in question are too similar to those of other states; second, the fluctuation level of the active power of continuously variable devices is too large, which hinders event detection.
Test and result of load identification and load decomposition of state matrix decision tree algorithm
1. Establishment of device power state matrix
An equipment power state matrix is established according to the transient characteristic and the steady-state characteristic of the active power, and the results are shown in the following table 7:
Table 7: Device power state matrix
Figure BDA0001822324870000163
Figure BDA0001822324870000171
2. Implementation of load recognition algorithms in a group of devices
Let NT (T = 1, 2, 3, 4, 5) denote the number of correct identifications made by the load identification program for each type of load, NF the number of identification errors, and η1 the identification accuracy. The results are shown in the following table:
Table 8: Recognition results of the load identification algorithm
Figure BDA0001822324870000172
The load identification results show that the overall identification accuracy reaches 86.36%: the identification rate is 100% for start/stop two-state loads and 90% for finite multi-state loads, but lower for continuously variable loads. Analysis suggests two reasons. First, the active power of some states in the device power state matrix is relatively close in the training data, so the thresholds in the constructed decision tree classification model are very tight. Second, in the device combination switching experiment, the active power of continuously variable states changes over time, which reduces the identification accuracy.
3. Implementation of load splitting algorithms in a group of devices
The device combinations are divided into three types: first, two devices superposed; second, three devices superposed; third, five devices superposed. The experimental results are shown in the following table:
Table 9: Recognition results of the load decomposition algorithm
Figure BDA0001822324870000181
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. The non-intrusive load detection and decomposition method based on the state matrix decision tree is characterized by comprising the following steps:
S1, preprocessing sample data, including data cleaning, data integration and data reduction, to obtain effective sample data;
S2, determining a data sample period by using a spectrum analysis method;
S3, selecting load characteristics based on a sequential forward feature selection algorithm and a K-means clustering algorithm, and extracting the load characteristics with high identification degree by using a time-series feature selection algorithm according to the sample period;
S4, establishing a model for automatically identifying the working state of a single device based on the improved sliding window bilateral CUSUM event detection algorithm and decision-tree load identification and decomposition, and on this basis introducing the state matrix decision tree and establishing a load time-series feature probability model, thereby realizing automatic identification of the working state of superposed devices.
2. The method of claim 1, wherein the method of data cleansing in S1 is the Grubbs method.
3. The method of claim 1, wherein the method of data integration in S1 is a correlation coefficient method.
4. The method of claim 1, wherein the reduction of data in S1 is by regression analysis.
5. The method as claimed in claim 1, wherein the step S2 is specifically performed by: screening the characteristic values and grouping them at intervals of a certain quantity according to a specific period, wherein the grouping method comprises performing a Fourier transform on the time-series characteristic quantity to obtain an intensity spectrum, finding the maximum frequency component, and taking its reciprocal as the period.
6. The method as claimed in claim 1, wherein the step S3 is specifically performed by:
S31, determining the optimal feature subset according to the sequential forward feature selection algorithm: suppose k features have been selected to form a feature group Xk of size k; the remaining d-k unselected features Xj, j = 1, 2, 3, ..., d-k, are each combined with the already selected features and ranked by the resulting J value; the sequential forward feature selection algorithm starts with an empty feature set and, in each subsequent cycle, selects the best feature in the original feature set and adds it to the set until the number of features increases to m;
S32, evaluating the degree of separation of the features among samples of different classes by using a K-means clustering algorithm, wherein, from the perspective of geometric intuition, the larger the distance between classes, the greater the separability between classes and the more distinct the samples of different classes, and the smaller the intra-class distance, the higher the degree of intra-class clustering; given a sample set, the K-means algorithm divides the sample set into K clusters, each cluster center being the mean of the samples in that cluster; the remaining objects are then assigned to the nearest cluster according to their distance to the samples in each cluster, the center of each new cluster is recomputed, and this iterative relocation process is repeated so that the sum of the distances between all samples and the center of their cluster is minimized, until the objective function is minimized, thereby selecting the optimal feature;
S33, calculating the operating characteristic values of the electric equipment: eliminating invalid periods in the sample data, selecting 15 periods of usable data as the sample data, calculating the characteristic values of the 15 periods of data, and then classifying the characteristic values to extract the characteristics with the highest identification degree.
7. The method as claimed in claim 1, wherein the step S4 is specifically performed by:
S41, dividing the active power of all devices in the sample data into three device attributes of high, medium and low according to the maximum value of the active power of all the devices in the sample data;
S42, based on a C4.5 decision tree classification algorithm, regarding each load as one class, namely a leaf node in the decision tree, comparing attribute values at internal nodes of the decision tree in a top-down recursive manner, and classifying the loads by choosing the downward branch from each node according to the attribute value until each class contains only a unique result, namely until the leaf nodes are pure; performing load identification according to the obtained optimal load characteristic parameters, and judging which device the current power data corresponds to;
S43, introducing an improved sliding window bilateral CUSUM event detection algorithm to identify steady-state characteristics and transient characteristics of the active power, continuously tracking the change of each device state at each sampling point through an event detection program, and detecting whether a load undergoes a change of state anywhere in the time series so as to identify the load in the time series, thereby determining the operation time of the device at the current time; then carrying out load decomposition to determine which state of which device the current data corresponds to at the current time;
S44, establishing a device power state matrix according to the transient characteristic and the steady-state characteristic of the active power, averaging the steady-state characteristic and the transient characteristic of the device state power over training samples, taking the standard deviation as the fluctuation level, introducing a state matrix decision tree, and establishing a load time-series feature probability model, so that the optimal solution for the currently superposed operating devices is established, and finally automatic identification of the devices is realized.
8. The method according to claim 7, wherein the state change of the load in step S43 includes input and removal of the load, switching of the shift position, and change of the operation state.
CN201811170715.6A 2018-10-09 2018-10-09 Non-invasive load detection and decomposition method based on state matrix decision tree Active CN109387712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811170715.6A CN109387712B (en) 2018-10-09 2018-10-09 Non-invasive load detection and decomposition method based on state matrix decision tree


Publications (2)

Publication Number Publication Date
CN109387712A CN109387712A (en) 2019-02-26
CN109387712B true CN109387712B (en) 2021-04-13

Family

ID=65426678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811170715.6A Active CN109387712B (en) 2018-10-09 2018-10-09 Non-invasive load detection and decomposition method based on state matrix decision tree

Country Status (1)

Country Link
CN (1) CN109387712B (en)





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant