CN110287544B

CN110287544B - Power distribution network power utilization time sequence deconstructing method based on Gaussian mixture algorithm

Info

Publication number: CN110287544B
Application number: CN201910472368.0A
Authority: CN
Inventors: 田英杰; 吴力波; 周阳; 马戎; 施政昱; 陈伟; 苏运; 郭乃网; 瞿海妮; 张琪祁; 时志雄; 宋岩; 庞天宇; 沈泉江
Original assignee: Fudan University; State Grid Shanghai Electric Power Co Ltd
Current assignee: Fudan University; State Grid Shanghai Electric Power Co Ltd
Priority date: 2019-05-31
Filing date: 2019-05-31
Publication date: 2023-07-04
Anticipated expiration: 2039-05-31
Also published as: CN110287544A

Abstract

The invention relates to a method for deconstructing power-consumption time series in a power distribution network based on a Gaussian mixture algorithm. The method comprises the following steps. Step 1: divide the power data of each electrical appliance in the distribution network into a training set and a test set. Step 2: train a hidden Markov model on the training set to obtain the optimal parameter solution. Step 3: build a total model from the optimal parameters and the total power consumption data, use it to solve for the state corresponding to the observed total power consumption at each time in the test set, and decompose this total state into the states of the individual sub-item consumptions. Step 4: predict the prior expected value of each appliance over the test period from the per-appliance sub-models trained in step 2. Step 5: compare the prior expected values with the observed total power consumption and revise them to obtain the final deconstructed result. Compared with the prior art, the invention deconstructs quickly and with high accuracy.

Description

Power distribution network power utilization time sequence deconstructing method based on Gaussian mixture algorithm

Technical Field

The invention relates to the technical field of deconstructing the power-consumption time series of users in power distribution networks, and in particular to a method for deconstructing distribution-network power consumption time series based on a Gaussian mixture algorithm.

Background

Users' electricity-use behavior differs considerably over time and space, which strongly affects accurate power scheduling. Changes in user configuration, shifts in users' consumption peaks, and similar factors all increase scheduling costs. Decomposing the total energy consumption and feeding the consumption of individual appliances back to users can prompt spontaneous energy-saving behavior, while the power supply company gains micro-level consumption data, which greatly benefits demand response, energy-saving research, and policy making.

Spatially, the geographical distribution of different types of users and the proportions of electricity-use types in different areas can make scheduling very difficult. Moreover, when buildings and power facilities are constructed, equipment capacity is usually planned from past experience and estimated building demand, without considering factors such as economic development and the mutual influence among surrounding users, so the power supply facilities may become unsuitable in the long run. A thorough understanding of users' geographical distribution and their spatial correlations can improve scheduling efficiency, reduce cost, and provide a basis for building power facilities and designing the grid, making the grid architecture more reasonable and dispatching more efficient. For example, a newly built large commercial building consumes a great deal of electricity and, by changing the density of people nearby, also changes the consumption behavior of surrounding users. If this surrounding influence is not considered in the grid design at the start of construction, problems such as future power shortages and an unreasonable grid structure may arise, bringing safety hazards and making later development and upgrading difficult.

There have been many studies of power data decomposition; the characteristics and shortcomings of conventional deconstruction methods include:

1) Analyses that deconstruct well use ultra-high-frequency power data (> 1 Hz) and unsupervised algorithms, and they also require the types and power curves of the various appliances.

2) Ultra-high-frequency data can currently be collected only for short periods in a laboratory; such analyses cannot be extended to real-world power operation and monitoring, still less fed back to consumers to promote energy saving and emission reduction.

3) Relatively few studies deconstruct low-frequency consumption data. Those that do apply supervised learning with models such as sparse coding and hidden Markov models, but the results are unsatisfactory: the consumption of individual appliances cannot be decomposed accurately, and precision is poor.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a power distribution network power utilization time sequence deconstructing method based on a Gaussian mixture algorithm.

The aim of the invention can be achieved by the following technical scheme:

A power distribution network power utilization time sequence deconstructing method based on a Gaussian mixture algorithm comprises the following steps:

step 1: dividing the power data of each electrical appliance in the power distribution network into a training set and a test set;

step 2: training a hidden Markov model on the training set to obtain an optimal parameter solution;

step 3: constructing a total model by combining the optimal parameter solution with the total power consumption data, using the total model to solve for the state corresponding to the observed total power consumption at each time in the test set, and decomposing the total state into the states of the individual sub-item power consumptions;

step 4: predicting the prior expected value of each electrical appliance over the test period from the sub-item model of each appliance trained in step 2;

step 5: comparing the prior expected values with the observed total power consumption and revising them to obtain the final deconstructed result.

Further, step 2 comprises the following sub-steps:

step 21: solving, with the EM iterative algorithm, for the optimal parameters of the hidden Markov model trained on the training set;

step 22: setting acceptance conditions on the residual error, the information quantity, and the degree of deviation for this optimization.

Further, the probability of each state of the prior expected value of each electrical appliance in step 4 over the test period is calculated as:

$$\pi^{(i)}_{t+h} = \pi^{(i)}_{t}\big(\Gamma^{(i)}\big)^{h}$$

where $\pi^{(i)}_{t+h}$ is the vector of state probabilities of the prior expected value, $\pi^{(i)}_{t}$ is the state probability at the last time point of the corresponding training set, and $\big(\Gamma^{(i)}\big)^{h}$ is the h-th power of the transition matrix $\Gamma^{(i)}$.

Further, the expected value of each state of the prior expected value of each electrical appliance in step 4 over the test period is calculated as:

$$\mu^{(i)}_{j,t+h} = f^{(i)}_{j}\big(S_{t+h}\big)$$

where $\mu^{(i)}_{j,t+h}$ is the expected value of state j, $S_{t+h}$ is the value of the related variables, and $f(\cdot)$ is a linear function.

Further, the prior expected value of each electrical appliance in step 4 over the test period is calculated as:

$$\hat{y}^{(i)}_{t+h} = \sum_{j=1}^{m^{(i)}} \pi^{(i)}_{j,t+h}\,\mu^{(i)}_{j,t+h}$$

where $\hat{y}^{(i)}_{t+h}$ is the prior expected value of each appliance over the test period and $m^{(i)}$ is the number of states of the i-th appliance.

Further, the final deconstructed result in step 5 is expressed in terms of the following quantities: the final deconstructed result of the total power consumption; the predicted value of the total power consumption; $Y_{t+1day}$, the observed total power consumption; the correction value proportion for the change in the electricity consumption of each electrical appliance; and the correction value proportion for the fluctuation of each electrical appliance, where i and N are natural numbers.

Further, the correction value proportion for the change in the electricity consumption of each electrical appliance is calculated from the total power consumption of the i-th electrical appliance.

Further, the correction value proportion for the fluctuation of each electrical appliance is calculated from $m^{(i)}$, the number of states of the i-th electrical appliance, the proportion of time the i-th appliance is in state j, and the probability that the i-th appliance is in state j.

Compared with the prior art, the invention has the following advantages:

(1) By combining a factorial hidden Markov supervised deconstruction model with external climate and time factors, the invention effectively deconstructs low-frequency total power consumption data into the consumption of individual appliances, reaching decomposition accuracy of up to 80% for some appliances, which makes it suitable for commercial deployment. The usage of each appliance is a hidden Markov process with several states, each with its own distribution, and is represented by a sub-model; the total power, i.e. the sum of the consumption of the individual appliances, is also a hidden Markov process. Each state of the total process is a combination of the sub-item states and forms the total model, whose number of states is the product of the sub-item state numbers, and the deconstructed result obtained as a whole is highly accurate.

(2) Deconstruction is fast: the method describes the distribution-network appliances with a factorial hidden Markov model, solves for the optimal parameters with the EM iterative algorithm, and finally corrects the deconstructed result, so the overall deconstruction runs quickly.

Drawings

FIG. 1 is a schematic diagram of the factorial hidden Markov model according to the present invention;

FIG. 2 is a schematic flow chart of the method of the present invention;

FIG. 3 is a schematic diagram of how the probability of each state changes during training of each sub-item model according to an embodiment of the present invention.

Detailed Description

The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

As shown in Fig. 1, the factorial hidden Markov model (Z. Ghahramani, 1997) is an extension of the hidden Markov model in which the system is a superposition of several Markov processes and the observations follow the joint distribution of the multiple hidden Markov models. The model can therefore describe total power consumption data formed by superimposing several appliances, each of which can be in one of several states. Kolter (2012) used a factorial hidden Markov model to identify appliances, applying a machine-learning algorithm to identify and decompose 9 appliances with different switching signals; the identification accuracy of appliance usage time and the recall of successfully explained total consumption reached 83% and 60%, respectively. However, Kolter (2012) used data sampled at 1 kHz or above, so the approach cannot be applied directly to decompose the low-frequency user data considered here. The invention provides a semi-supervised consumption-data deconstruction model that introduces external objective factors such as meteorological conditions; after training on data from a building with sub-metering, the model can accurately deconstruct the building's total consumption data over a period without sub-metering.

Fig. 2 is a schematic flow chart of the method of the present invention, and the deconstructing method specifically includes the following steps:

1) Divide the sub-item power data into two consecutive parts, a training set and a test set. The training set is used to train and optimize the model and uses the total consumption, the consumption of the individual appliances, and all external data; the test set is used to evaluate the effect and accuracy of the model and uses only the total consumption and external data.

2) Train a hidden Markov model on the training data of each appliance to obtain the optimal parameter solution.

3) With the parameters of each sub-model obtained, build a total model by combining them with the total consumption data. Use the total model to compute the state corresponding to the observed total consumption at each time in the test set, and decompose the total state into the states of the individual appliances.

4) Predict the prior expected value of each appliance over the test period from the trained sub-item models, and revise the predicted expected values according to the observed total consumption and the state distribution of each sub-item appliance to obtain the final deconstructed result.

5) Compare the actual consumption with the deconstructed result and calculate the accuracy of model training and deconstruction.

The steps are specifically described as follows:

1. Factorial hidden Markov model with external environmental variables

Assume that, in a given state at time t, the total power consumption is normally distributed with an expected value equal to the sum of the expected values of the corresponding states of the individual appliances, and that the expected value of each appliance is a linear function of external influencing factors. The consumption of the i-th appliance in its corresponding state at time t is then also normally distributed, namely

$$Y_t \mid s_t \sim \mathcal{N}\Big(\sum_{i=1}^{N}\mu^{(i)}_{s^{(i)}_t}(S_t),\ \sigma^2\Big), \qquad y^{(i)}_t \mid s^{(i)}_t = j \sim \mathcal{N}\big(\mu^{(i)}_{j}(S_t),\ (\sigma^{(i)})^2\big), \qquad \mu^{(i)}_{j}(S_t) = f^{(i)}_{j}(S_t)$$

where $Y_t$ is the total electricity consumption, $N$ is the number of appliances, $s^{(i)}_t$ is the state of the i-th appliance at time t, $s_t = (s^{(1)}_t,\dots,s^{(N)}_t)$ is the joint state, and $\mu^{(i)}_{j}(S_t)$ is the expected value of the i-th appliance in state j, a linear function $f^{(i)}_{j}$ of the external factors $S_t$. The expected value of each appliance depends on several external factors, which are measured independently of the electricity consumption. Each sub-model also has an initial state probability vector and a transition matrix

$$\pi^{(i)}_0 = \big(\pi^{(i)}_{0,1},\dots,\pi^{(i)}_{0,m^{(i)}}\big), \qquad \Gamma^{(i)} = \big(\gamma^{(i)}_{jk}\big)_{m^{(i)}\times m^{(i)}}$$

where $m^{(i)}$ is the number of states of the i-th appliance. The total degrees of freedom are the free parameters of four items per appliance: the initial state probabilities, the transition matrix, the external-factor parameters, and the standard deviation of the normal distribution.
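The observation model above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: each state's mean is taken as an intercept plus a linear combination of the external factors (the coefficient array `beta` and both function names are hypothetical), and the total is given a single standard deviation `sigma`.

```python
import numpy as np


def appliance_state_means(beta, s_t):
    """Expected consumption of every state of one appliance at time t.

    beta: (m, k+1) hypothetical per-state coefficients; column 0 is the
          intercept, columns 1..k are slopes on the k external factors.
    s_t:  (k,) observed external factors at time t (temperature, dummies, ...).
    Returns mu_j(S_t) = beta[j, 0] + beta[j, 1:] . s_t for every state j.
    """
    beta = np.asarray(beta, dtype=float)
    return beta[:, 0] + beta[:, 1:] @ np.asarray(s_t, dtype=float)


def total_log_density(y_total, betas, joint_state, s_t, sigma):
    """Gaussian log-density of the observed total for one joint state.

    The mean is the sum, over appliances, of the mean of the state each
    appliance is in; sigma is an assumed standard deviation of the total.
    """
    mean = sum(appliance_state_means(b, s_t)[j] for b, j in zip(betas, joint_state))
    z = (y_total - mean) / sigma
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * z**2
```

For two appliances with two states each, `total_log_density` evaluates one of the four joint states; the total model repeats this for every combination.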

2. Model estimation based on the EM algorithm

The EM algorithm is iterative: with all observations known and the number of states fixed, each iteration computes expectations and then maximizes the likelihood, reducing the gap between the parameter estimates and the true values, until the change in the estimated parameters between two consecutive iterations falls within the preset tolerance. The number of states of each appliance model is determined in a similar trial-based way: state numbers from 2 up to an upper limit (here 25) are tried, and the optimal one is chosen as the number that satisfies three conditions: minimum BIC, stable residuals, and no state whose value deviates excessively.

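1) Residual error: the residuals of the fit should be stable.

2) Information quantity: the Bayesian information criterion $\mathrm{BIC} = k\ln n - 2\ln\hat{L}$, with $k$ free parameters, $n$ observations, and maximized likelihood $\hat{L}$, should be minimal.

3) Degree of deviation: no state's value should deviate excessively.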
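A sketch of the BIC part of the state-number search follows. The parameter count is an assumption built from the four items listed in section 1 (one shared standard deviation per appliance, an intercept plus k slopes per state); the residual-stability and deviation checks are not modelled, and all function names are hypothetical.

```python
import math


def hmm_free_parameters(m, k):
    """Free parameters of one appliance sub-model (assumed breakdown).

    m: number of hidden states; k: number of external factors.
    """
    initial = m - 1              # initial state probabilities (sum to 1)
    transition = m * (m - 1)     # each row of the transition matrix sums to 1
    linear = m * (k + 1)         # intercept + k slopes for every state
    sigma = 1                    # one normal standard deviation (assumption)
    return initial + transition + linear + sigma


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_state_number(log_likelihoods, k, n_obs):
    """Return the candidate state number with the smallest BIC.

    log_likelihoods: {m: maximized log-likelihood of the EM fit with m states},
    e.g. for m = 2..25. Residual and deviation checks are not modelled here.
    """
    return min(
        log_likelihoods,
        key=lambda m: bic(log_likelihoods[m], hmm_free_parameters(m, k), n_obs),
    )
```

The penalty term grows roughly with m squared through the transition matrix, which is what stops the search from always favoring the largest state number.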

3. Total power consumption deconstruction based on the factorial hidden Markov model

All parameters of the factorial hidden Markov model of the total consumption are obtained by combining the parameters of the individual appliances. The total number of states is

$$M = \prod_{i=1}^{N} m^{(i)}$$

and the initial state probability distribution and state transition matrix of the total model are the tensor (Kronecker) products of the corresponding parameters of all appliances:

$$\pi_0 = \pi^{(1)}_0 \otimes \cdots \otimes \pi^{(N)}_0, \qquad \Gamma = \Gamma^{(1)} \otimes \cdots \otimes \Gamma^{(N)}$$

Conversely, the state probabilities of each appliance at each time can be recovered by the inverse operation (equation 7-1). The probability density of each state of the total model is also normal: it combines the distributions of the corresponding appliance states, its expected value is the sum of the expected-value functions of those states, and its variance is estimated from the expected values and the actual observations. With all parameters fixed, the conditional state probabilities of the total model are computed over the test set.

The state probability distribution of each electric appliance at each moment can be obtained by decomposing the conditional state probability of the total model by the formula (13).
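The tensor-product construction and its inverse can be illustrated with NumPy's `np.kron`. This is a minimal sketch assuming independent sub-chains and a joint-state ordering in which the first appliance varies slowest; the function names are hypothetical, and marginalization is shown as one way to perform the inverse operation.

```python
import numpy as np
from functools import reduce


def combine_submodels(transitions, initials):
    """Total-model transition matrix and initial distribution.

    Joint states enumerate every combination of appliance states, first
    appliance varying slowest, so there are prod(m_i) of them; with
    independent sub-chains both objects are Kronecker products.
    """
    gamma = reduce(np.kron, [np.asarray(g, dtype=float) for g in transitions])
    pi0 = reduce(np.kron, [np.asarray(p, dtype=float) for p in initials])
    return gamma, pi0


def marginal_state_probs(joint_probs, state_numbers):
    """Split a joint state distribution back into per-appliance distributions."""
    tensor = np.asarray(joint_probs, dtype=float).reshape(tuple(state_numbers))
    n = len(state_numbers)
    # Summing over every other appliance's axis leaves appliance i's marginal.
    return [tensor.sum(axis=tuple(a for a in range(n) if a != i)) for i in range(n)]
```

With a 2-state and a 3-state appliance, `combine_submodels` yields a 6 x 6 joint transition matrix whose rows still sum to 1, and marginalizing the joint distribution recovers each appliance's own state probabilities.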

4. Sub-item deconstruction of the total power consumption

For each appliance, no observations are available in the test period, so only multi-period predictions can be made from the training data and the fitted model, giving the expected value for each period:

$$\hat{y}^{(i)}_{t+h} = \sum_{j=1}^{m^{(i)}} \pi^{(i)}_{j,t+h}\,\mu^{(i)}_{j,t+h}, \qquad \pi^{(i)}_{t+h} = \pi^{(i)}_{t}\big(\Gamma^{(i)}\big)^{h}, \qquad \mu^{(i)}_{j,t+h} = f^{(i)}_{j}\big(S_{t+h}\big)$$

The prediction uses the conditional state probabilities, the transition matrix, and the actual observations of the influencing factors at future times. Because the state probability distribution depends only on the state at the previous time, the state probabilities at all future times are obtained by multiplying the state probabilities at the last time of the training set by the corresponding power of the transition matrix. The expected value of each state at each time follows from the observed values of the related variables through the linear function, and the expected consumption of the appliance at a future time is the probability-weighted sum of the state expected values.
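The multi-step prior prediction can be sketched as follows. The linear function f is assumed to be an intercept plus slopes held in a hypothetical per-state array `beta`, and the h-th power of the transition matrix is applied one step at a time, which gives the same result as `np.linalg.matrix_power(gamma, h)`.

```python
import numpy as np


def prior_expected_value(pi_last, gamma, beta, s_future):
    """Multi-step prior expectation for one appliance over the test period.

    pi_last:  (m,) state probabilities at the last training time point
    gamma:    (m, m) transition matrix (rows sum to 1)
    beta:     (m, k+1) hypothetical per-state linear coefficients
    s_future: (H, k) observed external factors for steps h = 1..H
    Returns an (H,) array of expected consumption values.
    """
    pi_h = np.asarray(pi_last, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    beta = np.asarray(beta, dtype=float)
    out = []
    for s in np.asarray(s_future, dtype=float):
        pi_h = pi_h @ gamma                    # pi_{t+h} = pi_t Gamma^h, one step at a time
        mu_h = beta[:, 0] + beta[:, 1:] @ s    # mu_j(S_{t+h}) = f_j(S_{t+h})
        out.append(float(pi_h @ mu_h))         # sum_j pi_{j,t+h} * mu_{j,t+h}
    return np.array(out)
```

Starting from certainty in state 0 with transition matrix [[0.5, 0.5], [0, 1]] and state means 1 and 3, the first two expected values are 2.0 and 2.5.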

Power values predicted directly from the model depend only on the model and the training data, so the predictions may deviate from actual consumption if conditions change in the future. In actual deconstruction, however, the total consumption of each period is observed and can be used to correct each appliance's predicted consumption. The difference between the observed total and the sum of all appliance predictions is distributed among the appliances according to certain rules, with each appliance's share determined by the change in its consumption between adjacent days and by its degree of fluctuation. The deconstructed result is expressed in terms of the final deconstructed result of the total power consumption, the predicted consumption, the observed total $Y_{t+1day}$, and the two correction value proportions for consumption change and fluctuation.

In terms of consumption change, an appliance's consumption at the same time on two adjacent days should remain essentially unchanged, so the appliance whose consumption changes the most is given the largest share of the correction. This change proportion is computed from the total power consumption of the i-th appliance.

Meanwhile, by the optimal allocation principle, the larger an appliance's original fluctuation, the larger its contribution to the deviation. This fluctuation proportion is computed from the proportion of time the i-th appliance spends in state j and the corresponding state probability.
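The allocation of the gap between the observed total and the summed predictions can be sketched as below. How the change and fluctuation proportions are combined is not specified here, so the sketch takes generic non-negative weights as input and simply normalizes them; the function name and weight argument are hypothetical.

```python
import numpy as np


def revise_predictions(predicted, observed_total, weights):
    """Distribute the gap between the observed total and the summed predictions.

    predicted:      (N,) predicted consumption of each appliance for one period
    observed_total: observed total consumption Y for the same period
    weights:        (N,) non-negative allocation weights (assumed to be derived
                    from the change and fluctuation proportions elsewhere)
    """
    predicted = np.asarray(predicted, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize so the shares sum to 1
    gap = observed_total - predicted.sum()
    return predicted + w * gap           # revised values now sum to observed_total
```

Because the shares sum to 1, the revised appliance values always add up exactly to the observed total, whatever weighting rule is plugged in.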

5. Measurement of model training and deconstruction accuracy

The accuracy of the model is represented by the relative error between the observed and estimated values, where $y_t$ is the observed value and $\hat{y}_t$ is the model estimate. The relative error can be used to evaluate accuracy on both the training and test sets, and comparing it across appliances indicates how difficult each appliance is to decompose and which factors affect decomposition accuracy.
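The sketch below assumes one common aggregate definition of relative error, sum|y_t - ŷ_t| / sum|y_t|, which may differ from the exact formula used in the patent; the function name is hypothetical.

```python
import numpy as np


def relative_error(observed, estimated):
    """Aggregate relative error sum|y_t - y_hat_t| / sum|y_t| (assumed definition)."""
    y = np.asarray(observed, dtype=float)
    y_hat = np.asarray(estimated, dtype=float)
    return float(np.abs(y - y_hat).sum() / np.abs(y).sum())
```

For observations [10, 20] and estimates [11, 18], the absolute errors total 3 against a total of 30, giving 0.1.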

Specific calculation examples:

The power data consist of hourly consumption data for the whole of 2016 from four types of commercial buildings, each broken down into four sub-item categories. The four commercial buildings are a shopping mall, an office building, a hotel, and a complex building; the four sub-item categories are lighting, air conditioning, power, and others.

Besides the power data, hourly data on factors that affect electricity-use behavior, such as weather and date type, are also introduced. Weather data include temperature, rainfall, wind speed, air pressure, and humidity. The date type is represented by two dummy variables: 00 denotes a working day (including weekend days rescheduled as working days), 10 a statutory holiday, and 11 an ordinary weekend. A further dummy variable distinguishes day from night by dividing the 24 hours of a day into two classes, with 1 denoting daytime; the daytime hours differ between appliances but are the same every day, and they are determined by clustering the consumption data with the K-means method.
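The date-type dummies and the K-means day/night split can be sketched as follows. The string labels and function names are hypothetical, and the day/night split is assumed to be a plain one-dimensional two-cluster K-means on an appliance's average hourly profile, with the higher-consumption cluster taken as daytime; the patent does not specify these details.

```python
import numpy as np

# Two dummy variables per day, coded as in the text: 00 working day,
# 10 statutory holiday, 11 ordinary weekend. The string labels are hypothetical.
DATE_DUMMIES = {"workday": (0, 0), "legal_holiday": (1, 0), "weekend": (1, 1)}


def date_dummies(day_types):
    """Map a sequence of day-type labels to an (n_days, 2) 0/1 array."""
    return np.array([DATE_DUMMIES[d] for d in day_types], dtype=int)


def day_night_hours(hourly_profile, n_iter=50):
    """Label each of the 24 hours as day (1) or night (0) for one appliance.

    hourly_profile: (24,) average consumption per hour of day.
    Plain 1-D two-cluster K-means; the higher cluster is treated as daytime.
    """
    x = np.asarray(hourly_profile, dtype=float)
    centers = np.array([x.min(), x.max()])
    labels = np.zeros(x.size, dtype=int)
    for _ in range(n_iter):
        # Assign each hour to the nearer center (ties go to night).
        labels = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
        new_centers = np.array([
            x[labels == c].mean() if np.any(labels == c) else centers[c]
            for c in (0, 1)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Starting the two centers at the minimum and maximum keeps cluster 1 as the higher-consumption cluster throughout, so label 1 consistently means daytime.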

The raw data contain many outliers and missing values. Whether a point is an outlier is judged by its deviation from a reference mean, separately for working days and weekends: data at the same hour within two weeks before and after the point are taken as the reference, and if the point deviates from the reference mean by more than three times the reference standard deviation, it is identified as an outlier and its value is removed and treated as missing. Originally missing points and removed outliers are then filled by linear interpolation.
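The outlier rule can be sketched as below. It interprets "two weeks before and after" as all same-type days within 14 days on either side at the same hour; this window interpretation, the argument names, and the requirement of at least two reference values are assumptions.

```python
import numpy as np


def clean_hourly_series(values, is_weekend, window_days=14, k=3.0):
    """Remove 3-sigma outliers and fill gaps in an hourly series.

    values:      hourly readings covering whole days, NaN where missing
    is_weekend:  one boolean per day; working days and weekends are handled separately
    window_days: reference window on each side of a day (two weeks, as in the text)
    k:           outlier threshold in reference standard deviations
    """
    x = np.asarray(values, dtype=float).copy()
    days = x.size // 24
    grid = x[: days * 24].reshape(days, 24)        # view: rows are days, columns hours
    weekend = np.asarray(is_weekend, dtype=bool)
    flagged = np.zeros(grid.shape, dtype=bool)
    for d in range(days):
        lo, hi = max(0, d - window_days), min(days, d + window_days + 1)
        ref_days = [r for r in range(lo, hi) if r != d and weekend[r] == weekend[d]]
        if not ref_days:
            continue
        for h in range(24):
            ref = grid[ref_days, h]
            ref = ref[~np.isnan(ref)]
            if ref.size >= 2 and abs(grid[d, h] - ref.mean()) > k * ref.std():
                flagged[d, h] = True
    grid[flagged] = np.nan                          # outliers become missing values
    series = grid.ravel()
    ok = ~np.isnan(series)
    idx = np.arange(series.size)
    series[~ok] = np.interp(idx[~ok], idx[ok], series[ok])   # linear interpolation
    return series
```

All points are flagged against the raw data first and only then blanked, so one outlier does not change the reference used for its neighbors' own test.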

Because the total consumption in the data should equal the sum of the sub-items, the sub-item data are cleaned first, separately by hour and by working day and weekend, and the total is then cleaned based on the result. When cleaning the total, the data fall into two parts: points where the total is greater than or equal to the sum of the sub-items and points where it is smaller. Points where the total exceeds the sum form the majority, indicating that the consumption of some appliances is not metered. For the part where the total exceeds the sum, the average difference δ is calculated; for the part where it is smaller, the total is forcibly set to the sum of the sub-items plus δ. To account for all consumption, a new variable, unknown, is defined as the difference between the total and the sum of the sub-items.
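The reconciliation of the total with the sub-items can be sketched as follows. This minimal illustration, with hypothetical names, follows the rule above literally: δ is the mean excess over points where the total exceeds the sum, only points where the total falls short are reset, and points with an exact match are left unchanged.

```python
import numpy as np


def reconcile_total(total, sub_items):
    """Align the metered total with the sub-item sum and derive 'unknown'.

    total:     (T,) metered total consumption
    sub_items: (T, N) cleaned sub-item consumption
    Returns (adjusted total, unknown = adjusted total - sub-item sum).
    """
    total = np.asarray(total, dtype=float).copy()
    parts_sum = np.asarray(sub_items, dtype=float).sum(axis=1)
    over = total > parts_sum
    under = total < parts_sum
    # Mean excess of the total over the sub-item sum where the total is larger.
    delta = float(np.mean(total[over] - parts_sum[over])) if over.any() else 0.0
    total[under] = parts_sum[under] + delta          # force shortfalls up to sum + delta
    unknown = total - parts_sum
    return total, unknown
```

After this step the unknown item is non-negative everywhere, so it can be modelled as a further sub-item alongside lighting, air conditioning, power, and others.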

A building's electricity use shows different properties along two dimensions, the daily average and the hourly average. Looking at the daily average consumption of the various appliances, the shopping mall's consumption is stable throughout the year, apart from the rise in air-conditioning consumption in summer. In the hotel and the complex building, lighting accounts for most of the consumption, which is stable all year round with a distinction between working days and weekends; air-conditioning consumption is very high in summer, exceeding lighting, and very low in other seasons, with no obvious working-day/weekend difference; power and other loads stay almost unchanged all year. The office building shows a clear working-day/weekend pattern; lighting consumes the most and decreases slightly in summer, while air conditioning is second to lighting, is used all year, and peaks very sharply in summer and winter. The shopping mall and the complex building are strongly affected by the Spring Festival and National Day holidays, when consumption drops markedly. The office building has invalid data for a period in October, and the hotel in February. Details of the data status are shown in Table 1.

Table 1 Power data status of the buildings

The data length is 366 × 24 = 8784 hours (2016 is a leap year); the first 7000 values are used as the training set and the last 1784 values as the test set. A hidden Markov model is fitted to the training data of each sub-item.

Figure 3 shows the occurrence probabilities of the ten-plus states of each sub-item model. For most sub-items the states occur with fairly uniform probabilities and absorbing states rarely arise, so good predictive performance can be expected. For the air conditioner and the unknown item, however, one state is far more probable than the others and tends to become absorbing: other states transition into it, but it does not transition to other states, which limits predictive capability.

The model training results are shown in Table 2. The optimal number of states exceeds ten, and most fitting relative errors are below 10%; the "other" item has a slightly larger relative error but the smallest residual standard deviation.

Table 2 Training-set results: number of HMM states, BIC, degrees of freedom, residual standard error, and fitting relative error

Each sub-item model fitted on its training set is used for multi-period prediction; the expected value for each period is compared with the actual test-set observations, and the relative errors are listed in Table 3. The relative prediction errors differ considerably between sub-items. Lighting and power have the smallest, below 14%, and their training errors are also the smallest, below 3%. The air conditioner has the largest prediction error, 55.4%, although its training error is not the largest; this indicates that air-conditioning consumption is highly random and historical data cannot fully capture possible future behavior. The other and unknown items have prediction errors of 38.4% and 34.1%, respectively, and also the largest training errors: their consumption fluctuates strongly, which limits how accurately the existing model can describe the data. Comparing training and prediction errors shows that the size of the prediction error is determined by the quality of the trained model and by the randomness of future consumption, with prediction errors 2 to 5 times larger than training errors.

Table 3 Relative errors of model training, prediction, and deconstruction for each sub-item

Table 4 lists the relative errors of model training and deconstruction for the four buildings. It shows that: 1) deconstruction errors are much larger than model prediction errors, meaning the training data cannot fully represent all attributes of the test data; 2) the larger the training error, the larger the deconstruction error, meaning the accuracy of deconstruction depends largely on the training data and the quality of model training; 3) the air conditioner and the unknown item have the largest training and deconstruction errors, because their data fluctuate strongly and the model can hardly capture all their characteristics; 4) the training and deconstruction errors for hotel consumption are larger than for the other buildings: besides the effect of the invalid February hotel data, the model fit is also slightly worse. In summary, the deconstruction model adopted here is feasible and accurate, with an overall relative deconstruction error of about 22% across the building types.

Table 4 Relative errors of fitting and deconstruction for the four building models

While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is defined by the claims.

Claims

1. The power distribution network power utilization time sequence deconstructing method based on the Gaussian mixture algorithm is characterized by comprising the following steps of:

step 5: performing comparison revision on the total electric quantity observation data by using the priori expected value to obtain a final deconstructed result;

the probability of each state of the prior expected value of each electrical appliance in step 4 over the test period is calculated as:

$$\pi^{(i)}_{t+h} = \pi^{(i)}_{t}\big(\Gamma^{(i)}\big)^{h}$$

where $\pi^{(i)}_{t+h}$ is the vector of state probabilities of the prior expected value, $\pi^{(i)}_{t}$ is the state probability at the last time point of the corresponding training set, and $\big(\Gamma^{(i)}\big)^{h}$ is the h-th power of the transition matrix $\Gamma^{(i)}$;

the expected value of each state of the prior expected value of each electrical appliance in step 4 over the test period is calculated as:

$$\mu^{(i)}_{j,t+h} = f^{(i)}_{j}\big(S_{t+h}\big)$$

where $\mu^{(i)}_{j,t+h}$ is the expected value of state j, $S_{t+h}$ is the value of the related variables, and $f(\cdot)$ is a linear function;

the prior expected value of each electrical appliance in step 4 over the test period is calculated as:

$$\hat{y}^{(i)}_{t+h} = \sum_{j=1}^{m^{(i)}} \pi^{(i)}_{j,t+h}\,\mu^{(i)}_{j,t+h}$$

where $\hat{y}^{(i)}_{t+h}$ is the prior expected value of each electrical appliance over the test period and $m^{(i)}$ is the number of states of the i-th appliance.

2. The power distribution network power consumption time sequence deconstructing method based on the Gaussian mixture algorithm according to claim 1, wherein the step 2 comprises the following sub-steps:

3. The power distribution network power consumption time sequence deconstructing method based on the Gaussian mixture algorithm according to claim 1, wherein the final deconstructed result in step 5 is described by a formula whose terms include the final deconstructed result of the total power consumption.

4. The power distribution network power consumption time sequence deconstructing method based on the Gaussian mixture algorithm according to claim 3, wherein the correction value proportion of the power consumption change of each electrical appliance is calculated from the total power consumption of the i-th electrical appliance.

5. The power distribution network power consumption time sequence deconstructing method based on the Gaussian mixture algorithm according to claim 3, wherein the correction value proportion of the power consumption fluctuation is calculated using quantities including the proportion of time the i-th electrical appliance is in state j.