CN111177216A

CN111177216A - Association rule generation method and device for behavior characteristics of comprehensive energy consumer

Info

Publication number: CN111177216A
Application number: CN201911333048.3A
Authority: CN
Inventors: 董得龙; 孙虹; 卢静雅; 杨光; 孔祥玉; 祝雨晨; 李野; 李刚; 乔亚男; 刘浩宇; 翟术然; 张兆杰; 许迪; 赵紫敬; 吕伟嘉; 顾强; 何泽昊; 季浩; 白涛
Original assignee: Tianjin University; State Grid Corp of China SGCC; State Grid Tianjin Electric Power Co Ltd; Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Current assignee: Tianjin University; State Grid Corp of China SGCC; State Grid Tianjin Electric Power Co Ltd; Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2020-05-19
Anticipated expiration: 2039-12-23
Also published as: CN111177216B

Abstract

The invention relates to a method for generating association rules of behavior characteristics of comprehensive energy consumers, which comprises the following steps: step 1, normalizing the smart meter time series data; step 2, converting the smart meter time series data into a symbolic representation; step 3, extracting characteristic patterns from the symbolic representation and adding the characteristic motifs of those patterns to a motif library; step 4, mining temporal association rules from the characteristic motifs newly added to the motif library, and analyzing the relationship among the influencing factors that cause energy consumption to change within a specific time period; step 5, clustering the characteristic motifs in the motif library with K-means clustering, a hierarchical method and a density clustering algorithm, respectively, to generate general daily consumption profiles; and step 6, measuring how well the groups of daily consumption profiles created by the three clustering methods fit the actual daily consumption. The invention can accurately describe real energy consumption.

Description

Association rule generation method and device for behavior characteristics of comprehensive energy consumer

Technical Field

The invention belongs to the field of data mining technologies of intelligent electric meters, relates to user power utilization information, and particularly relates to a method and a device for generating association rules of behavior characteristics of comprehensive energy consumers.

Background

Smart grids are one of the promising technologies for meeting the increasing energy demand and reducing global environmental pollution. They improve the efficiency, reliability, sustainability and economy of electric energy. During the last decade, smart meters have been deployed in most parts of the world. Smart meters and database management systems constitute an Advanced Metering Infrastructure (AMI), which plays an important role in energy systems by facilitating two-way information flow and recording energy distribution. AMI has given rise to various novel smart home services, such as energy-saving recommendations and energy awareness for end users. Smart meters have great potential for analyzing fine-grained energy consumption data and can be used for energy planning and management. The deployment of smart meters benefits both energy consumers and utility professionals.

Time series data generated by smart meters have great potential for identifying regular and anomalous energy consumption patterns. Time series data mining techniques are modeled and developed to identify the energy consumption behaviors of energy consumers. Smart meter data require advanced data analysis for accurate and automated decision making in a real-time environment. Combined with dynamic pricing, such analysis can improve the energy awareness of consumers by giving them a better understanding of how and when energy is used. Energy data analysis has become a major research area for power consumption analysis. The ability to analyze smart meter data to identify daily activities is very useful to utility companies implementing demand-side management techniques.

In smart grids, renewable energy sources are increasing in popularity. However, the intermittency of renewable energy power generation causes a contradiction between supply and demand. Thus, dynamic energy trading prices throughout the day make the time aspect more important. The energy consumption pattern of the smart meter fluctuates differently during a day depending on the time or month, weather, schedule and behavior of the resident. Similarly, grid load may vary in time with changes in demand, temperature, and renewable energy generation, which in turn may be affected by weather and seasonal time scales.

In recent years, various techniques have been developed to mine time series data. However, the temporal nature of time series energy consumption data has been studied only to a limited extent, even though energy consumption is highly dynamic and load demand and pricing differ over time. Therefore, in order to accurately describe real energy consumption, a method and a device are needed that generate association rules of comprehensive energy consumer behavior characteristics, have a short response time, and support frequent sampling within a period of time.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method and a device for generating association rules of comprehensive energy consumer behavior characteristics, which are reasonable in design, short in response time and capable of realizing frequent sampling within a period of time.

The technical problem to be solved by the invention is realized by adopting the following technical scheme:

a method for generating association rules of comprehensive energy consumer behavior characteristics comprises the following steps:

step 1, normalizing the smart meter time series data;

step 2, applying cloud piecewise aggregate approximation to the normalized smart meter time series data by means of symbolic aggregate approximation (SAX), and then converting the data into a symbolic representation;

step 3, extracting characteristic patterns from the symbolic representation and adding the characteristic motifs of those patterns to a motif library, wherein the characteristic motifs meet user-defined frequency-count and availability thresholds;

step 4, mining temporal association rules from the characteristic motifs newly added to the motif library, analyzing the relationship among the influencing factors that cause energy consumption to change within a specific time period, and proceeding to the next step if the absolute value of the correlation coefficient among those influencing factors is greater than a set value; otherwise, returning to step 3 to re-extract characteristic patterns and add their characteristic motifs to the motif library;

step 5, clustering the characteristic motifs in the motif library with three clustering methods, namely K-means clustering, a hierarchical method and a density clustering algorithm, and then, based on the characteristic patterns corresponding to the clustered motifs, generating general daily consumption profiles;

step 6, statistically evaluating the daily consumption profiles generated by each of the three clustering methods using two metrics, the sum of squared errors and the silhouette coefficient, so as to measure how well the groups of daily consumption profiles created by the three clustering methods fit the actual daily consumption.

Moreover, the specific method of step 1 is:

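the data are z-normalized to zero mean and unit variance, wherein the normalized value is

$$x_i' = \frac{x_i - \mu}{\sigma}$$

The mean μ and standard deviation σ are as follows:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$

in the above formulas, x_i is the i-th data value and N is the number of data to be processed.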
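By way of illustration only, the z-normalization of step 1 can be sketched in Python as follows. The function name `z_normalize` and the sample readings are hypothetical, and the use of the population standard deviation (division by N) is an assumption of this sketch rather than something the text fixes.

```python
import math


def z_normalize(values):
    """Return a z-normalized copy of `values` (zero mean, unit variance)."""
    n = len(values)
    mu = sum(values) / n
    # Population standard deviation (divide by N): an assumption of this sketch.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        # A flat series has no shape to preserve; map it to all zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical hourly smart meter readings in kWh.
readings_kwh = [0.2, 0.4, 0.4, 1.0, 2.0, 0.8]
normalized = z_normalize(readings_kwh)
```

After normalization the series has mean 0 and variance 1, which makes the fixed breakpoints used in the later symbolization step applicable to every household regardless of its absolute consumption level.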

Further, the specific steps of step 2 include:

(1) converting the time series data into a cloud piecewise aggregate approximation representation;

(2) converting the cloud piecewise aggregate approximation representation into a character string.

Moreover, sub-step (1) of step 2 comprises the following specific steps:

① representing each current segmented subsequence with a cloud model;

② evaluating the data stability of the subsequences using the entropy of each cloud model, and selecting the subsequence with the worst stability (the smallest entropy), denoted Q(i₀, j₀), for segment aggregation;

③ in the selected subsequence Q(i₀, j₀), finding a data point q_k, i₀ < k < j₀, as the key point, such that the difference between the sum of the cloud model entropies of the two subsequences Q(i₀, k) and Q(k, j₀) separated by q_k and the cloud model entropy of the subsequence Q(i₀, j₀) is the largest; then deleting the subsequence Q(i₀, j₀) and recording the subsequences Q(i₀, k) and Q(k, j₀);

④ repeating steps ① to ③ until the stop condition is met.
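The key-point search of the third sub-step can be sketched as below. This is a minimal illustration under stated assumptions: the text does not give the entropy formula, so the sketch uses the common backward cloud generator estimate En = sqrt(π/2) · mean(|x − Ex|); it treats Q(i₀, k) and Q(k, j₀) as half-open slices; and it takes the "difference" as an absolute value. The function names are hypothetical, and the selection of which subsequence to split and the stop condition are left out.

```python
import math


def cloud_entropy(segment):
    """Entropy En of a normal cloud model estimated from samples
    (backward cloud generator; an assumed formula, not given in the text)."""
    n = len(segment)
    ex = sum(segment) / n  # expectation Ex
    return math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in segment) / n


def best_key_point(series, i0, j0):
    """Return k with i0 < k < j0 that maximizes
    |En(Q(i0, k)) + En(Q(k, j0)) - En(Q(i0, j0))| over half-open slices."""
    parent = cloud_entropy(series[i0:j0])
    best_k, best_gap = None, -1.0
    for k in range(i0 + 1, j0):
        left = cloud_entropy(series[i0:k])
        right = cloud_entropy(series[k:j0])
        gap = abs(left + right - parent)
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# A series with one clear level change splits at the change point.
step_series = [0, 0, 0, 0, 5, 5, 5, 5]
key_point = best_key_point(step_series, 0, len(step_series))
```

For the step-shaped series above, the two halves are perfectly stable after the split, so the entropy difference peaks at the level change.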

Moreover, the specific method of sub-step (2) of step 2 is as follows:

after the cloud piecewise aggregate approximation, the discretized representation is symbolized, converting the time series into a string of symbols.

Further, the specific steps of step 3 include:

(1) after the symbolic aggregate approximation (SAX) transformation, marking each pattern type generated by SAX either as a motif or as an uncommon pattern, wherein a motif is defined as a previously unknown, frequently occurring pattern;

(2) extracting the required characteristic patterns from the SAX-transformed data and adding their characteristic motifs to the motif library.

Moreover, the specific method of the step 4 is as follows:

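$$\mathrm{Support}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{N}, \qquad \mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$$

wherein X is the antecedent, Y is the consequent, N is the total number of records or transactions, and count(·) is the number of records containing the given items; the confidence of the rule X → Y is the number of records supporting both the antecedent and the consequent as a fraction of the number of records supporting the antecedent alone.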

Further, the specific steps of step 5 include:

(1) clustering the characteristic motifs in the topic library by using a K-means clustering method:

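using the Euclidean distance metric, the daily blocks (N₁, N₂, ..., N_n) are divided into k sets S = (S₁, S₂, ..., S_k) so as to minimize the within-cluster sum of squares:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{N_j \in S_i} \left\lVert N_j - \mu_i \right\rVert^2$$

wherein μ_i is the mean of the values in S_i;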

(2) clustering feature motifs in a topic library using a hierarchical approach:

clustering by using a bottom-up hierarchical algorithm, and improving a clustering result obtained by using an iterative relocation method;

(3) clustering feature motifs in a topic library by using a density clustering algorithm:

setting values of a scanning radius eps and a minimum contained point number minPts;

selecting an unvisited point and finding all nearby points within the scanning radius eps of it;

if the number of the nearby points is larger than or equal to the minimum contained point number minPts, the current point and the nearby points form a cluster, and the starting point is marked as visited;

if the cluster is sufficiently expanded, i.e. all points in the cluster are marked as visited, then the same algorithm is used to process points that are not visited.

Further, the specific steps of step 6 include:

(1) the sum of squared errors (SSE) measures the compactness of the clusters; the smaller the SSE, the better the cluster quality. It is calculated as:

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{N_j \in S_i} \left\lVert N_j - \mu_i \right\rVert^2$$

wherein (N₁, N₂, ..., N_n) is the daily data set, which is divided into k sets S = (S₁, S₂, ..., S_k), and μ_i is the mean of the values in S_i.

(2) the silhouette coefficient measures intra-cluster cohesion and inter-cluster separation, with a coefficient of 1 being best and −1 being worst; it is calculated as:

$$s = \frac{b - a}{\max(a, b)}$$

where a is the average distance between the data instance and all other points in the same cluster, and b is the average distance between the data instance and all points in the cluster closest to the data instance.
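The two evaluation metrics of step 6 can be sketched as follows. This is an illustrative sketch only: clusters are represented as lists of equal-length tuples, the function names are hypothetical, and scoring singleton clusters as 0 is a common convention assumed here rather than taken from the text.

```python
import math


def sse(clusters):
    """Sum of squared Euclidean distances of every point to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        n = len(cluster)
        mu = [sum(coords) / n for coords in zip(*cluster)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in cluster)
    return total


def silhouette(clusters):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # assumed convention for singleton clusters
                continue
            # a: mean distance to the other points of the same cluster.
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # b: mean distance to the points of the nearest other cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two tight, well-separated hypothetical clusters.
example_clusters = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
example_sse = sse(example_clusters)
example_silhouette = silhouette(example_clusters)
```

A lower SSE and a silhouette coefficient closer to 1 both indicate a better clustering, so the two values can be compared across the three clustering methods on the same motif library.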

An association rule generating apparatus for integrating behavioral characteristics of energy consumers, comprising:

a normalization module, configured to normalize the smart meter time series data;

a symbol conversion module, configured to apply cloud piecewise aggregate approximation to the normalized smart meter time series data by means of symbolic aggregate approximation (SAX), and then to convert the data into a symbolic representation;

a characteristic pattern extraction module, configured to extract characteristic patterns from the symbolic representation and to add their characteristic motifs to the motif library, the motifs meeting user-defined frequency-count and availability thresholds;

a temporal association rule mining module, configured to mine temporal association rules from the characteristic motifs newly added to the motif library and to analyze the relationship among the influencing factors that cause energy consumption to change within a specific time period; if that relationship is strong, processing proceeds to the next step; if it is weak, processing returns to the characteristic pattern extraction module to re-extract characteristic patterns and add their characteristic motifs to the motif library;

a clustering module, configured to cluster the characteristic motifs in the motif library with three clustering methods, namely K-means clustering, a hierarchical method and a density clustering algorithm, and then, based on the characteristic patterns corresponding to the clustered motifs, to generate general daily consumption profiles;

and a clustering result evaluation module, configured to statistically evaluate the daily consumption profiles generated by each of the three clustering methods using two metrics, the sum of squared errors and the silhouette coefficient, so as to measure how well the groups of daily consumption profiles created by the three clustering methods fit the actual daily consumption.

The invention has the advantages and positive effects that:

1. The invention provides an association rule generation method and device for behavior characteristics of comprehensive energy consumers. First, the smart meter data are symbolized so that a variety of data mining techniques can be applied. Second, based on motif discovery, energy relationships that help to identify consumer behavior patterns are found, and characteristic patterns meeting user-defined frequency-count and availability thresholds are extracted. Third, temporal association rules are mined to analyze the relationship among factors that may cause energy consumption to increase or decrease within a given period. Finally, pattern-based clustering is performed to create daily energy consumption profiles. The invention can accurately describe real energy consumption.

2. The invention converts the energy consumption value into symbolic representation, thereby greatly reducing the dimension of the time sequence. The present invention uses symbolic representations that can be used both for local operations on embedded sensing systems for home automation and for utility experts to effectively process smart meter data.

3. A motif in the invention is defined as a previously unknown, frequently occurring pattern in a time series. Motifs reveal similarities that represent the same events in real life, and household behavior at a specific time can be identified from these patterns and their time labels. Motifs therefore play an important role in understanding household behavior and determining household patterns, and the temporal information extracted by the method allows real energy consumption to be described accurately.

Drawings

FIG. 1 is a flow chart of the steps of the present invention;

FIG. 2 is a process flow diagram of the present invention;

FIG. 3 is a flow diagram of the cloud segment aggregated approximate representation of the present invention;

FIG. 4 is a process flow diagram of the density clustering algorithm of the present invention.

Detailed Description

The embodiments of the invention will be described in further detail below with reference to the accompanying drawings:

the method for generating association rules of the behavior characteristics of the integrated energy consumer, as shown in fig. 1, fig. 2, fig. 3 and fig. 4, comprises the following steps:

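step 1, normalizing the smart meter time series data;

the specific method of step 1 is as follows:

the data are z-normalized to zero mean and unit variance, where the normalized value x′, the mean μ, and the standard deviation σ are as follows:

$$x_i' = \frac{x_i - \mu}{\sigma}, \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$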

step 2, converting the preprocessed smart meter time series data into a symbolic representation using symbolic aggregate approximation (SAX);

in step 2, the preprocessed time series data are converted into a symbolic representation using Symbolic Aggregate approXimation (SAX). SAX effectively reduces dimensionality and prepares the data in a format suitable for applying different data mining techniques.

The specific steps of the step 2 comprise:

(1) discretizing the z-normalized time series data using cloud piecewise aggregate approximation (C-PAA). In this representation, the data stability of the subsequences is evaluated using the entropy of each cloud model, and subsequences that do not meet the requirement are again subjected to cloud piecewise aggregate approximation, so that the series is reduced to a w-dimensional space.

The step 2, the step (1), comprises the following specific steps:

(2) Converting the cloud segmentation aggregation approximate expression symbol into a character string;

the specific method of sub-step (2) of step 2 is as follows:

In the present embodiment, the original sequence is represented by English letters; in general, the symbol "a" represents low power consumption, "b" average consumption, "c" higher-than-average consumption, and "d" high power consumption. After the SAX conversion, descriptive knowledge discovery techniques (e.g., motif discovery and association rule mining) can be applied to the time series data.
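For illustration, the mapping of a z-normalized series to the letters "a" to "d" can be sketched as follows. Note the substitution: this sketch uses plain piecewise aggregate approximation with equal-length segments in place of the cloud-based segmentation of the embodiment, and it assumes the usual SAX choice of breakpoints that cut the standard normal distribution into equiprobable regions. The function names are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist


def paa(series, w):
    """Plain piecewise aggregate approximation: mean of each of w equal segments."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of w")
    size = n // w
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(w)]


def sax_word(normalized_series, w, alphabet="abcd"):
    """Map a z-normalized series to a word of length w over `alphabet`."""
    a = len(alphabet)
    # Breakpoints split the standard normal distribution into a equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    return "".join(
        alphabet[bisect_right(breakpoints, value)]
        for value in paa(normalized_series, w)
    )


# Hypothetical z-normalized day with 8 readings, reduced to W = 4 windows.
day = [-1.5, -1.5, -0.3, -0.3, 0.3, 0.3, 1.5, 1.5]
word = sax_word(day, 4)
```

With an alphabet of four letters the breakpoints are approximately −0.67, 0 and 0.67, so the steadily rising example day is encoded as one letter per window, from low to high consumption.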

Step 3, extracting the characteristic patterns and adding their characteristic motifs to the motif library, the motifs meeting user-defined frequency-count and availability thresholds.

The specific steps of the step 3 comprise:

(2) extracting the required characteristic patterns from the SAX-converted data and adding their characteristic motifs to the motif library;

in this embodiment, after the symbolic aggregate approximation (SAX) conversion, we focus on the pattern types generated by SAX.

Each pattern type generated by SAX is marked either as a motif, defined as a previously unknown, frequently occurring pattern, or as an uncommon pattern.

The alphabet size a should be fixed as a reasonable compromise: too many symbols yield too many patterns that may never repeat, whereas too few symbols fail to capture the resolution of the consumption. The number of windows W per day must also be chosen carefully.

A large number of patterns can be extracted from the data, but not all of them need to be used in the analysis. The frequency and availability of a pattern play an important role in detecting the regular behavior recorded by a smart meter. For example, the most common and second most common motifs are important to the analysis. A threshold can be set for selecting characteristic patterns: a characteristic pattern is one that satisfies given criteria, such as its number of occurrences in a particular time period exceeding a threshold. The threshold may be expressed as the ratio of the frequency of each motif to the total count of all motifs.

Furthermore, a power specialist with the relevant expertise and experience will usually examine all motifs to decide which are interesting and which are not. A long period without activity, apart from a single change in the meter reading, produces a series of similar patterns. For example, when a letter size of 5 is used, a long period with no activity or event other than one small drop in energy consumption results in patterns such as cccca, ccccac, cccacc, etc. Only one of these is interesting, so the other motifs starting with two or more c are excluded from further analysis. A pattern is also considered meaningless if it represents only an increase in energy consumption: a pattern should reflect a complete behavior, and one that shows only an increase represents just the beginning of an event or activity. A useful motif includes both the start of an activity (turning a device on) and its completion (turning the device off). Finally, the time at which a pattern occurs and its availability within a day, a week and a month are important for distinguishing energy consumption behaviors.
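The frequency-based selection described above can be sketched as follows. This is a minimal illustration: the function name `select_feature_motifs`, the parameter `min_share` and the sample words are hypothetical, and only the frequency criterion is shown, not the availability criterion or the expert review.

```python
from collections import Counter


def select_feature_motifs(daily_words, min_share):
    """Keep motifs whose share of all motif occurrences is at least `min_share`.

    Returns a dict mapping each selected motif to its share.
    """
    counts = Counter(daily_words)
    total = sum(counts.values())
    return {
        motif: count / total
        for motif, count in counts.items()
        if count / total >= min_share
    }


# Hypothetical SAX words, one per day.
observed = ["aabc"] * 5 + ["abcd"] * 3 + ["ddda"] + ["ccab"]
feature_motifs = select_feature_motifs(observed, min_share=0.25)
```

Here the two most frequent words pass the threshold and would be added to the motif library, while the two words seen only once are treated as uncommon patterns.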

In this way, characteristic motifs are extracted and added to the motif library for further knowledge discovery. These patterns have the potential to characterize the energy consumption behavior of each consumer.

Step 4, mining temporal association rules from the characteristic motifs newly added to the motif library, analyzing the relationship among the influencing factors that cause energy consumption to change within a specific time period, and proceeding to the next step if the absolute value of the correlation coefficient among those influencing factors is greater than a set value of 0.8; otherwise, returning to step 3 to re-extract characteristic patterns and add their characteristic motifs to the motif library;
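The 0.8 gate of step 4 can be illustrated with a short sketch. The text does not say which correlation coefficient is meant; the Pearson coefficient is assumed here, and the function names and sample factor values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant factor is treated as uncorrelated in this sketch
    return cov / (norm_x * norm_y)


def passes_gate(xs, ys, threshold=0.8):
    """True when |r| exceeds the set value, i.e. proceed to step 5."""
    return abs(pearson(xs, ys)) > threshold


# Hypothetical influencing factors sampled over the same time slots.
temperature = [1, 2, 3, 4]
heating_load = [8.5, 6, 4, 2]
proceed = passes_gate(temperature, heating_load)
```

Because the absolute value is used, a strong negative relationship such as the one in the example passes the gate just as a strong positive one would.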

the specific method of the step 4 comprises the following steps:

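$$\mathrm{Support}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{N}, \qquad \mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$$

wherein X is the antecedent, Y is the consequent, N is the total number of records or transactions, and count(·) is the number of records containing the given items; the confidence of the rule X → Y is the number of records supporting both the antecedent and the consequent as a fraction of the number of records supporting the antecedent alone.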

In this embodiment, conventional association rule mining finds cross-sectional associations without considering temporal information. Because power consumption has complex dynamics, temporal rule mining is the focus of our research. Temporal association rule mining searches for relationships between variables within a specific time period.

The proposed association rule mining method is based on a motif library containing characteristic motif information for specific time periods. Frequent motifs must be extracted from the library such that their support count is greater than or equal to a user-provided minimum support threshold. For example, if X is the antecedent and Y is the consequent, the association rule X → Y indicates that if X occurs, Y will also occur. The support of the rule is the ratio of the number of records in which both the antecedent and the consequent occur to the total number of records. The support of an association rule indicates its statistical importance: rules with lower support describe relationships that are uncommon, and rules with higher support describe relationships that are common in the records.

Here X is the antecedent, Y is the consequent, and N is the total number of records or transactions. The confidence of the rule X → Y is the number of records that contain both the antecedent and the consequent as a fraction of the number of records that contain the antecedent. Confidence indicates statistical strength: a higher confidence indicates a stronger association between the antecedent and the consequent, while a lower confidence indicates a weaker one.
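As a small worked illustration of these two measures, the sketch below counts support and confidence over a list of records. The record format (one set of motif identifiers per record) and the names `support_and_confidence`, `m1`, `m2`, `m3` are hypothetical.

```python
def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = records containing X and Y / all records
    confidence = records containing X and Y / records containing X
    """
    x, y = set(antecedent), set(consequent)
    n = len(records)
    n_x = sum(1 for record in records if x <= set(record))
    n_xy = sum(1 for record in records if (x | y) <= set(record))
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical records: the motifs observed in each time period.
records = [{"m1", "m2"}, {"m1", "m2"}, {"m1"}, {"m2"}, {"m3"}]
rule_support, rule_confidence = support_and_confidence(records, {"m1"}, {"m2"})
```

In this example the rule m1 → m2 holds in 2 of 5 records (support 0.4) and in 2 of the 3 records that contain m1 (confidence about 0.67).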

In power dynamics, association rules that hold within a certain time period are of particular interest to us. Such a rule has the form

$$X \xrightarrow{\;T_i\;} Y$$

which indicates that Y will occur in time slot T_i after X occurs. The relationship between appliances can be interpreted from the shape of the electricity usage curve.

Step 5, clustering the characteristic motifs in the motif library with three clustering methods, namely K-means clustering, a hierarchical method and a density clustering algorithm, and then performing clustering based on the characteristic patterns to generate daily consumption profiles;

in this embodiment, after motif discovery, the characteristic motifs are clustered to generate daily consumption profiles. A daily profile represents a household's power usage pattern. This step is important when, for example, there are 15 characteristic patterns and the power specialist needs to aggregate them further into 5 or 6 typical energy consumption cases. It also gives the user additional control beyond the choice of the parameters a and W during the SAX conversion. Various clustering approaches exist for different application targets; the time series clustering used here is a feature-based clustering method.

The specific steps of the step 5 comprise:

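(1) K-means clustering is used because of its simplicity, with the Euclidean distance as the metric. The daily blocks (N₁, N₂, ..., N_n) are divided into k sets S = (S₁, S₂, ..., S_k) so as to minimize the within-cluster sum of squares:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{N_j \in S_i} \left\lVert N_j - \mu_i \right\rVert^2$$

wherein μ_i is the mean of the values in S_i.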
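The K-means partition can be sketched with the standard Lloyd iteration as follows. This is a minimal illustration, not the embodiment's implementation: the initialization from the first k points is an assumption made to keep the sketch deterministic, the function names are hypothetical, and each short tuple stands in for one daily block.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic init (assumption)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = mean_point(members)
    return labels, centroids


# Hypothetical daily blocks, shortened to two values each.
daily_blocks = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = k_means(daily_blocks, k=2)
```

Each iteration can only lower the within-cluster sum of squares, which is why the loop stops as soon as the assignments no longer change.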

(2) Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)

Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is an effective, traditional hierarchical clustering algorithm. For a given motif library, the BIRCH algorithm can cluster efficiently with a single scan of the data and can handle outliers effectively.

The algorithm first uses a bottom-up hierarchical algorithm and then iterative relocation to improve the result. Hierarchical clustering uses a bottom-up strategy, where each object is first treated as an atomic cluster, and the clusters are then merged to form larger clusters, reducing the number of clusters until all objects are in one cluster or some termination condition is met.
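The bottom-up strategy can be illustrated with a plain agglomerative sketch. This is not BIRCH itself: it omits the clustering feature tree and the iterative relocation step, uses centroid linkage as an assumed merge criterion, and stops at a target number of clusters as its termination condition. All names are hypothetical.

```python
def centroid(cluster):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def agglomerative(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    # Every object starts as its own atomic cluster.
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = squared_distance(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical motif feature vectors.
motif_vectors = [(0, 0), (0, 1), (10, 10), (10, 11)]
merged = agglomerative(motif_vectors, n_clusters=2)
```

The pairwise search makes this sketch quadratic per merge, which is acceptable for a motif library of a few dozen patterns but is exactly the cost that single-scan methods such as BIRCH are designed to avoid.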

(3) Clustering by density clustering algorithm

The density clustering algorithm defines a cluster as the largest set of density-connected points; it can divide regions of sufficiently high density into clusters and find clusters of arbitrary shape in a noisy spatial database. The algorithm requires two parameters: the scanning radius (eps) and the minimum number of contained points (minPts). Starting from an arbitrary unvisited point, it finds all nearby points within a distance of eps (inclusive) from that point.

If the number of nearby points is ≥ minPts, the current point forms a cluster with its nearby points, and the starting point is marked as visited. All points in the cluster that are not yet marked as visited are then processed recursively in the same way, thereby expanding the cluster.

If the number of nearby points < minPts, the point is temporarily marked as a noise point.

If the cluster is sufficiently expanded, i.e., all points within the cluster are marked as visited, then the same algorithm is used to process the points that are not visited.
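The procedure described above matches the well-known DBSCAN scheme, and a compact sketch of it follows. The assumptions of this sketch are that a point counts itself among its nearby points, that noise is labelled −1, and that a queue replaces the recursion; the function names are hypothetical.

```python
import math

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within distance eps (inclusive) of points[idx]."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def density_cluster(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may later join a cluster as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:  # expand the cluster until no unvisited member remains
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)
    return labels


# Two hypothetical dense groups and one isolated point.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
sample_labels = density_cluster(samples, eps=1.5, min_pts=3)
```

Unlike K-means, the number of clusters is not specified in advance; it follows from eps and minPts, and isolated days are reported as noise instead of being forced into a profile.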

Step 6, statistically evaluating the three clustering methods using two metrics, the sum of squared errors and the silhouette coefficient, to measure how well the clustering creates groups of general daily consumption profiles.

The specific steps of the step 6 comprise:

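(1) the sum of squared errors (SSE) measures cluster compactness, and a small error indicates good cluster quality; it is calculated by the following formula:

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{N_j \in S_i} \left\lVert N_j - \mu_i \right\rVert^2$$

wherein (N₁, N₂, ..., N_n) is the daily data set, which is divided into k sets S = (S₁, S₂, ..., S_k), and μ_i is the mean of the values in S_i.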

The invention may be stored in a computer-readable storage medium. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the embodiments of the invention without departing from its spirit and scope, all of which are to be covered by the claims.

Claims

1. A method for generating association rules of behavior characteristics of an integrated energy consumer is characterized by comprising the following steps: the method comprises the following steps:

2. The method according to claim 1, wherein the method comprises the following steps: the specific method of the step 1 comprises the following steps:

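The mean μ and standard deviation σ are as follows:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$

in the above formulas, x_i is the i-th data value and N is the number of data to be processed.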

3. The method according to claim 1, wherein the method comprises the following steps: the specific steps of the step 2 comprise:

4. The method according to claim 3, wherein the method comprises the following steps: the step 2, the step (1), comprises the following specific steps:

5. The method according to claim 3, wherein the method comprises the following steps: the specific method of the step 2 and the step (2) is as follows:

6. The method according to claim 1, wherein the method comprises the following steps: the specific steps of the step 3 comprise:

7. The method according to claim 1, wherein the method comprises the following steps: the specific method of the step 4 comprises the following steps:

8. The method according to claim 1, wherein the method comprises the following steps: the specific steps of the step 5 comprise:

wherein μ_i is the mean of the values in S_i;

(2) clustering the characteristic motifs in the motif library using a hierarchical method:

9. The method according to claim 1, wherein the method comprises the following steps: the specific steps of the step 6 comprise:

wherein (N₁, N₂, ..., N_n) is the daily data set, which is divided into k sets S = (S₁, S₂, ..., S_k).

10. An association rule generating apparatus for integrating behavioral characteristics of energy consumers, comprising: