CN111177216A - Association rule generation method and device for behavior characteristics of comprehensive energy consumer - Google Patents

Info

Publication number
CN111177216A
Authority
CN
China
Prior art keywords
clustering
data
characteristic
steps
characteristic pattern
Legal status
Granted
Application number
CN201911333048.3A
Other languages
Chinese (zh)
Other versions
CN111177216B (en)
Inventor
董得龙
孙虹
卢静雅
杨光
孔祥玉
祝雨晨
李野
李刚
乔亚男
刘浩宇
翟术然
张兆杰
许迪
赵紫敬
吕伟嘉
顾强
何泽昊
季浩
白涛
Current Assignee
Tianjin University
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Original Assignee
Tianjin University
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Application filed by Tianjin University, State Grid Corp of China SGCC, State Grid Tianjin Electric Power Co Ltd, and Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority to CN201911333048.3A
Publication of CN111177216A
Application granted
Publication of CN111177216B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a method for generating association rules for the behavior characteristics of comprehensive energy consumers. The method comprises the following steps.

Step 1 normalizes the smart meter time series data. Step 2 converts the smart meter time series data into a symbolic representation. Step 3 extracts feature patterns from the symbolic representation and adds their feature motifs to a motif library. Step 4 mines temporal association rules from the feature motifs newly added to the motif library and analyzes the relationships among the factors that cause energy consumption to change within a specific time period. Step 5 clusters the feature motifs in the motif library with K-means clustering, a hierarchical method and a density-based clustering algorithm, respectively, to generate general daily consumption profiles. Step 6 measures how well the groups of daily consumption profiles created by the three clustering methods fit actual daily consumption.

The invention can accurately describe real energy consumption.

Description

Association rule generation method and device for behavior characteristics of comprehensive energy consumer
Technical Field
The invention belongs to the field of smart meter data mining and relates to user electricity consumption information. In particular, it relates to a method and a device for generating association rules for the behavior characteristics of comprehensive energy consumers.
Background
Smart grids are one of the promising technologies for meeting increasing energy demand and reducing global environmental pollution. They improve the efficiency, reliability, sustainability and economy of electric energy.

Over the last decade, smart meters have been deployed in most parts of the world. Together with database management systems, smart meters constitute the Advanced Metering Infrastructure (AMI). AMI plays an important role in energy systems by enabling two-way information flow and recording energy distribution. It has given rise to various novel smart home services, such as energy-saving recommendations and energy-awareness feedback for end users.

Smart meters have great potential for analyzing fine-grained energy consumption data and can be used for energy planning and management. Their deployment therefore benefits both energy consumers and utility professionals.
Time series data generated by smart meters have great potential for identifying regular and anomalous energy consumption patterns. Time series data mining techniques have been modeled and developed to identify the energy consumption behavior of consumers.

Smart meter data require advanced analysis for accurate, automated decision making in a real-time environment. Combined with dynamic pricing, such analysis can improve consumers' energy awareness by showing them how and when they use energy.

Energy data analysis has therefore become a major research area in power consumption analysis. The ability to identify daily activities from smart meter data is very useful to utility companies implementing demand-side management techniques.
In smart grids, renewable energy sources are becoming increasingly popular. However, the intermittency of renewable generation creates a mismatch between supply and demand. The resulting dynamic energy trading prices throughout the day make the time dimension more important.

The energy consumption pattern recorded by a smart meter fluctuates over the day. It depends on the time of day or month, the weather, and the schedule and behavior of the residents. Similarly, grid load varies over time with changes in demand, temperature and renewable generation, which are in turn affected by weather and seasonal time scales.
In recent years, various techniques have been developed to mine time series data. However, the temporal nature of time series energy consumption data has been studied only to a limited extent. This is a gap, because energy consumption is highly dynamic and load demand and pricing vary over time.

Therefore, to describe real energy consumption accurately, a method and a device for generating association rules for comprehensive energy consumer behavior characteristics are needed. They should have a short response time and handle data sampled frequently over a period of time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art. It provides a method and a device for generating association rules for comprehensive energy consumer behavior characteristics that are reasonable in design, have a short response time and support frequent sampling over a period of time.
The invention solves the above technical problem with the following technical scheme:
A method for generating association rules for comprehensive energy consumer behavior characteristics comprises the following steps:
step 1, normalizing the smart meter time series data;
step 2, applying cloud piecewise aggregate approximation to the normalized smart meter time series data, and then converting the data into a symbolic representation by symbolic aggregate approximation (SAX);
step 3, extracting feature patterns from the symbolic representation and adding the feature motifs of those patterns to a motif library, where the feature motifs satisfy user-defined frequency-count and availability thresholds;
step 4, mining temporal association rules from the feature motifs of the feature patterns newly added to the motif library, and analyzing the relationships among the factors that cause energy consumption to change within a specific time period; if the absolute value of the correlation coefficient between these factors is greater than a set value, executing the next step; otherwise, returning to step 3 to re-extract feature patterns and add their feature motifs to the motif library;
step 5, clustering the feature motifs in the motif library with each of three clustering methods (K-means clustering, a hierarchical method and a density-based clustering algorithm), and performing pattern-based clustering of the corresponding feature patterns to generate general daily consumption profiles;
step 6, statistically evaluating the daily consumption profiles generated by the three clustering methods with two metrics, the sum of squared errors (SSE) and the silhouette coefficient, to measure how well each group of daily consumption profiles fits actual daily consumption.
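The branching decision in step 4 can be sketched as follows. The sketch assumes that the two influence factors are given as equal-length numeric series and that the correlation coefficient is Pearson's r; the text does not name the coefficient. The names `pearson_r` and `should_proceed` are hypothetical. The default threshold of 0.8 is the set value used in the embodiment.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant factor is treated as uncorrelated
    return cov / (sx * sy)

def should_proceed(factor_a, factor_b, threshold=0.8):
    """True: continue to step 5; False: return to step 3 and re-extract patterns."""
    return abs(pearson_r(factor_a, factor_b)) > threshold

# Hypothetical hourly outdoor temperature vs. consumption attributed to one motif.
temperature = [18, 22, 27, 31, 29, 24]
consumption = [0.4, 0.6, 1.1, 1.5, 1.3, 0.8]
print(should_proceed(temperature, consumption))
```

Comparing the absolute value means that strong negative correlations also pass the gate, which matches the wording of step 4.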
Moreover, the specific method of step 1 is:
The data are normalized to zero mean and unit variance by z-normalization. The normalized value is
$$\hat{x}_i = \frac{x_i - \mu}{\sigma}$$
where the mean μ and standard deviation σ are
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$
In the above formulas, $x_i$ is the i-th data value and N is the number of data values to be processed.
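A minimal Python sketch of this z-normalization follows. It assumes the population standard deviation (divisor N). It also maps a perfectly flat series to all zeros instead of dividing by zero; the text does not state that convention. The function name `z_normalize` and the sample readings are hypothetical.

```python
import math

def z_normalize(series):
    """Return (x_i - mu) / sigma for each reading, using the population std (divisor N)."""
    n = len(series)
    mu = sum(series) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in series) / n)
    if sigma == 0:
        # Assumption: a flat series has no shape to preserve, so return zeros.
        return [0.0] * n
    return [(x - mu) / sigma for x in series]

readings = [0.2, 0.4, 1.8, 2.6, 0.5, 0.3]  # hypothetical hourly kWh values
print([round(v, 3) for v in z_normalize(readings)])
```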
Further, the specific steps of step 2 include:
(1) converting the time series data into a cloud piecewise aggregate approximation (cloud PAA) representation;
(2) converting the cloud PAA representation into a character string.
Moreover, sub-step (1) of step 2 comprises the following specific steps:
① representing each current segment of the sequence data with a cloud model;
② evaluating the data stability of each subsequence with the entropy of its cloud model, and selecting the subsequence with the worst stability (the smallest entropy), denoted $Q(i_0, j_0)$, for re-segmentation;
③ within $Q(i_0, j_0)$, finding a data point $q_k$, $i_0 < k < j_0$, as the key point. The key point is chosen so that the difference between the sum of the cloud-model entropies of the two subsequences it separates, $Q(i_0, k)$ and $Q(k, j_0)$, and the cloud-model entropy of $Q(i_0, j_0)$ is the largest. Then $Q(i_0, j_0)$ is deleted, and $Q(i_0, k)$ and $Q(k, j_0)$ are recorded;
④ repeating ① to ③ until the stop condition is met.
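The following Python sketch follows sub-steps ① to ④ literally, under several assumptions that the text does not fix:

- The cloud entropy En of a segment is estimated with a common form of the backward cloud generator without certainty degrees: En = sqrt(pi/2) * mean|x - Ex|.
- The stop condition is reaching a target number of segments `w`.
- Every segment keeps at least `min_len` points.

The selection rule in ② (smallest entropy) is applied exactly as written. The names `cloud_entropy` and `cloud_paa_segments` are hypothetical.

```python
import math

def cloud_entropy(segment):
    """Cloud entropy En via the backward cloud generator without certainty degrees:
    En = sqrt(pi / 2) * mean(|x - Ex|), where Ex is the segment mean."""
    ex = sum(segment) / len(segment)
    return math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in segment) / len(segment)

def cloud_paa_segments(series, w, min_len=2):
    """Split series into up to w segments, returned as half-open (start, end) index pairs."""
    segments = [(0, len(series))]  # ①: start with the whole series as one cloud-modelled segment
    while len(segments) < w:  # ④: assumed stop condition, w segments reached
        # Only segments that can be cut into two parts of at least min_len points qualify.
        candidates = [s for s in segments if s[1] - s[0] >= 2 * min_len]
        if not candidates:
            break
        # ②: select Q(i0, j0) with the smallest cloud entropy, as stated in the text.
        i0, j0 = min(candidates, key=lambda s: cloud_entropy(series[s[0]:s[1]]))
        parent = cloud_entropy(series[i0:j0])
        # ③: key point k maximising En(Q(i0, k)) + En(Q(k, j0)) - En(Q(i0, j0)).
        k = max(
            range(i0 + min_len, j0 - min_len + 1),
            key=lambda c: cloud_entropy(series[i0:c]) + cloud_entropy(series[c:j0]) - parent,
        )
        segments.remove((i0, j0))       # delete Q(i0, j0)
        segments += [(i0, k), (k, j0)]  # record Q(i0, k) and Q(k, j0)
        segments.sort()
    return segments

series = [0.0, 0.1, 0.0, 2.0, 2.1, 1.9, -1.0, -1.2]  # hypothetical z-normalized readings
for start, end in cloud_paa_segments(series, w=3):
    print(start, end, round(sum(series[start:end]) / (end - start), 3))
```

Each resulting segment can then be summarised by its mean (its expectation Ex) before symbolization.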
Moreover, the specific method of sub-step (2) of step 2 is as follows:
after the cloud PAA representation is obtained, the discretized representation is symbolized, which converts the time series into a string of symbols.
Further, the specific steps of step 3 include:
(1) generating SAX pattern types after the SAX transformation, and marking each pattern type generated by SAX as either a motif or an unusual pattern, where a motif is defined as a previously unknown, frequently occurring pattern;
(2) extracting the required feature patterns from the SAX-transformed data, and adding their feature motifs to the motif library.
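Moreover, the specific method of step 4 is as follows:
$$\mathrm{Support}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{N}$$
$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$$
where X is the antecedent (first occurrence), Y is the consequent (subsequent occurrence), N is the total number of records or transactions, and count(·) is the number of records containing the given items. The confidence of X → Y is the number of records supporting both the antecedent and the consequent of the rule, as a fraction of the number of records supporting the rule's antecedent alone.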
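A minimal sketch of the two measures follows. It assumes each record is a set of items, for example the motifs observed in one period. X and Y are passed as sets of items. The record layout and the function names are hypothetical.

```python
def support(records, x, y):
    """Support(X -> Y): share of the N records that contain every item of X and Y."""
    both = set(x) | set(y)
    return sum(both <= set(r) for r in records) / len(records)

def confidence(records, x, y):
    """Confidence(X -> Y): records containing X and Y divided by records containing X."""
    x_records = [set(r) for r in records if set(x) <= set(r)]
    if not x_records:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = set(x) | set(y)
    return sum(both <= r for r in x_records) / len(x_records)

# Hypothetical records: the set of motifs observed in each hour-long period.
records = [{"abca", "ccca"}, {"abca"}, {"abca", "ccca", "dcba"}, {"dcba"}]
print(support(records, {"abca"}, {"ccca"}), confidence(records, {"abca"}, {"ccca"}))
# prints 0.5 0.6666666666666666
```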
Further, the specific steps of step 5 include:
(1) clustering the feature motifs in the motif library with K-means clustering:
using the Euclidean distance metric, the daily blocks $(N_1, N_2, \ldots, N_n)$ are divided into k sets $S = (S_1, S_2, \ldots, S_k)$ so as to minimize the within-cluster sum of squares:
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$
where $\mu_i$ is the mean of the values in $S_i$;
(2) clustering the feature motifs in the motif library with a hierarchical method:
clustering is first performed with a bottom-up hierarchical algorithm, and the result is then refined by iterative relocation;
(3) clustering the feature motifs in the motif library with a density-based clustering algorithm:
setting the scan radius eps and the minimum number of contained points minPts;
selecting an unvisited point and finding all nearby points within the scan radius eps;
if the number of nearby points is greater than or equal to minPts, the current point and its nearby points form a cluster, and the starting point is marked as visited;
once the cluster has been fully expanded, i.e. all points in the cluster are marked as visited, processing the remaining unvisited points with the same algorithm.
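Sub-step (2) relies on bottom-up (agglomerative) merging, which can be sketched as follows. The sketch shows only that merging stage. Average linkage and stopping at a target number of clusters are both assumptions. The iterative relocation refinement is omitted, and so is the clustering-feature tree that BIRCH uses. For a full BIRCH implementation, scikit-learn provides `sklearn.cluster.Birch`. The function name `agglomerative` is hypothetical.

```python
import math
from itertools import combinations

def agglomerative(points, n_clusters):
    """Bottom-up average-linkage clustering; returns a list of index lists."""
    clusters = [[i] for i in range(len(points))]  # every object starts as an atomic cluster

    def avg_link(a, b):
        # Mean pairwise Euclidean distance between the members of two clusters.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:  # termination condition
        # Merge the closest pair of clusters (x < y, so deleting y keeps x valid).
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_link(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

motif_vectors = [[0, 0], [0, 1], [10, 0], [10, 1], [5, 20]]  # hypothetical motif features
print(agglomerative(motif_vectors, 3))
```

This naive version recomputes every pairwise linkage at each merge. That cost is what BIRCH's single-scan summary structure is designed to avoid on large motif libraries.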
Further, the specific steps of step 6 include:
(1) the sum of squared errors (SSE) measures the compactness of the clusters; the smaller the SSE, the better the cluster quality. It is calculated as:
$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$
where $(N_1, N_2, \ldots, N_n)$ is the daily dataset, which is divided into k sets $S = (S_1, S_2, \ldots, S_k)$, and $\mu_i$ is the mean of $S_i$;
(2) the silhouette coefficient measures intra-cluster cohesion and inter-cluster separation; a coefficient of 1 is best and -1 is worst. For each data instance it is calculated as:
$$s = \frac{b - a}{\max(a, b)}$$
where a is the average distance between the data instance and all other points in the same cluster, and b is the average distance between the data instance and all points in the nearest other cluster.
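Both evaluation metrics can be sketched for a labelled clustering as shown below. The sketch assumes Euclidean distance, at least two clusters for the silhouette, and the usual convention that a point alone in its cluster scores 0. The function names are hypothetical. scikit-learn's `sklearn.metrics.silhouette_score` offers an equivalent silhouette; plain Python is used here to stay dependency-free.

```python
import math

def sse(points, labels):
    """Sum of squared Euclidean distances of each point to its cluster mean."""
    total = 0.0
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        mu = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in members)
    return total

def _mean_dist(p, others):
    return sum(math.dist(p, q) for q in others) / len(others)

def silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points; needs at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        same = [q for j, (q, lj) in enumerate(zip(points, labels)) if lj == li and j != i]
        if not same:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = _mean_dist(p, same)  # cohesion within the point's own cluster
        b = min(_mean_dist(p, [q for q, lq in zip(points, labels) if lq == c])
                for c in clusters - {li})  # separation from the nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

profiles = [[0, 0], [0, 1], [10, 0], [10, 1]]  # hypothetical daily profile features
labels = [0, 0, 1, 1]
print(sse(profiles, labels), round(silhouette(profiles, labels), 3))
```

Running both functions on the labels produced by each of the three clustering methods gives the comparison that step 6 describes.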
An association rule generating device for comprehensive energy consumer behavior characteristics comprises:
a normalization module, configured to normalize the smart meter time series data;
a symbol conversion module, configured to apply cloud piecewise aggregate approximation to the normalized smart meter time series data, and then to convert the data into a symbolic representation by SAX;
a feature pattern extraction module, configured to extract feature patterns from the symbolic representation and to add their feature motifs, which satisfy user-defined frequency-count and availability thresholds, to the motif library;
a temporal association rule mining module, configured to mine temporal association rules from the feature motifs of the feature patterns newly added to the motif library, and to analyze the relationships among the factors that cause energy consumption to change within a specific time period; if the relationship among these factors is strong, the next module is executed; if it is weak, control returns to the feature pattern extraction module to re-extract feature patterns and add their feature motifs to the motif library;
a clustering module, configured to cluster the feature motifs in the motif library with each of three clustering methods (K-means clustering, a hierarchical method and a density-based clustering algorithm), and then to perform pattern-based clustering of the corresponding feature patterns to generate general daily consumption profiles;
and a clustering result evaluation module, configured to statistically evaluate the daily consumption profiles generated by the three clustering methods with two metrics, the sum of squared errors and the silhouette coefficient, to measure how well each group of daily consumption profiles fits actual daily consumption.
The advantages and positive effects of the invention are as follows:
1. The invention provides a method and a device for generating association rules for the behavior characteristics of comprehensive energy consumers. First, the smart meter data are symbolized so that various data mining techniques can be applied. Second, motif identification is used to find energy relationships that help identify consumer behavior patterns, and feature patterns satisfying user-defined frequency-count and availability thresholds are extracted. Third, temporal association rules are mined to analyze the relationships among factors that can cause energy consumption to increase or decrease within a certain period. Finally, pattern-based clustering is performed to create daily energy consumption profiles. The invention can thus accurately describe real energy consumption.
2. The invention converts energy consumption values into a symbolic representation, which greatly reduces the dimensionality of the time series. This symbolic representation can be used both for local processing on embedded sensing systems for home automation and by utility experts to process smart meter data efficiently.
3. A motif, as used in the invention, is a previously unknown, frequently occurring pattern in a time series. Motifs reveal similarities that correspond to the same activity in real life. Together with their time labels, they can identify household behavior at specific times. Motifs therefore play an important role in understanding household behavior and determining household patterns. The temporal information extracted by the method allows real energy consumption to be described accurately.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a process flow diagram of the present invention;
FIG. 3 is a flow diagram of the cloud segment aggregated approximate representation of the present invention;
FIG. 4 is a process flow diagram of the density clustering algorithm of the present invention.
Detailed Description
The embodiments of the invention are described in further detail below with reference to the accompanying drawings.
As shown in FIG. 1, FIG. 2, FIG. 3 and FIG. 4, the method for generating association rules for comprehensive energy consumer behavior characteristics comprises the following steps:
step 1, normalizing the smart meter time series data;
the specific method of step 1 is as follows:
the data are normalized to zero mean and unit variance by z-normalization, where the normalized value $\hat{x}_i$, mean μ and standard deviation σ are:
$$\hat{x}_i = \frac{x_i - \mu}{\sigma}, \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$
step 2, converting the preprocessed smart meter time series data into a symbolic representation by symbolic aggregate approximation;
in step 2, the preprocessed time series data are converted into a symbolic representation by Symbolic Aggregate approXimation (SAX). SAX effectively reduces dimensionality and prepares the data in a format suited to a range of data mining techniques.
The specific steps of step 2 include:
(1) converting the time series data into a cloud piecewise aggregate approximation representation;
the z-normalized time series data are discretized by cloud piecewise aggregate approximation, C(PAA). In this representation, the data stability of each subsequence is evaluated with the entropy of its cloud model. Subsequences that do not meet the requirement are re-segmented by cloud PAA, so that the series is reduced to a w-dimensional space.
Sub-step (1) of step 2 comprises the following specific steps:
① representing each current segment of the sequence data with a cloud model;
② evaluating the data stability of each subsequence with the entropy of its cloud model, and selecting the subsequence with the worst stability (the smallest entropy), denoted $Q(i_0, j_0)$, for re-segmentation;
③ within $Q(i_0, j_0)$, finding a data point $q_k$, $i_0 < k < j_0$, as the key point. The key point is chosen so that the difference between the sum of the cloud-model entropies of the two subsequences it separates, $Q(i_0, k)$ and $Q(k, j_0)$, and the cloud-model entropy of $Q(i_0, j_0)$ is the largest. Then $Q(i_0, j_0)$ is deleted, and $Q(i_0, k)$ and $Q(k, j_0)$ are recorded;
④ repeating ① to ③ until the stop condition is met.
(2) converting the cloud PAA representation into a character string;
the specific method of sub-step (2) of step 2 is as follows:
after the cloud PAA representation is obtained, the discretized representation is symbolized, which converts the time series into a string of symbols.
In this embodiment, the original sequence is represented by English letters. Typically, "a" represents low power consumption, "b" average consumption, "c" above-average consumption and "d" high consumption. After SAX conversion, descriptive knowledge discovery techniques for time series data, such as motif discovery and association rule mining, can be applied.
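The symbolization step can be sketched in Python. This sketch swaps in ordinary equal-width PAA for the cloud PAA described above. It maps each frame mean to a letter with the equiprobable standard-normal breakpoints customary in SAX. With alphabet size 4, the letters a to d roughly match the low, average, above-average and high levels of this embodiment. The function names are hypothetical, and the input is assumed to be z-normalized with w no larger than the series length.

```python
from bisect import bisect_right
from statistics import NormalDist

def paa(series, w):
    """Equal-width piecewise aggregate approximation: the mean of each of w frames."""
    n = len(series)
    bounds = [i * n // w for i in range(w + 1)]  # integer frame boundaries; assumes w <= n
    return [sum(series[bounds[i]:bounds[i + 1]]) / (bounds[i + 1] - bounds[i]) for i in range(w)]

def sax_word(normalized_series, w, alphabet_size=4):
    """Map a z-normalized series to a SAX word of length w."""
    # Breakpoints split N(0, 1) into alphabet_size regions of equal probability.
    breakpoints = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    return "".join(letters[bisect_right(breakpoints, m)] for m in paa(normalized_series, w))

print(sax_word([-2.0, -2.0, -0.3, -0.3, 0.3, 0.3, 2.0, 2.0], w=4))  # prints "abcd"
```

The parameters `w` and `alphabet_size` correspond to the window count W and alphabet size a whose trade-offs are discussed under step 3.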
Step 3, extracting feature patterns and adding their feature motifs, which satisfy user-defined frequency-count and availability thresholds, to the motif library.
The specific steps of step 3 include:
(1) generating SAX pattern types after the SAX transformation, and marking each pattern type generated by SAX as either a motif or an unusual pattern, where a motif is defined as a previously unknown, frequently occurring pattern;
(2) extracting the required feature patterns from the SAX-transformed data, and adding their feature motifs to the motif library;
in this embodiment, attention after the SAX conversion is focused on the pattern types that SAX generates.
Each pattern type generated by SAX is marked either as a motif, defined as a previously unknown, frequently occurring pattern, or as an unusual pattern.
The alphabet size a must be fixed as a reasonable compromise. Too many symbols yield too many patterns that may never repeat, whereas too few symbols cannot capture consumption at sufficient resolution. The number of windows W per day must also be chosen carefully.
A large number of patterns can be extracted from the data, but not all of them need to be used in the analysis. The frequency and availability of a pattern play an important role in detecting the regular behavior recorded by a smart meter. For example, the most common and the second most common motifs are important to the analysis.

A threshold can therefore be set for selecting feature patterns. A feature pattern is a pattern that satisfies given criteria, for example that its number of occurrences in a particular time period exceeds a threshold. The threshold may be set as the ratio of the frequency of each motif to the total number of all motifs.
In addition, a power specialist with sufficient expertise and experience will typically review all motifs to separate the interesting ones from the uninteresting ones.

A long period without activity, apart from a single change in the meter reading, produces a series of similar patterns. For example, with an alphabet size of 5, a long period with no activity or event other than a small drop in energy consumption yields patterns such as cccca, ccccac and cccacc. Only one of these is interesting, so the other motifs starting with two or more c symbols are excluded from further analysis.

A pattern that represents only an increase in energy consumption is considered meaningless. A discovered pattern should reflect a complete behavior, whereas a pattern showing only an increase in energy use represents only the beginning of an event or activity. A useful motif would include both starting an activity (turning a device on) and completing it (turning the device off).

Finally, the timing of a pattern and its availability within a day, a week and a month are important for distinguishing different energy consumption behaviors.
In this way, feature motifs are extracted and added to the motif library for further knowledge discovery. These patterns can characterize the energy consumption behavior of each consumer.
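The selection rule of step 3 can be sketched as a frequency filter over SAX words, under these assumptions:

- Each day is encoded as a list of SAX words.
- A motif's share is its count divided by the total number of word occurrences, following the ratio described above.
- Availability is read as the fraction of days on which the motif appears.
- An optional predicate stands in for the specialist's exclusions.

The example predicate drops patterns that only increase, one rule named above. All names here are hypothetical.

```python
from collections import Counter

def extract_feature_motifs(daily_words, min_share=0.1, min_day_fraction=0.5, exclude=None):
    """Return {motif: share} for SAX words that meet both thresholds.

    daily_words: list of days, each a list of SAX words (hypothetical layout).
    min_share: motif frequency divided by the total number of motif occurrences.
    min_day_fraction: assumed reading of 'availability' (share of days containing the motif).
    exclude: optional predicate marking motifs an expert considers uninteresting.
    """
    counts = Counter(word for day in daily_words for word in day)
    total = sum(counts.values())
    day_counts = Counter(word for day in daily_words for word in set(day))
    library = {}
    for word, count in counts.most_common():  # most frequent motifs first
        if exclude is not None and exclude(word):
            continue
        share = count / total
        if share >= min_share and day_counts[word] / len(daily_words) >= min_day_fraction:
            library[word] = share
    return library

def is_only_increase(word):
    """True for non-decreasing words with at least one rise, e.g. 'abcd'."""
    return list(word) == sorted(word) and len(set(word)) > 1

days = [["abca", "ccca", "abca"], ["abca", "dcba"], ["abca", "abcd"]]
print(extract_feature_motifs(days, min_share=0.2, exclude=is_only_increase))
```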
Step 4, mining temporal association rules from the feature motifs of the feature patterns newly added to the motif library, and analyzing the relationships among the factors that cause energy consumption to change within a specific time period; if the absolute value of the correlation coefficient between these factors is greater than the set value of 0.8, executing the next step; otherwise, returning to step 3 to re-extract feature patterns and add their feature motifs to the motif library;
the specific method of step 4 is as follows:
$$\mathrm{Support}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{N}$$
$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$$
where X is the antecedent (first occurrence), Y is the consequent (subsequent occurrence), N is the total number of records or transactions, and count(·) is the number of records containing the given items. The confidence of X → Y is the number of records supporting both the antecedent and the consequent of the rule, as a fraction of the number of records supporting the rule's antecedent alone.
Conventional association rule mining finds cross-sectional associations without considering temporal information. Because power consumption has complex dynamics, this embodiment focuses on time-dependent rule mining, which searches for relationships between variables within a specific time period.

The proposed association rule mining method is based on a motif library containing feature motif information for specific time periods. Frequent motifs are extracted from the library only if their support count is greater than or equal to a user-provided minimum support threshold.

For example, if X is the antecedent and Y the consequent, the association rule X → Y indicates that if X occurs, Y also occurs. The support of the rule is the ratio of the number of records in which both the antecedent and the consequent occur to the total number of records. Support indicates the rule's statistical importance: rules with low support describe uncommon relationships, while rules with high support describe relationships that are common in the records.
$$\mathrm{Support}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{N}$$
$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$$
where X is the antecedent, Y is the consequent, N is the total number of records or transactions, and count(·) is the number of records containing the given items. The confidence of rule X → Y is the number of records supporting both the antecedent and the consequent, as a fraction of the number of records supporting the antecedent alone.

Confidence indicates statistical strength. A higher confidence indicates a stronger correlation between antecedent and consequent, and a lower confidence a weaker one.
In power dynamics, association rules that hold within a certain time period are of particular interest. Such a rule has the form
$$X \xrightarrow{T_i} Y$$
which indicates that Y occurs in time slot $T_i$ after X occurs. The relationships between appliances can then be interpreted from the shape of the electricity consumption curve.
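Support and confidence for a temporal rule of this kind could be counted as in the following sketch. The sketch makes three assumptions:

- Each day is a list of (slot, motif) events, and each day counts as one record, so N is the number of days.
- "Y occurs in slot T_i after X" is read as Y occurring 1 to `max_lag` slots after an occurrence of X.
- `max_lag` plays the role of T_i.

The data layout and the function name are hypothetical.

```python
def temporal_rule_stats(days, x, y, max_lag):
    """Support and confidence of 'x is followed by y within max_lag slots'.

    days: list of days; each day is a list of (slot_index, motif) events.
    A day supports the antecedent if x occurs; it supports the rule if some
    occurrence of y falls 1..max_lag slots after an occurrence of x.
    """
    with_x = with_rule = 0
    for events in days:
        x_slots = [t for t, m in events if m == x]
        y_slots = [t for t, m in events if m == y]
        if not x_slots:
            continue
        with_x += 1
        if any(0 < ty - tx <= max_lag for tx in x_slots for ty in y_slots):
            with_rule += 1
    support = with_rule / len(days)
    confidence = with_rule / with_x if with_x else 0.0
    return support, confidence

days = [
    [(7, "abca"), (8, "ccca")],    # y follows x one slot later
    [(7, "abca"), (12, "ccca")],   # y too late for max_lag=2
    [(9, "dcba")],                 # antecedent absent
    [(18, "abca"), (19, "ccca")],
]
print(temporal_rule_stats(days, "abca", "ccca", max_lag=2))
```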
Step 5, clustering the feature motifs in the motif library with each of three clustering methods (K-means clustering, a hierarchical method and a density-based clustering algorithm), and, after the clustering analysis, performing pattern-based clustering to generate daily consumption profiles;
in this embodiment, the feature motifs are clustered after motif discovery to generate daily consumption profiles, each of which represents a household power usage pattern. This step is important when, for example, there are 15 feature patterns and the power specialist needs to aggregate them further into 5 or 6 typical energy consumption cases. It gives the user additional control over how consumption is described, beyond the choice of the parameters a and W during SAX conversion. Various clustering approaches exist for different application goals; the time series clustering used here is a feature-based model clustering method.
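The K-means objective used in step 5 can be illustrated with a compact sketch of Lloyd's algorithm, a standard heuristic for minimizing the within-cluster sum of squares. The sketch assumes each daily block is an equal-length numeric vector. It initializes the centroids with the first k blocks, which is a simplification; practical code would use k-means++ seeding or several random restarts. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(blocks, k, iterations=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(b) for b in blocks[:k]]  # simplified initialisation: first k blocks
    labels = None
    for _ in range(iterations):
        # Assignment: each daily block joins the set S_i with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(b, centroids[c])) for b in blocks]
        if new_labels == labels:
            break  # assignments are stable: a local minimum of the sum of squares
        labels = new_labels
        # Update: mu_i becomes the mean of the blocks in S_i (an empty set keeps its centroid).
        for c in range(k):
            members = [b for b, l in zip(blocks, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical daily blocks: mean consumption in four windows per day.
daily_blocks = [[0.2, 0.3, 1.5, 0.4], [0.9, 1.1, 0.2, 0.1], [0.3, 0.2, 1.4, 0.5], [1.0, 1.2, 0.3, 0.2]]
print(kmeans(daily_blocks, k=2))
```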
Step 5 comprises the following sub-steps:
(1) Owing to its simplicity, K-means clustering is applied with the Euclidean distance metric. The daily blocks (N_1, N_2, ..., N_n) are partitioned into k sets S = (S_1, S_2, ..., S_k) so as to minimize the within-cluster sum of squares:
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{N \in S_i} \lVert N - \mu_i \rVert^2$$
where μ_i is the mean of the values in S_i.
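Sub-step (1) is an instance of Lloyd's K-means algorithm. The pure-Python sketch below alternates the assignment step (nearest centroid by Euclidean distance) and the update step (each centroid becomes μ_i, the mean of its set S_i). The random initialization, the `iters` cap, and the function names are assumptions for illustration, not details from the patent.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors (mu_i)."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(data: List[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]  # k distinct data points
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: each block goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in data]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: recompute each centroid as the mean of its set S_i.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = mean_vector(members)
    return centroids, labels
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2)` separates the two pairs of points, with centroids [0.0, 0.5] and [10.0, 10.5].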
(2) Clustering with a hierarchical method (BIRCH)
Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is an effective, well-established hierarchical clustering algorithm. On a given motif library, BIRCH can cluster efficiently with a single scan of the data and handles outliers effectively.
The algorithm first applies a bottom-up hierarchical algorithm and then uses iterative relocation to improve the result. The bottom-up strategy initially treats each object as an atomic cluster and then successively merges clusters into larger ones, reducing the number of clusters until all objects are in a single cluster or a termination condition is met.
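The bottom-up merging followed by iterative relocation can be sketched as follows. Note that this is a plain agglomerative (centroid-linkage) procedure with a relocation pass, not BIRCH itself, which additionally summarizes the data in a clustering-feature (CF) tree to achieve its single-scan efficiency. The centroid-linkage choice, the stop-at-k termination condition, and the function names are assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def agglomerate(data: List[Point], k: int) -> List[List[int]]:
    """Bottom-up: every point starts as an atomic cluster; the two clusters with
    the closest centroids are merged until only k clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        cents = [centroid([data[i] for i in c]) for c in clusters]
        _, a, b = min(
            (math.dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    return clusters


def relocate(data: List[Point], clusters: List[List[int]],
             max_passes: int = 10) -> List[int]:
    """Iterative relocation: reassign each point to the nearest cluster centroid
    and recompute the centroids until the labels stop changing."""
    labels = [0] * len(data)
    for j, members in enumerate(clusters):
        for i in members:
            labels[i] = j
    for _ in range(max_passes):
        groups = [[p for p, lab in zip(data, labels) if lab == j]
                  for j in range(len(clusters))]
        cents = [centroid(g) if g else None for g in groups]
        new = [min((j for j, c in enumerate(cents) if c is not None),
                   key=lambda j: math.dist(p, cents[j])) for p in data]
        if new == labels:
            break
        labels = new
    return labels
```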
(3) Clustering with a density-based clustering algorithm
A density-based clustering algorithm defines a cluster as a maximal set of density-connected points; it can partition regions of sufficiently high density into clusters and can find clusters of arbitrary shape in a spatial database containing noise. It requires two parameters: the scan radius (eps) and the minimum number of points (minPts). Starting from an arbitrary unvisited point, all nearby points within distance eps of it (inclusive) are found.
If the number of nearby points is ≥ minPts, the current point and its nearby points form a cluster, and the starting point is marked as visited. The cluster is then expanded recursively by processing, in the same way, every point in it that is not yet marked as visited.
If the number of nearby points is < minPts, the point is provisionally marked as a noise point.
Once the cluster has been fully expanded, i.e., all points in it are marked as visited, the same procedure is applied to the remaining unvisited points.
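The procedure in sub-step (3) matches the classic DBSCAN algorithm. A minimal pure-Python sketch follows; it assumes the neighbourhood count includes the point itself (as in standard DBSCAN), labels noise as -1, and lets a provisional noise point be absorbed later as a border point. The names are illustrative, and the brute-force neighbourhood search is O(n²), which is fine for a sketch but not for large motif libraries.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(data: List[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (inclusive, i itself included)."""
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= eps]


def dbscan(data: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    labels: List[Optional[int]] = [None] * len(data)  # None = unvisited
    cluster = 0
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        neighbors = region_query(data, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may become a border point later
            continue
        labels[i] = cluster  # core point starts a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned (border points are not expanded)
            labels[j] = cluster
            nbrs = region_query(data, j, eps)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep expanding the cluster
        cluster += 1
    return labels
```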
Step 6: the three clustering methods are each evaluated statistically with two metrics, the sum of squared errors (SSE) and the silhouette coefficient, to measure how well the clustering (k-means in particular) creates groups of general daily consumption profiles.
Step 6 comprises the following sub-steps:
(1) The sum of squared errors (SSE) is a measure of cluster compactness; a smaller error indicates better cluster quality. It is calculated as:
$$SSE = \sum_{i=1}^{k} \sum_{N \in S_i} \lVert N - \mu_i \rVert^2$$
where (N_1, N_2, ..., N_n) is the daily dataset, which is divided into k sets S = (S_1, S_2, ..., S_k), and μ_i is the mean of the values in S_i.
(2) The silhouette coefficient measures both intra-cluster cohesion and inter-cluster separation; it ranges from -1 (worst) to 1 (best). For a data instance it is calculated as:
$$s = \frac{b - a}{\max(a, b)}$$
where a is the average distance between the data instance and all other points in the same cluster, and b is the average distance between the data instance and all points in the nearest neighboring cluster.
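Both evaluation metrics can be computed directly from their definitions. The sketch below assumes Euclidean distance, at least two clusters (otherwise b is undefined and `min` raises `ValueError`), and the common convention that a point in a singleton cluster gets a silhouette of 0; the reported silhouette is the mean over all points. Function names are illustrative.

```python
import math
from typing import Hashable, List, Sequence


def sse(data: List[Sequence[float]], labels: List[Hashable]) -> float:
    """Sum over clusters of squared Euclidean distances to the cluster mean mu_i."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(data, labels) if lab == c]
        mu = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, mu) ** 2 for p in members)
    return total


def silhouette(data: List[Sequence[float]], labels: List[Hashable]) -> float:
    """Mean of s = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(data):
        own = [q for j, (q, lab) in enumerate(zip(data, labels))
               if lab == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette taken as 0 by convention
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(data, labels) if lab == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Running both functions on the labels produced by each of the three clustering methods gives the side-by-side comparison that step 6 calls for.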
The invention may be stored in a computer-readable storage medium. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the embodiments of the invention without departing from the spirit and scope of the invention, all of which are intended to be covered by the claims.

Claims (10)

1. A method for generating association rules for the behavioral characteristics of integrated energy consumers, characterized by comprising the following steps:
step 1: normalizing the smart meter time series data;
step 2: applying cloud-model piecewise aggregate approximation to the normalized smart meter time series data using symbolic aggregate approximation (SAX), and then converting the data into a symbolic representation;
step 3: extracting characteristic patterns from the symbolic representation and adding the characteristic motifs of those patterns to a motif library, wherein the characteristic motifs satisfy user-defined frequency-count and usability thresholds;
step 4: mining temporal association rules for the feature motifs newly added to the motif library and analyzing the relationships among the factors that cause energy consumption to change within a specific time period; if the absolute value of the correlation coefficient between these factors is greater than a set value, proceeding to the next step; otherwise, returning to step 3 to re-extract characteristic patterns and add their characteristic motifs to the motif library;
step 5: clustering the feature motifs in the motif library with three clustering methods, namely K-means clustering, a hierarchical method, and a density-based clustering algorithm, and, after analyzing the clustered data, using the clustering results based on the feature patterns corresponding to the feature motifs to generate general daily consumption profiles;
step 6: statistically evaluating the daily consumption profiles generated by each of the three clustering methods with two metrics, the sum of squared errors and the silhouette coefficient, to measure how well the groups of daily consumption profiles created by the three methods fit actual daily consumption.
2. The method according to claim 1, wherein step 1 specifically comprises:
normalizing the data by z-normalization to zero mean and unit variance, the normalized value being
$$x_i' = \frac{x_i - \mu}{\sigma}$$
where the mean μ and standard deviation σ are:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
in the above formulas, x_i is the i-th data value and N is the number of data values to be processed.
3. The method according to claim 1, wherein step 2 specifically comprises:
(1) converting the time series data into a cloud-model piecewise aggregate approximation representation;
(2) converting the cloud-model piecewise aggregate approximation representation into a character string.
4. The method according to claim 3, wherein sub-step (1) of step 2 specifically comprises:
① representing each current segmented subsequence with a cloud model;
② evaluating the data stability of each subsequence by the entropy of its cloud model, and selecting the subsequence with the worst stability (the smallest entropy), denoted Q(i_0, j_0), for segmentation;
③ in the selected subsequence Q(i_0, j_0), finding a data point q_k (i_0 < k < j_0) as the key point, such that the difference between the sum of the cloud-model entropies of the two subsequences Q(i_0, k) and Q(k, j_0) separated by q_k and the cloud-model entropy of Q(i_0, j_0) is the largest; then deleting Q(i_0, j_0) and recording Q(i_0, k) and Q(k, j_0);
④ repeating steps ① to ③ until the stop condition is met.
5. The method according to claim 3, wherein sub-step (2) of step 2 specifically comprises:
after the cloud-model piecewise aggregate approximation, symbolizing the discretized representation so that the time series is converted into a string of symbols.
6. The method according to claim 1, wherein step 3 specifically comprises:
(1) generating SAX pattern types after the symbolic aggregate approximation transformation, and marking each pattern type generated by SAX as either a motif or an unusual pattern, a motif being defined as a previously unknown, frequently occurring pattern;
(2) extracting the required characteristic patterns from the SAX-transformed data and adding the characteristic motifs of those patterns to the motif library.
7. The method according to claim 1, wherein step 4 specifically comprises:
$$\mathrm{Support}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{N}$$
$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$$
where X is the antecedent, Y is the consequent, N is the total number of records or transactions, and count(·) is the number of records containing the given items; the confidence of X → Y is the number of records containing both the antecedent and the consequent as a fraction of the number of records containing the antecedent.
8. The method according to claim 1, wherein step 5 specifically comprises:
(1) clustering the characteristic motifs in the motif library with the K-means clustering method:
using the Euclidean distance metric, dividing the daily blocks (N_1, N_2, ..., N_n) into k sets S = (S_1, S_2, ..., S_k) so as to minimize the sum of squares:
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{N \in S_i} \lVert N - \mu_i \rVert^2$$
where μ_i is the mean of the values in S_i;
(2) clustering the feature motifs in the motif library with a hierarchical method:
clustering with a bottom-up hierarchical algorithm, and refining the resulting clusters by iterative relocation;
(3) clustering the feature motifs in the motif library with a density-based clustering algorithm:
setting the scan radius eps and the minimum number of points minPts;
selecting an unvisited point and finding all nearby points within the scan radius eps of it;
if the number of nearby points is greater than or equal to minPts, forming a cluster from the current point and its nearby points, and marking the starting point as visited;
once the cluster has been fully expanded, i.e., all points in the cluster are marked as visited, processing the unvisited points with the same algorithm.
9. The method according to claim 1, wherein step 6 specifically comprises:
(1) the sum of squared errors (SSE) is a measure of cluster compactness, and the smaller the SSE, the better the cluster quality; it is calculated as:
$$SSE = \sum_{i=1}^{k} \sum_{N \in S_i} \lVert N - \mu_i \rVert^2$$
where (N_1, N_2, ..., N_n) is the daily dataset, which is divided into k sets S = (S_1, S_2, ..., S_k), and μ_i is the mean of the values in S_i;
(2) the silhouette coefficient measures both intra-cluster cohesion and inter-cluster separation, ranging from -1 (worst) to 1 (best); it is calculated as:
$$s = \frac{b - a}{\max(a, b)}$$
where a is the average distance between the data instance and all other points in the same cluster, and b is the average distance between the data instance and all points in the nearest neighboring cluster.
10. An association rule generating apparatus for the behavioral characteristics of integrated energy consumers, comprising:
a normalization processing module, configured to normalize the smart meter time series data;
a symbol conversion module, configured to apply cloud-model piecewise aggregate approximation to the normalized smart meter time series data using symbolic aggregate approximation (SAX), and then convert the data into a symbolic representation;
a characteristic pattern extraction module, configured to extract characteristic patterns from the symbolic representation and add the characteristic motifs of those patterns, which satisfy user-defined frequency-count and usability thresholds, to the motif library;
a temporal association rule mining module, configured to mine temporal association rules for the feature motifs newly added to the motif library and analyze the relationships among the factors that cause energy consumption to change within a specific time period; if these relationships are strong, the next module is executed; if they are weak, the process returns to the characteristic pattern extraction module to re-extract characteristic patterns and add their characteristic motifs to the motif library;
a clustering module, configured to cluster the feature motifs in the motif library with three clustering methods, namely K-means clustering, a hierarchical method, and a density-based clustering algorithm, and, after analyzing the clustered data, use the clustering results based on the feature patterns corresponding to the feature motifs to generate general daily consumption profiles; and
a clustering result evaluation module, configured to statistically evaluate the daily consumption profiles generated by each of the three clustering methods with two metrics, the sum of squared errors and the silhouette coefficient, to measure how well the groups of daily consumption profiles created by the three methods fit actual daily consumption.
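The preprocessing of claims 2 to 6 can be sketched end to end as follows. This is a hedged illustration under several assumptions, not the claimed implementation:

- σ is the population standard deviation.
- The cloud-model entropy En is estimated with the standard backward cloud generator, En = sqrt(π/2)·mean|x − Ex|.
- The unspecified stop condition of claim 4 is taken to be a target segment count.
- "The difference is the largest" is read as the largest absolute difference.
- Segment selection follows the claim's wording (smallest entropy).
- Symbols are assigned with equiprobable N(0, 1) breakpoints as in standard SAX, so the alphabet size plays the role of a and the segment count that of W.
- A motif is simply a SAX word whose count reaches a user threshold.

All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import Dict, List, Tuple


def z_normalize(x: List[float]) -> List[float]:
    """Claim 2: x_i' = (x_i - mu) / sigma (population sigma assumed)."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [0.0] * n if sigma == 0 else [(v - mu) / sigma for v in x]


def cloud_en(seg: List[float]) -> float:
    """Backward cloud generator entropy: En = sqrt(pi/2) * mean|x - Ex|."""
    ex = sum(seg) / len(seg)
    return math.sqrt(math.pi / 2) * sum(abs(v - ex) for v in seg) / len(seg)


def cloud_segment(x: List[float], n_segments: int,
                  min_len: int = 2) -> List[Tuple[int, int]]:
    """Claim 4 sketch: split half-open segments [i0, j0) at key points q_k.
    Stop condition (a target segment count) is an assumption."""
    segs = [(0, len(x))]
    while len(segs) < n_segments:
        splittable = [s for s in segs if s[1] - s[0] >= 2 * min_len]
        if not splittable:
            break
        # Per the claim's wording, pick the segment with the smallest entropy.
        i0, j0 = min(splittable, key=lambda s: cloud_en(x[s[0]:s[1]]))
        parent = cloud_en(x[i0:j0])
        # Key point: largest |En(left) + En(right) - En(parent)|.
        cut = max(range(i0 + min_len, j0 - min_len + 1),
                  key=lambda c: abs(cloud_en(x[i0:c]) + cloud_en(x[c:j0]) - parent))
        segs.remove((i0, j0))
        segs.extend([(i0, cut), (cut, j0)])
        segs.sort()
    return segs


def sax_word(x: List[float], segs: List[Tuple[int, int]],
             alphabet: str = "abcd") -> str:
    """Claim 5 sketch: map each segment mean (Ex) to a symbol via N(0,1) breakpoints."""
    a = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    word = []
    for i, j in segs:
        m = sum(x[i:j]) / (j - i)
        word.append(alphabet[sum(m > c for c in cuts)])
    return "".join(word)


def motifs(words: List[str], min_count: int) -> Dict[str, int]:
    """Claim 6 sketch: keep SAX words that occur at least min_count times."""
    return {w: c for w, c in Counter(words).items() if c >= min_count}
```

For instance, `z_normalize([0, 0, 0, 0, 5, 5, 5, 5])` gives four -1.0 values followed by four 1.0 values; splitting that series into two cloud segments cuts at index 4, and the resulting SAX word over the alphabet "abcd" is "ad".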
CN201911333048.3A 2019-12-23 2019-12-23 Association rule generation method and device for comprehensive energy consumer behavior characteristics Active CN111177216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911333048.3A CN111177216B (en) 2019-12-23 2019-12-23 Association rule generation method and device for comprehensive energy consumer behavior characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911333048.3A CN111177216B (en) 2019-12-23 2019-12-23 Association rule generation method and device for comprehensive energy consumer behavior characteristics

Publications (2)

Publication Number Publication Date
CN111177216A true CN111177216A (en) 2020-05-19
CN111177216B CN111177216B (en) 2024-01-05

Family

ID=70657443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911333048.3A Active CN111177216B (en) 2019-12-23 2019-12-23 Association rule generation method and device for comprehensive energy consumer behavior characteristics

Country Status (1)

Country Link
CN (1) CN111177216B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288143A (en) * 2020-10-14 2021-01-29 昆明电力交易中心有限责任公司 Regional energy consumption research method based on correlation analysis
CN112381137A (en) * 2020-11-10 2021-02-19 重庆大学 New energy power system reliability assessment method, device, equipment and storage medium
CN112766590A (en) * 2021-01-27 2021-05-07 华中科技大学 Method and system for extracting typical residential power consumption pattern
CN113157768A (en) * 2021-04-09 2021-07-23 天津大学 Heating ventilation air conditioner operation data association attribute mining method and system
CN117272398A (en) * 2023-11-23 2023-12-22 聊城金恒智慧城市运营有限公司 Data mining safety protection method and system based on artificial intelligence

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090032594A (en) * 2007-09-28 2009-04-01 한국전력공사 System for spatio-temporal load analysis of electric power system using meter reading data and method of load calculating thereof
US8560134B1 (en) * 2010-09-10 2013-10-15 Kwangduk Douglas Lee System and method for electric load recognition from centrally monitored power signal and its application to home energy management
CN103455563A (en) * 2013-08-15 2013-12-18 国家电网公司 Data mining method applicable to integrated monitoring system of intelligent substation
CN104215856A (en) * 2014-09-10 2014-12-17 国家电网公司 Method for dynamically checking electric energy value of large power grid
US20150161233A1 (en) * 2013-12-11 2015-06-11 The Board Of Trustees Of The Leland Stanford Junior University Customer energy consumption segmentation using time-series data
CN106228244A (en) * 2016-07-12 2016-12-14 深圳大学 Energy disaggregation method based on adaptive association rule mining
CN106384128A (en) * 2016-09-09 2017-02-08 西安交通大学 Method for mining time series data state correlation
CN106597862A (en) * 2016-12-13 2017-04-26 山东建筑大学 Building energy consumption control device and building energy consumption control method based on association rule mining
CN107734073A (en) * 2017-11-27 2018-02-23 罗娅 Intelligent acquisition system for building power consumption
US20180150547A1 (en) * 2016-11-30 2018-05-31 Business Objects Software Ltd. Time series analysis using a clustering based symbolic representation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
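Hao Ping (郝平) et al.: "A spatio-temporal mining algorithm for correlation analysis in enterprise energy consumption early warning" (一种企业能耗预警相关性分析的时空挖掘算法) *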

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288143A (en) * 2020-10-14 2021-01-29 昆明电力交易中心有限责任公司 Regional energy consumption research method based on correlation analysis
CN112288143B (en) * 2020-10-14 2024-02-20 昆明电力交易中心有限责任公司 Regional energy consumption research method based on association analysis
CN112381137A (en) * 2020-11-10 2021-02-19 重庆大学 New energy power system reliability assessment method, device, equipment and storage medium
CN112766590A (en) * 2021-01-27 2021-05-07 华中科技大学 Method and system for extracting typical residential power consumption pattern
CN113157768A (en) * 2021-04-09 2021-07-23 天津大学 Heating ventilation air conditioner operation data association attribute mining method and system
CN117272398A (en) * 2023-11-23 2023-12-22 聊城金恒智慧城市运营有限公司 Data mining safety protection method and system based on artificial intelligence
CN117272398B (en) * 2023-11-23 2024-01-26 聊城金恒智慧城市运营有限公司 Data mining safety protection method and system based on artificial intelligence

Also Published As

Publication number Publication date
CN111177216B (en) 2024-01-05

Similar Documents

Publication Publication Date Title
CN111177216B (en) Association rule generation method and device for comprehensive energy consumer behavior characteristics
Rajabi et al. A comparative study of clustering techniques for electrical load pattern segmentation
US11043808B2 (en) Method for identifying pattern of load cycle
Basu et al. Time series distance-based methods for non-intrusive load monitoring in residential buildings
CN106845717B (en) Energy efficiency evaluation method based on multi-model fusion strategy
CN110990461A (en) Big data analysis model algorithm model selection method and device, electronic equipment and medium
Lu et al. A weekly load data mining approach based on hidden Markov model
CN107833153A (en) A kind of network load missing data complementing method based on k means clusters
Wang et al. Short-term industrial load forecasting based on ensemble hidden Markov model
CN112819299A (en) Differential K-means load clustering method based on center optimization
Dash et al. An appliance load disaggregation scheme using automatic state detection enabled enhanced integer programming
Li et al. Profiling household appliance electricity usage with n-gram language modeling
CN115545280A (en) Low-voltage distribution network topology generation method and device
CN113094448B (en) Analysis method and analysis device for residence empty state and electronic equipment
CN113487448A (en) Power credit labeling method and system based on power big data
Jazizadeh et al. Unsupervised clustering of residential electricity consumption measurements for facilitated user-centric non-intrusive load monitoring
Jin et al. Power load curve clustering algorithm using fast dynamic time warping and affinity propagation
CN111090679A (en) Time sequence data representation learning method based on time sequence influence and graph embedding
CN115146744A (en) Electric energy meter load real-time identification method and system integrating time characteristics
CN114004408A (en) User power load prediction method based on data analysis
Davarzani et al. Study of missing meter data impact on domestic load profiles clustering and characterization
Butunoi et al. Shapelet based classification of customer consumption patterns
Gong et al. Visual Clustering Analysis of Electricity Data Based on t-SNE
RongQi et al. Research of Power User Load Classification Method Based on K-means and FSVM
Jiahong et al. Load Curve Clustering Based on Feature Engineering and Uniform Manifold Approximation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant