CN102982489A - Power customer online grouping method based on mass measurement data - Google Patents
- Publication number: CN102982489A (also listed as CN2012104847126A, CN201210484712A)
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an online grouping method for power customers based on mass metering data. The method comprises: step 1, extracting historical sample data of power customers; step 2, preprocessing the extracted sample data; step 3, performing an initial customer grouping on the historical sample data; step 4, extracting, in real time from the metering automation system, the information of online power customers and the real-time electricity consumption data that reflect their consumption characteristics, and preprocessing these data; step 5, taking the preprocessed online customer data and, on the basis of the customer groups already generated, using the generated cluster centre points to group the newly added online customer data online in real time. The method can perform dynamic grouping calculation for all power customers.
Description
Technical field
The present invention relates to a method for grouping power customers in the power industry, and specifically to an online grouping method for power customers based on mass metering data.
Background technology
In power marketing, the following problems often arise: how to provide good consumption guidance for peak shifting, fault outages and scheduled outages, and emergency service during outages? How to shift the load of peak-shifting users scientifically and flexibly? How to ensure that peak-shifting and scheduled-outage information reaches sensitive customers, high-value customers and high-credit customers in time? All of these problems call for an effective method of grouping electricity customers.
Traditional power customer grouping mainly uses static methods, usually based on customers' static attribute data or monthly energy data. The data volume is small, and once the calculation is complete the groups seldom change. In practice, however, customers' consumption behaviour changes over time, so static grouping cannot meet the need to provide personalized marketing services. Power supply units urgently need a new method that groups all power customers dynamically.
The main data used by the invention come from the metering automation system, which uses acquisition terminals and a communication network to obtain customers' real-time load data. The volume of metering data is large and difficult to process with traditional data mining algorithms; running an algorithm on a small sample and extrapolating to the full data is often misleading, and it is laborious and time-consuming to carry out. A cloud computing environment is chosen to give the power system flexibility and portability and to reduce cost.
Summary of the invention
The object of the invention is to provide an online grouping method for power customers based on mass metering data, which can perform dynamic grouping calculation for all power customers.
This object is achieved by the following technical solution: an online grouping method for power customers based on mass metering data, comprising the following steps:
Step 1: extract historical sample data of power customers;
Step 2: preprocess the extracted sample data;
Step 3: perform an initial customer grouping on the historical sample data;
Step 4: extract, in real time from the metering automation system, online power customer information and the real-time electricity consumption data that reflect the customers' consumption characteristics, and preprocess them;
Step 5: take the preprocessed online power customer data and, on the basis of the customer groups generated in step 3, use the generated cluster centre points to group the newly added online power customer data online in real time.
As an improvement, the invention further comprises step 6: performance analysis of the online real-time grouping obtained in step 5.
In the invention, the historical sample data of step 1 further comprise customer profile information, customer energy information and customer load information;
The data preprocessing of step 2 comprises missing-value handling, outlier handling, calculation of the grouping indices and data normalization;
S2.1: missing-value handling
Gaps are found in the raw metering data; to ensure the validity of the modelling data, these missing values must be filled in;
S2.2: outlier handling
Data outside the index threshold range are corrected using data from days of the same type combined with an interpolation algorithm;
S2.3: calculation of the grouping indices
Since load fluctuation largely characterises a customer's consumption behaviour, indices reflecting load variation over a given period are calculated from the energy and load indices:
Load factor = average load / peak load
Peak-to-total ratio = peak energy / total energy
Flat-to-total ratio = flat energy / total energy
Valley-to-total ratio = valley energy / total energy
Peak load = max(L_i), i = 1, 2, ..., 96, where L_i is the load value sampled every 15 minutes; peak, flat and valley energy are the energy consumed during the city's peak, flat and valley consumption periods respectively;
The peak period is when consumption is relatively concentrated; the valley period is the opposite. Peak period, 8 hours: 9:00~12:00 and 17:00~22:00; flat period, 7 hours: 8:00~9:00, 12:00~17:00 and 22:00~23:00; valley period, 9 hours: 23:00~8:00 the next day;
S2.4: data normalization
To eliminate differences of scale between the grouping indices, the data are normalized; the main methods available are min-max normalization, zero-mean normalization and decimal scaling;
Step 3 comprises the following sub-steps:
S3.1: sample data standardization
Data standardization means converting the preprocessed sample data into vector data; each vector D = (d1, d2, d3, d4) comprises a large customer's load factor, peak-to-total ratio, flat-to-total ratio and valley-to-total ratio;
where d1 is the load factor, d2 the peak-to-total ratio, d3 the flat-to-total ratio and d4 the valley-to-total ratio;
The vector data are stored in the distributed file system. During standardization, the MapReduce scheduler splits the sample data file into several data blocks according to its size, and Map tasks are started according to the number of blocks to perform the conversion in parallel;
S3.2: distributed storage
The distributed file system adopts a master/slave architecture; an HDFS cluster consists of one Namenode and a number of Datanodes; the Namenode is a central server that manages the file system namespace and client access to files; there is generally one Datanode per node in the cluster, managing the storage on the node it runs on; HDFS exposes the file system namespace, and users store data in it as files; internally, a file is split into one or more data blocks, which are stored on a set of Datanodes; the Namenode executes namespace operations such as opening, closing and renaming files or directories, and also determines the mapping of data blocks to specific Datanodes; the Datanodes serve read and write requests from file system clients and create, delete and replicate data blocks under the unified scheduling of the Namenode;
S3.3: cluster centre point initialization
Customers are grouped mainly with a clustering algorithm that supports parallel and distributed computation; the K-means algorithm is used as the example below;
The clustering algorithm first generates empty, numbered clusters, then randomly selects K objects from the full sample data set as the K-means centre points, each centre point representing one cluster;
S3.4: iterative calculation of the optimal cluster centre points
New cluster centre points are calculated iteratively until the distance between every sample and its centre point is minimal;
S3.5: output of the grouping data
The previous step yields, through repeated iteration, the cluster centre points and the cluster centre to which each sample belongs; these can be output directly;
In step 4, after the initial grouping of the historical sample data, customer information and real-time consumption data reflecting customers' consumption characteristics are extracted periodically from the metering automation system as the application requires, and the real-time consumption data are preprocessed by the method of step 2;
Step 5 comprises the following sub-steps:
On the basis of the customer groups generated in step 3, the K generated cluster centre points are used and the Canopy algorithm is applied to group the newly added data online in real time, as follows:
S5.1: from the existing group centre points, generate K Canopy clusters, the initial centre of each being an existing group centre point;
S5.2: specify suitable parameters T1 and T2, and place all new data into the Canopy clusters for cluster calculation;
The Canopy algorithm first requires two thresholds T1 and T2 with T1 > T2; the algorithm maintains a set of Canopies, initially empty; the first point read is placed in the set as a Canopy; each subsequent point is read and its distance to every Canopy in the set is calculated; if the distance is less than T1 the point is assigned to that Canopy, and if the distance is less than T2 the point cannot be placed in the set as a new Canopy;
S5.3: following the Canopy algorithm, calculate the distance D from each new sample to every centre point; when D < T1 the sample is placed in the corresponding cluster; when D < T2 the sample is removed from the new sample set; if D1 to DK are all greater than T1, the point itself becomes a new centre point, forming a new customer cluster; the calculation loops until the new sample set is empty.
Real-time clustering of customers' metering data is an important part of customer behaviour analysis, and many related models (classification and regression, time series prediction, association analysis, outlier discovery, etc.) can be built on it. The online clustering model for the mass data of the metering automation system, running in a cloud computing environment, takes the real-time metering data and the importance of the parameters as input, subdivides the metered customers into several classes, and reports each class's consumption characteristics, share and distribution. Based on these outputs, each class of customer can be treated differently.
The invention is built on cloud computing technology and uses distributed storage, distributed indexing and distributed parallel computing to organise, store, index and manage mass data effectively, and it provides query and analysis of the mass data through standardized application or service interfaces.
Description of drawings
Fig. 1 is the system flowchart of the online grouping method of the invention;
Fig. 2 is the flowchart of K-means clustering based on distributed computation in the online grouping method of the invention;
Fig. 3 shows the MapReduce parallel implementation of K-means clustering in the online grouping method of the invention;
Fig. 4 shows the online real-time Canopy grouping in the online grouping method of the invention.
Embodiment
An online grouping method for power customers based on mass metering data, as shown in Figs. 1 to 4, comprises the following steps:
Step 1: extract historical sample data of power customers;
The invention clusters large customers by their consumption characteristics. Data reflecting those characteristics must be extracted from the metering automation system and the marketing management system, so in addition to customer profile data, customer energy data and customer load data are extracted. Specifically:
Customer profile information: metering point number, consumption category, industry category, voltage level, etc.
Customer energy information: total energy consumption, peak energy consumption, flat energy consumption, valley energy consumption, etc.
Customer load information: current, voltage, power factor, active power, load readings every 15 minutes, etc.
Step 2: preprocess the extracted sample data;
Step 2 comprises the following sub-steps:
Data preprocessing mainly comprises missing-value handling, outlier handling, calculation of the grouping indices, etc.
S2.1: missing-value handling
Gaps are found in the raw metering data, particularly during extraction of real-time load data. To ensure the validity of the modelling data, these missing values must be filled in, mainly by using data from days of the same type combined with an interpolation algorithm.
S2.2: outlier handling
Data outside the index threshold range are corrected using data from days of the same type combined with an interpolation algorithm.
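A minimal Python sketch of S2.1 and S2.2 follows. The text does not specify the interpolation algorithm, so linear interpolation between the nearest valid neighbours is an assumption here, as is falling back to the reading of a same-type day when no neighbour exists on one side. The function name `repair_series` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def repair_series(samples: Sequence[Optional[float]],
                  reference: Sequence[float],
                  lower: float, upper: float) -> List[float]:
    """Fill missing or out-of-range load samples (illustrative only).

    samples     -- one day of readings; None marks a missing value
    reference   -- readings from a day of the same type (assumed input)
    lower/upper -- index threshold range; values outside count as outliers
    """
    # Treat outliers as missing so both cases share one repair path.
    clean = [v if v is not None and lower <= v <= upper else None
             for v in samples]
    good = [i for i, v in enumerate(clean) if v is not None]
    out = list(clean)
    for i, v in enumerate(clean):
        if v is not None:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is not None and right is not None:
            # Assumed method: linear interpolation between valid neighbours.
            w = (i - left) / (right - left)
            out[i] = clean[left] + w * (clean[right] - clean[left])
        else:
            # No valid neighbour on one side: use the same-type-day value.
            out[i] = reference[i]
    return out
```

For example, `repair_series([1.0, None, 3.0], [0.0, 0.0, 0.0], 0.0, 100.0)` fills the gap with the midpoint of its two neighbours.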
S2.3: calculation of the grouping indices
Since load fluctuation largely characterises a customer's consumption behaviour, indices reflecting load variation over a given period (for example, the daily average over the previous month) are calculated from the energy and load indices:
Load factor = average load / peak load
Peak-to-total ratio = peak energy / total energy
Flat-to-total ratio = flat energy / total energy
Valley-to-total ratio = valley energy / total energy
Where:
Peak load = max(L_i), i = 1, 2, ..., 96, where L_i is the load value sampled every 15 minutes.
Peak, flat and valley energy are the energy consumed during the city's peak, flat and valley consumption periods respectively.
The peak period is when consumption is relatively concentrated; the valley period is the opposite. Peak period, 8 hours: 9:00~12:00 and 17:00~22:00; flat period, 7 hours: 8:00~9:00, 12:00~17:00 and 22:00~23:00; valley period, 9 hours: 23:00~8:00 the next day.
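The four grouping indices can be sketched in Python as below. This is illustrative only: the source takes the peak, flat and valley energy from the metering data, whereas the sketch approximates them from the 96 load samples (each sample assumed to be in kW and to cover the 15 minutes starting at its index). The names `period_of` and `grouping_indices` are hypothetical.

```python
from typing import Dict, Sequence

# Hour ranges [start, end) from the text; everything else is the valley period.
PEAK = [(9, 12), (17, 22)]
FLAT = [(8, 9), (12, 17), (22, 23)]


def period_of(hour: float) -> str:
    """Classify an hour of the day as peak, flat or valley."""
    if any(a <= hour < b for a, b in PEAK):
        return "peak"
    if any(a <= hour < b for a, b in FLAT):
        return "flat"
    return "valley"


def grouping_indices(load: Sequence[float]) -> Dict[str, float]:
    """Compute load factor and period-to-total ratios from 96 samples."""
    if len(load) != 96:
        raise ValueError("expected 96 samples at 15-minute intervals")
    energy = {"peak": 0.0, "flat": 0.0, "valley": 0.0}
    for i, kw in enumerate(load):
        # Assumption: sample i covers the quarter hour starting at i / 4.
        energy[period_of(i / 4)] += kw * 0.25
    total = sum(energy.values())
    return {
        "load_factor": (sum(load) / len(load)) / max(load),
        "peak_ratio": energy["peak"] / total,
        "flat_ratio": energy["flat"] / total,
        "valley_ratio": energy["valley"] / total,
    }
```

With a perfectly flat load profile the load factor is 1 and the three ratios reduce to the period lengths divided by 24 hours.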
S2.4: data normalization
To eliminate differences of scale between the grouping indices, the data are normalized. The main methods available are min-max normalization, zero-mean normalization and decimal scaling, each of which brings every index into a common range. Taking min-max normalization as the example, every value is normalized into the range [0, 1].
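Min-max normalization of one index column can be sketched as follows. The function name is hypothetical, and mapping a constant column to 0.0 is an assumption the text does not address.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Each of the four indices would be normalized separately across all customers before the vectors are clustered.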
Step 3: perform an initial customer grouping on the historical sample data;
Step 3 comprises the following sub-steps:
S3.1: sample data standardization
Data standardization means converting the preprocessed sample data into vector data; each vector D = (d1, d2, d3, d4) comprises a large customer's load factor, peak-to-total ratio, flat-to-total ratio and valley-to-total ratio.
where d1 is the load factor, d2 the peak-to-total ratio, d3 the flat-to-total ratio and d4 the valley-to-total ratio.
The vector data are stored in the distributed file system. During standardization, the MapReduce scheduler splits the sample data file into several data blocks according to its size, and Map tasks are started according to the number of blocks to perform the conversion in parallel; see Fig. 2.
S3.2: distributed storage
The distributed file system adopts a master/slave architecture. An HDFS cluster consists of one Namenode and a number of Datanodes. The Namenode is a central server that manages the file system namespace and client access to files. There is generally one Datanode per node in the cluster, managing the storage on the node it runs on. HDFS exposes the file system namespace, and users store data in it as files. Internally, a file is split into one or more data blocks, which are stored on a set of Datanodes. The Namenode executes namespace operations such as opening, closing and renaming files or directories. It also determines the mapping of data blocks to specific Datanodes. The Datanodes serve read and write requests from file system clients and create, delete and replicate data blocks under the unified scheduling of the Namenode.
S3.3: cluster centre point initialization
Customers are grouped mainly with a clustering algorithm that supports parallel and distributed computation; the K-means algorithm is used as the example below.
The clustering algorithm first generates empty, numbered clusters, then randomly selects K objects from the full sample data set as the K-means centre points, each centre point representing one cluster.
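The random initialization can be sketched in Python as below. The function name `init_centres` and the optional `seed` parameter are hypothetical; the seed is only there to make a run repeatable.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def init_centres(samples: Sequence[Vector], k: int,
                 seed: Optional[int] = None) -> List[Vector]:
    """Pick K distinct samples at random as the initial cluster centres."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no sample is chosen twice;
    # it raises ValueError if k exceeds the number of samples.
    return [tuple(s) for s in rng.sample(list(samples), k)]
```

The position of a centre in the returned list can serve as the cluster number.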
S3.4: iterative calculation of the optimal cluster centre points
New cluster centre points are calculated iteratively until the distance between every sample and its centre point is minimal.
This in turn is divided into the following steps:
First step: assign each sample to a cluster centre point. The distance from each sample to every centre point is calculated, and the sample is assigned to the nearest centre. This mainly uses the Map parallel method, which cuts the sample data directly into splits that are each computed in parallel. Because the calculation for one sample needs no other sample, the splits can run in parallel.
The input of the Map method is all the sample data to be clustered and the cluster centres of the previous iteration (or the initial centres). Input records are <key, value> pairs of the form <line number, record line>. Each Map function reads the cluster centre description file, calculates for each input sample the class centre nearest to it, and marks the sample with its new class. The intermediate output is <key, value> pairs of the form <cluster class ID, record attribute vector>.
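The Map step can be sketched in plain Python, without a Hadoop runtime, as below. Euclidean distance is an assumption, since the text does not name the distance measure, and the function names are hypothetical.

```python
import math
from typing import Iterable, Iterator, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest_centre(vec: Vector, centres: Sequence[Vector]) -> int:
    """Return the index (cluster class ID) of the centre closest to vec."""
    return min(range(len(centres)), key=lambda c: math.dist(vec, centres[c]))


def kmeans_map(records: Iterable[Tuple[int, Vector]],
               centres: Sequence[Vector]) -> Iterator[Tuple[int, Vector]]:
    """Map <line number, record> pairs to <cluster class ID, vector> pairs."""
    for _line_no, vec in records:
        # Each record is handled independently, which is why the input
        # can be split and processed in parallel.
        yield nearest_centre(vec, centres), vec
```

In a real MapReduce job the centres would be read from the cluster centre description file rather than passed as an argument.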
Second step: recalculate the centre point of each cluster. For each cluster above, its centre position is calculated and taken as the new centre point. This mainly uses the Reduce parallel method, which gathers the sample data distributed across the machines according to the cluster they belong to, so that the samples of one cluster are sent to the same machine, where the cluster's new centre point is calculated.
The centre point is calculated as:
$c_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$
where $c_j$ is the value of the new centre point in dimension $j$, namely the arithmetic mean, over the $n$ samples $x_1, \dots, x_n$ in the cluster, of their values $x_{ij}$ in that dimension.
The task of the Reduce function is to calculate the new cluster centres from the intermediate results of the Map function, for use by the next Map-Reduce job. Input <key, value> pairs have the form <cluster class ID, {set of record attribute vectors}>. All records with the same key (i.e. the same class ID) are given to one Reduce task, which accumulates the number of points with that key and the sum of each component of the records, takes the mean of each component, and obtains the new cluster centre description file. Output <key, value> pairs have the form <cluster class ID, mean vector>.
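The Reduce step can likewise be sketched in plain Python. For brevity this sketch groups by key and reduces in a single function, whereas a MapReduce framework performs the grouping itself; the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Vector = Tuple[float, ...]


def kmeans_reduce(pairs: Iterable[Tuple[int, Vector]]) -> Dict[int, Vector]:
    """Turn <cluster class ID, vector> pairs into <cluster class ID, mean vector>."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for cid, vec in pairs:
        if cid not in sums:
            sums[cid] = [0.0] * len(vec)
            counts[cid] = 0
        # Accumulate the per-component sums and the point count for this key.
        for d, x in enumerate(vec):
            sums[cid][d] += x
        counts[cid] += 1
    # The new centre is the arithmetic mean of each component.
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}
```

A cluster that receives no points produces no output pair, so a caller would need to decide how to treat an empty cluster; the text does not cover that case.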
Third step: compare the new and old centre point positions to determine convergence. If converged, continue to the next step; otherwise repeat S3.3. This process also uses the Reduce method: the new centre point positions and the old centre point positions are sent to the same Reduce task for calculation.
To judge whether the clustering has converged, calculate the distance between the cluster centres obtained in this round and the centres at the start of the round. If the distance is less than a given threshold, the algorithm is considered to have converged and ends. Otherwise the previous centres are replaced by those of this round and a new round of calculation is started.
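The convergence test can be sketched as below. Requiring every centre to have moved less than the threshold is an assumption; the text only says that the distance between old and new centres is compared with a given threshold. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def has_converged(old: Sequence[Vector], new: Sequence[Vector],
                  threshold: float) -> bool:
    """True when every centre moved by less than threshold in this round."""
    if len(old) != len(new):
        raise ValueError("old and new centres must be paired one to one")
    return all(math.dist(o, n) < threshold for o, n in zip(old, new))
```

A driver loop would alternate the Map and Reduce steps and stop once this test passes or an iteration limit is reached.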
Fig. 3 is a schematic of how the MapReduce implementation of the K-means clustering algorithm processes data. Before the Reduce tasks begin, the intermediate results held locally on the Map task nodes are grouped and sorted by key, to improve the efficiency of the Reduce tasks.
S3.5: output of the grouping data
The previous step yields, through repeated iteration, the cluster centre points and the cluster centre to which each sample belongs; these can be output directly.
Step 4: extract, in real time from the metering automation system, online power customer information and the real-time electricity consumption data that reflect the customers' consumption characteristics, and preprocess them;
After the initial grouping of the historical sample data, customer information and real-time consumption data reflecting consumption characteristics can be extracted periodically from the metering automation system as the application requires, for example to delete customers who have left, add new customers and update customers' data. The data should be preprocessed by the method of step 2.
Step 5: take the preprocessed online power customer data and, on the basis of the customer groups generated in step 3, use the generated cluster centre points to group the newly added online power customer data online in real time.
Step 5 comprises the following sub-steps:
On the basis of the customer groups generated in step 3, the K generated cluster centre points are used and the Canopy algorithm is applied to group the newly added data online in real time, as shown in Fig. 4.
S5.1: from the existing group centre points, generate K Canopy clusters, the initial centre of each being an existing group centre point;
S5.2: specify suitable parameters T1 and T2, and place all new data into the Canopy clusters for cluster calculation;
The Canopy algorithm first requires two thresholds T1 and T2 with T1 > T2. The algorithm maintains a set of Canopies, initially empty. The first point read is placed in the set as a Canopy; each subsequent point is read and its distance to every Canopy in the set is calculated. If the distance is less than T1, the point is assigned to that Canopy (a point may be assigned to several Canopies), and if the distance is less than T2 the point cannot be placed in the set as a new Canopy.
S5.3: following the Canopy algorithm, calculate the distance D from each new sample to every centre point; when D < T1 the sample is placed in the corresponding cluster; when D < T2 the sample is removed from the new sample set; if D1 to DK are all greater than T1, the point itself becomes a new centre point, forming a new cluster (customer group); the calculation loops until the new sample set is empty;
S5.4: add the new centre points to the original K-means centre points, to serve as the centre points for the next online clustering;
S5.5: over a long period the centre points calculated in this way become inaccurate, so all the data must be recalculated in full using step 5; using the existing centre points as the initial cluster centres improves the speed of convergence.
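Sub-steps S5.1 to S5.4 can be sketched in Python as a single pass over the new points. This is one reading of the description, not a definitive implementation: Euclidean distance is assumed, each point is processed exactly once, a point at distance exactly T1 is treated as outside the Canopy, and points closer than T2 are merely recorded as tightly bound. The name `canopy_online` and its return shape are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def canopy_online(new_points: Sequence[Point], centres: Sequence[Point],
                  t1: float, t2: float):
    """Assign new points to existing centres, creating centres when needed."""
    if not t1 > t2:
        raise ValueError("T1 must be greater than T2")
    # S5.1: start one Canopy per existing K-means centre.
    all_centres: List[Point] = [tuple(c) for c in centres]
    members: Dict[int, List[Point]] = {i: [] for i in range(len(all_centres))}
    tight: List[Point] = []  # points closer than T2 to some centre
    for p in new_points:
        dists = [math.dist(p, c) for c in all_centres]
        hits = [i for i, d in enumerate(dists) if d < t1]
        if hits:
            # S5.3: within T1, so the point joins every matching Canopy.
            for i in hits:
                members[i].append(tuple(p))
            if min(dists) < t2:
                tight.append(tuple(p))
        else:
            # S5.3: farther than T1 from every centre, so the point itself
            # becomes a new centre (S5.4 keeps it for the next online run).
            all_centres.append(tuple(p))
            members[len(all_centres) - 1] = [tuple(p)]
    return all_centres, members, tight
```

The returned centre list, which includes any new centres, would be stored and reused as the starting point of the next online clustering run.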
Step 6: performance analysis of the online real-time grouping obtained in step 5.
The study used 2 master nodes, 5 compute-and-storage nodes and 1 data acquisition node. The data comprised 74,920,323 records occupying 2.5 GB of disk space. K-means clustering with 10 iterations took about 150 minutes; online clustering took about 9.39 minutes. The online grouping method proposed here therefore ensures that incremental Canopy clustering runs quickly: newly collected data reflecting customers' consumption characteristics can be assigned promptly to the clusters they belong to, which is the key point of the online clustering proposed by the invention.
The difficulty in implementing power customer grouping lies in the online processing of mass data, because the data to be clustered are very large. A large amount of computing resource could be used to re-cluster all the data continuously, improving accuracy and ensuring a degree of real-time response, but this is clearly wasteful. The scheme adopted here, combining online clustering with K-means, is an inexpensive and fast solution. The algorithm effectively avoids the problems of excessive data volume, high software and hardware requirements and high system resource usage.
Claims (3)
1. An online grouping method for power customers based on mass metering data, comprising the following steps:
Step 1: extract historical sample data of power customers;
Step 2: preprocess the extracted sample data;
Step 3: perform an initial customer grouping on the historical sample data;
Step 4: extract, in real time from the metering automation system, online power customer information and the real-time electricity consumption data that reflect the customers' consumption characteristics, and preprocess them;
Step 5: take the preprocessed online power customer data and, on the basis of the customer groups generated in step 3, use the generated cluster centre points to group the newly added online power customer data online in real time.
2. The online grouping method for power customers based on mass metering data according to claim 1, characterized in that the method further comprises step 6: performance analysis of the online real-time grouping obtained in step 5.
3. The online grouping method for power customers based on mass metering data according to claim 1 or 2, characterized in that: the historical sample data of step 1 further comprise customer profile information, customer energy information and customer load information;
The data preprocessing of step 2 comprises missing-value handling, outlier handling, calculation of the grouping indices and data normalization;
S2.1: missing-value handling
Gaps are found in the raw metering data; to ensure the validity of the modelling data, these missing values must be filled in;
S2.2: outlier handling
Data outside the index threshold range are corrected using data from days of the same type combined with an interpolation algorithm;
S2.3: calculation of the grouping indices
Since load fluctuation largely characterises a customer's consumption behaviour, indices reflecting load variation over a given period are calculated from the energy and load indices:
Load factor = average load / peak load
Peak-to-total ratio = peak energy / total energy
Flat-to-total ratio = flat energy / total energy
Valley-to-total ratio = valley energy / total energy
Peak load = max(L_i), i = 1, 2, ..., 96, where L_i is the load value sampled every 15 minutes; peak, flat and valley energy are the energy consumed during the city's peak, flat and valley consumption periods respectively;
The peak period is when consumption is relatively concentrated; the valley period is the opposite. Peak period, 8 hours: 9:00~12:00 and 17:00~22:00; flat period, 7 hours: 8:00~9:00, 12:00~17:00 and 22:00~23:00; valley period, 9 hours: 23:00~8:00 the next day;
S2.4: data normalization
To eliminate differences of scale between the grouping indices, the data are normalized; the main methods available are min-max normalization, zero-mean normalization and decimal scaling;
Described step 3 comprises following substep:
S3.1: sample data standardization
Data normalization refers to changing into vector data through a pretreated sample data, vector data comprise the big customer rate of load condensate, peak total ratio, flat always than and paddy always than;
Wherein d1 is rate of load condensate, and d2 is peak total ratio, d3 for flat always than, d4 be paddy always than;
Vector data is stored in the distributed file system, in standardized process, can pass through the MapReduce scheduler, according to the sample data file size split into some data block vector data angang than and paddy always than; Quantity according to data block starts Map tasks in parallel operative norm conversion work;
S3.2: distributed storage
Distributed file system adopts the master/slave framework; HDFS cluster is comprised of the Datanodes of a Namenode and some; Namenode is a central server, is in charge of the name space (namespace) of file system and client to the access of file; Datanode in the cluster is one of a node, is in charge of the storage on its place node; HDFS has opened the name space of file system, and the user can store data in the above with the form of file; See that internally a file is divided into one or more data blocks in fact, these pieces are stored on one group of Datanode; The operation of the name space of Namenode execute file system, such as open, close, Rename file or catalogue; It also is responsible for the specified data piece to the mapping of concrete Datanode node; Datanode is responsible for processing the read-write requests of file system client; Under the United Dispatching of Namenode, carry out data block establishment, delete and copy;
S3.3: cluster centre point initialization
Customer grouping mainly uses clustering algorithms that can be parallelized and computed in a distributed manner; the K-means algorithm is taken as the example below;
The clustering algorithm first generates empty clusters and numbers them, then randomly selects K objects from the whole sample data set as the center points of the K-means clusters; each cluster center point serves as the representative of its cluster;
S3.4: iterative computation of the optimal cluster center points
New cluster center points are computed repeatedly by iteration until the distance between every sample and its center point is minimized;
S3.5: output the grouping data
In the previous step, the iterative computation yields the cluster center points and, at the same time, the cluster to which each sample belongs; these can be output directly;
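Sub-steps S3.3 to S3.5 can be sketched on a single machine as below. This is only an illustration under stated assumptions, not the distributed implementation: Euclidean distance is assumed, the loop stops when the assignments no longer change (the description only says the distances should become minimal), and the function name `kmeans` and its parameters `max_iter` and `seed` are hypothetical.

```python
import math
import random


def kmeans(samples, k, max_iter=100, seed=0):
    """Plain K-means over a list of equal-length numeric tuples.

    Returns (centers, labels), where labels[i] is the index of the
    cluster that samples[i] belongs to.
    """
    rng = random.Random(seed)
    # S3.3: pick K samples at random as the initial cluster center points.
    centers = rng.sample(samples, k)
    labels = None
    for _ in range(max_iter):
        # S3.4 (assignment): attach each sample to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centers[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # assumed stopping rule: assignments are stable
        labels = new_labels
        # S3.4 (update): move each center to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # S3.5: the centers and each sample's cluster are the output.
    return centers, labels
```

In the distributed setting described above, the assignment step would typically run in the Map tasks and the center update in the Reduce tasks; that split is a common design and is stated here as an assumption rather than as the patent's wording.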
In step 4, after the initial grouping of the historical sample data, customer information and real-time electricity consumption data reflecting the customers' electricity-use characteristics are extracted periodically from the metering automation system as the application requires, and the real-time consumption data are preprocessed by the method described in step 2;
Step 5 comprises the following sub-steps:
On the basis of the customer groups generated in step 3, the K cluster center points already generated are used, and the Canopy algorithm is applied to group the newly added data online in real time; the specific steps are as follows:
S5.1: generate K Canopy clusters from the cluster center points of the existing grouping; the initial center of each cluster is the corresponding existing cluster center point;
S5.2: specify suitable T1 and T2 parameters, and place all new data into the Canopy clusters for cluster calculation;
The Canopy algorithm first requires two thresholds T1 and T2 as input, with T1 > T2; the algorithm maintains a set of Canopies, which is empty initially; the first point read is put into the set as a Canopy; the next point is then read, and its distance to each Canopy in the set is computed; if this distance is less than T1, the point is assigned to that Canopy, and if this distance is less than T2, the point can no longer be put into the set as a new Canopy;
S5.3: following the Canopy algorithm, compute the distance D from each new sample to each center point; when D < T1, the sample is placed in the corresponding cluster; when D < T2, the sample is deleted from the new sample set; if the distances D1 through DK to all K center points are greater than T1, the point itself is generated as a new center point, thereby forming a new customer cluster; the calculation loops until the new sample data set is empty.
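A simplified single-pass reading of S5.1 to S5.3 might look like the following Python sketch. Several points are assumptions rather than the patent's wording: Euclidean distance is used; each new sample is processed once, in arrival order; a sample whose distance to every center is not less than T1 becomes a new center (the case of a distance exactly equal to T1 is not specified in the description); and samples within T2 of some center are merely recorded in a separate list, because the description does not say what else happens to samples that fall between T2 and T1. The names `group_online`, `members` and `core` are hypothetical.

```python
import math


def group_online(centers, new_samples, t1, t2):
    """Assign new samples to clusters seeded by existing center points.

    Returns (centers, members, core):
      centers  existing centers plus any new ones created,
      members  members[i] lists the new samples assigned to centers[i],
      core     new samples lying within T2 of at least one center.
    """
    if not t1 > t2 > 0:
        raise ValueError("require T1 > T2 > 0")
    # S5.1: one Canopy per existing cluster center point.
    centers = list(centers)
    members = [[] for _ in centers]
    core = []
    # S5.2 / S5.3: place every new sample.
    for s in new_samples:
        dists = [math.dist(s, c) for c in centers]
        near = [i for i, d in enumerate(dists) if d < t1]
        if not near:
            # No center within T1: the sample starts a new customer cluster.
            centers.append(s)
            members.append([s])
            continue
        for i in near:
            # As in Canopy clustering, a sample may join several clusters.
            members[i].append(s)
        if min(dists) < t2:
            # Strongly bound to a center; it can never become a new center.
            core.append(s)
    return centers, members, core
```

For example, with existing centers at (0.0, 0.0) and (10.0, 0.0), T1 = 3 and T2 = 1, a new sample at (20.0, 0.0) is farther than T1 from both centers and therefore opens a third cluster, while a sample at (0.5, 0.0) joins the first cluster and is also recorded in `core`.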
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012104847126A CN102982489A (en) | 2012-11-23 | 2012-11-23 | Power customer online grouping method based on mass measurement data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102982489A (en) | 2013-03-20 |
Family
ID=47856443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012104847126A Pending CN102982489A (en) | 2012-11-23 | 2012-11-23 | Power customer online grouping method based on mass measurement data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102982489A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20130320 |