CN102982489A - Power customer online grouping method based on mass measurement data - Google Patents

Power customer online grouping method based on mass measurement data

Publication number
CN102982489A
Authority
CN
China
Prior art keywords
data
cluster
online
power
customer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012104847126A
Other languages
Chinese (zh)
Inventor
刘涛
杨劲锋
阙华坤
肖勇
孙卫明
陈启冠
王和栋
张良均
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Guangdong Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Guangdong Power Grid Co Ltd
Priority to CN2012104847126A
Publication of CN102982489A

Abstract

The invention discloses an online power customer grouping method based on mass measurement data. The method comprises: Step 1, extracting historical sample data of power customers; Step 2, preprocessing the extracted sample data; Step 3, performing initial customer grouping on the historical sample data; Step 4, extracting in real time, from a metering automation system, online power customer information and real-time consumption data reflecting the customers' electricity-use characteristics, and preprocessing them; Step 5, obtaining the preprocessed online power customer data and, on the basis of the customer groups already generated, using the generated cluster center points to group newly added online customer data online in real time. The method can group all power customers dynamically.

Description

An online power customer grouping method based on mass measurement data
Technical field
The present invention relates to a method for grouping power customers in the power industry, and specifically to an online power customer grouping method based on mass measurement data.
Background art
In power marketing, questions such as the following often arise. How can good electricity-use guidance be given for peak shifting, fault outages, and scheduled outages, together with emergency service during outages? How can peak shifting be arranged for customers in a scientific and flexible way? How can peak-shifting and scheduled-outage notices be delivered in time to sensitive customers, high-value customers, and customers with high credit ratings? All of these problems call for an effective method of grouping electricity customers.
Traditional power customer grouping is mainly static: it usually relies on customers' static attribute data or monthly energy data, the data volume is small, and once computed the result rarely changes. In practice, customers' consumption behavior changes over time, so static grouping cannot meet the need to provide customers with personalized marketing services. Power supply units urgently need a new method that groups all power customers dynamically.
The main data used by the invention come from the metering automation system, which uses acquisition terminals and a communication network to obtain customers' load data in real time. The volume of measurement data is large and is difficult to process with traditional data mining algorithms. Mining a small data set with such algorithms and extrapolating to the large one is often misleading, and is laborious and time-consuming to carry out. A cloud computing environment is therefore chosen to give the power system flexibility and portability and to reduce cost.
Summary of the invention
The object of the invention is to provide an online power customer grouping method based on mass measurement data that can group all power customers dynamically.
This object is achieved by the following technical solution: an online power customer grouping method based on mass measurement data, comprising the following steps:
Step 1: extract historical sample data of power customers;
Step 2: preprocess the extracted sample data;
Step 3: perform initial customer grouping on the historical sample data of power customers;
Step 4: extract in real time, from the metering automation system, online power customer information and real-time consumption data reflecting the online customers' electricity-use characteristics, and preprocess them;
Step 5: obtain the preprocessed online power customer data and, on the basis of the customer groups generated in Step 3, use the generated cluster center points to group the newly added online power customer data online in real time.
As an improvement, the invention further comprises Step 6: evaluating the performance of the online real-time grouping obtained in Step 5.
In the invention, the historical sample data of power customers in Step 1 comprise customer profile information, customer energy information, and customer load information;
The data preprocessing in Step 2 comprises missing-value handling, outlier handling, calculation of the grouping indicators, and data normalization;
S2.1: missing-value handling
Some values are found to be missing in the raw measurement data; to ensure the modeling data are valid, the missing data must be filled in;
S2.2: outlier handling
Data outside the threshold range of an indicator are corrected using data from days of the same type combined with an interpolation algorithm;
S2.3: calculation of the grouping indicators
Since load fluctuation largely characterizes a customer's electricity-use pattern, indicators reflecting load variation over a given period are calculated from the energy and load data:
Load factor = average load / peak load
Peak-to-total ratio = peak energy / total energy
Flat-to-total ratio = flat energy / total energy
Valley-to-total ratio = valley energy / total energy
Where:
[Formula image GDA00002454555300021 in the original: definition of average load]
Peak load = $\max(L_i)$, $i = 1, 2, \ldots, 96$, where $L_i$ is the load value sampled every 15 minutes; peak, flat, and valley energy are the energy consumed during the city's peak, flat, and valley periods respectively;
The peak period is the time of high electricity use, when consumption is relatively concentrated; the valley period is the opposite. Peak period, 8 hours: 9:00-12:00 and 17:00-22:00; flat period, 7 hours: 8:00-9:00, 12:00-17:00, and 22:00-23:00; valley period, 9 hours: 23:00 to 8:00 the next day;
S2.4: data normalization
To remove the differences in scale between the grouping indicators, the data are normalized; the main methods are the min-max method, the zero-mean method, and the decimal scaling method;
Step 3 comprises the following substeps:
S3.1: sample data standardization
Data standardization here means converting the preprocessed sample data into vector data; the vector comprises a large customer's load factor, peak-to-total ratio, flat-to-total ratio, and valley-to-total ratio;
where d1 is the load factor, d2 the peak-to-total ratio, d3 the flat-to-total ratio, and d4 the valley-to-total ratio;
The vector data are stored in the distributed file system. During standardization, the MapReduce scheduler splits the sample data file into several data blocks according to its size; Map tasks are started according to the number of data blocks and perform the standardization conversion in parallel;
S3.2: distributed storage
The distributed file system uses a master/slave architecture. An HDFS cluster consists of one Namenode and a number of Datanodes. The Namenode is a central server that manages the file system namespace and client access to files. Datanodes, generally one per node in the cluster, manage the storage on the node where they run. HDFS exposes the file system namespace, and users store data on it as files. Internally, a file is split into one or more data blocks, which are stored on a set of Datanodes. The Namenode performs namespace operations such as opening, closing, and renaming files and directories, and determines the mapping of data blocks to specific Datanodes. Datanodes serve read and write requests from file system clients and create, delete, and replicate data blocks under the unified scheduling of the Namenode;
S3.3: cluster center point initialization
Customer grouping mainly uses a clustering algorithm that can be parallelized for distributed computation; the K-means algorithm is used as an example below;
The clustering algorithm first creates empty, numbered clusters, then randomly selects K objects from the full sample data set as the K-means cluster centers; each cluster center point represents its cluster;
S3.4: iterative computation of the optimal cluster center points
New cluster center points are computed iteratively until the distances between all sample data and their center points are minimized;
S3.5: output of the grouping data
The preceding step yields the cluster center points through repeated iteration, together with the cluster center each sample belongs to; these can be output directly;
In Step 4, after the initial grouping of the historical sample data, customer information and real-time consumption data reflecting customers' electricity-use characteristics are extracted periodically from the metering automation system, as the application requires; the real-time consumption data are preprocessed by the method described in Step 2;
Step 5 comprises the following substeps:
On the basis of the customer groups generated in Step 3, the K generated cluster center points are used and the Canopy algorithm is applied to group the newly added data online in real time, as follows:
S5.1: from the existing group center points, generate K Canopy clusters; the initial center of each cluster is an existing group center point;
S5.2: choose suitable parameters T1 and T2, and place all new data into the Canopy clusters for cluster computation;
The Canopy algorithm first requires two thresholds T1 and T2 as input, with T1 > T2. The algorithm keeps a set of Canopies, which is empty at the start. The first point read is put into the set as a Canopy. The next point is then read and its distance to each Canopy in the set is computed: if the distance is less than T1, the point is assigned to that Canopy, and if the distance is less than T2, the point can no longer be added to the set as a new Canopy;
S5.3: following the Canopy algorithm, compute the distance D from each new sample to every center point. When D < T1, the sample is placed in the corresponding cluster; when D < T2, the sample is removed from the new sample set; if the distances D1 to DK are all greater than T1, the point is made a new center point, forming a new customer cluster. The computation loops until the new sample data set is empty.
Real-time clustering of customers' measurement data is an important part of customer behavior analysis, and many related models (such as classification and regression, time-series forecasting, association analysis, and specificity discovery) can be built on it. The online clustering model for the mass data of the metering automation system takes the real-time measurement data and the importance of the parameters as input and, in a cloud computing environment, subdivides the metering customer data into multiple categories, reporting the electricity-use characteristics, share, and distribution of each category. Based on these outputs, each customer category can be handled differently.
The invention is built on cloud computing technology and adopts distributed storage, distributed indexing, and distributed parallel computing, so that mass data can be organized, stored, indexed, and managed effectively, and query and analysis functions for mass data are provided through standardized application or service interfaces.
Description of drawings
Fig. 1 is the system flowchart of the online grouping method of the invention;
Fig. 2 is the flowchart of distributed k-means clustering in the online grouping method of the invention;
Fig. 3 shows the MapReduce parallel implementation of k-means clustering in the online grouping method of the invention;
Fig. 4 illustrates online real-time Canopy grouping in the online grouping method of the invention.
Embodiment
An online power customer grouping method based on mass measurement data, as shown in Figs. 1 to 4, comprises the following steps:
Step 1: extract historical sample data of power customers;
The invention implements cluster-based grouping on large customers' electricity-use characteristics, so data reflecting those characteristics must be extracted from the metering automation system and the marketing management system. In addition to customer profile data, customer energy data and customer load data are therefore extracted, specifically:
Customer profile information: metering point number, electricity-use category, industry category, voltage level, etc.
Customer energy information: total consumption, peak consumption, flat consumption, valley consumption, etc.
Customer load information: current, voltage, power factor, active power, load readings every 15 minutes, etc.
Step 2: preprocess the extracted sample data;
Step 2 comprises the following substeps:
Data preprocessing mainly comprises missing-value handling, outlier handling, calculation of the grouping indicators, and so on.
S2.1: missing-value handling
In the raw measurement data, particularly during extraction of real-time load data, some values are found to be missing. To ensure the modeling data are valid, the missing data must be filled in. The rule is mainly to use data from days of the same type combined with an interpolation algorithm.
S2.2: outlier handling
Data outside the threshold range of an indicator are corrected using data from days of the same type combined with an interpolation algorithm.
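The patent does not say how same-type-day data and interpolation are combined, so the following is only a minimal sketch under stated assumptions: missing readings (marked `None`) and readings outside the thresholds are treated alike, gaps are filled by linear interpolation between the nearest valid neighbours, and a reading from a comparable day is used when no such pair of neighbours exists. The function name `repair_day` and its parameters are hypothetical.

```python
def repair_day(samples, reference, low, high):
    """Return a repaired copy of one day's load samples.

    samples   : list of floats; None marks a missing reading
    reference : same-length list from a comparable day (hypothetical input)
    low, high : indicator thresholds; readings outside are treated as abnormal
    """
    n = len(samples)
    # Missing and out-of-range readings are handled the same way: as gaps.
    valid = [v is not None and low <= v <= high for v in samples]
    repaired = list(samples)
    for i in range(n):
        if valid[i]:
            continue
        # Nearest valid neighbours on each side, if any.
        left = next((j for j in range(i - 1, -1, -1) if valid[j]), None)
        right = next((j for j in range(i + 1, n) if valid[j]), None)
        if left is not None and right is not None:
            # Linear interpolation between the two neighbours.
            w = (i - left) / (right - left)
            repaired[i] = samples[left] + w * (samples[right] - samples[left])
        else:
            # No bracketing pair: fall back to the comparable day's reading.
            repaired[i] = reference[i]
    return repaired
```

Other combinations, such as averaging the interpolated value with the comparable day's value, would fit the description equally well.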
S2.3: calculation of the grouping indicators
Since load fluctuation largely characterizes a customer's electricity-use pattern, indicators reflecting load variation over a given period (for example, the daily average over the previous month) are calculated from the energy and load data:
Load factor = average load / peak load
Peak-to-total ratio = peak energy / total energy
Flat-to-total ratio = flat energy / total energy
Valley-to-total ratio = valley energy / total energy
Where:
[Formula image GDA00002454555300061 in the original: definition of average load]
Peak load = $\max(L_i)$, $i = 1, 2, \ldots, 96$, where $L_i$ is the load value sampled every 15 minutes.
Peak, flat, and valley energy are the energy consumed during the city's peak, flat, and valley periods respectively.
The peak period is the time of high electricity use, when consumption is relatively concentrated; the valley period is the opposite. Peak period, 8 hours: 9:00-12:00 and 17:00-22:00; flat period, 7 hours: 8:00-9:00, 12:00-17:00, and 22:00-23:00; valley period, 9 hours: 23:00 to 8:00 the next day.
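The four indicators can be illustrated with a short sketch for one day of 96 samples. Two points are assumptions rather than statements from the patent: the average load is taken as the arithmetic mean of the 96 samples (the original formula is only available as an image), and period energy is approximated as load times 0.25 h per sample, whereas the patent uses metered energy values. The names `period_of` and `grouping_indicators` are hypothetical, and positive loads are assumed.

```python
def period_of(slot):
    """Map a 15-minute slot index (0 = 00:00-00:15) to 'peak', 'flat' or 'valley'."""
    hour = slot // 4
    if 9 <= hour < 12 or 17 <= hour < 22:
        return "peak"      # 9:00-12:00 and 17:00-22:00
    if hour == 8 or 12 <= hour < 17 or hour == 22:
        return "flat"      # 8:00-9:00, 12:00-17:00, 22:00-23:00
    return "valley"        # 23:00 to 8:00 the next day


def grouping_indicators(loads):
    """Return (load factor, peak ratio, flat ratio, valley ratio) for 96 samples."""
    if len(loads) != 96:
        raise ValueError("expected 96 samples (one per 15 minutes)")
    average_load = sum(loads) / 96          # assumption: arithmetic mean
    peak_load = max(loads)
    energy = {"peak": 0.0, "flat": 0.0, "valley": 0.0}
    for slot, load in enumerate(loads):
        energy[period_of(slot)] += load * 0.25   # assumption: kW x 0.25 h
    total = sum(energy.values())
    return (
        average_load / peak_load,
        energy["peak"] / total,
        energy["flat"] / total,
        energy["valley"] / total,
    )
```

For a perfectly flat load the load factor is 1 and the three ratios reduce to the period lengths, 8/24, 7/24, and 9/24.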
S2.4: data normalization
To remove the differences in scale between the grouping indicators, the data are normalized. The main methods are the min-max method, the zero-mean method, and the decimal scaling method, each of which brings every indicator into a common range. Taking the min-max method as an example, every value is normalized into the range [0, 1].
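As an illustration of the min-max method, the sketch below rescales each indicator column with (x - min) / (max - min). The function name `min_max_normalize` and the handling of a constant column (mapped to 0) are assumptions; the patent does not discuss that edge case.

```python
def min_max_normalize(rows):
    """Scale each column of `rows` (a list of equal-length lists) into [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # Assumption: a constant column carries no information, map it to 0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized
```

Each row would hold one customer's indicator values, so the scaling is applied per indicator across all customers.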
Step 3: perform initial customer grouping on the historical sample data of power customers;
Step 3 comprises the following substeps:
S3.1: sample data standardization
Data standardization here means converting the preprocessed sample data into vector data; the vector comprises a large customer's load factor, peak-to-total ratio, flat-to-total ratio, and valley-to-total ratio.
[Formula image GDA00002454555300062 in the original: the sample vector with components d1, d2, d3, d4]
where d1 is the load factor, d2 the peak-to-total ratio, d3 the flat-to-total ratio, and d4 the valley-to-total ratio.
The vector data are stored in the distributed file system. During standardization, the MapReduce scheduler splits the sample data file into several data blocks according to its size. Map tasks are started according to the number of data blocks and perform the standardization conversion in parallel; see Fig. 2.
S3.2: distributed storage
The distributed file system uses a master/slave architecture. An HDFS cluster consists of one Namenode and a number of Datanodes. The Namenode is a central server that manages the file system namespace and client access to files. Datanodes, generally one per node in the cluster, manage the storage on the node where they run. HDFS exposes the file system namespace, and users store data on it as files. Internally, a file is split into one or more data blocks, which are stored on a set of Datanodes. The Namenode performs namespace operations such as opening, closing, and renaming files and directories, and determines the mapping of data blocks to specific Datanodes. Datanodes serve read and write requests from file system clients and create, delete, and replicate data blocks under the unified scheduling of the Namenode.
S3.3: cluster center point initialization
Customer grouping mainly uses a clustering algorithm that can be parallelized for distributed computation; the K-means algorithm is used as an example below.
The clustering algorithm first creates empty, numbered clusters, then randomly selects K objects from the full sample data set as the K-means cluster centers; each cluster center point represents its cluster.
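The initialization can be sketched in a few lines. The function name `init_centers`, the integer cluster numbering from 0, and the optional `seed` parameter (added only to make runs repeatable) are assumptions for illustration.

```python
import random


def init_centers(samples, k, seed=None):
    """Pick k distinct samples at random as the initial K-means centers.

    Returns a dict mapping cluster number -> center vector.
    """
    rng = random.Random(seed)
    chosen = rng.sample(samples, k)   # k different positions, no repeats
    # Number the (still empty) clusters 0..k-1 and attach one center to each.
    return {cluster_id: list(center) for cluster_id, center in enumerate(chosen)}
```

Random selection is what the text describes; the quality of the final grouping can depend on this initial choice.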
S3.4: iterative computation of the optimal cluster center points
New cluster center points are computed iteratively until the distances between all sample data and their center points are minimized.
This is divided into the following steps:
First step: determine the cluster center point each sample belongs to. The distance from each sample to every center point is computed, and the sample is assigned to the nearest cluster center point. This step mainly uses the Map parallel method, which partitions the sample data directly and processes each partition in parallel. Because computing the assignment of one sample requires no other samples, the work can be carried out in parallel.
The input to the Map parallel method is all the sample data to be clustered and the cluster centers from the previous iteration (or from the initial clustering). Input <key, value> pairs have the form <line number, record line>. Each Map function reads the cluster center description file, computes for each input sample the cluster center nearest to it, and labels the sample with its new cluster. The intermediate <key, value> pairs output have the form <cluster ID, record attribute vector>.
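The Map step can be sketched as a pure function on one record. This is plain Python rather than a Hadoop job, and the Euclidean distance is an assumption, since the patent does not name a distance measure. The names `euclidean` and `kmeans_map` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_map(record, centers):
    """Map step: turn <line number, record> into <cluster ID, attribute vector>.

    record  : (line_number, vector) pair
    centers : dict cluster ID -> center vector from the previous iteration
    """
    _line_number, vector = record
    nearest = min(centers, key=lambda cid: euclidean(vector, centers[cid]))
    return nearest, vector
```

Because the function reads only its own record and the shared centers, any number of copies can run side by side on separate data blocks.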
Second step: recompute the center point of each cluster. For each cluster above, its center position is computed and used as the new cluster center point. This step mainly uses the Reduce parallel method: the sample data distributed across the machines are transmitted according to the cluster they belong to, so that the samples of one cluster arrive at the same machine, where the new center point of that cluster is computed.
The formula for the center point is:
$$\vec{C}_{new} = \frac{1}{n} \sum \vec{p}$$
In the formula, each component of the new center point $\vec{C}_{new}$ is the arithmetic mean of that component over all $n$ sample data $\vec{p}$ in the cluster.
The task of the Reduce function is to compute new cluster centers from the intermediate results of the Map function, for use by the next round of the Map-Reduce job. Input <key, value> pairs have the form <cluster ID, {set of record attribute vectors}>. All records with the same key (that is, the same cluster ID) are given to one Reduce task, which accumulates the number of points with that key and the sum of each component of the records, then takes the mean of each component to obtain the new cluster center description file. Output <key, value> pairs have the form <cluster ID, mean vector>.
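A plain-Python sketch of the Reduce step follows. The helper `shuffle` stands in for the grouping by key that the MapReduce framework performs between the two phases; both function names are hypothetical.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group intermediate <cluster ID, vector> pairs by key, as the framework would."""
    groups = defaultdict(list)
    for cluster_id, vector in pairs:
        groups[cluster_id].append(vector)
    return groups


def kmeans_reduce(cluster_id, vectors):
    """Reduce step: emit <cluster ID, mean vector> for one cluster."""
    count = len(vectors)                                      # points with this key
    sums = [sum(component) for component in zip(*vectors)]    # per-component sums
    return cluster_id, [s / count for s in sums]
```

Calling `kmeans_reduce` once per key of `shuffle(...)` yields the new center description for the next round.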
Third step: compare the new and old cluster center positions to determine whether the computation has converged. If it has, continue to the next step; otherwise, the iteration is repeated. This process also uses the Reduce method: the new cluster center positions and the old center positions are sent to the same Reduce task for computation.
Convergence test: compute the distance between the cluster centers calculated in the round just completed and the cluster centers at the start of that round. If the distance is less than a given threshold, the algorithm is considered to have converged and ends. Otherwise, the cluster centers of this round replace those of the previous round and a new round of computation is started.
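The convergence test can be sketched as below. The text speaks of "the distance" between old and new centers without saying how several centers are combined, so requiring every center to move by less than the threshold is an assumption, as is the Euclidean distance. The name `has_converged` is hypothetical.

```python
import math


def has_converged(old_centers, new_centers, threshold):
    """True when every cluster center moved by less than `threshold`.

    Both arguments are dicts mapping cluster ID -> center vector.
    """
    for cluster_id, old in old_centers.items():
        new = new_centers[cluster_id]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))
        if shift >= threshold:
            return False       # at least one center is still moving
    return True
```

A driver loop would run the Map and Reduce steps, call this test, and either stop or replace the old centers with the new ones.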
Fig. 3 is a schematic of how the MapReduce implementation of the k-means clustering algorithm processes data. Before the Reduce tasks start, the intermediate results held locally on the nodes that executed the Map tasks are grouped and sorted by key, which improves the execution efficiency of the Reduce tasks.
S3.5: output of the grouping data
The preceding step yields the cluster center points through repeated iteration, together with the cluster center each sample belongs to; these can be output directly.
Step 4: extract in real time, from the metering automation system, online power customer information and real-time consumption data reflecting the online customers' electricity-use characteristics, and preprocess them;
After the initial grouping of the historical sample data, customer information and real-time consumption data reflecting customers' electricity-use characteristics can be extracted periodically from the metering automation system, as the application requires, for example to delete customers who have left, add new customers, and update customers' data. The data should be preprocessed by the method described in Step 2.
Step 5: obtain the preprocessed online power customer data and, on the basis of the customer groups generated in Step 3, use the generated cluster center points to group the newly added online power customer data online in real time.
Step 5 comprises the following substeps:
On the basis of the customer groups generated in Step 3, the K generated cluster center points are used and the Canopy algorithm is applied to group the newly added data online in real time, as shown in Fig. 4.
S5.1: from the existing group center points, generate K Canopy clusters; the initial center of each cluster is an existing group center point;
S5.2: choose suitable parameters T1 and T2, and place all new data into the Canopy clusters for cluster computation;
The Canopy algorithm first requires two thresholds T1 and T2 as input, with T1 > T2. The algorithm keeps a set of Canopies, which is empty at the start. The first point read is put into the set as a Canopy. The next point is then read and its distance to each Canopy in the set is computed: if the distance is less than T1, the point is assigned to that Canopy (a point can be assigned to several Canopies), and if the distance is less than T2, the point can no longer be added to the set as a new Canopy.
S5.3: following the Canopy algorithm, compute the distance D from each new sample to every center point. When D < T1, the sample is placed in the corresponding cluster; when D < T2, the sample is removed from the new sample set; if the distances D1 to DK are all greater than T1, the point is made a new center point, forming a new cluster (customer group). The computation loops until the new sample data set is empty;
S5.4: add the new center points to the original K-means center points, to serve as the center points for the next online clustering;
S5.5: after a long period, center points computed in this way may become inaccurate, so Step 5 needs to be re-applied to all data for a full recomputation; using the existing center points as the initial cluster centers speeds up convergence.
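Substeps S5.1 to S5.4 can be sketched as a Canopy pass seeded with the existing centers. The patent leaves some details open, so the following are assumptions: Euclidean distance is used, a point within T1 of a center but not within T2 stays in the pending set and may also join Canopies created later, and new clusters are numbered `max(existing) + 1`. The name `online_canopy` is hypothetical.

```python
import math


def online_canopy(new_samples, centers, t1, t2):
    """Group new samples around existing centers, Canopy-style (t1 > t2).

    centers : dict of integer cluster ID -> center vector; extended in place
              when a sample is at least t1 away from every center (S5.4).
    Returns a dict cluster ID -> list of member vectors.
    """
    if not t1 > t2:
        raise ValueError("t1 must be greater than t2")
    members = {cid: [] for cid in centers}
    pending = [list(v) for v in new_samples]

    def sweep(cid, points):
        """Compare points with one center; return those still pending."""
        remaining = []
        for v in points:
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, centers[cid])))
            if d < t1:
                members[cid].append(v)   # loose membership, may repeat elsewhere
            if d >= t2:
                remaining.append(v)      # only points within t2 leave the set
        return remaining

    for cid in list(centers):            # S5.1: existing centers seed the Canopies
        pending = sweep(cid, pending)
    while pending:                       # S5.3: loop until the set is empty
        v = pending.pop(0)
        far = all(
            math.sqrt(sum((a - b) ** 2 for a, b in zip(v, c))) >= t1
            for c in centers.values()
        )
        if far:                          # no existing center is close: new cluster
            new_id = max(centers) + 1
            centers[new_id] = v
            members[new_id] = [v]
            pending = sweep(new_id, pending)
    return members
```

After the call, `centers` holds the original K centers plus any new ones, ready for the next online pass. No K-means iteration is involved, only distance comparisons against a small set of centers.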
Step 6: evaluate the performance of the online real-time grouping obtained in Step 5.
The study used 2 master nodes, 5 compute-and-storage nodes, and 1 data acquisition node. The data set contained 74,920,323 records occupying 2.5 GB of disk space. Ten iterations of K-means clustering took about 150 minutes; online clustering took about 9.39 minutes. The online clustering method proposed in this patent thus ensures that incremental Canopy clustering runs quickly: newly collected energy data reflecting customers' electricity-use characteristics can be assigned rapidly to the cluster they belong to, which is the key point of the online clustering proposed by the invention.
The difficulty in implementing power customer grouping lies in the online processing of mass data. The data to be clustered for customer grouping are very large. A large amount of computing resources could be used to cluster all the data continuously, which would improve clustering accuracy and provide some real-time capability, but this is clearly very wasteful. The scheme adopted in this patent, which combines online clustering with K-means, is an inexpensive and fast solution. The algorithm effectively avoids the problems caused by excessive data volume, namely high software and hardware requirements and high system resource usage.

Claims (3)

1. An online power customer grouping method based on mass measurement data, comprising the following steps:
Step 1: extract historical sample data of power customers;
Step 2: preprocess the extracted sample data;
Step 3: perform initial customer grouping on the historical sample data of power customers;
Step 4: extract in real time, from the metering automation system, online power customer information and real-time consumption data reflecting the online customers' electricity-use characteristics, and preprocess them;
Step 5: obtain the preprocessed online power customer data and, on the basis of the customer groups generated in Step 3, use the generated cluster center points to group the newly added online power customer data online in real time.
2. The online power customer grouping method based on mass measurement data according to claim 1, characterized in that the method further comprises Step 6: evaluating the performance of the online real-time grouping obtained in Step 5.
3. The online power customer grouping method based on mass measurement data according to claim 1 or 2, characterized in that the historical sample data of power customers in Step 1 comprise customer profile information, customer energy information, and customer load information;
The data preprocessing in Step 2 comprises missing-value handling, outlier handling, calculation of the grouping indicators, and data normalization;
S2.1: missing-value handling
Some values are found to be missing in the raw measurement data; to ensure the modeling data are valid, the missing data must be filled in;
S2.2: outlier handling
Data outside the threshold range of an indicator are corrected using data from days of the same type combined with an interpolation algorithm;
S2.3: calculation of the grouping indicators
Since load fluctuation largely characterizes a customer's electricity-use pattern, indicators reflecting load variation over a given period are calculated from the energy and load data:
Load factor = average load / peak load
Peak-to-total ratio = peak energy / total energy
Flat-to-total ratio = flat energy / total energy
Valley-to-total ratio = valley energy / total energy
Where:
[Formula image FDA00002454555200021 in the original: definition of average load]
Peak load = $\max(L_i)$, $i = 1, 2, \ldots, 96$, where $L_i$ is the load value sampled every 15 minutes; peak, flat, and valley energy are the energy consumed during the city's peak, flat, and valley periods respectively;
The peak period is the time of high electricity use, when consumption is relatively concentrated; the valley period is the opposite. Peak period, 8 hours: 9:00-12:00 and 17:00-22:00; flat period, 7 hours: 8:00-9:00, 12:00-17:00, and 22:00-23:00; valley period, 9 hours: 23:00 to 8:00 the next day;
S2.4: data normalization
To remove the differences in scale between the grouping indicators, the data are normalized; the main methods are the min-max method, the zero-mean method, and the decimal scaling method;
Step 3 comprises the following sub-steps:
S3.1: standardization of the sample data
Data standardization means converting the preprocessed sample data into vector data; the vector data comprise the large customer's load rate, peak-to-total ratio, flat-to-total ratio and valley-to-total ratio;
Figure FDA00002454555200022
where d1 is the load rate, d2 is the peak-to-total ratio, d3 is the flat-to-total ratio and d4 is the valley-to-total ratio;
The vector data are stored in a distributed file system; during standardization, the MapReduce scheduler splits the sample data file into several data blocks according to its size, and Map tasks are started according to the number of data blocks to carry out the standardization conversion in parallel;
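The block-wise parallel conversion described above can be imitated on a single machine as sketched below. This is only a stand-in for the idea: a thread pool plays the role of the Map tasks, and no Hadoop or MapReduce API is used. The record keys, `block_size` and the function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Vector = Tuple[float, float, float, float]  # (d1, d2, d3, d4)


def to_vector(record: Dict[str, float]) -> Vector:
    """Convert one preprocessed record into the vector (d1, d2, d3, d4)."""
    return (record["load_rate"], record["peak_ratio"],
            record["flat_ratio"], record["valley_ratio"])


def split_blocks(records: List[Dict[str, float]],
                 block_size: int) -> List[List[Dict[str, float]]]:
    """Split the sample data into blocks of at most block_size records."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [records[i:i + block_size]
            for i in range(0, len(records), block_size)]


def map_block(block: List[Dict[str, float]]) -> List[Vector]:
    """Work done by one 'Map task': convert every record of one block."""
    return [to_vector(record) for record in block]


def standardize_parallel(records: List[Dict[str, float]],
                         block_size: int = 2) -> List[Vector]:
    """Start one worker per block and concatenate the results in order."""
    blocks = split_blocks(records, block_size)
    if not blocks:
        return []
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        mapped = pool.map(map_block, blocks)  # preserves block order
        return [vector for block in mapped for vector in block]
```

In a real deployment the number of blocks would follow from the file size and the block size of the file system rather than from a record count.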
S3.2: distributed storage
The distributed file system adopts a master/slave architecture; an HDFS cluster consists of one Namenode and a number of Datanodes; the Namenode is a central server that manages the namespace of the file system and client access to files; there is generally one Datanode per node in the cluster, and it manages the storage on the node where it runs; HDFS exposes the namespace of the file system, so that users can store data on it in the form of files; internally, a file is split into one or more data blocks, and these blocks are stored on a set of Datanodes; the Namenode executes the namespace operations of the file system, such as opening, closing and renaming files or directories; it also determines the mapping of data blocks to specific Datanodes; the Datanodes serve the read and write requests of file system clients and, under the unified scheduling of the Namenode, create, delete and replicate data blocks;
S3.3: initialization of the cluster center points
Customer grouping is performed mainly with clustering algorithms that can be parallelized and computed in a distributed manner; the K-means algorithm is described below as an example;
The clustering algorithm first generates empty clusters and numbers them, then randomly selects K objects from the whole sample data set as the center points of the K-means clusters, each cluster center point representing its cluster;
S3.4: iterative calculation of the optimal cluster center points
New cluster center points are calculated repeatedly by iteration until the distance between every sample and its center point is minimal;
S3.5: output of the grouping data
The previous step yields the cluster center points through repeated iteration, together with the cluster to which each sample belongs; these can be output directly;
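Sub-steps S3.3 to S3.5 correspond to the usual K-means loop; a single-machine sketch is given below. The distributed (MapReduce) execution is left out, Euclidean distance is assumed because the claim does not name a distance measure, and the stopping rule "centers no longer change, or `max_iter` reached" is a common choice rather than one stated in the claim. All names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda k: distance(point, centers[k]))


def kmeans(points: List[Sequence[float]], k: int,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Return (cluster centers, cluster index of each sample)."""
    rng = random.Random(seed)
    # S3.3: pick K samples at random as the initial center points.
    centers = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        # S3.4: assign every sample to its nearest center ...
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # ... and move each center to the mean of its members.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    # S3.5: output the centers and the cluster of each sample.
    return centers, [nearest(p, centers) for p in points]
```

The returned centers are exactly the K cluster center points that the online stage reuses later.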
In step 4, after the initial grouping of the historical sample data, customer information and real-time electricity data reflecting the customers' electricity-use characteristics are extracted periodically from the metering automation system as the application requires, and the real-time electricity data are preprocessed by the method of step 2;
Step 5 comprises the following sub-steps:
On the basis of the customer groups generated in step 3, the K cluster center points already generated are used, and the Canopy algorithm is adopted to group the newly added data online in real time; the specific steps are as follows:
S5.1: according to the existing cluster center points of the groups, K Canopy clusters are generated, the initial center of each cluster being an existing cluster center point;
S5.2: suitable parameters T1 and T2 are specified, and all new data are placed into the Canopy clusters for cluster calculation;
The Canopy algorithm first requires two thresholds T1 and T2 as input, with T1 > T2; the algorithm maintains a set of Canopy clusters, which is empty at the start; the first point read is put into the set as a Canopy; the next point is then read and its distance to each Canopy in the set is calculated; if this distance is less than T1, the point is assigned to that Canopy, and if this distance is less than T2, the point can no longer be put into the set as a new Canopy;
S5.3: according to the Canopy algorithm, the distance D from each new sample to each center point is calculated; when D < T1, the sample is put into the corresponding cluster; when D < T2, the sample is deleted from the new sample set; if the distances D1 to DK are all greater than T1, the point itself is made a new center point, thereby forming a new customer cluster; the calculation is repeated until the new sample data set is empty.
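Sub-steps S5.1 to S5.3 can be sketched as a single pass over the new samples, seeded with the K existing center points. This is one possible reading, offered under stated assumptions: Euclidean distance is assumed; a sample with D < T1 is assigned to its nearest center only; a sample with T2 <= D < T1 is treated as processed after being assigned (the claim leaves its later handling open); and D equal to T1, which the claim does not cover, is handled like D > T1. The names `assign_online` and `tightly_bound` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_online(new_points: List[Sequence[float]],
                  centers: List[Sequence[float]],
                  t1: float, t2: float
                  ) -> Tuple[List[List[float]], List[Tuple[int, bool]]]:
    """Group new samples against existing centers with thresholds T1 > T2.

    Returns the (possibly extended) center list and, for each new sample
    in input order, a pair (cluster index, tightly_bound), where
    tightly_bound is True when the sample lies within T2 of its center.
    """
    if not t1 > t2:
        raise ValueError("T1 must be greater than T2")
    if not centers:
        raise ValueError("at least one existing center is required")
    # S5.1: the existing cluster centers seed the Canopy clusters.
    current = [list(c) for c in centers]
    assignments: List[Tuple[int, bool]] = []
    pending = list(new_points)
    # S5.2 / S5.3: process samples until the new sample set is empty.
    while pending:
        point = pending.pop(0)
        dists = [distance(point, c) for c in current]
        k = min(range(len(dists)), key=dists.__getitem__)
        if dists[k] < t1:
            assignments.append((k, dists[k] < t2))
        else:
            # No center is within T1: the sample starts a new cluster.
            current.append(list(point))
            assignments.append((len(current) - 1, True))
    return current, assignments
```

Because new centers are appended during the pass, later samples can join a cluster that was created by an earlier new sample, which is what lets new customer groups emerge without rerunning the initial clustering.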
CN2012104847126A 2012-11-23 2012-11-23 Power customer online grouping method based on mass measurement data Pending CN102982489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012104847126A CN102982489A (en) 2012-11-23 2012-11-23 Power customer online grouping method based on mass measurement data

Publications (1)

Publication Number Publication Date
CN102982489A true CN102982489A (en) 2013-03-20

Family

ID=47856443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012104847126A Pending CN102982489A (en) 2012-11-23 2012-11-23 Power customer online grouping method based on mass measurement data

Country Status (1)

Country Link
CN (1) CN102982489A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101728868A (en) * 2008-10-31 2010-06-09 韩国电力公社 Method for classification and forecast of remote measuring power load patterns
US20100332210A1 (en) * 2009-06-25 2010-12-30 University Of Tennessee Research Foundation Method and apparatus for predicting object properties and events using similarity-based information retrieval and modeling

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
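Liu Youbo (刘友波) et al.: "Calculation of characteristic attributes of electricity-consumption clusters based on multi-objective clustering", Automation of Electric Power Systems (《电力系统自动化》) *
Li Ying'an (李应安): "Research on the parallelization of clustering algorithms based on MapReduce", China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) *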

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366312A (en) * 2013-07-15 2013-10-23 国家电网公司 Intelligent transformer substation cloud system
CN103366312B (en) * 2013-07-15 2016-08-10 国家电网公司 A kind of intelligent transformer substation cloud system
CN103810644A (en) * 2013-12-13 2014-05-21 广东电网公司电力科学研究院 Directional power supply method and device
CN105005570A (en) * 2014-04-23 2015-10-28 国家电网公司 Method and apparatus for mining massive intelligent power consumption data based on cloud computing
CN105005570B (en) * 2014-04-23 2018-02-16 国家电网公司 Magnanimity intelligent power data digging method and device based on cloud computing
CN104750861A (en) * 2015-04-16 2015-07-01 中国电力科学研究院 Method and system for cleaning mass data of energy storage power station
CN104750861B (en) * 2015-04-16 2019-05-21 中国电力科学研究院 A kind of energy-accumulating power station mass data cleaning method and system
CN105681089A (en) * 2016-01-26 2016-06-15 上海晶赞科技发展有限公司 Network user behavior clustering method, device and terminal
CN105681089B (en) * 2016-01-26 2019-10-18 上海晶赞科技发展有限公司 Networks congestion control clustering method, device and terminal
CN105844294A (en) * 2016-03-21 2016-08-10 全球能源互联网研究院 Electricity usage behavior analysis method based on FCM cluster algorithm
CN106022592B (en) * 2016-05-16 2021-12-28 中国电子科技集团公司电子科学研究院 Electricity consumption behavior abnormity detection and public security risk early warning method and device
CN106022592A (en) * 2016-05-16 2016-10-12 中国电子科技集团公司电子科学研究院 Power consumption behavior anomaly detection and public security risk early warning method and device
CN106405224A (en) * 2016-08-24 2017-02-15 广东电网有限责任公司电力科学研究院 Method and system for energy-saving diagnosis based on bulk electric energy data
CN108268876A (en) * 2016-12-30 2018-07-10 广东精点数据科技股份有限公司 A kind of detection method and device of the approximately duplicate record based on cluster
CN107248086A (en) * 2017-02-21 2017-10-13 国网江苏省电力公司南通供电公司 Advertisement putting aided analysis method based on user power utilization behavioural analysis
CN107274066A (en) * 2017-05-19 2017-10-20 浙江大学 A kind of shared traffic Customer Value Analysis method based on LRFMD models
CN107274025B (en) * 2017-06-21 2020-09-11 国网山东省电力公司诸城市供电公司 System and method for realizing intelligent identification and management of power consumption mode
CN107274025A (en) * 2017-06-21 2017-10-20 国网山东省电力公司诸城市供电公司 A kind of system and method realized with power mode Intelligent Recognition and management
CN107391728B (en) * 2017-08-02 2020-07-31 北京京东尚科信息技术有限公司 Data mining method and data mining device
CN107391728A (en) * 2017-08-02 2017-11-24 北京京东尚科信息技术有限公司 Data digging method and data mining device
CN108009224A (en) * 2017-11-24 2018-05-08 国网北京市电力公司 The sorting technique and device of power customer
CN109241190A (en) * 2018-09-12 2019-01-18 国网江苏省电力有限公司苏州供电分公司 Electric power big data mixes computing architecture
CN109636101A (en) * 2018-11-02 2019-04-16 国网辽宁省电力有限公司朝阳供电公司 Large user's electricity consumption behavior analysis method under opening sale of electricity environment based on big data
CN112035715A (en) * 2020-07-10 2020-12-04 广西电网有限责任公司 User label design method and device
CN112035715B (en) * 2020-07-10 2023-04-14 广西电网有限责任公司 User label design method and device
CN112712442A (en) * 2020-12-30 2021-04-27 国网浙江省电力有限公司营销服务中心 Power consumer ultra-tolerant diagnosis method based on multidimensional clustering
CN112712442B (en) * 2020-12-30 2023-11-07 国网浙江省电力有限公司营销服务中心 Multi-dimensional clustering-based power consumer super-capacity diagnosis method
CN116431931A (en) * 2023-06-14 2023-07-14 陕西思极科技有限公司 Real-time incremental data statistical analysis method
CN116431931B (en) * 2023-06-14 2023-08-25 陕西思极科技有限公司 Real-time incremental data statistical analysis method

Similar Documents

Publication Publication Date Title
CN102982489A (en) Power customer online grouping method based on mass measurement data
CN116646933B (en) Big data-based power load scheduling method and system
CN110231528B (en) Transformer household variation common knowledge identification method and device based on load characteristic model library
CN105005570B (en) Magnanimity intelligent power data digging method and device based on cloud computing
US20120130659A1 (en) Analysis of Large Data Sets Using Distributed Polynomial Interpolation
CN104317800A (en) Hybrid storage system and method for mass intelligent power utilization data
CN105678398A (en) Power load forecasting method based on big data technology, and research and application system based on method
CN102999791A (en) Power load forecasting method based on customer segmentation in power industry
CN103955509A (en) Quick search method for massive electric power metering data
CN112614011B (en) Power distribution network material demand prediction method and device, storage medium and electronic equipment
CN111680841B (en) Short-term load prediction method, system and terminal equipment based on principal component analysis
CN114416855A (en) Visualization platform and method based on electric power big data
JP2015002588A (en) Power consumption management system and method
CN108898248B (en) Power load influence factor quantitative analysis method, device, equipment and medium
CN106250206A (en) A kind of resource pool automatic measurement & calculation method based on virtual machine
Dong et al. Forecasting smart meter energy usage using distributed systems and machine learning
CN112101689A (en) Day-ahead intra-day scheduling method considering multi-type demand response uncertainty
CN113919655A (en) Law enforcement personnel scheduling method, system, computer device and storage medium
CN115205068A (en) Energy storage optimal peak-valley time interval dividing method considering net load demand distribution
Oprea et al. Big data processing for commercial buildings and assessing flexibility in the context of citizen energy communities
CN204066111U (en) A kind of quick retrieval system of magnanimity electric-power metering data
CN107590747A (en) Power grid asset turnover rate computational methods based on the analysis of comprehensive energy big data
CN115146744B (en) Electric energy meter load real-time identification method and system integrating time characteristics
CN104158175A (en) Calculation method for real-time electricity classified load of power system distribution transformer terminal
CN104978604B (en) A kind of analog simulation method and device based on professional ability model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130320