CN106547882A - Real-time processing method and system for marketing big data in a smart grid


Publication number
CN106547882A
Authority
CN
China
Prior art keywords
data
marketing
real
intelligent grid
time processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610953688.4A
Other languages
Chinese (zh)
Inventor
杨云
吕跃春
朱珠
罗春雷
聂静
吴彬
张晓勇
雷娟
张伟
晏尧
徐鑫
徐光侠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Chongqing Electric Power Co Ltd
State Grid Corp of China SGCC
State Grid Chongqing Electric Power Co Ltd
Original Assignee
Electric Power Research Institute of State Grid Chongqing Electric Power Co Ltd
State Grid Corp of China SGCC
State Grid Chongqing Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of State Grid Chongqing Electric Power Co Ltd, State Grid Corp of China SGCC, and State Grid Chongqing Electric Power Co Ltd
Priority to CN201610953688.4A
Publication of CN106547882A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

A real-time processing method and system for marketing big data in a smart grid. The proposed system comprises five parts: a data acquisition module, a data processing module, a data storage module, a business logic module, and a display module. The streaming computation model and the batch computation model are combined so that they can interoperate easily, which improves the processing services available for marketing data. The method helps process the massive real-time data produced in a smart grid in a timely manner and mines the similarities among the various kinds of marketing data in depth. It allows more personalized, higher-quality services to be offered to grid users and also provides a reliable safeguard for the grid system.

Description

Real-time processing method and system for marketing big data in a smart grid
Technical field
The present invention relates to the field of power grid data processing, and in particular to a method and system for processing power grid marketing data.
Background
With the sustained growth of data and the rapid expansion of electric power marketing systems, China's electric power industry has entered the big data era. As the construction of big data marketing systems in the smart grid deepens, the volume of data produced by grid operation and equipment detection/monitoring and the amount of personal information on grid end users are growing rapidly, and real-time processing of smart grid marketing big data has become a common concern of power utilities. In a smart grid environment, everything from generation, transmission, and transformation to the collection of customers' electricity usage information requires real-time processing. Real-time big data processing generally falls into two classes: real-time stream data analysis and real-time batch processing.
The typical platform for real-time stream data processing is Storm. Storm is a distributed, fault-tolerant real-time computation system. It makes it easy to write complex real-time computations on a computer cluster, guarantees that every message is processed, and can process millions of messages per second on a small cluster. Its main features are as follows:
1) Simple programming model. Storm uses a development model similar to MapReduce and gives users a parallel framework for stream data processing, which reduces the complexity of real-time stream processing.
2) Support for multiple programming languages. Many languages can be used on Storm, including Clojure, Java, Ruby, and Python. Supporting another language only requires implementing the Storm communication protocol.
3) High fault tolerance. Storm manages failures of worker processes and nodes and recovers from them quickly.
4) Horizontal scalability. Computation runs in parallel across multiple threads, processes, and servers.
5) Reliable message processing. Storm guarantees that every message is fully processed at least once. When a task fails, the message is replayed from the message source.
6) Speed. The system is designed so that messages are processed quickly, and it uses a message queue mechanism at its lowest layer.
The typical batch computation model is Hadoop, a distributed system architecture. The traditional big data batch processing approach that Hadoop represents requires frequent disk I/O, so its computational efficiency is low and it struggles to meet the needs of online condition monitoring and assessment in power systems. To address such problems, the in-memory computing platform Spark has been widely adopted. Spark is an open-source cluster computing system based on in-memory computation, whose purpose is to make data analysis faster. Spark uses a cluster computing framework similar to Hadoop's, but it is suited to a particular class of workloads: those that share a working dataset across multiple parallel iterative operations, such as machine learning algorithms. To optimize such computations, Spark introduces in-memory cluster computing, that is, it caches datasets in memory to reduce disk access latency. Spark computations use resilient distributed datasets (RDDs) to improve efficiency. An RDD is a read-only collection of objects distributed across a set of nodes. These collections can be rebuilt if part of the dataset is lost, which gives Spark its fault tolerance. Rebuilding part of a dataset relies on maintained lineage: by recording how the data was generated, the lost part can be reconstructed. In Spark, an RDD can be obtained in four ways: as Scala objects created from a file in HDFS (the Hadoop Distributed File System); as a parallelized slice of data distributed across the nodes; by transforming another RDD; or by changing the persistence of an existing RDD, for example by caching it in memory.
On some specific tasks, Spark is 1 to 2 orders of magnitude more efficient than Hadoop, and for in-memory workloads it has been reported to run up to 100 times faster than Hadoop MapReduce.
Spark Streaming is a framework for stream data processing built on Spark. Its basic principle is to divide the stream into small time slices (a few seconds each) and process each small portion of data in a batch-like manner. Spark Streaming is built on Spark for two reasons. First, Spark's low-latency execution engine (100 ms and up) can be used for real-time computation. Second, compared with record-based processing frameworks such as Storm, RDDs make efficient fault-tolerant processing easier. In addition, small-batch processing keeps the logic and algorithms of batch processing and real-time processing compatible, which is convenient for applications that need to analyze historical and real-time data together.
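The micro-batch principle described above can be illustrated with a minimal sketch in plain Python. It does not use the Spark Streaming API; the function name micro_batches, the (timestamp, payload) event shape, and the 2-second interval are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_batches(events: Iterable[Tuple[float, str]], interval: float) -> List[List[str]]:
    """Group (timestamp_seconds, payload) events into consecutive fixed-length time slices."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        # Events whose timestamps fall in the same interval share one small batch.
        buckets[int(ts // interval)].append(payload)
    # Return the batches in time order; each could then be handed to a batch job.
    return [buckets[k] for k in sorted(buckets)]


# Hypothetical stream of five events, sliced into 2-second batches.
events = [(0.2, "a"), (1.7, "b"), (2.1, "c"), (2.9, "d"), (5.0, "e")]
batches = micro_batches(events, interval=2.0)
print(batches)
```

Each inner list plays the role of one small batch that a batch-style job would process as a unit.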
Existing smart grid marketing data processing systems mostly use a single-machine, centralized approach. They mainly process structured, non-real-time data and collect and process only part of the business data in real time. Their capacity for data storage, processing, exchange, presentation, and interaction has limited room for improvement, and massive data assets are not used reasonably and efficiently. They also lack stream processing, so they cannot support the real-time data needs of applications across the grid's domains.
Summary of the invention
One object of the present invention is to provide a real-time processing method for marketing big data in a smart grid, which analyzes and processes grid big data and provides technical support for grid marketing.
This object is achieved by the following technical scheme, whose specific steps are as follows:
1) Multiple servers are organized into a Kafka cluster that receives the grid marketing data collected by Flume. A partition function selects a partition for the grid marketing data, and after partitioning the data is stored on the corresponding server;
2) For the partitioned grid marketing data, Storm performs a quick diagnosis and assessment in a stream computing manner to determine the real-time requirement of the grid marketing data;
3) The diagnosed and assessed grid marketing data is processed in real time: data with a low timeliness requirement is processed with MapReduce, and data with a high real-time requirement is processed with a Spark-based K-Means clustering algorithm to analyze the information hidden in the data;
4) The output of step 3) is written to HBase, and the corresponding business logic is set up.
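The routing decision in step 3) can be sketched as follows. This is only an illustrative assumption: the patent does not say how the real-time requirement is encoded, so the field name max_latency_s, the threshold value, and the two path labels are hypothetical.

```python
from typing import Dict, List


def route(records: List[dict], threshold_s: float) -> Dict[str, List[dict]]:
    """Split records between the in-memory path and the batch path.

    Assumption: each record carries a hypothetical 'max_latency_s' field that
    states how quickly a result is needed.
    """
    routed: Dict[str, List[dict]] = {"spark_kmeans": [], "mapreduce": []}
    for record in records:
        # Tight deadlines go to the Spark-based K-Means path, the rest to MapReduce.
        target = "spark_kmeans" if record["max_latency_s"] <= threshold_s else "mapreduce"
        routed[target].append(record)
    return routed


records = [
    {"id": "monthly-billing", "max_latency_s": 3600.0},
    {"id": "load-alert", "max_latency_s": 2.0},
]
routed = route(records, threshold_s=10.0)
print({path: [r["id"] for r in items] for path, items in routed.items()})
```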
Further, the specific method of selecting a partition for the grid marketing data with the partition function in step 1) is as follows:
A Flume interceptor reads the Key value (the key) in the event header of the grid marketing data, and the partition is then selected according to that key.
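Key-based partition selection can be sketched in plain Python as below. The sketch does not call Flume or Kafka. The CRC32 hash, the event dictionary layout, and the function name select_partition are assumptions for illustration, and the patent does not specify the actual partition function.

```python
import zlib
from typing import Dict, List


def select_partition(key: str, num_partitions: int) -> int:
    """Map a header key to a partition index with a stable hash (CRC32, an assumed choice)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical events: the key sits in the header, the reading in the body.
events = [
    {"headers": {"key": "residential"}, "body": "meter=1001,kwh=3.2"},
    {"headers": {"key": "commercial"}, "body": "meter=2001,kwh=41.0"},
    {"headers": {"key": "residential"}, "body": "meter=1002,kwh=2.7"},
]

partitions: Dict[int, List[str]] = {i: [] for i in range(3)}
for event in events:
    index = select_partition(event["headers"]["key"], num_partitions=3)
    partitions[index].append(event["body"])
print(partitions)
```

Because the hash is stable, events that share a key always land in the same partition, which keeps each class of marketing data together on one server.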
Further, the specific method by which Storm processes data in a stream computing manner in step 2) comprises the following steps: build the topology of the grid marketing data processing flow, then apply denoising, feature quantity calculation, and result analysis to the grid marketing data in turn.
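The three processing stages can be sketched as three plain Python functions applied in sequence. The patent does not name a denoising method, a feature set, or an analysis rule, so the median filter, the mean/peak/min features, and the peak-ratio rule below are assumptions chosen only to make the order of the stages concrete.

```python
from statistics import mean, median
from typing import Dict, List


def denoise(readings: List[float], window: int = 3) -> List[float]:
    """Sliding median filter (an assumed denoising method) that suppresses isolated spikes."""
    half = window // 2
    cleaned = []
    for i in range(len(readings)):
        lo, hi = max(0, i - half), min(len(readings), i + half + 1)
        cleaned.append(median(readings[lo:hi]))
    return cleaned


def features(readings: List[float]) -> Dict[str, float]:
    """Compute a few illustrative feature quantities."""
    return {"mean": mean(readings), "peak": max(readings), "min": min(readings)}


def analyze(feats: Dict[str, float], peak_ratio: float = 1.5) -> str:
    """Label the load profile with a hypothetical rule: peaky if peak exceeds peak_ratio times the mean."""
    return "peaky" if feats["peak"] > peak_ratio * feats["mean"] else "flat"


raw = [1.0, 1.1, 9.0, 1.2, 1.0]  # hypothetical kWh readings with one spike
clean = denoise(raw)
print(analyze(features(raw)), analyze(features(clean)))
```

In this example the spike dominates the raw features, while the denoised series yields a flat profile, which is why denoising runs before feature calculation.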
Further, the data obtained from the processing in step 2) is stored in HDFS, the distributed file storage system of Hadoop.
Further, the specific steps of the Spark-based K-Means clustering algorithm in step 3) are as follows:
3-1) The file blocks in HDFS are read into memory, and each block is converted into a resilient distributed dataset (RDD) that contains the set of feature quantities of the monitoring data;
3-2) A map operation is applied to the RDD. The monitoring data is abstracted as a feature vector Vector, in which each dimension corresponds to one dimension of a Point. The cluster number (Class) of each Vector (Point) is computed, where Class is the number of a cluster center, and the key-value pair (K, V) is output as (Class, (Point, 1)), where K denotes the key, which is the number of each dimension of the data, and V denotes the value, which is the actual value of the data. A new RDD is generated from these pairs;
3-3) Each new RDD is shuffled so that the data of the same cluster is stored together, and each cluster center is computed within the RDD;
3-4) The distance between each center and the previous center is evaluated. If it meets the requirement, the algorithm terminates; otherwise it repeats from the second step until the termination condition is met;
3-5) The output is written to HDFS.
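Steps 3-2) to 3-4) can be sketched in plain Python that mirrors the map and reduce-by-key pattern without using the Spark API. The function names, the fixed initial centers (chosen instead of random ones so that the run is reproducible), and the tolerance value are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def closest(point: Point, centers: Sequence[Point]) -> int:
    """Return the index (Class) of the nearest center by squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           tol: float = 1e-6, max_iter: int = 100) -> List[Point]:
    centers = list(centers)
    for _ in range(max_iter):
        # Map: emit (Class, (Point, 1)) for every point.
        pairs = [(closest(p, centers), (p, 1)) for p in points]
        # Reduce by key: add up the points and the counts of each cluster.
        sums: Dict[int, Tuple[Point, int]] = {}
        for cls, (p, n) in pairs:
            if cls in sums:
                acc, cnt = sums[cls]
                sums[cls] = (tuple(a + b for a, b in zip(acc, p)), cnt + n)
            else:
                sums[cls] = (p, n)
        # New center is the mean of its members; an empty cluster keeps its old center.
        new_centers = [tuple(s / sums[j][1] for s in sums[j][0]) if j in sums else centers[j]
                       for j in range(len(centers))]
        # Stop when no center moved farther than the tolerance.
        shift = max(sum((a - b) ** 2 for a, b in zip(old, new)) ** 0.5
                    for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Carrying the count 1 alongside each point lets the reduce step compute a cluster's sum and size in one pass, which is the purpose of the (Point, 1) pair in step 3-2).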
Further, the specific method of computing each cluster center within the RDD in step 3-3) is as follows:
A total of k cluster centers are selected at random and denoted $\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^n$, where $\mathbb{R}^n$ denotes the n-dimensional real vector space.
The following procedure is repeated until convergence. For each sample i, the class it should belong to is computed as $c^{(i)} := \arg\min_j \|x^{(i)} - \mu_j\|^2$, where $c^{(i)}$ denotes the class of the i-th sample, $x^{(i)}$ denotes the feature vector of the i-th sample, and $\mu_j$ denotes the feature vector of the j-th cluster center. For each class j, the center of that class is recomputed as $\mu_j := \frac{1}{m}\sum_{i:\,c^{(i)}=j} x^{(i)}$, where m denotes the total number of feature vectors in the j-th class.
Another object of the present invention is to provide a real-time processing system for marketing big data in a smart grid, which performs intelligent analysis of the marketing data in the grid and provides data support for grid marketing.
This object is achieved by the following technical scheme, which comprises:
A data acquisition module, for receiving through a Kafka cluster the grid marketing data collected by Flume and integrating the data sources;
A data processing module, for first processing data in a stream computing manner with a Storm cluster and then performing real-time computation with the Spark-based K-Means clustering algorithm;
A data storage module, for writing the results output by the data processing module to HBase;
A business logic module, for implementing user management and user permission logic;
A display module, for providing a Web-based interactive interface.
Further, the Storm cluster consists of one master node and multiple worker nodes. The master node is Nimbus, which is responsible for task assignment, code distribution, and cluster monitoring. Each worker node is a Supervisor, which corresponds to one physical machine and is used to start processes.
Further, each worker node runs multiple processes, and each process contains multiple threads.
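The node, process, and thread hierarchy described above can be modeled with a few data classes. This is a toy model, not Storm code: the class names follow the roles named in the text, and the round-robin assignment is an assumption made for illustration rather than Storm's actual scheduling policy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Worker:
    """One worker process; each entry in tasks stands for one thread."""
    tasks: List[str] = field(default_factory=list)


@dataclass
class Supervisor:
    """One worker node (one physical machine) that hosts several worker processes."""
    host: str
    workers: List[Worker]


@dataclass
class Nimbus:
    """Master node that assigns tasks to the worker processes of all supervisors."""
    supervisors: List[Supervisor]

    def assign(self, tasks: List[str]) -> None:
        slots = [w for s in self.supervisors for w in s.workers]
        if not slots:
            raise ValueError("no worker processes available")
        # Round-robin placement: an illustrative policy only.
        for i, task in enumerate(tasks):
            slots[i % len(slots)].tasks.append(task)


cluster = Nimbus([
    Supervisor("node-a", [Worker(), Worker()]),
    Supervisor("node-b", [Worker(), Worker()]),
])
cluster.assign(["t0", "t1", "t2", "t3", "t4"])
print([[w.tasks for w in s.workers] for s in cluster.supervisors])
```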
By adopting the above technical scheme, the present invention has the following advantages:
Taking the characteristics of marketing data into account, the present invention designs, at the system level, a hybrid real-time processing architecture that combines Hadoop, Storm, and Spark and completes the real-time processing of marketing big data in a smart grid with higher efficiency. The method helps process the massive real-time data produced in a smart grid in a timely manner and mines the similarities among the various kinds of marketing data in depth. It allows more personalized, higher-quality services to be offered to grid users and also provides a reliable safeguard for the grid system.
Other advantages, objects, and features of the present invention will be set forth to some extent in the description that follows, and to some extent will be apparent to those skilled in the art upon examination of what follows, or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and attained by the following description and the claims.
Description of the drawings
The drawings of the present invention are described as follows.
Fig. 1 is a module diagram of the real-time processing system for marketing big data in a smart grid according to the present invention;
Fig. 2 is an architecture diagram of the Storm cluster in the present invention;
Fig. 3 is a diagram of the real-time processing architecture for marketing big data in a smart grid in the present invention;
Fig. 4 is a topology diagram of the marketing data stream processing designed in the present invention;
Fig. 5 is a schematic diagram of the Spark-based K-Means implementation process in the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
A real-time processing method for marketing big data in a smart grid specifically comprises the following steps:
Step 1, parallel transmission of multiple classes of messages. Each marketing data collection terminal in the smart grid is abstracted as a producer, and a topic is created for each class of marketing data. The producer publishes messages to the designated topic, and a specific partition function selects the partition. Finally, the Kafka cluster serves the messages to consumers. Grid marketing data contains many types of data, such as residential electricity use, commercial electricity use, and large commercial electricity use. Flume preprocesses the various data, and the Kafka message middleware receives it. Within a topic, data is partitioned in a certain Key-Value form and stored on different servers.
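The topic-per-class layout of Step 1 can be sketched with a small in-memory stand-in for the message middleware. It is not the Kafka client API: the class MiniBroker, its methods, the CRC32 partitioner, and the topic and key names are all hypothetical and serve only to show producers publishing to one topic per data class.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class MiniBroker:
    """Toy in-memory broker: one topic per data class, several partitions per topic."""

    def __init__(self, partitions_per_topic: int = 2) -> None:
        self.n = partitions_per_topic
        self.topics: Dict[str, List[List[Tuple[str, str]]]] = defaultdict(
            lambda: [[] for _ in range(self.n)])

    def publish(self, topic: str, key: str, value: str) -> int:
        """Append a message to the partition chosen from its key and return that partition."""
        partition = zlib.crc32(key.encode("utf-8")) % self.n
        self.topics[topic][partition].append((key, value))
        return partition

    def consume(self, topic: str) -> List[Tuple[str, str]]:
        """Return all messages of a topic, partition by partition."""
        return [msg for part in self.topics[topic] for msg in part]


broker = MiniBroker(partitions_per_topic=2)
p1 = broker.publish("residential", "meter-1", "3.2")
p2 = broker.publish("residential", "meter-1", "3.4")
broker.publish("commercial", "shop-7", "12.0")
print(broker.consume("residential"), broker.consume("commercial"))
```

Messages with the same key go to the same partition and keep their order there, while each data class stays isolated in its own topic.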
Step 2, stream processing with Storm. When Storm is used to process data, the topology of the data processing flow, that is, the precedence relations among the processing stages, is designed first. The marketing data is processed in the following order: data acquisition, denoising, feature quantity calculation, and result analysis.
Step 3, real-time processing of the grid data. Tasks with a low timeliness requirement are processed with MapReduce. For tasks with a high real-time requirement, the K-Means clustering algorithm is combined with Spark's in-memory parallel computing to divide the data into different classes and discover the valuable information hidden in the marketing data. Whether the real-time requirement is high or low is determined from statistical data. The Spark-based K-Means clustering algorithm is shown in Fig. 5, and its specific steps are as follows:
(1) The file blocks stored in HDFS are read into memory, and each block is converted into an RDD that contains the set of feature quantities of the monitoring data.
(2) A map operation is then applied to the RDD. The monitoring data is abstracted as a feature vector Vector, in which each dimension corresponds to one dimension of a Point. The cluster number (Class) of each Vector (Point) is computed, where Class is the number of a cluster center, and the key-value pair (K, V) is output as (Class, (Point, 1)), where K denotes the key, which is the number of each dimension of the data, and V denotes the value, which is the actual value of the data. A new RDD is generated from these pairs.
(3) Then, in the reduce operation, each new RDD is shuffled so that the data of the same cluster is stored together, and each cluster center is computed within the RDD.
(4) The distance between each center and the previous center is then evaluated. If it meets the requirement, the algorithm terminates; otherwise it repeats from the second step until the termination condition is met.
(5) Finally, the output is written to HDFS.
The cluster centers are computed as follows. A total of k cluster centers are selected at random and denoted $\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^n$, where $\mathbb{R}^n$ denotes the n-dimensional real vector space.
The following procedure is repeated until convergence. For each sample i, the class it should belong to is computed as $c^{(i)} := \arg\min_j \|x^{(i)} - \mu_j\|^2$, where $c^{(i)}$ denotes the class of the i-th sample, $x^{(i)}$ denotes the feature vector of the i-th sample, and $\mu_j$ denotes the feature vector of the j-th cluster center. For each class j, the center of that class is recomputed as $\mu_j := \frac{1}{m}\sum_{i:\,c^{(i)}=j} x^{(i)}$, where m denotes the total number of feature vectors in the j-th class.
As shown in Fig. 1, the invention discloses a real-time processing system for smart grid marketing big data based on stream computing and batch computing, which comprises:
A grid marketing data acquisition module, which uses Kafka to implement a message queue and integrates the data sources;
A data processing module, which first processes data in a stream computing manner with Storm and then performs real-time computation with the in-memory Spark framework;
A data storage module, which stores the output of the data processing module in HBase;
A business logic module, which implements the system's user management and user permission logic;
A display module, which provides a Web-based interactive interface.
The data acquisition module integrates Flume and Kafka: Flume acts as the message producer (Producer), the messages it produces are stored in Kafka, and the Storm topology (Topology) acts as the message consumer (Consumer).
Preferably, in the real-time processing system proposed by the invention, the stream computing module uses a Storm cluster and the batch computing module mainly uses a Spark cluster; other clusters may of course be used in other embodiments. Their functions are described as follows:
(1) The Storm cluster uses a client/server style architecture made up of one master node and multiple worker nodes. The master node, Nimbus, is responsible for task assignment, code distribution, cluster monitoring, and similar work. Each worker node is a Supervisor, which corresponds to one physical machine and starts workers. Each worker node runs multiple workers; a worker is a process, and each worker contains multiple Tasks, where a Task is a thread. The architecture of the Storm cluster is shown in Fig. 2. Nimbus and the Supervisors are fail-fast and stateless, so a crashed node can be restarted immediately without affecting the operation of the system. Coordination between the master node and the worker nodes is handled by ZooKeeper.
(2) Spark is a cluster computing system based on in-memory computation, whose purpose is to make data analysis faster. Spark uses a cluster computing framework similar to Hadoop's, but it is suited to a particular class of workloads: those that share a working dataset across multiple parallel iterative operations, such as machine learning algorithms. To optimize such computations, Spark introduces in-memory cluster computing, that is, it caches datasets in memory to reduce disk access latency. Spark computations use resilient distributed datasets (RDDs) to improve efficiency. An RDD is a read-only collection of objects distributed across a set of nodes. These collections can be rebuilt if part of the dataset is lost, which gives Spark its fault tolerance. Rebuilding part of a dataset relies on maintained lineage: by recording how the data was generated, the lost part can be reconstructed. In Spark, an RDD can be obtained in four ways: as Scala objects created from a file in HDFS; as a parallelized slice of data distributed across the nodes; by transforming another RDD; or by changing the persistence of an existing RDD, for example by caching it in memory. On some specific tasks, Spark is 1 to 2 orders of magnitude more efficient than Hadoop, and for in-memory workloads it has been reported to run up to 100 times faster than Hadoop MapReduce. This is because, while Spark runs, servers can keep intermediate data in RAM and do not need to load it from disk frequently.
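The lineage idea behind RDD fault tolerance can be illustrated with a toy class in plain Python. It is not the Spark API: the class LineageDataset and its methods are hypothetical, and the sketch only shows that a derived dataset can be rebuilt from its recorded parent and transformation after its cached copy is lost.

```python
from typing import Callable, Iterable, List, Optional


class LineageDataset:
    """Toy read-only dataset that remembers how it was derived (its lineage)."""

    def __init__(self, source: Optional[Iterable] = None,
                 parent: Optional["LineageDataset"] = None,
                 fn: Optional[Callable] = None) -> None:
        self._source = list(source) if source is not None else []
        self._parent = parent
        self._fn = fn
        self._cache: Optional[List] = None

    def map(self, fn: Callable) -> "LineageDataset":
        # Record the parent and the transformation instead of computing right away.
        return LineageDataset(parent=self, fn=fn)

    def persist(self) -> "LineageDataset":
        # Keep the computed result in memory.
        self._cache = self.compute()
        return self

    def lose_cache(self) -> None:
        # Simulate losing the in-memory copy, for example after a node failure.
        self._cache = None

    def compute(self) -> List:
        if self._cache is not None:
            return self._cache
        if self._parent is None:
            return list(self._source)
        # Rebuild from lineage: recompute the parent, then reapply the transformation.
        return [self._fn(x) for x in self._parent.compute()]


base = LineageDataset(source=[1.5, 2.0, 4.5])  # hypothetical kWh readings
scaled = base.map(lambda kwh: kwh * 2).persist()
scaled.lose_cache()
rebuilt = scaled.compute()
print(rebuilt)
```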
The real-time processing architecture for marketing big data in a smart grid in the present invention is shown in Fig. 3. The acquired data enters the system as a stream and is processed by Storm in a stream computing manner for quick diagnosis and assessment. After this processing, the data is stored in HDFS, the distributed file storage system of Hadoop. Data analysis tasks with a low timeliness requirement are completed with MapReduce, which processes the data on disk directly. For tasks with a high real-time requirement, the data is read from HDFS, converted into RDDs, and computed with the in-memory Spark framework.
The stream computing topology of the present invention is shown in Fig. 4. A Spout represents a source of marketing data; multiple data sources are supported and are handled separately. A Bolt represents one stage of data processing, such as denoising, feature quantity calculation, or result analysis; different feature calculations and different analyses are expressed as different Bolts. The output of one Bolt can serve as the input of another Bolt.
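The chaining of a Spout and several Bolts can be sketched with Python generators. This is not the Storm API: the helper names spout, bolt, and topology, the clamping step, and the 3.0 kWh threshold are assumptions used only to show one Bolt's output feeding the next.

```python
from typing import Callable, Iterable, Iterator, List


def spout(records: Iterable[float]) -> Iterator[float]:
    """Data source: emits one reading at a time."""
    yield from records


def bolt(fn: Callable) -> Callable[[Iterable], Iterator]:
    """Wrap a per-item function as a processing stage over a stream."""
    def run(stream: Iterable) -> Iterator:
        for item in stream:
            yield fn(item)
    return run


def topology(source: Iterable, bolts: List[Callable[[Iterable], Iterator]]) -> Iterable:
    """Chain the stages so that each bolt consumes the previous stage's output."""
    stream = source
    for stage in bolts:
        stream = stage(stream)
    return stream


# Hypothetical stages: clamp negative readings, then label each reading.
clamp = bolt(lambda kwh: max(kwh, 0.0))
label = bolt(lambda kwh: "high" if kwh > 3.0 else "low")

results = list(topology(spout([1.0, -2.0, 5.0]), [clamp, label]))
print(results)
```

Because each stage only sees a stream, a different feature calculation or analysis can be swapped in as another Bolt without changing the rest of the chain.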
This completes one full pass through the data processing module. The output is then written to HBase, and the corresponding business logic is set up, that is, the system's user management and user permission logic is implemented. Finally, a Web-based interactive interface is provided for staff to run queries.
Taking the characteristics of marketing data into account, the present invention designs, at the system level, a hybrid real-time processing architecture that combines Hadoop, Storm, and Spark and completes the real-time processing of marketing big data in a smart grid with higher efficiency.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments and that it can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced in the invention. No reference sign in the claims should be construed as limiting the claim concerned.
Finally, the above embodiments are intended only to illustrate the technical scheme of the invention and not to limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical scheme can be modified or replaced with equivalents without departing from its purpose and scope, and all such modifications shall fall within the scope of the claims of the invention.

Claims (9)

1. A real-time processing method for marketing big data in a smart grid, characterized in that it comprises the following steps:
1) Multiple servers are organized into a Kafka cluster that receives the grid marketing data collected by Flume. A partition function selects a partition for the grid marketing data, and after partitioning the data is stored on the corresponding server;
2) For the partitioned grid marketing data, Storm performs a quick diagnosis and assessment in a stream computing manner to determine the real-time requirement of the grid marketing data;
3) The diagnosed and assessed grid marketing data is processed in real time: data with a low timeliness requirement is processed with MapReduce, and data with a high real-time requirement is processed with a Spark-based K-Means clustering algorithm to analyze the information hidden in the data;
4) The output of step 3) is written to HBase, and the corresponding business logic is set up.
2. The real-time processing method for marketing big data in a smart grid as claimed in claim 1, characterized in that the specific method of selecting a partition for the grid marketing data with the partition function in step 1) is as follows:
A Flume interceptor reads the Key value (the key) in the event header of the grid marketing data, and the partition is then selected according to that key.
3. The real-time processing method for marketing big data in a smart grid as claimed in claim 1, characterized in that the specific method by which Storm processes data in a stream computing manner in step 2) comprises the following steps: build the topology of the grid marketing data processing flow, then apply denoising, feature quantity calculation, and result analysis to the grid marketing data in turn.
4. The real-time processing method for marketing big data in a smart grid as claimed in claim 1, characterized in that the data obtained from the processing in step 2) is stored in HDFS, the distributed file storage system of Hadoop.
5. The real-time processing method for marketing big data in an intelligent grid as claimed in claim 4, characterized in that the Spark-based K-Means clustering algorithm in step 3) comprises the following steps:
3-1) The file blocks in HDFS are loaded into memory, each block being converted into a resilient distributed dataset (RDD); the RDD contains the set of feature quantities of the monitoring data;
3-2) A map operation is performed on the RDD: the monitoring data are abstracted into feature vectors (Vector), each dimension of a Vector corresponding to a dimension of a Point; the cluster number (Class) of each Vector (Point) is computed, where Class is the number of the corresponding cluster center point; and key-value pairs (K, V) of the form (Class, (Point, 1)) are output, where K denotes the key, namely the number of each dimension of the data, and V denotes the value, namely the actual value of the data; a new RDD is generated from these pairs;
3-3) Each new RDD is shuffled so that data belonging to the same cluster are stored together, and the center point of each cluster is computed within the RDD;
3-4) The distance between each center point and the previous center point is evaluated; if it meets the requirement, the iteration ends; otherwise the procedure starts again from the second step and repeats until the termination condition is met;
3-5) The output result is written to HDFS.
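The data flow of steps 3-2) to 3-4) can be sketched in plain Python, with a list comprehension standing in for the RDD map operation and a dictionary standing in for the shuffle that groups pairs by cluster number. This is an illustrative sketch under those assumptions, not Spark code: the function names, the tolerance `tol`, the iteration cap, and the rule of keeping an empty cluster's old center are all choices made for the example.

```python
import math
from collections import defaultdict


def nearest(point, centers):
    """Return the index (Class) of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(points, centers, tol=1e-6, max_iter=100):
    for _ in range(max_iter):
        # Step 3-2: map each point to a (Class, (Point, 1)) pair.
        pairs = [(nearest(p, centers), (p, 1)) for p in points]
        # Step 3-3: group pairs by Class, summing coordinates and counts.
        sums = defaultdict(lambda: [None, 0])
        for cls, (p, n) in pairs:
            acc = sums[cls]
            acc[0] = list(p) if acc[0] is None else [a + b for a, b in zip(acc[0], p)]
            acc[1] += n
        # New center = coordinate sum / count; an empty cluster keeps its old center.
        new_centers = [
            tuple(s / sums[j][1] for s in sums[j][0]) if j in sums else centers[j]
            for j in range(len(centers))
        ]
        # Step 3-4: stop once no center has moved by more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


points = [(0, 0), (0, 2), (4, 0), (4, 2)]
final = kmeans(points, centers=[(0, 0), (4, 0)])
```

Emitting the count 1 alongside each point is what lets the per-cluster sums and counts be combined associatively, so partial results from different partitions can be merged before the division.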
6. The real-time processing method for marketing big data in an intelligent grid as claimed in claim 5, characterized in that the center point of each cluster is computed within the RDD in step 3-3) as follows:
Randomly select k cluster center points, denoted $\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^n$, where $\mathbb{R}^n$ denotes the n-dimensional real vector space.
Repeat the following procedure until convergence. For each sample i, compute the class to which it should belong,
$$c^{(i)} := \arg\min_{j} \left\lVert x^{(i)} - \mu_j \right\rVert^2,$$
where $c^{(i)}$ denotes the class of the i-th sample, $x^{(i)}$ denotes the feature vector of the i-th sample, and $\mu_j$ denotes the feature vector of the j-th cluster center point. For each class j, recompute its center point,
$$\mu_j := \frac{1}{m} \sum_{i:\, c^{(i)} = j} x^{(i)},$$
where m denotes the total number of feature vectors in the j-th class.
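As a small worked illustration of the assignment and update steps in claim 6, consider four one-dimensional samples and two initial center points. The numbers are hypothetical and chosen only so that the arithmetic is easy to follow.

```latex
\begin{aligned}
&\text{Samples: } x^{(1)} = 1,\quad x^{(2)} = 3,\quad x^{(3)} = 9,\quad x^{(4)} = 11;
\qquad \text{initial centers: } \mu_1 = 0,\quad \mu_2 = 10. \\[4pt]
&\text{Assignment: }
\begin{array}{c|cc|c}
i & (x^{(i)} - \mu_1)^2 & (x^{(i)} - \mu_2)^2 & c^{(i)} \\ \hline
1 & 1 & 81 & 1 \\
2 & 9 & 49 & 1 \\
3 & 81 & 1 & 2 \\
4 & 121 & 1 & 2
\end{array} \\[4pt]
&\text{Update: } \mu_1 := \tfrac{1}{2}(1 + 3) = 2, \qquad \mu_2 := \tfrac{1}{2}(9 + 11) = 10. \\[4pt]
&\text{Second pass: the squared distances to } (\mu_1, \mu_2) = (2, 10) \text{ are }
(1, 81),\ (1, 49),\ (49, 1),\ (81, 1), \\
&\text{so the assignments and the centers are unchanged and the procedure has converged.}
\end{aligned}
```

Here each class contains m = 2 samples, so each new center is simply the mean of the two samples assigned to it.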
7. A real-time processing system for marketing big data in an intelligent grid using the method of any one of claims 1-6, characterized in that the system comprises:
a data acquisition module, for receiving through the Kafka cluster the grid marketing data collected by Flume and for integrating the data sources;
a data processing module, for processing the data with the Storm cluster in a stream-computing manner and then computing in real time with the Spark-based K-Means clustering algorithm;
a data storage module, for writing the results output by the data processing module into HBase;
a business logic module, for implementing the user management and user permission system logic;
a display module, for providing a Web interactive operation interface.
8. The real-time processing system for marketing big data in an intelligent grid as claimed in claim 7, characterized in that the Storm cluster consists of one master node and multiple working nodes; the master node is Nimbus, which is responsible for task assignment, code distribution, and cluster monitoring; each working node is a Supervisor, which corresponds to one physical machine and is used to launch processes.
9. The real-time processing system for marketing big data in an intelligent grid as claimed in claim 8, characterized in that each working node comprises multiple processes, and each process comprises multiple threads.
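The node, process, and thread hierarchy of claims 8 and 9 can be modeled with a few small data classes. This is a hypothetical sketch for illustration only: the class names, the host names, and the round-robin assignment rule are assumptions, and Storm's actual scheduler is considerably more involved.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Worker:
    """One worker process; `threads` lists the tasks run by its threads."""
    threads: list = field(default_factory=list)


@dataclass
class Supervisor:
    """One working node (one physical machine) hosting several worker processes."""
    host: str
    workers: list


def assign(tasks, supervisors):
    """Master-node role: hand tasks to worker processes in round-robin order."""
    slots = [w for s in supervisors for w in s.workers]
    for task, worker in zip(tasks, cycle(slots)):
        worker.threads.append(task)


# Two working nodes with two worker processes each, and six tasks to place.
nodes = [Supervisor(f"node-{i}", [Worker(), Worker()]) for i in range(2)]
assign([f"task-{n}" for n in range(6)], nodes)
```

With six tasks and four worker processes, the first two processes each end up running two tasks and the remaining two run one each.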

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610953688.4A CN106547882A (en) 2016-11-03 2016-11-03 A kind of real-time processing method and system of big data of marketing in intelligent grid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610953688.4A CN106547882A (en) 2016-11-03 2016-11-03 A kind of real-time processing method and system of big data of marketing in intelligent grid

Publications (1)

Publication Number Publication Date
CN106547882A (en) 2017-03-29

Family

ID=58393658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610953688.4A Pending CN106547882A (en) 2016-11-03 2016-11-03 A kind of real-time processing method and system of big data of marketing in intelligent grid

Country Status (1)

Country Link
CN (1) CN106547882A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281130A (en) * 2014-09-22 2015-01-14 国家电网公司 Hydroelectric equipment monitoring and fault diagnosis system based on big data technology
CN105681397A (en) * 2015-12-30 2016-06-15 曙光信息产业(北京)有限公司 Network traffic data storage method and system, query method and device
CN105701596A (en) * 2015-12-24 2016-06-22 国家电网公司 Method for lean distribution network emergency maintenance and management system based on big data technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DIAOXIAOMIN402: ""基于FlumeKafkaSpark的分布式日志流处理系统的设计与实现"", 《百度文库—HTTPS://WENKU.BAIDU.COM/VIEW/5D366EDC5727A5E9846A6158.HTML》 *
周国亮 等: ""实时大数据处理技术在状态监测领域中的应用"", 《电工技术学报》 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239878A (en) * 2017-04-27 2017-10-10 国网上海市电力公司 A kind of intelligent grid meta-synthetic engineering applied to large-scale travel resort
CN107395669A (en) * 2017-06-01 2017-11-24 华南理工大学 A kind of collecting method and system based on the real-time distributed big data of streaming
CN107395669B (en) * 2017-06-01 2020-04-07 华南理工大学 Data acquisition method and system based on streaming real-time distributed big data
CN107220360A (en) * 2017-06-07 2017-09-29 云南电网有限责任公司信息中心 A kind of Unified Modeling storage cut-in method based on magnanimity electric power monitoring data
CN107704545A (en) * 2017-11-08 2018-02-16 华东交通大学 Railway distribution net magnanimity information method for stream processing based on Storm Yu Kafka message communicatings
CN107918830A (en) * 2017-11-20 2018-04-17 国网重庆市电力公司南岸供电分公司 A kind of distribution Running State assessment system and method based on big data technology
CN107918830B (en) * 2017-11-20 2021-11-23 国网重庆市电力公司南岸供电分公司 Power distribution network running state evaluation method based on big data technology
CN110309115A (en) * 2018-03-14 2019-10-08 华东交通大学 Fusion calculates the railway power distribution network magnanimity information processing method with off-line calculation in real time
CN108492150A (en) * 2018-04-11 2018-09-04 口碑(上海)信息技术有限公司 The determination method and system of entity temperature
CN108492150B (en) * 2018-04-11 2020-06-09 口碑(上海)信息技术有限公司 Method and system for determining entity heat degree
CN108804601A (en) * 2018-05-29 2018-11-13 国网浙江省电力有限公司 Power grid operation monitors the active analysis method of big data and device
CN109450978A (en) * 2018-10-10 2019-03-08 四川长虹电器股份有限公司 A kind of data classification and load balance process method based on storm
CN110543464A (en) * 2018-12-12 2019-12-06 广东鼎义互联科技股份有限公司 Big data platform applied to smart park and operation method
CN109617734A (en) * 2018-12-25 2019-04-12 北京市天元网络技术股份有限公司 Network operation capability analysis method and device
CN109617734B (en) * 2018-12-25 2021-12-07 北京市天元网络技术股份有限公司 Network operation capability analysis method and device
CN109933620A (en) * 2019-03-18 2019-06-25 上海大学 Thermoelectricity big data method for digging based on Spark
CN110019106A (en) * 2019-03-21 2019-07-16 国网江西省电力有限公司萍乡供电分公司 A kind of power marketing method and system for processing mass data of smart grid
CN110490229A (en) * 2019-07-16 2019-11-22 昆明理工大学 A kind of electric energy meter calibration error diagnostics method based on spark and clustering algorithm
CN111177276A (en) * 2020-01-06 2020-05-19 浙江中烟工业有限责任公司 Spark calculation framework-based kinetic energy data processing system and method
CN111177276B (en) * 2020-01-06 2023-10-20 浙江中烟工业有限责任公司 Spark computing framework-based kinetic energy data processing system and method
CN111460333A (en) * 2020-03-30 2020-07-28 北京工业大学 Real-time search data analysis system
CN111460333B (en) * 2020-03-30 2024-02-23 北京工业大学 Real-time search data analysis system
CN112782469A (en) * 2021-01-13 2021-05-11 公诚管理咨询有限公司 Smart power grid metering processing method based on distributed computation
CN114911862A (en) * 2022-07-18 2022-08-16 国网江苏省电力有限公司营销服务中心 System and method for transmitting big data of network operation link of network
CN114911862B (en) * 2022-07-18 2022-12-06 国网江苏省电力有限公司营销服务中心 System and method for transmitting big data of network national network operation link

Similar Documents

Publication Publication Date Title
CN106547882A (en) A kind of real-time processing method and system of big data of marketing in intelligent grid
Zheng et al. Real-time big data processing framework: challenges and solutions
CN103246749B (en) The matrix database system and its querying method that Based on Distributed calculates
DE102012216029B4 (en) A SCALABLE ADAPTABLE MAP REDUCE FRAMEWORK WITH DISTRIBUTED DATA
Neelakandan et al. Large scale optimization to minimize network traffic using MapReduce in big data applications
CN105893628A (en) Real-time data collection system and method
CN104820708B (en) A kind of big data clustering method and device based on cloud computing platform
CN107563153A (en) A kind of PacBio microarray dataset IT architectures based on Hadoop structures
CN106897322A (en) The access method and device of a kind of database and file system
CN105469204A (en) Reassembling manufacturing enterprise integrated evaluation system based on deeply integrated big data analysis technology
CN104750780B (en) A kind of Hadoop configuration parameter optimization methods based on statistical analysis
CN107609141A (en) It is a kind of that quick modelling method of probabilistic is carried out to extensive renewable energy source data
CN114416855A (en) Visualization platform and method based on electric power big data
Bellini et al. Data flow management and visual analytic for big data smart city/IOT
CN103116525A (en) Map reduce computing method under internet environment
CN107046557A (en) The intelligent medical calling inquiry system that dynamic Skyline is inquired about under mobile cloud computing environment
CN111159180A (en) Data processing method and system based on data resource directory construction
Elagib et al. Big data analysis solutions using MapReduce framework
CN109063752B (en) Multi-source high-dimensional multi-scale real-time data stream sorting method based on neural network
Singh et al. Spatial data analysis with ArcGIS and MapReduce
Reddy et al. A comprehensive literature review on data analytics in IIoT (Industrial Internet of Things)
CN103501253A (en) Monitoring organization method for high-performance computing application characteristics
CN109657197A (en) A kind of pre-stack depth migration calculation method and system
CN107679127A (en) Point cloud information parallel extraction method and its system based on geographical position
Niu Optimization of teaching management system based on association rules algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170329