CN110022226B

CN110022226B - Object-oriented data acquisition system and acquisition method

Info

Publication number: CN110022226B
Application number: CN201910165447.7A
Authority: CN
Inventors: 郑安刚; 巫钟兴; 王伟峰; 汪岳荣; 顾春云; 江婷; 骆云江; 郁春雷; 麻吕斌
Original assignee: State Grid Corp of China SGCC; State Grid Zhejiang Electric Power Co Ltd; China Electric Power Research Institute Co Ltd CEPRI; Zhejiang Huayun Information Technology Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Zhejiang Electric Power Co Ltd; China Electric Power Research Institute Co Ltd CEPRI; Zhejiang Huayun Information Technology Co Ltd
Priority date: 2019-01-04
Filing date: 2019-03-05
Publication date: 2023-04-04
Anticipated expiration: 2039-03-05
Also published as: CN110022226A

Abstract

The invention discloses an object-oriented data acquisition system and an object-oriented data acquisition method, and relates to the field of data acquisition in power systems. At present, the extensibility, reusability, and flexibility of the electric energy acquisition process are insufficient. The system comprises a gateway cluster, a communication front-end cluster, a service processor cluster, a data bus, a warehousing service module, a mass data analysis module, and a data storage module. The technical scheme adopts a distributed elastic architecture and applies stream processing, message middleware, distributed storage, and parallel computing to rebuild the electric power data acquisition system. This improves storage capacity, computing performance, data processing speed, and intelligent analysis capability, and supports intelligent analysis for electricity marketing, service innovation, the expansion of professional applications, and improvement of the power supply service level.

Description

Object-oriented data acquisition system and acquisition method

Technical Field

The invention relates to the field of data acquisition of power systems, in particular to an object-oriented data acquisition system and an object-oriented data acquisition method.

Background

The communication protocols of current acquisition systems are not uniform because of various extensions. As a result, a large amount of unnecessary protocol conversion work is added to electric energy acquisition communication, interoperability is difficult to achieve, and highly intelligent applications of electricity information in the future smart grid are severely restricted. Meanwhile, traditional communication protocols are mainly business-oriented, data-type protocols, and they increasingly show deficiencies in extensibility, reusability, and flexibility as acquisition task requirements become more diverse.

Under the traditional 1376.1 protocol, data can be collected only according to the existing rules. For example, the daily frozen forward active total electric energy must be collected through the specified code 0DF005. Extensibility, reusability, and flexibility in the electric energy acquisition process are therefore insufficient.

Disclosure of Invention

The technical problem to be solved by the invention is to improve on the prior technical scheme by providing an object-oriented data acquisition system and an object-oriented data acquisition method that improve extensibility, reusability, and flexibility. To this end, the invention adopts the following technical scheme.

An object-oriented data acquisition system comprises a gateway cluster, a communication front-end cluster, a service processor cluster, a data bus, a warehousing service module, a mass data analysis module, and a data storage module;

the gateway cluster is used for connecting acquisition equipment to the electric power data acquisition system and for maintaining the terminal communication links and the receiving and sending of original messages, wherein the acquisition equipment comprises special transformer load control terminals, distribution transformer monitoring terminals, and low-voltage concentrators;

the communication front-end cluster is connected with the gateway cluster and is used for distributing and scheduling the original data messages and pushing them to a distributed message queue; the communication front-end cluster distributes messages by strategy based on a device address domain algorithm, and adjusts the distribution strategy dynamically by monitoring the operating condition of each node of the service processor cluster; the operating condition of each front-end service processor node is monitored through a heartbeat handshake mechanism; dynamic adjustment is carried out for three scenarios, namely node addition, node failure, and failed node recovery, according to a 'newly-added node distribution strategy', a 'distribution strategy when a node fails', and a 'distribution strategy when a failed node recovers', respectively; and the terminal addresses on a node are distributed to a designated service processor node according to the address domain algorithm, so that the loading of system archive files is reduced and balanced and the program's requirement on server memory configuration is lowered;

the service processor cluster is connected with the communication front-end cluster, is used for parsing the communication protocol, and interacts with the distributed message queue; that is, it obtains downlink requests from the message queue and forms downlink frames, performs protocol parsing on the original data messages from the communication front end, and pushes the parsing results to the distributed message queue;

the data bus module is used for supporting the ordering and persistence of uplink and downlink communication information; a high-throughput distributed Kafka message queue is adopted, the topics and topic partitions of the Kafka service are fully utilized, the published topics are associated with the master station application cluster and the service processor cluster, and the sending and receiving of the downlink request data generated by master station applications and of the terminal uplink data are managed in a unified manner;

the warehousing service module is used for acquiring data from the message queue and storing the data in a relational database in batches; the distributed big data framework Hadoop and the traditional relational database Oracle are combined to handle the analysis and storage of mass data;

the mass data analysis module is used for realizing real-time calculation and offline analysis of service data through a big data framework based on a distributed file system, and provides technical support for further deep mining;

the data storage module is used for storing all service data, archive data, and original data, and provides basic data support and calculation services for the system; the storage is divided into a main production database, a disaster recovery database, a history database, and a data release database, and a database partitioning strategy is defined according to business and storage time limit so as to reduce the access pressure on any single database.

For mass data storage, the relational database is divided into a main production database, a disaster recovery database, a history database, and a data release database according to the different usage attributes of the data. This ensures the safety and stability of the collected data, reduces the data access pressure on the production database, and improves data release efficiency; a database partitioning strategy based on business and storage time limit further reduces the access pressure on any single database.

As a preferable technical means: the communication front-end processor divides the address domain of all field terminal devices into a plurality of intervals according to a certain rule, the device address being taken modulo the number of downlink Topics to obtain the corresponding group of address domain intervals; a mapping relation exists between the downlink Topics and the address domain intervals, so a front-end service processor node that manages an address domain interval also manages the corresponding downlink Topic; the initial address domain distribution strategy takes the modulus according to the number of front-end service processor nodes; when a node is newly added, a node fails, or a failed node recovers, dynamic adjustment is performed according to the 'newly added node distribution strategy', the 'distribution strategy when a node fails', and the 'distribution strategy when a failed node recovers', respectively, and the distribution information is promptly updated to the ZooKeeper distributed service system, which reduces program memory loading and improves the scalability of the program cluster.

As a preferable technical means: A) The distribution strategy when a node is added (capacity expansion) is as follows:

A01) sorting the Topics assigned to each service processor node by Topic code, and calculating the total number of Topics currently processed by each node;

A02) sorting the service processor nodes by their total number of Topics;

A03) calculating the average number of Topics that each service processor node can process (denoted AvgTopic) by dividing the total number of Topics by the total number of service processor nodes;

A04) taking out the surplus Topics from every service processor node whose number of Topics is larger than AvgTopic, according to the following rule:

A05) preferentially selecting the Topics with the larger Topic codes in the service processor nodes that rank larger in the ordering of step A02);

A06) preferentially distributing the Topics taken out in step A04) to the newly added service processor node so that the number of Topics on the newly added node is about the average value; if unallocated Topics remain, allocating them across all nodes by modulo;

A07) deleting the reallocated Topic information from the other nodes.

As a preferable technical means: B) The distribution strategy when a node fails is as follows:

B01) sorting the front-end service processor nodes by their number of Topics;

B02) dividing the total number of Topics by the total number of front-end service processor nodes currently running to obtain the average number of Topics processed by each running service processor node;

B03) calculating the number of Topics to be added to each running service processor node from the average value: the average value calculated in step B02) minus the node's existing number of Topics;

B04) distributing the Topics to be redistributed because of the node failure, in turn, to the service processor nodes with the smallest sequence numbers according to the values calculated in step B03).

As a preferable technical means: C) The distribution strategy when a failed node recovers is as follows:

C01) the recovered service processor node loads its corresponding Topics according to the allocation strategy used during initialization;

C02) the Topic information returned to the recovered node is periodically deleted from the other service processor nodes.

As a preferable technical means: the warehousing service module verifies the acquired data as soon as it is collected and repairs it in real time; it uses stream processing to inspect and verify the acquired load data and electric energy indication data in real time, marks problem data, and repairs abnormal load data; problem data is repaired using power-based estimates, the ARIMA algorithm, and marketing-distribution electricity quantities, which ensures the reasonableness, consistency, and logical soundness of the data, and the quality of system data is improved by promptly finding and marking invalid and distorted data; stream processing is also used for real-time monitoring and analysis of electric energy data and alarm events; the stream processing technology adopts a real-time computing framework, namely HBase + Storm, in which the Storm real-time computing framework acquires original data and message data from the message queue and writes them into the HBase distributed database.
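To make the mark-then-repair idea concrete, the following is a minimal sketch only. It assumes cumulative energy indication values that should never decrease, flags missing or decreasing readings, and repairs them by linear interpolation. The interpolation is a deliberately simple stand-in chosen for illustration; it is not the ARIMA or power-estimate repair named in the text, and the function name `check_and_repair` is hypothetical.

```python
from typing import List, Optional, Tuple


def check_and_repair(
    readings: List[Optional[float]],
) -> Tuple[List[Optional[float]], List[int]]:
    """Flag suspicious cumulative readings and repair them.

    A reading is flagged when it is missing (None) or lower than the last
    valid reading. Repair uses linear interpolation between the nearest valid
    neighbours (a stand-in for ARIMA or power-based estimation).
    """
    flagged: List[int] = []
    last_valid: Optional[float] = None
    for i, value in enumerate(readings):
        if value is None or (last_valid is not None and value < last_valid):
            flagged.append(i)
        else:
            last_valid = value

    valid = [i for i in range(len(readings)) if i not in flagged]
    repaired = list(readings)
    for i in flagged:
        before = max((j for j in valid if j < i), default=None)
        after = min((j for j in valid if j > i), default=None)
        if before is not None and after is not None:
            frac = (i - before) / (after - before)
            repaired[i] = readings[before] + frac * (readings[after] - readings[before])
        elif before is not None:
            repaired[i] = readings[before]  # no later valid point: carry forward
        elif after is not None:
            repaired[i] = readings[after]  # no earlier valid point: carry backward
    return repaired, flagged


repaired, flagged = check_and_repair([100.0, 101.0, None, 103.0, 50.0, 105.0])
print(flagged)   # indices that were marked as problem data
print(repaired)
```

In a deployment of the kind described, this check would run inside the stream processing job rather than on an in-memory list.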

As a preferable technical means: the mass data analysis module uses a big data distributed in-memory parallel computing framework to produce hourly statistics on acquisition success rate indexes, the electricity quantities and loads of the various user types, line loss calculation, distribution transformer operation monitoring, mobile operator channel quality monitoring, and terminal online rate, so as to meet the management and control requirements of units at all levels; the quasi-real-time analysis framework adopts Hive + Spark, in which the Spark offline computing framework imports the original data into a Hive data warehouse to perform statistical analysis and data mining on mass data.

Another object of the present invention is to provide an object-oriented data acquisition method, characterized in that:

1) When the acquisition master station needs to set and call a terminal and its measuring points, the method comprises the following steps:

101) the Oracle main production database synchronizes basic data from the marketing system, mainly stores all business data, archive data, and original data, and provides data query for the acquisition master station;

102) the acquisition master station initiates a downlink request; it can set different keys according to the operation type when publishing to the downlink Topic of the Kafka service, and stores the operation command id in a Redis cache;

103) messages in the downlink Topic are stored in partitions according to the key and an algorithm, and different partitions can be assigned different priorities; for example, the partition configured with the highest priority processes control-type downlink requests, the partition with the second priority processes setting-type downlink requests, and partitions with other priorities process call/relay-type downlink requests;

104) the service processor node loads and synchronizes the archive information of the designated terminal from the Redis cache server, subscribes to the downlink queue of the Kafka service, executes requests according to the priorities of the different partitions, forms a downlink request message frame, distributes the frame to the communication front-end cluster, and pushes the downlink message to the message Topic in Kafka;

105) the communication front-end cluster sends the frame to the communication gateway cluster according to the scheduling distribution strategy;

106) the communication gateway sends the downlink request to the terminal device;

107) the terminal returns the operation result; the message passes through the communication gateway and the communication front-end cluster and is parsed by the service processor, and the operation result is written back to the operation command id corresponding to the terminal in Redis;

108) the acquisition master station obtains the operation result from Redis according to the operation command id corresponding to the terminal;
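The partition priorities of step 103 can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not a Kafka client: the class `DownlinkTopic`, the mapping `PARTITION_BY_OPERATION`, and the three-partition layout are hypothetical, and the priority order is enforced by the consumer-side `poll` logic in this sketch.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

# Hypothetical layout: partition 0 has the highest priority.
PARTITION_BY_OPERATION = {"control": 0, "set": 1, "call": 2, "relay": 2}

Request = Tuple[str, bytes]  # (operation command id, payload)


class DownlinkTopic:
    """In-memory stand-in for a downlink Topic with prioritised partitions."""

    def __init__(self, partition_count: int = 3) -> None:
        self.partitions: List[Deque[Request]] = [deque() for _ in range(partition_count)]

    def publish(self, operation: str, command_id: str, payload: bytes) -> None:
        # The operation type acts as the key that selects the partition.
        self.partitions[PARTITION_BY_OPERATION[operation]].append((command_id, payload))

    def poll(self) -> Optional[Request]:
        # Drain higher-priority partitions before lower-priority ones.
        for partition in self.partitions:
            if partition:
                return partition.popleft()
        return None


topic = DownlinkTopic()
topic.publish("call", "cmd-1", b"read energy")
topic.publish("set", "cmd-2", b"set parameter")
topic.publish("control", "cmd-3", b"switch off")
print([topic.poll()[0] for _ in range(3)])  # control first, then set, then call
```

Within one partition the requests keep their arrival order, which matches the ordering role the text assigns to the data bus.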

2) When electric energy data is to be acquired, the method comprises the following steps:

201) task-class data and abnormal event data are sent to the gateway cluster in the form of messages;

202) the gateway cluster distributes them to the communication front-end cluster according to a load balancing strategy;

203) the original message data is distributed to the service processor cluster according to the scheduling distribution strategy;

204) the service processor node loads and synchronizes the archive information of the designated terminal from the Redis cache server, parses the uplink original message data, and pushes the parsing result, the original message data, and other information to the corresponding Kafka message queues; that is, the parsing result is pushed to the reported-data Topic, and the original message data is pushed to the message Topic;

205) the Storm service subscribes to messages from the Kafka service: the Storm real-time computing framework acquires original message data, electric energy data, and the like from the Kafka message queue and stores them in the HBase distributed database; the Spark offline computing framework imports the original data into a Hive data warehouse to perform complex statistical analysis and data mining; the data warehousing service stores the original message data and the electric energy data in a relational database in batches;

206) electric energy data acquisition details, the acquisition success rate, and the like are quickly queried from the cloud platform;

207) the Oracle main production database synchronizes basic data from the marketing system, mainly stores all business data, archive data, and original data, and provides data query for the acquisition master station;

3) When missing electric energy data is to be re-collected (supplementary collection), the method comprises the following steps:

301) the electric energy data is transmitted to the communication gateway cluster in the form of messages over a plurality of communication modes;

302) the communication gateway cluster sends the messages to the communication front-end cluster according to the load balancing distribution strategy;

303) the communication front-end cluster distributes them to the service processor cluster according to the scheduling distribution strategy;

304) the service processor node loads and synchronizes the archive information of the designated terminal from the Redis cache server, parses the uplink original message data, and pushes the parsing result, the original message data, and other information to the corresponding Kafka message queues; that is, the parsing result is pushed to the reported-data Topic, and the original message data is pushed to the message Topic;

305) the Storm service subscribes to messages from the Kafka service, acquires electric energy data in real time, and stores the task data dotting table in the HBase distributed database in real time;

306) for real-time supplementary collection of missing points, Spark RDD periodically executes a missing point audit task; that is, it audits the dotting table in HBase for missing points according to the supplementary collection strategy, forms the corresponding missing point requests according to the terminal communication state, and pushes them to the downlink Topic of the Kafka service for the service processor to obtain and issue, thereby realizing real-time supplementary collection of missing points;

307) for manual supplementary collection of missing points, the Spark RDD timed task reads the supplementary collection strategy from the Oracle database and then executes the missing point audit task, forms the corresponding missing point requests according to the communication condition of the terminal, and pushes them to the Kafka message service for the service processor to obtain and issue, thereby realizing supplementary collection of missing points.
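The missing point audit of steps 306 and 307 can be sketched as follows. This is an illustration only: the dotting table is modelled as a plain dictionary from terminal id to the set of received point indices, a 96-point day is assumed, and "according to the terminal communication state" is interpreted here as skipping offline terminals. The function name `audit_missing_points` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

POINTS_PER_DAY = 96  # one point every 15 minutes


def audit_missing_points(
    dot_table: Dict[str, Set[int]],
    online_terminals: Set[str],
    points_expected: int = POINTS_PER_DAY,
) -> List[Tuple[str, List[int]]]:
    """Return (terminal id, missing point indices) for each online terminal with gaps."""
    requests: List[Tuple[str, List[int]]] = []
    for terminal, received in sorted(dot_table.items()):
        missing = [p for p in range(points_expected) if p not in received]
        # Assumption: requests are formed only for terminals that are reachable.
        if missing and terminal in online_terminals:
            requests.append((terminal, missing))
    return requests


dot_table = {
    "T1": set(range(96)) - {10, 11},  # two points missing
    "T2": set(range(90)),             # six points missing, but terminal offline
    "T3": set(range(96)),             # complete
}
print(audit_missing_points(dot_table, online_terminals={"T1", "T3"}))
```

Each returned entry would then be turned into a downlink request and published to the downlink Topic for a service processor to issue.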

As a preferable technical means: when terminal events are collected, acquisition templates of different levels are defined according to the severity and urgency of the event, and different reporting frequencies are assigned to them; this allocates channel resources more reasonably, reduces unnecessary load on the terminal processor, helps managers analyze and handle abnormal events, and improves management efficiency.

As a preferable technical means: the data acquisition comprises special transformer data acquisition, data acquisition by the low-voltage type I concentrator, and data acquisition by the low-voltage type II concentrator;

I) Special transformer data acquisition:

when the special transformer terminal collects daily frozen active electric energy, the terminal reads the electric meter at an interval of 1 day and reports data at an interval of 12 hours; the data are classified as daily frozen data, and the data items comprise the current quadrant I reactive electric energy indication data block, the current quadrant IV reactive electric energy indication data block, the current forward active electric energy indication data block, and the current reverse active electric energy indication data block;

when the special transformer terminal collects a 96-point load curve, the terminal reads the electric meter and reports data at an interval of 15 minutes; the data are classified as real-time data, and the data items comprise the voltage data block, the current data block, active power, the current quadrant I reactive electric energy indication data block, the current quadrant IV reactive electric energy indication data block, forward active total electric energy, and power factor;

II) Data acquisition by the low-voltage type I concentrator:

when the low-voltage type I concentrator collects daily frozen active electric energy, the terminal reads the electric meter at an interval of 1 day and reports data at an interval of 12 hours; the data are classified as daily frozen data, and the collected and reported data items comprise the current forward active electric energy indication data block and the current reverse active electric energy indication data block;

when the low-voltage type I concentrator collects a 96-point load curve, the terminal reads the electric meter and reports data at an interval of 6 hours, and the data are classified as minute frozen data; if a three-phase electric energy meter is installed, the data items are the voltage data block, the current data block, power factor, active power, and forward active total electric energy; if a single-phase electric energy meter is installed, the data items are the A-phase voltage, A-phase current, power factor, forward active total electric energy, active power, and N-line current;

III) Data acquisition by the low-voltage type II concentrator:

when the low-voltage type II concentrator collects daily frozen active electric energy, the same acquisition scheme template and reporting scheme template as for the type I concentrator are issued;

when the low-voltage type II concentrator collects a 96-point load curve, the terminal reads the electric meter and reports data at an interval of 15 minutes, and the data are classified as real-time data; if a three-phase electric energy meter is installed, the data items are the voltage data block, the current data block, power factor, active power, and forward active total electric energy; if a single-phase electric energy meter is installed, the data items are the A-phase voltage, A-phase current, power factor, forward active total electric energy, active power, and N-line current.
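The acquisition and reporting templates listed above lend themselves to a declarative configuration. The sketch below encodes only the intervals, data classes, and concentrator load-curve items stated in the text; the class name `AcquisitionTemplate`, the field names, and the string identifiers are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List


@dataclass(frozen=True)
class AcquisitionTemplate:
    device: str               # which kind of terminal the template is issued to
    task: str                 # what is being collected
    collect_every: timedelta  # interval at which the terminal reads the meter
    report_every: timedelta   # interval at which the terminal reports data
    data_class: str           # classification of the resulting data


TEMPLATES: List[AcquisitionTemplate] = [
    AcquisitionTemplate("special_transformer", "daily_frozen_energy",
                        timedelta(days=1), timedelta(hours=12), "daily_frozen"),
    AcquisitionTemplate("special_transformer", "load_curve_96",
                        timedelta(minutes=15), timedelta(minutes=15), "real_time"),
    AcquisitionTemplate("concentrator_I", "daily_frozen_energy",
                        timedelta(days=1), timedelta(hours=12), "daily_frozen"),
    AcquisitionTemplate("concentrator_I", "load_curve_96",
                        timedelta(hours=6), timedelta(hours=6), "minute_frozen"),
    AcquisitionTemplate("concentrator_II", "daily_frozen_energy",
                        timedelta(days=1), timedelta(hours=12), "daily_frozen"),
    AcquisitionTemplate("concentrator_II", "load_curve_96",
                        timedelta(minutes=15), timedelta(minutes=15), "real_time"),
]

# Load-curve data items for concentrators, selected by meter type.
CONCENTRATOR_LOAD_CURVE_ITEMS: Dict[str, List[str]] = {
    "three_phase": ["voltage data block", "current data block", "power factor",
                    "active power", "forward active total energy"],
    "single_phase": ["A-phase voltage", "A-phase current", "power factor",
                     "forward active total energy", "active power", "N-line current"],
}


def find_template(device: str, task: str) -> AcquisitionTemplate:
    """Look up the template for a device kind and task; raises KeyError if absent."""
    for template in TEMPLATES:
        if template.device == device and template.task == task:
            return template
    raise KeyError((device, task))


print(find_template("concentrator_I", "load_curve_96"))
```

Keeping the collection interval and the reporting interval as separate fields mirrors the split between the acquisition scheme and the reporting scheme.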

Advantageous effects:

1. The technical scheme adopts a distributed elastic architecture and applies stream processing, message middleware, distributed storage, and parallel computing to rebuild the electric power data acquisition system. This improves storage capacity, computing performance, data processing speed, and intelligent analysis capability, and supports intelligent analysis for electricity marketing, service innovation, the expansion of professional applications, and improvement of the power supply service level.

2. Based on the characteristics of the object-oriented communication protocol, the technical scheme can establish multiple sets of data acquisition methods, which noticeably improve acquisition efficiency, flexibility, and data traffic consumption compared with the traditional acquisition mode:

2.1 The acquisition of basic data is divided into an acquisition scheme and a reporting scheme, which define the rule by which the terminal reads the electric meter and the rule by which the terminal reports data, respectively. This has the following two advantages:

(1) The data acquisition mode (real-time acquisition or packed acquisition) and the period and frequency of data reporting are flexible, local communication traffic peaks are staggered, and missed acquisition points are reduced;

(2) Data acquisition and data reporting can be configured selectively: some data items that support the device's local services (such as clock inspection and local meter reading) can be collected without being reported, which effectively improves the service diversity and the data acquisition quality of field devices.

2.2 Different data items can be configured according to the type of electric energy meter. For example, a three-phase meter collects the current data block and the voltage data block (phases A, B, and C), while a single-phase meter collects only the A-phase current and A-phase voltage. Compared with the traditional approach of collecting A, B, and C three-phase voltages and currents from every meter, data traffic is effectively reduced.

2.3 For terminal event acquisition, acquisition templates of different levels can be defined according to the severity and urgency of the event, with different reporting frequencies specified. This allocates channel resources more reasonably, reduces unnecessary load on the terminal processor, helps managers analyze and handle abnormal events, and improves management efficiency.

The technical scheme adjusts the distribution dynamically, which reduces and balances the loading of system archive files and lowers the program's requirement on server memory configuration.

The mass data storage architecture is in-memory, cloud-based, and specialized by database, which realizes the integration, fusion, and efficient management of electric energy data.

For mass data analysis, a big data cloud platform is adopted, and real-time calculation and offline analysis of service data are realized through a big data framework based on a distributed file system.

The offline analysis framework preferably adopts Hive + Spark, in which the Spark offline computing framework imports the original data into a Hive data warehouse to perform statistical analysis and data mining on mass data.

Drawings

FIG. 1 is a block diagram of the present invention;

FIG. 2 is a flow chart of master station setting and calling of the present invention;

FIG. 3 is a flow chart of the electrical energy data acquisition of the present invention;

FIG. 4 is a flow chart of the supplementary collection of electric energy data of the present invention.

Detailed Description

The technical scheme of the invention is further explained in detail by combining the drawings in the specification.

As shown in FIG. 1, an object-oriented data acquisition system includes a gateway cluster, a communication front-end cluster, a service processor cluster, a data bus, a warehousing service module, a mass data analysis module, and a data storage module.

The technical scheme has the following characteristics:

1. The communication program is rebuilt on an elastic architecture to meet the continuously growing user scale and acquisition requirements: the architecture of the electricity consumption information acquisition system is redesigned using big data technology with a distributed elastic architecture. First, the communication gateway and the acquisition front-end processor exchange messages using a message cache as a bus. Second, the front-end processor only performs protocol parsing on the messages and writes the data into the message cache. Third, the storage architecture is in-memory, cloud-based, and specialized by database: NoSQL storage is introduced to manage data that is large in volume and varied in kind; the stream computing and offline analysis service clusters of the cloud platform perform operations such as event analysis and data verification and repair; and the relational database is divided into a main production database, a disaster recovery database, a history database, and a data release database according to the different usage attributes of the data. Fourth, all collected data are written to storage uniformly by the warehousing service cluster.

The communication gateway service is mainly responsible for connecting acquisition equipment, such as special transformer load control terminals, distribution transformer monitoring terminals, and low-voltage concentrators, to the power data acquisition system, and for maintaining the terminal communication links and the receiving and sending of original messages.

The communication front-end service is responsible for distributing and scheduling the original data messages.

The communication front-end service distributes messages by strategy based on the device address domain algorithm, and adjusts the distribution strategy dynamically by monitoring the operating condition of each front-end service processor node. The specific algorithm is as follows:

a. The address domain of all terminal devices is divided into a plurality of intervals according to a certain rule; for example, taking the device address modulo the number of downlink Topics (100) yields 100 groups of address domain intervals. A mapping relation therefore exists between the downlink Topics and the address domain intervals, and a front-end service processor node that manages an address domain interval also manages the corresponding downlink Topic.

b. The initial address domain distribution strategy takes the modulus according to the number of front-end service processor nodes. When a node is newly added, a node fails, or a failed node recovers, dynamic adjustment is performed according to the "newly added node distribution strategy", the "distribution strategy when a node fails", and the "distribution strategy when a failed node recovers", respectively, and the distribution information is promptly updated to the ZooKeeper distributed service system. The distribution strategy aims to reduce program memory loading and improve the scalability of the program cluster.
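As a rough illustration of items a and b, the sketch below maps a device address to a downlink Topic by modulo and then assigns Topics to nodes by modulo. It assumes the device address can be treated as an integer and uses hypothetical node names such as `sp-0`; the function names are likewise illustrative.

```python
from typing import Dict, List

TOPIC_COUNT = 100  # number of downlink Topics, as in the example above


def topic_for_address(device_address: int, topic_count: int = TOPIC_COUNT) -> int:
    """Map a device address to its address domain interval (downlink Topic index)."""
    return device_address % topic_count


def initial_assignment(topic_count: int, node_ids: List[str]) -> Dict[str, List[int]]:
    """Initial strategy: Topic index modulo the number of service processor nodes."""
    assignment: Dict[str, List[int]] = {node: [] for node in node_ids}
    for topic in range(topic_count):
        assignment[node_ids[topic % len(node_ids)]].append(topic)
    return assignment


def node_for_address(device_address: int, assignment: Dict[str, List[int]],
                     topic_count: int = TOPIC_COUNT) -> str:
    """Find the service processor node that manages the given device address."""
    topic = topic_for_address(device_address, topic_count)
    for node, topics in assignment.items():
        if topic in topics:
            return node
    raise KeyError(f"Topic {topic} is not assigned to any node")


assignment = initial_assignment(TOPIC_COUNT, ["sp-0", "sp-1", "sp-2"])
print({node: len(topics) for node, topics in assignment.items()})
print(node_for_address(330102457, assignment))
```

Because each node only needs the archives of terminals whose addresses fall into its own Topics, the per-node memory footprint shrinks as nodes are added.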

"distribution strategy when nodes are newly added (extended)":

1) The Topics assigned to each service processor node are sorted by Topic code (for example, from large to small), and the total number of Topics currently handled by the node is calculated.

2) The service processor nodes (excluding the newly added node) are sorted by their total number of Topics (for example, from large to small).

3) The average number of Topics that each service processor node should handle (denoted AvgTopic) is calculated: the total number of Topics divided by the total number of service processor nodes (including the newly added node), with the fractional part discarded.

4) The surplus Topics are taken out of every service processor node whose Topic count exceeds AvgTopic. Removal rule: nodes ranked higher in step 2 are handled first, and within a node the Topics with the larger Topic codes are taken first.

5) The Topics taken out in step 4 are preferentially distributed to the newly added service processor node, so that its Topic count is approximately the average; if unallocated Topics remain, they are distributed over all nodes by modulo.

6) The reallocated Topic information is deleted from the other nodes.
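Steps 1 to 6 above can be sketched as a single function. This is only one reading of the text: the interpretation of "modulo allocation on all nodes" as Topic code modulo node count, and the name `rebalance_on_add`, are assumptions.

```python
def rebalance_on_add(allocation: dict[str, list[int]], new_node: str) -> dict[str, list[int]]:
    """Rebalance Topics when a service processor node is added."""
    # Step 1: sort each node's Topics by code, largest first.
    result = {node: sorted(topics, reverse=True) for node, topics in allocation.items()}
    # Step 2: rank the existing nodes by Topic count, largest first.
    ranked = sorted(result, key=lambda n: len(result[n]), reverse=True)
    # Step 3: AvgTopic = total Topics // node count including the new node.
    total = sum(len(topics) for topics in result.values())
    avg_topic = total // (len(result) + 1)
    # Steps 4 and 6: take the surplus (largest codes first) out of each node.
    taken: list[int] = []
    for node in ranked:
        surplus = len(result[node]) - avg_topic
        if surplus > 0:
            taken.extend(result[node][:surplus])
            result[node] = result[node][surplus:]
    # Step 5: fill the new node up to AvgTopic, then spread leftovers by modulo
    # (assumed here to mean Topic code modulo the number of nodes).
    result[new_node] = taken[:avg_topic]
    all_nodes = list(result)
    for topic in taken[avg_topic:]:
        result[all_nodes[topic % len(all_nodes)]].append(topic)
    return result
```

Starting from 100 Topics spread 34/33/33 over three nodes, adding a fourth node gives AvgTopic = 25 and leaves every node with 25 Topics.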

"node failure distribution policy":

1) The service processor nodes are sorted by their number of Topics (for example, from large to small).

2) The total number of Topics is divided by the number of service processor nodes currently running, giving the average number of Topics each running node should handle.

3) The number of Topics to be added to each running service processor node is calculated from the average: the average calculated in step 2 minus the node's existing number of Topics.

4) The Topics left unassigned by the node failure are distributed in turn to the service processor nodes with the fewest Topics, according to the values calculated in step 3.
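The failure strategy above can be sketched as follows. The text does not say how the average is rounded; this sketch assumes rounding up, so that every orphaned Topic finds a place. The function name is hypothetical.

```python
import math


def rebalance_on_failure(allocation: dict[str, list[int]], failed_node: str) -> dict[str, list[int]]:
    """Redistribute the Topics of a failed node over the nodes still running."""
    result = {n: list(t) for n, t in allocation.items() if n != failed_node}
    if not result:
        raise ValueError("no running service processor node left")
    orphaned = list(allocation[failed_node])
    # Step 1: rank the running nodes by Topic count, largest first.
    ranked = sorted(result, key=lambda n: len(result[n]), reverse=True)
    # Step 2: average Topics per running node (rounded up: an assumption).
    total = sum(len(t) for t in allocation.values())
    average = math.ceil(total / len(result))
    # Step 3: quota per node = average minus the existing Topic count.
    quota = {n: max(0, average - len(result[n])) for n in ranked}
    # Step 4: hand out orphaned Topics, least-loaded nodes first.
    for node in reversed(ranked):
        share, orphaned = orphaned[:quota[node]], orphaned[quota[node]:]
        result[node].extend(share)
    return result
```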

"distribution policy at recovery time of failed node":

1) The recovered service processor node loads the corresponding Topic according to the allocation strategy during initialization;

2) The other service processor nodes periodically delete the Topic information that has been returned to the recovered node.
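A minimal sketch of the recovery strategy, assuming the initialization rule is "Topic code modulo node count" over the original node list; `all_nodes` and the function name are illustrative assumptions.

```python
def rebalance_on_recovery(
    allocation: dict[str, list[int]], all_nodes: list[str], recovered_node: str
) -> dict[str, list[int]]:
    """Give a recovered node back the Topics it owned under the initial modulo rule."""
    index = all_nodes.index(recovered_node)
    result: dict[str, list[int]] = {}
    reclaimed: list[int] = []
    for node, topics in allocation.items():
        kept = []
        for topic in topics:
            # Step 1: Topics whose initial owner is the recovered node go back to it.
            if topic % len(all_nodes) == index:
                reclaimed.append(topic)
            else:
                kept.append(topic)
        # Step 2: the other nodes drop the returned Topics.
        result[node] = kept
    result[recovered_node] = sorted(reclaimed)
    return result
```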

The service processor service is responsible for communication protocol parsing and interacts with the distributed message queue: it obtains downlink requests from the message queue and forms downlink frames, parses the raw data messages from the communication front-end, and pushes the parsing results to the distributed message queue.

The communication front-end service node learns the running state of every service processor node through periodic heartbeat handshakes with those nodes, and distributes the terminal addresses it holds to the designated service processor node according to the address-range algorithm.

The distributed message queue serves as the data bus and is responsible for the ordering and persistence of uplink and downlink communication messages. Specifically, a high-throughput distributed Kafka message queue is adopted; Kafka topics and topic partitions are used to associate published topics with the master station application service cluster and the service processor service cluster, so that the sending and receiving of downlink request data generated by master station applications and of terminal uplink data are managed uniformly.

The warehousing service cluster is responsible for obtaining data from the message queue and storing it in the relational database in batches.

The real-time processing cluster uses a big data cloud platform: a big data framework based on a distributed file system performs real-time calculation and offline analysis of service data and provides technical support for further in-depth mining.

2. Collected data are checked and corrected as soon as they are collected and repaired in real time, improving data quality: stream processing is used to inspect and verify the collected load and electric energy indicating-value data in real time, mark problem data and repair abnormal load data. Problem data are repaired using power estimates, the ARIMA algorithm and marketing and distribution electric quantity, which ensures that the data are reasonable, consistent and logical; finding and marking invalid and distorted data promptly improves the data quality of the system. Stream processing is also used to monitor and analyze electric energy data and alarm events in real time. The stream processing technology is a real-time computing framework, preferably HBase + Storm, in which the Storm real-time computing framework obtains raw data and message data from the message queue and writes them to the HBase distributed database.
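The mark-then-repair idea can be illustrated with a small sketch. Note the substitution: the text names power estimation, ARIMA and marketing and distribution electric quantity as repair sources, whereas this sketch uses plain linear interpolation as a stand-in, and the validity rule (missing or negative values are invalid) is an assumption.

```python
from typing import Optional


def mark_and_repair(load: list[Optional[float]]) -> tuple[list[bool], list[Optional[float]]]:
    """Flag invalid load points and repair them by linear interpolation (stand-in method)."""
    flags = [value is None or value < 0 for value in load]
    repaired = list(load)
    valid = [i for i, bad in enumerate(flags) if not bad]
    for i, bad in enumerate(flags):
        if not bad:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is not None and right is not None:
            weight = (i - left) / (right - left)
            repaired[i] = load[left] + weight * (load[right] - load[left])
        elif left is not None:
            repaired[i] = load[left]   # no later valid point: hold the last value
        elif right is not None:
            repaired[i] = load[right]  # no earlier valid point: use the next value
    return flags, repaired
```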

3. Quasi-real-time statistics over mass data are realized through a distributed parallel computing framework: with a big data distributed in-memory parallel computing framework, statistics on acquisition success rate indexes, electric quantity and load of the various user types, line loss calculation, distribution transformer operation monitoring, mobile operator channel quality monitoring, terminal online rate and the like can be produced hourly, meeting the management and control requirements of units at all levels. The quasi-real-time analysis framework preferably adopts Hive + Spark; the Spark offline computing framework imports the raw data into the Hive data warehouse, where statistical analysis services and data mining over mass data are executed.

4. A flexible data storage strategy is constructed to realize on-demand storage and meet multi-dimensional query requirements: the application requirements of different service data are analyzed, and the respective strengths of a commercial database (Oracle), a cache database (Redis), a distributed database (HBase) and a data warehouse (Hive) are used to design a multi-level storage mechanism that improves query performance and data application efficiency.

The commercial database uses Oracle 12c, combined with an InfiniBand high-speed network and SSD (solid state disk) storage, to build a data storage platform that supports high-throughput, high-concurrency OLTP (online transaction processing) services. It is mainly responsible for storing all service data, archive data and raw data, and provides basic data support and computing services for the system. The relational database can be subdivided into a main production library, a disaster recovery library, a historical library and a data release library, which safeguards the security and stability of the collected data, reduces the data access pressure on the production library and improves data release efficiency; the databases are also split by business and storage period to reduce the access pressure on any single database.

The cache database (Redis) is a high-performance key-value database that can support more than 100K read/write operations per second. It supports not only simple key-value data but also data structures such as list, set, zset and hash.

The distributed database HBase is a high-reliability, high-performance, column-oriented and scalable distributed storage system.

Hive is a data warehouse tool based on Hadoop, can map structured data files into a database table, quickly realizes simple MapReduce statistics through SQL-like statements, and is very suitable for statistical analysis of a data warehouse.

Data synchronization between the cloud data platform and the relational database is preferably realized by adopting a Sqoop data transfer tool.

5. Based on the characteristics of the object-oriented communication protocol, several data acquisition schemes can be designed. Compared with the traditional acquisition mode, they markedly improve acquisition efficiency and flexibility and reduce traffic consumption, specifically as follows:

a. The basic data acquisition scheme is divided into an acquisition scheme and a reporting scheme, which respectively define the rules by which the terminal reads the electric meters and the rules by which the terminal reports data. This has the following two advantages:

(1) The data acquisition mode (real-time acquisition or packed acquisition) and the period and frequency of data reporting become flexible, so that local communication traffic can be staggered away from peaks and missing acquisition points are reduced;

Different acquisition data items can be configured for each type of electric energy meter. For example, a three-phase meter collects the current data block and the voltage data block (phases A, B and C), whereas a single-phase meter collects only the A-phase current and A-phase voltage; compared with the traditional approach, in which every meter collects the voltage and current of all three phases A, B and C, traffic consumption is effectively reduced.

c. For the acquisition of terminal events, acquisition templates of different levels can be defined according to the severity of the event, each with its own reporting frequency. Channel resources are thus allocated more reasonably, unnecessary load on the terminal processor is reduced, and managers are assisted in analyzing and handling abnormal events, which improves management efficiency.
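The per-meter-type configuration described above can be sketched as a lookup table. The item names are hypothetical, and the value counts assume one value per phase in a data block.

```python
# Hypothetical item names; a "block" is assumed to carry one value per phase (A, B, C).
VALUES_PER_ITEM = {"voltage_block": 3, "current_block": 3, "voltage_a": 1, "current_a": 1}

ITEMS_BY_METER_TYPE = {
    "three_phase": ("voltage_block", "current_block"),
    "single_phase": ("voltage_a", "current_a"),
}


def values_per_reading(meter_type: str) -> int:
    """Number of values transferred per reading for the given meter type."""
    return sum(VALUES_PER_ITEM[item] for item in ITEMS_BY_METER_TYPE[meter_type])
```

Under these assumptions a single-phase meter transfers 2 values per reading instead of the 6 it would transfer if it were read like a three-phase meter.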

The data acquisition mode of an object-oriented protocol terminal is divided into an acquisition scheme and a reporting scheme: the acquisition scheme defines the rules by which the terminal reads the electric meters, and the reporting scheme defines the rules by which the terminal reports data. A sample template is shown in the table below.

TABLE 1

The object-oriented terminal event is divided into various levels according to the importance degree of the event, each level defines the respective acquisition frequency and the reported data item, and the template example is shown in the following table.


A data acquisition method based on an object-oriented data acquisition system comprises the following steps:

I. The flow in which the acquisition master station performs setting and call-up operations on terminals and measurement points, as shown in Fig. 2.

1) The Oracle main production library synchronizes basic data from the marketing system, mainly stores all business data, archive data and original data, and provides data query for the acquisition master station.

2) The acquisition master station initiates a downlink request; it can set different keys according to the operation type when publishing to the downlink Topic of the Kafka service, and it stores the operation command id in the Redis cache.

3) Messages in the downlink Topic are stored in partitions according to the key and the partitioning algorithm, and different partitions can be given different priorities: for example, the partition with the highest priority handles control-type downlink requests, the partition with the second priority handles setting-type downlink requests, and partitions with other priorities handle call/relay-type downlink requests.

4) The service processor node loads and synchronizes the archive information of the designated terminals from the Redis cache server, subscribes to the downlink queue of the Kafka service, processes the requests according to the Partition priorities, forms downlink request message frames, distributes them to the communication front-end cluster, and pushes the downlink messages to the message Topic in Kafka.

5) The communication front-end cluster sends the frames to the communication gateway cluster according to the scheduling distribution strategy.

6) The communication gateway sends the downlink request to the terminal device.

7) The terminal returns the operation result; the message passes through the communication gateway and the communication front-end cluster and is parsed by the service processor, and the operation result is written to Redis under the operation command id corresponding to the terminal.

8) The acquisition master station obtains the operation result from Redis according to the operation command id corresponding to the terminal.
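The priority handling of steps 2 to 4 can be illustrated with an in-memory stand-in. This sketch does not use any Kafka client API: a heap plays the role of the prioritized partitions, and the priority table and class name are assumptions.

```python
import heapq
from itertools import count

# Hypothetical priority table: a lower number means a higher priority.
PARTITION_PRIORITY = {"control": 0, "setting": 1, "call": 2, "relay": 2}


class DownlinkQueue:
    """In-memory stand-in for a downlink Topic whose partitions carry priorities."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, dict]] = []
        self._sequence = count()  # keeps arrival order within one priority

    def publish(self, operation_type: str, command_id: str, payload: dict) -> None:
        """Queue a downlink request keyed by its operation type."""
        priority = PARTITION_PRIORITY[operation_type]
        heapq.heappush(self._heap, (priority, next(self._sequence), command_id, payload))

    def poll(self) -> tuple[str, dict]:
        """Return the pending request with the highest priority."""
        _, _, command_id, payload = heapq.heappop(self._heap)
        return command_id, payload
```

A control request published after a call request is still polled first, which mirrors the ordering the text describes for control, setting and call/relay requests.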

II. The flow of electric energy data acquisition, as shown in Fig. 3.

1) The collection terminal sends the task data and the abnormal event data to the gateway cluster in a message form;

2) The gateway cluster distributes the messages to the communication front-end cluster according to the load-balancing strategy;

3) The communication front-end distributes the raw message data to the service processor cluster according to the scheduling distribution strategy;

4) The service processor node loads and synchronizes the archive information of the designated terminals from the Redis cache server, parses the uplink raw message data, and pushes the parsing results, the raw message data and related information to the corresponding Kafka message queues; that is, the parsing results are pushed to the reported-data Topic and the raw message data are pushed to the message Topic.

5) The stream processing and warehousing services subscribe to messages from the Kafka service. The Storm real-time computing framework obtains raw message data, electric energy data and the like from the Kafka message queue and stores them in the HBase distributed database. The Spark offline computing framework imports the raw data into the Hive data warehouse for complex statistical analysis and data mining. The data warehousing service stores the raw message data and the electric energy data in the relational database in batches.

6) The acquisition master station quickly queries electric energy data acquisition details, the acquisition success rate and the like from the cloud platform.

7) The Oracle main production library synchronizes basic data from the marketing system, mainly stores all business data, archive data and original data, and provides data query for the acquisition master station.

Performance indexes are as follows:

Computing time index: each type of offline computing service on the big data cloud platform completes within half an hour (typical services include quality analysis and industry load trend analysis); each type of real-time stream computing service processes 20,000 records per second (typical services include load characteristic analysis and terminal communication state maintenance).

Communication processing index: a single node of the communication front-end cluster supports up to 400,000 TCP connections; a single node of the acquisition front-end cluster distributes and processes 3,000 messages per second; the overall throughput of the data warehousing service reaches 60,000 records per second.

Missing electric energy data can be filled in through a recruitment (supplementary collection) strategy, which improves the acquisition success rate; with the big data cloud platform, missing points can be completed quickly. Recruitment is divided into real-time missing-point recruitment and manual missing-point recruitment by the master station; the flow for completing electric energy data is as follows.

III. The flow for completing missing electric energy data, as shown in Fig. 4.

1) The acquisition terminal sends electric energy data to the communication gateway cluster as messages over the various communication modes.

2) The communication gateway cluster forwards the messages to the communication front-end cluster according to the load-balancing distribution strategy.

3) The communication front-end cluster distributes the messages to the service processor cluster according to the scheduling distribution strategy.

4) The service processor node loads and synchronizes the archive information of the designated terminals from the Redis cache server, parses the uplink raw message data, and pushes the parsing results, the raw message data and related information to the corresponding Kafka message queues; that is, the parsing results are pushed to the reported-data Topic and the raw message data are pushed to the message Topic.

5) The stream computing service Storm subscribes to messages from the Kafka service, obtains the electric energy data in real time, and records them in real time in the task data dotting table of the HBase distributed database.

6) Spark RDD runs the missing-point audit task on a schedule: it audits the dotting table in HBase for missing points according to the missing-point recruitment strategy, forms the corresponding missing-point requests according to the terminal communication state, and pushes them to the downlink Topic of the Kafka service, where the service processor obtains and issues them, so that missing points are recruited in real time.

On the other hand, the collection master station can also trigger manual missed spot recruitment.

1) The recruitment policy (e.g., city unit, user type, data type, etc.) is stored in an Oracle database.

2) After reading the recruitment strategy from the Oracle database, the scheduled Spark RDD task executes the missing-point audit, forms the corresponding missing-point requests according to the communication state of the terminals, and pushes them to the Kafka message service, where the service processor obtains and issues them, so that the missing points are recruited.
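The audit step can be sketched in plain Python, standing in for the Spark RDD job and the HBase dotting table. The sketch assumes a 96-point day (one point every 15 minutes) and a dotting table keyed by terminal id; the names and the request format are hypothetical.

```python
POINTS_PER_DAY = 96  # one point every 15 minutes


def missing_points(collected: set[int]) -> list[int]:
    """Return the point indices (0..95) that are absent from the dotting table."""
    return [index for index in range(POINTS_PER_DAY) if index not in collected]


def build_recruitment_requests(
    dotting_table: dict[str, set[int]], terminal_online: dict[str, bool]
) -> list[dict]:
    """Form one recruitment request per online terminal that has missing points."""
    requests = []
    for terminal_id, collected in dotting_table.items():
        if not terminal_online.get(terminal_id, False):
            continue  # offline terminals cannot answer a downlink request
        gaps = missing_points(collected)
        if gaps:
            requests.append({"terminal": terminal_id, "points": gaps})
    return requests
```

In the flow described above, the resulting requests would then be published to the downlink Topic for the service processor to issue.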

IV. When the technical scheme is implemented, different acquisition methods are selected according to the type of terminal, the device, the acquired data (electric quantity, load and the like) and the type of electric energy meter.

1) Special transformer data acquisition method

When a special transformer terminal collects daily frozen active electric energy, the terminal reads the electric meter once per day and reports data every 12 hours; the data are classified as daily frozen data, and the data items comprise the current quadrant-I reactive electric energy indicating-value data block, the current quadrant-IV reactive electric energy indicating-value data block, the current forward active electric energy indicating-value data block and the current reverse active electric energy indicating-value data block.

When a special transformer terminal collects a 96-point load curve, the terminal reads the electric meter and reports data every 15 minutes; the data are classified as real-time data, and the data items comprise the voltage data block, the current data block, active power, the current quadrant-I reactive electric energy indicating-value data block, the current quadrant-IV reactive electric energy indicating-value data block, forward active total electric energy and the power factor.

2) Data acquisition method for low-voltage I-type concentrator

When the low-voltage type I concentrator collects daily frozen active electric energy, the terminal reads the electric meter once per day and reports data every 12 hours; the data are classified as daily frozen data, and the collected and reported data items comprise the current forward active electric energy indicating-value data block and the current reverse active electric energy indicating-value data block.

When the low-voltage type I concentrator collects a 96-point load curve, the terminal reads the electric meter and reports data every 6 hours, and the data are classified as minute frozen data. If a three-phase electric energy meter is installed, the data items are the voltage data block, the current data block, the power factor, active power and forward active total electric energy; if a single-phase electric energy meter is installed, the data items are the A-phase voltage, the A-phase current, the power factor, forward active total electric energy, active power and the N-line current.

3) Low-voltage II-type concentrator data acquisition method

When the low-voltage type II concentrator collects daily frozen active electric energy, the scheme is identical to that of the type I concentrator; that is, the same acquisition scheme template and the same reporting scheme template are issued.
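The acquisition and reporting templates described in items 1) to 3) can be captured in a small data structure. The intervals below are taken from the text; the class name, field names and item identifiers are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SchemeTemplate:
    """One acquisition scheme paired with its reporting scheme."""
    collect_every: timedelta   # how often the terminal reads the meter
    report_every: timedelta    # how often the terminal reports to the master station
    data_class: str
    items: tuple[str, ...]


TYPE_I_DAILY_FROZEN = SchemeTemplate(
    timedelta(days=1), timedelta(hours=12), "daily_frozen",
    ("forward_active_energy", "reverse_active_energy"),
)

TEMPLATES = {
    "special_transformer_load_curve": SchemeTemplate(
        timedelta(minutes=15), timedelta(minutes=15), "real_time",
        ("voltage_block", "current_block", "active_power",
         "q1_reactive_energy", "q4_reactive_energy",
         "forward_active_total_energy", "power_factor"),
    ),
    "type_i_daily_frozen": TYPE_I_DAILY_FROZEN,
    # The type II concentrator reuses the type I templates unchanged.
    "type_ii_daily_frozen": TYPE_I_DAILY_FROZEN,
}


def reports_per_day(template: SchemeTemplate) -> float:
    """Number of reports a terminal sends per day under the template."""
    return timedelta(days=1) / template.report_every
```

Sharing one template object between the type I and type II entries reflects the statement that both concentrators are issued the same templates.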

4) Event schema design

The event scheme designed in the invention can almost cover all terminal events, and three events are selected as cases in the embodiment:

The object-oriented data acquisition method shown in Figs. 1-4 is a specific embodiment of the present invention and embodies its substantive features and advantages; equivalent modifications of its shape, structure and the like made according to practical needs fall within the scope of protection of the present invention.

Claims

1. An object-oriented data acquisition system, characterized by: the system comprises a gateway cluster, a communication front-end cluster, a service processor cluster, a data bus, a warehousing service module, a mass data analysis module and a data storage module;

the communication front-end cluster is connected with the gateway cluster and is used for distributing and scheduling the raw data messages and pushing them to a distributed message queue; the communication front-end cluster distributes messages according to a policy based on the device address-range algorithm, and adjusts the distribution policy dynamically by monitoring the running state of each node of the service processor cluster; the running state of each service processor node is monitored through a heartbeat handshake mechanism, dynamic adjustment is carried out for the three scenarios of node addition, node failure and failed-node recovery according to the "newly added node distribution strategy", the "distribution strategy when node fails" and the "distribution strategy when failed node recovers", and the terminal addresses on a node are distributed to the designated service processor node according to the address-range algorithm, so that the system archives each node loads are reduced and balanced and the program's requirement on server memory configuration is lowered;

the service processor cluster is connected with the communication front-end cluster, is used for communication protocol parsing and interacts with the distributed message queue; that is, it obtains downlink requests from the message queue and forms downlink frames, parses the raw data messages from the communication front-end and pushes the parsing results to the distributed message queue;

the data bus module is used for supporting the ordering and persistence of uplink and downlink communication messages; a high-throughput distributed Kafka message queue is adopted, Kafka topics and topic partitions are used to associate published topics with the master station application cluster and the service processor cluster, and the sending and receiving of downlink request data generated by master station applications and of terminal uplink data are managed uniformly;

the warehousing service module is used for obtaining data from the message queue and storing the data in the relational database in batches; the distributed big data framework Hadoop is combined with the traditional relational database Oracle to suit the analysis and storage of mass data;

the data storage module is used for storing all the service data, the archive data and the original data and providing basic data support and calculation service for the system; the system is divided into a main production library, a disaster recovery library, a historical library and a data release library, and a library division strategy is made according to business and storage time limit so as to reduce the access pressure of a single-point database;

a) The distribution strategy during node addition/capacity expansion is as follows:

A02) sorting the service processor nodes according to their total number of Topics;

A06) preferentially distributing the Topics taken out in step A04) to the newly added service processor node, so that the number of Topics of the newly added node is approximately the average value; if unallocated Topics remain, distributing them over all nodes by modulo;

A07) deleting the reallocated Topic information from the other nodes;

b) The distribution strategy when the node fails is as follows:

B03) calculating, from the average value, the number of Topics to be added to each service processor node currently running: the calculated average value minus the node's current number of Topics;

B04) distributing the Topics left unassigned by the node failure, in turn, to the service processor nodes ranked smallest, according to the calculation value of the distribution strategy when the node is newly added/expanded;

c) The distribution strategy when the fault node is recovered is as follows:

2. An object-oriented based data acquisition system according to claim 1, characterized in that: the communication front-end divides all on-site terminal device addresses into a plurality of intervals according to a certain rule, the address-range group of a device being obtained from the device address according to the number of downlink Topics; a mapping relation exists between the downlink Topics and the address ranges, and a service processor node that manages an address range thereby manages the corresponding downlink Topic; the initial address-range allocation takes the modulo over the number of service processor nodes, dynamic adjustment is carried out when a node is added, a node fails or a failed node recovers according to the "newly added node distribution strategy", the "distribution strategy when node fails" and the "distribution strategy when failed node recovers" respectively, and the allocation information is promptly updated in the Zookeeper distributed service system, so that program memory loading is reduced and the scalability of the program cluster is improved.

3. An object-oriented based data acquisition system according to claim 2, characterized in that: the warehousing service module checks and corrects collected data as soon as they are collected and repairs them in real time, using stream processing to inspect and verify the collected load and electric energy indicating-value data in real time, mark problem data and repair abnormal load data; problem data are repaired using power estimates, the ARIMA algorithm and marketing and distribution electric quantity, so that the data are reasonable, consistent and logical, and the quality of system data is improved by promptly finding and marking invalid and distorted data; stream processing is used to monitor and analyze electric energy data and alarm events in real time; the stream processing technology is a real-time computing framework and adopts HBase + Storm, wherein the Storm real-time computing framework obtains raw data and message data from the message queue and writes them to the HBase distributed database.

4. An object-oriented based data acquisition system according to claim 3, characterized in that: the mass data analysis module uses a big data distributed in-memory parallel computing framework to produce hourly statistics on acquisition success rate indexes, electric quantity and load of the various user types, line loss calculation, distribution transformer operation monitoring, mobile operator channel quality monitoring and terminal online rate, so as to meet the management and control requirements of units at all levels; the quasi-real-time analysis framework adopts Hive + Spark, and the Spark offline computing framework imports the raw data into the Hive data warehouse to execute statistical analysis services and data mining over mass data.

5. A data acquisition method based on an object oriented data acquisition system according to any of claims 1-4, characterized in that:

102) the acquisition master station initiates a downlink request, can set different keys according to the operation type when publishing to the downlink Topic of the Kafka service, and stores the operation command id in the Redis cache;

104) the service processor node loads and synchronizes the archive information of the designated terminals from the Redis cache server, subscribes to the downlink queue of the Kafka service, processes the requests according to the Partition priorities, forms downlink request message frames, distributes them to the communication front-end cluster, and pushes the downlink messages to the message Topic in Kafka;

204) the service processor node loads and synchronizes the archive information of the designated terminals from the Redis cache server, parses the uplink raw message data, and pushes the parsing results, the raw message data and related information to the corresponding Kafka message queues; that is, the parsing results are pushed to the reported-data Topic and the raw message data are pushed to the message Topic;

205) the stream processing and warehousing services subscribe to messages from the Kafka service, and the Storm real-time computing framework obtains raw message data, electric energy data and the like from the Kafka message queue and stores them in the HBase distributed database; the Spark offline computing framework imports the raw data into the Hive data warehouse for complex statistical analysis and data mining; the data warehousing service stores the raw message data and the electric energy data in the relational database in batches;

305) the stream computing service Storm subscribes to messages from the Kafka service, obtains the electric energy data in real time, and records them in real time in the task data dotting table of the HBase distributed database;

306) for real-time missing-point recruitment, Spark RDD runs the missing-point audit task on a schedule, that is, it audits the dotting table in HBase for missing points according to the missing-point recruitment strategy, forms the corresponding missing-point requests according to the terminal communication state, and pushes them to the downlink Topic of the Kafka service, where the service processor obtains and issues them, so that missing points are recruited in real time;

6. The object-oriented-based data acquisition method according to claim 5, wherein: when the terminal event is collected, collecting templates with different levels are defined according to the severity and urgency of the event, and different reporting frequencies are specified.

7. The object-oriented-based data acquisition method according to claim 5, wherein: the data acquisition comprises the acquisition of special variable data, the data acquisition of a low-voltage I type concentrator and the data acquisition of a low-voltage II type concentrator;

I) special transformer data acquisition:

II) data acquisition of the low-voltage type I concentrator:

III) data acquisition of the low-voltage type II concentrator: