WO2016029367A1

WO2016029367A1 - System, method and apparatuses for data processing in power system

Info

Publication number: WO2016029367A1
Application number: PCT/CN2014/085227
Authority: WO
Inventors: Guo MA; Zhihui YANG; Qin Zhou; Jinlin AN; Xiaopei CHENG
Original assignee: Accenture Global Services Limited
Priority date: 2014-08-26
Filing date: 2014-08-26
Publication date: 2016-03-03
Also published as: CN107078541B; AU2014405046B2; AU2014405046A1; CN107078541A

Abstract

System, method and apparatuses for data processing in a power system and a tangible computer readable medium are disclosed. In an embodiment, the system comprises at least one processor; and at least one memory storing computer executable instructions. The at least one memory and the computer executable instructions are configured to, with the at least one processor, cause the system to: group data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith, which are determined partially based on a data model for the power system defining a hierarchy of power grid operation; send the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith; and process the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith. A low-cost solution for a large scale real-time processing in the power system is provided, which is efficient, expansible and enables online changes of data processing processes.

Description

SYSTEM, METHOD AND APPARATUSES FOR DATA PROCESSING IN A POWER SYSTEM

TECHNICAL FIELD

Embodiments of the present disclosure relate to the field of power systems, and more particularly to a system, method, and apparatuses for data processing in a power system and a tangible computer readable medium therefor.

BACKGROUND

Various industries have networks associated with them. One such industry is the utility industry that manages a power grid. The power grid may include one or all of the following: electricity generation, electric power transmission, and electricity distribution. Electricity may be generated using generating stations, such as a coal-fired power plant, a nuclear power plant, etc. For efficiency purposes, the generated electrical power is stepped up to a very high voltage (such as, for example, 345 kV) and transmitted over transmission lines. The transmission lines may transmit the power long distances, such as across state lines or across international boundaries, until it reaches its wholesale customer, which may be a company that owns the local distribution network. The transmission lines may terminate at a transmission substation, which may step down the very high voltage to an intermediate voltage (such as, for example, 138 kV). From a transmission substation, smaller transmission lines (such as, for example, sub-transmission lines) transmit the intermediate voltage to distribution substations. At the distribution substations, the intermediate voltage may be again stepped down to a "medium voltage" (such as, for example, from 4 kV to 23 kV). One or more feeder circuits may emanate from the distribution substations. For example, four to tens of feeder circuits may emanate from the distribution substation. The feeder circuit is a 3-phase circuit comprising 4 wires (one wire for each of the 3 phases and one wire for neutral). Feeder circuits may be routed either above ground (on poles) or underground. The voltage on the feeder circuits may be tapped off periodically using distribution transformers, which step down the voltage from "medium voltage" to the consumer voltage (such as, for example, 120 V). The consumer voltage may then be used by the consumers.

One or more power companies, whose main responsibility is to supply reliable and economic electricity to their customers, manage the power grid, including planning, operation, and maintenance related to the power grid. In order to improve management efficiency and reduce management cost, the power companies have attempted to upgrade the power grid to be a "smart grid" by applying state-of-the-art IT and power engineering technologies. The development of the smart grid requires a huge number of sensing devices or systems, such as Advanced Metering Infrastructures (AMI), Phasor Measurement Units (PMU) and other online monitoring devices. These devices will generate a large volume of data, and the data scale can be at a TB level. For example, in China, a provincial power company usually plans to install more than 10 million meters, which might generate nearly 4 TB of data every month. Besides, Fig. 1 illustrates the data growth trend from recent research, from which it can be seen that the data volume will increase dramatically with the development of advanced distribution automation and AMI, and nearly reaches 800 TB, which is a very large amount of data. At the same time, rapid data processing is required so as to provide useful information in time. Hence, the utilities are facing great challenges, especially in memory computing and database storage.

Therefore, there is a need in the art for an improved solution for data processing and storage, more particularly for a large volume of real-time data.

SUMMARY OF THE DISCLOSURE

To this end， according to a first aspect of the present disclosure， there is provided a system for data processing in a power system. The system comprises at least one processor； and at least one memory storing computer executable instructions. The at least one memory and the computer executable instructions are configured to， with the at least one processor， cause the system to： group data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith， which are determined partially based on a data model for the power system defining a hierarchy of power grid operation； send the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith； and process the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.

In an embodiment of the present disclosure， the grouping data in the data stream may comprise determining the power system resources respectively associated with the data in the data stream based on measurement point identification information contained within the data in the data stream and correlation between power system resources and measurement point identification information defined in the data model； and determining the power system resource groups associated with the data in the stream according to the determined power system resources and the hierarchy of the power grid operation defined in the data model.

In another embodiment of the present disclosure， the sending the data in the data stream may be dependent on current buffered message size and current waiting time.

In a further embodiment of the present disclosure， the data in the data stream may be sent when the current buffered message size reaches a size threshold considering current network load conditions.

In a still further embodiment of the present disclosure， the data in the data stream may be sent when the current waiting time is longer than a time threshold considering current network load conditions.

In a yet further embodiment of the present disclosure， the data in the data stream may be cached in the one or more data processing nodes for processing and the data in the data stream are further stored in a database， wherein corresponding data are loaded from the database to a cache of a data processing node in case of fault recovery or newly starting of the data processing node.

In a still yet further embodiment of the present disclosure， the data in the data stream may be stored in tables by year and a name for a table comprises information on the year， and wherein a key for each of the data is formed by an identification of a power system resource associated therewith and measurement time in a form of time shift in millisecond within the year of the data time.

In a further embodiment of the present disclosure, the data in the data stream may be split into one or more segments according to power system resources associated therewith, and the one or more segments are written in parallel and in a distributed manner to multiple regions of the database in a chronological order based on the keys for the data. Particularly, the data in the data stream are stored in the database in formats that are respectively suitable for their own data types and suitable for data processing.

In a still further embodiment of the present disclosure, the processing of the data in the data stream may be based on a distributed streaming computing architecture, and the data storage is based on a distributed database.

According to a second aspect of the present disclosure， there is provided a method for data processing in a power system. The method may comprise： grouping data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith， which are determined partially based on a data model for the power system defining a hierarchy of power grid operation； sending the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith； and processing the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.

According to a third aspect of the present disclosure, there is provided an apparatus for data processing in a power system. The apparatus comprises: a data grouping module configured to group data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith, which are determined partially based on a data model for the power system defining a hierarchy of power grid operation; a data sending module configured to send the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith; and a data processing module configured to process the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.

According to a fourth aspect of the present disclosure， there is provided another apparatus for data processing in a power system. The apparatus comprises： means for grouping data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith， which are determined partially based on a data model for the power system defining a hierarchy of power grid operation； means for sending the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith； and means for processing the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.

According to a fifth aspect of the present disclosure， there is provided a tangible computer-readable medium having a plurality of instructions executable by a processor to perform data processing in a power system. The tangible computer-readable medium may comprise instructions configured to perform steps of the method according to the second aspect of present disclosure.

With embodiments of the present disclosure, the data in the data stream are grouped at the one or more receiving nodes based on the power system resource groups associated with the data, and are distributed to and processed at processing nodes in accordance with the power system resource groups. In the present disclosure, there is provided a low-cost solution for large-scale real-time processing in the power system, which is efficient, expansible and enables online changes of data processing processes.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other features of the present disclosure will become more apparent through detailed explanation of the embodiments with reference to the accompanying drawings, wherein like reference numbers represent same or similar components throughout the accompanying drawings of the present disclosure, wherein:

Fig. 1 schematically illustrates a curve of electric power data growth according to recent research results in the prior art;

Fig. 2 schematically illustrates a diagram of an exemplary architecture for the power grid in which embodiments of the present disclosure can be implemented;

Fig. 3 schematically illustrates a diagram of data processing and data storage according to an embodiment of the present disclosure;

Fig. 4 schematically illustrates a flow chart of a method for data processing in a power system according to an embodiment of the present disclosure;

Figs. 5A to 5C schematically illustrate diagrams of power grid operation hierarchy, exemplary raw data, and exemplary mapped data according to an embodiment of the present disclosure;

Fig. 6 schematically illustrates a diagram of data processing operations in the event of fault recovery according to an embodiment of the present disclosure;

Figs. 7A and 7B schematically illustrate diagrams of an example data table design according to an embodiment of the present disclosure;

Figs. 8A and 8B schematically illustrate diagrams of an exemplary data storage manner according to an embodiment of the present disclosure;

Figs. 9A and 9B schematically illustrate case simulation results according to an embodiment of the present disclosure;

Fig. 10 schematically illustrates a block diagram of a system for data processing in a power system according to an embodiment of the present disclosure;

Fig. 11 schematically illustrates a block diagram of an apparatus for data processing in a power system according to an embodiment of the present disclosure;

Fig. 12 schematically illustrates a block diagram of another apparatus for data processing in a power system according to another embodiment of the present disclosure; and

Fig. 13 schematically illustrates a general computer system, programmable to be a specific computer system, which may represent any of the computing devices referenced herein.

DETAILED DESCRIPTION OF EMBODIMENTS

Hereinafter， embodiments of the present disclosure will be described with reference to the accompanying drawings. In the following description， numerous specific details are set forth in order to provide a thorough understanding of the embodiments. However， it is apparent to the skilled in the art that implementation of the present disclosure may not have these details and the present disclosure is not limited to the particular embodiments as introduced herein. On the contrary， any arbitrary combination of the following features and elements may be considered to implement and practice the present disclosure， regardless of whether they involve different embodiments. Thus， the following aspects， features and embodiments are only for illustrative purposes， and should not be understood as elements or limitations of the appended claims， unless otherwise explicitly specified in the claims. Additionally， in some instances， well known methods and structures have not been described in detail so as not to unnecessarily obscure the embodiments of the present disclosure.

In the current power system， a great deal of data will be generated per day and rapid data processing is required to provide useful information in time. Hence， in the present disclosure， a solution for data processing in power system is proposed to enable the data to be processed in real time， or in near real time so as to meet the requirements of the current power system.

Fig. 2 schematically illustrates a diagram of an exemplary architecture for the power system in which embodiments of the present disclosure can be implemented. As illustrated in Fig. 2, data sources 110-1 to 110-n, such as various monitoring systems, monitoring devices, sensors, meters, etc., generate various types of data, and these data are transmitted to a real-time processing & storage system 130 through a real-time data bus 120-1 or a real-time event bus 120-2. In the real-time processing & storage system 130, a real-time processing module 140 receives these data, processes them based on information obtained from the data model 160 and information contained within the data, and stores the processed results into the storage 150. Via a unified information accessing module 170, various analyses and applications 180 can access the data model 160 and the processed results or other data in the storage 150. The architecture as illustrated provides not only real-time data processing and storage for a high volume of data but also good reliability, scalability and ready data access. The real-time processing & storage will be described in detail hereinafter with reference to Figs. 3 to 12.

Fig. 3 schematically illustrates a diagram of data processing and data storage according to an embodiment of the present disclosure. As illustrated, data sources 110, including e.g. a Supervisory Control And Data Acquisition (SCADA) system, device monitoring devices, AMI, etc., provide data generated or acquired by them to the real-time processing module 140. Herein, the real-time processing module 140 may be based on a distributed stream processing technology, such as the Apache Storm system, which is a distributed real-time computation system for processing fast, large streams of data. The real-time processing module 140 may comprise one or more receiving nodes 141-1, 141-2, ... 141-m; one or more data processing nodes 142-1, 142-2, 142-3, 142-4, ..., 142-k-1, 142-k; and optionally, one or more data outputting nodes 143-1, ... 143-j. The one or more receiving nodes are configured to receive the data stream from the data sources 110, and may be called spouts in the Apache Storm system. The one or more data processing nodes 142-1, 142-2, 142-3, 142-4, ..., 142-k-1, 142-k each contain a plurality of processes therein and are configured to process the data sent to them; they may be called bolts in the Apache Storm system. The one or more data outputting nodes 143-1, ... 143-j may be configured to collect processed results from the processing nodes and send them to the data storage 150 for storing; they may also be called bolts in the Apache Storm system. The receiving, processing and outputting may be performed in a distributed way utilizing distributed computing, for example based on cloud computing. Therefore, a cloud computing platform 190 may be provided to perform the data stream processing.

Finally, results obtained from the data stream processing can be stored into the data storage 150, which may have at least one database therein, and optionally the database may be a distributed database, such as HBase™, which is the Hadoop database, an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable.

A flow chart of a method for data processing in a power system according to an embodiment of the present disclosure is shown in Fig. 4. As illustrated in Fig. 4, first at step S401, data in a data stream at one or more data receiving nodes are grouped based on power system resource groups associated therewith. The power system resource groups associated therewith can be determined based on measurement point identification and the data model for the power system.

In the power grid, there is usually provided a data model defining a hierarchy of power grid operation, which may cover various originations, for example, various elements and objects in the transmission network, distribution network and customer domains, and may provide a uniform and centralized data view of various data in the power grid. Fig. 5A schematically illustrates an exemplary data model according to an embodiment of the present disclosure. As illustrated, in the exemplary data model, power devices are arranged in a hierarchy of power grid operation in the power transmission, power distribution and customer domains. For example, for power transmission, there are different voltage levels for electric equipment, such as 500 kV, 220 kV, and so on; for a certain voltage level, there are different types of electric equipment, such as substations, lines, etc.; for a certain type of electric equipment at a certain level, such as substations at 500 kV, there are different pieces of electric equipment such as ST0082, ST0083, etc. Each substation ST0082, ST0083 further includes different electric devices, such as busbar sections, capacitors, circuit breakers, transformers and so on, each of which contains a plurality of measurement points, such as active power, reactive power, bottom oil temperature, etc. Therefore, such a data model clearly defines the hierarchy of the power grid operations. Besides, the data model contains identification information on electric equipment, electric devices, and measurement points, which reflects the correlation between these components and their identification. Based on this information, the data in the data stream may be grouped.

Fig. 5B illustrates exemplary raw data received from a data source. As illustrated in Fig. 5B, the raw data includes a measurement ID "MeasID", which is an identification of the measurement point associated with the data; a measurement time "timeStamp", which indicates the time at which the data is measured; a measurement value "Value", which indicates the value of the measured data; and a data quality "Quality", which indicates the quality of the data, with "0" representing a high quality. From information contained in the data, especially the measurement ID "MeasID", and the correlation provided in the data model, a power system resource associated with the data may be first determined, for example by searching the corresponding table in the data model using the measurement ID "MeasID" as the search condition. Thus, the measurement point or power system resource with which a measurement record is associated may be determined by using the measurement point identification contained in the data. Furthermore, based on the data model, the electric device to which the measurement point belongs, for example TR1122, may be further determined. Based on the data model, the power system resource group to which the electric device belongs may then be determined. For example, for transformer TR1122, it may be clearly seen from the data model that it belongs to the power system resource group ST0082, i.e., substation ST0082.

The determined information on the power system resource identification and the power system resource group can be included in the data for later use. Exemplary mapped data is illustrated in Fig. 5C for a purpose of illustration only. As illustrated, in addition to the information contained in the raw data, the mapped data further contains "PSRID", which indicates the identification of the power system resource, and "PSRGrpID", which indicates the identification of the power system resource group.
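The mapping from raw data to mapped data described above can be sketched as two lookups against the data model. The following Python sketch is illustrative only: the dictionaries stand in for the data model tables, and the measurement IDs, the breaker ID CB0301, and the sample values are hypothetical; only the field names and the TR1122 to ST0082 relationship come from the description.

```python
# Hypothetical, hand-written slice of the data model. A real deployment would
# query the data model's tables instead of using in-memory dictionaries.
MEAS_TO_PSR = {"M0001": "TR1122", "M0002": "TR1122", "M0003": "CB0301"}
PSR_TO_GROUP = {"TR1122": "ST0082", "CB0301": "ST0083"}


def map_record(raw):
    """Return a copy of a raw record extended with PSRID and PSRGrpID."""
    psr_id = MEAS_TO_PSR[raw["MeasID"]]   # measurement point -> power system resource
    group_id = PSR_TO_GROUP[psr_id]       # power system resource -> resource group
    return {**raw, "PSRID": psr_id, "PSRGrpID": group_id}


raw = {
    "MeasID": "M0001",
    "timeStamp": "2014-08-26T00:00:00.000",
    "Value": 63.5,
    "Quality": 0,
}
mapped = map_record(raw)
```

The raw fields are carried over unchanged, so downstream nodes can use the two added identifiers without consulting the data model again.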

Referring back to Fig. 4， at step S402， the data in the data stream are sent to one or more processing nodes based on the power system resource group associated therewith. After the data in the data stream are grouped， they will be sent to the following processing nodes for further processing， based on power system resource group associated therewith. That is to say， data belonging to the same power system resource group will be sent to the same one or more processing nodes.
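One way to guarantee that data of the same power system resource group always reaches the same processing node is to derive the node from a stable hash of the group identification. The Python sketch below assumes exactly one node per group and a fixed node count, which is a simplification of the "one or more processing nodes" wording; the function names are hypothetical. It is similar in spirit to a fields grouping in Apache Storm, but it does not use the Storm API.

```python
import zlib


def node_for_group(group_id, node_count):
    """Pick a processing node index deterministically from the group ID."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(group_id.encode("utf-8")) % node_count


def route(records, node_count):
    """Split mapped records into one batch per processing node."""
    batches = {index: [] for index in range(node_count)}
    for record in records:
        batches[node_for_group(record["PSRGrpID"], node_count)].append(record)
    return batches


records = [
    {"PSRGrpID": "ST0082", "Value": 1.0},
    {"PSRGrpID": "ST0083", "Value": 2.0},
    {"PSRGrpID": "ST0082", "Value": 3.0},
]
batches = route(records, node_count=4)
```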

In an embodiment of the present disclosure, the message passing may be further optimized to improve transmission efficiency. The inventors have noticed that, for a certain message bus, the performance is a variable which changes according to the message size, and there is usually an optimal message size which achieves the best performance. For example, as far as the Apache Storm system, which uses ZeroMQ as its internal message bus, is concerned, the best performance can be achieved at a message size of 100k. The following Table 1 gives the relationship between performance and message size in the Apache Storm system.

Table 1 Relationship Between Performance and the Message Size

From Table 1, it is clear that the best performance is achieved when the message size is 100k, and a larger or smaller size will not provide better performance. In view of this discovery, in the present disclosure, an optimization algorithm is designed which considers both the bus performance and the deadline for transmission of the real-time data, and which can be, for example, expressed as follows:

BooleanSend = (t >= T*a*w) or (s > S*b*w)

wherein BooleanSend is a Boolean value that determines whether to transmit the data; t denotes the current waiting time; T is the maximum transmission delay time allowed for real-time data; s is the amount of currently buffered real-time data; S is the maximum number of data per message, which is determined by the optimal message size and the data size, for example 4.17k (i.e., 100k/24) in a case that the optimal message size is 100k and each data has a size of 24 bytes; w denotes a network load ratio with a value ranging from 0 to 1, which reflects network load conditions, i.e., current network load or bandwidth; a is an optimization ratio for delay time; and b is an optimization ratio for message size. The values of a and b are optimization ratios that enable data to be transmitted in a small size, and may be set according to network load conditions. For example, their default values may be set to 1; however, in the event the network load is relatively light, the values of a and b may be set to a lower value.

Therefore, in embodiments of the present disclosure, the sending of data in the data stream may be dependent on the current buffered message size and the current waiting time. For example, the data in the data stream will be sent when the current buffered message size reaches a size threshold considering the current network load conditions. Additionally, or alternatively, the data in the data stream will be sent when the current waiting time is longer than a time threshold considering the current network load conditions. In such a way, a message may be transmitted when it reaches a specified size so as to obtain optimal performance, or transmitted in a smaller size when the network is idle so that the real-time performance of data transmission can be ensured.
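The sending condition can be written directly as a small predicate. The Python sketch below follows the expression given above, with s as the number of currently buffered data and S as the maximum number of data per message; the function name and the 1-second delay limit used in the example calls are assumptions made for illustration.

```python
def should_send(t, s, T, S, w, a=1.0, b=1.0):
    """Evaluate BooleanSend = (t >= T*a*w) or (s > S*b*w).

    t: current waiting time; T: maximum allowed transmission delay
    s: number of currently buffered data; S: maximum number of data per message
    w: network load ratio in [0, 1]; a, b: optimization ratios (default 1)
    """
    return (t >= T * a * w) or (s > S * b * w)


# Maximum data per message for a 100k optimal message and 24-byte data:
OPTIMAL_MESSAGE_BYTES = 100_000
DATA_BYTES = 24
S_MAX = OPTIMAL_MESSAGE_BYTES // DATA_BYTES  # 4166, i.e. about 4.17k

# Hypothetical delay limit of 1.0 second, used only for these example calls.
busy_small_buffer = should_send(t=0.3, s=1000, T=1.0, S=S_MAX, w=1.0)
busy_full_buffer = should_send(t=0.3, s=4200, T=1.0, S=S_MAX, w=1.0)
idle_small_buffer = should_send(t=0.3, s=100, T=1.0, S=S_MAX, w=0.2)
```

With a fully loaded network (w = 1) the small buffer keeps waiting while the full buffer is sent; with a lightly loaded network (w = 0.2) both thresholds shrink, so the same waiting time already triggers a send.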

Referring back to Fig. 4, at step S403, the data in the data stream are processed at the one or more data processing nodes based on the power system resource groups associated therewith. At each processing node, one or more processes are running for processing the data sent thereto. These processes may be configured according to the type of the power system resource group and updated as required. At each of the processing nodes, the processing algorithms or processes can be easily updated or changed online by an embedded script executing engine such as Python, etc., and/or the Java reflection mechanism. The processing of the data is known in the art, and thus the operations of these processing nodes will not be elaborated herein for a purpose of simplification.
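A minimal sketch of how a processing algorithm might be replaced online with an embedded script engine is shown below. It is an assumption-laden illustration rather than the disclosed mechanism: the registry, the function names, and the convention that each script defines a function named process are all hypothetical, and exec is only appropriate when the script text comes from a trusted operator.

```python
PROCESSORS = {}  # group type -> currently active processing function


def load_script(group_type, source):
    """Compile script text and register its process() function for a group type."""
    namespace = {}
    exec(source, namespace)  # assumes trusted script text
    PROCESSORS[group_type] = namespace["process"]


def handle(group_type, records):
    """Run whichever processing function is currently registered."""
    return PROCESSORS[group_type](records)


records = [{"Value": 1.5}, {"Value": 4.0}]

# Initial algorithm: maximum value in the batch.
load_script("substation", "def process(records):\n    return max(r['Value'] for r in records)\n")
first = handle("substation", records)

# Online change: swap in an averaging algorithm without restarting the node.
load_script("substation", "def process(records):\n    return sum(r['Value'] for r in records) / len(records)\n")
second = handle("substation", records)
```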

In embodiments of the present disclosure, the data in the data stream belonging to the same power system resource group will be sent to the same one or more processing nodes, and at these processing nodes the data can be processed simultaneously, which may realize real-time data processing in the power system, especially for large-scale real-time data. Besides, in the present disclosure, the processing nodes may be based on distributed stream computing technology and thus possess good expansibility, which means the stream computing clusters can be increased dynamically to meet new requirements, such as when the data amount increases or the time cost needs to be decreased. In addition, according to the present disclosure, various types of data processing are performed according to the related PSR group, which reduces repetitive data transmission and improves the efficiency of data processing. Moreover, since data in the data stream belonging to the same PSR will be mapped, packaged and sent to the same processing node, it may facilitate power grid operation oriented data processing.

In addition, in order to ensure both high efficiency and high reliability of the data processing, in embodiments of the present disclosure, the data sent to each of the processing nodes are cached in the processing node, and at the same time the raw data are also stored in a database such as HBase. The cached data will be used for data processing, while the data stored in the database can be used in the event of fault recovery or initiation of one or more processing nodes. For example, the data may be loaded from the database first when a processing node is restarted due to, for example, a fault.

Fig. 6 illustrates a flow chart of data processing operations in the event of fault recovery according to an embodiment of the present disclosure. As illustrated, the data will be first loaded from the database to the cache at step 601, and the processing node begins to receive the grouped data at step 602 and then caches the received data in the cache. At step 603, the processing node further stores the received data into the database for use in the event of, e.g., a failure recovery. Then the data are obtained and processed by processes at step 604, and the processed results are stored into the database at step 605. After the data have been processed successfully, at step 606, an ACK may be sent to the receiving node as a reply.

From the flow chart, it can be seen that even if there is a failure, the raw data can still be obtained from the database, and thus a higher reliability may be provided. Moreover, a similar operation may be performed when the amount of data increases substantially and more processing nodes are required to process these data.
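The steps 601 to 606 of Fig. 6 can be sketched with an in-memory stand-in for the database. The class names, the averaging used as the processing step, and the sample values below are hypothetical; the sketch only illustrates the ordering of cache loading, caching, persisting, processing and acknowledging.

```python
class InMemoryDatabase:
    """Stand-in for the distributed database, kept in local dictionaries."""

    def __init__(self):
        self.raw = {}      # group ID -> list of raw records
        self.results = {}  # group ID -> list of processed results


class ProcessingNode:
    def __init__(self, group_id, database):
        self.group_id = group_id
        self.database = database
        # Step 601: load previously stored raw data into the cache.
        self.cache = list(database.raw.get(group_id, []))

    def receive(self, record):
        self.cache.append(record)                                       # step 602
        self.database.raw.setdefault(self.group_id, []).append(record)  # step 603
        result = self.process()                                         # step 604
        self.database.results.setdefault(self.group_id, []).append(result)  # step 605
        return "ACK"                                                    # step 606

    def process(self):
        """Hypothetical processing step: mean of all cached values."""
        values = [r["Value"] for r in self.cache]
        return sum(values) / len(values)


db = InMemoryDatabase()
node = ProcessingNode("ST0082", db)
node.receive({"Value": 10.0})
node.receive({"Value": 20.0})

# Simulate a fault: a new node instance starts and rebuilds its cache.
restarted = ProcessingNode("ST0082", db)
ack = restarted.receive({"Value": 30.0})
```

Because the restarted node reloads the two earlier records before receiving the third, its result covers all three values rather than only the data that arrived after the restart.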

In addition to the above， in the present disclosure， there is also provided an application-oriented data storage solution so as to achieve high performance of large-scale real-time data storage and accessing， which will be detailed next with reference to Figs. 7 to 12.

Reference will be first made to Figs. 7A and 7B, which show example data storage table designs according to an embodiment of the present disclosure. The present disclosure proposes to store the measurement data by year, i.e., measurement data for one year will be stored in one table and data for different years will be stored in different tables. The annual measurement data table may have a table name such as "Historydata 2014" so as to include the information about the year of the measurement time of the data stored therein. The table may comprise a row key "Rowkey" for identifying a piece of data. More particularly, the row key "Rowkey" may comprise two parts, i.e., information on the power system resource and information on the measurement time. The information on the power system resource can be the power system resource identification "PSRID". The "PSRID" is the unified ID of the power system resource as mentioned hereinbefore, which has a data type of unsigned int (4 bytes). The information on the measurement time may only contain information on month and day without the year, as information on the year is reflected by the name of the table. Particularly, the information on the measurement time used herein is not in a usual time format but is represented as a time shift in milliseconds within the year of the data time. This means the information on the measurement time does not include information about the year and uses the time shift from, for example, 0:00 am of the first day of the year. Thus, the information on the measurement time may be designed in an unsigned int format of 4 bytes. Therefore, in embodiments of the present disclosure, the row key only needs 8 bytes, which will save storage space greatly compared with the design of the row key in the prior art.
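The table naming and the 8-byte row key can be sketched with Python's struct module. The function names and sample values are hypothetical, and big-endian packing is an assumption chosen so that byte-wise key ordering follows PSRID first and time second. The sketch keeps the 4-byte time field described above; struct.pack raises struct.error for an offset that does not fit in 32 bits, so the example uses a timestamp early in the year.

```python
import struct
from datetime import datetime, timedelta


def table_name(ts):
    """Name of the annual table that holds a measurement taken at ts."""
    return f"Historydata {ts.year}"


def row_key(psr_id, ts):
    """Build the 8-byte key: 4-byte PSRID followed by 4-byte time shift in ms."""
    start_of_year = datetime(ts.year, 1, 1)
    offset_ms = (ts - start_of_year) // timedelta(milliseconds=1)
    # ">II" packs two big-endian unsigned 32-bit integers (8 bytes in total).
    # struct.pack raises struct.error if either value exceeds 32 bits.
    return struct.pack(">II", psr_id, offset_ms)


ts = datetime(2014, 1, 2, 0, 0, 0)   # one day after the start of the year
key = row_key(1122, ts)              # 1122 is a hypothetical numeric PSRID
psr_id, offset_ms = struct.unpack(">II", key)
```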

In addition, a column family “CF”, a column “qualifier”, and a “Value” may be included. The “CF” denotes the family to which the data belong, wherein “r” refers to raw data, “s” refers to statistical data, and “e” refers to events. The column qualifier is a code representing the measurement type and the format of the value. The measurement type and the format of the value can be determined by using the code to perform a search in a measurement type and format definition table as illustrated in Fig. 7B. The “Value” stores the data in the format represented by the column qualifier. Additionally, a “Region” may be included, which denotes the position in which the measurement data are stored. The “Region” may be a number pre-assigned according to data scale and computing resources, so as to avoid the performance impact of automatic region adjustment.
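The lookup of the value format from the column qualifier can be illustrated with a small sketch. The contents of the definition table of Fig. 7B are not reproduced in the text, so the codes, type names and binary formats below are hypothetical placeholders.

```python
import struct

# Hypothetical stand-in for the definition table of Fig. 7B:
# qualifier code -> (measurement type, binary format of the value)
MEASUREMENT_FORMATS = {
    1: ("voltage", ">d"),        # 8-byte double
    2: ("current", ">d"),        # 8-byte double
    3: ("switch_status", ">B"),  # 1-byte unsigned integer
}


def decode_cell(qualifier: int, value: bytes):
    """Resolve the qualifier code, then decode the value accordingly."""
    name, fmt = MEASUREMENT_FORMATS[qualifier]
    (decoded,) = struct.unpack(fmt, value)
    return name, decoded
```

Under these assumptions, supporting a new measurement type amounts to adding one entry to the definition table, for example `MEASUREMENT_FORMATS[4] = ("frequency", ">f")`, without touching the decoding logic.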

In a traditional method, the row key usually requires about 24 bytes, whereas in the present disclosure, by using the PSRID and the time shift in milliseconds, it only requires 8 bytes. The traditional row key therefore occupies about 3 times the storage space of the row key of the present disclosure, which means a great saving in storage space. Besides, in the present disclosure, the data in the data stream are stored in the database in formats that are respectively suitable for their own data types and suitable for data processing, instead of only in a string format. For example, numerical values such as voltage, current, power, and so on are stored in a format of, e.g., double, instead of string as commonly used. Data types such as double, float, etc., require less storage space than string, and they can be processed directly without conversion. Thus, this may not only save storage space but also improve processing efficiency. In addition, if a new type of measurement data is to be added, the only action needed is to insert a record into the table of Fig. 7B to define the measurement type and the value format.

Fig. 8A illustrates a diagram of the parallel storage of the data according to an embodiment of the present disclosure. According to the present disclosure, the data may be stored into the database in parallel and in a distributed manner as chronological records. In the present disclosure, the row key is composed of the PSRID and the time shift in milliseconds. The data are split into multiple segments based on the PSRID, and the segments are stored in parallel into different regions in time order, as illustrated in Fig. 8B. Due to the special design of the row key, in a database such as HBase, the storing process will be performed automatically because HBase per se supports such a mechanism.
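The splitting step can be sketched as below. This is only an illustration under stated assumptions: the split points, the record layout `(psrid, offset_ms, value)` and the function names are hypothetical, and in a database such as HBase the assignment of rows to regions is handled by the database itself rather than by application code.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical split points: region 0 holds PSRIDs below 1000,
# region 1 holds 1000..1999, region 2 holds 2000 and above.
SPLIT_POINTS = [1000, 2000]


def region_for(psrid: int) -> int:
    """Index of the region whose PSRID range contains psrid."""
    return bisect_right(SPLIT_POINTS, psrid)


def split_into_segments(records):
    """Group (psrid, offset_ms, value) records by region, in key order."""
    segments = defaultdict(list)
    for rec in records:
        segments[region_for(rec[0])].append(rec)
    for seg in segments.values():
        seg.sort(key=lambda r: (r[0], r[1]))  # PSRID first, then time
    return dict(segments)
```

Because the segments are disjoint, each one could be handed to a separate writer, which is the property that allows the regions to be written in parallel.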

Such a storage manner may provide application-oriented data access. As is known, an application usually accesses data belonging to the same power system resource group within a certain time period. The way of storing the data in the database greatly facilitates such access because all the data required are usually stored contiguously in one region, as illustrated in Fig. 8B. This data storage manner ensures that data can be accessed with high efficiency when they are accessed by PSR and time period. Therefore, in embodiments of the present disclosure, the data storage solution helps to achieve high data access performance.
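Why access by PSR and time period is efficient can be seen in a small sketch that treats the stored keys as one sorted list, which mirrors the way HBase orders rows by the bytes of the row key. The helper names and the big-endian key encoding are assumptions made for illustration.

```python
import struct
from bisect import bisect_left


def key(psrid: int, offset_ms: int) -> bytes:
    """8-byte key: PSRID, then millisecond offset (big-endian, assumed)."""
    return struct.pack(">II", psrid, offset_ms)


def scan(sorted_keys, psrid: int, start_ms: int, end_ms: int):
    """Keys of one PSR with start_ms <= offset < end_ms.

    Both bounds are found by binary search and the result is a single
    contiguous slice, so no unrelated rows are visited.
    """
    lo = bisect_left(sorted_keys, key(psrid, start_ms))
    hi = bisect_left(sorted_keys, key(psrid, end_ms))
    return sorted_keys[lo:hi]
```

A query such as `scan(keys, 8, 1000, 3000)` therefore touches only the run of rows belonging to PSRID 8 within the requested window.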

The above-described storage solution not only supports large-scale real-time data storage but also provides good expansibility. Moreover, it is quite suitable for big data analysis; for example, it may be easily integrated with other analytical components of the Hadoop ecosystem.

Figs. 9A and 9B schematically illustrate case simulation results according to an embodiment of the present disclosure. In the simulations, 3 to 15 host nodes are used, each with 16 GB of memory, an 8-core CPU, 1 TB of storage, and a 1 Gbit network; 3 million meters are used as data sources, each with 33 measurement data. Fig. 9A illustrates the data processing and writing performance, wherein the X-axis shows the number of host nodes and the Y-axis shows the number of meters that are processed and written per second. From Fig. 9A, it is clear that the processing and writing performance increases nearly linearly as the number of host nodes increases, and it achieves a write performance of 50,000 meters/sec (45 million meters per 15 minutes) for 15 host nodes. Fig. 9B illustrates the data reading performance obtained from the case simulations. From the table of Fig. 9B, it is clear that the data can be read in a very short time and that, for the same number of meters, a larger number of threads may provide a higher read performance.
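The relation between the two throughput figures quoted above can be checked with simple arithmetic. The derived values below follow only from the numbers stated in the text and are not additional simulation results; the variable names are chosen for illustration.

```python
WRITE_RATE = 50_000        # meters processed and written per second (15 nodes)
WINDOW_S = 15 * 60         # a 15-minute window, in seconds
METERS = 3_000_000         # meters used as data sources
VALUES_PER_METER = 33      # measurement data per meter

meters_per_window = WRITE_RATE * WINDOW_S       # 45,000,000 meters per 15 minutes
seconds_for_all_meters = METERS / WRITE_RATE    # 60.0 seconds for one pass over all meters
values_per_pass = METERS * VALUES_PER_METER     # 99,000,000 values per pass
```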

Therefore, embodiments of the present disclosure make it possible to process and store high-volume real-time data at low cost and with high performance, which facilitates extracting information from the real-time data and making better business decisions in time. Besides, they provide a critical and fundamental solution for large-scale and widely distributed real-time data processing and storage for the utilities.

In the present disclosure， there is also provided a system for data processing in a power system， and hereinbelow this system will be described with reference to Fig. 10， which schematically illustrates a block diagram of a system for data processing in a power system according to an embodiment of the present disclosure.

As illustrated in Fig. 10, the system 1000 comprises at least one processor 1010 and at least one memory 1020 storing computer executable instructions 1030. The at least one memory 1020 and the computer executable instructions 1030 are configured to, with the at least one processor 1010, cause the system 1000 to: group data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith, which are determined partially based on a data model for the power system defining a hierarchy of power grid operation; send the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith; and process the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.

Particularly， in an embodiment of the present disclosure the grouping data in the data stream to be processed may comprise determining the power system resources respectively associated with the data in the data stream based on measurement point identification information contained within the data in the data stream and correlation between power system resources and measurement point identification information defined in the data model； and determining the power system resource groups associated with the data in the stream according to the determined power system resources and the hierarchy of the power grid operation defined in the data model.
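The two determination steps can be sketched as follows. The data model fragments are hypothetical: the identifiers ST0082 and TR1122 are borrowed from the examples in this description, whereas the measurement point identifiers, the other resources and the choice of the top-level ancestor as the group are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical fragments of the data model.
POINT_TO_PSR = {"MP001": "TR1122", "MP002": "BK0031", "MP003": "TR2001"}
PSR_PARENT = {"TR1122": "ST0082", "BK0031": "ST0082", "TR2001": "ST0090",
              "ST0082": None, "ST0090": None}


def group_of(point_id: str) -> str:
    """Step 1: measurement point -> PSR. Step 2: walk up the hierarchy."""
    psr = POINT_TO_PSR[point_id]
    while PSR_PARENT.get(psr) is not None:
        psr = PSR_PARENT[psr]
    return psr


def group_stream(messages):
    """Bucket messages by the power system resource group they belong to."""
    groups = defaultdict(list)
    for msg in messages:
        groups[group_of(msg["point_id"])].append(msg)
    return dict(groups)
```

In this sketch, readings from MP001 and MP002 end up in the same group ST0082, so they would be routed to the same data processing node.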

In embodiments of the present disclosure, the sending of data in the data stream may be optimized so as to achieve higher performance. For example, the sending of data may be dependent on the current buffered message size and the current waiting time. The data in the data stream may be sent when the current buffered message size reaches a size threshold that takes current network load conditions into account. Alternatively, the data in the data stream may be sent when the current waiting time is longer than a time threshold that takes current network load conditions into account.
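A buffer that sends on either condition might look like the sketch below. The class name, its parameters and the injectable clock are assumptions; how the thresholds are derived from network load is not specified in the text, so they are simply exposed as attributes that a caller could adjust.

```python
import time


class BatchSender:
    """Buffers messages and sends them when the buffer is large or old enough."""

    def __init__(self, send, size_threshold, time_threshold_s, clock=time.monotonic):
        self._send = send
        self.size_threshold = size_threshold      # bytes; adjustable at run time
        self.time_threshold_s = time_threshold_s  # seconds; adjustable at run time
        self._clock = clock
        self._buffer = []
        self._size = 0
        self._first_at = 0.0

    def add(self, message: bytes) -> None:
        if not self._buffer:
            self._first_at = self._clock()  # waiting time starts with the first message
        self._buffer.append(message)
        self._size += len(message)
        self.flush_if_due()

    def flush_if_due(self) -> bool:
        """Send the buffer if either threshold is met; report whether it was sent."""
        if not self._buffer:
            return False
        waited = self._clock() - self._first_at
        if self._size >= self.size_threshold or waited > self.time_threshold_s:
            self._send(list(self._buffer))
            self._buffer.clear()
            self._size = 0
            return True
        return False
```

A periodic call to `flush_if_due()` is needed for the time condition to fire when no new messages arrive; the sketch leaves that scheduling to the caller.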

In embodiments of the present disclosure, the data in the data stream may be cached in the one or more data processing nodes for processing, and the data in the data stream are further stored in a database. In case of fault recovery or a new start of a data processing node, corresponding data may be loaded from the database to a cache of the data processing node. The data in the data stream may be stored in tables by year, and a name for a table comprises information on the year. A key for each of the data may be formed by an identification of a power system resource associated therewith and the measurement time in the form of a time shift in milliseconds within the year of the data time.
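The interplay of cache and database during recovery can be sketched as follows. The class and method names are hypothetical, and a plain dictionary stands in for the database, so this only illustrates the control flow, not a real storage client.

```python
class ProcessingNode:
    """Processing node that keeps a cache backed by a persistent store."""

    def __init__(self, groups, database):
        self.groups = set(groups)   # resource groups this node is responsible for
        self.database = database    # stand-in for the database: {group: [records]}
        self.cache = {}
        self.recover()              # runs on a fresh start and after a fault

    def recover(self) -> None:
        """Reload the data of this node's groups from the database."""
        self.cache = {g: list(self.database.get(g, [])) for g in self.groups}

    def handle(self, group, record) -> None:
        """Cache the record for processing and store it in the database."""
        self.cache.setdefault(group, []).append(record)
        self.database.setdefault(group, []).append(record)
```

If a node handling ST0082 fails, a replacement created with the same database starts with the same cached records and can continue processing.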

In embodiments of the present disclosure, the data in the data stream may be split into one or more segments according to the power system resources associated therewith, and the one or more segments are parallelly and distributively written to multiple regions of the database in a chronological order based on the keys for the data. Furthermore, the data in the data stream may be stored in the database in formats that are respectively suitable for their own data types and suitable for data processing.

In embodiments of the present disclosure, the processing of data in the data stream may be based on a distributive streaming computing architecture, such as the Apache Storm system, and the data storage may be based on a distributive database such as HBase.

In Fig. 11, there is illustrated an apparatus for data processing in a power system according to an embodiment of the present disclosure. The apparatus 1100 comprises a data grouping module 1110, a data sending module 1120, and a data processing module 1130. The data grouping module 1110 may be configured to group data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith, which are determined partially based on a data model for the power system defining a hierarchy of power grid operation. The data sending module 1120 may be configured to send the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith. The data processing module 1130 may be configured to process the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.

In an embodiment of the present disclosure, the data grouping module 1110 may first determine the power system resources respectively associated with the data in the data stream, based on measurement point identification information contained within the data in the data stream and the correlation between power system resources and measurement point identification information defined in the data model. Then, according to the determined power system resources and the hierarchy of the power grid operation defined in the data model, the data grouping module 1110 may in turn determine the power system resource groups associated with the data in the stream.

In another embodiment of the present disclosure， the data sending module 1120 may be further configured to send the data in the data stream dependent on current buffered message size and current waiting time. For example， the data in the data stream can be sent when the current buffered message size reaches a size threshold considering current network load conditions. Additionally or alternatively， the data in the data stream can be sent when the current waiting time is longer than a time threshold considering the current network load conditions.

In embodiments of the present disclosure, the data in the data stream may be cached in the one or more data processing nodes for processing. At the same time, an optional data storing module 1140 may store the data in the data stream into a database. In case of fault recovery or initiation of a data processing node, corresponding data are loaded from the database to a cache of the data processing node. The data storing module 1140 may be configured to store the data in the data stream in tables by year. The name for a table may comprise information on the year, and a key for each of the data may be formed by an identification of a power system resource associated therewith and the measurement time in the form of a time shift in milliseconds within the year of the data time. The data in the data stream may be split into one or more segments according to the PSRID for the data, and the one or more segments are parallelly and distributively written to multiple regions of the database in a chronological order based on the keys for the data. Particularly, the data in the data stream are stored in the database in formats that are respectively suitable for their own data types and suitable for data processing.

In addition, in embodiments of the present disclosure, the processing of the data in the data stream is based on a distributive streaming computing architecture and the data storage is based on a distributive database.

In Fig. 12, there is further illustrated another apparatus for data processing in a power system according to another embodiment of the present disclosure. The apparatus 1200 comprises: means 1210 for grouping data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith, which are determined partially based on a data model for the power system defining a hierarchy of power grid operation; means 1220 for sending the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith; and means 1230 for processing the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.

As illustrated in Fig. 12, the apparatus 1200 may additionally comprise means 1240 for storing data in the data stream into a database. In case of fault recovery or a new start of a data processing node, corresponding data are loaded from the database to a cache of the data processing node. The means 1240 for storing data may be further configured to store the data in the data stream in tables by year, wherein a name for a table comprises information on the year, and wherein a key for each of the data may be formed by an identification of a power system resource associated therewith and the measurement time in the form of a time shift in milliseconds within the year of the data time. Moreover, the data in the data stream may be split into one or more segments according to the PSRID for the data, and the one or more segments are parallelly and distributively written to multiple regions of the database in a chronological order based on the keys for the data. Particularly, the data in the data stream are stored in the database in formats that are respectively suitable for their own data types and suitable for data processing.

Furthermore, there is provided a tangible computer-readable medium having a plurality of instructions executable by a processor to perform data processing in a power system. The tangible computer-readable medium may comprise instructions configured to perform steps of the method according to any embodiment of the method of the present disclosure.

It should be noted that operations of respective modules or means as comprised in the system 1000， apparatus 1100， and apparatus 1200 substantially correspond to respective method steps as previously described. Therefore， for detailed operations of respective modules or means in the system 1000， apparatus 1100， and apparatus 1200， please refer to the previous descriptions of the methods of the present disclosure with reference to Figs. 2 to 9B.

Besides, although specific embodiments are presented hereinbefore, they are given only for the purpose of illustration, so as to enable those skilled in the art to understand the idea of the present disclosure completely and thoroughly and to practice the solution of the present disclosure. From the teaching provided herein, those skilled in the art can conceive various modifications; all these modifications should fall within the scope of the appended claims. For example, although the stream processing system is described in a topology comprising data receiving nodes, data processing nodes and data outputting nodes, it is also possible to use a stream processing topology without separate data outputting nodes. In embodiments of the present disclosure, data belonging to certain electronic equipment at a certain voltage level, such as ST0082, are divided into a same power system resource group; however, it is also possible to divide data belonging to a certain electric device, or a certain type of electric device (e.g., a transformer, TR1122), in the certain electronic equipment ST0082 into a same power system resource group. Besides, in order to improve the transmission efficiency, a specific optimization approach is provided; however, it is also possible to conceive other, different optimization approaches from the teaching provided herein.

In Fig. 13, there is further illustrated a general computer system 1300, which may represent any of the computing devices referenced herein. For instance, the general computer system 1300 may represent, in part or in its entirety, the control center, the head end, the integrated network operations and management system (NOMS), the fault, performance, and configuration management (FPCM) module, or any other computing devices referenced herein such as the end devices, the meters, the telemetry interface units (TIUs), the collectors, and/or any networked components such as routers, switches or servers as discussed herein. The computer system 1300 may include an ordered listing of a set of instructions 1302 that may be executed to cause the computer system 1300 to perform any one or more of the methods or computer-based functions disclosed herein. The computer system 1300 may operate as a stand-alone device or may be connected, e.g., using the network 115, 125, to other computer systems or peripheral devices.

In a networked deployment, the computer system 1300 may operate in the capacity of a server or as a client-user computer in a server-client user network environment, or as a peer computer system in a peer-to-peer (or distributed) network environment. The computer system 1300 may also be implemented as or incorporated into various devices, such as a personal computer or a mobile computing device capable of executing a set of instructions 1302 that specify actions to be taken by that machine, including but not limited to accessing the network 115, 125 through any form of browser. Further, each of the systems described may include any collection of sub-systems that individually or jointly execute a set, or multiple sets, of instructions to perform one or more computer functions.

The computer system 1300 may include a processor 1307, such as a central processing unit (CPU) and/or a graphics processing unit (GPU). The processor 1307 may include one or more general processors, digital signal processors, application specific integrated circuits, field programmable gate arrays, digital circuits, optical circuits, analog circuits, combinations thereof, or other now known or later-developed devices for analyzing and processing data. The processor 1307 may implement the set of instructions 1302 or other software program, such as manually-programmed or computer-generated code for implementing logical functions. The logical function or any system element described may, among other functions, process and/or convert an analog data source such as an analog electrical, audio, or video signal, or a combination thereof, to a digital data source for audio-visual purposes or other digital processing purposes such as for compatibility with computer processing or networked communication.

The computer system 1300 may include a memory 1305 on a bus 1320 for communicating information. Code operable to cause the computer system to perform any of the acts or operations described herein may be stored in the memory 1305. The memory 1305 may be a random-access memory， read-only memory， programmable memory， hard disk drive or any other type of volatile or non-volatile memory or storage device.

The computer system 1300 may also include a disk, solid-state, or optical drive module 1315. The drive module 1315 may include a non-transitory or tangible computer-readable medium 1340 in which one or more sets of instructions 1302, e.g., software, can be embedded. Further, the instructions 1302 may perform one or more of the operations as described herein. The instructions 1302 may reside completely, or at least partially, within the memory 1305 and/or within the processor 1307 during execution by the computer system 1300. The database or any other databases described above may be stored in the memory 1305 and/or the drive module 1315.

The memory 1305 and the processor 1307 also may include computer-readable media as discussed above. A “computer-readable medium， ” “computer-readable storage medium， ” “machine readable medium， ” “propagated-signal medium， ” and/or “signal-bearing medium” may include any device that includes， stores， communicates， propagates， or transports software for use by or in connection with an instruction executable system， apparatus， or device. The machine-readable medium may selectively be， but not limited to， an electronic， magnetic， optical， electromagnetic， infrared， or semiconductor system， apparatus， device， or propagation medium.

Additionally， the computer system 1300 may include an input device 1325， such as a keyboard or mouse， configured for a user to interact with any of the components of system 1300， including user selections or menu entries of display menus. It may further include a display 1330， such as a liquid crystal display (LCD) ， a cathode ray tube (CRT) ， or any other display suitable for conveying information. The display 1330 may act as an interface for the user to see the functioning of the processor 1307， or specifically as an interface with the software stored in the memory 1305 or the drive module 1315.

The computer system 1300 may include a communication interface 1336 that enables communications via the communications network 125. The network 125 may include wired networks, wireless networks, or combinations thereof. The communication interface 1336 may enable communications via any number of communication standards, such as Ethernet AVB, 802.11, 802.13, 802.20, WiMax, or other communication standards.

Accordingly， the system may be realized in hardware， software， or a combination of hardware and software. The system may be realized in a centralized fashion in at least one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computer system with a computer program that， when being loaded and executed， controls the computer system such that it carries out the methods described herein. Such a programmed computer may be considered a special-purpose computer.

As described herein， any modules or processing boxes are defined to include software， hardware or some combination thereof executable by the processor 1307. Software modules may include instructions stored in the memory 1305， or other memory device， that are executable by the processor 1307 or other processors. Hardware modules may include various devices， components， circuits， gates， circuit boards， and the like that are executable， directed， and/or controlled for performance by the processor 1307.

The system may also be embedded in a computer program product， which includes all the features enabling the implementation of the operations described herein and which， when loaded in a computer system， is able to carry out these operations. Computer program in the present context means any expression， in any language， code or notation， of a set of instructions intended to cause a system having an information processing capability to perform a particular function， either directly or after either or both of the following： a) conversion to another language， code or notation； b) reproduction in a different material form.

So far, the present disclosure has been described with reference to the accompanying drawings through particular preferred embodiments. However, it should be noted that the present disclosure is not limited to the particular embodiments illustrated and provided, and various modifications may be made within the scope of the present disclosure.

Further, the embodiments of the present disclosure can be implemented in software, hardware or a combination thereof. The hardware part can be implemented by special logic; the software part can be stored in a memory and executed by a proper instruction execution system such as a microprocessor or specially designed hardware. Those of ordinary skill in the art may appreciate that the above method and system can be implemented with computer-executable instructions and/or control codes contained in the processor, for example, such codes provided on a bearer medium such as a magnetic disk, CD, or DVD-ROM, or a programmable memory such as a read-only memory (firmware), or a data bearer such as an optical or electronic signal bearer. The apparatus and its components in the present embodiments may be implemented by hardware circuitry, for example, a very large scale integrated circuit or gate array, a semiconductor such as a logic chip or transistor, or a programmable hardware device such as a field-programmable gate array or a programmable logic device, or implemented by software executed by various kinds of processors, or implemented by a combination of the above hardware circuitry and software, for example, by firmware.

While various embodiments of the disclosure have been described， it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the disclosure. Accordingly， the disclosure is not to be restricted except in light of the attached claims and their equivalents.

Claims

A system for data processing in a power system， comprising：

at least one processor； and

at least one memory storing computer executable instructions，

wherein the at least one memory and the computer executable instructions are configured to， with the at least one processor， cause the system to：

group data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith， which are determined partially based on a data model for the power system defining a hierarchy of power grid operation；

send the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith； and

process the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.
The system according to Claim 1， wherein the grouping data in the data stream comprises

determining the power system resources respectively associated with the data in the data stream based on measurement point identification information contained within the data in the data stream and correlation between power system resources and measurement point identification information defined in the data model； and

determining the power system resource groups associated with the data in the stream according to the determined power system resources and the hierarchy of the power grid operation defined in the data model.
The system according to Claim 1 or 2， wherein the sending the data in the data stream is dependent on current buffered message size and current waiting time.
The system according to any of Claims 1 to 3， wherein the data in the data stream are sent when the current buffered message size reaches a size threshold considering current network load conditions.
The system according to any of Claims 1 to 4， wherein the data in the data stream are sent when the current waiting time is longer than a time threshold considering current network load conditions.
The system according to any of Claims 1 to 5， wherein the data in the data stream are cached in the one or more data processing nodes for processing and the data in the data stream are further stored in a database， wherein corresponding data are loaded from the database to a cache of a data processing node in case of fault recovery or initiating of the data processing node.
The system according to any of Claims 1 to 6， wherein the data in the data stream are stored in tables by year and a name for a table comprises information on the year， and wherein a key for each of the data is formed by an identification of a power system resource associated therewith and measurement time in a form of time shift in millisecond within the year of the data time.
The system according to Claim 7， wherein the data in the data stream are split into one or more segments according to power system resources associated therewith and the one or more segments are parallelly and distributively written to multiple regions of the database in a chronological order based on the keys for the data.
The system according to Claim 7 or 8， wherein the data in the data stream are stored in the data base in formats that are respectively suitable for their own data types and suitable for data processing.
The system according to any of Claims 1 to 9， wherein the processing the data in the data stream is based on a distributive streaming computing architecture and the data storage is based on a distributive database.
A method for data processing in a power system， comprising：

grouping data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith， which are determined partially based on a data model for the power system defining a hierarchy of power grid operation；

sending the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith； and

processing the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.
The method according to Claim 11， wherein the grouping data in the data stream comprises

determining the power system resources respectively associated with the data in the data stream based on measurement point identification information contained within the data in the data stream and correlation between power system resources and measurement point identification information defined in the data model； and

determining the power system resource groups associated with the data in the stream according to the determined power system resources and the hierarchy of the power grid operation defined in the data model.
The method according to Claim 11 or 12， wherein the sending the data in the data stream is dependent on current buffered message size and current waiting time.
The method according to any of Claims 11 to 13， wherein the data in the data stream are sent when the current buffered message size reaches a size threshold considering current network load conditions.
The method according to any of Claims 11 to 14， wherein the data in the data stream are sent when the current waiting time is longer than a time threshold considering current network load conditions.
The method according to any of Claims 11 to 15, wherein the data in the data stream are cached in the one or more data processing nodes for processing and the data in the data stream are further stored in a database, wherein corresponding data are loaded from the database to a cache of a data processing node in case of fault recovery or initiating of the data processing node.
The method according to any of Claims 11 to 16， wherein the data in the data stream are stored in tables by year and a name for a table comprises information on the year， and wherein a key for each of the data is formed by an identification of a power system resource associated therewith and measurement time in a form of time shift in millisecond within the year of the data time.
The method according to Claim 17， wherein the data in the data stream are split into one or more segments according to power system resources associated therewith and the one or more segments are parallelly and distributively written to multiple regions of the database in a chronological order based on the keys for the data.
The method according to Claim 17 or 18， wherein the data in the data stream are stored in the database in formats that are respectively suitable for their own data types and suitable for data processing.
The method according to any of Claims 11 to 19， wherein the processing the data in the data stream is based on a distributed streaming computing architecture and the data storage is based on a distributed database.
An apparatus for data processing in a power system， the apparatus comprising：

a data grouping module configured to group data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith， which are determined partially based on a data model for the power system defining a hierarchy of power grid operation；

a data sending module configured to send the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith； and

a data processing module configured to process the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.
An apparatus for data processing in a power system， the apparatus comprising：

means for grouping data in a data stream at one or more data receiving nodes based on power system resource groups associated therewith， which are determined partially based on a data model for the power system defining a hierarchy of power grid operation；

means for sending the data in the data stream to one or more data processing nodes based on the power system resource groups associated therewith； and

means for processing the data in the data stream at the one or more data processing nodes based on the power system resource groups associated therewith.
A tangible computer-readable medium having a plurality of instructions executable by a processor to perform data processing in a power system， the tangible computer-readable medium comprises instructions configured to perform steps of the method according to any one of Claims 11 to 20.