CN114691654A

CN114691654A - Data processing method and data processing system in energy Internet

Info

Publication number: CN114691654A
Application number: CN202011576479.5A
Authority: CN
Inventors: 郭健
Original assignee: Tsinghua University; Toyota Motor Corp
Current assignee: Tsinghua University; Toyota Motor Corp
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2022-07-01

Abstract

The disclosure relates to a data processing method and a data processing system in an energy internet. The method may receive data from a plurality of energy hub stations. At least one processor may be configured to divide the received data into static data and dynamic data, divide the static data into various data including facility-related data, affiliation-related data, and transaction configuration-related data of the energy hub station, and divide the dynamic data into various data including external environment-related data, internal operation-related data, and transaction dynamic data of the energy hub station. For each of the various data, an accuracy-related parameter, an integrity-related parameter, a consistency-related parameter, and a timeliness-related parameter can be determined, and these parameters are combined to determine a quality evaluation parameter for that data. The data can thus undergo targeted and efficient quantitative quality evaluation for the main static and dynamic application scenarios, and can be corrected, classified, and stored, which improves the overall quality and reliability of the data.

Description

Data processing method and data processing system in energy Internet

Technical Field

The present disclosure relates to an energy internet, and more particularly, to a method for processing multi-station fusion data in an energy internet.

Background

Operation management of smart cities requires energy and information support from an integrated energy and information infrastructure, and multi-station fusion is an important way of realizing that integration. Multi-station fusion is characterized by unified planning, unified construction, and unified operation of different business applications, such as an energy station, an energy storage station, and a data station, and is a main part of smart city construction. Multi-station fusion organically combines distributed data centers with various energy facilities; it can jointly manage and optimize the efficiency of the energy system and the operating efficiency of the data center, process and control data in real time, and quickly form a system optimization control strategy. At the same time, it provides data center services and creates a regional coordination control center for 'energy flow + information flow + value flow'. Finally, the intelligent operation level of the energy information station is improved through big data analysis and new-generation artificial intelligence technology.

Renewable new energy sources such as wind, solar, and hydro power are gradually reducing human dependence on fossil energy. The European Union, the United States, and China have proposed ambitious goals of renewable new energy accounting for 100%, 80%, and 50%-70% of the energy supply structure, respectively, by 2050, all of which promote the large-scale deployment of distributed new energy stations on the user side. The Internet is built on various communication connection modes and adopts a hierarchical design that hides the complex underlying networking protocols; it makes acquiring and using data much more convenient, has changed conventional means of communication and information exchange, and has reshaped the production and operation modes of many traditional industries. These technical achievements all help the energy industry reach its structural adjustment goals by the middle of this century. Combining Internet information technology with renewable new energy power generation technology is an important component of the energy internet: on the one hand it makes clean, low-carbon energy achievable, and on the other hand it improves the efficiency of energy utilization and the fairness of trading, making it a necessary path toward the energy system of the future smart city.

Multi-station fusion extends, expands, and integrates three stations (an energy station, an energy storage station, and a data station) into a regional hub station for 'energy flow + information flow + value flow', which is also called an energy hub station. During operation of an energy hub station, a large amount of data is collected, and its quality is uneven, which adversely affects the stable operation of the energy internet system. Energy internet systems based on information-energy coupling are especially sensitive to fluctuations in data quality because of their dependence on data. At present, there is no effective method for processing data from energy hub stations, and the quality of the fused data from the energy hub stations is difficult to monitor and ensure; once that data has problems, the safe operation of the whole energy internet and the fairness of supply-and-demand transactions can be affected.

Disclosure of Invention

The present disclosure is provided to solve the above-mentioned problems occurring in the prior art.

There is a need for a data processing method and a data processing system in the energy internet, which can perform targeted and efficient quantitative quality assessment on data from an energy hub station for main application scenarios of various aspects of static and dynamic states, and modify and store the data in a classified manner according to the quantitative quality assessment, so as to improve the overall quality and reliability of the data in the energy internet.

According to a first aspect of the present disclosure, a data processing method in an energy internet is provided. The data processing method may include receiving data from a plurality of energy hub stations. The method may further include dividing, with at least one processor, the received data into static data and dynamic data, further dividing the static data into various data including facility-related data, affiliation-related data, and transaction configuration-related data of the energy hub station, and further dividing the dynamic data into various data including external environment-related data, internal operation-related data, and transaction dynamic data of the energy hub station. The at least one processor may be further configured to determine an accuracy-related parameter, an integrity-related parameter, a consistency-related parameter, and a timeliness-related parameter for the various data, respectively, and synthesize the accuracy-related parameter, the integrity-related parameter, the consistency-related parameter, and the timeliness-related parameter to determine a quality evaluation parameter for the various data.

According to a second aspect of the present disclosure, a data processing system in an energy internet is provided. The data processing system may include an interface and at least one processor. The interface is configured to receive data from a plurality of energy hub stations. The at least one processor is configured to perform the following steps. The received data may be divided into static data and dynamic data, the static data is further divided into various data including facility-related data, affiliation-related data, and transaction configuration-related data of the energy hub station, and the dynamic data is further divided into various data including external environment-related data, internal operation-related data, and transaction dynamic data of the energy hub station. For the various data, an accuracy-related parameter, an integrity-related parameter, a consistency-related parameter, and a timeliness-related parameter can be respectively determined, and these four parameters are combined to determine a quality evaluation parameter of the various data.

By using the data processing method and the data processing system in the energy internet according to the embodiments of the present disclosure, the data from the energy hub station can be subjected to targeted and efficient quantitative quality assessment for the main application scenarios in various aspects of static and dynamic states, and the data can be corrected and classified for storage, so that the overall quality and the reliability of the data in the energy internet can be improved.

Drawings

In the drawings, which are not necessarily drawn to scale, like reference numerals may describe similar components in different views. Like reference numerals having letter suffixes or different letter suffixes may represent different instances of similar components. The drawings illustrate various embodiments generally by way of example and not by way of limitation, and together with the description and claims serve to explain the disclosed embodiments. Such embodiments are illustrative, and are not intended to be exhaustive or exclusive embodiments of the present apparatus or method.

Fig. 1 shows a schematic diagram of a data processing method and a data processing system in an energy internet according to an embodiment of the present disclosure;

Fig. 2 shows a flow chart of a data processing method in an energy internet according to an embodiment of the present disclosure;

Fig. 3 shows an exemplary schematic diagram of a sub-flow for pre-classifying data from an energy hub station in a data processing method in an energy internet according to an embodiment of the present disclosure; and

Fig. 4 shows a flowchart of a data processing method in an energy internet according to an embodiment of the present disclosure.

Detailed Description

The following detailed description is provided to enable those skilled in the art to better understand the technical solutions of the present disclosure, with reference to the accompanying drawings and specific embodiments. Embodiments of the disclosure are described in further detail below with reference to the figures and the detailed description, but the disclosure is not limited thereto. The terms "first," "second," and "third" as used in this disclosure are intended only to distinguish between corresponding features, do not denote a need for such ordering, and do not necessarily denote only the singular. The execution order of the processing steps in this document is merely an example, and the execution order of the steps may be appropriately adjusted or separate steps may be integrally executed as long as the logical relationship of the steps is not affected.

Fig. 1 shows a schematic diagram of a data processing method and a data processing system in an energy internet according to an embodiment of the present disclosure. As shown in Fig. 1, the energy internet includes energy hub station 1 100, energy hub station 2 100, ..., and energy hub station n 100 (n is a natural number). Each energy hub station 100 generates and collects a large amount of data during operation; this data fuses information, energy, and value aspects and is therefore also called fused data. The data from the various energy hub stations 100 may be processed and then stored in distributed data storage (not shown; such as, but not limited to, a distributed database).

The data processing method in the energy internet is exemplarily described below with reference to fig. 1 and 2. Data from a plurality of energy hub stations 100 may be received at step 201 and the fused data fed to at least one processor 101 for subsequent processing. The at least one processor 101 may be one or more processors 101 of various configurations as long as it can process and manage data from the plurality of energy hub stations 100. In some embodiments, the at least one processor 101 is a distributed processing system, such as but not limited to a cloud processor, which can process (e.g., but not limited to pre-sort, quality assessment, and correction, etc.) the received data from the plurality of energy hub stations 100 and then allocate the processed data for storage in a distributed data store in the energy internet.

As shown in fig. 1, the data processing system may include the at least one processor 101 and an interface (not shown). The interface may be configured to enable the at least one processor 101 to receive data from a plurality of energy hub stations 100 for the following processing. In some embodiments, various communication interfaces may be provided for each energy hub station 100 so that a distributed data processing system may obtain data from each energy hub station 100 through the communication interfaces via various networks. These networks may employ any of wired public networks, wireless public networks, private networks, and the like. In some embodiments, the communication protocol may be customized for communication between each energy hub station 100 and the data processing system or a general purpose communication protocol may be employed.

At step 202, the received data may be divided into static data 102 and dynamic data 103 using the at least one processor 101. The dynamic data 103 may represent data that changes over time, and the static data 102 may represent data that remains substantially stable over time. Dividing the data into static data 102 and dynamic data 103 can improve both the accuracy of data feature extraction and the processing efficiency. For example, the correction based on the sample characteristic curve, described in detail below, is better suited to the dynamic data 103 (i.e., time-series data) than to the static data 102. Applying such a correction to the static data 102 would be time-consuming (because the static data 102 may exhibit a sample characteristic curve only over a long period), might even introduce inappropriate corrections of reasonable deviations, and would waste computing resources of the overall system. Generally, the static data 102 is more accurate, so a higher accuracy-related parameter 104a can be assigned to it in advance, and the correction based on the sample characteristic curve is not performed on it by default. By separating out the dynamic data 103 and applying the sample-characteristic-curve-based correction specifically to it, computing resources are used more efficiently and inappropriate corrections are avoided.

Further, in step 203, the static data 102 can be divided into subclasses including facility-related data 102a, affiliation-related data 102b, and transaction configuration-related data 102c of the energy hub station 100, and the dynamic data 103 can be divided into subclasses including external environment-related data 103a, internal operation-related data 103b, and transaction dynamic data 103c of the energy hub station 100.

The static data 102 may represent basic physical information of the energy hub station (e.g., a new energy plant station therein) that is stable over a longer period, and may primarily include facility-related data 102a, affiliation-related data 102b, and transaction configuration-related data 102c. For example, the facility-related data 102a may include model parameter data for components such as photovoltaic modules, wind turbine modules, inverters, and storage batteries, and configuration parameters such as installed capacity. For example, the affiliation-related data 102b may represent the affiliation of the energy hub station (e.g., the new energy plant station therein) at the geographic and management levels, including but not limited to the station name, location, area, line-variation relationship, power generation group, grid-connected voltage level, consumption mode, tuning/non-tuning management mode, and the like. For example, the transaction configuration-related data 102c may represent the transaction-related configuration of the energy hub station (e.g., a new energy plant station therein), including but not limited to medium- and long-term or day-ahead power generation plans, transaction mode, transaction type, transaction manner, transaction assessment and evaluation manner, and the like.

The dynamic data 103 may represent dynamic operational monitoring data generated during the operation and transactions of a new energy station, and consists primarily of time-series data. In some embodiments, the external environment-related data 103a, the internal operation-related data 103b, and the transaction dynamic data 103c of the energy hub station 100 may constitute the main components of the dynamic data 103. For example, the external environment-related data 103a may include, but is not limited to, wind speed, light intensity, ambient temperature, humidity, wind power, hours of sunshine, and the like. For example, the internal operation-related data 103b may include, but is not limited to, operation state sequence data such as gear rotation speed, component temperature, oil pressure, output power, output voltage, and output current; control state data such as action commands, alarm codes, and switching; and data recorded in emergency and fault states. For example, the transaction dynamic data 103c may include, but is not limited to, market transaction sequence data such as node electricity prices, traded electricity quantities, transaction prices, transaction times, and credit valuations, as well as new-energy-related equipment prices, manufacturer stock prices, futures, options, and the like.

The static data 102 and the dynamic data 103 are subdivided into the above subclasses in view of the various business application analyses that the fused data in the energy internet serves. In this way, most of the rich variety of data from the energy hub stations 100 can be assigned to a subclass, and a unified data structure (such as, but not limited to, an instance table structure or a business table) can be adopted for the data within the same subclass to facilitate subsequent application analysis and correction (if necessary).
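The two-level division described above can be sketched as a small Python data model. This is only an illustration: the class names, the `Record` fields, and the helper functions are hypothetical and are not taken from the disclosure, which only names the two categories and the six subclasses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    STATIC = "static"    # static data 102: substantially stable over time
    DYNAMIC = "dynamic"  # dynamic data 103: time-series data


class Subclass(Enum):
    FACILITY = "facility"                          # 102a
    AFFILIATION = "affiliation"                    # 102b
    TRANSACTION_CONFIG = "transaction_config"      # 102c
    EXTERNAL_ENVIRONMENT = "external_environment"  # 103a
    INTERNAL_OPERATION = "internal_operation"      # 103b
    TRANSACTION_DYNAMIC = "transaction_dynamic"    # 103c


STATIC_SUBCLASSES = {
    Subclass.FACILITY,
    Subclass.AFFILIATION,
    Subclass.TRANSACTION_CONFIG,
}


def category_of(subclass: Subclass) -> Category:
    """Return the top-level category a subclass belongs to."""
    return Category.STATIC if subclass in STATIC_SUBCLASSES else Category.DYNAMIC


@dataclass
class Record:
    """Hypothetical unified record structure shared by all subclasses."""
    station_id: str
    subclass: Subclass
    field: str
    value: object
    timestamp: Optional[float] = None  # only meaningful for dynamic data


def group_by_subclass(records):
    """Group records so that each subclass can be evaluated independently."""
    groups = {s: [] for s in Subclass}
    for record in records:
        groups[record.subclass].append(record)
    return groups
```

Grouping by subclass first means that each of the six groups can later receive its own quality parameters.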

At step 204, an accuracy-related parameter 104a, an integrity-related parameter 104b, a consistency-related parameter 104c, and a timeliness-related parameter 104d may be determined separately (independently for each subclass), using the at least one processor 101, for the various data, such as, but not limited to, facility-related data 102a, affiliation-related data 102b, transaction configuration-related data 102c, external environment-related data 103a, internal operation-related data 103b, and transaction dynamic data 103c. Note that in Fig. 1 reference numerals are provided only for the accuracy-related parameter 104a, the integrity-related parameter 104b, the consistency-related parameter 104c, and the timeliness-related parameter 104d of the facility-related data 102a; the reference numerals for the corresponding parameters of the other subclasses are omitted.

By evaluating the various data at multiple levels along the four dimensions of accuracy, integrity, consistency, and timeliness, abnormal and unreasonable data in the fused data of the energy internet can be analyzed comprehensively and quantitatively. The accuracy-related parameter 104a, the integrity-related parameter 104b, the consistency-related parameter 104c, and the timeliness-related parameter 104d are exemplified below. This four-dimensional evaluation is particularly effective against the various causes of abnormal and unreasonable fused data in the energy internet, for example data loss, a record format that does not match the structural attributes of the business table, data that is too old or outdated, and a few distorted points that deviate from normal values, so that a high detection rate of abnormal and unreasonable data can be achieved.

In some embodiments, an accuracy-related parameter 104a, an integrity-related parameter 104b, a consistency-related parameter 104c, and a timeliness-related parameter 104d, as defined below, may be employed.

The accuracy-related parameter 104a may be determined based on a ratio of the number of data points deviating from the sample characteristic curve to the number of data points of the sample curve, such that the higher the ratio, the lower the accuracy-related parameter. The sample profile may be determined using various clustering algorithms based on historical data. In some embodiments, the clustering algorithm may employ a supervised or unsupervised clustering algorithm. In some embodiments, unsupervised clustering algorithms such as k-means clustering algorithms, graph-based clustering algorithms, etc. may be employed to address the lack of ground truth for data in energy internets.

In some embodiments, the accuracy analysis may be a statistical analysis of the true level of the data records, and the accuracy-related parameter Q1 is calculated according to formula (1) after the sample data characteristic curve is calculated by using a clustering algorithm:

where Σ out of structural curve may represent the total number of data points deviating from the sample characteristic curve, and Σ number of lineData may represent the total number of data points of the sample curve (i.e., the total number of data points for which the accuracy analysis is directed).

The integrity-related parameter 104b may be determined based on a ratio of the amount of missing data to the total amount of data, such that the higher the ratio, the lower the integrity-related parameter. In some embodiments, the integrity-related parameter Q2 may be calculated according to equation (2):

where, Σ Lines of LossData can represent the amount of missing data (or can be obtained by summing records of data missing), and Σ Lines of grossData can represent the total amount of data.

The consistency-related parameter 104c may represent how well defined attributes (such as, but not limited to, format) of the load data record match defined attributes of the load table (e.g., service table) structure. In some embodiments, the consistency analysis is a matching analysis of the attribute definitions (e.g., without limitation, formats) of the load data records to the attribute definitions of the load table structure. For example, the consistency-related parameter Q3 may be calculated according to equation (3):

where Σ column of load Data file may represent the attribute definitions of the actual load data records, and Σ column of load attributes may represent the attribute definitions of the load table.

The timeliness-related parameter 104d may be determined based on a ratio of the update data amount to the total data amount. In some embodiments, the timeliness analysis may be a statistical analysis of data record updates. For example, the timeliness-related parameter Q4 may be calculated according to formula (4):

where Σ lines of updateData can represent the amount of update data, and Σ lines of grossData can represent the total amount of data.
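A minimal Python sketch of the four parameters follows. The text states only which ratios are involved and in which direction they act, so the exact functional forms used here are assumptions: Q1 and Q2 are taken as one minus the respective ratio, Q3 as the share of table attribute definitions matched by the record, and Q4 as the ratio of updated records to total records. All function and argument names are hypothetical.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Divide, rejecting an empty data set instead of raising ZeroDivisionError."""
    if denominator <= 0:
        raise ValueError("total count must be positive")
    return numerator / denominator


def accuracy(deviating_points: int, total_points: int) -> float:
    # Assumed form: the more points deviate from the sample
    # characteristic curve, the lower Q1.
    return 1.0 - _ratio(deviating_points, total_points)


def integrity(missing_records: int, total_records: int) -> float:
    # Assumed form: the more records are missing, the lower Q2.
    return 1.0 - _ratio(missing_records, total_records)


def consistency(record_attrs: dict, table_attrs: dict) -> float:
    # Assumed form: share of table attribute definitions (here a
    # name -> format mapping) that the record matches exactly.
    matched = sum(
        1 for name, definition in table_attrs.items()
        if record_attrs.get(name) == definition
    )
    return _ratio(matched, len(table_attrs))


def timeliness(updated_records: int, total_records: int) -> float:
    # Ratio of the update data amount to the total data amount.
    return _ratio(updated_records, total_records)


# Hypothetical attribute definitions of a business table and of one record.
table = {"ts": "datetime", "power_kw": "float", "voltage_v": "float", "state": "int"}
record = {"ts": "datetime", "power_kw": "float", "voltage_v": "str", "state": "int"}

q1 = accuracy(deviating_points=25, total_points=100)
q2 = integrity(missing_records=0, total_records=100)
q3 = consistency(record, table)
q4 = timeliness(updated_records=50, total_records=100)
```

Under these assumed forms every parameter lies between 0 and 1, with 1 being the best value, which keeps the four parameters comparable when they are later combined.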

Next, in step 205, the accuracy-related parameter 104a, the integrity-related parameter 104b, the consistency-related parameter 104c, and the timeliness-related parameter 104d may be combined to determine a quality evaluation parameter for each of the various data. In this way, the quality evaluation parameters of the various data can be made robust. In some embodiments, a corresponding weight may be applied to each of the accuracy-related parameter, the integrity-related parameter, the consistency-related parameter, and the timeliness-related parameter, so that the four evaluation indices are considered comprehensively but with different emphasis. In some embodiments, in view of the distribution of the causes of errors in data transmitted in the energy internet, the weight of the accuracy-related parameter, the weight of the integrity-related parameter, the weight of the consistency-related parameter, and the weight of the timeliness-related parameter may decrease in that order.

For example, the quality assessment parameter Q for various data may be determined by way of a weighted sum of Q1, Q2, Q3, and Q4 according to equation (5):

$$Q = w_1 Q_1 + w_2 Q_2 + w_3 Q_3 + w_4 Q_4 \qquad \text{formula (5)}$$

Wherein w1, w2, w3 and w4 are weights of Q1, Q2, Q3 and Q4, respectively.
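The weighted sum can be sketched in a few lines of Python. The function name, the requirement that the weights sum to 1, and the example weight values are assumptions made for illustration; the example weights merely follow the decreasing order described for some embodiments.

```python
def quality_score(q, w, tol=1e-9):
    """Weighted sum Q = w1*Q1 + w2*Q2 + w3*Q3 + w4*Q4."""
    if len(q) != 4 or len(w) != 4:
        raise ValueError("expected four parameters and four weights")
    if abs(sum(w) - 1.0) > tol:
        # Assumption of this sketch: the weights are normalized.
        raise ValueError("weights are assumed to sum to 1")
    return sum(wi * qi for wi, qi in zip(w, q))


# Hypothetical weights, decreasing from accuracy to timeliness.
example_weights = [0.4, 0.3, 0.2, 0.1]
example_q = [0.75, 1.0, 0.5, 0.5]  # Q1..Q4 for one data subclass
score = quality_score(example_q, example_weights)
```

With normalized weights and parameters between 0 and 1, the resulting score also stays between 0 and 1.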

In some embodiments, the respective weights may be determined by forming a judgment matrix and deriving the weights from it. In particular, the relative importance of accuracy, integrity, consistency, and timeliness may be determined and represented as a matrix to form the judgment matrix. For example, the relative importance of these factors may be determined based on the proportion of data errors or anomalies in the historical data that each factor caused. The eigenvector of the judgment matrix corresponding to its maximum eigenvalue can be determined and normalized to obtain the relative weight values w1, w2, w3 and w4 of the factors (for example, the accuracy-related parameter 104a, the integrity-related parameter 104b, the consistency-related parameter 104c, and the timeliness-related parameter 104d). Whether the consistency condition is satisfied can then be verified, and corresponding adjustments made, to obtain weights w1, w2, w3 and w4 that satisfy the consistency condition.
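The eigenvector-based weighting with a consistency check resembles what is commonly known as the analytic hierarchy process (AHP), although the text does not name it. The following Python sketch, under that assumption, estimates the principal eigenvector by power iteration and computes a consistency ratio from Saaty's random index values. The pairwise comparison values in the example matrix and the conventional acceptance threshold of 0.1 are illustrative assumptions, not values from the disclosure.

```python
def ahp_weights(matrix, iterations=100):
    """Return (normalized principal eigenvector, estimated max eigenvalue)."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # One power-iteration step followed by normalization to sum 1.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


# Saaty's random consistency index for small matrix orders.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgment matrix: entry [i][j] says how much more important
# factor i is than factor j (order: accuracy, integrity, consistency,
# timeliness). The matrix is reciprocal: [j][i] == 1 / [i][j].
judgment = [
    [1.0, 2.0, 3.0, 4.0],
    [1 / 2, 1.0, 2.0, 3.0],
    [1 / 3, 1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 3, 1 / 2, 1.0],
]

weights, lam = ahp_weights(judgment)
cr = consistency_ratio(lam, len(judgment))
acceptable = cr < 0.1  # conventional threshold, assumed here
```

If the ratio exceeds the threshold, the pairwise judgments would be revised and the weights recomputed, which corresponds to the adjustment step mentioned above.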

In some embodiments, in order to calculate the accuracy-related parameter 104a, especially the accuracy-related parameters 104a of the various subclasses of dynamic data, a sample characteristic curve of the historical data needs to be calculated. As an example, the following steps may be taken to calculate it via a k-means clustering algorithm.

The calculation sub-process may begin with reading a sample data set $M = \{x_1, x_2, x_3, \ldots, x_n\}$ from a database, where n is a natural number and $x_i = \{x_{i1}, x_{i2}, \ldots, x_{it}\}$, and calculating the Euclidean distance between sample data as the distance metric $D(x_i, x_j)$. Other distance metrics may also be used; the Euclidean distance is used here as an example. $x_i$ is the sample data sequence of the i-th class, and t is the number of samples in the i-th class. The number k of desired clusters (i.e., classes) may be preset.

k objects may be selected from M as the centroids of the initial clusters, and each object is assigned to the nearest cluster centroid according to the nearest-neighbor classification rule.

A batch-wise modification method may be employed to modify the partition: the cluster centers and the partition are modified only after all objects have been entered. In particular, the steps of the batch-wise modification method may be as follows.

Select the number of classes k and select k values from all sample data as initial center points, where each center point belongs to an independent class, and the set of the classes in which the center points are located can be expressed as $S = \{s_1, s_2, \ldots, s_k\}$.

Classify all sample data into the class of the nearest center point, recalculate the center point of each class, and replace the old center point with the new center point. For example, if $D(x_i, s_j)$ is the minimum value, then it can be determined that $x_i \in s_j$; the new center point may be calculated according to formula (6).

where $C_k$ represents the new center point of the class in which the k-th center point is located.

In some embodiments, the classification may be stopped if the center points before and after the update differ only slightly (e.g., by less than a threshold); otherwise the update iteration may continue. In some embodiments, a termination condition may also be preset, and the update iteration may be ended once that condition is satisfied. For example, a square error criterion function may be used as the preset condition, so that the k clusters that minimize the square error criterion function are finally obtained. Through this k-means clustering process, a representative load characteristic curve can be obtained from the historical load data. For example, but not by way of limitation, a daily load characteristic curve may be extracted in conjunction with a validity index criterion.
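The batch-wise clustering steps above can be sketched in plain Python as follows. Several details are assumptions of this sketch and not specified in the text: the first k samples are used as initial center points, the new center point of a class is taken as the arithmetic mean of its members (the standard k-means update), and a data point counts as deviating when its distance to the nearest characteristic curve exceeds a hypothetical `tolerance`.

```python
import math


def euclidean(a, b):
    """Euclidean distance D(a, b) between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(samples, k, threshold=1e-6, max_iter=100):
    """Batch k-means: reassign all samples, then update all center points."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    centers = [list(s) for s in samples[:k]]  # assumed initialization
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda c: euclidean(s, centers[c]))
            clusters[nearest].append(s)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centers.append(centers[j])  # keep center of an empty class
        shift = max(euclidean(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < threshold:  # centers barely moved: stop iterating
            break
    return centers, clusters


def count_deviating(samples, centers, tolerance):
    """Count samples farther than `tolerance` from every center curve."""
    return sum(
        1 for s in samples
        if min(euclidean(s, c) for c in centers) > tolerance
    )
```

The returned center points play the role of representative characteristic curves, and the deviation count can serve as the numerator of the ratio on which the accuracy-related parameter is based.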

Fig. 3 illustrates an exemplary schematic diagram of a sub-flow for pre-classifying data from an energy hub station in a data processing method in an energy internet according to an embodiment of the present disclosure; the sub-flow may be performed before the accuracy-related parameter, the integrity-related parameter, the consistency-related parameter, and the timeliness-related parameter of the various data are determined. As shown in Fig. 3, domain analysis 301 may be performed on the data. The domain analysis 301 mainly analyzes the source of the data from the energy hub station, determining whether the data originates from in-station operation, from the external environment, from a preset of a facility (e.g., including the site and equipment), from a preset of a transaction, and so on. Through the domain analysis 301, the respective subclasses of the static data 102 and the dynamic data 103 can be identified accordingly. For example, data originating from in-station operation can be identified as internal operation-related data 301a; this may be power station production operation data, which supports the production decisions of the upper-level scheduling management organization and is subject to that organization's authority management. For example, data originating from the external environment can be identified as external environment-related data 301b; this may be weather and environment data closely related to production operation, which provides high-precision weather data for the upper-level scheduling plan. For example, preset data originating from a facility can be identified as facility-related data 301c, such as model data, which may include both site models and equipment models and provides accurate parameters for site-level and area-level online analysis applications. For example, preset data originating from a transaction can be identified as transaction configuration-related data 301d. Subclassing can thus be performed efficiently by the domain analysis 301 in a qualitative manner, which facilitates subsequent storage and processing of the data (including but not limited to upper-level application analysis such as evaluation and correction).
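The domain analysis can be pictured as a lookup from the origin of a record to its subclass. In the following minimal sketch, the `origin` field, its string values, and the function name are hypothetical; the text does not say how the origin of a record is actually encoded or detected.

```python
# Hypothetical origin tags mapped to the subclasses named in the text.
ORIGIN_TO_SUBCLASS = {
    "in_station_operation": "internal operation-related data",     # 301a
    "external_environment": "external environment-related data",   # 301b
    "facility_preset": "facility-related data",                    # 301c
    "transaction_preset": "transaction configuration-related data",  # 301d
}


def domain_analysis(record: dict) -> str:
    """Identify the subclass of a record from its (assumed) origin tag."""
    origin = record.get("origin")
    if origin not in ORIGIN_TO_SUBCLASS:
        raise ValueError(f"unknown origin: {origin!r}")
    return ORIGIN_TO_SUBCLASS[origin]


subclass = domain_analysis({"origin": "external_environment", "wind_speed": 6.2})
```

Rejecting unknown origins explicitly, rather than assigning a default subclass, keeps unclassifiable records visible for later handling.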

The data may also be subjected to a structure type analysis 302 to divide the data into structured data 302a, unstructured data 302b, and semi-structured data 302c. Specifically, the corresponding data can be read in different ways according to its structure type to form an instance library for data quality analysis. After the domain analysis 301 and the structure type analysis 302, instance table structures for the different business application analyses can be formed, so that the data can be stored in different business tables after subsequent processing.

The above division of the structure types and the subclass division based on the data sources are beneficial to the preprocessing of data, including the processes of data cleaning, data integration, normalization processing, storage management and the like, so that timely and accurate data can be provided for the subsequent upper-layer application analysis based on the data.

Specifically, the unstructured data 302b and the semi-structured data 302c may be converted, using the at least one processor 101, into structured data 302a for upload. For example, data extraction and filtering and cleansing operations may be performed on unstructured data 302b that is to be stored in a database, so as to form structured data 302a, and data insertion operations may be performed on the corresponding database. For structured data 302a to be uploaded, at least one of the following steps may be performed.

In some embodiments, the number of updated data records (which affects the numerator of equation (4)) may be updated in the event of an upload or insertion failure.

In some embodiments, for periodically uploaded data, the domain scope and the data type of the data may be determined, and in a case where a record whose upload has high priority is missing, the number of missing data records (which affects the numerator of equation (2)) may be updated.

In some embodiments, for periodically uploaded data, the domain scope and the data type of the data may be determined, and in a case where the data duplicates another data record with the same attributes but with inconsistent content, the data consistency record number (which affects the numerator of equation (3)) may be updated.

In some embodiments, for data uploaded upon a fault event, the domain scope and the data type of the data may be determined, and in a case where a record whose upload has high priority is missing, the number of missing data records (which affects the numerator of equation (2)) may be updated.

In some embodiments, for data uploaded upon a fault event, the domain scope and the data type of the data may be determined, and in a case where the data duplicates another data record with the same attributes but with inconsistent content, the data consistency record number (which affects the numerator of equation (3)) may be updated.
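The bookkeeping in the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `UploadCounters`, the functions `register_upload` and `register_missing`, and the use of a `key` to stand for "the same attributes" are hypothetical, and the text does not say in which direction each count is adjusted. Here a failed upload or insertion is assumed not to count as an updated record.

```python
from dataclasses import dataclass


@dataclass
class UploadCounters:
    total: int = 0          # all records presented for upload
    updated: int = 0        # assumed to feed the numerator of equation (4)
    missing: int = 0        # assumed to feed the numerator of equation (2)
    inconsistent: int = 0   # assumed to feed the numerator of equation (3)


def register_upload(counters, store, key, content, upload_ok=True):
    """Track one structured record; `key` identifies records with the same attributes."""
    counters.total += 1
    if not upload_ok:
        return  # assumption: a failed upload/insert is not counted as updated
    if key in store and store[key] != content:
        counters.inconsistent += 1  # repeated record with conflicting content
    store[key] = content  # assumption: the latest content is kept
    counters.updated += 1


def register_missing(counters, expected_keys, received_keys):
    """Count high-priority records that were expected but never stored."""
    counters.missing += len(set(expected_keys) - set(received_keys))
```

The same two functions can be applied to periodically uploaded data and to data uploaded upon a fault event, since the described checks are identical in both cases.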

Further, associations between the structured data 302a can be identified and integrated. For example, the association relations between the attributes of the structured data can be identified, so that isolated, one-sided data sets oriented to individual business systems are restored into a complete and comprehensive data set, and the static data 102 and the dynamic data 103 are fused in digital-twin correspondence, thereby providing timely and accurate data for subsequent upper-layer application analysis based on the data.

Fig. 4 shows a flowchart of a data processing method in an energy internet according to an embodiment of the present disclosure. As shown in fig. 4, data from the energy hub station may be loaded at step 401. Four levels of analysis 402, namely integrity analysis, consistency analysis, timeliness analysis and accuracy analysis, may be performed at step 402 for each subclass of loaded data to obtain accuracy related parameters, integrity related parameters, consistency related parameters and timeliness related parameters. In step 403, data quality evaluation may be performed, for example, the accuracy related parameter, the integrity related parameter, the consistency related parameter, and the timeliness related parameter may be integrated to determine quality evaluation parameters of various data.
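The integration at step 403 can be illustrated with a short Python sketch that derives weights from a pairwise judgment matrix and forms a weighted sum of the four parameters. The row geometric-mean method used here is a common way of obtaining weights from such a matrix and is an assumption of this example; this passage does not state which method is used. The sketch also assumes that each parameter has been normalised to the range [0, 1] with larger values meaning better quality.

```python
import math


def weights_from_judgment_matrix(matrix):
    """Derive normalised weights from a pairwise comparison matrix.

    matrix[i][j] expresses how much more important criterion i is than j.
    Uses the row geometric-mean method (an assumed choice).
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def quality_score(params, weights):
    """Weighted combination of accuracy, integrity, consistency, timeliness."""
    if len(params) != len(weights):
        raise ValueError("params and weights must have the same length")
    return sum(p * w for p, w in zip(params, weights))
```

With a matrix of all ones, every parameter receives the same weight, so the score reduces to the plain average of the four parameters.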

A data correction process may be performed at step 404; it is preferably, but not exclusively, applied to the various data in the dynamic data. The current data may be loaded first. A sample curve may then be determined from the current data with the at least one processor. The sample curve of the current data may be compared with the corresponding sample characteristic curve of historical data, and an abnormal data point may be determined based on the comparison result. The abnormal data point can be corrected using the segment of the sample characteristic curve that corresponds to the abnormal data point.

In some embodiments, determining an abnormal data point based on the comparison result may further comprise: determining a difference between corresponding data points of the sample curve and the sample characteristic curve; and, in a case where the difference between corresponding data points exceeds a fluctuation threshold, determining the data point in the sample curve to be an abnormal data point.

Specifically, after the characteristic curve is obtained, its smoothness is used to check for abnormal data points in the historical data. Let L_d be the sample curve and L_t be the sample characteristic curve. The value of the i-th point of the sample curve L_d is denoted L_d(i) and is compared with the value L_t(i) at the same point of the daily load characteristic curve, and the fluctuation rate δ(i) of these two points is calculated; see equation (7).

The normal range of the rate of change of the value over a certain historical period can be determined statistically and recorded as [-D, +D]. The rate of change of the i-th point of the sample data is then checked against the normal range [-D, +D] to judge whether that point is an abnormal data point. Once abnormal data is found, it can be corrected promptly by translating the corresponding segment of the characteristic curve onto the detected data. In some embodiments, in a case where the two ends of the respective segment of the characteristic curve cannot be translated exactly onto the two ends of the corresponding segment of the detected data (i.e., a deviation remains), the former may instead be translated to the position at which its average distance to the latter is minimal.
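The range check described above can be sketched in Python as follows. Equation (7) is not reproduced in this text, so the sketch assumes that the fluctuation rate is the relative deviation δ(i) = (L_d(i) − L_t(i)) / L_t(i); this form and the function names are assumptions made for illustration.

```python
def fluctuation_rates(sample, characteristic):
    """Relative deviation of each sample point from the characteristic curve.

    Assumed form: (L_d(i) - L_t(i)) / L_t(i). Characteristic values must be
    non-zero, otherwise a ZeroDivisionError is raised.
    """
    if len(sample) != len(characteristic):
        raise ValueError("curves must have the same number of points")
    return [(d - t) / t for d, t in zip(sample, characteristic)]


def find_abnormal_points(sample, characteristic, d_limit):
    """Return the indices whose fluctuation rate lies outside [-d_limit, +d_limit]."""
    rates = fluctuation_rates(sample, characteristic)
    return [i for i, delta in enumerate(rates) if abs(delta) > d_limit]
```

For example, with a flat characteristic curve of 100 and a limit of 0.1, a sample value of 150 is flagged while values of 98 and 102 are not.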

In some embodiments, correcting the abnormal data point using the segment of the sample characteristic curve that corresponds to the abnormal data point may further comprise correcting the abnormal data point according to the following equation:

where L_r represents the corrected sample curve, L_t represents the sample characteristic curve, L_d represents the sample curve before correction, the m-th to n-th points are the abnormal data points, and i represents the serial number of an abnormal data point.
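The translation-based correction can be illustrated with the following Python sketch. The correction equation itself is not reproduced in this text, so the sketch is an assumption: it shifts the characteristic-curve segment by the mean offset observed at the nearest normal neighbours on either side of the abnormal run, which is one way of placing the segment at a minimal average distance from the detected data. Indices are 0-based and inclusive, and the function name is hypothetical.

```python
def correct_segment(sample, characteristic, m, n):
    """Replace sample[m..n] by the translated characteristic-curve segment.

    The offset is the mean of (L_d - L_t) at the neighbouring normal points
    m-1 and n+1 where they exist (assumed rule). Points outside m..n are
    left unchanged, and the input list is not modified.
    """
    if len(sample) != len(characteristic):
        raise ValueError("curves must have the same number of points")
    if not (0 <= m <= n < len(sample)):
        raise ValueError("invalid abnormal segment")
    anchors = [i for i in (m - 1, n + 1) if 0 <= i < len(sample)]
    if anchors:
        offset = sum(sample[i] - characteristic[i] for i in anchors) / len(anchors)
    else:
        offset = 0.0  # the whole curve is abnormal: fall back to L_t itself
    corrected = list(sample)
    for i in range(m, n + 1):
        corrected[i] = characteristic[i] + offset
    return corrected
```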

Corrections may be made primarily for abnormal data points in the data without altering normal data points, and the corrected data may be stored (step 405) for subsequent application analysis. Therefore, data points with large difference with the characteristic data can be effectively removed, the data quality is effectively guaranteed, the adjusted curve has better similarity and smoothness, and a foundation can be laid for subsequent application analysis.

In some embodiments, the quality of the data may be evaluated both before and after the correction; these evaluations are referred to as pre-evaluation and post-evaluation, respectively. For example, after correcting the abnormal data point, the at least one processor 101 may determine an accuracy-related parameter 104a, an integrity-related parameter 104b, a consistency-related parameter 104c, and a timeliness-related parameter 104d for each kind of data of the static data 102 and the corrected dynamic data 103, respectively, and integrate the accuracy-related parameter 104a, the integrity-related parameter 104b, the consistency-related parameter 104c, and the timeliness-related parameter 104d to determine a quality evaluation parameter of the corrected corresponding data. The quality improvement achieved by the data correction can be verified by comparing the pre-evaluation result with the post-evaluation result. In some embodiments, step 404 may also be performed iteratively. If a certain condition is met, for example, if the difference between the evaluation result before an iteration of the correction step and the evaluation result after it is lower than a certain threshold, further iteration is considered unlikely to improve the data quality significantly; the correction is deemed to have reached the desired effect and may be ended.
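The iterative use of step 404 with pre-evaluation and post-evaluation can be sketched in Python as below. The callables `correct` and `evaluate`, the parameters `min_gain` and `max_rounds`, and the convention that a higher score means better quality are all assumptions of this example.

```python
def iterate_correction(data, correct, evaluate, min_gain=0.01, max_rounds=10):
    """Repeat the correction step until the quality gain falls below min_gain.

    `correct(data)` returns corrected data and `evaluate(data)` returns a
    quality score (higher is assumed to be better). Returns the final data
    and the list of scores observed, starting with the pre-evaluation.
    """
    score = evaluate(data)  # pre-evaluation
    history = [score]
    for _ in range(max_rounds):
        candidate = correct(data)
        new_score = evaluate(candidate)  # post-evaluation
        history.append(new_score)
        gain = new_score - score
        if gain > 0:
            data, score = candidate, new_score  # keep only improvements
        if gain < min_gain:
            break  # further iteration is not expected to help much
    return data, history
```

The `max_rounds` cap is a safeguard added in this sketch so that the loop terminates even if the score keeps improving by small but sufficient amounts.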

Moreover, although exemplary embodiments have been described herein, the scope thereof includes any and all embodiments based on the disclosure having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations, or alterations. The elements of the claims are to be interpreted broadly based on the language employed in the claims and are not limited to the examples described in the present specification or during the prosecution of the application, which examples are to be construed as non-exclusive. It is intended, therefore, that the specification and examples be considered as exemplary only, with the true scope and spirit being indicated by the following claims and their full scope of equivalents.

The order of the various steps in this disclosure is merely exemplary and not limiting. The order of execution of the steps may be adjusted without affecting the implementation of the present disclosure (without destroying the logical relationship between the required steps), and various embodiments obtained after the adjustment still fall within the scope of the present disclosure.

The above description is intended to be illustrative and not restrictive. For example, the above-described examples (or one or more versions thereof) may be used in combination with each other. For example, other embodiments may be used by those of ordinary skill in the art upon reading the above description. In addition, in the foregoing detailed description, various features may be grouped together to streamline the disclosure. This should not be interpreted as an intention that a disclosed feature not claimed is essential to any claim. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the detailed description as examples or embodiments, with each claim standing on its own as a separate embodiment, and it is contemplated that these embodiments may be combined with each other in various combinations or permutations. The scope of the invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims

1. A data processing method in an energy Internet is characterized by comprising the following steps:

receiving data from a plurality of energy hub stations;

dividing, with at least one processor, the received data into static data and dynamic data, further dividing the static data into various data including facility-related data, affiliation-related data, and transaction-configuration-related data of the energy hub station, and further dividing the dynamic data into various data including external environment-related data, internal operation-related data, and transaction dynamic data of the energy hub station;

and respectively determining an accuracy related parameter, an integrity related parameter, a consistency related parameter and a timeliness related parameter for the various data by utilizing the at least one processor, and determining quality evaluation parameters of the various data by integrating the accuracy related parameter, the integrity related parameter, the consistency related parameter and the timeliness related parameter.

2. The data processing method according to claim 1, further comprising, for each of the dynamic data:

loading current data;

determining, with the at least one processor, a sample curve thereof based on the current data;

comparing the sample curve of the current data with the sample characteristic curve of the corresponding historical data;

determining an abnormal data point based on the comparison result; and

correcting the abnormal data point using a corresponding segment of the sample characteristic curve relative to the abnormal data point.

3. The data processing method of claim 2, wherein the sample characteristic curve is determined using cluster analysis based on corresponding historical data.

4. The data processing method of claim 2, wherein determining an abnormal data point based on the comparison result further comprises:

determining a difference between the sample curve and corresponding data points of a sample characteristic curve;

in a case where a difference between corresponding data points exceeds a fluctuation threshold, the data point in the sample curve is determined to be an abnormal data point.

5. The data processing method of any of claims 2-4, wherein correcting the abnormal data point using a corresponding segment of the sample characteristic curve relative to the abnormal data point further comprises correcting the abnormal data point according to the following equation:

where L_r represents the corrected sample curve, L_t represents the sample characteristic curve, L_d represents the sample curve before correction, the m-th to n-th points are the abnormal data points, and i represents the serial number of an abnormal data point.

6. The data processing method of claim 2, further comprising, with the at least one processor, after correcting the abnormal data point:

respectively determining an accuracy related parameter, an integrity related parameter, a consistency related parameter and a timeliness related parameter for each kind of data of the static data and the corrected dynamic data, and determining a quality evaluation parameter of the corrected corresponding data by integrating the accuracy related parameter, the integrity related parameter, the consistency related parameter and the timeliness related parameter.

7. The data processing method of claim 1, wherein the accuracy-related parameter is determined based on a ratio of a number of data points deviating from the sample characteristic curve to a number of data points of the sample curve, the integrity-related parameter is determined based on a ratio of a missing data amount to a total data amount, the consistency-related parameter represents a degree of matching of an attribute definition of the load data record with an attribute definition of the load table, and the timeliness-related parameter is determined based on a ratio of an updated data amount to a total data amount.

8. The data processing method of claim 7, wherein determining quality assessment parameters for various data by integrating the accuracy-related parameter, the integrity-related parameter, the consistency-related parameter, and the timeliness-related parameter further comprises: forming a judgment matrix; determining respective weights of the accuracy related parameter, the integrity related parameter, the consistency related parameter and the timeliness related parameter based on the judgment matrix; and determining the weighted results of the accuracy related parameter, the integrity related parameter, the consistency related parameter and the timeliness related parameter based on respective weights as quality evaluation parameters of various data.

9. The data processing method of claim 1, further comprising, prior to determining the accuracy-related parameter, the integrity-related parameter, the consistency-related parameter, and the timeliness-related parameter for the respective data, with the at least one processor:

analyzing the received data for structure types, wherein the structure types comprise structured data, unstructured data and semi-structured data;

converting the unstructured data into structured data for uploading, and identifying and integrating associations between the structured data.

10. The data processing method of claim 9, further comprising, for the structured data to be uploaded, performing at least one of:

updating the number of updated data records in the event of an upload or insertion failure;

for periodically uploaded data, updating the number of missing data records in a case where a record whose upload has high priority is missing;

for periodically uploaded data, updating the data consistency record number in a case where the data duplicates another data record with the same attributes but with inconsistent content;

for data uploaded upon a fault event, updating the number of missing data records in a case where a record whose upload has high priority is missing; and

for data uploaded upon a fault event, updating the data consistency record number in a case where the data duplicates another data record with the same attributes but with inconsistent content.

11. The data processing method of claim 1, wherein the at least one processor is in a cloud.

12. A data processing system in an energy internet, the data processing system comprising:

an interface configured to receive data from a plurality of energy hub stations;

at least one processor configured to:

divide the received data into static data and dynamic data, further divide the static data into various data including facility-related data, affiliation-related data and transaction configuration-related data of the energy hub station, and further divide the dynamic data into various data including external environment-related data, internal operation-related data and transaction dynamic data of the energy hub station; and

respectively determine an accuracy related parameter, an integrity related parameter, a consistency related parameter and a timeliness related parameter for the various data, and integrate the accuracy related parameter, the integrity related parameter, the consistency related parameter and the timeliness related parameter to determine quality evaluation parameters of the various data.