CN117131464B - Availability evaluation method and system for power grid data - Google Patents

Availability evaluation method and system for power grid data

Publication number
CN117131464B
Authority
CN
China
Prior art keywords
data
reliability
evaluation
low
equipment
Prior art date
Legal status
Active
Application number
CN202311387218.2A
Other languages
Chinese (zh)
Other versions
CN117131464A (en
Inventor
刘勇昊
常强
俞亮
董彬
夏远清
许瀚
屈虎
Current Assignee
Hubei Central China Technology Development Of Electric Power Co ltd
Original Assignee
Hubei Central China Technology Development Of Electric Power Co ltd
Application filed by Hubei Central China Technology Development Of Electric Power Co ltd
Priority to CN202311387218.2A
Publication of CN117131464A
Application granted
Publication of CN117131464B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256 Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and a system for evaluating the availability of power grid data. The method comprises: acquiring power grid data and constructing a test set and a verification set; performing encryption detection on the test set, taking encrypted data as high-reliability data and unencrypted data as low-reliability data; performing reliability evaluation on the low-reliability data, wherein low-reliability data whose evaluation results meet expectations can be adjusted to high-reliability data; integrating the high-reliability data into a plurality of data segments according to the requirements of a preset task model and inputting them into the task model for matching, wherein data segments whose matching degree does not meet expectations are modified and then matched again. Aimed at microscopic (device-level) power grid data, the method evaluates data availability from two aspects, data reliability and data adaptation degree, and screens data along four dimensions (data quality, data source, degree of data standardization and data security) while saving computing resources and reducing the amount of calculation, thereby completing the availability evaluation of the power grid data and obtaining reliable and effective evaluation results.

Description

Availability evaluation method and system for power grid data
Technical Field
The invention relates to the field of power grid data processing, in particular to a method and a system for evaluating availability of power grid data.
Background
The smart grid, as a new type of power technology, is a major direction of global power development and an indispensable link in power transmission and transformation. It is characterized by high safety, economy and reliability, and can effectively reduce the risks hidden in power transmission while the system is running. As power informatization advances, the volume and variety of power data keep growing; this flood of new data affects data transmission and storage across the whole power system to a certain extent and threatens the intelligent construction of the power grid.
Grid data involves a large number of parties along the full chain of transmission, distribution, transformation and utilization, covering transmission management, dispatching and balancing management, long-term planning, system protection, operation and maintenance management, market transactions and so on, from generation through to users; the various production activities related to electric power are usually associated with grid data. Such data specifically includes, but is not limited to: smart meter and various sensor data, grid-edge distributed energy (renewable energy, electric vehicles, etc.) data, Internet of Things (IoT) device data, substation automation data, asset condition monitoring data, distribution grid data analysis, vegetation management data, user participation data, energy forecast and energy market data, geospatial information system data, archive management data, and the like.
Power grid data has the following characteristics. Massive volume: the power system extends in all directions, which makes grid data extremely large in scale, and the data involves many parties and links along the full chain of transmission, distribution, transformation and utilization. Diversity: grid data types are diverse, including but not limited to smart meter and various sensor data, grid-edge distributed energy data and IoT device data, and complex relationships may exist between the various data. Real-time nature: grid data is generated in real time and requires real-time processing and analysis to support real-time decisions and operations. Low value density: grid data contains a large amount of invalid and redundant information, so its value density is relatively low.
Power grid data bears on the safe and stable operation of the power system, so high demands are placed on its availability. Because power grid data is massive and complex, traditional data evaluation models struggle to process and analyze it and cannot produce an effective data availability evaluation result.
Disclosure of Invention
In view of this, the invention provides a method and a system for evaluating availability of grid data, and the specific scheme is as follows:
an availability evaluation method of power grid data comprises the following steps:
acquiring power grid data generated during the operation of different power equipment, clustering the power grid data by equipment type or data type to obtain a plurality of data sets, and randomly selecting sample data from each data set in turn to construct a test set and a verification set;
judging, by combining data characteristics and encryption modes, whether each datum in the test set is encrypted, and taking encrypted data as high-reliability data and unencrypted data as low-reliability data;
performing reliability evaluation, in combination with the historical data of the equipment type, separately on the low-reliability data and on the data in the verification set that has the same data source and/or data format as the low-reliability data, wherein low-reliability data for which both evaluation results meet expectations can be adjusted to high-reliability data;
defining all high-reliability data, together with data having the same data source and data format as the high-reliability data, as trusted first data, and integrating the first data into a plurality of data segments according to the requirements of a preset task model;
and inputting each data segment into the task model of the corresponding requirement for matching, listing data segments whose matching degree meets expectations as high-availability data segments of the task model under that requirement, and modifying and re-matching data segments whose matching degree does not meet expectations.
In a specific embodiment, the reliability evaluation includes:
taking the low-reliability data as data to be evaluated and dividing it by equipment type, and calculating, through analysis of historical data, the probability that each equipment type produced unreliable data in the past, to obtain an evaluation probability;
if the evaluation probability is 0, directly determining that the evaluation result of the low-reliability data of that equipment type meets expectations, and adjusting the data to high-reliability data;
if the evaluation probability is greater than 0 and smaller than a preset threshold, evaluating one or more evaluation items, including data quality, data source, data format, data security and data relevance, separately for the low-reliability data and for the data in the verification set that has the same data source and/or data format as the low-reliability data, and determining that the evaluation result of the data does not meet expectations if any evaluation item fails;
if the evaluation probability is not smaller than the preset threshold, directly determining that the evaluation result of the low-reliability data of that equipment type does not meet expectations.
In a specific embodiment, historical data is analyzed with a probability statistical model to obtain a normal threshold for each datum, the normal threshold being the data range observed when the power equipment operates normally;
if the data to be evaluated falls outside the corresponding normal threshold, the data quality evaluation of the data is determined not to meet expectations.
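The normal-threshold check can be illustrated with a minimal sketch. The text only specifies "a probability statistical model", so the Gaussian mean ± 3σ band used here is an assumption, as are the names `normal_range` and `quality_ok` and the sample readings.

```python
from statistics import mean, stdev

def normal_range(history, k=3.0):
    """Return (low, high) as mean ± k sample standard deviations of past readings.

    Assumption: readings under normal operation are roughly Gaussian.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma

def quality_ok(value, bounds):
    """A reading passes the data quality item only if it lies inside the range."""
    low, high = bounds
    return low <= value <= high

# Hypothetical transformer voltage readings (kV) recorded during normal operation.
history_kv = [10.1, 10.0, 9.9, 10.2, 10.0, 9.8, 10.1, 10.0]
bounds = normal_range(history_kv)

print(quality_ok(10.05, bounds))  # reading close to the historical mean
print(quality_ok(13.0, bounds))   # reading far outside the historical band
```

A different statistical model (for example, empirical quantiles) could replace `normal_range` without changing the check itself.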
In one embodiment, it is traced whether a specific power device that generated the data exists;
if so, whether the power device corresponding to the data is legitimate is checked first against the historical data and then against a preset equipment list, and if it is legitimate, the data source of the data is determined to meet expectations.
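As a rough sketch of the source check, the function below traces a record to a device identifier and then consults the historical data and the preset equipment list in that order. The record layout, the `device_id` field, and the reading that a device must pass both lookups to count as legitimate are assumptions made for illustration.

```python
def source_ok(record, history_device_ids, preset_device_list):
    """Evaluate the data source item for one record.

    Assumption: a device is legitimate only if it appears in the historical
    data and then also in the preset equipment list.
    """
    device_id = record.get("device_id")
    if device_id is None:
        # The record cannot be traced to a specific generating device.
        return False
    if device_id not in history_device_ids:
        return False
    return device_id in preset_device_list

# Hypothetical identifiers.
history_ids = {"T-001", "T-002", "B-010"}
preset_list = {"T-001", "B-010"}

print(source_ok({"device_id": "T-001", "voltage_kv": 10.1}, history_ids, preset_list))
print(source_ok({"device_id": "T-002", "voltage_kv": 10.0}, history_ids, preset_list))
print(source_ok({"voltage_kv": 9.9}, history_ids, preset_list))
```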
In one embodiment, data that has only the same data source or only the same data format as the high-reliability data is taken as medium-reliability data;
reliability evaluation is performed, in combination with the historical data of the equipment type, on the medium-reliability data or on the data in the verification set that has the same data source and data format as the medium-reliability data, and medium-reliability data whose evaluation result meets expectations can be adjusted to high-reliability data.
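The tiering described here can be sketched as a small classifier for unencrypted records. It assumes each record carries `source` and `format` fields and that "same source and format" means matching a single high-reliability record on both; the function name and the tier labels are illustrative only.

```python
def reliability_tier(record, high_records):
    """Place an unencrypted record relative to known high-reliability records.

    "trusted": shares source AND format with some high-reliability record
    "medium":  shares only the source or only the format
    "low":     shares neither
    """
    both = any(
        record["source"] == h["source"] and record["format"] == h["format"]
        for h in high_records
    )
    either = any(
        record["source"] == h["source"] or record["format"] == h["format"]
        for h in high_records
    )
    if both:
        return "trusted"
    if either:
        return "medium"
    return "low"

# Hypothetical high-reliability (encrypted) record descriptors.
high = [{"source": "meter-A", "format": "fmt-1"}]

print(reliability_tier({"source": "meter-A", "format": "fmt-1"}, high))
print(reliability_tier({"source": "meter-A", "format": "fmt-2"}, high))
print(reliability_tier({"source": "sensor-Z", "format": "fmt-9"}, high))
```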
In a specific embodiment, all data to be evaluated that has association relations is screened out and divided into a plurality of association groups according to those relations;
if the data to be evaluated in an association group does not conform to the corresponding association relation, the data relevance evaluation of the data is determined not to meet expectations.
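One way to picture an association group is a set of readings tied together by a known physical relation. The sketch below uses the single-phase relation P = U · I · cos φ as a hypothetical association; the field names, the 5% tolerance and the rule table are assumptions, not relations taken from the text.

```python
import math

def power_relation(group, rel_tol=0.05):
    """Check active power against voltage * current * power factor."""
    expected_w = group["voltage_v"] * group["current_a"] * group["power_factor"]
    return math.isclose(group["active_power_w"], expected_w, rel_tol=rel_tol)

# Hypothetical table mapping an association group name to its relation check.
ASSOCIATION_RULES = {"single_phase_power": power_relation}

def relevance_ok(group_name, group):
    """The data relevance item passes only if the group satisfies its relation."""
    return ASSOCIATION_RULES[group_name](group)

consistent = {"voltage_v": 220.0, "current_a": 10.0,
              "power_factor": 0.95, "active_power_w": 2100.0}
inconsistent = dict(consistent, active_power_w=3000.0)

print(relevance_ok("single_phase_power", consistent))
print(relevance_ok("single_phase_power", inconsistent))
```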
In a specific embodiment, the process of determining the encryption mode includes:
obtaining in advance all encryption algorithms involved in the power grid data;
analyzing the data characteristics, including data length, data characters and data occurrence frequency, of data encrypted by each encryption algorithm, and screening out the encrypted data based on these characteristics using a preset data encryption screening tool.
In a specific embodiment, different data items generated by the same equipment type are distributed in the data set in sequence according to the time dimension;
setting a corresponding time interval by analyzing the distribution density of each data item in the time dimension;
and randomly selecting any one or more data in each data item under the corresponding time point based on the same time interval and different time starting points to obtain a test set and a verification set, so that the data in the test set and the verification set are distributed in a staggered manner in the time dimension.
An availability evaluation system of grid data, comprising:
a preprocessing unit, configured to acquire power grid data generated during the operation of different power equipment, cluster the power grid data by equipment type or data type to obtain a plurality of data sets, and randomly select sample data from each data set in turn to construct a test set and a verification set;
an encryption detection unit, configured to judge, by combining data characteristics and encryption modes, whether each datum in the test set is encrypted, and to take encrypted data as high-reliability data and unencrypted data as low-reliability data;
a reliability adjustment unit, configured to perform reliability evaluation, in combination with the historical data of the equipment type, separately on the low-reliability data and on the data in the verification set that has the same data source and/or data format as the low-reliability data, wherein low-reliability data for which both evaluation results meet expectations can be adjusted to high-reliability data;
a data integration unit, configured to define all high-reliability data, together with data having the same data source and data format as the high-reliability data, as trusted first data, and to integrate the first data into a plurality of data segments according to the requirements of a preset task model;
a data matching unit, configured to input each data segment into the task model of the corresponding requirement for matching, to list data segments whose matching degree meets expectations as high-availability data segments of the task model under that requirement, and to modify and re-match data segments whose matching degree does not meet expectations.
In a specific embodiment, the reliability adjustment unit is specifically configured for:
taking the low-reliability data as data to be evaluated and dividing it by equipment type, and calculating, through analysis of historical data, the probability that each equipment type produced unreliable data in the past, to obtain an evaluation probability;
if the evaluation probability is 0, directly determining that the evaluation result of the low-reliability data of that equipment type meets expectations, and adjusting the data to high-reliability data;
if the evaluation probability is greater than 0 and smaller than a preset threshold, evaluating one or more evaluation items, including data quality, data source, data format, data security and data relevance, separately for the low-reliability data and for the data in the verification set that has the same data source and/or data format as the low-reliability data, and determining that the evaluation result of the data does not meet expectations if any evaluation item fails;
if the evaluation probability is not smaller than the preset threshold, directly determining that the evaluation result of the low-reliability data of that equipment type does not meet expectations.
Beneficial effects: the invention provides a method and a system for evaluating the availability of power grid data. Aimed at microscopic power grid data, it evaluates data availability from two aspects, data reliability and data adaptation degree, and screens data along four dimensions (data quality, data source, degree of data standardization and data security) while saving computing resources and reducing the amount of calculation. This completes the availability evaluation of the power grid data, reveals possible problems in the data in time, and yields reliable and effective evaluation results, thereby safeguarding the stable operation of the power grid.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an availability evaluation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an availability evaluation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an availability evaluation system according to an embodiment of the present invention.
Reference numerals: 1-preprocessing unit; 2-encryption detection unit; 3-reliability adjustment unit; 4-data integration unit; 5-data matching unit.
Detailed Description
Hereinafter, various embodiments of the present disclosure will be more fully described. The present disclosure is capable of various embodiments and its modifications and variations are possible in light of the above teachings. However, it should be understood that: there is no intention to limit the various embodiments of the present disclosure to the specific embodiments disclosed herein, but rather the present disclosure is to be understood to cover all modifications, equivalents, and/or alternatives falling within the spirit and scope of the various embodiments of the present disclosure.
The invention evaluates the availability of data mainly from two aspects: reliability and matching degree. Data reliability refers to the degree to which data can be considered reliable and valid under given application conditions. Data matching degree is the degree to which the data fits the data form required by a task demand model; different task demand models often require specific data forms.
In the invention, high-reliability data is data whose reliability is high enough that it can go directly to adaptation evaluation. Medium-reliability data can be converted into high-reliability data after a relatively light review. Low-reliability data can be converted into high-reliability data only after a stricter review. The boundaries between high, medium and low can be drawn by a simple equal division, or set flexibly according to the practical application.
The terminology used in the various embodiments of the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the various embodiments of the disclosure. As used herein, the singular is intended to include the plural as well, unless the context clearly indicates otherwise. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present disclosure belong. The terms (such as those defined in commonly used dictionaries) will be interpreted as having a meaning that is identical to the meaning of the context in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in the various embodiments of the disclosure.
Example 1
Embodiment 1 of the invention discloses a method for evaluating the availability of power grid data. It evaluates availability from two aspects, data reliability and data adaptation degree, can measure the reliability and integrity of power grid data, and provides a scientific basis for the operational decisions of power enterprises. The flow of the availability evaluation method is shown in FIG. 1 and its principle in FIG. 2; the specific scheme is as follows:
an availability evaluation method of power grid data comprises the following steps:
101. acquiring power grid data generated during the operation of different power equipment, clustering the power grid data by equipment type or data type to obtain a plurality of data sets, and randomly selecting sample data from each data set in turn to construct a test set and a verification set;
102. judging, by combining data characteristics and encryption modes, whether each datum in the test set is encrypted, and taking encrypted data as high-reliability data and unencrypted data as low-reliability data;
103. performing reliability evaluation, in combination with the historical data of the equipment type, separately on the low-reliability data and on the data in the verification set that has the same data source and/or data format as the low-reliability data, wherein low-reliability data for which both evaluation results meet expectations can be adjusted to high-reliability data;
104. defining all high-reliability data, together with data having the same data source and data format as the high-reliability data, as trusted first data, and integrating the first data into a plurality of data segments according to the requirements of a preset task model;
105. inputting each data segment into the task model of the corresponding requirement for matching, listing data segments whose matching degree meets expectations as high-availability data segments of the task model under that requirement, and modifying and re-matching data segments whose matching degree does not meet expectations.
Steps 102-103 mainly evaluate the reliability of the power grid data, and steps 104-105 evaluate the adaptation degree between the power grid data and the task model. The task model can be understood as a data consumer, which often requires a specific data format.
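The match, modify and re-match loop of step 105 can be sketched as follows. The text does not define how matching degree is computed or how a segment is modified, so the measure used here (the fraction of the task model's required fields present in the segment), the alias-renaming modification and all names are assumptions.

```python
def matching_degree(segment, required_fields):
    """Fraction of the task model's required fields present in the segment."""
    present = sum(1 for field in required_fields if field in segment)
    return present / len(required_fields)

def match_with_retry(segment, required_fields, modify, expected=1.0, max_rounds=2):
    """Match a segment; if it falls short, modify it and match again.

    Returns the (possibly modified) segment once it meets the expected
    matching degree, or None if it still falls short after max_rounds.
    """
    for attempt in range(max_rounds + 1):
        if matching_degree(segment, required_fields) >= expected:
            return segment  # listed as a high-availability segment
        if attempt < max_rounds:
            segment = modify(segment)
    return None

def rename_aliases(segment):
    """Hypothetical modification: map short field names to the required ones."""
    aliases = {"U": "voltage_v", "I": "current_a"}
    return {aliases.get(key, key): value for key, value in segment.items()}

required = ["voltage_v", "current_a", "timestamp"]
fixable = {"U": 220.0, "I": 10.0, "timestamp": 1700000000}
hopeless = {"timestamp": 1700000000}

print(match_with_retry(fixable, required, rename_aliases))
print(match_with_retry(hopeless, required, rename_aliases))
```

Bounding the number of rounds keeps a segment that can never fit the task model from being retried indefinitely.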
The embodiment evaluates the usability of the data from two aspects of the reliability of the data and the adaptation degree of the data, wherein the reliability of the data mainly aims at the quality, the source and the safety of the data, and the adaptation degree of the data mainly aims at the standardization degree of the data. In other words, the availability of the grid data is measured from four dimensions of data quality, data source, data standardization level and data security.
Data reliability is affected by a variety of factors such as data quality, data source, data standardization level, data privacy and security. In the data analysis and decision process, if the data reliability is high, the data has a larger influence on the reliability of analysis and decision; conversely, if the reliability of the data is low, the data has less impact on the reliability of analysis and decision making. In the power system, through real-time monitoring and analysis of the power data, information such as the running state of the power equipment, the price and the demand of the power market and the like can be obtained. The information has important significance for stable operation and optimal management of the power system. If errors or anomalies exist in the processing and analyzing processes of the data, the credibility of the data is affected, so that the stable operation and the optimal management of the power system are adversely affected.
The power grid data addressed by this embodiment is microscopic data, limited to data generated during the operation of specific power equipment, such as transformer voltage and current, various sensor data, metering data in the power grid, and the like. Such data concerns specific equipment in the power grid, is varied and continuously generated, is highly prone to anomalies, and can easily be traced to the specific equipment that generated it. Management data of power enterprises (production management, resource management, operation management, etc.), macroscopic data of the power system, and power transaction data (market prices, transaction volumes, transaction agreements, etc.) do not belong to the power grid data of this embodiment.
After the power grid data is obtained, it is clustered by power equipment type to obtain a plurality of data sets. The granularity of the equipment type division needs to be set according to the data scale and the requirements of the task model. For example, when the data scale is small, equipment types can be refined into the devices directly involved in producing and distributing electric energy, such as generators, transformers, circuit breakers, disconnectors and power cables. When the data scale is large, equipment types can be divided broadly into categories such as primary equipment, secondary equipment and communication systems. In some cases, parameters describing the same feature can even be grouped into one class, that is, divided by data type: transformer voltage parameters recorded over a long period, for example, accumulate in large numbers and can form a class of their own.
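A minimal sketch of this grouping step, assuming each record already carries a `device_type` and a `data_type` label (both field names are hypothetical). The "clustering" is implemented here as plain grouping by a known key, which is an interpretation rather than something the text specifies.

```python
from collections import defaultdict

def group_records(records, key="device_type"):
    """Group records into data sets by equipment type or by data type."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)

# Hypothetical records.
records = [
    {"device_type": "transformer", "data_type": "voltage", "value": 10.1},
    {"device_type": "breaker", "data_type": "state", "value": 1},
    {"device_type": "transformer", "data_type": "current", "value": 5.2},
    {"device_type": "generator", "data_type": "voltage", "value": 6.3},
]

by_device = group_records(records)                 # one data set per equipment type
by_data = group_records(records, key="data_type")  # one data set per data type

print(sorted(by_device))
print(sorted(by_data))
```

Coarser granularity can be obtained by first mapping each `device_type` to a broader category and grouping on that.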
In this embodiment, each data set corresponds to one type of power equipment, and the data sets are constructed so that sample data can be picked from each of them to build a test set and a verification set. Both the test set and the verification set contain sample data from every data set, so that they are sufficiently representative of each data set. In other words, the test results on the test set represent the degree to which the user can trust each type of data set. The test set contains more data than the verification set, to strengthen the basis for the determination.
Specifically, in the data set, different data items generated by the same equipment type are distributed in sequence according to the time dimension; setting a corresponding time interval by analyzing the distribution density of each data item in the time dimension; based on the same time interval and different time starting points, randomly selecting any one or more data in each data item under the corresponding time point to obtain a test set and a verification set, so that the data in the test set and the verification set are distributed in a staggered manner in the time dimension, and the representativeness of the test set and the verification set is further improved.
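The staggered sampling can be sketched as below: both sets use the same time interval but different starting points, and at each time point one or more data items are picked at random. Integer timestamps, the nested-dict layout and the name `sample_at_points` are assumptions for illustration.

```python
import random

def sample_at_points(series_by_item, interval, start, n_per_point=1, rng=None):
    """Pick data at time points start, start + interval, ...

    series_by_item maps a data item name to {timestamp: value}. At each time
    point, up to n_per_point data items are chosen at random among those that
    have a value there.
    """
    if rng is None:
        rng = random.Random(0)
    last = max(t for series in series_by_item.values() for t in series)
    picked = []
    for t in range(start, last + 1, interval):
        available = sorted(name for name, s in series_by_item.items() if t in s)
        for name in rng.sample(available, min(n_per_point, len(available))):
            picked.append((t, name, series_by_item[name][t]))
    return picked

# Hypothetical data items from one equipment type, one reading per second.
series = {name: {t: float(t) for t in range(60)}
          for name in ("voltage", "current", "temperature")}

# Same interval, different starting points: the two sets interleave in time.
test_set = sample_at_points(series, interval=10, start=0, n_per_point=2,
                            rng=random.Random(1))
verification_set = sample_at_points(series, interval=10, start=5, n_per_point=1,
                                    rng=random.Random(2))

print(len(test_set), len(verification_set))
```

Picking two items per time point for the test set and one for the verification set keeps the test set larger, in line with the preceding paragraph.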
In practical applications, telemetry and remote control data in the power grid are key information for grid operation and maintenance and need to be encrypted to prevent malicious attack or tampering. Metering data, load data, transaction data and the like concern the economic operation of the grid and power trading, and need to be encrypted to protect data privacy and security. Smart meter and sensor data, grid-edge distributed energy data and the like concern the equipment and asset information of the grid, and need to be encrypted to prevent data leakage and unauthorized access. All of these data are parameters associated with specific power equipment or parameters detected by specific devices.
This embodiment first evaluates the reliability of data by judging whether it is encrypted. In the grid system, the data that gets encrypted is mainly key and sensitive information related to grid operation, maintenance, equipment, management and so on. Such data tends to be highly secure, or has already been through a round of data screening, so it is at least standardized to some degree. Encrypted data also carries a certain data value in itself, which reflects its availability to some extent. Therefore, this embodiment directly takes encrypted data as high-reliability data, with no need for subsequent reliability evaluation. Unencrypted data is provisionally treated as low-reliability data and can be adjusted to high-reliability data once it passes reliability evaluation. In this embodiment, high-reliability status indicates that the data has passed reliability checking, and only such data proceeds to the subsequent data standardization processing.
Preferably, the process of determining the encryption mode includes: obtaining in advance all encryption algorithms involved in the power grid data; analyzing the data characteristics, including data length, data characters and data occurrence frequency, of data encrypted by each encryption algorithm; and screening out the encrypted data based on these characteristics using a preset data encryption screening tool. Encrypted data typically has certain data characteristics, such as shorter data length, lower frequency of occurrence and a more complex data structure. Different encryption algorithms leave different characteristics and patterns in the data they encrypt. These characteristics and patterns can be mined from large amounts of data with data mining techniques such as association rule mining and cluster analysis, and the encrypted data can then be screened quickly and efficiently with the help of data encryption tools, data desensitization tools and the like.
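A feature-based screen of this kind might look like the sketch below, which extracts the three named features (length, characters, occurrence frequency) and applies one hypothetical profile: hex-encoded block-cipher output whose length is a multiple of 32 hex characters (16-byte blocks) and which rarely repeats. The profile, the thresholds and the function names are assumptions; the text does not describe the actual screening tool.

```python
import re
from collections import Counter

HEX_RE = re.compile(r"[0-9a-fA-F]+")

def extract_features(value, counts, total):
    """Length, character class and occurrence frequency of one raw value."""
    return {
        "length": len(value),
        "is_hex": HEX_RE.fullmatch(value) is not None,
        "frequency": counts[value] / total,
    }

def looks_encrypted(features, block_hex_len=32, max_frequency=0.5):
    """Hypothetical profile for hex-encoded 16-byte-block ciphertext."""
    return (features["is_hex"]
            and features["length"] % block_hex_len == 0
            and features["frequency"] <= max_frequency)

def screen(values, **profile):
    """Map each distinct value to True if it matches the encrypted profile."""
    counts = Counter(values)
    total = len(values)
    return {v: looks_encrypted(extract_features(v, counts, total), **profile)
            for v in counts}

cipher_like = "a1b2c3d4" * 4          # 32 hex characters, made up for the example
values = ["220.1", "220.1", "2203", cipher_like]
result = screen(values)

print(result[cipher_like], result["220.1"], result["2203"])
```

In practice each encryption algorithm in use would get its own profile, learned from samples known to be encrypted with it.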
Specifically, the reliability evaluation includes: taking the low-reliability data as the data to be evaluated and dividing it by device type, then analyzing historical data to calculate how often each device type has produced unreliable data in the past, which gives the evaluation probability. Some device types are prone to producing erroneous data. Other, important device types must maintain excellent performance because of the importance of their data, so they receive routine maintenance to keep them operating normally; their data error probability is relatively small and their data credibility is correspondingly high. The present embodiment studies and judges such devices starting from historical data. If the evaluation probability is 0, devices of this type have not produced errors in the past, and any future error is very unlikely; the evaluation result of the low-reliability data corresponding to this device type is directly determined to meet expectations, and the data is adjusted to high-reliability data. If the evaluation probability is greater than 0 and smaller than a preset threshold, the data of this device type contains some errors, but they occur infrequently, and a reliability evaluation is needed to avoid prediction errors. The preset threshold is set according to practical requirements and generally does not exceed 10%. A value above 10% means that more than one record in ten is erroneous, and given the huge volume of grid data, even one tenth amounts to a large quantity of erroneous data. Where precision requirements are stricter, the threshold can be limited to 5% or even 1%.
In this case, the low-reliability data and the data in the verification set that has the same data source and/or data format as the low-reliability data are evaluated separately on one or more evaluation items, including data quality, data source, data format, data security, and data relevance; if any evaluation item fails, the evaluation result of the data is judged not to meet expectations. If the evaluation probability is not smaller than the preset threshold, devices of this type have a high error probability, and the evaluation result of the low-reliability data corresponding to this device type is directly judged not to meet expectations, which saves computing resources.
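The three-way decision on the evaluation probability can be sketched in Python as follows. The function and enum names are hypothetical, the 10% default mirrors the upper bound mentioned in the text, and raising an error for a device type with no history is an assumption, since the source does not describe that case.

```python
from enum import Enum


class Outcome(Enum):
    PROMOTE = "adjust to high-reliability data"
    EVALUATE = "run the item-by-item evaluation"
    REJECT = "does not meet expectations"


def evaluation_probability(history):
    """history: list of bools, True where a past record was unreliable."""
    if not history:
        # Assumption: a device type without history cannot be triaged.
        raise ValueError("no historical data for this device type")
    return sum(history) / len(history)


def triage(probability, threshold=0.10):
    """Apply the three branches described for the evaluation probability."""
    if probability == 0:
        return Outcome.PROMOTE
    if probability < threshold:
        return Outcome.EVALUATE
    return Outcome.REJECT


def item_evaluation(item_results):
    """item_results: dict of evaluation item -> passed; any failure fails all."""
    return all(item_results.values())
```

Only the middle branch incurs the cost of the item-by-item evaluation, which is where the saving in computing resources comes from.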
The credibility evaluation of the present embodiment consists of one or more evaluation items, which relate to data quality, data source, data format, data security, and data relevance. Data quality is a key factor affecting the credibility of data and includes the accuracy, integrity, consistency, reliability, and traceability of the data. For power system operation data, it can be evaluated by comparing the data against its history, checking its consistency with other data sources, and examining the stability and reliability of data transmission. The data source is also an important factor in evaluating the credibility of data and includes the data provider, the methods and means of data acquisition, and the data processing mode. For power system operation data, it can be evaluated by establishing which power plant or substation the data comes from and whether the data acquisition equipment and sensors are working properly. The degree of standardization, that is, whether the data meets unified specifications and standards, is another factor in evaluating the credibility of data. For power system operation data, it can be evaluated by comparing the units, dimensions, and calculation methods of the data. In addition, data interpretability, that is, whether the meaning and background of the data are clear, is one of the factors in evaluating data credibility. For power system operation data, it can be evaluated by understanding the meaning of the data, the background in which it was generated, and the factors that influence it.
Preferably, the historical data is analyzed with a probability statistical model to obtain a normal threshold for each data item, where the normal threshold is the data range observed when the power equipment operates normally. For example, if the voltage of a device stays between 50 V and 60 V throughout the year, the normal threshold is 50-60 V. The normal threshold reflects the data state of the device during normal operation; if the data to be evaluated falls outside the corresponding normal threshold, the data quality evaluation of the data is judged not to meet expectations. Exemplary probability statistical models include Bayesian models, decision tree models, and neural network models.
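As a minimal illustration of the normal-threshold check, the Python sketch below uses the simplest possible stand-in for the probability statistical model: the empirical range of historical readings taken during normal operation. It is not a Bayesian, decision tree, or neural network model, and the function names and the optional `margin` parameter are assumptions.

```python
def normal_threshold(history, margin=0.0):
    """Return (low, high) from readings recorded during normal operation.

    Stand-in for the probability statistical model: the empirical range,
    optionally widened by a tolerance margin.
    """
    if not history:
        raise ValueError("historical data is required")
    return min(history) - margin, max(history) + margin


def quality_meets_expectation(value, threshold):
    """A value passes the data quality item only inside the normal range."""
    low, high = threshold
    return low <= value <= high


# Voltage history matching the 50-60 V example in the text.
threshold = normal_threshold([50.0, 53.2, 57.8, 60.0])
print(quality_meets_expectation(55.0, threshold))
print(quality_meets_expectation(72.5, threshold))
```

A model-based threshold would replace `normal_threshold` while leaving the pass/fail check unchanged.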
Preferably, the data is traced to determine whether the specific power equipment that generated it exists; if so, the power equipment corresponding to the data is checked for legitimacy first against the historical data and then against the preset equipment list, and if it is legitimate, the data source of the data is deemed to meet expectations.
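The source check can be sketched in Python as below. The record layout (a dictionary with a `device_id` key) and the reading that the device must pass both checks, in the stated order, are assumptions made for illustration.

```python
def source_meets_expectation(record, historical_devices, device_list):
    """Check the data source item for one record.

    record: dict that may carry a hypothetical 'device_id' field.
    historical_devices: set of device IDs seen in historical data.
    device_list: set of device IDs in the preset equipment list.
    """
    device_id = record.get("device_id")
    if not device_id:
        return False  # the data cannot be traced to a specific device
    if device_id not in historical_devices:
        return False  # first check: historical data
    return device_id in device_list  # second check: preset equipment list
```

Checking the historical data first means that a record from a device never seen before is rejected without consulting the equipment list.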
Preferably, all the data to be evaluated that has an association relationship is screened out and divided into a plurality of association groups according to the association relationship; if the data to be evaluated in an association group does not satisfy the corresponding association relationship, the data relevance evaluation of the data is judged not to meet expectations. Association rule mining may be employed, which evaluates the degree of confidence between data items by finding and evaluating relationships between item sets in a data set. Principal component analysis may also be employed, in which the original variables are converted by a linear transformation into new variables, the principal components, which are mutually uncorrelated and have successively decreasing variance. This method can be used to evaluate the importance and trustworthiness of the data.
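The association-group check can be illustrated with a short Python sketch. The relation used here, power approximately equal to voltage times current, is a hypothetical example of an association relationship and does not come from the source; the field names and the 5% tolerance are likewise assumptions.

```python
import math

# Hypothetical association relationships, keyed by name. Each takes the
# values of one association group and a relative tolerance.
RELATIONS = {
    "power_balance": lambda v, tol: math.isclose(
        v["power_w"], v["voltage_v"] * v["current_a"], rel_tol=tol
    ),
}


def failed_groups(groups, tol=0.05):
    """Return the IDs of association groups that violate their relationship."""
    return [
        g["id"]
        for g in groups
        if not RELATIONS[g["relation"]](g["values"], tol)
    ]


groups = [
    {"id": "g1", "relation": "power_balance",
     "values": {"voltage_v": 220.0, "current_a": 5.0, "power_w": 1100.0}},
    {"id": "g2", "relation": "power_balance",
     "values": {"voltage_v": 220.0, "current_a": 5.0, "power_w": 1500.0}},
]
print(failed_groups(groups))
```

Data in a group returned by `failed_groups` would be judged not to meet expectations on the data relevance item.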
The purpose of setting up the test set and the verification set in this embodiment is to evaluate the credibility of data by means of the similarity between data items in the same data group. Reducing the high-dimensional data to a low-dimensional space makes the distribution and similarity of the data observable, so that the credibility of the data can be evaluated with less calculation. The data source and the data format are important similarity attributes between data items.
The data source is the power equipment that generated the data, so evaluating it essentially measures the credibility of that specific piece of equipment. A data source that corresponds to high-reliability data has higher credibility, and other parameters from the same data source naturally have higher credibility as well. For example, some power devices are important and require frequent maintenance and inspection; their error probability is naturally low, and so is the error probability of their various parameters.
Unlike the data source, the data format is mainly used to identify anomalies such as a character string appearing where the data type should be an integer. The embodiment places high requirements on the data format, which specifically covers the following parts: 1. Data type: a fundamental attribute of data, such as integer (int), floating point number (float), or string (string). 2. Data structure: the organization and arrangement of the data, such as an array (array), list (list), or tuple (tuple). 3. Data length: the length or size of the data, which may be fixed or variable. For example, when processing audio or video data, the length of the data may vary. 4. Data precision: the accuracy or resolution of the data, such as the number of digits of a number or the number of digits after the decimal point. 5. Data style: the format or style in which the data is written, such as a date format or a currency format. 6. Data encoding: the process of converting data from one form to another. For example, certain data may need to be encoded using a particular character encoding or compression algorithm. 7. Data verification: a checksum, hash value, or similar value used to check the integrity or consistency of the data.
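A few of the format attributes listed above (data type, data length, and data precision) can be checked with a small Python sketch. The `FormatSpec` structure and its fields are hypothetical; a real specification would also cover structure, style, encoding, and verification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatSpec:
    """Hypothetical format specification for one data item."""
    data_type: type                     # e.g. int or float
    max_length: Optional[int] = None    # maximum raw string length
    max_decimals: Optional[int] = None  # maximum digits after the point


def format_meets_expectation(raw: str, spec: FormatSpec) -> bool:
    """Check type, length, and precision of a raw string value."""
    if spec.max_length is not None and len(raw) > spec.max_length:
        return False
    try:
        spec.data_type(raw)  # e.g. int("error") raises ValueError
    except ValueError:
        return False
    if spec.max_decimals is not None and "." in raw:
        if len(raw.split(".", 1)[1]) > spec.max_decimals:
            return False
    return True
```

This catches the case named in the text, where a character string appears in a field that is supposed to hold an integer.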
Preferably, data that shares only the data source or only the data format with the high-reliability data is taken as medium-reliability data. Combined with the historical data of the device type, reliability evaluation is performed either on the medium-reliability data or on the data in the verification set that has the same data source and data format as the medium-reliability data, and medium-reliability data whose evaluation result meets expectations can be adjusted to high-reliability data. In this embodiment, data having the same data source and data format as the high-reliability data may be treated as high-reliability data, data having the same data source or data format as the high-reliability data may be treated as medium-reliability data, and data having the same data source and/or data format as the low-reliability data may be treated as low-reliability data. Medium-reliability data requires only one reliability evaluation: either the medium-reliability data in the test set or the corresponding data in the verification set is verified, and if the evaluation result meets expectations the data can be adjusted to high-reliability data, which reduces the amount of calculation. Low-reliability data requires two evaluations: both the data in the test set and the corresponding data in the verification set must be verified, and the data can be adjusted to high-reliability data only if both evaluations pass, which improves the accuracy of data prediction.
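The tiering rule can be sketched in Python as follows. The record layout (dictionaries with `source` and `format` keys) and the function name are assumptions; the number of evaluations per tier follows the description above.

```python
# Evaluations needed before a record can be treated as high-reliability.
REQUIRED_EVALUATIONS = {"high": 0, "medium": 1, "low": 2}


def assign_tier(record, high_reliability):
    """Assign a tier by comparing source and format with high-reliability data.

    record, high_reliability items: dicts with hypothetical 'source' and
    'format' keys.
    """
    pairs = {(h["source"], h["format"]) for h in high_reliability}
    sources = {source for source, _ in pairs}
    formats = {fmt for _, fmt in pairs}
    if (record["source"], record["format"]) in pairs:
        return "high"    # same data source and same data format
    if record["source"] in sources or record["format"] in formats:
        return "medium"  # same data source or same data format
    return "low"
```

Looking up `REQUIRED_EVALUATIONS[assign_tier(record, high_reliability)]` then gives the number of reliability evaluations a record still has to pass.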
The data adaptation degree mainly concerns the compatibility and fit between the data and the task models. It must be assessed in light of business requirements and actual conditions, with comprehensive consideration of factors such as the scale, quality, and characteristics of the data and the choice of model. In this embodiment, the high-reliability data undergoes data preprocessing such as cleaning, filtering, deduplication, normalization, and standardization to remove impurities, eliminate noise, and unify the scale. After the reliability evaluation the data is scattered, so it is integrated according to the requirements of the model to obtain data segments, and the task model is then matched with the data segment as the unit. Data segments whose matching degree meets expectations are listed as high-availability data segments of the task model under the corresponding requirement, and data segments whose matching degree does not meet expectations are modified and matched again. For the latter, the distribution, type, value range, and interrelation of the features in the data need to be checked for problems such as missing features and feature discretization.
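Two of the preprocessing steps named above, deduplication and normalization, can be illustrated with a minimal Python sketch. The record layout `(device_id, timestamp, value)`, the choice of `(device_id, timestamp)` as the duplicate key, and min-max scaling as the normalization method are all assumptions.

```python
def preprocess(records):
    """Clean, deduplicate, and min-max normalize high-reliability records.

    records: iterable of (device_id, timestamp, value) tuples, where value
    may be None for a missing reading.
    """
    seen = set()
    cleaned = []
    for device_id, timestamp, value in records:
        key = (device_id, timestamp)
        if value is None or key in seen:
            continue  # drop missing readings and repeated records
        seen.add(key)
        cleaned.append((device_id, timestamp, float(value)))
    if not cleaned:
        return []
    low = min(v for _, _, v in cleaned)
    high = max(v for _, _, v in cleaned)
    span = high - low
    # Scale values to [0, 1]; a constant series maps to 0.0.
    return [(d, t, (v - low) / span if span else 0.0) for d, t, v in cleaned]
```

The normalized records would then be grouped into data segments according to the requirements of each task model before matching.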
The embodiment provides a method for evaluating availability of power grid data, which is used for evaluating availability of the data from two aspects of data reliability and data adaptation degree aiming at microscopic power grid data, and realizing data screening of four dimensions of data quality, data source, data standardization degree and data safety on the premise of saving computing power resources and reducing calculated amount, so as to complete the power grid data availability evaluation, discover possible problems in the data in time and obtain reliable and effective power grid data availability evaluation results, thereby guaranteeing stable operation of a power grid.
Example 2
Embodiment 2 of the invention discloses an availability evaluation system for power grid data, which implements the availability evaluation method of Embodiment 1 as a system. The specific structure of the system is shown in Figure 3 of the specification, and the specific scheme is as follows:
an availability evaluation system of grid data, comprising:
the preprocessing unit 1 is used for acquiring power grid data generated in the operation process of different power equipment, clustering the power grid data according to equipment types or data types to obtain a plurality of groups of data sets, and randomly selecting sample data from each group of data sets in sequence to construct a test set and a verification set;
the encryption detection unit 2 is used for judging whether each data in the test set is encrypted data according to the data characteristics and the encryption mode, and taking the encrypted data as high-reliability data and the unencrypted data as low-reliability data;
the reliability adjustment unit 3 is configured to perform, in combination with the historical data of the device type, reliability evaluation separately on the low-reliability data and on the data in the verification set that has the same data source and/or data format as the low-reliability data, where low-reliability data for which both evaluation results meet expectations can be adjusted to high-reliability data;
the data integration unit 4 is configured to define all the high-reliability data and the data having the same data source and data format as the high-reliability data as the first data that can be trusted, and integrate the first data into a plurality of data segments according to the requirement of the preset task model;
the data matching unit 5 is configured to input each data segment into the task model of the corresponding requirement for matching, list the data segments whose matching degree meets expectations as high-availability data segments of the task model under the corresponding requirement, and modify and re-match the data segments whose matching degree does not meet expectations.
The reliability adjustment unit 3 specifically includes: the low-credibility data are used as data to be evaluated and are divided according to the equipment types, and the probability of occurrence of unreliable data of each equipment type in the past is calculated through analysis of historical data to obtain evaluation probability; if the evaluation probability is 0, directly recognizing that the evaluation result of the low-reliability data corresponding to the equipment type accords with the expectation, and adjusting the evaluation result to be high-reliability data; if the evaluation probability is greater than 0 and smaller than a preset threshold value, evaluating one or more evaluation items including data quality, data source, data format, data safety and data relevance for the low-reliability data and the data with the same data source and/or data format in the verification set and the low-reliability data respectively, and judging that the evaluation result of the data does not meet the expectations if any evaluation item is unqualified; if the evaluation probability is not smaller than the preset threshold, directly judging that the evaluation result of the low-reliability data corresponding to the equipment type does not accord with the expectation.
The invention provides a method and a system for evaluating availability of power grid data, which are used for evaluating availability of the data from two aspects of data reliability and data adaptation degree aiming at microscopic power grid data, and realizing data screening of four dimensions of data quality, data source, data standardization degree and data safety on the premise of saving calculation power resources and reducing calculation amount, so as to complete the power grid data availability evaluation, discover possible problems in the data in time and obtain reliable and effective power grid data availability evaluation results, thereby guaranteeing stable operation of a power grid.
Those skilled in the art will appreciate that the drawing is merely a schematic illustration of a preferred implementation scenario and that the modules or flows in the drawing are not necessarily required to practice the invention. Those skilled in the art will also appreciate that the modules of an apparatus in an implementation scenario may be distributed in that apparatus as described for the scenario, or may be relocated, with corresponding changes, in one or more apparatuses different from that of the scenario. The modules of an implementation scenario may be combined into one module or further split into a plurality of sub-modules. The sequence numbers used above are for description only and do not indicate the relative merits of the implementation scenarios. The foregoing discloses only some embodiments of the invention; the invention is not limited thereto, and those skilled in the art may make modifications without departing from the scope of the invention.

Claims (10)

1. A method for evaluating availability of grid data, comprising:
acquiring power grid data generated in the operation process of different power equipment, clustering the power grid data according to equipment types or data types to obtain a plurality of groups of data sets, and sequentially and randomly selecting sample data from each group of data sets to construct a test set and a verification set;
judging whether each data in the test set is encrypted data or not by combining the data characteristics and the encryption mode, and taking the encrypted data as high-reliability data and the unencrypted data as low-reliability data;
carrying out, in combination with the historical data of the equipment type, reliability evaluation separately on the low-reliability data and on the data in the verification set that has the same data source and/or data format as the low-reliability data, wherein low-reliability data for which both evaluation results meet expectations can be adjusted to high-reliability data;
defining all the high-credibility data and the data with the same data source and data format as the high-credibility data as first data which can be trusted, and integrating the first data into a plurality of data fragments according to the requirement of a preset task model;
and inputting each data segment into a task model with corresponding requirements for matching, taking the data segment as a unit to match the task model, and listing the data segments with matching degree meeting expectations into the high-availability data segments of the task model under the corresponding requirements, wherein the data segments with matching degree not meeting expectations are matched again after being modified.
2. The usability evaluation method of claim 1, wherein the reliability evaluation includes:
the low-credibility data are used as data to be evaluated and are divided according to the equipment types, and the probability of occurrence of unreliable data of each equipment type in the past is calculated through analysis of historical data to obtain evaluation probability;
if the evaluation probability is 0, directly recognizing that the evaluation result of the low-reliability data corresponding to the equipment type accords with the expectation, and adjusting the evaluation result to be high-reliability data;
if the evaluation probability is greater than 0 and smaller than a preset threshold value, evaluating one or more evaluation items including data quality, data source, data format, data safety and data relevance for the low-reliability data and the data with the same data source and/or data format in the verification set and the low-reliability data respectively, and judging that the evaluation result of the data does not meet the expectations if any evaluation item is unqualified;
if the evaluation probability is not smaller than the preset threshold, directly judging that the evaluation result of the low-reliability data corresponding to the equipment type does not accord with the expectation.
3. The usability assessment method according to claim 2, wherein a normal threshold value of each data is obtained by analyzing the history data by a probability statistical model, the normal threshold value relating to a data range when the power equipment is operating normally;
if the data to be evaluated is not in the corresponding normal threshold value, the data quality evaluation of the data is judged to be inconsistent with the expectation.
4. The usability assessment method of claim 2, wherein the data is traced to determine whether the specific power equipment that generated it exists;
if so, checking whether the power equipment corresponding to the data is legal or not in the historical data and the preset equipment list in sequence, and recognizing that the data source of the data accords with the expectation under the condition of legal.
5. The usability evaluation method of claim 2, wherein only data having the same data source or data format as the high-reliability data is taken as the medium-reliability data;
and carrying out, in combination with the historical data of the equipment type, reliability evaluation on the medium-reliability data or on the data in the verification set that has the same data source and data format as the medium-reliability data, wherein medium-reliability data whose evaluation result meets expectations can be adjusted to high-reliability data.
6. The usability evaluation method of claim 2, wherein all the data to be evaluated having the association relationship are screened out and divided into a plurality of association groups according to the association relationship;
if the data to be evaluated in the association group does not accord with the corresponding association relationship, judging that the data association relationship evaluation of the data does not accord with the expectation.
7. The usability evaluation method of claim 1, wherein the encryption mode judging process includes:
all encryption algorithms involved in the power grid data are obtained in advance;
and screening the encrypted data based on the data characteristics by analyzing the data characteristics including the data length, the data character and the data occurrence frequency of the data encrypted by various encryption algorithms and utilizing a preset data encryption screening tool.
8. The usability evaluation method of claim 1, wherein different data items generated by the same device type are distributed sequentially in the data set in a time dimension;
setting a corresponding time interval by analyzing the distribution density of each data item in the time dimension;
and randomly selecting any one or more data in each data item under the corresponding time point based on the same time interval and different time starting points to obtain a test set and a verification set, so that the data in the test set and the verification set are distributed in a staggered manner in the time dimension.
9. A system for evaluating availability of grid data, comprising:
the preprocessing unit is used for acquiring power grid data generated in the operation process of different power equipment, clustering the power grid data according to equipment types or data types to obtain a plurality of groups of data sets, and randomly selecting sample data from each group of data sets in sequence to construct a test set and a verification set;
the encryption detection unit is used for judging whether each data in the test set is encrypted data or not according to the data characteristics and the encryption mode, and taking the encrypted data as high-reliability data and the unencrypted data as low-reliability data;
the reliability adjustment unit is used for carrying out, in combination with the historical data of the equipment type, reliability evaluation separately on the low-reliability data and on the data in the verification set that has the same data source and/or data format as the low-reliability data, wherein low-reliability data for which both evaluation results meet expectations can be adjusted to high-reliability data;
the data integration unit is used for defining all the high-credibility data and the data with the same data source and data format as the high-credibility data as the first data which can be trusted, and integrating the first data into a plurality of data fragments according to the requirement of a preset task model;
the data matching unit is used for inputting each data segment into the task model of the corresponding requirement for matching, with the data segment as the unit of matching, listing the data segments whose matching degree meets expectations as high-availability data segments of the task model under the corresponding requirement, and modifying and re-matching the data segments whose matching degree does not meet expectations.
10. The usability evaluation system of claim 9, wherein the reliability adjustment unit specifically includes:
the low-credibility data are used as data to be evaluated and are divided according to the equipment types, and the probability of occurrence of unreliable data of each equipment type in the past is calculated through analysis of historical data to obtain evaluation probability;
if the evaluation probability is 0, directly recognizing that the evaluation result of the low-reliability data corresponding to the equipment type accords with the expectation, and adjusting the evaluation result to be high-reliability data;
if the evaluation probability is greater than 0 and smaller than a preset threshold value, evaluating one or more evaluation items including data quality, data source, data format, data safety and data relevance for the low-reliability data and the data with the same data source and/or data format in the verification set and the low-reliability data respectively, and judging that the evaluation result of the data does not meet the expectations if any evaluation item is unqualified;
if the evaluation probability is not smaller than the preset threshold, directly judging that the evaluation result of the low-reliability data corresponding to the equipment type does not accord with the expectation.
CN202311387218.2A 2023-10-25 2023-10-25 Availability evaluation method and system for power grid data Active CN117131464B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311387218.2A CN117131464B (en) 2023-10-25 2023-10-25 Availability evaluation method and system for power grid data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311387218.2A CN117131464B (en) 2023-10-25 2023-10-25 Availability evaluation method and system for power grid data

Publications (2)

Publication Number Publication Date
CN117131464A CN117131464A (en) 2023-11-28
CN117131464B true CN117131464B (en) 2024-01-09

Family

ID=88856756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311387218.2A Active CN117131464B (en) 2023-10-25 2023-10-25 Availability evaluation method and system for power grid data

Country Status (1)

Country Link
CN (1) CN117131464B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915514A (en) * 2012-10-31 2013-02-06 清华大学 Method for assessing state estimation credibility of power system based on cumulants method
CN106447210A (en) * 2016-10-10 2017-02-22 国家电网公司 Distribution network equipment health degree dynamic diagnosis method involving credibility evaluation
CN108446861A (en) * 2018-03-28 2018-08-24 南方电网科学研究院有限责任公司 Multi-source data quality evaluation method of power dispatching system based on directed graph sorting
EP3422262A1 (en) * 2017-06-30 2019-01-02 Royal Holloway And Bedford New College Method of monitoring the performance of a machine learning algorithm
CN111292020A (en) * 2020-03-13 2020-06-16 贵州电网有限责任公司 Power grid real-time operation risk assessment method and system based on random forest
WO2020237729A1 (en) * 2019-05-31 2020-12-03 东北大学 Virtual machine hybrid standby dynamic reliability assessment method based on mode transfer
CN112069727A (en) * 2020-08-20 2020-12-11 国网河南省电力公司经济技术研究院 Intelligent transient stability evaluation system and method with high reliability for power system
CN113282588A (en) * 2021-06-11 2021-08-20 亿景智联(北京)科技有限公司 Method and device for evaluating quality of spatio-temporal data
CN115659214A (en) * 2022-10-09 2023-01-31 中能融合智慧科技有限公司 Energy industry data credible evaluation method based on PaaS platform
CN115794795A (en) * 2022-12-08 2023-03-14 湖北华中电力科技开发有限责任公司 Power distribution station power consumption data standardized cleaning method, device and system and storage medium
KR20230087097A (en) * 2021-12-09 2023-06-16 주식회사 카카오뱅크 Method for operating credit scoring model using two-stage logistic regression

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230170694A1 (en) * 2021-11-29 2023-06-01 Prabuddha Banerjee System and method for evaluating reliability of an electrical network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
智能电网大数据分析与决策系统的研究;陈钦柱;符传福;韩来君;;电子设计工程(06);36-40 *
电网数据可信性度量模型研究;程晓荣;李天琦;;华北电力大学学报(自然科学版)(02);87-94 *
网络信息可信度评估的研究进展及述评;王平;程齐凯;;信息资源管理学报(01);48-54 *

Similar Documents

Publication Publication Date Title
Faisal et al. Data-stream-based intrusion detection system for advanced metering infrastructure in smart grid: A feasibility study
Bagheri et al. Distributionally robust reliability assessment for transmission system hardening plan under $ Nk $ security criterion
Maamar et al. Machine learning techniques for energy theft detection in AMI
Zhang et al. Anomaly detection based on random matrix theory for industrial power systems
Althobaiti et al. Energy theft in smart grids: a survey on data-driven attack strategies and detection methods
CN110011990A (en) Intelligent analysis method for intranet security threats
CN112487042A (en) Electric energy metering data processing method and device, computer equipment and storage medium
CN117574436B (en) Tensor-based big data privacy security protection method
CN117992861A (en) Electric power data accuracy checking method and system
Mi et al. A method of entropy weight quantitative risk assessment for the safety and security integration of a typical industrial control system
Ausmus et al. Big data analytics and the electric utility industry
CN117131464B (en) Availability evaluation method and system for power grid data
CN115176254A (en) System and method for ensuring machine learning model results can be audited
Ezeme et al. An imputation-based augmented anomaly detection from large traces of operating system events
Cheng et al. Power system abnormal pattern detection for new energy big data
Wang et al. Research on network security situation assessment model based on double ahp
Xia et al. Privacy-Preserving Electricity Data Classification Scheme Based on CNN Model with Fully Homomorphism
Paeizi et al. Data Analytics Applications in Digital Energy System Operation
Ramirez et al. Motif analysis in internet of the things platform for wind turbine maintenance management
CN118300896B (en) Abnormal user behavior management method and system for cloud computing service environment
Li et al. An Effective Credit Evaluation Mechanism with Softmax Regression and Blockchain in Power IoT
Jneid Cluster Analysis for Medium Voltage Distribution Feeders
Wang et al. A Management Specification for Data Sharing Security in the System Construction of Smart Mine
CN117931937A (en) Block chain-based power grid material product carbon footprint data sharing method and system
CN118551411A (en) Power core data traceability system based on block chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant