CN107818523B - Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning - Google Patents

Info

Publication number
CN107818523B
CN107818523B CN201711123306.6A CN201711123306A CN107818523B CN 107818523 B CN107818523 B CN 107818523B CN 201711123306 A CN201711123306 A CN 201711123306A CN 107818523 B CN107818523 B CN 107818523B
Authority
CN
China
Prior art keywords
data
attribute
unstable
value
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711123306.6A
Other languages
Chinese (zh)
Other versions
CN107818523A (en)
Inventor
杨济海
余放
伍小生
彭汐单
巢玉坚
蔡志民
王�华
付萍萍
李敏
吕顺利
邓伟
李志鹏
王泉啸
李石君
余伟
李宇轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING NANRUI GROUP CO
State Grid Corp of China SGCC
Wuhan University WHU
Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Original Assignee
NANJING NANRUI GROUP CO
State Grid Corp of China SGCC
Wuhan University WHU
Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING NANRUI GROUP CO, State Grid Corp of China SGCC, Wuhan University WHU, Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Priority to CN201711123306.6A
Publication of CN107818523A
Application granted
Publication of CN107818523B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Water Supply & Treatment (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to a method for discriminating and inferring the true values of power communication system data based on unstable frequency distribution and frequency factor learning. The massive monitoring data collected in the power communication system is learned and analyzed for rules by combining a preset truth value discrimination method with a prediction function, so that the unstable frequency distribution points to the abnormal points in the power communication system and the truth of the data can be judged. Using historical data, abnormal data is located automatically and its true value is inferred and completed, which improves the data quality in the power communication system.

Description

Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning
Technical Field
The invention belongs to applied research on combining power communication data with big data and machine learning technology. An unstable frequency mode is substituted into a frequency factor learning function for machine learning, so that the massive monitoring data collected in a power communication system is learned and analyzed for rules; abnormal data is located automatically and its true value is inferred and completed, which improves the data quality in the power communication system.
Background
The power communication system is a broad concept that generally refers to the various subsystems related to the power grid and the data they generate. With the continuous development of the power grid in China, power demand keeps growing, the data generated in the power communication system is becoming ever larger and is produced ever faster, and the data structures of different subsystems differ greatly. The data generated by the power communication system has therefore become typical big data.
The power communication system is essential to the normal operation of the power system. Equipment is monitored through various sensors, which supports decisions on equipment faults and provides a basis for equipment maintenance. A large-scale power communication system generates massive monitoring data, and this data is inevitably distorted during acquisition, recording, transmission, exchange, and storage. In practice, distorted data has become a major obstacle to locating and analyzing faults of power equipment. Improving the data quality of the power communication system is an important part of improving the power grid system. Experts in China and abroad have proposed various solutions for detecting distorted data in power systems. Document [1] studies the causes of data distortion in an energy management system and addresses the distortion starting from those causes. Document [2] attempts to improve data quality from the data platform side. Document [3] predicts data quality by interpolation fitting. Document [4] uses cross-system data verification with text in CIM/E, the efficient model exchange format of the Common Information Model (CIM), as the carrier, and improves the overall data quality of the power grid dispatching system by screening the better-quality data from multiple sources and by feeding corrections back to field data according to the master station's state estimation.
The bad-data detection methods above, which are based on state estimation of power equipment, are reasonably effective at improving the quality of local data in a local system, but they do not apply well to the multi-source heterogeneous big data generated by the power communication system as a whole, and the cost of building a knowledge base for each kind of data distortion is relatively high. The frequency factor learning function is based on machine learning, which makes the algorithm more intelligent: no knowledge base needs to be built, the algorithm learns as it is used, and the cost is reduced. In addition, the formats of multi-source heterogeneous data are unified by converting the data to frequencies. The algorithm was verified on a real data set from the national power grid, and the experimental results show that the method is suitable for distortion discrimination and truth value inference on the multi-source heterogeneous data generated by the power communication system in a big data environment.
Disclosure of Invention
The power communication system is essential to the normal operation of the power system. Equipment is monitored through various sensors, which supports decisions on equipment faults and provides a basis for equipment maintenance. A large-scale power communication system generates massive monitoring data, and this data is inevitably distorted during acquisition, recording, transmission, exchange, and storage. In practice, distorted data has become a major obstacle to locating and analyzing faults of power equipment. Data anomalies in power communication systems mainly take the following forms:
1. Violation of the accuracy of the monitoring data. Accuracy refers to how close the value finally used for analysis and decision-making, after the monitored data has been recorded, transmitted, exchanged, and stored, is to the real value.
2. Violation of the consistency of the monitoring data. Consistency refers to whether the data actually recorded by the system satisfies the required functional dependencies or logical relationships, whether it stays within the attribute's domain of definition, and whether it is physically realistic.
3. Violation of the dimensional uniformity of the monitoring data. Dimensional uniformity refers to whether data of the same attribute uses a uniform unit of measurement.
4. Violation of the integrity of the monitoring data. Integrity refers to whether data actually recorded by the power information system is missing, i.e. whether all data required by the design has been completely recorded.
The first three types can be summarized as data accuracy problems, and the fourth can be regarded as a data missing problem. At present, the data acquired by the power communication system shows a large amount of plainly visible data loss, and inaccurate data also fills the database. These phenomena are caused partly by problems in the power monitoring system itself, partly by accidental errors during data entry, and partly by system incompatibilities during system upgrades.
To address the low data quality in existing power communication systems, the invention aims to establish a machine learning discrimination method that automatically identifies distortion in power data, locates the distorted data, and infers its true value.
To accomplish the above objective, the invention comprises four steps overall; the overall flow chart is shown in Fig. 1. The following definitions are used:
Definition 1: Stability period of an attribute
Figure BDA0001467848960000031
where $e_i(t_0)$ denotes the value of attribute $e_i$ at time $t_0$, and $t$ denotes the minimum stability period of the attribute, i.e. after time $t$ the value of the attribute returns to a value that does not differ significantly from its value at the initial moment;
Figure BDA0001467848960000032
$\varepsilon$ is a small positive number that bounds the deviation of the attribute value during the stability period;
Definition 2: Stability period of the attribute set
$$T = m(t_1, t_2, \ldots, t_n)$$
where $T$ is the stability period of the attribute set, equal to the least common multiple of the stability periods of all attributes in the power data set, and $m$ is the mapping that takes the least common multiple;
Definition 3: Stable state set
Figure BDA0001467848960000041
Figure BDA0001467848960000042
where
Figure BDA0001467848960000043
denotes the stable value of the attribute data in the attribute set; after one stability period $T$ of the attribute set, this value is usually close to the initial value $e_i(t)$;
Figure BDA0001467848960000044
denotes the stable state set of the power data attribute set $A$, formed by combining the stable values of all attributes in $A$. In practical terms, the stable mode describes how the stable values of normal attribute values are distributed over a short period;
Definition 4: Extraction of unstable frequencies
Figure BDA0001467848960000045
where $f_i(t)$ denotes the unstable frequency of attribute $e_i$; $N[e_i'(t)]$ denotes the count of out-of-bounds deviations of attribute $e_i$ when it is traversed during the stability period;
Figure BDA0001467848960000046
denotes the number of times attribute $e_i$ is traversed during the stability period; $D(e_i)$ denotes the domain of $e_i$ within which the deviation degree is not exceeded;
Definition 5: Unstable frequency distribution
$$F_A(t) = [f_1(t), f_2(t), \ldots, f_n(t)]$$
where $F_A(t)$ is called the unstable frequency distribution. It represents the frequencies at which unstable attributes are detected in the power attribute set $A$ during the traversal period, and it is defined as a vector so that it can serve as machine learning input in step 3;
Definition 6: Unstable frequency distribution label set (the correspondence is shown schematically in Fig. 2)
$$D_{train}(A) = \{(F_A^{(1)}, y^{(1)}), (F_A^{(2)}, y^{(2)}), \ldots, (F_A^{(n)}, y^{(n)})\}$$
where $D_{train}(A)$ denotes the unstable frequency label set, which is essentially a training data set composed of the unstable frequency distribution $F_A^{(i)}$ of the $i$-th time period and the corresponding equipment fault label $y^{(i)}$. The data labels can be obtained by assigning numerical values to the error codes the system reports for faults; they serve only for classification;
Definition 7: Unstable frequency distribution matrix
Figure BDA0001467848960000051
where $F$ denotes the unstable frequency distribution matrix, an algebraic structure in which the unstable frequency distribution obtained by the $i$-th traversal of a stability period forms the $i$-th row vector. This structure is convenient to feed into the algorithm and is the standard input format of the frequency factor learning algorithm;
The method specifically comprises the following steps:
Step 1: Extract the stable mode of the power data set. Based on the constructed power data set, determine the attributes to be tested of the power equipment it contains:
$$A = \{e_1, e_2, e_3, \ldots, e_n\}$$
where $A$ denotes the set of power data attributes and $e_i$, $i \in [1, n]$, denotes the $n$ attributes (e.g. network element ID, current, device temperature, humidity, time).
Then set the deviation degree, determine the stability period of the attribute set, and extract the stable state set.
Step 2: Construct the unstable frequency distribution. The data structures of the multi-source heterogeneous data generated by the power communication system are unified by converting the data to frequencies, so that they can be fed into the algorithm of step 3; the unstable frequencies are then extracted and the unstable frequency distribution is constructed.
Step 3: Unstable frequency factor learning, specifically comprising:
Step 3.1: Construct the unstable frequency label vector and the unstable frequency distribution matrix.
Step 3.2: Frequency factor learning. Parameters are learned using the frequency factor learning function:
Figure BDA0001467848960000061
Figure BDA0001467848960000062
where
Figure BDA0001467848960000063
is the regression label of the learning function, and $F_i$ ($i = 0, 1, 2, \ldots$) is an unstable frequency distribution (the independent variable, in vector form). In particular, when the $i$-th attribute has no data in the unit mode period, its frequency is assigned the value 1, indicating that data is missing and data integrity is violated. Different unstable frequency distributions describe the degree to which abnormal data occurs in different attributes, together with all combinations of anomalies. $w_j$ ($j = 0, 1, 2, \ldots$) are the univariate learning parameters. $v_i, v_j$ are the hidden parameters of the cross variables $F_i, F_j$ respectively; these are the key parameters that give the frequency factor learning algorithm its learning advantage. $\langle v_i, v_j \rangle$ is the inner product of the hidden parameter vectors $v_i, v_j$. In the objective function optimization stage, the hidden parameters are used to resolve the implicit relation between two different unstable frequency distributions; and since $i \neq j$, the autocorrelation influence of an unstable distribution is excluded, which effectively avoids overfitting;
$\lambda(F_i)$ is the trigger factor: when the attribute set of the $i$-th traversal is an empty set (i.e. all data in the attribute set is missing), the trigger factor turns off the learning function and activates the index function $g(F_i)$.
Step 3.3: Optimization solution
Depending on the equipment fault type, either a continuous numerical fault label or a categorical fault label can be set, and a regression-type or a classification-type objective loss function, respectively, is used for optimization;
Regression objective loss function (note that in this case $\lambda = 1$):
Figure BDA0001467848960000071
Classification objective loss function:
Figure BDA0001467848960000072
When $y = 1$:
Figure BDA0001467848960000073
When $y = -1$:
Figure BDA0001467848960000074
This is a hinge-loss classification loss function, where max{ } denotes the maximum of the values in braces. With the hinge-loss objective, the sign of the estimated value predicts whether a specific unstable frequency distribution points to a power equipment error or a data entry error;
Whichever objective function is chosen, the optimization goal is to solve for the parameters of the learning function that minimize the objective loss, namely:
Figure BDA0001467848960000075
where $\Theta^*$ denotes the set of parameters of the learning function, comprising the single-factor parameters $w_i$ and the cross-term parameters $v_i, v_j$, with $i, j \in \mathbb{Z}^+$, $i < j$.
Substituting the optimal parameters obtained into the learning function turns it into the prediction function:
Figure BDA0001467848960000076
where
Figure BDA0001467848960000077
is the prediction function with the optimal parameters substituted in. When a new unstable frequency distribution is input, the function gives a predicted value of the state classification of the power equipment; with optimized parameters trained on a large amount of historical data, the prediction function value converges to the true value;
Step 4: Truth value discrimination and inferential completion, specifically comprising:
Step 4.1: Discrimination flow
Type (1) data distortion discrimination
When the prediction function value converges to the normal label value of the equipment, the attribute data whose entries in the corresponding unstable frequency distribution are greater than 0, together with the empty-set elements, are judged to be distorted;
Type (2) data distortion discrimination
When the prediction function value converges to a specific equipment abnormality label, the attribute data whose entries in the corresponding unstable frequency distribution equal 0, together with the empty-set elements, are judged to be distorted;
Type (3) data distortion discrimination
When all elements of the unstable frequency distribution are empty sets, the data is completely lost and the whole distribution is judged to be distorted;
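The three discrimination rules can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patent's implementation: an empty-set element is represented as `None`, the prediction function's outcome is reduced to a boolean `predicted_normal`, and the function name `discriminate` is hypothetical.

```python
def discriminate(distribution, predicted_normal):
    """Return (distortion_type, indices of attributes judged distorted).

    distribution: unstable frequencies, with None marking an empty-set element
                  (an assumed encoding for this sketch).
    predicted_normal: True if the prediction converged to the normal label,
                      False if it converged to an abnormality label.
    """
    n = len(distribution)
    # Type (3): every element is missing, so the whole distribution is distorted.
    if all(f is None for f in distribution):
        return 3, list(range(n))
    if predicted_normal:
        # Type (1): equipment is normal, so over-limit (> 0) or missing entries are distorted.
        return 1, [i for i, f in enumerate(distribution) if f is None or f > 0]
    # Type (2): equipment is abnormal, so unchanged (== 0) or missing entries are distorted.
    return 2, [i for i, f in enumerate(distribution) if f is None or f == 0]


# Hypothetical distribution for three attributes: stable, over-limit, missing.
example = [0.0, 0.25, None]
type_when_normal = discriminate(example, predicted_normal=True)
type_when_abnormal = discriminate(example, predicted_normal=False)
```

An empty index list under type (1) or type (2) simply means that no attribute was judged distorted for that distribution.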
Step 4.2: Truth value inference and completion flow
When type (1) data distortion occurs, the distorted data is over-limit data that exceeds the preset deviation degree; the attribute is assigned and completed with the most frequent value among the historical data that is not over the limit;
When type (2) data distortion occurs, the distorted data is stable data that does not exceed the preset deviation degree (the equipment has changed but the data has not changed accordingly); the attribute is assigned and completed with the most frequent value among the over-limit historical data;
When type (3) data distortion occurs, the distorted data is empty-set data, and completion is handled in two cases: when the equipment operates normally, completion follows the type (1) procedure; when the equipment operates abnormally or has changed, completion follows the type (2) procedure.
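The completion rules amount to taking a mode over one of two pools of historical values. The following Python sketch assumes numeric history, a known stable value and deviation degree `eps` for the attribute, and an `equipment_normal` flag supplied by the caller; the function names are hypothetical.

```python
from collections import Counter


def most_frequent(values):
    """Value with the highest occurrence count (first seen wins ties)."""
    return Counter(values).most_common(1)[0][0]


def complete_value(history, stable_value, eps, distortion_type, equipment_normal=True):
    """Pick a replacement value for a distorted attribute from its history.

    Types (1) and (3)-with-normal-equipment use the history within the limit;
    types (2) and (3)-with-abnormal-equipment use the over-limit history.
    Returns None when the chosen pool is empty (an assumption of this sketch).
    """
    within = [x for x in history if abs(x - stable_value) <= eps]
    beyond = [x for x in history if abs(x - stable_value) > eps]
    use_within = distortion_type == 1 or (distortion_type == 3 and equipment_normal)
    pool = within if use_within else beyond
    return most_frequent(pool) if pool else None


# Hypothetical temperature history with stable value 10.0 and deviation degree 0.5.
history = [10.0, 10.1, 10.0, 12.5, 12.5, 13.0]
filled_type1 = complete_value(history, 10.0, 0.5, distortion_type=1)
filled_type2 = complete_value(history, 10.0, 0.5, distortion_type=2)
```

For continuous measurements the history would probably need rounding or binning before a mode is meaningful; that choice is left open here.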
According to the invention, the massive monitoring data collected in the power communication system is learned and analyzed for rules by combining a preset truth value discrimination method with a prediction function, so that the unstable frequency distribution points to the abnormal points in the power communication system and the truth of the data can be judged. Using historical data, abnormal data is located automatically and its true value is inferred and completed, which improves the data quality in the power communication system.
Drawings
Fig. 1 is a general flow chart.
Fig. 2 is a schematic diagram of a correspondence relationship between an unstable frequency distribution and a tag set.
Fig. 3 is a flow chart of truth value discrimination and inferential completion.
Detailed Description
To address the low data quality in existing power communication systems, the invention aims to establish a machine learning discrimination method that automatically identifies distortion in power data, locates the distorted data, and infers its true value.
To accomplish the above objective, the invention is divided into four steps overall; the overall flow chart is shown in Fig. 1.
Step 1: Extract the stable mode of the power data set
This step is divided into three sub-steps.
Step 1.1: Construct the power data set and determine the attributes to be tested of the power equipment it contains
$$A = \{e_1, e_2, e_3, \ldots, e_n\}$$
where $A$ denotes the set of power data attributes and $e_i$, $i \in [1, n]$, denotes the $n$ attributes (e.g. network element ID, current, device temperature, humidity, time) that monitor the environment in which the power equipment is located. The principles for determining the attributes are:
(1) Select the necessary attributes. The attributes carried by the data whose truth is to be discriminated are called necessary attributes, and they must be selected.
(2) Select the associated attributes. Attributes related to a necessary attribute are called associated attributes. In subsequent processing, an associated attribute serves only as auxiliary machine learning evidence for discriminating and inferring the true value of the necessary attribute (for example, if the attribute to be tested is the device temperature, the ambient temperature can be selected as an associated attribute). The system does not discriminate the truth of associated attributes themselves, so their selection can be decided flexibly according to the specific situation.
Step 1.2: Set the deviation degree and determine the stability period of the attribute set.
Definition 1: Stability period of an attribute
Figure BDA0001467848960000101
where $e_i(t_0)$ denotes the value of attribute $e_i$ at time $t_0$, and $t$ denotes the minimum stability period of the attribute, i.e. after time $t$ the value of the attribute returns to a value that does not differ significantly from its value at the initial moment.
Figure BDA0001467848960000102
$\varepsilon$ is a small positive number that defines the maximum deviation of the attribute value during the stability period.
Definition 2: Stability period of the attribute set
$$T = m(t_1, t_2, \ldots, t_n)$$
where $T$ is the stability period of the attribute set, equal to the least common multiple of the stability periods of all attributes in the power data set, and $m$ is the mapping that takes the least common multiple.
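As a small illustration of the mapping $m$, the following Python sketch computes the stability period of an attribute set as a least common multiple. It assumes the per-attribute periods are non-negative integers in a common time unit; skipping zero periods is an assumption of this sketch, and the function names are hypothetical.

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def attribute_set_period(periods):
    """Stability period T of the attribute set: LCM of the per-attribute periods.

    Zero periods are skipped (an assumption of this sketch), since a zero
    would otherwise force the LCM to zero.
    """
    nonzero = [p for p in periods if p > 0]
    return reduce(lcm, nonzero, 1)


# Hypothetical per-attribute stability periods in minutes.
T = attribute_set_period([4, 6, 10])
print(T)  # 60
```

Using the least common multiple guarantees that every attribute completes a whole number of its own stability periods within $T$.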
Step 1.3: Extract the stable state set
Definition 3: Stable state set
Figure BDA0001467848960000103
Figure BDA0001467848960000104
where
Figure BDA0001467848960000105
denotes the stable value of the attribute data in the attribute set; after one stability period $T$ of the attribute set, this value is usually close to the initial value $e_i(t)$;
Figure BDA0001467848960000106
denotes the stable state set of the power data attribute set $A$, formed by combining the stable values of all attributes in $A$. In practical terms, the stable mode describes how the stable values of normal attribute values are distributed over a short period.
Note that the stability period $T$ is the minimum time span that ensures every attribute value stays stable within it; when the data is viewed over a longer time span, it may show various trends. For constant attributes (e.g. device location, network element ID), the stability period can be taken as 0. Discrete non-numerical attribute values are assigned integer codes by category, after which the same processing applies.
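The integer coding of discrete non-numerical values could be done as follows. This is a minimal Python sketch; the code assignment order (first appearance) and the function name are assumptions, since the source does not specify how the integers are chosen.

```python
def encode_categories(values):
    """Assign each distinct non-numerical value an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused integer
        encoded.append(codes[v])
    return encoded, codes


# Hypothetical device status readings.
encoded, codes = encode_categories(["normal", "alarm", "normal", "offline"])
print(encoded)  # [0, 1, 0, 2]
```

Because the codes only distinguish categories, any deviation degree applied to such an attribute would effectively test for a change of category rather than a numerical distance.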
Step 2: Construct the unstable frequency distribution
In this step, the data structures of the multi-source heterogeneous data generated by the power communication system are unified by converting the data to frequencies, so that they can be fed into the algorithm of step 3. The step is divided into two sub-steps.
Step 2.1: Extract the unstable frequencies
Definition 4: Extraction of unstable frequencies
Figure BDA0001467848960000111
where $f_i(t)$ denotes the unstable frequency of attribute $e_i$; $N[e_i'(t)]$ denotes the count of out-of-bounds deviations of attribute $e_i$ when it is traversed during the stability period;
Figure BDA0001467848960000112
denotes the number of times attribute $e_i$ is traversed during the stability period; $D(e_i)$ denotes the domain of $e_i$ within which the deviation degree is not exceeded.
Step 2.2: Construct the unstable frequency distribution
Definition 5: Unstable frequency distribution
$$F_A(t) = [f_1(t), f_2(t), \ldots, f_n(t)]$$
where $F_A(t)$ is called the unstable frequency distribution. It represents the frequencies at which unstable attributes are detected in the power attribute set $A$ during the traversal, and it is defined as a vector so that it can serve as machine learning input in step 3.
In practical terms, the unstable frequency statistically flags "abrupt" attribute values that do not follow the normal pattern of change. According to step 2.1, whenever a data change exceeds the preset deviation degree limit δ, a corresponding unstable frequency appears; within a unit time period, the more often the data exceeds the limit, the larger the corresponding unstable frequency value.
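A rough Python sketch of steps 2.1 and 2.2 follows. Because the defining formula survives only as an image placeholder, the sketch assumes the unstable frequency is the ratio of out-of-bounds samples to the number of samples traversed in one stability period, and that "out of bounds" means deviating from the stable value by more than the deviation degree. The function names and sample data are hypothetical.

```python
def unstable_frequency(samples, stable_value, eps):
    """Fraction of samples in one stability period deviating from stable_value by more than eps."""
    if not samples:
        # No data in the period: the source assigns frequency 1 to mark missing data.
        return 1.0
    out_of_bounds = sum(1 for x in samples if abs(x - stable_value) > eps)
    return out_of_bounds / len(samples)


def unstable_frequency_distribution(period_data, stable_values, eps_values):
    """Vector F_A(t): one unstable frequency per attribute, in the dict's key order."""
    return [
        unstable_frequency(period_data[name], stable_values[name], eps_values[name])
        for name in period_data
    ]


# Hypothetical samples collected over one stability period.
period_data = {
    "current": [10.0, 10.1, 12.5, 9.9],
    "temperature": [40.0, 40.2, 40.1, 39.9],
    "humidity": [],  # nothing recorded
}
stable_values = {"current": 10.0, "temperature": 40.0, "humidity": 55.0}
eps_values = {"current": 0.5, "temperature": 0.5, "humidity": 2.0}

F_A = unstable_frequency_distribution(period_data, stable_values, eps_values)
print(F_A)  # [0.25, 0.0, 1.0]
```

Whatever the original units of the attributes, each entry of the vector lies in [0, 1], which is how converting to frequencies unifies heterogeneous data.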
Step 3: Unstable frequency factor learning
The data quality of the power communication system is quantified by the proportion of true values. Within a stability period, when the equipment in the power or grid system operates normally and stably, the detected or recorded data reflects that physical stability and falls within the stable set defined by the deviation degree set in step 1.2 (its complement being the unstable distribution). However, if the system is physically abnormal or data has been entered incorrectly and yet the data remains stable over multiple traversals, or if the system has no problem and yet the data exceeds the deviation limit over multiple traversals, the data can be regarded as distorted; this is a data accuracy problem. If the data is wholly or partly missing, it is a data integrity problem.
Both problems can be handled uniformly by the unstable frequency factor learning algorithm. This step is divided into four sub-steps.
Step 3.1: Construct the unstable frequency label vector
Definition 6: Unstable frequency distribution label set (the correspondence is shown schematically in Fig. 2)
$$D_{train}(A) = \{(F_A^{(1)}, y^{(1)}), (F_A^{(2)}, y^{(2)}), \ldots, (F_A^{(n)}, y^{(n)})\}$$
where $D_{train}(A)$ denotes the unstable frequency label set, which is essentially a training data set composed of the unstable frequency distribution $F_A^{(i)}$ of the $i$-th time period and the corresponding equipment fault label $y^{(i)}$. The data labels can be obtained by assigning numerical values to the error codes the system reports for faults; they serve only for classification.
Step 3.2: Construct the unstable frequency distribution matrix
Definition 7: Unstable frequency distribution matrix
Figure BDA0001467848960000131
where $F$ denotes the unstable frequency distribution matrix, an algebraic structure in which the unstable frequency distribution obtained by the $i$-th traversal of a stability period forms the $i$-th row vector. This structure is convenient to feed into the algorithm and is the standard input format of the frequency factor learning algorithm.
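Assembling the label set and the matrix is mostly bookkeeping, as the following Python sketch suggests. It represents the matrix as a list of rows and the label set as a list of (distribution, label) pairs; the function name, the validation, and the sample labels are assumptions made for illustration.

```python
def build_training_set(distributions, labels):
    """Stack per-period distributions as rows of F and pair each row with its fault label."""
    if len(distributions) != len(labels):
        raise ValueError("each distribution needs exactly one label")
    if len({len(row) for row in distributions}) > 1:
        raise ValueError("all distributions must cover the same attributes")
    F = [list(row) for row in distributions]                   # matrix, one row per traversal
    D_train = [(tuple(row), y) for row, y in zip(F, labels)]   # label set D_train(A)
    return F, D_train


# Hypothetical distributions for two stability periods, with labels +1 / -1.
F, D_train = build_training_set(
    [[0.25, 0.0, 1.0], [0.0, 0.0, 0.0]],
    [1, -1],
)
```

Keeping one row per traversal means that row $i$ of the matrix and label $y^{(i)}$ always refer to the same stability period.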
Step 3.3: Frequency factor learning
The method designs a frequency factor learning function and uses it for parameter learning.
Definition 8: Frequency factor learning function
Figure BDA0001467848960000132
Figure BDA0001467848960000133
where
Figure BDA0001467848960000134
is the regression label of the learning function, and $F_i$ ($i = 0, 1, 2, \ldots$) is an unstable frequency distribution (the independent variable, in vector form). In particular, when the $i$-th attribute has no data in the unit mode period, its frequency is assigned the value 1, indicating that data is missing and data integrity is violated. Different unstable frequency distributions describe the degree to which abnormal data occurs in different attributes, together with all combinations of anomalies. $w_j$ ($j = 0, 1, 2, \ldots$) are the univariate learning parameters. $v_i, v_j$ are the hidden parameters of the cross variables $F_i, F_j$ respectively; these are the key parameters that give the frequency factor learning algorithm its learning advantage. $\langle v_i, v_j \rangle$ is the inner product of the hidden parameter vectors $v_i, v_j$. In the objective function optimization stage, the hidden parameters are used to resolve the implicit relation between two different unstable frequency distributions; and since $i \neq j$, the autocorrelation influence of an unstable distribution is excluded, which effectively avoids overfitting.
$\lambda(F_i)$ is the trigger factor: when the attribute set of the $i$-th traversal is an empty set (i.e. all data in the attribute set is missing), the trigger factor turns off the learning function and activates the index function $g(F_i)$. The specific expression of this function can be defined according to the actual situation; its purpose is to point the missing values directly to the power communication equipment to which the attribute belongs. The design idea is as follows: for abnormal data whose truth is uncertain, a learning function is needed to determine the truth, which is a matter of data accuracy; missing data, however, definitely violates data integrity, so the index function is activated directly to determine the physical cause of the distortion without any verification by the learning function. The parameter $u$ is a dimension-unifying factor: once the analytic expression of $g(F_i)$ is determined, $u$ brings it to the same dimension as the learning function.
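The parameters described here (a univariate weight per frequency, a hidden vector per frequency, and inner products over cross terms with $i \neq j$) match the shape of a second-order factorization machine. The Python sketch below assumes that standard form, since the formula itself survives only as an image placeholder. How $\lambda$, $g$, and $u$ combine with the learning function is also an assumption: here an explicit `all_missing` flag plays the role of the trigger and the output becomes `u * g(F)`. All names are hypothetical.

```python
def fm_score(F, w0, w, V):
    """Assumed form: w0 + sum_i w[i]*F[i] + sum_{i<j} <V[i], V[j]> * F[i] * F[j]."""
    n = len(F)
    linear = sum(w[i] * F[i] for i in range(n))
    cross = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # i < j, so no self-interaction terms
            inner = sum(a * b for a, b in zip(V[i], V[j]))
            cross += inner * F[i] * F[j]
    return w0 + linear + cross


def learning_function(F, w0, w, V, all_missing=False, g=None, u=1.0):
    """Learning function with a trigger: switch to u * g(F) when all data is missing."""
    if all_missing:
        if g is None:
            raise ValueError("an index function g is required when all data is missing")
        return u * g(F)
    return fm_score(F, w0, w, V)


# Hypothetical parameters for three attributes and two hidden dimensions.
w0, w = 0.1, [1.0, 2.0, -1.0]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score = learning_function([0.5, 0.0, 1.0], w0, w, V)
```

Because each pair shares parameters only through the hidden vectors, an interaction between two attributes that rarely misbehave together can still be estimated from their interactions with other attributes, which is the usual argument for this model family.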
Step 3.4 optimal solution
According to the equipment fault type, either a continuous numerical fault label or a classification fault label can be set, and a regression-type or a classification-type target loss function is used for optimization, respectively.
Definition 9: regression target loss function (note that λ = 1 in this case)
Figure BDA0001467848960000141
where
Figure BDA0001467848960000142
is the target loss function,
Figure BDA0001467848960000143
is the frequency factor learning function of Definition 8, and y^(i) is the power equipment abnormality label or entry error label recorded in the real situation. The optimization goal is to minimize the value of the target loss function; in practical terms, this means determining the learning function
Figure BDA0001467848960000144
such that its value is closest to the power equipment abnormality label. The factor 1/2 is included so that the expression stays concise when partial derivatives are taken in the subsequent optimization.
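The role of the factor 1/2 can be seen in a short worked derivative. The form below is an assumption, since the loss is available here only as a figure: it takes the loss to be half the sum of squared residuals over N training samples, where N and the generic parameter θ are symbols introduced for illustration.

```latex
L(\Theta) = \frac{1}{2}\sum_{i=1}^{N}\left(\hat{y}\!\left(F^{(i)};\Theta\right) - y^{(i)}\right)^{2}
\quad\Longrightarrow\quad
\frac{\partial L}{\partial \theta}
= \sum_{i=1}^{N}\left(\hat{y}\!\left(F^{(i)};\Theta\right) - y^{(i)}\right)
\frac{\partial \hat{y}\!\left(F^{(i)};\Theta\right)}{\partial \theta}
```

The 2 produced by differentiating the square cancels the 1/2, so each gradient term is simply the residual times the derivative of the learning function.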
When the power equipment abnormality label or entry error label has a discrete classification structure, the target loss can be defined as a hinge-loss type (note that λ = 1).
Definition 10: classification target loss function
Figure BDA0001467848960000145
When y = 1:
Figure BDA0001467848960000146
When y = -1:
Figure BDA0001467848960000147
Definition 10 gives a hinge-loss type classification loss function, where max{ } denotes the maximum of the values in braces. The hinge-loss target function uses the sign of the estimated value to predict whether a particular unstable frequency distribution points to a power equipment error or to a data entry error.
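A minimal Python sketch of a hinge-type loss and sign-based prediction follows. It assumes the standard form max{0, 1 - y·ŷ} with labels in {+1, -1}; the exact expressions of Definition 10 are figures and are not reproduced here, and the source does not state which label stands for an equipment error and which for a data entry error.

```python
def hinge_loss(y, y_hat):
    """Standard hinge loss (assumed form): zero once y * y_hat reaches the margin 1."""
    if y not in (1, -1):
        raise ValueError("label must be +1 or -1")
    return max(0.0, 1.0 - y * y_hat)

def predicted_class(y_hat):
    """Classify by the sign of the estimated value; 0 is mapped to +1 by convention here."""
    return 1 if y_hat >= 0 else -1
```

For example, an estimate of 2.0 for a sample labeled +1 incurs no loss, while an estimate of 0.3 for a sample labeled -1 incurs a loss of 1.3.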
Whichever objective function is set, the optimization goal is to solve for the parameters of the learning function that minimize the target loss function value, namely:
Figure BDA0001467848960000151
where Θ* denotes the set of parameters of the learning function, including the single-factor parameters w_i and the cross-term parameters v_i, v_j, with i, j ∈ Z+, i < j.
The optimization target can be reached by stochastic gradient descent (SGD). The gradient direction is obtained from the partial derivative of the loss with respect to each parameter of the learning function; a step length is then set along that direction, and iterative updating yields a locally optimal solution. The algorithm is as follows:
1. Regression target loss optimization iteration:
Figure BDA0001467848960000152
Figure BDA0001467848960000153
In the regression-type iteration, the parameters are updated along the gradient direction, and δ is the step length of each update. δ must be preset for the specific problem and should be moderate: if it is too large, the optimization may fail to converge; if it is too small, too many iterations are needed and computing resources are wasted.
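The regression-type iteration can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's update rule (which is given only as figures): it assumes a second-order factorization form for the learning function and a half squared-error loss, and the names fm_predict, sgd_step, train and half_squared_loss, the hidden dimension k, and the initialization are all hypothetical.

```python
import numpy as np

def fm_predict(F, w0, w, V):
    """Assumed learning function: w0 + sum_i w_i*F_i + sum_{i<j} <v_i, v_j>*F_i*F_j."""
    s = V.T @ F
    return w0 + w @ F + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (F ** 2))

def sgd_step(F, y, w0, w, V, delta):
    """One update: theta <- theta - delta * (y_hat - y) * d(y_hat)/d(theta)."""
    err = fm_predict(F, w0, w, V) - y
    s = V.T @ F
    # d(y_hat)/d(v_if) = F_i * sum_j v_jf*F_j - v_if * F_i^2
    grad_V = np.outer(F, s) - V * (F ** 2)[:, None]
    return w0 - delta * err, w - delta * err * F, V - delta * err * grad_V

def train(Fmat, y, k=2, delta=0.05, epochs=200, seed=0):
    """Fmat: unstable frequency distribution matrix (one row per traversal)."""
    rng = np.random.default_rng(seed)
    n = Fmat.shape[1]
    w0, w, V = 0.0, np.zeros(n), 0.01 * rng.standard_normal((n, k))
    for _ in range(epochs):
        for i in rng.permutation(len(y)):   # visit samples in random order
            w0, w, V = sgd_step(Fmat[i], y[i], w0, w, V, delta)
    return w0, w, V

def half_squared_loss(Fmat, y, w0, w, V):
    return 0.5 * sum((fm_predict(F, w0, w, V) - t) ** 2 for F, t in zip(Fmat, y))
```

The step length delta plays the role of δ in the text: raising it well above the value used here can make the loss oscillate or diverge, while lowering it slows the decrease of the loss per epoch.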
2. Classification target loss optimization iteration:
Figure BDA0001467848960000154
Figure BDA0001467848960000161
The optimal parameters obtained are substituted into the learning function, which then becomes the prediction function
Figure BDA0001467848960000162
where
Figure BDA0001467848960000163
is the prediction function with the optimal parameters substituted in. When a completely new unstable frequency distribution is input, the function gives a predicted value of the state classification of the power equipment; when the parameters substituted in are optimized parameters obtained by training on a large amount of historical data, the prediction function value converges to the true value.
Step 4 true value discrimination and deduction completion
This step is divided into two sub-steps; its schematic diagram is shown in Figure 3.
Step 4.1 discrimination flow
(1) Type (1) data distortion discrimination
When the prediction function value converges to the equipment's normal label value, the attribute data greater than 0 in the corresponding unstable frequency distribution, together with the empty-set elements, are judged to be distorted.
(2) Type (2) data distortion discrimination
When the prediction function value converges to a specific equipment abnormality label, the attribute data equal to 0 in the corresponding unstable frequency distribution, together with the empty-set elements, are judged to be distorted.
(3) Type (3) data distortion discrimination
When all elements of the unstable frequency distribution are empty sets, all the data is missing, and the distribution as a whole is judged to be distorted.
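The three discrimination rules can be written as a small decision function. The Python sketch below is illustrative only: it assumes the predicted equipment state has already been reduced to "normal" or "abnormal", encodes an empty-set element as None, and uses the hypothetical name discriminate and a dictionary return format that are not part of the source.

```python
NORMAL, ABNORMAL = "normal", "abnormal"

def discriminate(freqs, predicted_state):
    """Return the distortion type and the indices of attributes judged distorted.

    freqs: unstable frequency distribution; None marks an empty-set element.
    predicted_state: NORMAL or ABNORMAL, derived from the prediction function.
    """
    missing = [i for i, f in enumerate(freqs) if f is None]
    if freqs and len(missing) == len(freqs):
        # All elements are empty sets: the whole distribution is distorted.
        return {"type": 3, "distorted": missing}
    if predicted_state == NORMAL:
        # Equipment is normal, so attributes that still show unstable values are distorted.
        idx = [i for i, f in enumerate(freqs) if f is None or f > 0]
        return {"type": 1, "distorted": idx}
    if predicted_state == ABNORMAL:
        # Equipment is abnormal, so attributes that did not react are distorted.
        idx = [i for i, f in enumerate(freqs) if f is None or f == 0]
        return {"type": 2, "distorted": idx}
    raise ValueError("unknown predicted state")
```

For a distribution [0.0, 0.3, None], a normal prediction flags attributes 1 and 2, while an abnormal prediction flags attributes 0 and 2.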
Step 4.2 true value deduction and completion process
When type (1) data distortion occurs, the distorted data is overrun data that exceeds the preset deviation degree; the attribute is assigned the most frequent value in the non-overrun historical data.
When type (2) data distortion occurs, the distorted data is stable data that does not exceed the preset deviation degree (the equipment has changed but the data has not changed accordingly); the attribute is assigned the most frequent value in the overrun historical data.
When type (3) data distortion occurs, the distorted data is empty-set data, and completion falls into two cases: when the equipment operates normally, completion follows the type (1) procedure; when the equipment operates abnormally or has changed, completion follows the type (2) procedure.
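The completion rules amount to taking the mode of one of two subsets of the attribute's history. The Python sketch below makes that concrete under assumptions that are not in the source: a historical value counts as "overrun" when it deviates from the attribute's stable value by more than the deviation degree eps, ties are resolved by first occurrence, and the function names are hypothetical.

```python
from collections import Counter

def most_frequent(values):
    """Most frequent value, or None when there is no history to draw from."""
    return Counter(values).most_common(1)[0][0] if values else None

def complete(distortion_type, history, stable_value, eps, device_normal=True):
    """Deduce a replacement value for a distorted attribute from its history."""
    within = [v for v in history if abs(v - stable_value) <= eps]
    overrun = [v for v in history if abs(v - stable_value) > eps]
    if distortion_type == 3:
        # Empty-set data: fall back to type (1) or type (2) by equipment state.
        distortion_type = 1 if device_normal else 2
    if distortion_type == 1:
        return most_frequent(within)    # mode of the non-overrun history
    if distortion_type == 2:
        return most_frequent(overrun)   # mode of the overrun history
    raise ValueError("distortion_type must be 1, 2 or 3")
```

With history [10.0, 10.0, 10.1, 14.0, 14.0, 13.0], stable value 10.0 and eps 0.5, type (1) completion yields 10.0 and type (2) completion yields 14.0.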

Claims (1)

1. The electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning is characterized by comprising the following steps of:
Definition 1: attribute stable period
Figure FDA0002968310980000011
where e_i(t0) denotes the value of attribute e_i at time t0, and t denotes a minimum stable period of the attribute; that is, after time t the value of the attribute returns to a value of e_i that differs only slightly from its value at the initial time. ε denotes a small positive deviation degree, which defines the maximum deviation of the attribute value during the stable period;
Definition 2: attribute set stable period
T = m(t_1, t_2, …, t_n)
where T is the stable period of the attribute set, equal to the least common multiple of the stable periods of all attributes in the power data set, and m is the mapping that takes the least common multiple;
Definition 3: stable state set
Figure FDA0002968310980000012
Figure FDA0002968310980000013
where
Figure FDA0002968310980000014
denotes the stable value of each attribute in the attribute set after one attribute set period T, which is relatively close to the initial value e_i(t),
Figure FDA0002968310980000015
denotes the stable state set of the power data attribute set A, formed by combining the stable values of all attributes in A; in practical terms, the stable mode describes the distribution of the stable values of normal attribute values within a small period;
definitions 4 extraction of unstable frequencies
Figure FDA0002968310980000021
Wherein f isi(t) represents an attribute eiOf unstable frequency, N [ e'i(t)]Representing an attribute eiThe count of out-of-bounds deviations when traversed during the stabilization period,
Figure FDA0002968310980000022
representing an attribute eiThe number of times traversed during the stabilization period; d (e)i) Denotes eiA domain that does not exceed the degree of deviation;
Definition 5: unstable frequency distribution
F_A(t) = [f_1(t), f_2(t), …, f_n(t)]
where F_A(t) is called the unstable frequency distribution; it represents the frequency distribution of unstable attributes detected in the power attribute set A during the traversal period, and is defined as a vector so that it can serve as machine learning input in step 3;
Definition 6: unstable frequency distribution label set
Figure FDA0002968310980000023
where D_train(A) denotes the unstable frequency label vector, which is essentially a training data set composed of the unstable frequency distribution F_A^(i) of the i-th time period and the corresponding equipment fault label y^(i); the data labels are obtained by assigning numerical values to the error codes of errors occurring in the system, and serve only for classification;
Definition 7: unstable frequency distribution matrix
Figure FDA0002968310980000024
where F denotes the unstable frequency distribution matrix, an algebraic structure formed by stacking, as row vectors, the unstable frequency distributions obtained in the i-th traversal of the stable period; this structure is convenient to feed into the algorithm and is the standard input format of the frequency factor learning algorithm;
the method specifically comprises the following steps:
step 1, extracting the stable mode of the power data set: based on the constructed power data set, determine the attributes to be tested of the power equipment it contains:
A={e1,e2,e3,…,en}
where A denotes the power data attribute set and e_i, i ∈ [1, n], denotes the n attributes used to monitor the environment in which the power equipment is located; then set a deviation degree, determine the attribute set stable period, and extract the stable state set;
step 2, constructing the unstable frequency distribution: unify the data structures of the multi-source heterogeneous data generated by the power communication system by converting them to frequencies, so that they can be fed into the algorithm of step 3; then extract the unstable frequencies and construct the unstable frequency distribution;
step 3, learning unstable frequency factors, which specifically comprises the following steps:
step 3.1, constructing an unstable frequency label vector and an unstable frequency distribution matrix;
step 3.2, frequency factor learning: learn the parameters using the frequency factor learning function;
Figure FDA0002968310980000031
Figure FDA0002968310980000032
where
Figure FDA0002968310980000033
is the regression label of the learning function, and F_i (i = 0, 1, 2, …) is an unstable frequency distribution; when the i-th attribute has no data in the unit pattern period, the frequency is assigned the value 1, indicating that data is missing and that data integrity is violated; the different unstable frequency distributions describe the degree to which abnormal data appears in the different attributes and in all abnormal combination patterns; w_i (i = 0, 1, 2, …) are the single-variable learning parameters; v_i and v_j are the hidden parameters of the cross variables F_i and F_j, and they are the key parameters that give the frequency factor learning algorithm its learning advantage; <v_i, v_j> denotes the inner product of the hidden parameter vectors v_i and v_j; the hidden parameters are used to resolve the implicit relation between two different unstable frequency distributions during the objective function optimization stage, and because i ≠ j, the autocorrelation influence of an unstable distribution is excluded, which helps to avoid overfitting;
λ(F_i) is the trigger factor: when the attribute set of the i-th traversal is an empty set, the trigger factor switches off the learning function and starts the index function g(F_i); u denotes a dimension unification factor;
step 3.3 optimal solution
Setting a continuity numerical value fault label or a classification fault label according to the equipment fault type, and respectively optimizing by adopting a regression type target loss function and a classification type target loss function;
Regression target loss function (note that λ = 1 in this case)
Figure FDA0002968310980000041
Constructing a classification objective loss function
Figure FDA0002968310980000042
When y = 1:
Figure FDA0002968310980000043
When y = -1:
Figure FDA0002968310980000044
this is a hinge-loss type classification loss function, where max{ } denotes the maximum of the values in braces; the hinge-loss target function uses the sign of the estimated value to predict whether a particular unstable frequency distribution points to a power equipment error or to a data entry error;
whatever objective function is set, the optimization goal is to solve the parameters in the learning function to minimize the objective loss function value, namely:
Figure FDA0002968310980000045
where Θ* denotes the set of parameters of the learning function, including the single-factor parameters w_i and the cross-term parameters v_i, v_j, with i, j ∈ Z+, i < j;
The optimal parameters obtained are substituted into the learning function, which then becomes the prediction function
Figure FDA0002968310980000051
where
Figure FDA0002968310980000052
is the prediction function with the optimal parameters substituted in; when a completely new unstable frequency distribution is input, the function gives a predicted value of the state classification of the power equipment, and when the parameters substituted in are optimized parameters obtained by training on a large amount of historical data, the prediction function value converges to the true value;
step 4, true value discrimination and deduction completion, which specifically comprises the following steps:
step 4.1 discrimination flow
(1) Type data distortion discrimination
When the prediction function value converges to the normal label value of the equipment, attribute data and null set elements which are more than 0 in the corresponding unstable frequency distribution are judged to be distortion;
(2) type data distortion discrimination
When the prediction function value converges to a specific equipment abnormality label, the attribute data equal to 0 in the corresponding unstable frequency distribution, together with the empty-set elements, are judged to be distorted;
(3) type data distortion discrimination
When all elements of unstable distribution are empty sets, the data are completely lost, and the distribution is judged to be distorted;
step 4.2 true value deduction and completion process
When the (1) type data distortion occurs, the distortion data is overrun data exceeding a preset deviation degree, and the value with the maximum occurrence frequency in the history data which is not overrun is taken to assign and complement the attribute;
when the type (2) data distortion occurs, the distortion data is stable data which does not exceed a preset deviation degree, and the value with the maximum occurrence frequency in the overrun historical data is taken to assign and complement the attribute;
when the (3) type data distortion occurs, the distortion data is empty-set data, and completion falls into two cases: when the equipment operates normally, completion is carried out according to the (1) type data distortion mode; when the equipment operates abnormally or has changed, completion is carried out according to the (2) type data distortion mode.
CN201711123306.6A 2017-11-14 2017-11-14 Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning Active CN107818523B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711123306.6A CN107818523B (en) 2017-11-14 2017-11-14 Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711123306.6A CN107818523B (en) 2017-11-14 2017-11-14 Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning

Publications (2)

Publication Number Publication Date
CN107818523A CN107818523A (en) 2018-03-20
CN107818523B true CN107818523B (en) 2021-04-16

Family

ID=61609208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711123306.6A Active CN107818523B (en) 2017-11-14 2017-11-14 Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning

Country Status (1)

Country Link
CN (1) CN107818523B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549907B (en) * 2018-04-11 2021-11-16 武汉大学 Data verification method based on multi-source transfer learning
CN109243558A (en) * 2018-08-28 2019-01-18 重庆汇邡机械制造有限公司 Data after carrying out big data collection extract optimization method
CN113535693B (en) * 2020-04-20 2023-04-07 中国移动通信集团湖南有限公司 Data true value determination method and device for mobile platform and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101622814A (en) * 2007-03-02 2010-01-06 Nxp股份有限公司 Fast powering-up of data communication system
US7783584B1 (en) * 2007-10-03 2010-08-24 New York University Controllable oscillator blocks
EP2626995A1 (en) * 2012-02-13 2013-08-14 Siemens Aktiengesellschaft Method for protecting a frequency inverter from asymmetric electrical power flows
CN103957582A (en) * 2014-05-17 2014-07-30 浙江大学宁波理工学院 Wireless sensor network self-adaptation compression method
CN104156504A (en) * 2014-07-21 2014-11-19 国家电网公司 Parameter identifiability judgment method for generator excitation system
CN104866901A (en) * 2015-05-12 2015-08-26 西安理工大学 Optimized extreme learning machine binary classification method based on improved active set algorithms
CN105045976A (en) * 2015-07-01 2015-11-11 中国人民解放军信息工程大学 Method for modeling terrain property of Wargame map by grid matrix
CN105122619A (en) * 2013-03-05 2015-12-02 通用电气公司 Power converter and methods for increasing power delivery of soft alternating current power source

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《大数据环境下的多源数据演化更新研究》;余放等;《计算机科学》;20161231;第43卷(第12期);全文 *
《对区域电网稳定控制系统通信通道自愈方式的研究》;杨济海等;《计算机科学与探索》;20161231;全文 *

Also Published As

Publication number Publication date
CN107818523A (en) 2018-03-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant