CN107818523B - Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning - Google Patents
- Publication number
- CN107818523B, CN201711123306.6A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention relates to a method for distinguishing and deducing true values of power communication system data based on unstable frequency distribution and frequency factor learning. The mass monitoring data collected in the power communication system is learned and rule-analyzed by combining a set truth value discrimination method with a prediction function, so that the distribution of unstable frequencies points to abnormal points in the power communication system, and the truth value of the data is judged. By utilizing the historical data, the abnormal data is automatically positioned and the truth value reasoning is supplemented, so that the data quality in the power communication system is improved.
Description
Technical Field
The invention belongs to applied research on the fusion of power communication data with big data and machine learning technology. By substituting unstable frequency patterns into a frequency factor learning function for machine learning, it learns from and analyzes the rules in the massive monitoring data collected in a power communication system, automatically locates abnormal data, and completes true value inference, thereby improving data quality in the power communication system.
Background
The power communication system is a broad concept that generally refers to the various subsystems related to the power grid and the data they generate. With the continuous development of China's power grid and the growth in electricity demand, the data generated in the power communication system keeps growing in volume and is produced ever faster, while data structures differ greatly between subsystems. The data generated by the power communication system has thus become typical big data.
The power communication system is essential to the normal operation of the power system. It monitors equipment through various sensors, supporting decisions about equipment failures and providing a basis for equipment maintenance. A large-scale power communication system generates massive monitoring data, and distortion inevitably arises as the data is acquired, recorded, transmitted, exchanged, and stored. In practice, distorted data has become a major obstacle to locating and analyzing faults in power equipment, so improving data quality in the power communication system is an important part of improving the power grid. Experts at home and abroad have proposed various solutions for detecting distorted data in power systems. Document [1] studies the causes of data distortion in an energy management system and addresses the problem starting from those causes. Document [2] attempts to improve data quality from the data platform. Document [3] predicts data quality by interpolation fitting. Document [4] builds on a data verification technique between different systems that uses Common Information Model (CIM) high-speed model exchange format CIM/E text as the carrier, and improves the overall data quality of the power grid dispatching system by selecting the better-quality data from multiple sources and by feeding back on field data according to master-station state estimation.
The bad-data detection methods above, which are based on power equipment state estimation, are effective to a degree at improving the quality of local data within a single system, but they do not apply well to the multi-source heterogeneous big data generated by the power communication system as a whole, and building a dedicated knowledge base for each kind of data distortion is relatively costly. The frequency factor learning function is based on machine learning, which makes the algorithm more intelligent: no knowledge base needs to be built, the algorithm learns as it is used, and cost is reduced. In addition, heterogeneous data is converted to frequencies, which unifies the format of multi-source heterogeneous data. The algorithm was verified on a real national power grid data set, and the experimental results show that the method is suitable for distortion discrimination and true value inference on the multi-source heterogeneous data generated by the power communication system in a big data environment.
Disclosure of Invention
The power communication system is an important system for guaranteeing the normal operation of the power system, and equipment is monitored through various sensors, so that a decision is provided for equipment failure, and a basis is provided for equipment maintenance. A large-scale power communication system generates massive monitoring data, and the data inevitably has a data distortion phenomenon in the processes of acquisition, recording, transmission, exchange and storage. In reality, these distorted data have become important obstacles for locating and analyzing faults of power equipment. Data anomalies occurring in power communication systems mainly include the following forms:
1. Violation of the accuracy of monitoring data. Accuracy refers to how close the value finally used for analysis and decision making is to the real value, after the data monitored by the power information system has been recorded, transmitted, exchanged, and stored.
2. Violation of the consistency of monitoring data. Consistency refers to whether the data actually recorded by the system satisfies certain functional dependencies or logical relationships, whether it falls outside the domain of the attribute, and whether it is inconsistent with reality.
3. Violation of the dimensional uniformity of monitoring data. Dimensional uniformity refers to whether data of the same attribute uses a uniform unit of measurement.
4. Violation of the integrity of monitoring data. Integrity refers to whether data actually recorded by the power information system is missing, that is, whether all data required by the design has been completely recorded.
The first three types can be summarized as data accuracy problems, and the fourth as a data missing problem. At present, the data acquired by the power communication system contains a large amount of plainly visible missing data, and inaccurate data also fills the database. These phenomena are caused partly by problems in the power monitoring system itself, partly by accidental errors during data entry, and partly by data distortion arising from system incompatibility during system upgrades.
To address the low data quality of existing power communication systems, the invention aims to establish a machine learning discrimination method that automatically performs distortion identification, distortion location, and true value inference on power data.
To accomplish the above objectives, the present invention comprises four steps as a whole, and the whole flow chart is shown in fig. 1, comprising the following steps:
Definition 1: stable period of an attribute
where e_i(t_0) denotes the value of attribute e_i at time t_0; t denotes the minimum stable period of the attribute, i.e., after time t the attribute value returns to a value that does not differ significantly from its value at the initial moment; and ε is a small positive number that bounds the deviation of the attribute value during the stable period;
Definition 2: stable period of the attribute set
T = m(t_1, t_2, …, t_n)
where T is the stable period of the attribute set, equal to the least common multiple of the stable periods of all attributes in the power data set, and m is the mapping that extracts the least common multiple;
Definition 3: stable state set
where each element denotes the stable value of an attribute in the attribute set, i.e., its value after one stable period T of the attribute set, which is usually close to the initial value e_i(t); the stable state set is defined for the power data attribute set A and is formed by combining the stable values of all attributes in A; in practical terms, the stable mode describes how the stable values of normal attribute values are distributed within a small period;
Definition 4: extraction of unstable frequencies
where f_i(t) denotes the unstable frequency of attribute e_i; N[e_i′(t)] denotes the number of times attribute e_i is found outside the deviation bound while being traversed during the stable period; the second quantity denotes the number of times attribute e_i is traversed during the stable period; and D(e_i) denotes the domain of e_i within which the deviation degree is not exceeded;
Definition 5: unstable frequency distribution
F_A(t) = [f_1(t), f_2(t), …, f_n(t)]
where F_A(t) is called the unstable frequency distribution; it represents the frequencies at which unstable attributes are detected in the power attribute set A during the traversal period, and is defined as a vector so that it can serve as machine learning input in step 3;
Definition 6: unstable frequency distribution label set (the correspondence is shown schematically in fig. 2)
D_train(A) = {(F_A^(1), y^(1)), (F_A^(2), y^(2)), …, (F_A^(n), y^(n))}
where D_train(A) denotes the unstable frequency label set, which is essentially a training data set composed of the unstable frequency distribution F_A^(i) of the i-th time period and the corresponding equipment fault label y^(i); the labels can be obtained by assigning numerical values to the error codes reported in the system, and serve only for classification;
Definition 7: unstable frequency distribution matrix
where F denotes the unstable frequency distribution matrix, an algebraic structure in which the unstable frequency distribution obtained by the i-th traversal of a stable period forms the i-th row; this structure is convenient to pass into the algorithm and is the standard input format of the frequency factor learning algorithm;
the method specifically comprises the following steps:
A = {e_1, e_2, e_3, …, e_n}
where A denotes the set of power data attributes and e_i, i ∈ [1, n], denotes the n attributes (e.g., network element ID, current, device temperature, humidity, time);
then set a deviation degree, determine the stable period of the attribute set, and extract the stable state set;
Step 3: unstable frequency factor learning, which specifically comprises:
Step 3.1: construct the unstable frequency label vector and the unstable frequency distribution matrix;
Step 3.2: frequency factor learning: parameters are learned using a frequency factor learning function;
where the function value serves as the regression label of the learning function; F_i (i = 0, 1, 2, …) is an unstable frequency distribution (an argument in vector form); in particular, when the i-th attribute has no data in the unit pattern period, its frequency is assigned the value 1, indicating that data is missing and data integrity is violated; the different unstable frequency distributions capture the degree to which abnormal data occurs in different attributes and all combinations of abnormalities; w_j (j = 0, 1, 2, …) are the single-variable learning parameters; v_i, v_j are the implicit parameters of the cross variables F_i, F_j respectively, and are the key parameters that give the frequency factor learning algorithm its learning advantage; <v_i, v_j> is the inner product of the implicit parameter vectors v_i and v_j; during optimization of the objective function, the implicit parameters resolve the implicit relationship between two different unstable frequency distributions, and because i ≠ j, the autocorrelation of an unstable distribution has no influence, so overfitting is effectively avoided;
λ(F_i) is the trigger factor: when the attribute set of the i-th traversal is empty (all data in the attribute set is missing), the trigger factor switches off the learning function and activates the index function g(F_i);
Step 3.3 optimal solution
According to the equipment fault type, a continuity numerical value fault label or a classification fault label can be set, and a regression type target loss function and a classification type target loss function are respectively adopted for optimization;
Construct the regression objective loss function; note that in this case λ = 1;
Construct the classification objective loss function:
When y = 1:
When y = −1:
this is a hinge-loss classification loss function, where max{ } denotes the maximum of the values in the braces; through the sign of the estimated value, the hinge-loss objective function predicts whether a specific unstable frequency distribution points to a power equipment error or to a data entry error;
Whichever objective function is chosen, the optimization goal is to solve for the parameters of the learning function that minimize the objective loss function value, namely:
where Θ* denotes the set of parameters of the learning function, including the single-factor parameters w_i and the cross-term parameters v_i, v_j, with i, j ∈ Z+, i < j.
Substituting the optimal parameters obtained into the learning function turns it into a prediction function.
The prediction function is the learning function with the optimal parameters substituted in; when a new unstable frequency distribution is input, it provides a predicted value of the state classification of the power equipment, and when its parameters have been optimized by training on a large amount of historical data, the prediction function value converges to the true value;
Step 4: true value discrimination and inferred completion, which specifically comprises:
Step 4.1: discrimination procedure
Type (1) data distortion discrimination
When the prediction function value converges to the label value for normal equipment, the attribute data whose entries in the corresponding unstable frequency distribution are greater than 0, together with any empty-set elements, are judged to be distorted;
Type (2) data distortion discrimination
When the prediction function value converges to a specific equipment abnormality label, the attribute data whose entries in the corresponding unstable frequency distribution equal 0, together with any empty-set elements, are judged to be distorted;
Type (3) data distortion discrimination
When all elements of the unstable frequency distribution are empty sets, the data is completely missing and the distribution is judged to be distorted;
Step 4.2: true value inference and completion procedure
When type (1) data distortion occurs, the distorted data is out-of-limit data that exceeds the preset deviation degree; the attribute is completed with the most frequent value among the historical data that is within the limit;
When type (2) data distortion occurs, the distorted data is stable data that does not exceed the preset deviation degree (the equipment changed but the data did not change accordingly); the attribute is completed with the most frequent value among the out-of-limit historical data;
When type (3) data distortion occurs, the distorted data is an empty set, and completion falls into two cases: when the equipment operates normally, completion follows the type (1) procedure; when the equipment operates abnormally or has changed, completion follows the type (2) procedure.
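The discrimination and completion rules of step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `discriminate`, `complete`, and `NORMAL` are hypothetical, an empty-set element is represented by `None`, and the in-limit test compares each historical value against a stable value with a deviation bound `eps`.

```python
from collections import Counter

NORMAL = 0  # hypothetical label value meaning "equipment normal"


def discriminate(freqs, predicted_label):
    """Return (distortion_type, indices of attributes judged distorted).

    freqs: unstable frequency distribution; None marks an empty-set element.
    """
    n = len(freqs)
    if all(f is None for f in freqs):
        # Type (3): everything is missing, so the whole distribution is distorted.
        return 3, list(range(n))
    if predicted_label == NORMAL:
        # Type (1): out-of-limit entries (f > 0) and missing entries are distorted.
        return 1, [i for i, f in enumerate(freqs) if f is None or f > 0]
    # Type (2): entries that stayed stable (f == 0) and missing entries are distorted.
    return 2, [i for i, f in enumerate(freqs) if f is None or f == 0]


def complete(history, stable_value, eps, use_in_limit):
    """Most frequent historical value, drawn from in-limit or out-of-limit history."""
    pool = [x for x in history if (abs(x - stable_value) <= eps) == use_in_limit]
    return Counter(pool).most_common(1)[0][0] if pool else None
```

For type (1) distortion `complete` would be called with `use_in_limit=True`, and for type (2) with `use_in_limit=False`; type (3) picks one of the two depending on whether the equipment is operating normally.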
According to the invention, the massive monitoring data collected in the power communication system is learned from and analyzed for rules by combining the stated true value discrimination method with a prediction function, so that the unstable frequency distribution points to the abnormal points in the power communication system and the true value of the data is determined. Historical data is used to automatically locate abnormal data and to infer and complete true values, thereby improving data quality in the power communication system.
Drawings
Fig. 1 is a general flow chart.
Fig. 2 is a schematic diagram of a correspondence relationship between an unstable frequency distribution and a tag set.
FIG. 3 is a flow chart of truth discrimination and inferred completion.
Detailed Description
To address the low data quality of existing power communication systems, the invention aims to establish a machine learning discrimination method that automatically performs distortion identification, distortion location, and true value inference on power data.
To accomplish this, the invention is divided into four steps as a whole; the overall flow chart is shown in fig. 1.
This step is divided into three sub-steps.
Step 1.1: construct the power data set and determine the attributes of the power equipment to be tested that it contains.
A = {e_1, e_2, e_3, …, e_n}
where A denotes the set of power data attributes and e_i, i ∈ [1, n], denotes the n attributes (e.g., network element ID, current, device temperature, humidity, time) obtained by monitoring the environment in which the power equipment is located. The attributes are determined according to the following principles:
(1) Essential attributes must be selected. The attributes carried by the data on which true value discrimination is to be performed are called essential attributes.
(2) Associated attributes may be selected. An attribute related to an essential attribute is called an associated attribute; it serves only as auxiliary machine learning evidence for true value discrimination and inference of the essential attribute in subsequent processing (for example, if the attribute under test is the device temperature, the ambient temperature can be chosen as an associated attribute). The system does not perform true value discrimination on associated attributes, so their selection can be decided flexibly according to the specific situation.
Step 1.2: set the deviation degree and determine the stable period of the attribute set.
Definition 2: stable period of an attribute
where e_i(t_0) denotes the value of attribute e_i at time t_0; t denotes the minimum stable period of the attribute, i.e., after time t the attribute value returns to a value that does not differ significantly from its value at the initial moment; and ε is a small positive number that bounds the maximum deviation of the attribute value during the stable period.
Definition 3: stable period of the attribute set
T = m(t_1, t_2, …, t_n)
where T is the stable period of the attribute set, equal to the least common multiple of the stable periods of all attributes in the power data set, and m is the mapping that extracts the least common multiple.
Step 1.3: extraction of the stable state set
Definition 4: stable state set
where each element denotes the stable value of an attribute in the attribute set, i.e., its value after one stable period T of the attribute set, which is usually close to the initial value e_i(t); the stable state set is defined for the power data attribute set A and is formed by combining the stable values of all attributes in A. In practical terms, the stable mode describes how the stable values of normal attribute values are distributed within a small period.
Note that the stable period T is the minimum time span over which every attribute value remains stable; over a longer time span the data may show various trends. For constant attributes (e.g., device location, network element ID), the stable period can be taken as 0. Discrete non-numerical attribute values are assigned integer codes by category, after which the same processing applies.
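As a minimal sketch of step 1.2, the following Python computes the stable period of an attribute set as a least common multiple and checks the stability condition against a deviation bound. The function names are hypothetical, periods are assumed to be non-negative integers in a common time unit, and skipping constant attributes (period 0) when taking the least common multiple is an assumption of this sketch rather than something the text specifies.

```python
from functools import reduce
from math import gcd


def attribute_set_stable_period(periods):
    """Least common multiple of the per-attribute stable periods.

    Constant attributes (period 0) are skipped; if every attribute is
    constant, the result is 0.
    """
    nonzero = [t for t in periods if t > 0]
    if not nonzero:
        return 0
    return reduce(lambda a, b: a * b // gcd(a, b), nonzero, 1)


def is_stable(value_now, value_start, eps):
    """True when the value has returned to within eps of its starting value."""
    return abs(value_now - value_start) <= eps
```

For example, attributes with stable periods of 2, 3, and 4 time units plus one constant attribute give an attribute-set stable period of 12 time units.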
In this step, frequency conversion is used to unify the data structures of the multi-source heterogeneous data generated by the power communication system, so that the data can be passed to the algorithm in step 3. The step is divided into two sub-steps.
Step 2.1: extraction of unstable frequencies
Definition 5: extraction of unstable frequencies
where f_i(t) denotes the unstable frequency of attribute e_i; N[e_i′(t)] denotes the number of times attribute e_i is found outside the deviation bound while being traversed during the stable period; the second quantity denotes the number of times attribute e_i is traversed during the stable period; and D(e_i) denotes the domain of e_i within which the deviation degree is not exceeded.
Step 2.2: construction of the unstable frequency distribution
Definition 6: unstable frequency distribution
F_A(t) = [f_1(t), f_2(t), …, f_n(t)]
where F_A(t) is called the unstable frequency distribution; it represents the frequencies at which unstable attributes are detected in the power attribute set A during the traversal, and is defined as a vector so that it can serve as machine learning input in step 3.
In practical terms, the unstable frequency statistically flags "abrupt" attribute values that do not follow the expected pattern of change. According to step 2.1, an unstable frequency arises whenever a data change exceeds the preset deviation limit δ, and the more often the data exceeds the limit within a unit time period, the larger the corresponding unstable frequency value.
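A possible reading of step 2 in Python is shown below. It treats the unstable frequency as the share of traversed samples that fall outside the deviation bound around the stable value, and it assigns the value 1 when an attribute has no data, following the rule given for missing data in the frequency factor learning step. The function names, the dictionary layout of the inputs, and the use of a per-attribute bound `eps` are all assumptions made for illustration.

```python
def unstable_frequency(samples, stable_value, eps):
    """Fraction of traversed samples outside the deviation bound.

    Returns 1.0 when no sample was recorded (missing data).
    """
    if not samples:
        return 1.0
    out_of_bound = sum(1 for x in samples if abs(x - stable_value) > eps)
    return out_of_bound / len(samples)


def unstable_frequency_distribution(records, stable_state, eps):
    """Vector [f_1, ..., f_n] for one traversal of the attribute set.

    records: attribute name -> list of samples seen in the stable period
    stable_state: attribute name -> stable value
    eps: attribute name -> deviation bound
    """
    return [
        unstable_frequency(records.get(name, []), stable_state[name], eps[name])
        for name in stable_state
    ]
```

Because every attribute is reduced to a number between 0 and 1 regardless of its original type or unit, the resulting vector has the same shape for all data sources.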
Step 3 unstable frequency factor learning
The data quality of the power communication system is quantified by the proportion of true values. Within a stable period, when the equipment in the power or grid system runs normally and stably, the detected or recorded data reflects that physical stability and falls into a stable set (whose complement is the unstable distribution) defined by the deviation degree set in step 1.2. However, if the system is physically abnormal or data entry has gone wrong and yet the data remains stable over multiple traversals of the power communication data, or if the system has no problem and yet the data repeatedly exceeds the deviation limit, the data can be regarded as distorted; this is a data accuracy problem. If the data is wholly or partly missing, it is a data integrity problem.
The problems can be uniformly solved through an unstable frequency factor learning algorithm. This step is divided into four substeps.
Step 3.1: construct the unstable frequency label vector
Definition 7: unstable frequency distribution label set (the correspondence is shown schematically in fig. 2)
D_train(A) = {(F_A^(1), y^(1)), (F_A^(2), y^(2)), …, (F_A^(n), y^(n))}
where D_train(A) denotes the unstable frequency label set, which is essentially a training data set composed of the unstable frequency distribution F_A^(i) of the i-th time period and the corresponding equipment fault label y^(i). The labels can be obtained by assigning numerical values to the error codes reported in the system, and serve only for classification.
Step 3.2: construct the unstable frequency distribution matrix
Definition 8: unstable frequency distribution matrix
where F denotes the unstable frequency distribution matrix, an algebraic structure in which the unstable frequency distribution obtained by the i-th traversal of a stable period forms the i-th row; this structure is convenient to pass into the algorithm and is the standard input format of the frequency factor learning algorithm.
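The two structures of steps 3.1 and 3.2 can be illustrated with plain Python lists. The helper names are hypothetical, and the sketch assumes every traversal covers the same attributes in the same order.

```python
def build_training_set(distributions, labels):
    """Pair the i-th unstable frequency distribution with the i-th fault label."""
    if len(distributions) != len(labels):
        raise ValueError("one label is required per traversal")
    return list(zip(distributions, labels))


def build_frequency_matrix(distributions):
    """Stack the distributions as rows; every row must have the same length."""
    width = len(distributions[0])
    if any(len(row) != width for row in distributions):
        raise ValueError("all distributions must cover the same attributes")
    return [list(row) for row in distributions]
```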
Step 3.3 frequency factor learning
A frequency factor learning function is designed, and its parameters are learned.
Definition 8
where the function value serves as the regression label of the learning function; F_i (i = 0, 1, 2, …) is an unstable frequency distribution (an argument in vector form); in particular, when the i-th attribute has no data in the unit pattern period, its frequency is assigned the value 1, indicating that data is missing and data integrity is violated. The different unstable frequency distributions capture the degree to which abnormal data occurs in different attributes and all combinations of abnormalities. w_j (j = 0, 1, 2, …) are the single-variable learning parameters; v_i, v_j are the implicit parameters of the cross variables F_i, F_j respectively, and are the key parameters that give the frequency factor learning algorithm its learning advantage; <v_i, v_j> is the inner product of the implicit parameter vectors v_i and v_j. During optimization of the objective function, the implicit parameters resolve the implicit relationship between two different unstable frequency distributions, and because i ≠ j, the autocorrelation of an unstable distribution has no influence, so overfitting is effectively avoided.
λ(F_i) is the trigger factor: when the attribute set of the i-th traversal is empty (all data in the attribute set is missing), the trigger factor switches off the learning function and activates the index function g(F_i). The specific expression of the index function can be defined according to the actual situation; its purpose is to point a missing value directly to the power communication device to which the attribute belongs. The design idea is as follows: for abnormal data whose truth is uncertain, the learning function is needed to determine the true value, which is a matter of data accuracy; missing data, however, certainly violates data integrity, so the index function is activated directly to determine the physical cause of the data distortion, without any verification by the learning function. The parameter u is a dimension-unifying factor: once the analytic expression of g(F_i) has been fixed, u is used to make its dimension consistent with that of the learning function.
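The description above (a bias and single-variable weights, plus pairwise cross terms weighted by the inner product of implicit vectors with i ≠ j) matches the general shape of a factorization-machine model. The sketch below is written under that assumption and is not the patent's exact formula. The names `fm_score` and `learning_function`, the use of `None` for missing data, and the way the index function `g` is scaled by `u` are all illustrative choices.

```python
def fm_score(F, w0, w, V):
    """Bias + linear terms + pairwise terms weighted by <v_i, v_j>, i < j."""
    n = len(F)
    linear = sum(w[i] * F[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += inner * F[i] * F[j]
    return w0 + linear + pairwise


def learning_function(F, w0, w, V, g, u=1.0):
    """Apply the trigger factor: an all-missing traversal bypasses the model.

    F uses None for attributes with no data; g and u stand in for the index
    function and the dimension-unifying factor, whose forms the text leaves open.
    """
    if all(f is None for f in F):
        return u * g(F)
    # Partially missing attributes get frequency 1, as the text prescribes.
    filled = [1.0 if f is None else f for f in F]
    return fm_score(filled, w0, w, V)
```

Restricting the double loop to `i < j` is what keeps an attribute from interacting with itself, which is the property the text credits with suppressing autocorrelation effects.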
Step 3.4 optimal solution
Depending on the equipment fault type, either a continuous numerical fault label or a classification fault label can be set, and a regression-type or a classification-type target loss function, respectively, is used for optimization.
Definition 9: regression target loss function (note that λ = 1 in this case)
Here the left-hand side is the target loss function, the learning-function term corresponds to the frequency factor learning function of Definition 8, and y^(i) is the abnormality label or entry error label of the power equipment recorded in practice. The optimization goal is to minimize the value of the target loss function; in practical terms, this means determining the parameters of the learning function such that its value is as close as possible to the label of the power equipment abnormality. The factor 1/2 keeps the expressions concise when partial derivatives are taken in the subsequent optimization.
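As an illustration of the regression target loss, the sketch below assumes a half sum of squared errors, which is the usual form that the factor 1/2 suggests; the exact expression of Definition 9 is not reproduced in this text, and the function names are hypothetical.

```python
import numpy as np


def regression_loss(y_hat, y):
    """Half sum of squared errors between predictions and recorded labels (assumed form)."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * float(np.sum((y_hat - y) ** 2))


def regression_loss_grad(y_hat, y):
    """Derivative of the loss with respect to each prediction.

    The factor 1/2 cancels the exponent 2, leaving the plain residual.
    """
    return np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)
```

With this form, the gradient with respect to any parameter is the residual multiplied by the partial derivative of the learning function with respect to that parameter.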
When the abnormality label or entry error label of the power equipment has a discrete classification structure, the target loss can be defined as a hinge-loss type (note that λ = 1 in this case).
Definition 10: classification target loss function
When y = 1:
When y = -1:
Definition 10 is a hinge-loss-type classification loss function, where max{ } denotes the maximum of the values in braces. The hinge-loss-type objective uses the sign of the estimated value to predict whether a particular unstable frequency distribution points to a power equipment error or to a data entry error.
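The two cases of the classification loss can be sketched as follows. This assumes the standard hinge loss max{0, 1 - y * y_hat} with a margin of 1; the margin and the tie-breaking rule at an estimate of exactly 0 are assumptions, since the expressions for y = 1 and y = -1 are not reproduced in this text.

```python
def hinge_loss(y_hat: float, y: int) -> float:
    """Standard hinge loss for a label y in {1, -1} (assumed form of Definition 10)."""
    if y not in (1, -1):
        raise ValueError("label must be 1 or -1")
    # y = 1  -> max(0, 1 - y_hat);  y = -1 -> max(0, 1 + y_hat)
    return max(0.0, 1.0 - y * y_hat)


def predict_label(y_hat: float) -> int:
    """Classify by the sign of the estimate; 0 is mapped to 1 by assumption."""
    return 1 if y_hat >= 0 else -1
```

An estimate on the correct side of the margin incurs no loss, so only misclassified or low-confidence samples drive the parameter updates.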
Whichever objective function is chosen, the optimization goal is to solve for the parameters of the learning function that minimize the target loss function value, namely:
where Θ* denotes the set of parameters in the learning function, including the single-factor parameters w_i and the cross-term parameters v_i, v_j, with i, j ∈ Z+ and i < j.
This optimization can be carried out by stochastic gradient descent (SGD). The gradient direction is obtained from the partial derivative with respect to each parameter of the learning function; a step length is then set along that direction, and a locally optimal solution is obtained by iterative updating. The algorithm is as follows:
1. Regression target loss optimization iteration:
In the regression-type iteration, the parameters are updated along the gradient direction, and δ is the step length of each update. δ must be preset for the specific problem and should be moderate: if it is too large, the algorithm may fail to converge; if it is too small, too many iterations are needed and computing resources are wasted.
2. Classification target loss optimization iteration:
The optimal parameters obtained are substituted into the learning function, which then becomes the prediction function:
Here the left-hand side is the prediction function with the optimal parameters substituted. Given a new unstable frequency distribution as input, the function outputs a predicted state classification of the power equipment. When the substituted optimal parameters have been trained on a large amount of historical data, the prediction function value converges to the true value.
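The regression-type iteration of Step 3.4 can be sketched as below. This is an illustration under assumptions rather than the patent's exact update rules: it uses a second-order factorization form with a bias `w0`, a half squared-error loss, and hypothetical names (`predict`, `sgd_step`, `fit`) and default hyperparameters (`k`, `delta`, `epochs`, `seed`).

```python
import numpy as np


def predict(F, w0, w, V):
    """Assumed learning function: bias + linear term + pairwise cross term."""
    s = V.T @ F
    return w0 + w @ F + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (F ** 2))


def sgd_step(F, y, w0, w, V, delta):
    """One stochastic update with step length delta on a single sample."""
    err = predict(F, w0, w, V) - y            # derivative of 0.5 * err**2
    s = V.T @ F
    # d y_hat / d v_{i,f} = F_i * s_f - v_{i,f} * F_i**2
    grad_V = np.outer(F, s) - V * (F ** 2)[:, None]
    return w0 - delta * err, w - delta * err * F, V - delta * err * grad_V


def fit(F_matrix, labels, k=2, delta=0.05, epochs=200, seed=0):
    """Iterate over the rows of the unstable frequency distribution matrix."""
    rng = np.random.default_rng(seed)
    n = F_matrix.shape[1]
    w0, w = 0.0, np.zeros(n)
    V = 0.01 * rng.standard_normal((n, k))    # small random hidden vectors
    for _ in range(epochs):
        for idx in rng.permutation(len(labels)):
            w0, w, V = sgd_step(F_matrix[idx], labels[idx], w0, w, V, delta)
    return w0, w, V
```

The returned parameters can be passed straight back to `predict`, which then plays the role of the prediction function. As the text notes, `delta` has to be tuned: a large value may diverge, a small one wastes iterations.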
Step 4 true value discrimination and deduction completion
This step is divided into two sub-steps; a schematic diagram of the step is shown in Figure 3.
Step 4.1 discrimination flow
(1) Type (1) data distortion discrimination
When the prediction function value converges to the normal label value of the equipment, the attribute data whose frequency is greater than 0 in the corresponding unstable frequency distribution, together with the empty-set elements, are judged to be distorted.
(2) Type (2) data distortion discrimination
When the prediction function value converges to a specific abnormality label of the equipment, the attribute data whose frequency equals 0 in the corresponding unstable frequency distribution, together with the empty-set elements, are judged to be distorted.
(3) Type (3) data distortion discrimination
When all elements of the unstable frequency distribution are empty sets, all data are missing, and the distribution as a whole is judged to be distorted.
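The three discrimination rules above can be expressed as a small decision function. The sketch assumes that an empty-set element is represented by `None` and that the prediction has already been reduced to a boolean `device_normal`; both conventions, and the function name, are hypothetical.

```python
from typing import List, Optional, Tuple


def discriminate(freqs: List[Optional[float]], device_normal: bool) -> Tuple[int, List[int]]:
    """Return (distortion type, indices of attributes judged distorted).

    freqs         : unstable frequency distribution; None marks an empty-set element
    device_normal : True if the prediction converged to the normal label
    """
    if all(f is None for f in freqs):
        # All data missing: the whole distribution is distorted.
        return 3, list(range(len(freqs)))
    if device_normal:
        # Device is normal, so any observed instability is a data error.
        return 1, [i for i, f in enumerate(freqs) if f is None or f > 0]
    # Device is abnormal, so attributes that stayed stable did not follow it.
    return 2, [i for i, f in enumerate(freqs) if f is None or f == 0]
```

For example, with frequencies `[0.0, 0.3, None]` and a normal device, the second and third attributes are flagged; with an abnormal device, the first and third are flagged instead.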
Step 4.2 true value deduction and completion process
When type (1) data distortion occurs, the distorted data are overrun data that exceed the preset deviation degree; the attribute is assigned the most frequent value among the historical data that do not overrun.
When type (2) data distortion occurs, the distorted data are stable data that do not exceed the preset deviation degree (the equipment has changed, but the data have not changed accordingly); the attribute is completed with the most frequent value among the overrun historical data.
When type (3) data distortion occurs, the distorted data are an empty set, and completion is handled in two cases: when the equipment operates normally, completion follows the type (1) procedure; when the equipment operates abnormally or has changed, completion follows the type (2) procedure.
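The completion rules amount to taking the mode of one of two pools of historical values. The sketch below assumes that "overrun" means an absolute deviation from the stable value larger than the deviation degree `epsilon`; that criterion, the function name, and the return value `None` for an empty pool are assumptions made for illustration.

```python
from collections import Counter


def complete(history, stable_value, epsilon, distortion_type, device_normal=True):
    """Return the value used to complete a distorted attribute, or None if no history fits."""
    if distortion_type not in (1, 2, 3):
        raise ValueError("distortion_type must be 1, 2 or 3")
    within = [x for x in history if abs(x - stable_value) <= epsilon]
    overrun = [x for x in history if abs(x - stable_value) > epsilon]
    if distortion_type == 3:
        # Empty-set data: fall back to type (1) or type (2) by device state.
        distortion_type = 1 if device_normal else 2
    pool = within if distortion_type == 1 else overrun
    if not pool:
        return None
    # Most frequent historical value in the selected pool.
    return Counter(pool).most_common(1)[0][0]
```

With a history of `[10.0, 10.1, 10.0, 15.0, 15.0, 14.0]`, a stable value of 10.0 and a deviation degree of 0.5, type (1) completion uses the non-overrun pool and type (2) completion uses the overrun pool.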
Claims (1)
1. The electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning is characterized by comprising the following steps of:
Definition 1: attribute stable period
where e_i(t0) denotes the value of attribute e_i at time t0; t denotes a minimum stable period of the attribute, that is, after time t the attribute value returns to a value that differs only slightly from its value at the initial time; and ε denotes a small positive deviation degree that defines the maximum deviation of the attribute value during the stable period;
Definition 2: attribute-set stable period
T = m(t_1, t_2, …, t_n)
where T is the stable period of the attribute set, equal to the least common multiple of the stable periods of all attributes in the power data set, and m is the mapping that takes the least common multiple;
Definition 3: stable state set
In this definition, each element is the stable value of an attribute in the attribute set after one period T of the attribute set, a value relatively close to the initial value e_i(t); the set itself is the stable state set of the power data attribute set A, formed by combining the stable values corresponding to all attributes in A; in practical terms, the stable mode describes how the normal attribute values are distributed over a short period;
Definition 4: unstable frequency extraction
where f_i(t) denotes the unstable frequency of attribute e_i; N[e'_i(t)] denotes the count of out-of-bounds deviations of attribute e_i when traversed during the stable period; the second counting term denotes the number of times attribute e_i is traversed during the stable period; and D(e_i) denotes the domain of e_i within which the deviation degree is not exceeded;
Definition 5: unstable frequency distribution
F_A(t) = [f_1(t), f_2(t), …, f_n(t)]
where F_A(t) is called the unstable frequency distribution; it represents the frequencies with which unstable attributes are detected in the power attribute set A during the traversal period, and is defined in vector form as the machine learning input of step 3;
Definition 6: unstable frequency distribution label set
where D_train(A) denotes the unstable frequency label vector, which is essentially a training data set composed of the unstable frequency distribution F_A^(i) of the i-th time period and the corresponding equipment fault label y^(i); the data labels are obtained by assigning numerical values to the error codes of errors occurring in the system and serve only for classification;
Definition 7: unstable frequency distribution matrix
where F denotes the unstable frequency distribution matrix, an algebraic structure assembled by stacking, as row vectors, the unstable frequency distributions obtained in the i-th traversal of the stable period; this structure is convenient to feed into the algorithm and is the standard input format of the frequency factor learning algorithm;
the method specifically comprises the following steps:
step 1, extracting a stable mode of the power data set: based on the constructed power data set, determine the attributes to be tested of the power equipment it contains:
A = {e_1, e_2, e_3, …, e_n}
where A denotes the power data attribute set and e_i, i ∈ [1, n], denotes the n attributes that monitor the environment of the power equipment; then set a deviation degree, determine the attribute-set stable period, and extract the stable state set;
step 2, constructing the unstable frequency distribution: unify the data structure of the multi-source heterogeneous data generated by the power communication system by converting them to frequencies, which makes them convenient to feed into the algorithm of step 3; then extract the unstable frequencies and construct the unstable frequency distribution;
step 3, learning unstable frequency factors, which specifically comprises the following steps:
step 3.1, constructing the unstable frequency label vector and the unstable frequency distribution matrix;
step 3.2, frequency factor learning: learn the parameters using the frequency factor learning function;
where the left-hand side of the learning function is its regression label, and F_i (i = 0, 1, 2, …) is an unstable frequency distribution; when the i-th attribute has no data in the unit pattern period, its frequency is assigned the value 1, indicating that data are missing and that data integrity is violated; the different unstable frequency distributions describe the degree to which abnormal data appear in the different attributes and in all abnormal combination patterns; w_i (i = 0, 1, 2, …) are the univariate learning parameters, and v_i, v_j are the hidden parameter vectors of the cross variables F_i, F_j, which are the key to the learning advantage of the frequency factor learning algorithm; <v_i, v_j> denotes the inner product of the hidden parameter vectors v_i and v_j; in the objective-function optimization stage the hidden parameters resolve the implicit relation between two different unstable frequency distributions, and, because i ≠ j, the autocorrelation of an unstable distribution with itself is excluded and overfitting is effectively avoided;
λ(F_i) is the trigger factor: when the i-th traversal attribute set is an empty set, the trigger factor switches off the learning function and starts the index function g(F_i); u denotes a dimension unification factor;
step 3.3, optimal solution
Set a continuous numerical fault label or a classification fault label according to the equipment fault type, and optimize with a regression-type or a classification-type target loss function, respectively;
regression target loss function (note that λ = 1 in this case)
classification target loss function
When y = 1:
When y = -1:
this is a hinge-loss-type classification loss function, where max{ } denotes the maximum of the values in braces, and the hinge-loss-type objective uses the sign of the estimated value to predict whether a particular unstable frequency distribution points to a power equipment error or to a data entry error;
whichever objective function is chosen, the optimization goal is to solve for the parameters of the learning function that minimize the target loss function value, namely:
where Θ* denotes the set of parameters in the learning function, including the single-factor parameters w_i and the cross-term parameters v_i, v_j, with i, j ∈ Z+ and i < j;
The optimal parameters obtained are substituted into the learning function, which then becomes the prediction function:
where the left-hand side is the prediction function with the optimal parameters substituted; given a new unstable frequency distribution as input, the function outputs a predicted state classification of the power equipment, and when the substituted optimal parameters have been trained on a large amount of historical data, the prediction function value converges to the true value;
step 4, true value discrimination and deduction completion, which specifically comprises the following steps:
step 4.1 discrimination flow
(1) Type data distortion discrimination
When the prediction function value converges to the normal label value of the equipment, the attribute data whose frequency is greater than 0 in the corresponding unstable frequency distribution, together with the empty-set elements, are judged to be distorted;
(2) type data distortion discrimination
When the prediction function value converges to a specific abnormality label of the equipment, the attribute data whose frequency equals 0 in the corresponding unstable frequency distribution, together with the empty-set elements, are judged to be distorted;
(3) type data distortion discrimination
When all elements of the unstable frequency distribution are empty sets, all data are missing, and the distribution as a whole is judged to be distorted;
step 4.2 true value deduction and completion process
When type (1) data distortion occurs, the distorted data are overrun data that exceed the preset deviation degree, and the attribute is completed with the most frequent value among the historical data that do not overrun;
when type (2) data distortion occurs, the distorted data are stable data that do not exceed the preset deviation degree, and the attribute is completed with the most frequent value among the overrun historical data;
when type (3) data distortion occurs, the distorted data are an empty set, and completion is handled in two cases: when the equipment operates normally, completion follows the type (1) procedure; when the equipment operates abnormally or has changed, completion follows the type (2) procedure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711123306.6A CN107818523B (en) | 2017-11-14 | 2017-11-14 | Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107818523A CN107818523A (en) | 2018-03-20 |
CN107818523B true CN107818523B (en) | 2021-04-16 |
Family
ID=61609208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711123306.6A Active CN107818523B (en) | 2017-11-14 | 2017-11-14 | Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107818523B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549907B (en) * | 2018-04-11 | 2021-11-16 | Wuhan University | Data verification method based on multi-source transfer learning |
CN109243558A (en) * | 2018-08-28 | 2019-01-18 | Chongqing Huifang Machinery Manufacturing Co., Ltd. | Data after carrying out big data collection extract optimization method |
CN113535693B (en) * | 2020-04-20 | 2023-04-07 | China Mobile Group Hunan Co., Ltd. | Data true value determination method and device for mobile platform and electronic equipment |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101622814A (en) * | 2007-03-02 | 2010-01-06 | NXP B.V. | Fast powering-up of data communication system |
US7783584B1 (en) * | 2007-10-03 | 2010-08-24 | New York University | Controllable oscillator blocks |
EP2626995A1 (en) * | 2012-02-13 | 2013-08-14 | Siemens Aktiengesellschaft | Method for protecting a frequency inverter from asymmetric electrical power flows |
CN105122619A (en) * | 2013-03-05 | 2015-12-02 | General Electric Company | Power converter and methods for increasing power delivery of soft alternating current power source |
CN103957582A (en) * | 2014-05-17 | 2014-07-30 | Ningbo Institute of Technology, Zhejiang University | Wireless sensor network self-adaptation compression method |
CN104156504A (en) * | 2014-07-21 | 2014-11-19 | State Grid Corporation of China | Parameter identifiability judgment method for generator excitation system |
CN104866901A (en) * | 2015-05-12 | 2015-08-26 | Xi'an University of Technology | Optimized extreme learning machine binary classification method based on improved active set algorithms |
CN105045976A (en) * | 2015-07-01 | 2015-11-11 | PLA Information Engineering University | Method for modeling terrain property of Wargame map by grid matrix |
Non-Patent Citations (2)
Title |
---|
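"Research on Evolution and Updating of Multi-Source Data in a Big Data Environment"; Yu Fang et al.; Computer Science (《计算机科学》); 2016-12-31; Vol. 43, No. 12; full text * |
"Research on Self-Healing Modes of Communication Channels in Regional Power Grid Stability Control Systems"; Yang Jihai et al.; Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》); 2016-12-31; full text * |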
Also Published As
Publication number | Publication date |
---|---|
CN107818523A (en) | 2018-03-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||