CN107818523A - Method for discriminating and inferring true values of power communication system data based on unstable frequency distribution and frequency factor learning - Google Patents

Publication number
CN107818523A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711123306.6A
Other languages
Chinese (zh)
Other versions
CN107818523B (en)
Inventor
杨济海
余放
伍小生
彭汐单
巢玉坚
蔡志民
王�华
付萍萍
李敏
吕顺利
邓伟
李志鹏
王泉啸
李石君
余伟
李宇轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Wuhan University WHU
Nanjing NARI Group Corp
Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Wuhan University WHU
Nanjing NARI Group Corp
Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Application filed by State Grid Corp of China SGCC, Wuhan University WHU, Nanjing NARI Group Corp, Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Priority to CN201711123306.6A
Publication of CN107818523A
Application granted
Publication of CN107818523B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention is a method for discriminating and inferring the true values of power communication system data based on unstable frequency distribution and frequency factor learning. The formats of the multi-source heterogeneous data in the power communication system are first unified by converting the data to frequencies, so that every data item is expressed as an unstable frequency. Machine feature learning is then carried out with a frequency factor learning function, whose parameters are optimized under two label forms to obtain a prediction function. By combining the prescribed true value discrimination method with the prediction function, the massive monitoring data collected in the power communication system is learned and analyzed for rules, so that the unstable frequency distribution points to the abnormal points in the power communication system and the true values of the data can be judged. Using historical data, abnormal data is located automatically and its true values are inferred and completed, improving the data quality of the power communication system.

Description

Method for discriminating and inferring true values of power communication system data based on unstable frequency distribution and frequency factor learning
Technical Field
The invention belongs to applied research that combines power communication data, big data technology, and machine learning technology. Machine learning is carried out by feeding the unstable frequency mode into a frequency factor learning function, so that the massive monitoring data acquired in a power communication system can be learned and analyzed for rules, abnormal data can be located automatically, and true values can be inferred and completed, thereby improving the data quality of the power communication system.
Background
The power communication system is a broad concept that generally refers to the various subsystems related to the power grid and the data they generate. With the continuous development of the power grid in China and the continuous growth in electricity demand, the volume of data generated in the power communication system keeps increasing, the data is generated ever faster, and the data structures of different subsystems differ greatly, so the data generated by the power communication system has become typical big data.
The power communication system is an important system for guaranteeing the normal operation of the power system. Equipment is monitored through various sensors, which supports decisions on equipment faults and provides a basis for equipment maintenance. A large-scale power communication system generates massive monitoring data, and this data is inevitably distorted in the course of acquisition, recording, transmission, exchange, and storage. In practice, distorted data has become a major obstacle to locating and analyzing power equipment faults, so improving the data quality of the power communication system is an important part of improving the power grid system. Experts in China and abroad have proposed various solutions for detecting distorted data in power systems. Document [1] studies the causes of data distortion in an energy management system and addresses the distortion at its source. Document [2] attempts to improve data quality from the data platform side. Document [3] predicts data quality by interpolation fitting. Document [4] uses CIM/E text, the efficient model exchange format based on the Common Information Model (CIM), as the carrier for verifying data between different systems, screens the better-quality data from multiple sources, and feeds field data back according to master station state estimation, thereby improving the overall data quality of the power grid dispatching system.
The bad data detection methods above, which are based on power equipment state estimation, are effective to some extent in improving the quality of local data within a single system, but they are not well suited to the multi-source heterogeneous big data generated by the power communication system as a whole, and building a knowledge base for each kind of data distortion is relatively costly. The frequency factor learning function is based on machine learning, which makes the algorithm more intelligent: no knowledge base needs to be built, the algorithm learns as it is used, and cost is reduced. In addition, the formats of the multi-source heterogeneous data are unified by converting the data to frequencies. The algorithm was verified on a real data set from the national power grid, and the experimental results show that the method is suitable for distortion discrimination and true value inference on the multi-source heterogeneous data generated by the power communication system in a big data environment.
Disclosure of Invention
The power communication system is an important system for guaranteeing the normal operation of the power system. Equipment is monitored through various sensors, which supports decisions on equipment faults and provides a basis for equipment maintenance. A large-scale power communication system generates massive monitoring data, and this data is inevitably distorted in the course of acquisition, recording, transmission, exchange, and storage. In practice, distorted data has become a major obstacle to locating and analyzing power equipment faults. Data anomalies occurring in power communication systems mainly take the following forms:
1. Violation of the accuracy of the monitoring data. Accuracy refers to how close the value finally used for analysis before decision making is to the true value, after the data monitored by the power information system has been recorded, transmitted, exchanged, and stored.
2. Violation of the consistency of the monitoring data. Consistency refers to whether the data actually recorded by the system satisfies the required functional dependencies or logical relationships, whether it stays within the attribute's domain of definition, and whether it is consistent with reality.
3. Violation of the dimensional uniformity of the monitoring data. Dimensional uniformity refers to whether data for the same attribute uses a uniform unit of measurement.
4. Violation of the integrity of the monitoring data. Integrity refers to whether data actually recorded by the power information system is missing, that is, whether all the data required by the design has been completely recorded.
The first three forms can be summarized as data accuracy problems, and the fourth as a data missing problem. At present, a large amount of the data acquired by the power communication system is visibly missing, and inaccurate data also fills the database. These phenomena are caused partly by problems in the power monitoring system itself, partly by accidental errors during data entry, and partly by data distortion arising from system incompatibility during system upgrades.
Aiming at the problem of low data quality in existing power communication systems, the invention aims to establish a machine learning discrimination method that automatically identifies distortion in power data, locates the distortion, and infers the true values.
To achieve this, the invention as a whole comprises four steps; the overall flow chart is shown in fig. 1. The following definitions are used:
Definition 1. Stable period of an attribute
$$|e_i(t_0 + t) - e_i(t_0)| \le \varepsilon$$
where $e_i(t_0)$ is the value of attribute $e_i$ at time $t_0$; $t$ is the minimum stable period of the attribute, i.e., after time $t$ the attribute value returns to a value that does not differ significantly from its value at the initial moment; and $\varepsilon$ is a small positive deviation that bounds the maximum deviation of the attribute value during the stable period;
Definition 2. Stable period of the attribute set
$$T = m(t_1, t_2, \ldots, t_n)$$
where $T$ is the stable period of the attribute set, equal to the least common multiple of the stable periods of all attributes in the power data set, and $m$ denotes the mapping that takes the least common multiple;
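As an illustration of Definition 2, the following minimal sketch computes the stable period of an attribute set as a least common multiple. It assumes that the per-attribute stable periods are expressed as integer numbers of sampling intervals, and that attributes with period 0 (constant attributes) are simply skipped; the function name `attribute_set_stable_period` is hypothetical and does not come from the patent.

```python
from functools import reduce
from math import gcd


def attribute_set_stable_period(periods):
    """Least common multiple of per-attribute stable periods.

    Assumption: periods are non-negative integers (sampling intervals).
    Periods equal to 0 (constant attributes) are ignored.
    """
    positive = [p for p in periods if p > 0]
    if not positive:
        return 0
    # lcm(a, b) = a * b / gcd(a, b), folded over all periods
    return reduce(lambda a, b: a * b // gcd(a, b), positive)


if __name__ == "__main__":
    # Three attributes with stable periods of 4, 6 and 10 intervals
    print(attribute_set_stable_period([4, 6, 10]))
```

Taking the least common multiple means that every attribute completes a whole number of its own stable periods within one period T of the set.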
Definition 3. Set of stable states
The set of stable states of the power data attribute set $A$ is formed by combining the stable values of all attributes in $A$. The stable value of an attribute is the value of its data after one stable period $T$ of the attribute set, which is usually close to the initial value $e_i(t)$. In practical terms, the stable mode describes how the normal attribute values are distributed over one short period;
Definition 4. Extraction of unstable frequencies
$$f_i(t) = \frac{n[e_i'(t)]}{n[e_i(t)]}, \quad e_i'(t) \notin D(e_i)$$
where $f_i(t)$ is the unstable frequency of attribute $e_i$; $n[e_i'(t)]$ is the number of times $e_i$ exceeds the deviation limit when traversed during the stable period; $n[e_i(t)]$ is the number of times $e_i$ is traversed during the stable period; and $D(e_i)$ is the domain of $e_i$ within which the deviation limit is not exceeded;
Definition 5. Unstable frequency distribution
$$F_A(t) = [f_1(t), f_2(t), \ldots, f_n(t)]$$
where $F_A(t)$, called the unstable frequency distribution, is the distribution of the frequencies with which unstable attributes are detected in the power attribute set $A$ during the traversal period; it is defined as a vector so that it can serve as the machine learning input in step 3;
Definition 6. Unstable frequency distribution label set (the correspondence is shown schematically in fig. 2)
$$D_{train}(A) = \{(F_A^{(1)}, y^{(1)}), (F_A^{(2)}, y^{(2)}), \ldots, (F_A^{(n)}, y^{(n)})\}$$
where $D_{train}(A)$ is the unstable frequency label vector, which is in essence a training data set made up of the unstable frequency distribution $F_A^{(i)}$ of the i-th time period and the corresponding device fault label $y^{(i)}$; the data labels can be obtained by assigning numerical values to the error codes reported in the system, and serve only to classify;
Definition 7. Unstable frequency distribution matrix
$$F = \begin{bmatrix} F_A^{(1)} \\ F_A^{(2)} \\ \vdots \\ F_A^{(n)} \end{bmatrix}$$
where $F$ is the unstable frequency distribution matrix, an algebraic structure in which the unstable frequency distribution obtained in the i-th traversal of a stable period forms the i-th row vector; this structure is convenient to feed into the algorithm and is the standard input format of the frequency factor learning algorithm;
The method specifically comprises the following steps:
Step 1. Extract the stable mode of the power data set. Based on the constructed power data set, determine the attributes to be tested of the power equipment it contains:
$$A = \{e_1, e_2, e_3, \ldots, e_n\}$$
where $A$ is the set of power data attributes and $e_i$, $i \in [1, n]$, denotes the $n$ attributes (e.g., network element ID, current, device temperature, humidity, time, etc.).
Then set the deviation degree, determine the stable period of the attribute set, and extract the set of stable states.
Step 2. Construct the unstable frequency distribution. The data structures of the multi-source heterogeneous data generated by the power communication system are unified by converting the data to frequencies, so that the data can be fed into the algorithm of step 3; the unstable frequencies are then extracted and the unstable frequency distribution is constructed.
Step 3. Unstable frequency factor learning, which specifically comprises the following steps:
Step 3.1 Construct the unstable frequency label vector and the unstable frequency distribution matrix.
Step 3.2 Frequency factor learning: parameters are learned with a frequency factor learning function;
where the output of the learning function serves as the regression label; $F_i$ ($i = 0, 1, 2, \ldots$) are the unstable frequency distributions (the arguments, in vector form). In particular, when no data appears for the i-th attribute within a unit mode period, the frequency is assigned the value 1, indicating that data is missing and data integrity is violated. Different unstable frequency distributions capture the degree to which different attributes produce abnormal data, together with all combinations of anomalies. $w_j$ ($j = 0, 1, 2, \ldots$) are the single-variable learning parameters; $v_i$ and $v_j$ are the hidden parameters of the cross variables $F_i$ and $F_j$ respectively, and are the key parameters that give the frequency factor learning algorithm its learning advantage; $\langle v_i, v_j \rangle$ is the inner product of the hidden parameter vectors $v_i$ and $v_j$. During optimization of the objective function, the hidden parameters resolve the implicit relation between two different unstable frequency distributions; and because $i \ne j$, the autocorrelation of an unstable distribution has no influence, which effectively avoids overfitting;
$\lambda(F_i)$ is the trigger factor: when the attribute set is empty in the i-th traversal (that is, all data in the attribute set is missing), the trigger factor switches off the learning function and activates the index function $g(F_i)$.
Step 3.3 Optimal solution
Depending on the equipment fault type, either a continuous-valued fault label or a classification fault label can be set, and a regression-type or a classification-type objective loss function is used for optimization accordingly;
Regression objective loss function (note that $\lambda = 1$ in this case)
$$L(\Theta) = \frac{1}{2} \sum_i \left(\hat{y}(F_A^{(i)}) - y^{(i)}\right)^2$$
where $\hat{y}(F_A^{(i)})$ is the value of the frequency factor learning function and $y^{(i)}$ is the recorded label;
Classification objective loss function
When $y = 1$:
$$L = \max\{0, 1 - \hat{y}\}$$
When $y = -1$:
$$L = \max\{0, 1 + \hat{y}\}$$
This is a hinge loss type classification loss function, where $\max\{\}$ takes the maximum of the values in braces; through the sign of the estimated value, the hinge loss type objective function predicts whether a specific unstable frequency distribution points to a power equipment error or to a data entry error;
Whichever objective function is set, the optimization goal is to solve for the parameters of the learning function that minimize the objective loss function, namely:
$$\Theta^* = \arg\min_{\Theta} L(\Theta)$$
where $\Theta^*$ is the set of parameters in the learning function, including the single-factor parameters $w_i$ and the cross-term parameters $v_i, v_j$, with $i, j \in Z^+$, $i < j$.
Substituting the optimal parameters obtained into the learning function turns it into the prediction function.
When a completely new unstable frequency distribution is input, the prediction function with the optimal parameters substituted gives a predicted value for the classification of the power equipment state; when the optimized parameters have been obtained by training on a large amount of historical data, the prediction function value converges to the true value;
Step 4. True value discrimination and inferred completion, which specifically comprises the following steps:
Step 4.1 Discrimination flow
(1) Discrimination of type (1) data distortion
When the prediction function value converges to the normal label value of the equipment, the attribute data whose entries in the corresponding unstable frequency distribution are greater than 0, together with the empty-set elements, are judged to be distorted;
(2) Discrimination of type (2) data distortion
When the prediction function value converges to a specific abnormal label of the equipment, the attribute data whose entries in the corresponding unstable distribution equal 0, together with the empty-set elements, are judged to be distorted;
(3) Discrimination of type (3) data distortion
When all elements of the unstable distribution are empty sets, the data is completely lost and the distribution as a whole is judged to be distorted;
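The three discrimination rules of step 4.1 can be sketched as a small decision function. This is only an illustration under stated assumptions: empty-set elements are represented as `None`, the outcome of the prediction function is passed in as a boolean `device_is_normal`, and the name `discriminate` is hypothetical.

```python
def discriminate(distribution, device_is_normal):
    """Return (distortion_type, indices of attributes judged distorted).

    distribution: list of unstable frequencies; None marks an empty-set
    element (assumed representation of missing data).
    device_is_normal: True if the prediction converged to the normal label.
    """
    indices = range(len(distribution))
    # Type (3): every element is empty, so the whole distribution is distorted
    if all(f is None for f in distribution):
        return 3, list(indices)
    if device_is_normal:
        # Type (1): device normal, so out-of-limit or missing data is distorted
        return 1, [i for i in indices
                   if distribution[i] is None or distribution[i] > 0]
    # Type (2): device abnormal, so data that stayed stable (or is missing)
    # is distorted
    return 2, [i for i in indices
               if distribution[i] is None or distribution[i] == 0]


if __name__ == "__main__":
    print(discriminate([0.0, 0.2, None], device_is_normal=True))
    print(discriminate([0.0, 0.2, None], device_is_normal=False))
```

The same distribution is therefore read in opposite ways depending on the predicted device state, which is why the prediction function has to be evaluated before any attribute is flagged.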
Step 4.2 True value inference and completion flow
For type (1) data distortion, the distorted data is out-of-limit data that exceeds the preset deviation degree; the attribute is assigned and completed with the most frequent value among the historical data that is within the limit;
For type (2) data distortion, the distorted data is stable data that does not exceed the preset deviation degree (the equipment has changed, but the data has not changed accordingly); the attribute is assigned and completed with the most frequent value among the out-of-limit historical data;
For type (3) data distortion, the distorted data is empty-set data, and completion falls into two cases: when the equipment is operating normally, the data is completed as for type (1) distortion; when the equipment is operating abnormally or has changed, the data is completed as for type (2) distortion.
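The completion rules of step 4.2 amount to taking the mode of a suitably filtered history. The sketch below assumes that the preset deviation degree is given as a closed interval `[lower, upper]` and that the history is a plain list of numeric values; `complete_value` and its parameters are hypothetical names.

```python
from collections import Counter


def complete_value(history, lower, upper, distortion_type,
                   device_is_normal=True):
    """Pick a replacement value from historical data.

    Type 1 uses the most frequent in-limit historical value, type 2 the
    most frequent out-of-limit value, and type 3 falls back to type 1 or
    type 2 depending on whether the device is operating normally.
    Returns None when no suitable historical value exists.
    """
    within = [v for v in history if lower <= v <= upper]
    beyond = [v for v in history if not (lower <= v <= upper)]
    if distortion_type == 3:
        distortion_type = 1 if device_is_normal else 2
    pool = within if distortion_type == 1 else beyond
    if not pool:
        return None
    return Counter(pool).most_common(1)[0][0]


if __name__ == "__main__":
    temps = [20, 21, 20, 95, 96, 95, 20]
    print(complete_value(temps, 15, 30, distortion_type=1))
    print(complete_value(temps, 15, 30, distortion_type=2))
```

When several values are equally frequent, `Counter.most_common` returns one of them; a real system would need an explicit tie-breaking rule, which the description does not specify.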
According to the invention, the massive monitoring data collected in the power communication system is learned and analyzed for rules by combining the prescribed true value discrimination method with the prediction function, so that the unstable frequency distribution points to the abnormal points in the power communication system and the true values of the data can be judged. Using historical data, abnormal data is located automatically and its true values are inferred and completed, improving the data quality of the power communication system.
Drawings
Fig. 1 is a general flow chart.
Fig. 2 is a schematic diagram of a correspondence relationship between an unstable frequency distribution and a tag set.
Fig. 3 is a flow chart of truth discrimination and inferred completion.
Detailed Description
Aiming at the problem of low data quality in existing power communication systems, the invention aims to establish a machine learning discrimination method that automatically identifies distortion in power data, locates the distortion, and infers the true values.
To achieve this, the invention as a whole is divided into four steps; the overall flow chart is shown in fig. 1.
Step 1. Extraction of the stable mode of the power data set
This step is divided into three sub-steps.
Step 1.1 Construct the power data set and determine the attributes to be tested of the power equipment it contains:
$$A = \{e_1, e_2, e_3, \ldots, e_n\}$$
where $A$ is the set of power data attributes and $e_i$, $i \in [1, n]$, denotes the $n$ attributes (e.g., network element ID, current, device temperature, humidity, time, etc.) that monitor the environment in which the power equipment operates. The attributes are determined according to the following principles:
(1) Select the necessary attributes. An attribute carried by data that is to undergo true value discrimination is called a necessary attribute and must be selected.
(2) Select the associated attributes. An attribute related to a necessary attribute is called an associated attribute. In subsequent processing, associated attributes serve only as an auxiliary machine learning basis for true value discrimination and inference of the necessary attributes (for example, if the attribute to be tested is the device temperature, the ambient temperature can be selected as an associated attribute). The system does not perform true value discrimination on the associated attributes themselves, so their selection can be decided flexibly case by case.
Step 1.2 Set the deviation degree and determine the stable period of the attribute set.
Definition 2. Stable period of an attribute
$$|e_i(t_0 + t) - e_i(t_0)| \le \varepsilon$$
where $e_i(t_0)$ is the value of attribute $e_i$ at time $t_0$; $t$ is the minimum stable period of the attribute, i.e., after time $t$ the attribute value returns to a value that does not differ significantly from its value at the initial moment; and $\varepsilon$ is a small positive deviation that bounds the maximum deviation of the attribute value during the stable period.
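One possible reading of this definition is a search for the smallest lag after which the attribute value comes back to within the deviation of its initial value. The sketch below follows that reading only; it assumes equally spaced samples, measures the period in sample steps, and uses the hypothetical name `stable_period`. The patent text does not state how the period is actually computed.

```python
def stable_period(series, eps, t0=0):
    """Smallest t >= 1 with |series[t0 + t] - series[t0]| <= eps.

    Assumption: `series` holds equally spaced samples of one attribute.
    Returns None if the value never returns within eps of the start.
    """
    base = series[t0]
    for t in range(1, len(series) - t0):
        if abs(series[t0 + t] - base) <= eps:
            return t
    return None


if __name__ == "__main__":
    current = [10.0, 14.0, 18.0, 14.0, 10.2, 14.0]
    print(stable_period(current, eps=0.5))
```

For a signal that drifts slowly, this search returns 1 almost immediately, so in practice the deviation would need to be chosen relative to the sampling interval.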
Definition 3. Stable period of the attribute set
$$T = m(t_1, t_2, \ldots, t_n)$$
where $T$ is the stable period of the attribute set, equal to the least common multiple of the stable periods of all attributes in the power data set, and $m$ denotes the mapping that takes the least common multiple.
Step 1.3 Extract the set of stable states
Definition 4. Set of stable states
The set of stable states of the power data attribute set $A$ is formed by combining the stable values of all attributes in $A$. The stable value of an attribute is the value of its data after one stable period $T$ of the attribute set, which is usually close to the initial value $e_i(t)$. In practical terms, the stable mode describes how the normal attribute values are distributed over one short period.
Note that the stable period $T$ is the minimum time span over which every attribute keeps its data value stable; over a longer time span the data may show various trends. For constant attributes (e.g., device location, network element ID), the stable period can be taken as 0. Discrete non-numerical attribute values are assigned integers by category, after which the same processing applies.
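The integer assignment for discrete non-numerical attributes can be illustrated with a simple category encoder. The sketch assumes that codes are assigned in order of first appearance, which the description does not prescribe; `encode_categories` is a hypothetical name.

```python
def encode_categories(values):
    """Map each distinct non-numerical value to an integer code.

    Codes are assigned in order of first appearance (an assumption).
    Returns the encoded sequence and the value-to-code mapping.
    """
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


if __name__ == "__main__":
    states = ["up", "down", "up", "alarm"]
    print(encode_categories(states))
```

Once encoded, a categorical attribute can go through the same deviation and frequency calculations as a numerical one, although the numeric distance between codes carries no physical meaning.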
Step 2. Construction of the unstable frequency distribution
In this step, the data structures of the multi-source heterogeneous data generated by the power communication system are unified by converting the data to frequencies, so that the data can be fed into the algorithm of step 3. The step is divided into two sub-steps.
Step 2.1 Extract the unstable frequencies
Definition 5. Extraction of unstable frequencies
$$f_i(t) = \frac{n[e_i'(t)]}{n[e_i(t)]}, \quad e_i'(t) \notin D(e_i)$$
where $f_i(t)$ is the unstable frequency of attribute $e_i$; $n[e_i'(t)]$ is the number of times $e_i$ exceeds the deviation limit when traversed during the stable period; $n[e_i(t)]$ is the number of times $e_i$ is traversed during the stable period; and $D(e_i)$ is the domain of $e_i$ within which the deviation limit is not exceeded.
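A minimal sketch of this extraction for a single attribute follows. It assumes that the domain $D(e_i)$ is the interval of width `eps` around the attribute's stable value, and it returns 1 for an attribute with no samples, following the rule stated for the learning function that missing data is assigned frequency 1. The name `unstable_frequency` is hypothetical.

```python
def unstable_frequency(samples, stable_value, eps):
    """Fraction of samples that leave the allowed deviation band.

    Assumption: the domain D(e_i) is [stable_value - eps, stable_value + eps].
    An attribute with no samples in the period is treated as missing and
    gets frequency 1.
    """
    if not samples:
        return 1.0
    out_of_limit = sum(1 for v in samples if abs(v - stable_value) > eps)
    return out_of_limit / len(samples)


if __name__ == "__main__":
    temperature = [20, 20.1, 25, 19.9, 30]
    print(unstable_frequency(temperature, stable_value=20, eps=0.5))
```

Because the result is a ratio between 0 and 1 regardless of the attribute's unit, attributes as different as current and humidity end up on a common scale.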
Step 2.2 Construct the unstable frequency distribution
Definition 6. Unstable frequency distribution
$$F_A(t) = [f_1(t), f_2(t), \ldots, f_n(t)]$$
where $F_A(t)$, called the unstable frequency distribution, is the distribution of the frequencies with which unstable attributes are detected in the power attribute set $A$ during the traversal period; it is defined as a vector so that it can serve as the machine learning input in step 3.
In practical terms, the unstable frequency statistically flags "mutated" attribute values that do not follow the expected pattern of change. According to step 2.1, an unstable frequency arises whenever a data change exceeds the preset deviation limit $\delta$; within a unit time period, the more often the data goes out of limit, the larger the corresponding unstable frequency.
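To show how heterogeneous attributes end up in one vector, the following sketch builds an unstable frequency distribution from one traversal window. The dictionary layout, the per-attribute limits, and the name `unstable_frequency_distribution` are assumptions made for the illustration; an attribute with no data in the window is given frequency 1, as the description states for missing data.

```python
def unstable_frequency_distribution(window, stable_values, limits):
    """Build the vector [f_1, ..., f_n] for one traversal window.

    window: dict attribute name -> list of samples in the period
    stable_values: dict attribute name -> stable value (fixes the order)
    limits: dict attribute name -> allowed absolute deviation
    """
    distribution = []
    for name in stable_values:
        samples = window.get(name, [])
        if not samples:
            distribution.append(1.0)  # missing data violates integrity
            continue
        out = sum(1 for v in samples
                  if abs(v - stable_values[name]) > limits[name])
        distribution.append(out / len(samples))
    return distribution


if __name__ == "__main__":
    window = {"current": [5.0, 5.1, 7.0, 5.0],
              "temperature": [40, 41, 40, 40]}
    stable = {"current": 5.0, "temperature": 40, "humidity": 50}
    limits = {"current": 0.5, "temperature": 2, "humidity": 5}
    print(unstable_frequency_distribution(window, stable, limits))
```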
Step 3. Unstable frequency factor learning
The data quality of the power communication system is quantified by the proportion of true values. Within a stable period, when the equipment in the power or grid system is operating normally and stably, the detected or recorded data reflects that physical stability, and the deviation degree set in step 1.2 delimits a stable set for the data (its complement is the unstable distribution). However, if the system has a physical anomaly or a data entry error occurs and the data nevertheless remains stable over multiple traversals of the power communication system, or if the system has no problem and the data traversed multiple times exceeds the deviation limit, the data can be regarded as distorted; this is a data accuracy problem. If the data is wholly or partially missing, it is a data integrity problem.
These problems can be handled uniformly by the unstable frequency factor learning algorithm. This step is divided into four sub-steps.
Step 3.1 Construct the unstable frequency label vector
Definition 7. Unstable frequency distribution label set (the correspondence is shown schematically in fig. 2)
$$D_{train}(A) = \{(F_A^{(1)}, y^{(1)}), (F_A^{(2)}, y^{(2)}), \ldots, (F_A^{(n)}, y^{(n)})\}$$
where $D_{train}(A)$ is the unstable frequency label vector, which is in essence a training data set made up of the unstable frequency distribution $F_A^{(i)}$ of the i-th time period and the corresponding device fault label $y^{(i)}$. The data labels can be obtained by assigning numerical values to the error codes reported in the system, and serve only to classify.
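The pairing of distributions with labels can be sketched as follows. The error codes and their numeric labels in the example are invented for illustration, as is the name `build_training_set`; the description only says that labels come from assigning numbers to system error codes.

```python
def build_training_set(distributions, error_codes, code_to_label):
    """Pair each unstable frequency distribution with a numeric label.

    distributions: list of vectors, one per time period
    error_codes: list of system error codes, one per time period
    code_to_label: mapping from error code to numeric label (assumed given)
    """
    if len(distributions) != len(error_codes):
        raise ValueError("one error code is required per distribution")
    return [(list(dist), code_to_label[code])
            for dist, code in zip(distributions, error_codes)]


if __name__ == "__main__":
    # Hypothetical codes: "OK" for normal operation, "E42" for a fault
    labels = {"OK": 1, "E42": -1}
    data = build_training_set([[0.0, 0.1], [0.6, 1.0]], ["OK", "E42"], labels)
    print(data)
```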
Step 3.2 Construct the unstable frequency distribution matrix
Definition 8. Unstable frequency distribution matrix
$$F = \begin{bmatrix} F_A^{(1)} \\ F_A^{(2)} \\ \vdots \\ F_A^{(n)} \end{bmatrix}$$
where $F$ is the unstable frequency distribution matrix, an algebraic structure in which the unstable frequency distribution obtained in the i-th traversal of a stable period forms the i-th row vector. This structure is convenient to feed into the algorithm and is the standard input format of the frequency factor learning algorithm.
Step 3.3 Frequency factor learning
The method designs a frequency factor learning function and uses it to learn the parameters.
Definition 8
where the output of the learning function serves as the regression label; $F_i$ ($i = 0, 1, 2, \ldots$) are the unstable frequency distributions (the arguments, in vector form). In particular, when no data appears for the i-th attribute within a unit mode period, the frequency is assigned the value 1, indicating that data is missing and data integrity is violated. Different unstable frequency distributions capture the degree to which different attributes produce abnormal data, together with all combinations of anomalies. $w_j$ ($j = 0, 1, 2, \ldots$) are the single-variable learning parameters; $v_i$ and $v_j$ are the hidden parameters of the cross variables $F_i$ and $F_j$ respectively, and are the key parameters that give the frequency factor learning algorithm its learning advantage; $\langle v_i, v_j \rangle$ is the inner product of the hidden parameter vectors $v_i$ and $v_j$. During optimization of the objective function, the hidden parameters resolve the implicit relation between two different unstable frequency distributions; and because $i \ne j$, the autocorrelation of an unstable frequency distribution has no influence, which effectively avoids overfitting.
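The parameters described here (a bias, single-variable weights, and hidden vectors combined through inner products over pairs with i < j) match the shape of a second-order factorization machine. The sketch below evaluates such a function as an illustration only: the exact formula of the patent is not reproduced in this text, so the functional form, the bias `w0`, and the name `frequency_factor_score` are all assumptions.

```python
def frequency_factor_score(F, w0, w, V):
    """Second-order factorization-machine-style score (assumed form).

    F: unstable frequency vector [f_1, ..., f_n]
    w0: bias term
    w: single-variable weights, one per attribute
    V: hidden vectors, V[i] belongs to attribute i
    """
    n = len(F)
    linear = sum(w[i] * F[i] for i in range(n))
    cross = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # i < j, so no self-interaction terms
            inner = sum(a * b for a, b in zip(V[i], V[j]))
            cross += inner * F[i] * F[j]
    return w0 + linear + cross


if __name__ == "__main__":
    F = [0.5, 0.0, 1.0]
    w = [1.0, 2.0, -1.0]
    V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(frequency_factor_score(F, 0.1, w, V))
```

Sharing one hidden vector per attribute means an interaction between two attributes can be estimated even if that exact pair was rarely unstable together in the training data.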
$\lambda(F_i)$ is the trigger factor. When the attribute set is empty in the i-th traversal (that is, all data in the attribute set is missing), the trigger factor switches off the learning function and activates the index function $g(F_i)$. The specific expression of this function can be defined according to the actual situation; its purpose is to point a missing value directly to the power communication device to which the attribute belongs. The design idea is as follows. When abnormal data is unstable, a learning function is needed to determine the true value, which is a matter of data accuracy. Missing data, by contrast, certainly violates data integrity, so the index function is activated directly to determine the physical cause of the distortion, with no verification by the learning function. The parameter $u$ is a dimension-unifying factor: once the analytic expression of $g(F_i)$ has been fixed, $u$ is used to bring its dimension into line with that of the learning function.
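The switching behaviour of the trigger factor can be sketched as a gate between two callables. How $\lambda$, $u$, and $g$ are combined algebraically is not given in this text, so the multiplication by `u`, the use of `None` for missing entries, and the name `gated_output` are assumptions for illustration.

```python
def gated_output(F, learn, index_fn, u=1.0):
    """Route a traversal either to the learning function or the index function.

    F: distribution for one traversal; None marks a missing attribute
       (assumed representation).
    learn: callable implementing the learning function
    index_fn: callable implementing the index function g
    u: dimension-unifying factor applied to g (assumed to be a scale factor)
    """
    all_missing = all(f is None for f in F)
    lam = 0 if all_missing else 1  # trigger factor
    if lam == 1:
        return learn(F)
    return u * index_fn(F)


if __name__ == "__main__":
    # Toy stand-ins for the two functions
    learn = lambda F: sum(f for f in F if f is not None)
    index_fn = lambda F: -1.0
    print(gated_output([0.2, 0.3], learn, index_fn))
    print(gated_output([None, None], learn, index_fn, u=2.0))
```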
Step 3.4 optimal solution
Depending on the equipment fault type, either a continuous-valued fault label or a classification fault label can be set; a regression-type objective loss function or a classification-type objective loss function is then used for the optimization, respectively.
Definition 9: regression objective loss function (note that λ = 1 here)
Here, the left-hand side is the objective loss function, the estimate inside it is the frequency factor learning function of Definition 8, and y^(i) is the abnormality label or entry-error label of the power equipment recorded in practice. The optimization goal is to minimize the value of the objective loss function, which in practical terms means determining the learning function such that its value is as close as possible to the power equipment abnormality label. The expression is multiplied by 1/2 so that it stays concise when partial derivatives are taken in the subsequent optimization.
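The formula of Definition 9 is not preserved in this text. A form consistent with the description (a squared error between the learning function and the recorded label, scaled by 1/2) might be written as follows. The symbols L, Θ, N, θ and ŷ are assumed notation, with N the number of labelled distributions in the training set; none of them is taken from the source.

```latex
% Assumed reconstruction: half the summed squared error over N training samples.
L(\Theta) = \frac{1}{2} \sum_{i=1}^{N}
  \left( \hat{y}\!\left(F_A^{(i)}\right) - y^{(i)} \right)^{2}

% The factor 1/2 cancels the exponent when differentiating
% with respect to any parameter \theta \in \Theta:
\frac{\partial L}{\partial \theta} = \sum_{i=1}^{N}
  \left( \hat{y}\!\left(F_A^{(i)}\right) - y^{(i)} \right)
  \frac{\partial \hat{y}\!\left(F_A^{(i)}\right)}{\partial \theta}
```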
When the abnormality label or entry-error label of the power equipment has a discrete classification structure, the objective loss can be defined as a hinge loss (note that λ = 1 here).
Definition 10: classification objective loss function
When y = 1:
when y = -1:
Definition 10 gives a hinge-loss-type classification loss function, where max{ } takes the maximum of the values in the braces. The hinge-loss-type objective predicts, from the sign of the estimated value, whether a particular unstable frequency distribution points to a power equipment error or to a data entry error.
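The two case formulas of Definition 10 are not preserved in this text. The sketch below uses the standard hinge loss, max{0, 1 - y·ŷ}, which splits into the cases y = 1 and y = -1 mentioned above; treating this standard form as the intended one is an assumption. The function names are hypothetical, and the source does not state which sign corresponds to an equipment error and which to a data entry error.

```python
def hinge_loss(y: int, y_hat: float) -> float:
    """Standard hinge loss for a label y in {+1, -1} and a real-valued estimate.

    y = +1 gives max(0, 1 - y_hat); y = -1 gives max(0, 1 + y_hat).
    """
    if y not in (1, -1):
        raise ValueError("label must be +1 or -1")
    return max(0.0, 1.0 - y * y_hat)


def predict_label(y_hat: float) -> int:
    """Classify by the sign of the estimate (0 is mapped to +1 by convention)."""
    return 1 if y_hat >= 0 else -1
```

The loss is zero only when the estimate has the correct sign and a margin of at least 1, so confident correct predictions are not pushed any further.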
Whichever objective function is chosen, the optimization goal is to solve for the parameters of the learning function that minimize the objective loss, namely:
where Θ* denotes the set of parameters of the learning function, including the single-factor parameters w_i and the cross-term parameters v_i, v_j, with i, j ∈ Z+, i < j.
Stochastic gradient descent (SGD) can be used to reach the optimization target. The gradient direction is obtained by computing the partial derivative with respect to each parameter of the learning function; a step length is then set along the direction determined by the gradient, and a locally optimal solution is obtained by iterative updating. The algorithm is as follows:
1. Regression-loss optimization iteration:
In the regression-loss iteration, the parameters are updated along the gradient direction, with δ the step length of each update. δ must be preset for the specific problem and should be moderate: if the step is too large, the optimization may fail to converge; if it is too small, too many iterations are needed and computing resources are wasted.
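The iteration itself is not preserved in this text. As a hedged illustration, the sketch below runs plain SGD on a squared loss for a factorization-machine-style learner, using the usual partial derivatives (1 for the bias, x_i for w_i, and x_i·(Σ_j v_jf·x_j − v_if·x_i) for v_if). The function names, the toy data, the initial values and δ = 0.05 are all assumptions made for the example.

```python
import random


def fm_predict(x, w0, w, V):
    """Bias + linear terms + pairwise terms over i < j (computed in O(n*k))."""
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[j] * x[j] for j in range(n))
    pair = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pair += 0.5 * (s * s - s2)
    return linear + pair


def sgd_epoch(data, w0, w, V, delta):
    """One SGD pass on 0.5 * (y_hat - y)^2; w and V change in place, w0 is returned."""
    n, k = len(w), len(V[0])
    for x, y in data:
        err = fm_predict(x, w0, w, V) - y
        # Column sums are taken once, before any hidden parameter is changed.
        sums = [sum(V[i][f] * x[i] for i in range(n)) for f in range(k)]
        w0 -= delta * err
        for i in range(n):
            w[i] -= delta * err * x[i]
            for f in range(k):
                grad = x[i] * (sums[f] - V[i][f] * x[i])
                V[i][f] -= delta * err * grad
    return w0


def total_loss(data, w0, w, V):
    return 0.5 * sum((fm_predict(x, w0, w, V) - y) ** 2 for x, y in data)


# Toy data (hypothetical): two unstable frequencies and a continuous label.
rng = random.Random(0)
data = [([0.0, 0.0], 0.0), ([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
w0, w = 0.0, [0.0, 0.0]
V = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]
before = total_loss(data, w0, w, V)
for _ in range(500):
    w0 = sgd_epoch(data, w0, w, V, delta=0.05)
after = total_loss(data, w0, w, V)
```

With a much larger δ the same loop can oscillate or diverge, and with a much smaller δ it needs far more passes, which is the trade-off described above.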
2. Classification-loss optimization iteration:
The optimal parameters obtained from the solution are substituted into the learning function, which thereby becomes the prediction function
Here the prediction function is the learning function with the optimal parameters substituted. When a new unstable frequency distribution is input, the function gives a predicted value of the state classification of the power equipment; when the optimal parameters have been obtained by training on a large amount of historical data, the predicted value converges to the true value.
Step 4 true value discrimination and inference completion
This step is divided into two sub-steps, shown schematically in Figure 3.
Step 4.1 discriminating flow
(1) Type data distortion discrimination
When the prediction function value converges to the normal label value of the equipment, the attribute data whose entries in the corresponding unstable frequency distribution are greater than 0, together with any empty-set elements, are judged to be distorted.
(2) Type data distortion discrimination
When the prediction function value converges to a device-specific abnormality label, the attribute data whose entries in the corresponding unstable frequency distribution are equal to 0, together with any empty-set elements, are judged to be distorted.
(3) Type data distortion discrimination
When all elements of the unstable frequency distribution are empty sets, all data is missing, and the distribution as a whole is judged to be distorted.
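The three discrimination rules above can be sketched as a single function. This is an illustration only: the function name is hypothetical, `None` is an assumed encoding for an empty-set element, and the Boolean `equipment_normal` stands in for "the prediction function value converged to the normal label".

```python
from typing import List, Optional, Sequence


def distorted_attributes(
    F: Sequence[Optional[float]], equipment_normal: bool
) -> List[int]:
    """Return the indices of the attributes judged to be distorted.

    F is one unstable frequency distribution; None marks an empty-set element.
    """
    # Third rule: every element is empty, so the whole distribution is distorted.
    if all(f is None for f in F):
        return list(range(len(F)))
    if equipment_normal:
        # First rule: the equipment is normal, yet the attribute left its range.
        return [i for i, f in enumerate(F) if f is None or f > 0]
    # Second rule: the equipment is abnormal, yet the attribute never moved.
    return [i for i, f in enumerate(F) if f is None or f == 0]
```

For example, `distorted_attributes([0.0, 0.3, None], True)` flags attributes 1 and 2, while the same distribution with `equipment_normal=False` flags attributes 0 and 2.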
Step 4.2 true value deduction and completion process
When type (1) data distortion occurs, the distorted data is overrun data that exceeds the preset deviation degree; the most frequent value in the non-overrun historical data is assigned to the attribute to complete it.
When type (2) data distortion occurs, the distorted data is stable data that does not exceed the preset deviation degree (the equipment has changed, but the data has not changed accordingly); the most frequent value in the overrun historical data is assigned to the attribute to complete it.
When type (3) data distortion occurs, the distorted data is an empty set, and completion is handled in two cases. When the equipment operates normally, completion follows the type (1) procedure. When the equipment operates abnormally or has been changed, completion follows the type (2) procedure.
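The completion rules for types (1) to (3) amount to taking the most frequent value from one of two slices of the history. A minimal sketch follows; the function name, the `in_range` predicate (standing in for the preset deviation degree) and the `None` result for an empty candidate list are assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def complete_value(
    history: Sequence[float],
    in_range: Callable[[float], bool],
    distortion_type: int,
    equipment_normal: bool = True,
) -> Optional[float]:
    """Pick the replacement value for a distorted attribute.

    Type (1) uses the most frequent non-overrun value, type (2) the most
    frequent overrun value, and type (3) defers to (1) or (2) by equipment state.
    """
    if distortion_type == 3:
        distortion_type = 1 if equipment_normal else 2
    if distortion_type == 1:
        candidates = [v for v in history if in_range(v)]
    elif distortion_type == 2:
        candidates = [v for v in history if not in_range(v)]
    else:
        raise ValueError("distortion_type must be 1, 2 or 3")
    if not candidates:
        return None  # nothing suitable in the history (assumed behaviour)
    return Counter(candidates).most_common(1)[0][0]


# Hypothetical temperature history with an allowed range of 15 to 25.
history = [20, 21, 20, 35, 35, 35, 20, 20]
allowed = lambda v: 15 <= v <= 25
```

Here `complete_value(history, allowed, 1)` selects 20, and `complete_value(history, allowed, 2)` selects 35.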

Claims (1)

1. A method for distinguishing and deducing the true value of electric power communication system data based on unstable frequency distribution and frequency factor learning, characterized by the following definitions and steps:
Definition 1: stable period of an attribute
where e_i(t_0) denotes the value of attribute e_i at time t_0; t denotes the minimum stable period of the attribute, i.e. after time t the attribute value returns to a value that does not differ significantly from its value at the initial time; ε denotes the small positive deviation allowed for the attribute value within the stable period;
Definition 2: stable period of an attribute set
T = m(t_1, t_2, …, t_n)
where T is the stable period of the attribute set, equal to the least common multiple of the stable periods of all attributes in the power data set, and m is the mapping that takes the least common multiple;
Definition 3: stable state set
where the stable value of an attribute in the attribute set is the value it takes after one attribute-set period T, which is usually close to its initial value e_i(t); the stable state set of the power data attribute set A is formed by combining the stable values of all attributes in A; in practical terms, the stable mode describes the distribution of the stable values of normal attribute values within a small period;
Definition 4: extraction of unstable frequencies
where f_i(t) denotes the unstable frequency of attribute e_i; n[e_i′(t)] denotes the number of times attribute e_i is found outside the allowed deviation when traversed during the stable period; the denominator denotes the number of times attribute e_i is traversed during the stable period; D(e_i) denotes the domain of e_i that does not exceed the deviation degree;
Definition 5: unstable frequency distribution
F_A(t) = [f_1(t), f_2(t), …, f_n(t)]
where F_A(t) is the unstable frequency distribution, i.e. the frequencies of the unstable attributes detected in the power attribute set A during the traversal period; it is defined as a vector so that it can serve as the machine learning input in step 3;
Definition 6: labelled set of unstable frequency distributions (the correspondence is shown schematically in Figure 2)
D_train(A) = {(F_A^(1), y^(1)), (F_A^(2), y^(2)), ..., (F_A^(n), y^(n))}
where D_train(A) denotes the unstable frequency label vector, which is essentially a training data set composed of the unstable frequency distribution F_A^(i) of the i-th time period and the corresponding equipment fault label y^(i); the data labels can be obtained by assigning numerical values to the error codes reported in the system, and serve only for classification;
Definition 7: unstable frequency distribution matrix
where F denotes the unstable frequency distribution matrix, an algebraic structure in which the unstable frequency distribution obtained by the i-th traversal of a stable period forms the i-th row vector; this structure is convenient to feed into the algorithm and is the standard input format of the frequency factor learning algorithm;
the method specifically comprises the following steps:
step 1: extract the stable mode of the power data set; based on the constructed power data set, determine the attributes to be tested of the power equipment it contains:
A = {e_1, e_2, e_3, …, e_n}
where A denotes the power data attribute set and e_i, i ∈ [1, n], denote the n attributes (e.g. network element ID, current, device temperature, humidity, time, etc.);
then set the deviation degree, determine the stable period of the attribute set, and extract the stable state set;
step 2: construct the unstable frequency distribution; the data structures of the multi-source heterogeneous data generated by the power communication system are unified by conversion to frequencies, so that they can be fed into the algorithm of step 3; the unstable frequencies are then extracted and the unstable frequency distribution is constructed;
step 3: learn the unstable frequency factors, which specifically comprises the following steps:
step 3.1: construct the unstable frequency label vector and the unstable frequency distribution matrix;
step 3.2: frequency factor learning: the parameters are learned on the basis of a frequency factor learning function;
where the output of the learning function is the regression label, and F_i (i = 0, 1, 2, …) is an unstable frequency distribution (an argument in vector form); in particular, when the i-th attribute has no data within a unit pattern period, its frequency is assigned the value 1, indicating that data is missing and that data integrity is violated; the different unstable frequency distributions characterize the degree to which abnormal data appears in the different attributes, together with all abnormal combination patterns; w_j (j = 0, 1, 2, …) are the univariate learning parameters, and v_i, v_j are the hidden parameters of the cross variables F_i, F_j, which are the key to the learning advantage of the frequency factor learning algorithm; <v_i, v_j> denotes the inner product of the hidden parameter vectors v_i and v_j; during optimization of the objective function, the hidden parameters resolve the implicit relation between two different unstable frequency distributions, and because i ≠ j, the autocorrelation influence of an unstable frequency distribution is excluded, which helps to avoid overfitting;
λ(F_i) is the trigger factor: when the attribute set is empty in the i-th traversal (that is, all data in the attribute set is missing), the trigger factor switches off the learning function and activates the index function g(F_i);
step 3.3: optimal solution
depending on the equipment fault type, either a continuous-valued fault label or a classification fault label can be set, and a regression-type objective loss function or a classification-type objective loss function is used for the optimization, respectively;
regression objective loss function (note that λ = 1 here)
classification objective loss function
when y = 1:
when y = -1:
the above is a hinge-loss-type classification loss function, where max{ } takes the maximum of the values in the braces; the hinge-loss-type objective predicts, from the sign of the estimated value, whether a particular unstable frequency distribution points to a power equipment error or to a data entry error;
whichever objective function is chosen, the optimization goal is to solve for the parameters of the learning function that minimize the objective loss, namely:
where Θ* denotes the set of parameters of the learning function, including the single-factor parameters w_i and the cross-term parameters v_i, v_j, with i, j ∈ Z+, i < j;
the optimal parameters obtained from the solution are substituted into the learning function, which thereby becomes the prediction function
where the prediction function is the learning function with the optimal parameters substituted; when a new unstable frequency distribution is input, the function gives a predicted value of the state classification of the power equipment, and when the optimal parameters have been obtained by training on a large amount of historical data, the predicted value converges to the true value;
step 4: true value discrimination and inference completion, which specifically comprises the following steps:
step 4.1: discrimination flow
(1) Type data distortion discrimination
when the prediction function value converges to the normal label value of the equipment, the attribute data whose entries in the corresponding unstable frequency distribution are greater than 0, together with any empty-set elements, are judged to be distorted;
(2) Type data distortion discrimination
when the prediction function value converges to a device-specific abnormality label, the attribute data whose entries in the corresponding unstable frequency distribution are equal to 0, together with any empty-set elements, are judged to be distorted;
(3) Type data distortion discrimination
when all elements of the unstable frequency distribution are empty sets, all data is missing, and the distribution as a whole is judged to be distorted;
step 4.2: true value inference and completion flow
when type (1) data distortion occurs, the distorted data is overrun data that exceeds the preset deviation degree, and the most frequent value in the non-overrun historical data is assigned to the attribute to complete it;
when type (2) data distortion occurs, the distorted data is stable data that does not exceed the preset deviation degree (the equipment has changed, but the data has not changed accordingly), and the most frequent value in the overrun historical data is assigned to the attribute to complete it;
when type (3) data distortion occurs, the distorted data is an empty set, and completion is handled in two cases: when the equipment operates normally, completion follows the type (1) procedure;
when the equipment operates abnormally or has been changed, completion follows the type (2) procedure.
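Definitions 2, 4, 5 and 7 of the claim describe a small data-preparation pipeline: take the least common multiple of the attribute periods, count out-of-range readings per attribute, and stack the resulting vectors as matrix rows. The sketch below illustrates this under stated assumptions: periods are integers in a common time unit, an attribute with no readings receives frequency 1 (as the description of the learning function states), and all function names and sample values are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import Callable, List, Sequence


def attribute_set_period(periods: Sequence[int]) -> int:
    """Definition 2: T is the least common multiple of the attribute periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def unstable_frequency(
    samples: Sequence[float], in_domain: Callable[[float], bool]
) -> float:
    """Definition 4: out-of-domain readings divided by readings traversed."""
    if not samples:
        return 1.0  # no data in the period: frequency is set to 1
    out_of_domain = sum(1 for s in samples if not in_domain(s))
    return out_of_domain / len(samples)


def unstable_distribution(
    windows: Sequence[Sequence[float]],
    domains: Sequence[Callable[[float], bool]],
) -> List[float]:
    """Definition 5: one unstable frequency per attribute, as a vector."""
    return [unstable_frequency(w, d) for w, d in zip(windows, domains)]


# Hypothetical readings for two attributes over two traversals.
temp_ok = lambda v: 15 <= v <= 25
current_ok = lambda v: v < 10
traversals = [
    [[20, 21, 40, 19], [3, 4, 5, 6]],
    [[20, 20], []],
]
# Definition 7: each traversal contributes one row of the matrix F.
F_matrix = [unstable_distribution(t, [temp_ok, current_ok]) for t in traversals]
```

Each row of `F_matrix`, paired with the fault label recorded for the same period, gives one training sample in the sense of Definition 6.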
CN201711123306.6A 2017-11-14 2017-11-14 Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning Active CN107818523B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711123306.6A CN107818523B (en) 2017-11-14 2017-11-14 Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711123306.6A CN107818523B (en) 2017-11-14 2017-11-14 Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning

Publications (2)

Publication Number Publication Date
CN107818523A true CN107818523A (en) 2018-03-20
CN107818523B CN107818523B (en) 2021-04-16

Family

ID=61609208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711123306.6A Active CN107818523B (en) 2017-11-14 2017-11-14 Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning

Country Status (1)

Country Link
CN (1) CN107818523B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101622814A (en) * 2007-03-02 2010-01-06 Nxp股份有限公司 Fast powering-up of data communication system
US7783584B1 (en) * 2007-10-03 2010-08-24 New York University Controllable oscillator blocks
EP2626995A1 (en) * 2012-02-13 2013-08-14 Siemens Aktiengesellschaft Method for protecting a frequency inverter from asymmetric electrical power flows
CN105122619A (en) * 2013-03-05 2015-12-02 通用电气公司 Power converter and methods for increasing power delivery of soft alternating current power source
CN103957582A (en) * 2014-05-17 2014-07-30 浙江大学宁波理工学院 Wireless sensor network self-adaptation compression method
CN104156504A (en) * 2014-07-21 2014-11-19 国家电网公司 Parameter identifiability judgment method for generator excitation system
CN104866901A (en) * 2015-05-12 2015-08-26 西安理工大学 Optimized extreme learning machine binary classification method based on improved active set algorithms
CN105045976A (en) * 2015-07-01 2015-11-11 中国人民解放军信息工程大学 Method for modeling terrain property of Wargame map by grid matrix

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
余放等: "《大数据环境下的多源数据演化更新研究》", 《计算机科学》 *
杨济海等: "《对区域电网稳定控制系统通信通道自愈方式的研究》", 《计算机科学与探索》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549907A (en) * 2018-04-11 2018-09-18 武汉大学 A kind of data verification method based on multi-source transfer learning
CN108549907B (en) * 2018-04-11 2021-11-16 武汉大学 Data verification method based on multi-source transfer learning
CN109243558A (en) * 2018-08-28 2019-01-18 重庆汇邡机械制造有限公司 Data after carrying out big data collection extract optimization method
CN113535693A (en) * 2020-04-20 2021-10-22 中国移动通信集团湖南有限公司 Data true value determination method and device for mobile platform and electronic equipment
CN113535693B (en) * 2020-04-20 2023-04-07 中国移动通信集团湖南有限公司 Data true value determination method and device for mobile platform and electronic equipment

Also Published As

Publication number Publication date
CN107818523B (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN112202736B (en) Communication network anomaly classification method based on statistical learning and deep learning
CN109343995A (en) Intelligent O&M analysis system based on multi-source heterogeneous data fusion, machine learning and customer service robot
CN107818523B (en) Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning
CN109492790A (en) Wind turbines health control method based on neural network and data mining
CN116032557B (en) Method and device for updating deep learning model in network security anomaly detection
CN117234785B (en) Centralized control platform error analysis system based on artificial intelligence self-query
CN106249709B (en) Dynamic process quality control figure repairs co-design optimal control method with age is determined
CN116308304A (en) New energy intelligent operation and maintenance method and system based on meta learning concept drift detection
CN117390529A (en) Multi-factor traceable data center information management method
CN117743909A (en) Heating system fault analysis method and device based on artificial intelligence
CN113740666B (en) Method for positioning root fault of storm alarm in power system of data center
CN115169704A (en) CVT error state prediction method and device based on increment integrated learning model
CN117436846B (en) Equipment predictive maintenance method and system based on neural network
CN117930815A (en) Wind turbine generator remote fault diagnosis method and system based on cloud platform
CN117419829A (en) Overheat fault early warning method and device and electronic equipment
CN116720983A (en) Power supply equipment abnormality detection method and system based on big data analysis
CN116704729A (en) Industrial kiln early warning system and method based on big data analysis
CN115983714A (en) Static security assessment method and system for edge graph neural network power system
Dagnely et al. A semantic model of events for integrating photovoltaic monitoring data
Yasenjiang et al. Fault Diagnosis and Prediction of Continuous Industrial Processes Based on Hidden Markov Model‐Bayesian Network Hybrid Model
CN112801815A (en) Power communication network fault early warning method based on federal learning
Wang et al. Health state assessment of industrial equipment driven by the fusion of digital twin model and intelligent algorithm
Khan et al. Outliers detection and repairing technique for measurement data in the distribution system
Khalyasmaa et al. Fuzzy inference algorithms for power equipment state assessment
Bai et al. Abnormal Detection Scheme of Substation Equipment based on Intelligent Fusion Terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant