CN107818523B

CN107818523B - Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning

Info

Publication number: CN107818523B
Application number: CN201711123306.6A
Authority: CN
Inventors: 杨济海; 余放; 伍小生; 彭汐单; 巢玉坚; 蔡志民; 王�华; 付萍萍; 李敏; 吕顺利; 邓伟; 李志鹏; 王泉啸; 李石君; 余伟; 李宇轩
Original assignee: NANJING NANRUI GROUP CO; State Grid Corp of China SGCC; Wuhan University WHU; Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Current assignee: NANJING NANRUI GROUP CO; State Grid Corp of China SGCC; Wuhan University WHU; Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Priority date: 2017-11-14
Filing date: 2017-11-14
Publication date: 2021-04-16
Anticipated expiration: 2037-11-14
Also published as: CN107818523A

Abstract

The invention relates to a method for discriminating and inferring true values of power communication system data based on unstable frequency distribution and frequency factor learning. Massive monitoring data collected in the power communication system is learned and analyzed for rules by combining a prescribed truth value discrimination method with a prediction function, so that unstable frequency distributions point to abnormal points in the power communication system and the truth of the data can be judged. Using historical data, abnormal data is located automatically and its true values are inferred and completed, which improves the data quality in the power communication system.

Description

Electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning

Technical Field

The invention belongs to applied research on fusing electric power communication data with big data and machine learning technology. By feeding unstable frequency patterns into a frequency factor learning function for machine learning, it learns from and analyzes the rules of massive monitoring data collected in an electric power communication system, automatically locates abnormal data, and completes true value inference, so as to improve the data quality in the electric power communication system.

Background

The electric power communication system is a broad concept that generally refers to the various subsystems related to the power grid and the data they generate. With the continuous development of the power grid in China, power demand keeps growing, the volume of data generated in the electric power communication system keeps increasing, the data is generated ever faster, and data structures differ greatly among subsystems, so the data generated by the electric power communication system has become typical big data.

The power communication system is an important system for guaranteeing the normal operation of the power system. Equipment is monitored through various sensors, which supports decisions on equipment failure and provides a basis for equipment maintenance. A large-scale power communication system generates massive monitoring data, and this data is inevitably distorted to some extent during acquisition, recording, transmission, exchange and storage. In practice, distorted data has become an important obstacle to locating and analyzing faults of power equipment, and improving the data quality of the power communication system is an important part of perfecting the power grid system. Experts at home and abroad have proposed various solutions for detecting distorted data in power systems. Document [1] studies the causes of data distortion in an energy management system and addresses the problem starting from those causes. Document [2] attempts to improve data quality from the data platform side. Document [3] predicts data quality from the perspective of interpolation fitting. Document [4] is based on a data verification technology between different systems that uses the Common Information Model (CIM) high-speed model exchange format CIM/E text as a carrier, and improves the overall data quality of the power grid dispatching system by screening multi-source data for better-quality data and by feeding back on field data according to master station state estimation.

The above bad-data detection methods based on power equipment state estimation are somewhat effective at improving the quality of local data in a local system, but they are not well suited to the multi-source heterogeneous big data generated by the whole power communication system, and the cost of establishing a corresponding knowledge base for each kind of data distortion is relatively high. The frequency factor learning function is based on machine learning technology, which makes the algorithm more intelligent: no knowledge base needs to be established, the algorithm learns as it is used, and the cost is reduced. Meanwhile, the format of multi-source heterogeneous data is unified by converting the data to frequencies. The algorithm was verified on a real data set of the national power grid, and the experimental results show that the method is suitable for distortion discrimination and truth value inference on multi-source heterogeneous data generated by the power communication system in a big data environment.

Disclosure of Invention

The power communication system is an important system for guaranteeing the normal operation of the power system, and equipment is monitored through various sensors, so that a decision is provided for equipment failure, and a basis is provided for equipment maintenance. A large-scale power communication system generates massive monitoring data, and the data inevitably has a data distortion phenomenon in the processes of acquisition, recording, transmission, exchange and storage. In reality, these distorted data have become important obstacles for locating and analyzing faults of power equipment. Data anomalies occurring in power communication systems mainly include the following forms:

1. Violation of monitoring data accuracy. Accuracy refers to how close the value finally used for analysis and decision making, after the monitored data has been recorded, transmitted, exchanged and stored, is to the real value.

2. Violation of monitoring data consistency. Consistency refers to whether the data actually recorded by the system satisfies the required functional dependencies or logical relationships, whether it stays within the domain of the attribute, and whether it is consistent with reality.

3. Violation of the dimensional uniformity of monitoring data. Dimensional uniformity refers to whether data of the same attribute uses a uniform unit of measurement.

4. Violation of monitoring data integrity. Integrity refers to whether data actually recorded by the power information system is missing, i.e. whether all data required by the design has been completely recorded.

The first three types can be summarized as data accuracy problems, and the fourth type can be regarded as a data missing problem. At present, the data acquired by the power communication system shows a large amount of plainly visible data loss, and inaccurate data also fills the database. These phenomena are caused partly by problems of the power monitoring system itself, partly by accidental errors during data entry, and partly by system incompatibilities during system upgrades.

To address the low data quality in conventional power communication systems, the invention aims to establish a machine learning discrimination method that automatically identifies distorted power data, locates the distortion, and infers the true values.

To accomplish the above objectives, the present invention comprises four steps as a whole, and the whole flow chart is shown in fig. 1, comprising the following steps:

Definition 1: Stable period of an attribute

Wherein e_i(t₀) represents the value of attribute e_i at time t₀, and t represents the minimum stable period of the attribute, i.e. after time t the value of the attribute returns to a value that does not differ significantly from the value of e_i at the initial moment;

ε denotes a small positive deviation degree that bounds the maximum deviation of the attribute value during the stable period;

Definition 2: Stable period of an attribute set

T＝m(t₁,t₂,…,t_n)

Wherein, T is the stable period of the attribute set, which represents the least common multiple of all the stable periods of the attributes in the power data set, and m is the mapping symbol for extracting the least common multiple;

Definition 3: Stable state set

Wherein each stable value is the value of the corresponding attribute's data after one stable period T of the attribute set; this value is usually relatively close to the initial value e_i(t);

the stable state set of the power data attribute set A is formed by combining the stable values corresponding to all attributes in the set A; the practical meaning of the stable mode is that it describes how the stable values of normal attribute values are distributed within a small period;

Definition 4: Extraction of unstable frequencies

Wherein f_i(t) represents the unstable frequency of attribute e_i; N[e_i′(t)] represents the count of out-of-bounds deviations of attribute e_i observed while it is traversed during the stable period; the total count represents the number of times attribute e_i is traversed during the stable period; D(e_i) denotes the domain of e_i within which the deviation degree is not exceeded;

Definition 5: Unstable frequency distribution

F_A(t)＝[f₁(t),f₂(t),…,f_n(t)]

Wherein F_A(t) is called the unstable frequency distribution; it represents the distribution of frequencies with which unstable attributes are detected in the power attribute set A during the traversal period, and it is defined as a vector so that it can serve as the machine learning input in step 3;

Definition 6: Unstable frequency distribution label set; the correspondence is shown schematically in Fig. 2

D_train(A)＝{(F_A⁽¹⁾,y⁽¹⁾),(F_A⁽²⁾,y⁽²⁾),...,(F_A⁽ⁿ⁾,y⁽ⁿ⁾)}

Wherein D_train(A) represents the unstable frequency label vector, which is essentially a training data set composed of the unstable frequency distribution F_A⁽ⁱ⁾ of the i-th time period and the corresponding equipment fault label y⁽ⁱ⁾; the data labels can be obtained by assigning numerical values to the error codes of errors occurring in the system, and serve only for classification;

Definition 7: Unstable frequency distribution matrix

Wherein F represents the unstable frequency distribution matrix, an algebraic structure formed by stacking, as row vectors, the unstable frequency distributions obtained by the i-th traversal in a stable period; this structure is convenient to feed into the algorithm and is the standard input format of the frequency factor learning algorithm;

the method specifically comprises the following steps:

Step 1: extract the stable mode of the power data set. Based on the constructed power data set, determine the attributes to be tested of the power equipment it contains:

A＝{e₁,e₂,e₃,…,e_n}

wherein A represents the set of power data attributes, and e_i, i∈[1,n], represents the n attributes (e.g., network element ID, current, device temperature, humidity, time, etc.);

then set the deviation degree, determine the stable period of the attribute set, and extract the stable state set.

Step 2: construct the unstable frequency distribution. The data structures of the multi-source heterogeneous data generated by the power communication system are unified by converting the data to frequencies, so that the data can be fed conveniently into the algorithm of step 3; the unstable frequencies are then extracted and the unstable frequency distribution is constructed.

Step 3: unstable frequency factor learning, which specifically comprises the following steps:

Step 3.1: construct the unstable frequency label vector and the unstable frequency distribution matrix

Step 3.2: frequency factor learning. Parameters are learned using a frequency factor learning function;

wherein the function value serves as the regression label of the learning function; F_i (i = 0,1,2,…) is an unstable frequency distribution (an argument in vector form); in particular, when the i-th attribute has no data in the unit pattern period, its frequency is assigned the value 1, indicating that data is missing and data integrity is violated; different unstable frequency distributions in effect depict the degree to which different attributes show abnormal data, together with all combinations of anomalies; w_j (j = 0,1,2,…) are the univariate learning parameters; v_i, v_j are the hidden parameters of the cross variables F_i, F_j respectively, and are the key parameters that embody the learning advantage of the frequency factor learning algorithm; <v_i,v_j> is the inner product of the hidden parameter vectors v_i and v_j; during optimization of the objective function the hidden parameters are used to resolve the implicit relation between two different unstable frequency distributions; meanwhile, since i ≠ j, the autocorrelation influence of an unstable distribution is excluded, which effectively avoids overfitting;

λ(F_i) is the trigger factor: when the i-th traversed attribute set is an empty set (i.e. all data in the attribute set is missing), the trigger factor switches off the learning function and activates the index function g(F_i)

Step 3.3: optimal solution

Depending on the equipment fault type, either a continuous numerical fault label or a classification fault label can be set, and a regression-type or a classification-type objective loss function, respectively, is used for optimization;

Regression objective loss function (note that in this case λ = 1)

Classification objective loss function

When y = 1:

When y = −1:

The above represents a hinge-loss type classification optimization loss function, where max{ } denotes the maximum of the values in the braces; the hinge-loss type objective function predicts, through the sign of the estimated value, whether a specific unstable frequency distribution points to a power equipment error or a data entry error;

Whichever objective function is used, the optimization goal is to solve for the parameters of the learning function that minimize the objective loss function value, namely:

wherein Θ* represents the set of parameters in the learning function, including the single-factor parameters w_i and the cross-term parameters v_i, v_j, i,j∈Z⁺, i＜j.

The optimal parameters obtained are substituted into the learning function, which thereby becomes a prediction function

Wherein

the prediction function is the learning function with the optimal parameters substituted in; when a new unstable frequency distribution is input, the function gives a predicted value of the state classification of the power equipment, and when the substituted parameters have been optimized by training on a large amount of historical data, the prediction function value converges to the true value;

Step 4: truth value discrimination and inference completion, which specifically comprises the following steps:

Step 4.1: discrimination flow

Type (1) data distortion discrimination

When the prediction function value converges to the normal label value of the equipment, the attribute data whose entries in the corresponding unstable frequency distribution are greater than 0, together with the empty-set elements, are judged to be distorted;

Type (2) data distortion discrimination

When the prediction function value converges to a specific abnormal label of the equipment, the attribute data whose entries in the corresponding unstable frequency distribution are equal to 0, together with the empty-set elements, are judged to be distorted;

Type (3) data distortion discrimination

When all elements of the unstable frequency distribution are empty sets, the data is entirely missing, and the distribution as a whole is judged to be distorted;

Step 4.2: truth value inference and completion flow

When type (1) data distortion occurs, the distorted data is overrun data that exceeds the preset deviation degree; the attribute is completed by assigning the most frequent value among the historical data that did not overrun;

when type (2) data distortion occurs, the distorted data is stable data that does not exceed the preset deviation degree (the equipment has changed but the data has not changed accordingly); the attribute is completed by assigning the most frequent value among the overrun historical data;

when type (3) data distortion occurs, the distorted data is empty-set data, and completion distinguishes two cases: when the equipment operates normally, completion follows the type (1) mode; when the equipment operates abnormally or has changed, completion follows the type (2) mode.

According to the invention, massive monitoring data collected in the power communication system is learned and analyzed for rules by combining a prescribed truth value discrimination method with a prediction function, so that unstable frequency distributions point to abnormal points in the power communication system and the truth of the data can be judged. Using historical data, abnormal data is located automatically and its true values are inferred and completed, which improves the data quality in the power communication system.

Drawings

Fig. 1 is a general flow chart.

Fig. 2 is a schematic diagram of a correspondence relationship between an unstable frequency distribution and a tag set.

FIG. 3 is a flow chart of truth discrimination and inferred completion.

Detailed Description

To accomplish the above objective, the invention is divided into four steps as a whole; the overall flow chart is shown in Fig. 1.

Step 1: extraction of the stable mode of the power data set

This step is divided into three sub-steps.

Step 1.1: construct a power data set and determine the attributes to be tested of the power equipment it contains

A＝{e₁,e₂,e₃,…,e_n}

Wherein A represents the set of power data attributes, and e_i, i∈[1,n], represents the n attributes (e.g., network element ID, current, device temperature, humidity, time, etc.) monitored for the environment in which the power equipment is located. The principles for determining the attributes are:

(1) Select the necessary attributes. The attributes carried by the data on which truth discrimination is to be performed are called necessary attributes, and they must be selected.

(2) Select the associated attributes. Attributes related to a necessary attribute are called associated attributes. In subsequent processing, an associated attribute serves only as an auxiliary machine learning basis for truth discrimination and inference on the necessary attribute (for example, if the attribute to be tested is the device temperature, the ambient temperature can be chosen as an associated attribute). The system does not perform truth discrimination on the associated attributes themselves, so their selection can be decided flexibly according to the specific situation.

Step 1.2: set the deviation degree and determine the stable period of the attribute set.

Definition 2: Stable period of an attribute

ε denotes a small positive deviation degree that bounds the maximum deviation of the attribute value during the stable period.
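A minimal sketch of how the minimum stable period of a single attribute might be estimated from sampled data. It assumes the stability criterion |e_i(t₀＋t)－e_i(t₀)|≤ε as read from the description, uniformly spaced samples, and a hypothetical helper name `min_stable_period`; the sample values are invented for illustration.

```python
def min_stable_period(values, eps):
    """Return the smallest lag t >= 1 at which every sample lies within eps
    of the sample t steps earlier, or None if no lag qualifies.

    Assumption: stability means |e(t0 + t) - e(t0)| <= eps for every start
    index t0 that fits inside the series.
    """
    n = len(values)
    for t in range(1, n):
        if all(abs(values[k + t] - values[k]) <= eps for k in range(n - t)):
            return t
    return None


# Hypothetical device temperature readings that repeat every 3 samples.
temperature = [40.0, 42.0, 41.0, 40.1, 42.1, 40.9, 40.0, 41.9, 41.0]
period = min_stable_period(temperature, eps=0.3)
```

With these readings the lags 1 and 2 fail the criterion and lag 3 satisfies it, so the estimated stable period is 3 sampling intervals.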

Definition 3: Stable period of an attribute set

T＝m(t₁,t₂,…,t_n)

Where T is the stability period of the attribute set, which represents the least common multiple of the stability periods of all attributes in the power data set, and m is the mapping notation from which the least common multiple is extracted.
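The mapping m can be sketched as a least common multiple over the per-attribute stable periods. The function name `attribute_set_period` is hypothetical, and skipping attributes whose period is 0 (constant attributes) is an assumption made here so that they do not force T to 0.

```python
from functools import reduce
from math import gcd


def attribute_set_period(periods):
    """Least common multiple of the per-attribute stable periods.

    Assumption: periods are positive integers counted in sampling
    intervals; a period of 0 (constant attribute) is skipped.
    """
    def lcm(a, b):
        return a * b // gcd(a, b)

    positive = [p for p in periods if p > 0]
    return reduce(lcm, positive, 1)


# Hypothetical stable periods of three attributes, in sampling intervals.
T = attribute_set_period([4, 6, 10])
```

For the periods 4, 6 and 10 this gives T = 60, the shortest span after which all three attributes have completed a whole number of stable periods.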

Step 1.3: extraction of the stable state set

Definition 4: Stable state set

Wherein the stable state set of the power data attribute set A is formed by combining the stable values corresponding to all attributes in the set A. The practical meaning of the stable mode is that it describes how the stable values of normal attribute values are distributed within a small period.

It should be noted that the stable period T is the minimum time span over which every attribute value remains stable; when the data is viewed over a longer time span, it may show various trends. For constant attributes (e.g., device location, network element ID, etc.), the stable period can be taken as 0. Discrete non-numerical attribute values are encoded by assigning an integer to each category, after which the method applies in the same way.

Step 2: construction of the unstable frequency distribution

In this step, the data structures of the multi-source heterogeneous data generated by the power communication system are unified by converting the data to frequencies, so that the data can be fed conveniently into the algorithm of step 3. The step is divided into two sub-steps.

Step 2.1: extraction of unstable frequencies

Definition 5: Extraction of unstable frequencies

Wherein the total count represents the number of times attribute e_i is traversed during the stable period, and D(e_i) denotes the domain of e_i within which the deviation degree is not exceeded.

Step 2.2: construction of the unstable frequency distribution

Definition 6: Unstable frequency distribution

F_A(t)＝[f₁(t),f₂(t),…,f_n(t)]

Wherein F_A(t) is called the unstable frequency distribution; it represents the distribution of frequencies with which unstable attributes are detected in the power attribute set A during the traversal, and it is defined as a vector so that it can serve as the machine learning input in step 3.

The practical meaning of the unstable frequency is that it statistically flags "abrupt" attribute values that do not follow the normal pattern of change. According to step 2.1, an unstable frequency arises when a change in the data exceeds the preset deviation degree limit ε, and the more often the data exceeds the limit within a unit time period, the larger the corresponding unstable frequency value.
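A minimal sketch of steps 2.1 and 2.2, assuming the unstable frequency is the ratio of out-of-bounds samples to traversed samples within one stable period, that deviation is measured against the attribute's stable value, and that an attribute without data receives the frequency 1. The function names, attribute names and readings are hypothetical.

```python
def unstable_frequency(samples, stable_value, eps):
    """Share of samples whose deviation from the stable value exceeds eps.

    An attribute with no samples gets frequency 1.0, following the
    convention that missing data is counted as a violation.
    """
    if not samples:
        return 1.0
    out_of_bounds = sum(1 for x in samples if abs(x - stable_value) > eps)
    return out_of_bounds / len(samples)


def unstable_frequency_distribution(window, stable_state, eps):
    """Build the vector [f_1, ..., f_n] for one traversal.

    window:       attribute name -> samples seen in the stable period
    stable_state: attribute name -> stable value (fixes the vector order)
    eps:          attribute name -> allowed deviation
    """
    return [
        unstable_frequency(window.get(name, []), stable_state[name], eps[name])
        for name in stable_state
    ]


# Hypothetical readings for one stable period.
stable_state = {"current": 5.0, "temperature": 40.0, "humidity": 0.5}
eps = {"current": 0.5, "temperature": 1.0, "humidity": 0.1}
window = {
    "current": [5.0, 5.1, 7.5, 5.0],       # one overrun out of four
    "temperature": [40.0, 40.2, 39.9, 40.1],
    "humidity": [],                         # nothing recorded
}
F_A = unstable_frequency_distribution(window, stable_state, eps)
```

Here `F_A` evaluates to `[0.25, 0.0, 1.0]`: one current reading in four overruns, the temperature stays within bounds, and the missing humidity data is marked with 1.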

Step 3: unstable frequency factor learning

The data quality of the power communication system is quantified by the proportion of true values. Within a stable period, when the equipment in the power or grid system operates normally and stably, the detected or recorded data reflects that physical stability and falls into the stable set defined by the deviation degree set in step 1.2 (its complement is the unstable distribution). However, if the system is physically abnormal or a data entry error has occurred and yet the data remains stable over multiple traversals, or if the system has no problem and yet the data exceeds the deviation limit over multiple traversals, the data can be regarded as distorted, which is a data accuracy problem. If the data is wholly or partially missing, this is a data integrity problem.

These problems can be solved uniformly by the unstable frequency factor learning algorithm. This step is divided into four sub-steps.

Step 3.1: construct the unstable frequency label vector

Definition 7: Unstable frequency distribution label set; the correspondence is shown schematically in Fig. 2

Wherein D_train(A) represents the unstable frequency label vector, which is essentially a training data set composed of the unstable frequency distribution F_A⁽ⁱ⁾ of the i-th time period and the corresponding equipment fault label y⁽ⁱ⁾. The data labels can be obtained by assigning numerical values to the error codes of errors occurring in the system, and serve only for classification.

Step 3.2: construct the unstable frequency distribution matrix

Definition 8: Unstable frequency distribution matrix

Wherein F represents the unstable frequency distribution matrix, an algebraic structure formed by stacking, as row vectors, the unstable frequency distributions obtained by the i-th traversal in a stable period; this structure is convenient to feed into the algorithm and is the standard input format of the frequency factor learning algorithm.
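Steps 3.1 and 3.2 can be sketched as pairing each traversal's distribution with a numeric label derived from a system error code and stacking the distributions row by row. The function name `build_training_set`, the error codes and the label values are hypothetical.

```python
def build_training_set(distributions, error_codes, code_to_label):
    """Return (F, D_train) for a list of unstable frequency distributions.

    F is the matrix with one distribution per row; D_train pairs every
    row with the numeric label assigned to that traversal's error code.
    """
    if len(distributions) != len(error_codes):
        raise ValueError("one error code is required per traversal")
    width = len(distributions[0])
    if any(len(row) != width for row in distributions):
        raise ValueError("all distributions must cover the same attributes")
    matrix = [list(row) for row in distributions]
    labels = [code_to_label[code] for code in error_codes]
    return matrix, list(zip(matrix, labels))


# Hypothetical mapping from system error codes to class labels.
code_to_label = {"OK": -1, "E_OVERHEAT": 1}
F, d_train = build_training_set(
    [[0.0, 0.0, 0.0], [0.25, 0.5, 0.0]],
    ["OK", "E_OVERHEAT"],
    code_to_label,
)
```

The labels only separate classes, so any consistent numeric assignment would serve; the values -1 and 1 are chosen here to match a sign-based classifier.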

Step 3.3: frequency factor learning

The method designs a frequency factor learning function, and parameters are learned using this function.

Definition 8

Wherein the function value serves as the regression label of the learning function; F_i (i = 0,1,2,…) is an unstable frequency distribution (an argument in vector form); in particular, when the i-th attribute has no data in the unit pattern period, its frequency is assigned the value 1, indicating that data is missing and data integrity is violated. Different unstable frequency distributions in effect depict the degree to which different attributes show abnormal data, together with all combinations of anomalies. w_j (j = 0,1,2,…) are the univariate learning parameters; v_i, v_j are the hidden parameters of the cross variables F_i, F_j respectively, and are the key parameters that embody the learning advantage of the frequency factor learning algorithm; <v_i,v_j> is the inner product of the hidden parameter vectors v_i and v_j. During optimization of the objective function, the hidden parameters are used to resolve the implicit relation between two different unstable frequency distributions. Meanwhile, since i ≠ j, the autocorrelation influence of an unstable distribution is excluded, which effectively avoids overfitting.

λ(F_i) is the trigger factor: when the i-th traversed attribute set is an empty set (i.e. all data in the attribute set is missing), the trigger factor switches off the learning function and activates the index function g(F_i). The specific expression of this function may be defined according to actual conditions; its purpose is to point the missing values directly at the power communication equipment to which the attribute belongs. The design idea is as follows: for abnormal data with unstable values, a learning function is needed to determine the truth, which falls under data accuracy; missing data, however, definitely violates data integrity, so the index function is activated directly to determine the physical cause of the distortion, without any verification by the learning function. The parameter u is a dimension-unifying factor: once the analytic expression of g(F_i) has been determined, the factor u is used to bring it to the same dimension as the learning function.
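The description of w, v and the inner product <v_i,v_j> over pairs i ≠ j matches a second-order factorization machine, so the learning function can be sketched under that assumption. The exact formula, the way the trigger factor combines the two branches, and the names `fm_predict`, `learning_function` and `index_fn` are assumptions; a missing attribute is represented here by `None`.

```python
def fm_predict(x, w0, w, v):
    """Bias + linear terms + pairwise terms <v_i, v_j> * x_i * x_j, i < j."""
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += inner * x[i] * x[j]
    return linear + pairwise


def learning_function(x, w0, w, v, index_fn, u=1.0):
    """Apply the trigger factor before predicting.

    If every attribute is missing, the learning branch is switched off and
    the index function (scaled by u) is used instead. Otherwise a missing
    attribute is given the frequency 1 and the FM-style branch is used.
    """
    if all(f is None for f in x):
        return u * index_fn(x)
    filled = [1.0 if f is None else f for f in x]
    return fm_predict(filled, w0, w, v)


# Hypothetical parameters: 3 attributes, hidden vectors of length 2.
w0, w = 0.1, [1.0, 2.0, -1.0]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
partial = learning_function([0.5, 0.0, None], w0, w, v, index_fn=lambda x: 1.0)
empty = learning_function([None, None, None], w0, w, v, index_fn=lambda x: 1.0, u=3.0)
```

In the first call only the pair (attribute 1, attribute 3) contributes a cross term; the second call bypasses the learned parameters entirely and returns the scaled index value.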

Step 3.4: optimal solution

Depending on the equipment fault type, either a continuous numerical fault label or a classification fault label can be set, and a regression-type or a classification-type objective loss function, respectively, is used for optimization.

Definition 9: Regression objective loss function (note that in this case λ = 1)

Wherein the loss term is the objective loss function, the prediction term corresponds to the frequency factor learning function of Definition 8, and y⁽ⁱ⁾ is the abnormality label or entry-error label of the power equipment recorded in the real situation. The optimization goal is to minimize the value of the objective loss function; in practical terms, this means determining the learning function such that its value is as close as possible to the abnormality label of the power equipment. The factor 1/2 is included so that the expression stays concise when partial derivatives are taken in the subsequent optimization.
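A short sketch of the regression objective, assuming the usual half sum of squared errors between the learning function values and the recorded labels. The function names are hypothetical; the second function shows why the factor 1/2 keeps the derivative simple.

```python
def regression_loss(predictions, labels):
    """Half the sum of squared differences between predictions and labels."""
    return 0.5 * sum((p - y) ** 2 for p, y in zip(predictions, labels))


def regression_loss_gradient(prediction, label):
    """Derivative of 1/2 * (p - y)^2 with respect to p: the 2 cancels."""
    return prediction - label


loss = regression_loss([1.0, 2.0], [0.0, 4.0])
grad = regression_loss_gradient(2.0, 4.0)
```

For these values the loss is 0.5 × (1 + 4) = 2.5 and the gradient for the second sample is −2.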

When the abnormality label or entry-error label of the power equipment has a discrete classification structure, the objective loss can be defined as a hinge-loss type (note that λ = 1).

Definition 10: Classification objective loss function

When y = 1:

When y = −1:

Definition 10 denotes a hinge-loss type classification optimization loss function, where max{ } denotes the maximum of the values in the braces; the hinge-loss type objective function predicts, through the sign of the estimated value, whether a particular unstable frequency distribution points to a power equipment error or a data entry error.
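A sketch of the classification objective, assuming the standard hinge loss max{0, 1 − y·ŷ} with labels y ∈ {1, −1}, which reduces to max{0, 1 − ŷ} for y = 1 and max{0, 1 + ŷ} for y = −1. The margin of 1 and the function names are assumptions.

```python
def hinge_loss(prediction, label):
    """Standard hinge loss for a label of +1 or -1."""
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    return max(0.0, 1.0 - label * prediction)


def classify(prediction):
    """Class decision from the sign of the estimated value."""
    return 1 if prediction >= 0 else -1


confident_hit = hinge_loss(2.0, 1)     # beyond the margin, no loss
weak_hit = hinge_loss(0.5, 1)          # right sign, inside the margin
miss = hinge_loss(0.5, -1)             # wrong sign
```

A prediction on the correct side but inside the margin is still penalized, which pushes the learned function toward clearly signed outputs.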

The optimization goal can be reached with stochastic gradient descent (SGD): the gradient direction is obtained by taking the partial derivative with respect to each parameter of the learning function, a step size is set along the direction determined by the gradient, and a locally optimal solution is obtained by iterative updating. The algorithm is as follows:

1. Regression objective loss optimization iteration:

In the regression objective loss optimization iteration, the parameters are updated along the gradient direction, and δ is the step size of each iterative update. δ must be preset according to the specific problem and should be moderate: if the step size is too large, the optimization algorithm may fail to converge; if it is too small, too many iterations are needed and computing resources are wasted.
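A minimal sketch of the regression-type iteration, assuming a second-order factorization machine as the learning function and the half squared error as the loss. The update rules are the standard partial derivatives for that model rather than formulas taken from this text; the function names, step size, hidden dimension `k` and toy data are hypothetical.

```python
import random


def predict(x, w0, w, v):
    """FM-style prediction; the pairwise sum uses the identity
    sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(v[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s2 = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s2)
    return y


def total_loss(rows, labels, w0, w, v):
    return 0.5 * sum((predict(x, w0, w, v) - y) ** 2 for x, y in zip(rows, labels))


def sgd_train(rows, labels, k=2, step=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    n = len(rows[0])
    w0, w = 0.0, [0.0] * n
    v = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    for _ in range(epochs):
        order = list(range(len(rows)))
        rng.shuffle(order)
        for idx in order:
            x, y = rows[idx], labels[idx]
            err = predict(x, w0, w, v) - y          # dLoss/dPrediction
            sums = [sum(v[i][f] * x[i] for i in range(n)) for f in range(k)]
            w0 -= step * err                         # dPrediction/dw0 = 1
            for i in range(n):
                w[i] -= step * err * x[i]            # dPrediction/dw_i = x_i
                for f in range(k):
                    grad_v = x[i] * (sums[f] - v[i][f] * x[i])
                    v[i][f] -= step * err * grad_v
    return w0, w, v


# Hypothetical unstable frequency distributions with continuous fault labels.
rows = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.8, 0.2], [0.6, 0.7, 0.0]]
labels = [0.0, 1.0, 1.0, 2.0]
zero_v = [[0.0, 0.0] for _ in range(3)]
loss_before = total_loss(rows, labels, 0.0, [0.0] * 3, zero_v)
w0, w, v = sgd_train(rows, labels)
loss_after = total_loss(rows, labels, w0, w, v)
```

Comparing `loss_before` with `loss_after` gives a quick check that the chosen step size lets the iteration make progress; a much larger step can make the loss grow instead.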

2. Classification objective loss optimization iteration:

Wherein

the prediction function is the learning function with the optimal parameters substituted in. When a new unstable frequency distribution is input, the function gives a predicted value of the state classification of the power equipment, and when the substituted parameters have been optimized by training on a large amount of historical data, the prediction function value converges to the true value.

Step 4: truth value discrimination and inference completion

This step is divided into two sub-steps; a schematic diagram of the step is shown in Fig. 3.

Step 4.1: discrimination flow

Type (1) data distortion discrimination

When the prediction function value converges to the normal label value of the equipment, the attribute data whose entries in the corresponding unstable frequency distribution are greater than 0, together with the empty-set elements, are judged to be distorted.

Type (2) data distortion discrimination

When the prediction function value converges to a specific abnormal label of the equipment, the attribute data whose entries in the corresponding unstable frequency distribution are equal to 0, together with the empty-set elements, are judged to be distorted.

Type (3) data distortion discrimination

When all elements of the unstable frequency distribution are empty sets, the data is entirely missing, and the distribution as a whole is judged to be distorted.
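The three discrimination rules can be sketched as a single function. It assumes the prediction has already been mapped to a discrete label, that an empty-set element is represented by `None`, and that the type (3) check takes precedence; the function name and label values are hypothetical.

```python
def discriminate(distribution, predicted_label, normal_label):
    """Return (distortion_type, indices of attributes judged distorted).

    distribution: unstable frequencies, with None for an empty-set element.
    """
    if all(f is None for f in distribution):
        # Type (3): everything is missing, the whole distribution is distorted.
        return 3, list(range(len(distribution)))
    if predicted_label == normal_label:
        # Type (1): equipment is normal, so overruns and gaps are distortion.
        return 1, [i for i, f in enumerate(distribution) if f is None or f > 0]
    # Type (2): equipment is abnormal, so unchanged data and gaps are distortion.
    return 2, [i for i, f in enumerate(distribution) if f is None or f == 0]


NORMAL, OVERHEAT = -1, 1   # hypothetical label values
case_normal = discriminate([0.25, 0.0, None], NORMAL, NORMAL)
case_fault = discriminate([0.25, 0.0, None], OVERHEAT, NORMAL)
case_empty = discriminate([None, None, None], NORMAL, NORMAL)
```

The same distribution is read in opposite ways depending on the predicted equipment state: under a normal label the overrun attribute is suspect, under a fault label the attribute that stayed stable is.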

Step 4.2 true value deduction and completion process

When type (1) data distortion occurs, the distorted data is overrun data exceeding the preset deviation degree, and the attribute is assigned the value with the maximum occurrence frequency in the non-overrun historical data.

When type (2) data distortion occurs, the distorted data is stable data that does not exceed the preset deviation degree (the equipment changes, but the data does not change correspondingly), and the attribute is assigned the value with the maximum occurrence frequency in the overrun historical data.

When type (3) data distortion occurs, the distorted data is empty-set data, and completion is divided into two cases: when the equipment operates normally, completion follows the type (1) data distortion mode; when the equipment operates abnormally or the equipment changes, completion follows the type (2) data distortion mode.
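The completion rules can be sketched as below. The sketch assumes that "overrun" means an absolute deviation from the attribute's stable value greater than the preset deviation degree; the text does not fix this measure here, so it is an assumption, as are the names `complete`, `stable_value`, `epsilon`, and `device_normal`.

```python
from collections import Counter
from typing import Optional, Sequence


def most_frequent(values: Sequence[float]) -> Optional[float]:
    """Value with the maximum occurrence frequency, or None if there is no data."""
    return Counter(values).most_common(1)[0][0] if values else None


def complete(history: Sequence[float], stable_value: float, epsilon: float,
             distortion_type: int, device_normal: bool = True) -> Optional[float]:
    """Deduce the completed value of one attribute from its historical data."""
    if distortion_type not in (1, 2, 3):
        raise ValueError("distortion_type must be 1, 2 or 3")
    # Assumed deviation measure: absolute distance from the stable value.
    within = [v for v in history if abs(v - stable_value) <= epsilon]
    overrun = [v for v in history if abs(v - stable_value) > epsilon]
    # Type (3): fall back to type (1) or type (2) depending on equipment state.
    if distortion_type == 3:
        distortion_type = 1 if device_normal else 2
    # Type (1) uses non-overrun history; type (2) uses overrun history.
    return most_frequent(within if distortion_type == 1 else overrun)


# Hypothetical history of one attribute with stable value 50.0 and deviation 0.5.
history = [50.0, 50.1, 50.0, 57.0, 57.0, 56.0, 50.0]
value_type1 = complete(history, 50.0, 0.5, distortion_type=1)  # 50.0
value_type2 = complete(history, 50.0, 0.5, distortion_type=2)  # 57.0
```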

Claims

1. The electric power communication system data truth value distinguishing and deducing method based on unstable frequency distribution and frequency factor learning is characterized by comprising the following steps of:

Definition 1: attribute stable period

Wherein e_i(t₀) represents the value of attribute e_i at time t₀; t represents the minimum stable period of the attribute, i.e. after time t the value of the attribute returns to a value that differs only slightly from the value of e_i at the initial time; ε represents a small positive deviation degree that defines the maximum deviation of the attribute value during the stable period;

Definition 2: attribute set stable period

T＝m(t₁,t₂,…,t_n)

Definition 3: stable state set

Wherein

represents the stable value of each attribute in the attribute set after one attribute set stable period T, this value being relatively close to the initial value e_i(t),

represents the stable state set of the power data attribute set A, which is formed by combining the stable values corresponding to all attributes in set A; the practical meaning of the stable mode is that it describes the distribution of the stable values of normal attribute values within a small period;

Definition 4: unstable frequency extraction

Wherein f_i(t) represents the unstable frequency of attribute e_i, and N[e'_i(t)] represents the count of out-of-bounds deviations of attribute e_i when it is traversed during the stable period,

Definition 5: unstable frequency distribution

F_A(t)＝[f₁(t),f₂(t),…,f_n(t)]

Definition 6: unstable frequency distribution label set

Wherein D_train(A) represents the unstable frequency label vector, which is essentially a training data set composed of the unstable frequency distribution F_A⁽ⁱ⁾ of the i-th time period and the corresponding equipment fault label y⁽ⁱ⁾; the data labels are obtained by assigning numerical values to the error codes of errors occurring in the system and serve only for classification;

Definition 7: unstable frequency distribution matrix

the method specifically comprises the following steps:

A＝{e₁,e₂,e₃,…,e_n}

wherein A represents the set of power data attributes, and e_i, i∈[1,n], represents the n attributes monitoring the environment in which the power equipment is located; then setting a deviation degree and determining the attribute set stable period; and extracting the stable state set;

step 2, constructing the unstable frequency distribution: unifying the data structures of the multi-source heterogeneous data generated by the power communication system by means of frequency conversion, so that the algorithm of step 3 can be conveniently introduced; then extracting the unstable frequencies and constructing the unstable frequency distribution;

step 3.1, constructing an unstable frequency label vector and an unstable frequency distribution matrix;

wherein

serves as the regression label of the learning function; F_i (i = 0, 1, 2, …) is an unstable frequency distribution, and when the i-th attribute has no data in the unit pattern period, the frequency quantity is assigned to 1, which indicates that data is missing and data integrity is violated; different unstable frequency distributions actually depict the degree of abnormality of the data of different attributes and all abnormal combination modes; w_i (i = 0, 1, 2, …) is a univariate learning parameter; v_i, v_j are the implicit parameters of the cross variables F_i, F_j respectively, which are the key parameters embodying the learning advantage of the frequency factor learning algorithm; <v_i,v_j> is the inner product of the implicit parameter vectors v_i, v_j; the implicit parameters are used to resolve the implicit relation between two different unstable frequency distributions in the objective function optimization stage; meanwhile, since i ≠ j, the autocorrelation influence of an unstable distribution is avoided and overfitting is effectively avoided;

λ(F_i) is the trigger factor: when the i-th traversed attribute set is an empty set, the trigger factor switches off the learning function and starts the index function g(F_i); u represents a dimension unification factor;

step 3.3 optimal solution

Setting a continuous numerical fault label or a classification fault label according to the equipment fault type, and optimizing with a regression-type target loss function or a classification-type target loss function respectively;

Constructing a regression objective loss function; note that in this case λ = 1

Constructing a classification objective loss function

When y = 1:

When y = -1:

wherein Θ^* represents the set of parameters in the learning function, including the single-factor parameters w_i and the cross-term parameters v_i, v_j, with i, j ∈ Z⁺, i ＜ j;

Wherein

is the prediction function with the optimal parameters substituted; when a brand-new unstable frequency distribution is input, the function gives a predicted value of the state classification of the power equipment, and when the substituted parameters are the optimized parameters obtained by training on a large amount of historical data, the prediction function value converges to the true value;

step 4.1 discrimination flow

Type (1) data distortion discrimination

Type (2) data distortion discrimination

Type (3) data distortion discrimination

step 4.2 true value deduction and completion process

when the type (2) data distortion occurs, the distortion data is stable data which does not exceed a preset deviation degree, and the value with the maximum occurrence frequency in the overrun historical data is taken to assign and complement the attribute;