CN111737907A - Transformer fault diagnosis method and device based on deep learning and DGA - Google Patents

Transformer fault diagnosis method and device based on deep learning and DGA Download PDF

Info

Publication number
CN111737907A
Authority
CN
China
Prior art keywords
deep learning
fault diagnosis
transformer fault
learning model
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010483448.9A
Other languages
Chinese (zh)
Inventor
刘力卿
王伟
张鑫
张弛
郗晓光
张春晖
李琳
冯军基
马昊
魏菊芳
姚创
段明辉
文清丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Tianjin Electric Power Co Ltd, Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202010483448.9A priority Critical patent/CN111737907A/en
Publication of CN111737907A publication Critical patent/CN111737907A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/20 - Design optimisation, verification or simulation
    • G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention relates to a transformer fault diagnosis method and device based on deep learning and DGA. The method comprises the following steps: taking the volume fractions of the DGA characteristic gases H2, CH4, C2H6, C2H4 and C2H2 as the diagnostic sample; and inputting the sample to be diagnosed into the fine-tuned stacked sparse self-coding deep learning model for transformer fault diagnosis, and outputting the transformer fault diagnosis result. The method can obviously improve the transformer fault diagnosis effect and improve the accuracy of the diagnosis results when the training samples are unbalanced.

Description

Transformer fault diagnosis method and device based on deep learning and DGA
Technical Field
The invention belongs to the technical field of transformer fault diagnosis, relates to a transformer fault diagnosis method and device based on deep learning and DGA, and in particular relates to a transformer fault diagnosis method and device based on weighted comprehensive loss optimized deep learning and DGA.
Background
The power transformer is one of the most important devices in the power system, and accurate diagnosis and rapid handling of transformer faults are of great significance for improving the safe and stable operation of the power system. However, the structure and operating environment of the transformer are complex, and effectively improving the fault diagnosis effect remains a difficult problem.
Dissolved gas analysis (DGA) of the insulating oil is an important basis for transformer fault diagnosis, and simple, convenient diagnosis methods such as the three-ratio method and the improved three-ratio method have been developed on this basis. However, these methods suffer from over-rigid limits on the characteristic gas ratios, incomplete coding and other defects, which affect the fault diagnosis effect. To overcome these defects, artificial intelligence and machine learning methods such as expert systems, fuzzy theory, artificial neural networks (ANN) and support vector machines (SVM) have gradually been applied and have achieved good results. However, the diagnosis effect of expert-system-based transformer fault diagnosis is strongly influenced by prior knowledge; fuzzy-theory-based diagnosis has limited learning ability and its effect depends heavily on the initial clustering centres; the ANN converges slowly on large sample sets and easily falls into local optima; and the SVM is essentially a binary classifier, so transformer fault diagnosis requires multiple SVM classifiers whose parameters must each be tuned, which is cumbersome and affects the fault diagnosis effect.
Traditional transformer fault diagnosis methods based on artificial intelligence and machine learning are shallow learning methods; their limited learning ability and feature-mining ability affect the transformer fault diagnosis effect. Compared with shallow learning, deep learning has a multi-hidden-layer structure and can efficiently approximate complex inputs. Based on deep learning theory, deep mining and analysis of the original features can be accomplished with strong fault tolerance and extensibility. Deep learning has achieved outstanding results in feature learning, classification and prediction, and intelligent decision-making, and is being rapidly popularized and applied.
However, traditional deep-learning-based fault diagnosis methods are generally optimized with a cross-entropy loss function, which places no limit on the misjudgment probabilities and thus affects the fault diagnosis effect. In addition, owing to limitations of equipment, environment and other factors, complete samples are difficult to obtain during transformer operation and monitoring, and the resulting imbalance of the training samples easily degrades the fault diagnosis effect.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a transformer fault diagnosis method and device based on weighted comprehensive loss optimized deep learning and DGA, which can obviously improve the transformer fault diagnosis effect and improve the accuracy of the diagnosis results when the training samples are unbalanced.
The technical problem to be solved by the invention is realized by adopting the following technical scheme:
a transformer fault diagnosis method based on deep learning and DGA is characterized in that:
taking the volume fractions of DGA characteristic gases H2, CH4, C2H6, C2H4 and C2H2 as diagnostic samples;
inputting the sample to be diagnosed into the fine-tuned stacked sparse self-coding deep learning model for transformer fault diagnosis, and outputting the transformer fault diagnosis result.
Moreover, before inputting the sample to be diagnosed into the fine-tuned stacked sparse self-coding deep learning model for transformer fault diagnosis, the method further comprises the following steps:
constructing a stack type sparse self-coding deep learning model for transformer fault diagnosis;
setting an input layer of a stack type sparse self-coding deep learning model;
setting an output layer of a stack type sparse self-coding deep learning model;
pre-training network parameters of a stack type sparse self-coding deep learning model for transformer fault diagnosis;
and fine-tuning parameters of a stacked sparse self-coding deep learning model for transformer fault diagnosis.
Moreover, the specific steps of constructing the stacked sparse self-coding deep learning model for transformer fault diagnosis include:
(1) Let the input layer and the output layer each have N neurons and the hidden layer have M neurons, and let the input be x ∈ R^N; after encoding and decoding, x is represented as h ∈ R^M and x̂ ∈ R^N, respectively, and the encoding and decoding processes can be expressed by equation (1) and equation (2), respectively, that is:

h = f(Wx + b)   (1)

x̂ = g(W'h + b')   (2)

where W and W' are the encoding and decoding weight matrices, respectively; b and b' are the encoding and decoding bias vectors, respectively; f(·) and g(·) are the nonlinear activation functions of the encoding and decoding processes, for which the Sigmoid function can generally be adopted;
(2) The network parameters are adjusted so that x̂ ≈ x; when the reconstruction error of x is minimized, h is used as the intrinsic feature of the original input, and a sparsity constraint is imposed on the hidden layer, so that a Sparse Auto-Encoder (SAE) is formed. The SAE cost function is:

J_SAE = J(W, b) + β · Σ_{j=1}^{M} KL(ρ0 ‖ ρ̂_j)   (3)

where J(W, b) is the reconstruction error term; β is the sparsity penalty coefficient, which can generally be set to 0.3; KL(·) is the KL divergence; ρ0 is the sparsity parameter; and ρ̂_j is the mean activation of the j-th hidden-layer neuron over the training samples;
(3) Stacking multiple levels of SAEs in a stacked structure, each SAE's input layer being the output (hidden) layer of the previous SAE, thereby constructing the stacked sparse self-coding deep learning model for transformer fault diagnosis.
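By way of illustration only, the encoding, decoding and cost computation of a single SAE described in steps (1) and (2) can be sketched in Python as follows; the Sigmoid activation, β = 0.3 and ρ0 = 0.05 follow values mentioned in this document, while the squared-error reconstruction term and all function and variable names are assumptions of this sketch rather than part of the original.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sae_cost(X, W, b, W_dec, b_dec, beta=0.3, rho0=0.05):
        """Sparse auto-encoder cost: reconstruction error plus a KL-divergence
        sparsity penalty on the mean hidden activations (cf. equations (1)-(3))."""
        H = sigmoid(X @ W + b)                      # encoding, equation (1)
        X_hat = sigmoid(H @ W_dec + b_dec)          # decoding, equation (2)
        recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))   # reconstruction error
        rho_hat = H.mean(axis=0)                    # mean activation of each hidden neuron
        kl = np.sum(rho0 * np.log(rho0 / rho_hat)
                    + (1.0 - rho0) * np.log((1.0 - rho0) / (1.0 - rho_hat)))
        return recon + beta * kl                    # SAE cost, equation (3)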
Moreover, the specific method for setting the input layer of the stacked sparse self-coding deep learning model is as follows:
the input layer of the stacked sparse self-coding deep learning model for transformer fault diagnosis is set to 5 neurons, expressed as x = (x1, x2, x3, x4, x5)^T, whose elements represent the volume fractions of hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4) and acetylene (C2H2), respectively.
Moreover, the specific method for setting the output layer of the stacked sparse self-coding deep learning model is as follows:
the Softmax layer is set as the output layer of the fault diagnosis model; this layer contains 9 neurons, expressed as p = (p1, p2, p3, p4, p5, p6, p7, p8, p9)^T, whose elements represent the probabilities of diagnosing the respective states.
Moreover, the specific steps of pre-training the network parameters of the stacked sparse self-coding deep learning model for transformer fault diagnosis include:
(1) For the training set {(x_i, y_i) | i = 1, 2, …, N; y_i ∈ {1, 2, …, K}}, where x_i and y_i denote the feature vector and the state label of the i-th training sample, respectively, and there are K states in total: after the state labels of the training set are hidden, the original training set becomes the unlabelled data set {x_i | i = 1, 2, …, N};
(2) And removing the Softmax layer, adding a decoding layer, and sequentially realizing the training of network parameters of each layer of the SSAE deep learning model by adopting a layer-by-layer greedy method according to an SAE cost function based on label-free sample data.
Moreover, the specific step of fine-tuning the parameters of the stacked sparse self-coding deep learning model for transformer fault diagnosis includes:
(1) removing a traditional output layer of the SSAE deep learning model, adding a Softmax classification layer, and realizing optimization of network parameters of each layer of the SSAE based on a cross entropy loss function gradient value and a BP algorithm;
(2) For a training sample (x_i, y_i), after it passes through the SSAE deep learning model, the cross-entropy loss value C_i can be expressed as:

C_i = - Σ_{k=1}^{K} 1{k = y_i} · log p_k^i   (4)

where 1{k = y_i} is an indicator function whose value is 1 if k = y_i and 0 if k ≠ y_i, and p_k^i is the value of the k-th neuron of the Softmax layer, i.e. the probability with which the SSAE deep learning model judges the state to belong to the k-th class;
(3) For a sample (x_i, y_i), its comprehensive loss function L_i combines the correct-judgment loss R_i = C_i with the losses of misjudging the sample as a state k (k ≠ y_i), as given by equations (5) and (6) (rendered as images in the original);
(4) A loss weight value is assigned when the sample (x_i, y_i) is judged to be state k, as given by equation (7) (rendered as an image in the original);

(5) The loss weights are combined with the comprehensive loss function to form the weighted comprehensive loss function, as shown in equation (8) (rendered as an image in the original);
(6) A supervised learning mechanism is adopted and the sample fault types are added: the SSAE network parameters obtained by pre-training are first taken as the initial parameters, and the weighted loss function value between each training sample's predicted fault and its real fault is taken as the network convergence cost; the weighted loss function values of all training samples are then summed to obtain the total loss, and the network parameters of the SSAE diagnostic model are optimized and fine-tuned layer by layer using the gradient-descent BP algorithm, with the objective of minimizing the total loss.
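As an illustration of step (6), a schematic fine-tuning loop might look as follows; here `model` is assumed to be the pre-trained SSAE network with its Softmax layer, `weighted_loss_fn` stands for the weighted comprehensive loss of equation (8) (whose exact expression is given only as images in the original), and the optimizer settings are assumptions of this sketch.

    import torch

    def fine_tune(model, train_loader, weighted_loss_fn, epochs=100, lr=0.01):
        """Supervised fine-tuning: starting from the pre-trained SSAE parameters,
        minimise the total weighted loss over the training samples by gradient
        descent (BP algorithm)."""
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in train_loader:            # x: DGA feature vector, y: state label
                p = model(x)                     # Softmax probabilities of the 9 states
                loss = weighted_loss_fn(p, y)    # weighted comprehensive loss, equation (8)
                optimizer.zero_grad()
                loss.backward()                  # back-propagate gradients
                optimizer.step()                 # gradient-descent parameter update
        return model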
A transformer fault diagnosis device based on deep learning and DGA, comprising:
the diagnostic sample input module takes the volume fractions of DGA characteristic gases H2, CH4, C2H6, C2H4 and C2H2 as diagnostic samples;
and the to-be-detected transformer fault diagnosis result output module inputs the to-be-diagnosed sample into the fine-tuned stack type sparse self-coding deep learning model for transformer fault diagnosis and outputs a transformer fault diagnosis result.
Moreover, the transformer fault diagnosis device based on deep learning and DGA further comprises:
the transformer fault diagnosis model building module is used for building a stack type sparse self-coding deep learning model for transformer fault diagnosis;
the input layer setting module is used for setting an input layer of the stack type sparse self-coding deep learning model;
the output layer setting module is used for setting an output layer of the stacked sparse self-coding deep learning model;
the parameter pre-training module is used for pre-training network parameters of a stack type sparse self-coding deep learning model for transformer fault diagnosis;
and the parameter fine-tuning module is used for fine-tuning network parameters of the stacked sparse self-coding deep learning model for transformer fault diagnosis.
Moreover, the transformer fault diagnosis model building module includes:
the encoding and decoding module, for setting N neurons in the input and output layers and M neurons in the hidden layer, with an input x ∈ R^N that, after encoding and decoding, is represented as h ∈ R^M and x̂ ∈ R^N, respectively; the encoding and decoding processes can be expressed by equation (1) and equation (2), respectively, that is:

h = f(Wx + b)   (1)

x̂ = g(W'h + b')   (2)

where W and W' are the encoding and decoding weight matrices, respectively; b and b' are the encoding and decoding bias vectors, respectively; f(·) and g(·) are the nonlinear activation functions of the encoding and decoding processes, for which the Sigmoid function can generally be adopted;
the sparse self-encoder forming module, used for adjusting the network parameters so that x̂ ≈ x; when the reconstruction error of x is minimized, h is used as the intrinsic feature of the original input, and a sparsity constraint is imposed on the hidden layer, so that a Sparse Auto-Encoder (SAE) is formed, whose cost function is:

J_SAE = J(W, b) + β · Σ_{j=1}^{M} KL(ρ0 ‖ ρ̂_j)   (3)

where J(W, b) is the reconstruction error term; β is the sparsity penalty coefficient, which can generally be set to 0.3; KL(·) is the KL divergence; ρ0 is the sparsity parameter; and ρ̂_j is the mean activation of the j-th hidden-layer neuron over the training samples;
And the multi-level SAE stacking module, used for stacking multiple levels of SAEs in a stacked structure, each SAE's input layer being the output (hidden) layer of the previous SAE, so as to construct the stacked sparse self-coding deep learning model for transformer fault diagnosis.
The invention has the advantages and positive effects that:
the invention provides a transformer fault diagnosis method based on weighted comprehensive loss improved deep learning and DGA (differential global analysis), which uses DGA (differential global analysis) characteristic gas H2、CH4、C2H6、C2H4、C2H2The volume fraction of the volume fraction is used as input, a transformer fault diagnosis model is constructed by adopting a stack-type sparse self-coding deep learning theory, and meanwhile, network parameters of the deep learning model are finely adjusted by adopting a weighted comprehensive loss function, so that the adverse effect of unbalance of training samples is weakened, and the fault diagnosis effect is improved. The method provided by the invention can obviously improve the transformer fault diagnosis effect and improve the accuracy of the diagnosis result when the training samples are unbalanced.
Drawings
FIG. 1 is a diagram of a transformer fault diagnosis model according to the present invention;
FIG. 2 is a diagram of a basic self-encoder structure according to the present invention;
FIG. 3 is a flow chart of pre-training of network parameters of a transformer fault diagnosis model according to the present invention;
FIG. 4 is a flow chart of fine tuning of network parameters of the transformer fault diagnosis model of the present invention;
FIG. 5 is a flowchart of the transformer fault diagnosis method based on weighted integrated loss optimization deep learning and DGA according to the present invention.
Detailed Description
The embodiments of the invention will be described in further detail below with reference to the accompanying drawings:
example 1:
a transformer fault diagnosis method based on deep learning and DGA, as shown in fig. 1 to 5, comprising the following steps:
step 1, constructing a stack type sparse self-coding deep learning model for transformer fault diagnosis;
as shown in fig. 2, an Auto-encoder (AE) is one of typical structures of deep learning. The AE output layer and the input layer have equal neuron numbers, the process from the input layer to the hidden layer is coding, and the process from the hidden layer to the output layer is decoding.
The specific steps of the step 1 comprise:
(1) Let the input layer and the output layer each have N neurons and the hidden layer have M neurons, and let the input be x ∈ R^N; after encoding and decoding, x is represented as h ∈ R^M and x̂ ∈ R^N, respectively, and the encoding and decoding processes can be expressed by equation (1) and equation (2), respectively, that is:

h = f(Wx + b)   (1)

x̂ = g(W'h + b')   (2)

where W and W' are the encoding and decoding weight matrices, respectively; b and b' are the encoding and decoding bias vectors, respectively; f(·) and g(·) are the nonlinear activation functions of the encoding and decoding processes, for which the Sigmoid function can generally be adopted;
(2) The network parameters are adjusted so that x̂ ≈ x; when the reconstruction error of x is minimized, h can be used as the intrinsic feature of the original input;
When the number of hidden-layer neurons is large, the AE accuracy is high but the computational burden is large. To improve the convergence speed, a sparsity constraint can be imposed on the hidden layer, forming a Sparse Auto-Encoder (SAE), whose cost function is:

J_SAE = J(W, b) + β · Σ_{j=1}^{M} KL(ρ0 ‖ ρ̂_j)   (3)

where J(W, b) is the reconstruction error term; β is the sparsity penalty coefficient, which can generally be set to 0.3; KL(·) is the KL divergence; ρ0 is the sparsity parameter, which can generally be set to 0.05; and ρ̂_j is the mean activation of the j-th hidden-layer neuron over the training samples.
(3) A single SAE is still a shallow learning model. To achieve deep feature extraction, multiple levels of SAEs can be stacked, each SAE's input layer being the output (hidden) layer of the previous SAE, thereby constructing a stacked Sparse Auto-Encoder (SSAE) deep learning model. The SSAE deep learning model can obtain more complex and abstract deep feature expressions of the original input, as illustrated by the sketch below.
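A minimal sketch of this stacking idea is given here; the parameter container and the function names are assumptions of this illustration, not part of the original.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def ssae_features(x, encoder_params):
        """Propagate an input through the stacked SAEs: each SAE's hidden
        representation is the input of the next SAE, yielding a deep feature
        expression of the original input."""
        h = x
        for W, b in encoder_params:   # encoder weights/biases of each trained SAE
            h = sigmoid(h @ W + b)
        return h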
Step 2, setting an input layer of a stack type sparse self-coding deep learning model;
most of large-scale power transformers are oil-immersed, when the state of the transformer is abnormal, the components and the content of dissolved gas in oil can be changed along with the abnormal state, fault diagnosis of the transformer can be realized on the basis of DGA, and H is generally selected2、CH4、C2H6、C2H4、C2H2As the characteristic gas.
Therefore, as shown in fig. 1, the specific method of step 2 is:
the input layer of the stacked sparse self-coding deep learning model for transformer fault diagnosis is set to be 5 neurons (bias neurons are not considered, the same is applied below), and the input layer is respectively expressed as x (x ═ x-1,x2,x3,x4,x5)TThe elements thereof each represent H2、CH4、C2H6、C2H4、C2H2Volume fraction (× 10)-6)。
Step 3, setting an output layer of the stack type sparse self-coding deep learning model;
the common faults of the transformer mainly include low-temperature overheating, medium-temperature overheating, high-temperature overheating, low-energy discharge, high-energy discharge, low-energy discharge and overheating, high-energy discharge and overheating, partial discharge and the like, and the fault state of the transformer can be coded as shown in table 1.
TABLE 1  Transformer fault state labels and state coding
(Table 1 is given as an image in the original; it assigns a label and a status code to each of the nine states used by the diagnosis model, namely the fault types listed above together with the normal state.)
Therefore, as shown in fig. 1, the specific method of step 3 is:
the Softmax layer is set as an output layer of the failure diagnosis model, and the layer contains 9 neurons, which are respectively expressed as p ═ p (p)1,p2,p3,p4,p5,p6,p7,p8,p9)TThe elements thereof represent the probability of diagnosing as respective states.
Step 4, pre-training network parameters of a stack type sparse self-coding deep learning model for transformer fault diagnosis;
the pre-training phase is an unsupervised learning process.
The specific steps of the step 4 comprise:
(1) For the training set {(x_i, y_i) | i = 1, 2, …, N; y_i ∈ {1, 2, …, K}}, where x_i and y_i denote the feature vector and the state label of the i-th training sample, respectively, and there are K states in total: after the state labels of the training set are hidden, the original training set becomes the unlabelled data set {x_i | i = 1, 2, …, N}.
As shown in fig. 3, the model Softmax layer in fig. 1 is removed, a decoding layer is added, and based on unlabeled sample data, training of network parameters of each layer of the SSAE deep learning model is sequentially achieved by a layer-by-layer greedy method according to a loss function (SAE cost function) shown in formula (3).
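A sketch of this greedy layer-by-layer pre-training on the unlabelled data might look as follows; `train_single_sae` is a hypothetical helper assumed to minimise the SAE cost of formula (3) and to return the trained encoder weights of one SAE.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def pretrain_layer_by_layer(X_unlabelled, hidden_sizes, train_single_sae):
        """Greedy layer-wise pre-training: train one SAE on the current representation,
        encode the data with it, and use the codes as input for the next SAE."""
        params, H = [], X_unlabelled
        for m in hidden_sizes:
            W, b = train_single_sae(H, m)     # unsupervised training of one SAE (formula (3))
            H = sigmoid(H @ W + b)            # its hidden code becomes the next layer's input
            params.append((W, b))
        return params                         # initial parameters for supervised fine-tuning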
Step 5, fine tuning network parameters of a stack type sparse self-coding deep learning model for transformer fault diagnosis;
the fine tuning stage is a supervised learning process.
As shown in fig. 4, the specific steps of step 5 include:
(1) In this stage, the traditional output layer (i.e. the decoding layer) of the SSAE deep learning model is removed, a Softmax classification layer is added, and the network parameters of each layer of the SSAE are optimized based on the cross-entropy loss function gradient values and the BP algorithm.
For a training sample (x_i, y_i), after it passes through the SSAE deep learning model, the cross-entropy loss value C_i can be expressed as:

C_i = - Σ_{k=1}^{K} 1{k = y_i} · log p_k^i   (4)

where 1{k = y_i} is an indicator function whose value is 1 if k = y_i and 0 if k ≠ y_i, and p_k^i is the value of the k-th neuron of the Softmax layer, i.e. the probability with which the SSAE deep learning model judges the state to belong to the k-th class.
As can be seen from equation (4), for a training sample (x_i, y_i), the larger the probability with which the deep learning model correctly judges that the state belongs to y_i, the lower the cross-entropy loss value, and the better the effect is considered to be. However, this approach does not take into account the probabilities of misjudging the sample as other states. An ideal deep learning model should make the probability of a correct judgment high while keeping the probabilities of misjudgment as other states low, thereby increasing the discrimination between correct judgments and misjudgments and improving the deep learning fault diagnosis effect. In addition, in the actual operation and monitoring of transformers, complete samples are difficult to obtain: there are typically many training samples for the normal state (large-sample classes) and few training samples for fault states, or for certain specific fault types (small-sample classes), so the numbers of training samples for the various fault states are unbalanced. The deep learning model must extract features from the training samples; if a small-sample class has too few samples, the model's ability to extract features for that state is reduced, which affects the fault diagnosis effect. An ideal deep learning transformer fault diagnosis method should therefore, when the training-sample distribution is unbalanced, increase the learning strength on small-sample classes and thereby improve the fault diagnosis effect.
(2) In order to increase the discrimination between correct judgments and misjudgments of the deep learning model, the invention proposes a weighted comprehensive loss function. For a sample (x_i, y_i), its comprehensive loss function L_i, given by equation (5) (rendered as an image in the original), combines R_i, the cross-entropy loss of correctly judging the state y_i (R_i = C_i), with the losses Z_i of misjudging the sample as a state k (k ≠ y_i), which are given by equation (6) (rendered as an image in the original). As can be seen from expressions (4) to (6), the higher the judgment accuracy for the sample, the smaller the value of R_i; and the lower the probability of judging the sample as a wrong class, the lower the value of Z_i. The comprehensive loss function therefore makes the probability of correctly judging the sample higher and the probabilities of misjudgment lower, which increases the discrimination between correct judgments and misjudgments and improves the accuracy and stability of fault diagnosis.
(3) In order to increase the training strength on small samples, the invention introduces loss weights. The loss weight value assigned when the sample (x_i, y_i) is judged to be state k is given by equation (7) (rendered as an image in the original). From equation (7), the higher the probability with which the sample is correctly judged to be the state y_i, the lower the loss weight value; and the lower the probability of misjudging the sample as the state k (k ≠ y_i), the lower the loss weight value. When the training samples are unbalanced, small samples are more easily misjudged by the deep learning model, and their loss weight values are therefore larger.
(4) The loss weights are combined with the comprehensive loss function to form the weighted comprehensive loss function, as shown in equation (8) (rendered as an image in the original).
the weighted comprehensive loss function increases the discrimination of positive judgment and false judgment of the sample, and simultaneously improves the proportion of the small sample in the loss function, thereby enhancing the learning strength of the small sample and improving the fault diagnosis effect.
And 6, inputting the sample to be diagnosed into the fine-tuned stack type sparse self-coding deep learning model for transformer fault diagnosis, and outputting a transformer fault diagnosis result.
As shown in FIG. 5, in the transformer fault diagnosis method based on weighted comprehensive loss optimized deep learning and DGA, once the transformer fault diagnosis model has been established, the volume fractions of H2, CH4, C2H6, C2H4 and C2H2 of the sample to be diagnosed are used as the input, and transformer fault diagnosis is realized according to the output values of the Softmax layer.
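For example, diagnosing a single oil sample with the fine-tuned model could look as follows; the gas values shown are purely illustrative placeholders, and SSAEDiagnosisNet refers to the sketch class shown after step 3 (in practice the pre-trained, fine-tuned parameters would be loaded).

    import torch

    model = SSAEDiagnosisNet()            # fine-tuned stacked sparse self-coding model

    # volume fractions (x 10^-6) of H2, CH4, C2H6, C2H4, C2H2 -- illustrative values only
    sample = torch.tensor([[120.0, 45.0, 12.0, 60.0, 3.0]])

    model.eval()
    with torch.no_grad():
        p = model(sample)                 # probabilities of the 9 transformer states
    state_code = int(p.argmax(dim=1))     # diagnosed state: the most probable class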
Example 2:
a transformer fault diagnosis device based on deep learning and DGA (differential global analysis), comprising:
the transformer fault diagnosis model building module is used for building a stack type sparse self-coding deep learning model for transformer fault diagnosis;
the input layer setting module is used for setting an input layer of the stack type sparse self-coding deep learning model;
the output layer setting module is used for setting an output layer of the stacked sparse self-coding deep learning model;
the parameter pre-training module is used for pre-training network parameters of a stack type sparse self-coding deep learning model for transformer fault diagnosis;
the parameter fine-tuning module is used for fine-tuning network parameters of a stacked sparse self-coding deep learning model for transformer fault diagnosis;
and the transformer fault diagnosis result output module outputs a transformer fault diagnosis result according to the sample to be diagnosed.
The transformer fault diagnosis model building module comprises:
the encoding and decoding module, for setting N neurons in the input and output layers and M neurons in the hidden layer, with an input x ∈ R^N that, after encoding and decoding, is represented as h ∈ R^M and x̂ ∈ R^N, respectively; the encoding and decoding processes can be expressed by equation (1) and equation (2), respectively, that is:

h = f(Wx + b)   (1)

x̂ = g(W'h + b')   (2)

where W and W' are the encoding and decoding weight matrices, respectively; b and b' are the encoding and decoding bias vectors, respectively; f(·) and g(·) are the nonlinear activation functions of the encoding and decoding processes, for which the Sigmoid function can generally be adopted;
the sparse self-encoder forming module, used for adjusting the network parameters so that x̂ ≈ x; when the reconstruction error of x is minimized, h is used as the intrinsic feature of the original input, and a sparsity constraint is imposed on the hidden layer, so that a Sparse Auto-Encoder (SAE) is formed, whose cost function is:

J_SAE = J(W, b) + β · Σ_{j=1}^{M} KL(ρ0 ‖ ρ̂_j)   (3)

where J(W, b) is the reconstruction error term; β is the sparsity penalty coefficient, which can generally be set to 0.3; KL(·) is the KL divergence; ρ0 is the sparsity parameter; and ρ̂_j is the mean activation of the j-th hidden-layer neuron over the training samples;
And the multi-level SAE stacking module, used for stacking multiple levels of SAEs in a stacked structure, each SAE's input layer being the output (hidden) layer of the previous SAE, so as to construct the stacked sparse self-coding deep learning model for transformer fault diagnosis.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be emphasized that the embodiments described herein are illustrative rather than restrictive, and thus the present invention is not limited to the embodiments described in the detailed description, but also includes other embodiments that can be derived from the technical solutions of the present invention by those skilled in the art.

Claims (10)

1. A transformer fault diagnosis method based on deep learning and DGA is characterized in that: the method comprises the following steps:
taking the volume fractions of the DGA characteristic gases H2, CH4, C2H6, C2H4 and C2H2 as the diagnostic sample;
inputting the sample to be diagnosed into the fine-tuned stacked sparse self-coding deep learning model for transformer fault diagnosis, and outputting the transformer fault diagnosis result.
2. The transformer fault diagnosis method based on deep learning and DGA as claimed in claim 1, wherein: before inputting the sample to be diagnosed into the fine-tuned stacked sparse self-coding deep learning model for transformer fault diagnosis, the method further comprises the following steps:
constructing a stack type sparse self-coding deep learning model for transformer fault diagnosis;
setting an input layer of a stack type sparse self-coding deep learning model;
setting an output layer of a stack type sparse self-coding deep learning model;
pre-training network parameters of a stack type sparse self-coding deep learning model for transformer fault diagnosis;
and fine-tuning parameters of a stacked sparse self-coding deep learning model for transformer fault diagnosis.
3. The transformer fault diagnosis method based on deep learning and DGA as claimed in claim 2, wherein: the specific steps of constructing the stacked sparse self-coding deep learning model for transformer fault diagnosis comprise:
(1) Let the input layer and the output layer each have N neurons and the hidden layer have M neurons, and let the input be x ∈ R^N; after encoding and decoding, x is represented as h ∈ R^M and x̂ ∈ R^N, respectively, and the encoding and decoding processes can be expressed by equation (1) and equation (2), respectively, that is:

h = f(Wx + b)   (1)

x̂ = g(W'h + b')   (2)

where W and W' are the encoding and decoding weight matrices, respectively; b and b' are the encoding and decoding bias vectors, respectively; f(·) and g(·) are the nonlinear activation functions of the encoding and decoding processes, for which the Sigmoid function can generally be adopted;
(2) The network parameters are adjusted so that x̂ ≈ x; when the reconstruction error of x is minimized, h is used as the intrinsic feature of the original input, and a sparsity constraint is imposed on the hidden layer, so that a Sparse Auto-Encoder (SAE) is formed. The SAE cost function is:

J_SAE = J(W, b) + β · Σ_{j=1}^{M} KL(ρ0 ‖ ρ̂_j)   (3)

where J(W, b) is the reconstruction error term; β is the sparsity penalty coefficient, which can generally be set to 0.3; KL(·) is the KL divergence; ρ0 is the sparsity parameter; and ρ̂_j is the mean activation of the j-th hidden-layer neuron over the training samples;
(3) Stacking multiple levels of SAEs in a stacked structure, each SAE's input layer being the output (hidden) layer of the previous SAE, thereby constructing the stacked sparse self-coding deep learning model for transformer fault diagnosis.
4. The transformer fault diagnosis method based on deep learning and DGA as claimed in claim 2, wherein: the specific method for setting the input layer of the stacked sparse self-coding deep learning model comprises the following steps:
the input layer of the stacked sparse self-coding deep learning model for transformer fault diagnosis is set to 5 neurons, expressed as x = (x1, x2, x3, x4, x5)^T, whose elements represent the volume fractions of hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4) and acetylene (C2H2), respectively.
5. The transformer fault diagnosis method based on deep learning and DGA as claimed in claim 2, wherein: the specific method for setting the output layer of the stacked sparse self-coding deep learning model comprises the following steps:
the Softmax layer is set as the output layer of the fault diagnosis model; this layer contains 9 neurons, expressed as p = (p1, p2, p3, p4, p5, p6, p7, p8, p9)^T, whose elements represent the probabilities of diagnosing the respective states.
6. The transformer fault diagnosis method based on deep learning and DGA as claimed in claim 2, wherein: the specific steps of pre-training the network parameters of the stacked sparse self-coding deep learning model for transformer fault diagnosis comprise:
(1) For the training set {(x_i, y_i) | i = 1, 2, …, N; y_i ∈ {1, 2, …, K}}, where x_i and y_i denote the feature vector and the state label of the i-th training sample, respectively, and there are K states in total: after the state labels of the training set are hidden, the original training set becomes the unlabelled data set {x_i | i = 1, 2, …, N};
(2) And removing the Softmax layer, adding a decoding layer, and sequentially realizing the training of network parameters of each layer of the SSAE deep learning model by adopting a layer-by-layer greedy method according to an SAE cost function based on label-free sample data.
7. The transformer fault diagnosis method based on deep learning and DGA as claimed in claim 2, wherein: the specific steps of finely adjusting the parameters of the stacked sparse self-coding deep learning model for transformer fault diagnosis comprise:
(1) removing a traditional output layer of the SSAE deep learning model, adding a Softmax classification layer, and realizing optimization of network parameters of each layer of the SSAE based on a cross entropy loss function gradient value and a BP algorithm;
(2) For a training sample (x_i, y_i), after it passes through the SSAE deep learning model, the cross-entropy loss value C_i can be expressed as:

C_i = - Σ_{k=1}^{K} 1{k = y_i} · log p_k^i   (4)

where 1{k = y_i} is an indicator function whose value is 1 if k = y_i and 0 if k ≠ y_i, and p_k^i is the value of the k-th neuron of the Softmax layer, i.e. the probability with which the SSAE deep learning model judges the state to belong to the k-th class;
(3) For a sample (x_i, y_i), its comprehensive loss function L_i combines the correct-judgment loss R_i = C_i with the losses of misjudging the sample as a state k (k ≠ y_i), as given by equations (5) and (6) (rendered as images in the original);
(4) A loss weight value is assigned when the sample (x_i, y_i) is judged to be state k, as given by equation (7) (rendered as an image in the original);

(5) The loss weights are combined with the comprehensive loss function to form the weighted comprehensive loss function, as shown in equation (8) (rendered as an image in the original);
(6) adopting a supervised learning mechanism and adding the sample fault types: firstly taking the SSAE network parameters obtained by pre-training as the initial parameters, then taking the weighted loss function value between each training sample's predicted fault and its real fault as the network convergence cost; then summing the weighted loss function values of the training samples to obtain the total loss, and optimizing and fine-tuning the network parameters of the SSAE diagnostic model layer by layer using the gradient-descent BP algorithm with the aim of minimizing the total loss.
8. A transformer fault diagnosis device based on deep learning and DGA is characterized in that: the method comprises the following steps:
the diagnostic sample input module takes the volume fractions of DGA characteristic gases H2, CH4, C2H6, C2H4 and C2H2 as diagnostic samples;
and the to-be-detected transformer fault diagnosis result output module inputs the to-be-diagnosed sample into the fine-tuned stack type sparse self-coding deep learning model for transformer fault diagnosis and outputs a transformer fault diagnosis result.
9. The transformer fault diagnosis device based on deep learning and DGA as claimed in claim 8, characterized by further comprising:
the transformer fault diagnosis model building module is used for building a stack type sparse self-coding deep learning model for transformer fault diagnosis;
the input layer setting module is used for setting an input layer of the stack type sparse self-coding deep learning model;
the output layer setting module is used for setting an output layer of the stacked sparse self-coding deep learning model;
the parameter pre-training module is used for pre-training network parameters of a stack type sparse self-coding deep learning model for transformer fault diagnosis;
and the parameter fine-tuning module is used for fine-tuning network parameters of the stacked sparse self-coding deep learning model for transformer fault diagnosis.
10. The deep learning and DGA-based transformer fault diagnosis apparatus according to claim 9, wherein: the transformer fault diagnosis model building module comprises:
the encoding and decoding module, for setting N neurons in the input and output layers and M neurons in the hidden layer, with an input x ∈ R^N that, after encoding and decoding, is represented as h ∈ R^M and x̂ ∈ R^N, respectively; the encoding and decoding processes can be expressed by equation (1) and equation (2), respectively, that is:

h = f(Wx + b)   (1)

x̂ = g(W'h + b')   (2)

where W and W' are the encoding and decoding weight matrices, respectively; b and b' are the encoding and decoding bias vectors, respectively; f(·) and g(·) are the nonlinear activation functions of the encoding and decoding processes, for which the Sigmoid function can generally be adopted;
the sparse self-encoder forming module, used for adjusting the network parameters so that x̂ ≈ x; when the reconstruction error of x is minimized, h is used as the intrinsic feature of the original input, and a sparsity constraint is imposed on the hidden layer, so that a Sparse Auto-Encoder (SAE) is formed, whose cost function is:

J_SAE = J(W, b) + β · Σ_{j=1}^{M} KL(ρ0 ‖ ρ̂_j)   (3)

where J(W, b) is the reconstruction error term; β is the sparsity penalty coefficient, which can generally be set to 0.3; KL(·) is the KL divergence; ρ0 is the sparsity parameter; and ρ̂_j is the mean activation of the j-th hidden-layer neuron over the training samples;
And the multi-level SAE stacking module, used for stacking multiple levels of SAEs in a stacked structure, each SAE's input layer being the output (hidden) layer of the previous SAE, so as to construct the stacked sparse self-coding deep learning model for transformer fault diagnosis.
CN202010483448.9A 2020-06-01 2020-06-01 Transformer fault diagnosis method and device based on deep learning and DGA Pending CN111737907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010483448.9A CN111737907A (en) 2020-06-01 2020-06-01 Transformer fault diagnosis method and device based on deep learning and DGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010483448.9A CN111737907A (en) 2020-06-01 2020-06-01 Transformer fault diagnosis method and device based on deep learning and DGA

Publications (1)

Publication Number Publication Date
CN111737907A true CN111737907A (en) 2020-10-02

Family

ID=72646563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010483448.9A Pending CN111737907A (en) 2020-06-01 2020-06-01 Transformer fault diagnosis method and device based on deep learning and DGA

Country Status (1)

Country Link
CN (1) CN111737907A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106323636A (en) * 2016-08-16 2017-01-11 重庆交通大学 Adaptive extraction and diagnosis method for degree features of mechanical fault through stack-type sparse automatic coding depth neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王伟; 唐庆华; 刘力卿; 李敏; 谢军: "Transformer fault diagnosis method based on weighted comprehensive loss optimized deep learning and DGA" (基于加权综合损失优化深度学习和DGA的变压器故障诊断方法), 南方电网技术 (Southern Power System Technology), no. 03, pages 29-34 *
许倩文; 吉兴全; 张玉振; 李军; 于永进: "Fault diagnosis of mine transformer based on stacked sparse auto-encoder" (基于栈式稀疏自编码器的矿用变压器故障诊断), 工矿自动化 (Industry and Mine Automation), no. 10 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365009A (en) * 2020-10-28 2021-02-12 国网山东省电力公司电力科学研究院 Secondary equipment abnormity diagnosis method based on deep learning network
CN114416972A (en) * 2021-12-10 2022-04-29 厦门市世纪网通网络服务有限公司 DGA domain name detection method based on density improvement unbalance sample
CN114545294A (en) * 2022-01-14 2022-05-27 国电南瑞科技股份有限公司 Transformer fault diagnosis method and system, storage medium and computing device
CN114545294B (en) * 2022-01-14 2023-06-16 国电南瑞科技股份有限公司 Transformer fault diagnosis method, system, storage medium and computing device
CN117571901A (en) * 2023-11-17 2024-02-20 承德神源太阳能发电有限公司 Method, system and equipment for early warning and overhauling faults of photovoltaic power station transformer
CN117571901B (en) * 2023-11-17 2024-06-11 承德神源太阳能发电有限公司 Method, system and equipment for early warning and overhauling faults of photovoltaic power station transformer

Similar Documents

Publication Publication Date Title
CN112070128B (en) Transformer fault diagnosis method based on deep learning
Li et al. Interpretation of DGA for transformer fault diagnosis with complementary SaE-ELM and arctangent transform
Dai et al. Dissolved gas analysis of insulating oil for power transformer fault diagnosis with deep belief network
CN111737907A (en) Transformer fault diagnosis method and device based on deep learning and DGA
CN110175386B (en) Method for predicting temperature of electrical equipment of transformer substation
CN109711483B (en) Spark Autoencoder-based power system operation mode clustering method
CN110879373B (en) Oil-immersed transformer fault diagnosis method with neural network and decision fusion
CN108170994A (en) A kind of oil-immersed electric reactor method for diagnosing faults based on two-way depth network
CN110689068B (en) Transformer fault type diagnosis method based on semi-supervised SVM
CN108717149A (en) Diagnosis Method of Transformer Faults based on M-RVM fusion dynamic weightings AdaBoost
CN115563563A (en) Fault diagnosis method and device based on transformer oil chromatographic analysis
CN110689069A (en) Transformer fault type diagnosis method based on semi-supervised BP network
CN113240011A (en) Deep learning driven abnormity identification and repair method and intelligent system
CN112183591A (en) Transformer fault diagnosis method based on stack sparse noise reduction automatic coding network
Yin et al. Deep learning based feature reduction for power system transient stability assessment
CN112731137A (en) Cage type asynchronous motor stator and rotor fault joint diagnosis method based on stack type self-coding and light gradient elevator algorithm
CN111539486A (en) Transformer fault diagnosis method based on Dropout deep confidence network
CN116562114A (en) Power transformer fault diagnosis method based on graph convolution neural network
CN115018512A (en) Electricity stealing detection method and device based on Transformer neural network
CN112633315A (en) Electric power system disturbance classification method
Li et al. Transformer fault diagnosis based on improved deep coupled dense convolutional neural network
CN116562121A (en) XGBoost and FocalLoss combined cable aging state assessment method
CN115712871A (en) Power electronic system fault diagnosis method combining resampling and integrated learning
CN111275204A (en) Transformer state identification method based on hybrid sampling and ensemble learning
CN111831955A (en) Lithium ion battery residual life prediction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination