CN113191397B - Multidimensional signal feature fusion method based on maximum correlation entropy criterion - Google Patents

Multidimensional signal feature fusion method based on maximum correlation entropy criterion

Info

Publication number
CN113191397B
CN113191397B CN202110381734.9A
Authority
CN
China
Prior art keywords
elm
entropy
maximum correlation
layer
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110381734.9A
Other languages
Chinese (zh)
Other versions
CN113191397A (en)
Inventor
董鑫源
林鹏
曹九稳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110381734.9A priority Critical patent/CN113191397B/en
Publication of CN113191397A publication Critical patent/CN113191397A/en
Application granted granted Critical
Publication of CN113191397B publication Critical patent/CN113191397B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multidimensional signal feature fusion method based on the maximum correlation entropy criterion. First, kernel function theory is introduced to address the shortcomings of the minimum mean square error, leading to the maximum correlation entropy criterion; the maximum correlation entropy criterion is then proposed to replace the minimum mean square error as the loss function of ELM-AE; next, the derivation of the corresponding formulas is described in detail, and several ELM-AEs are stacked to form a deep neural network based on the maximum correlation entropy criterion; then, multidimensional time-domain statistical indexes and nonlinear feature entropies are extracted from the mechanical vibration signals of rotating stall in an axial flow compressor; finally, the deep neural network is used to fuse the extracted multidimensional signal features, and the effectiveness of the method for feature fusion is further verified through feature visualization analysis.

Description

Multidimensional signal feature fusion method based on maximum correlation entropy criterion
Technical Field
The invention belongs to the field of machine learning and data mining, and particularly relates to a method for realizing feature fusion of multidimensional signals by introducing the maximum correlation entropy criterion as the loss function of a deep neural network algorithm.
Background
Multidimensional data sets often contain a large amount of irrelevant and redundant information, which increases the difficulty of data processing, knowledge mining and pattern recognition. Feature fusion is a key method for addressing this problem: a deep neural network is used to fuse the original multidimensional data, filtering out noise and redundant information while retaining representative, sensitive features, and this has become a common strategy. It avoids the complexity and uncertainty of traditional manual feature extraction and selection, and at the same time can improve the accuracy of the classifier. The maximum correlation entropy is a local similarity measure and offers better robustness when handling non-Gaussian noise and distant outliers. It has been successfully applied in various fields such as signal filtering, regression and machine learning.
The self-encoder is an unsupervised three-layer neural network: the first layer is the input layer, the middle layer is the hidden layer, and the last layer is the output layer, as shown in fig. 1. It can learn the intrinsic features of the data without label vectors and therefore belongs to unsupervised learning. Unlike a conventional single hidden layer neural network, the learning process of the self-encoder includes two stages, encoding and decoding. From the input layer to the hidden layer the multidimensional input data are encoded by mapping them into a low-dimensional space, and from the hidden layer to the output layer the input data are reconstructed, i.e. decoded. During encoding and decoding the hidden layer keeps the output data as consistent as possible with the input data, extracting the main characteristics of the data with the smallest possible dimension; the target output is brought close to the input by minimizing the reconstruction error. The minimum Mean Square Error (MSE) is widely used as the loss function of self-encoders and generally performs well when there is no complex noise interference. In practical applications, however, the measured data are affected by working conditions and environmental changes and are contaminated with a large amount of noisy, abnormal data, so the performance of the conventional self-encoder degrades. In addition, the conventional self-encoder trains the network parameters by gradient descent to minimize the loss function, and under supervision the data labels are used as supervision signals to compute the network error. During the backward fine-tuning of the weights and biases of each layer by gradient descent, the method easily falls into a local optimum and the training time of the network increases.
The extreme learning machine (Extreme Learning Machine, ELM) algorithm is a simple, easy-to-use and effective learning algorithm for single hidden layer feedforward neural networks, as shown in fig. 2. Traditional neural network learning algorithms (such as the BP algorithm) require a large number of network training parameters to be set manually, and the gradient descent method used for solving easily produces a local optimum. When determining the network parameters, ELM only needs the number of hidden layer nodes to be set; during execution of the algorithm the input weights of the network and the biases of the hidden units do not need to be adjusted, and a unique optimal solution is produced. The output of ELM is $f_L(x)=\sum_{i=1}^{L}\beta_i G(a_i,b_i,x)=h(x)\beta$, where $\beta_i$ is the weight between the $i$-th hidden node and the output nodes, $G(a_i,b_i,x)$ is the hidden layer output function, and $h(x)=[G(a_1,b_1,x),\dots,G(a_L,b_L,x)]^T$ is the output vector of the hidden layer with respect to the input $x$. The key of ELM is to simultaneously minimize the training error $\|H\beta-T\|$ and the norm of the output weights $\|\beta\|$. Thus, by setting the output of the ELM equal to its input, a self-encoder based on the extreme learning machine (ELM-AE) is obtained, as shown in fig. 3. Compared with the traditional self-encoder based on back propagation, the self-encoder based on the extreme learning machine uses the least squares method and obtains the updated weights in a single backward calculation, so it has a faster training speed and requires less human intervention.
Disclosure of Invention
The invention aims to provide a deep neural network algorithm based on the maximum correlation entropy criterion, a more efficient feature fusion method, to overcome the shortcomings of the minimum mean square error as the loss function of the conventional self-encoder.
The invention is mainly realized by the following flow: first, the ELM algorithm and the minimum mean square error principle are summarized; second, kernel function theory is introduced, the maximum correlation entropy criterion is proposed to replace the minimum mean square error as the loss function of ELM-AE, and several ELM-AEs are stacked into a multi-layer deep neural network; third, for the vibration signal of an axial flow compressor undergoing rotating stall, the signal is segmented by windows of equal step length, and the multidimensional time-domain statistical indexes and nonlinear feature entropies of each window signal are extracted; finally, the extracted multidimensional signal features are used as the input data of the proposed method, and the effectiveness of the proposed method for feature fusion is further verified through feature visualization analysis of the fused output.
The specific technical scheme of the invention is realized by the following steps:
step 1, summarizing ELM algorithm:
1-1. Given a training set $\{(x_i,t_i)\mid x_i\in R^n, t_i\in R^m, i=1,2,\dots,N\}$, a hidden layer node output function $G(w,b,x)$ and the number of hidden layer nodes $L$;
1-2. Randomly assign the parameters $(w_i,b_i)$, $i=1,2,\dots,L$, of the hidden layer nodes;
1-3. Calculate the hidden layer output matrix $H$;
1-4. Calculate the weight $\beta=H^{+}T$ between the hidden layer nodes and the output nodes;
where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$, which can be calculated by methods such as orthogonal projection, orthogonalization or singular value decomposition.
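For illustration only, a minimal NumPy sketch of steps 1-1 to 1-4 might look as follows; the sigmoid hidden layer activation, the data shapes and the use of the pseudo-inverse routine are assumptions of this sketch and are not prescribed by the patent:

```python
import numpy as np

def elm_train(X, T, L, seed=0):
    """Basic ELM: X is (N, n) inputs, T is (N, m) targets, L is the number of hidden nodes."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.standard_normal((n, L))          # 1-2: random input weights w_i
    b = rng.standard_normal(L)               # 1-2: random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # 1-3: hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T             # 1-4: beta = H^+ T (Moore-Penrose via SVD)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```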
Step 2, minimum mean square error principle
2-1. Given training samples $X=\{x_1,x_2,\dots,x_m\}$ (where $x_i=[x_1,x_2,\dots,x_D]^T$), the self-encoder encodes the input vector $X$ into the hidden layer representation $h=\{h_1,h_2,\dots,h_m\}$ (where $h_i=[h_1,h_2,\dots,h_d]^T$) through the Sigmoid activation function:
$h=S_f(\omega^{(1)}x+b^{(1)})$ (1)
$S_f(t)=1/(1+e^{-t})$ (2)
where $x$ is a $D$-dimensional vector, $h$ is a $d$-dimensional vector, $\omega^{(1)}$ is a $d\times D$ weight matrix and $b^{(1)}$ is a $d$-dimensional bias vector;
2-2. The vector $h$ is decoded through $z=S_f(\omega^{(2)}h+b^{(2)})$ to obtain the reconstruction vector $Z=\{z_1,z_2,z_3,\dots,z_m\}$ (where $z_i=[z_1,z_2,\dots,z_D]^T$), where $z$ is a $D$-dimensional vector, $\omega^{(2)}$ is a $D\times d$ weight matrix and $b^{(2)}$ is a $D$-dimensional bias vector;
2-3. The training phase of the self-encoder aims at optimizing the parameter set $\theta=\{\omega^{(1)},b^{(1)},\omega^{(2)},b^{(2)}\}$ to minimize the reconstruction error; the minimum Mean Square Error (MSE) is typically used as the loss function of the conventional self-encoder:
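As an illustration of equations (1) and (2) and the MSE objective, the following NumPy sketch shows the forward pass and reconstruction error only; the averaging over samples and the sigmoid output layer are assumptions of this sketch, and no gradient-descent training loop is included:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def autoencoder_forward(X, w1, b1, w2, b2):
    """X: (m, D) samples; w1: (d, D), b1: (d,); w2: (D, d), b2: (D,)."""
    H = sigmoid(X @ w1.T + b1)   # equation (1): encode into the d-dimensional hidden layer
    Z = sigmoid(H @ w2.T + b2)   # decode back to a D-dimensional reconstruction
    return H, Z

def mse_loss(X, Z):
    """Conventional self-encoder loss: mean squared reconstruction error over all samples."""
    return np.mean(np.sum((Z - X) ** 2, axis=1))
```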
step 3, ELM-AE multilayer deep neural network
3-1. To address the insufficient representation capability of a single-layer ELM-AE on complex, noise-contaminated abnormal data sets, a multi-layer deep neural network algorithm of stacked ELM-AEs is adopted; for each layer of ELM-AE, the objective function is as follows:
where $C$ is the regularization coefficient; an orthogonality constraint is imposed on the randomly generated input weights $\omega_A$ and biases $b_A$, and an F-norm regularization term on the output weights $\beta_A$ is introduced to improve the generalization performance of ELM-AE;
3-2. The output weight $\beta_A$ can be obtained accordingly; when the number of input samples is far greater than the number of hidden nodes, the output weight can be derived from the following equation;
3-3. The encoded output of ELM-AE is $Y=g(\beta_A\cdot X)$. For the stacked ELM-AE structure, the encoded features of each ELM-AE serve as the input of the next ELM-AE. If $K$ ELM-AEs are used in total in the feature extraction stage, let $Y^{(k)}$ denote the coding features of the $k$-th ELM-AE (with $Y^{(0)}=X$); then the coding output of the $(k+1)$-th ELM-AE is $Y^{(k+1)}=g(\beta_{k+1}\cdot Y^{(k)})$, where $k=0,1,\dots,K-1$.
Step 4, kernel function theory
Any function satisfying Mercer's theorem can be used as a kernel function to compute the dot product in a feature space; such a function must be continuous and positive definite. Through a kernel function, the original data points can be mapped into a high-dimensional Hilbert space, and the inner product of data samples in that space is computed as:
$K(x_1,x_2)=\langle\Phi(x_1),\Phi(x_2)\rangle_H$ (5)
where $\Phi(\cdot)$ denotes the nonlinear mapping into the high-dimensional space.
Step 5, maximum correlation entropy criterion
5-1. Given two random variables $X$ and $Y$, their correlation entropy, a correlation measure defined over the kernel space, can be expressed as:
$V(X,Y)=E[\langle\Phi(X),\Phi(Y)\rangle_H]=E[\kappa(X,Y)]$ (6)
where $E[\cdot]$ denotes the expectation operator, $\kappa(\cdot,\cdot)$ is a Mercer kernel function, i.e. $\kappa(X,Y)=\langle\Phi(X),\Phi(Y)\rangle_H$, $\Phi(\cdot)$ is the nonlinear mapping and $H$ denotes the Hilbert space;
5-2. The widely used Gaussian kernel is adopted as the nonlinear kernel function: $\kappa_\sigma(X-Y)=\exp\left(-\|X-Y\|^2/(2\sigma^2)\right)$, where $\sigma$ is the kernel bandwidth;
5-3. A loss function $C(X,Y)$ over the kernel space is defined as:
From the above equation, minimizing $C(X,Y)$ is equivalent to maximizing the correlation entropy $\kappa_\sigma(X-Y)$, and this is therefore referred to as the maximum correlation entropy criterion (MCC).
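To make equation (6) and the MCC-style loss concrete, a small sketch follows; the sample-mean estimator of the expectation and the "one minus kernel" form of the loss are assumptions based on the standard correntropy literature, not the patent's exact expressions:

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """kappa_sigma(e) = exp(-||e||^2 / (2 sigma^2)) evaluated row-wise for error vectors e."""
    e = np.atleast_2d(e)
    return np.exp(-np.sum(e ** 2, axis=1) / (2.0 * sigma ** 2))

def correntropy(X, Y, sigma=1.0):
    """Sample estimate of V(X, Y) = E[kappa_sigma(X - Y)]."""
    return np.mean(gaussian_kernel(X - Y, sigma))

def mcc_loss(X, Y, sigma=1.0):
    """Kernel-space loss whose minimization is equivalent to maximizing the correntropy."""
    return np.mean(1.0 - gaussian_kernel(X - Y, sigma))
```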
Step 6, redefining the loss function
6-1, using the maximum correlation entropy criterion instead of the minimum mean square error principle, the loss function of ELM-AE can be defined as:
where $\beta_A$ is the output weight, $C$ is the regularization coefficient, $X$ denotes the input vector and $H$ denotes the hidden layer output matrix;
6-2. Taking the negative of the correlation entropy, maximizing the correlation entropy is equivalent to minimizing the negative correlation entropy; using a Gaussian kernel, equation (7) can be redefined as:
where $N$ denotes the number of training samples, $h_i=[h_1,h_2,\dots,h_d]^T$ is the hidden layer representation of the input vector $X$ obtained by encoding, and $G(\cdot)$ denotes the Gaussian kernel function;
Differentiating equation (8) with respect to the output weight, setting the derivative to 0 and rearranging gives:
Denoting $\varepsilon_i=X-h_i\beta$ and rearranging, we obtain:
Writing $G_\sigma(\varepsilon_i)/\sigma^2$ for the ratio of the Gaussian kernel function to the square of the bandwidth, we then have:
where $\sigma$ is the Gaussian kernel bandwidth and $\varepsilon_i$ is the training error of the input $X$.
6-3. Collecting the terms containing $\beta_A$ in the above formula, we obtain:
where $\Lambda$ is a diagonal matrix with $\Lambda_{ii}=G_\sigma(\varepsilon_i)/\sigma^2$, $N$ is the number of training samples, $I$ is the identity matrix, $C$ is the regularization coefficient, $X$ is the input vector and $H$ is the hidden layer output matrix, with $h_i$ the hidden layer output of the $i$-th sample; since $\Lambda$ contains terms related to $\beta$, equation (12) is not a closed-form solution but a fixed-point iterative equation, from which the output weights can be solved.
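A sketch of the fixed-point iteration implied by equation (12) is given below; the exact placement of the regularization term ($I/C$), the initialization from the ridge solution and the convergence test are assumptions of this sketch, while the diagonal weights $\Lambda_{ii}=G_\sigma(\varepsilon_i)/\sigma^2$ follow the definition above:

```python
import numpy as np

def mcc_elm_ae_beta(H, X, C=1.0, sigma=1.0, tol=1e-3, max_iter=50):
    """Fixed-point iteration for the MCC-based output weights (assumed form):
    beta = (H^T Lambda H + I/C)^-1 H^T Lambda X, with Lambda_ii = G_sigma(eps_i) / sigma^2."""
    L = H.shape[1]
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ X)   # start from the ridge solution (assumption)
    for _ in range(max_iter):
        eps = X - H @ beta                                      # per-sample training errors eps_i
        lam = np.exp(-np.sum(eps ** 2, axis=1) / (2 * sigma ** 2)) / sigma ** 2   # Lambda diagonal
        HtL = H.T * lam                                         # H^T Lambda (broadcast over samples)
        beta_new = np.linalg.solve(HtL @ H + np.eye(L) / C, HtL @ X)
        if np.linalg.norm(beta_new - beta) < tol:               # stop once within the tolerance
            return beta_new
        beta = beta_new
    return beta
```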
Step 7, depth feature fusion
7-1. Eight pressure sensors are evenly arranged around the circumference of the casing of an axial flow compressor, and the vibration signals of the compressor gradually evolving from the normal working condition to rotating stall are collected, yielding 8 vibration signal sequences, as shown in fig. 4.
7-2. After the vibration signals are filtered and preprocessed, a sliding window method is used to extract the time domain statistical indexes and the nonlinear feature entropies (sample entropy, approximate entropy, fuzzy entropy and permutation entropy) of each window signal; with 12 feature values per sensor channel, a 96-dimensional feature data set is obtained. The time domain statistical indexes are shown in the table of the specification.
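An illustrative sketch of the sliding-window feature extraction in 7-2 follows; the particular time-domain statistics shown, the window length and step, and the permutation-entropy parameters are assumptions for illustration and do not reproduce the patent's full set of 12 features per channel:

```python
import numpy as np
from itertools import permutations

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy of a 1-D window (one of the nonlinear feature entropies)."""
    patterns = list(permutations(range(order)))
    counts = np.zeros(len(patterns))
    for i in range(len(x) - (order - 1) * delay):
        pat = tuple(int(v) for v in np.argsort(x[i:i + order * delay:delay]))
        counts[patterns.index(pat)] += 1
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p)) / np.log(len(patterns))

def window_features(x):
    """A few illustrative time-domain statistics plus one entropy for a single window."""
    rms = np.sqrt(np.mean(x ** 2))
    return np.array([x.mean(), x.std(), rms,
                     np.max(np.abs(x)) / rms,      # crest factor
                     permutation_entropy(x)])

def sliding_window_features(signal, win=1024, step=1024):
    """Segment one sensor channel into windows and stack the per-window feature vectors."""
    rows = [window_features(signal[i:i + win])
            for i in range(0, len(signal) - win + 1, step)]
    return np.vstack(rows)
```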
7-3. Three ELM-AEs are stacked to form a deep ELM-AE neural network based on the maximum correlation entropy criterion, as shown in fig. 5. The output weight of the upper ELM-AE is used as the input of the next layer, and the numbers of hidden layer neurons of the three ELM-AEs are set to 60, 30 and 10, respectively.
7-4. The tolerance error of the loss function is set to $10^{-3}$; the 96-dimensional feature data set is used as the feature input, and when the preset tolerance error range is reached, the hidden layer node output is the fused 10-dimensional feature matrix.
7-5. To verify the effectiveness of the method, t-SNE is used to perform a visualization analysis of the output feature matrix, and the effect is evaluated from the perspective of clustering performance. The features before and after feature fusion are shown in three-dimensional feature maps in figs. 6 and 7.
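A sketch of the t-SNE visualization step in 7-5 is given below; scikit-learn and matplotlib are assumed to be available, and the perplexity value and class labels are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize_features(features, labels, title):
    """Embed a feature matrix into 3-D with t-SNE and plot it, coloured by class label."""
    emb = TSNE(n_components=3, perplexity=30, random_state=0).fit_transform(features)
    ax = plt.figure().add_subplot(projection="3d")
    for cls in np.unique(labels):
        pts = emb[labels == cls]
        ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], label=str(cls), s=8)
    ax.set_title(title)
    ax.legend()
    plt.show()

# e.g. visualize_features(raw_features_96d, y, "before fusion")
#      visualize_features(fused_features_10d, y, "after fusion")
```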
Compared with the prior art, the invention has the following advantages and beneficial effects:
the invention adopts maximum correlation entropy to replace minimum mean square error as a loss function of ELM-AE, and a plurality of ELM-AE stacks form a deep neural network for feature fusion of multidimensional data. The algorithm is a more efficient feature fusion algorithm than the traditional self-coding (AE) algorithm, can effectively reduce the dimension of input data, and can flexibly set the number of hidden layer nodes so as to realize secondary feature fusion of a feature data set. The feature learning capability is enhanced by arranging a plurality of ELM-AE neural networks in a superposition way, so that the method is not only suitable for small data sets, but also suitable for multidimensional large data sets, and has better generalization performance.
The invention adopts the maximum correlation entropy instead of the minimum mean square error as the loss function of ELM-AE, which solves the problem that the minimum mean square error is not robust to non-Gaussian noise and distant outliers in data under complex conditions. Feature fusion is realized through dimension reduction, different weights are assigned according to the different sample errors, similar sample points are aggregated more compactly and the distance between samples of different classes is increased, which benefits later pattern recognition.
The invention tests the feature fusion effect on the vibration signals of an axial flow compressor gradually evolving from the normal working condition to rotating stall. After preprocessing the collected measured vibration signals, the time domain statistical indexes and nonlinear feature entropies of the normal part and the rotating stall part are extracted, feature fusion is realized with the deep neural network based on the maximum correlation entropy, the effectiveness of the method is verified through feature visualization, and the accuracy of the classifier is improved.
Drawings
Fig. 1: a traditional self-encoder structure schematic diagram;
fig. 2: a single hidden layer feedforward neural network schematic diagram;
fig. 3: network structure of single ELM-AE;
fig. 4: a measured vibration signal of rotating stall of an axial flow compressor;
fig. 5: deep neural network structure diagram based on maximum correlation entropy;
fig. 6: feature space distribution diagram before feature fusion;
fig. 7: feature space distribution diagram after feature fusion;
Detailed Description
In order to make the technical problems, technical schemes and beneficial effects to be solved more clear and obvious, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
This embodiment is a feature fusion method for multidimensional signals using a deep neural network based on the maximum correlation entropy criterion. The maximum correlation entropy criterion replaces the minimum mean square error as the loss function of ELM-AE, several ELM-AEs form a deep neural network, feature fusion is performed on the multidimensional features extracted from the original vibration signals, and the fused features are visualized to verify the effectiveness of the method.
The invention is further described below with reference to the drawings and examples.
Fig. 1 is a schematic structural diagram of a conventional self-encoder, in which the first layer is the input layer, the middle layer is the hidden layer, and the last layer is the output layer. Feature-level fusion can be achieved by reducing the feature dimension through the encoding and decoding processes of the self-encoder. Fig. 2 shows the basic structure of a single hidden layer feedforward neural network, i.e. the structure of ELM. The advantage of ELM is that the input weights of the network and the biases of the hidden neurons do not need to be adjusted, and a unique optimal solution is produced. Fig. 3 shows the network structure of a single ELM-AE, i.e. a self-encoder that uses ELM as its optimization algorithm, which speeds up the training of the self-encoder and avoids the drawback of gradient descent falling into a local optimum. Fig. 4 shows the vibration signals collected by 8 sensors as the axial flow compressor gradually evolves from the normal rotation speed to rotating stall. Fig. 5 is a schematic diagram of the deep neural network structure adopting the maximum correlation entropy criterion according to the invention; the output weight of the upper ELM-AE is taken as the input of the next layer, and the hidden layer output of the last ELM-AE is the fused feature. Fig. 6 shows the multidimensional feature space distribution before feature fusion; it can be seen that some distant outliers exist before fusion, and there is mixing between the normal class and the abnormal class. Fig. 7 shows the feature space distribution after fusion by the deep neural network; from the perspective of clustering performance, similar data points are aggregated more compactly and the spatial distance between data of different classes is increased, so the deep neural network feature fusion method based on the maximum correlation entropy obviously improves the data clustering performance and also helps to improve the accuracy of the classifier at a later stage.
The invention mainly comprises the following steps:
step 1, summarizing ELM algorithm:
1-1. Given a training set $\{(x_i,t_i)\mid x_i\in R^n, t_i\in R^m, i=1,2,\dots,N\}$, a hidden layer node output function $G(w,b,x)$ and the number of hidden layer nodes $L$;
1-2. Randomly assign the parameters $(w_i,b_i)$, $i=1,2,\dots,L$, of the hidden layer nodes;
1-3. Calculate the hidden layer output matrix $H$;
1-4. Calculate the weight $\beta=H^{+}T$ between the hidden layer nodes and the output nodes;
where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$, which can be calculated by methods such as orthogonal projection, orthogonalization or singular value decomposition.
Step 2, minimum mean square error principle
2-1. Given training samples $X=\{x_1,x_2,\dots,x_m\}$ (where $x_i=[x_1,x_2,\dots,x_D]^T$), the self-encoder encodes the input vector $X$ into the hidden layer representation $h=\{h_1,h_2,\dots,h_m\}$ (where $h_i=[h_1,h_2,\dots,h_d]^T$) through the Sigmoid activation function:
$h=S_f(\omega^{(1)}x+b^{(1)})$ (1)
$S_f(t)=1/(1+e^{-t})$ (2)
where $x$ is a $D$-dimensional vector, $h$ is a $d$-dimensional vector, $\omega^{(1)}$ is a $d\times D$ weight matrix and $b^{(1)}$ is a $d$-dimensional bias vector;
2-2. The vector $h$ is decoded through $z=S_f(\omega^{(2)}h+b^{(2)})$ to obtain the reconstruction vector $Z=\{z_1,z_2,z_3,\dots,z_m\}$ (where $z_i=[z_1,z_2,\dots,z_D]^T$), where $z$ is a $D$-dimensional vector, $\omega^{(2)}$ is a $D\times d$ weight matrix and $b^{(2)}$ is a $D$-dimensional bias vector;
2-3. The training phase of the self-encoder aims at optimizing the parameter set $\theta=\{\omega^{(1)},b^{(1)},\omega^{(2)},b^{(2)}\}$ to minimize the reconstruction error; the minimum Mean Square Error (MSE) is typically used as the loss function of the conventional self-encoder:
step 3, ELM-AE multilayer deep neural network
3-1. To address the insufficient representation capability of a single-layer ELM-AE on complex, noise-contaminated abnormal data sets, a multi-layer deep neural network algorithm of stacked ELM-AEs is adopted; for each layer of ELM-AE, the objective function is as follows:
where $C$ is the regularization coefficient; an orthogonality constraint is imposed on the randomly generated input weights $\omega_A$ and biases $b_A$, and an F-norm regularization term on the output weights $\beta_A$ is introduced to improve the generalization performance of ELM-AE;
3-2. The output weight $\beta_A$ can be obtained accordingly; when the number of input samples is far greater than the number of hidden nodes, the output weight can be derived from the following equation;
3-3. The encoded output of ELM-AE is $Y=g(\beta_A\cdot X)$. For the stacked ELM-AE structure, the encoded features of each ELM-AE serve as the input of the next ELM-AE. If $K$ ELM-AEs are used in total in the feature extraction stage, let $Y^{(k)}$ denote the coding features of the $k$-th ELM-AE (with $Y^{(0)}=X$); then the coding output of the $(k+1)$-th ELM-AE is $Y^{(k+1)}=g(\beta_{k+1}\cdot Y^{(k)})$, where $k=0,1,\dots,K-1$.
Step 4, kernel function theory
Any function satisfying Mercer's theorem can be used as a kernel function to compute the dot product in a feature space; such a function must be continuous and positive definite. Through a kernel function, the original data points can be mapped into a high-dimensional Hilbert space, and the inner product of data samples in that space is computed as:
$K(x_1,x_2)=\langle\Phi(x_1),\Phi(x_2)\rangle_H$ (5)
where $\Phi(\cdot)$ denotes the nonlinear mapping into the high-dimensional space.
Step 5, maximum correlation entropy criterion
5-1. Given two random variables $X$ and $Y$, their correlation entropy, a correlation measure defined over the kernel space, can be expressed as:
$V(X,Y)=E[\langle\Phi(X),\Phi(Y)\rangle_H]=E[\kappa(X,Y)]$ (6)
where $E[\cdot]$ denotes the expectation operator, $\kappa(\cdot,\cdot)$ is a Mercer kernel function, i.e. $\kappa(X,Y)=\langle\Phi(X),\Phi(Y)\rangle_H$, $\Phi(\cdot)$ is the nonlinear mapping and $H$ denotes the Hilbert space;
5-2. The widely used Gaussian kernel is adopted as the nonlinear kernel function: $\kappa_\sigma(X-Y)=\exp\left(-\|X-Y\|^2/(2\sigma^2)\right)$, where $\sigma$ is the Gaussian kernel bandwidth;
5-3. A loss function $C(X,Y)$ over the kernel space is defined as:
From the above equation, minimizing $C(X,Y)$ is equivalent to maximizing the correlation entropy $\kappa_\sigma(X-Y)$, and this is therefore referred to as the maximum correlation entropy criterion (MCC).
Step 6, redefining the loss function
6-1, using the maximum correlation entropy criterion instead of the minimum mean square error principle, the loss function of ELM-AE can be defined as:
6-2. Taking the negative of the correlation entropy, maximizing the correlation entropy is equivalent to minimizing the negative correlation entropy; using a Gaussian kernel, equation (7) can be redefined as:
Differentiating equation (8) with respect to the output weight, setting the derivative to 0 and rearranging gives:
Denoting $\varepsilon_i=X-h_i\beta$ and rearranging, we obtain:
Writing $G_\sigma(\varepsilon_i)/\sigma^2$ for the ratio of the Gaussian kernel function to the square of the bandwidth, we then have:
6-3. Collecting the terms containing $\beta_A$ in the above formula, we obtain:
where $\Lambda$ is a diagonal matrix with $\Lambda_{ii}=G_\sigma(\varepsilon_i)/\sigma^2$; since $\Lambda$ contains terms related to $\beta$, equation (12) is not a closed-form solution but a fixed-point iterative equation, from which the output weights can be solved.
Step 7, depth feature fusion
7-1. Eight pressure sensors are evenly arranged around the circumference of the casing of an axial flow compressor, and the vibration signals of the compressor gradually evolving from the normal working condition to rotating stall are collected, yielding 8 vibration signal sequences.
7-2. After the vibration signals are filtered and preprocessed, a sliding window method is used to extract the time domain statistical indexes and the nonlinear feature entropies (sample entropy, approximate entropy, fuzzy entropy and permutation entropy) of each window signal; with 12 feature values per sensor channel, a 96-dimensional feature data set is obtained. The time domain statistical indexes are shown in the table of the specification.
7-3. Three ELM-AEs are stacked to construct a deep ELM-AE neural network based on the maximum correlation entropy criterion. The output weight of the upper ELM-AE is used as the input of the next layer, and the numbers of hidden layer neurons of the three ELM-AEs are set to 60, 30 and 10, respectively.
7-4. The tolerance error of the loss function is set to $10^{-3}$; the 96-dimensional feature data set is used as the feature input, and when the preset tolerance error range is reached, the hidden layer node output is the fused 10-dimensional feature matrix.
7-5. To verify the effectiveness of the method, t-SNE is used to perform a visualization analysis of the output feature matrix, and the effect is evaluated from the perspective of clustering performance.
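Putting steps 7-3 and 7-4 together, a self-contained sketch of the fusion stage might look as follows; the orthogonalized random input weights, the assumed form of the fixed-point update for the MCC output weights, and the default values of C, sigma and the tolerance are illustrative assumptions rather than the patent's exact settings:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def mcc_beta(H, X, C=1.0, sigma=1.0, tol=1e-3, max_iter=50):
    """Fixed-point iteration for the MCC output weights (assumed form, see step 6)."""
    L = H.shape[1]
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ X)
    for _ in range(max_iter):
        lam = np.exp(-np.sum((X - H @ beta) ** 2, axis=1) / (2 * sigma ** 2)) / sigma ** 2
        HtL = H.T * lam
        beta_new = np.linalg.solve(HtL @ H + np.eye(L) / C, HtL @ X)
        if np.linalg.norm(beta_new - beta) < tol:   # stop once the update is within the tolerance
            return beta_new
        beta = beta_new
    return beta

def deep_mcc_elm_ae(X, layer_sizes=(60, 30, 10), seed=0):
    """Fuse a 96-dimensional feature set down to 10 dimensions with three stacked MCC ELM-AEs."""
    rng = np.random.default_rng(seed)
    Y = X
    for L in layer_sizes:
        A, _ = np.linalg.qr(rng.standard_normal((Y.shape[1], L)))  # orthogonal random input weights
        b = rng.standard_normal(L)
        H = sigmoid(Y @ A + b)
        Y = sigmoid(Y @ mcc_beta(H, Y).T)   # coded output of this layer feeds the next
    return Y                                 # (N, 10) fused feature matrix
```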
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.

Claims (1)

1. A multi-layer deep neural network based on the maximum correlation entropy, characterized by introducing kernel function theory, adopting the maximum correlation entropy criterion to replace the minimum mean square error as the loss function of the extreme learning machine self-encoder (ELM-AE) algorithm, and being used for feature fusion of multi-dimensional data, comprising the following steps:
1-1. Using the maximum correlation entropy criterion instead of the minimum mean square error principle, the loss function of ELM-AE is defined as:
where $\beta_A$ is the output weight, $C$ is the regularization coefficient, $X$ denotes the input vector and $H$ denotes the hidden layer output matrix;
1-2. Taking the negative of the correlation entropy, finding the maximum correlation entropy is equivalent to minimizing the negative correlation entropy, and formula (1) is redefined as follows:
where $N$ denotes the number of training samples, $h_i=[h_1,h_2,\dots,h_d]^T$ is the hidden layer representation of the input vector $X$ obtained by encoding, and $G(\cdot)$ denotes the Gaussian kernel function;
Differentiating formula (2) with respect to the output weight, setting the derivative to 0 and rearranging gives:
Denoting $\varepsilon_i=X-h_i\beta$ and rearranging, we obtain:
Writing $G_\sigma(\varepsilon_i)/\sigma^2$ for the ratio of the Gaussian kernel function to the square of the bandwidth, we then have:
where $\sigma$ is the Gaussian kernel bandwidth and $\varepsilon_i$ is the training error of the input $X$;
1-3. Collecting the terms containing $\beta_A$ in formula (5), we obtain:
where $\Lambda$ is a diagonal matrix with $\Lambda_{ii}=G_\sigma(\varepsilon_i)/\sigma^2$, $N$ is the number of training samples, $I$ is the identity matrix, $C$ is the regularization coefficient, $X$ is the input vector and $H$ is the hidden layer output matrix, with $h_i$ the hidden layer output of the $i$-th sample; since $\Lambda$ contains terms related to $\beta$, equation (6) is not a closed-form solution but a fixed-point iterative equation, and the output weights can be solved by the fixed-point iteration method;
the feature fusion is realized specifically as follows:
2-1, uniformly arranging 8 pressure sensors in the circumferential direction of a designated axial flow compressor casing, collecting vibration signals of the axial flow compressor gradually evolving from normal working conditions to rotating stall, and obtaining 8 vibration signal sequences;
2-2, after filtering pretreatment of the vibration signals, extracting a time domain statistical index and nonlinear characteristic entropy of each window signal by utilizing a sliding window method, wherein the total number of the characteristic values is 12, and a 96-dimensional characteristic data set is obtained; the nonlinear characteristic entropy comprises sample entropy, approximate entropy, fuzzy entropy and permutation entropy;
2-3. Three ELM-AEs are stacked to construct a multi-layer ELM-AE neural network based on the maximum correlation entropy criterion; the feature data set serves as the input of the first layer of ELM-AE, the output weight of the upper layer of ELM-AE then serves as the input of the next layer, and the numbers of hidden layer neurons of the three ELM-AEs are set to 60, 30 and 10, respectively;
2-4. The tolerance error of the loss function is set to $10^{-3}$, the 96-dimensional feature data set is taken as the feature input, and when the preset tolerance error range is reached, the hidden layer nodes output the fused 10-dimensional feature matrix.
CN202110381734.9A 2021-04-09 2021-04-09 Multidimensional signal feature fusion method based on maximum correlation entropy criterion Active CN113191397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110381734.9A CN113191397B (en) 2021-04-09 2021-04-09 Multidimensional signal feature fusion method based on maximum correlation entropy criterion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110381734.9A CN113191397B (en) 2021-04-09 2021-04-09 Multidimensional signal feature fusion method based on maximum correlation entropy criterion

Publications (2)

Publication Number Publication Date
CN113191397A CN113191397A (en) 2021-07-30
CN113191397B true CN113191397B (en) 2024-02-13

Family

ID=76975218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110381734.9A Active CN113191397B (en) 2021-04-09 2021-04-09 Multidimensional signal feature fusion method based on maximum correlation entropy criterion

Country Status (1)

Country Link
CN (1) CN113191397B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643722B (en) * 2021-08-27 2024-04-19 杭州电子科技大学 Urban noise identification method based on multilayer matrix random neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447272A (en) * 2018-09-29 2019-03-08 西安交通大学 A kind of extreme learning machine method based on center of maximum cross-correlation entropy criterion
CN111783837A (en) * 2020-06-05 2020-10-16 西安电子科技大学 Feature fusion method based on multi-core learning
CN112086100A (en) * 2020-08-17 2020-12-15 杭州电子科技大学 Quantization error entropy based urban noise identification method of multilayer random neural network
CN112435054A (en) * 2020-11-19 2021-03-02 西安理工大学 Nuclear extreme learning machine electricity sales amount prediction method based on generalized maximum correlation entropy criterion


Also Published As

Publication number Publication date
CN113191397A (en) 2021-07-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant