CN113191397B - Multidimensional signal feature fusion method based on maximum correlation entropy criterion - Google Patents

Multidimensional signal feature fusion method based on maximum correlation entropy criterion

Info

Publication number
CN113191397B
CN113191397B CN202110381734.9A
Authority
CN
China
Prior art keywords
elm
entropy
maximum correlation
layer
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110381734.9A
Other languages
Chinese (zh)
Other versions
CN113191397A (en)
Inventor
董鑫源
林鹏
曹九稳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110381734.9A priority Critical patent/CN113191397B/en
Publication of CN113191397A publication Critical patent/CN113191397A/en
Application granted granted Critical
Publication of CN113191397B publication Critical patent/CN113191397B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multidimensional signal feature fusion method based on the maximum correlation entropy criterion. First, kernel function theory is introduced to address the shortcomings of the minimum mean square error, leading to the maximum correlation entropy criterion; the maximum correlation entropy criterion is then proposed to replace the minimum mean square error as the loss function of ELM-AE; next, the derivation of the corresponding formulas is described in detail, and several ELM-AEs are stacked to form a deep neural network based on the maximum correlation entropy criterion; then, multidimensional time-domain statistical indexes and nonlinear feature entropies are extracted from the mechanical vibration signals of rotating stall in an axial flow compressor; finally, the deep neural network is used to fuse the extracted multidimensional signal features, and the effectiveness of the method for feature fusion is further verified through feature visualization analysis.

Description

Multidimensional signal feature fusion method based on maximum correlation entropy criterion
Technical Field
The invention belongs to the field of machine learning and data mining, and particularly relates to a method for realizing feature fusion of multidimensional signals by introducing the maximum correlation entropy criterion as the loss function of a deep neural network algorithm.
Background
Multidimensional data sets often contain a large amount of irrelevant and redundant information, which increases the difficulty of data processing, knowledge mining and pattern recognition. Feature fusion is a key method for addressing this problem: a deep neural network is used to fuse the original multidimensional data, filtering out noise and redundant information while retaining representative, sensitive features, and this has become a common strategy. It avoids the complexity and uncertainty of traditional manual feature extraction and selection, and at the same time can improve the accuracy of the classifier. The maximum correlation entropy is a local similarity measure and offers better robustness when handling non-Gaussian noise and distant outliers. It has been successfully applied in various fields such as signal filtering, regression and machine learning.
The self-encoder is an unsupervised three-layer neural network: the first layer is the input layer, the middle layer is the hidden layer, and the last layer is the output layer, as shown in fig. 1. It can learn the intrinsic features of the data without label vectors and therefore belongs to unsupervised learning. Unlike a conventional single hidden layer neural network, the learning process of the self-encoder includes two stages, encoding and decoding. From the input layer to the hidden layer the multidimensional input data are encoded by mapping them into a low-dimensional space, and from the hidden layer to the output layer the input data are reconstructed, i.e. decoded. During encoding and decoding the hidden layer keeps the output data as consistent as possible with the input data, extracting the main characteristics of the data with the smallest possible dimension; the target output is brought close to the input by minimizing the reconstruction error. The minimum Mean Square Error (MSE) is widely used as the loss function of self-encoders and generally performs well when there is no complex noise interference. In practical applications, however, the measured data are affected by working conditions and environmental changes and are contaminated with a large amount of noisy, abnormal data, so the performance of the conventional self-encoder degrades. In addition, the conventional self-encoder trains the network parameters by gradient descent to minimize the loss function, and under supervision the data labels are used as supervision signals to compute the network error. During the backward fine-tuning of the weights and biases of each layer by gradient descent, the method easily falls into a local optimum and the training time of the network increases.
The extreme learning machine (Extreme Learning Machine, ELM) algorithm is a simple, easy-to-use and effective learning algorithm for single hidden layer feedforward neural networks, as shown in fig. 2. Traditional neural network learning algorithms (such as the BP algorithm) require a large number of network training parameters to be set manually, and the gradient descent method used for solving easily produces a local optimum. When determining the network parameters, ELM only needs the number of hidden layer nodes to be set; during execution of the algorithm the input weights of the network and the biases of the hidden units do not need to be adjusted, and a unique optimal solution is produced. The output of ELM is $f_L(x)=\sum_{i=1}^{L}\beta_i G(a_i,b_i,x)=h(x)\beta$, where $\beta_i$ is the weight between the $i$-th hidden node and the output nodes, $G(a_i,b_i,x)$ is the hidden layer output function, and $h(x)=[G(a_1,b_1,x),\dots,G(a_L,b_L,x)]^T$ is the output vector of the hidden layer with respect to the input $x$. The key of ELM is to simultaneously minimize the training error $\|H\beta-T\|$ and the norm of the output weights $\|\beta\|$. Thus, by setting the output of the ELM equal to its input, a self-encoder based on the extreme learning machine (ELM-AE) is obtained, as shown in fig. 3. Compared with the traditional self-encoder based on back propagation, the self-encoder based on the extreme learning machine uses the least squares method and obtains the updated weights in a single backward calculation, so it has a faster training speed and requires less human intervention.
Disclosure of Invention
The invention aims to provide a deep neural network algorithm based on the maximum correlation entropy criterion, a more efficient feature fusion method, to overcome the shortcomings of the minimum mean square error as the loss function of the conventional self-encoder.
The invention is mainly realized by the following flow: first, the ELM algorithm and the minimum mean square error principle are summarized; second, kernel function theory is introduced, the maximum correlation entropy criterion is proposed to replace the minimum mean square error as the loss function of ELM-AE, and several ELM-AEs are stacked into a multi-layer deep neural network; third, for the vibration signal of an axial flow compressor undergoing rotating stall, the signal is segmented by windows of equal step length, and the multidimensional time-domain statistical indexes and nonlinear feature entropies of each window signal are extracted; finally, the extracted multidimensional signal features are used as the input data of the proposed method, and the effectiveness of the proposed method for feature fusion is further verified through feature visualization analysis of the fused output.
The specific technical scheme of the invention is realized by the following steps:
step 1, summarizing ELM algorithm:
1-1. Given a training set $\{(x_i,t_i)\mid x_i\in R^n, t_i\in R^m, i=1,2,\dots,N\}$, a hidden layer node output function $G(w,b,x)$ and the number of hidden layer nodes $L$;
1-2. Randomly assign the parameters $(w_i,b_i)$, $i=1,2,\dots,L$, of the hidden layer nodes;
1-3. Calculate the hidden layer output matrix $H$;
1-4. Calculate the weight $\beta=H^{+}T$ between the hidden layer nodes and the output nodes;
where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$, which can be calculated by methods such as orthogonal projection, orthogonalization or singular value decomposition.
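For illustration only, a minimal NumPy sketch of steps 1-1 to 1-4 might look as follows; the sigmoid hidden layer activation, the data shapes and the use of the pseudo-inverse routine are assumptions of this sketch and are not prescribed by the patent:

```python
import numpy as np

def elm_train(X, T, L, seed=0):
    """Basic ELM: X is (N, n) inputs, T is (N, m) targets, L is the number of hidden nodes."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.standard_normal((n, L))          # 1-2: random input weights w_i
    b = rng.standard_normal(L)               # 1-2: random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # 1-3: hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T             # 1-4: beta = H^+ T (Moore-Penrose via SVD)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```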
Step 2, minimum mean square error principle
2-1. Given training samples $X=\{x_1,x_2,\dots,x_m\}$ (where $x_i=[x_1,x_2,\dots,x_D]^T$), the self-encoder encodes the input vector $X$ into the hidden layer representation $h=\{h_1,h_2,\dots,h_m\}$ (where $h_i=[h_1,h_2,\dots,h_d]^T$) through the Sigmoid activation function:
$h=S_f(\omega^{(1)}x+b^{(1)})$ (1)
$S_f(t)=1/(1+e^{-t})$ (2)
where $x$ is a $D$-dimensional vector, $h$ is a $d$-dimensional vector, $\omega^{(1)}$ is a $d\times D$ weight matrix and $b^{(1)}$ is a $d$-dimensional bias vector;
2-2. The vector $h$ is decoded through $z=S_f(\omega^{(2)}h+b^{(2)})$ to obtain the reconstruction vector $Z=\{z_1,z_2,z_3,\dots,z_m\}$ (where $z_i=[z_1,z_2,\dots,z_D]^T$), where $z$ is a $D$-dimensional vector, $\omega^{(2)}$ is a $D\times d$ weight matrix and $b^{(2)}$ is a $D$-dimensional bias vector;
2-3. The training phase of the self-encoder aims at optimizing the parameter set $\theta=\{\omega^{(1)},b^{(1)},\omega^{(2)},b^{(2)}\}$ to minimize the reconstruction error; the minimum Mean Square Error (MSE) is typically used as the loss function of the conventional self-encoder:
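As an illustration of equations (1) and (2) and the MSE objective, the following NumPy sketch shows the forward pass and reconstruction error only; the averaging over samples and the sigmoid output layer are assumptions of this sketch, and no gradient-descent training loop is included:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def autoencoder_forward(X, w1, b1, w2, b2):
    """X: (m, D) samples; w1: (d, D), b1: (d,); w2: (D, d), b2: (D,)."""
    H = sigmoid(X @ w1.T + b1)   # equation (1): encode into the d-dimensional hidden layer
    Z = sigmoid(H @ w2.T + b2)   # decode back to a D-dimensional reconstruction
    return H, Z

def mse_loss(X, Z):
    """Conventional self-encoder loss: mean squared reconstruction error over all samples."""
    return np.mean(np.sum((Z - X) ** 2, axis=1))
```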
step 3, ELM-AE multilayer deep neural network
3-1. To address the insufficient representation capability of a single-layer ELM-AE on complex, noise-contaminated abnormal data sets, a multi-layer deep neural network algorithm of stacked ELM-AEs is adopted; for each layer of ELM-AE, the objective function is as follows:
where $C$ is the regularization coefficient; an orthogonality constraint is imposed on the randomly generated input weights $\omega_A$ and biases $b_A$, and an F-norm regularization term on the output weights $\beta_A$ is introduced to improve the generalization performance of ELM-AE;
3-2. The output weight $\beta_A$ can be obtained accordingly; when the number of input samples is far greater than the number of hidden nodes, the output weight can be derived from the following equation;
3-3. The encoded output of ELM-AE is $Y=g(\beta_A\cdot X)$. For the stacked ELM-AE structure, the encoded features of each ELM-AE serve as the input of the next ELM-AE. If $K$ ELM-AEs are used in total in the feature extraction stage, let $Y^{(k)}$ denote the coding features of the $k$-th ELM-AE (with $Y^{(0)}=X$); then the coding output of the $(k+1)$-th ELM-AE is $Y^{(k+1)}=g(\beta_{k+1}\cdot Y^{(k)})$, where $k=0,1,\dots,K-1$.
Step 4, kernel function theory
Any function satisfying Mercer's theorem can be used as a kernel function to compute the dot product in a feature space; such a function must be continuous and positive definite. Through a kernel function, the original data points can be mapped into a high-dimensional Hilbert space, and the inner product of data samples in that space is computed as:
$K(x_1,x_2)=\langle\Phi(x_1),\Phi(x_2)\rangle_H$ (5)
where $\Phi(\cdot)$ denotes the nonlinear mapping into the high-dimensional space.
Step 5, maximum correlation entropy criterion
5-1. Given two random variables $X$ and $Y$, their correlation entropy, a correlation measure defined over the kernel space, can be expressed as:
$V(X,Y)=E[\langle\Phi(X),\Phi(Y)\rangle_H]=E[\kappa(X,Y)]$ (6)
where $E[\cdot]$ denotes the expectation operator, $\kappa(\cdot,\cdot)$ is a Mercer kernel function, i.e. $\kappa(X,Y)=\langle\Phi(X),\Phi(Y)\rangle_H$, $\Phi(\cdot)$ is the nonlinear mapping and $H$ denotes the Hilbert space;
5-2. The widely used Gaussian kernel is adopted as the nonlinear kernel function: $\kappa_\sigma(X-Y)=\exp\left(-\|X-Y\|^2/(2\sigma^2)\right)$, where $\sigma$ is the kernel bandwidth;
5-3. A loss function $C(X,Y)$ over the kernel space is defined as:
From the above equation, minimizing $C(X,Y)$ is equivalent to maximizing the correlation entropy $\kappa_\sigma(X-Y)$, and this is therefore referred to as the maximum correlation entropy criterion (MCC).
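To make equation (6) and the MCC-style loss concrete, a small sketch follows; the sample-mean estimator of the expectation and the "one minus kernel" form of the loss are assumptions based on the standard correntropy literature, not the patent's exact expressions:

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """kappa_sigma(e) = exp(-||e||^2 / (2 sigma^2)) evaluated row-wise for error vectors e."""
    e = np.atleast_2d(e)
    return np.exp(-np.sum(e ** 2, axis=1) / (2.0 * sigma ** 2))

def correntropy(X, Y, sigma=1.0):
    """Sample estimate of V(X, Y) = E[kappa_sigma(X - Y)]."""
    return np.mean(gaussian_kernel(X - Y, sigma))

def mcc_loss(X, Y, sigma=1.0):
    """Kernel-space loss whose minimization is equivalent to maximizing the correntropy."""
    return np.mean(1.0 - gaussian_kernel(X - Y, sigma))
```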
Step 6, redefining the loss function
6-1, using the maximum correlation entropy criterion instead of the minimum mean square error principle, the loss function of ELM-AE can be defined as:
where $\beta_A$ is the output weight, $C$ is the regularization coefficient, $X$ denotes the input vector and $H$ denotes the hidden layer output matrix;
6-2. Taking the negative of the correlation entropy, maximizing the correlation entropy is equivalent to minimizing the negative correlation entropy; using a Gaussian kernel, equation (7) can be redefined as:
where $N$ denotes the number of training samples, $h_i=[h_1,h_2,\dots,h_d]^T$ is the hidden layer representation of the input vector $X$ obtained by encoding, and $G(\cdot)$ denotes the Gaussian kernel function;
Differentiating equation (8) with respect to the output weight, setting the derivative to 0 and rearranging gives:
Denoting $\varepsilon_i=X-h_i\beta$ and rearranging, we obtain:
Writing $G_\sigma(\varepsilon_i)/\sigma^2$ for the ratio of the Gaussian kernel function to the square of the bandwidth, we then have:
where $\sigma$ is the Gaussian kernel bandwidth and $\varepsilon_i$ is the training error of the input $X$.
6-3. Collecting the terms containing $\beta_A$ in the above formula, we obtain:
where $\Lambda$ is a diagonal matrix with $\Lambda_{ii}=G_\sigma(\varepsilon_i)/\sigma^2$, $N$ is the number of training samples, $I$ is the identity matrix, $C$ is the regularization coefficient, $X$ is the input vector and $H$ is the hidden layer output matrix, with $h_i$ the hidden layer output of the $i$-th sample; since $\Lambda$ contains terms related to $\beta$, equation (12) is not a closed-form solution but a fixed-point iterative equation, from which the output weights can be solved.
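A sketch of the fixed-point iteration implied by equation (12) is given below; the exact placement of the regularization term ($I/C$), the initialization from the ridge solution and the convergence test are assumptions of this sketch, while the diagonal weights $\Lambda_{ii}=G_\sigma(\varepsilon_i)/\sigma^2$ follow the definition above:

```python
import numpy as np

def mcc_elm_ae_beta(H, X, C=1.0, sigma=1.0, tol=1e-3, max_iter=50):
    """Fixed-point iteration for the MCC-based output weights (assumed form):
    beta = (H^T Lambda H + I/C)^-1 H^T Lambda X, with Lambda_ii = G_sigma(eps_i) / sigma^2."""
    L = H.shape[1]
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ X)   # start from the ridge solution (assumption)
    for _ in range(max_iter):
        eps = X - H @ beta                                      # per-sample training errors eps_i
        lam = np.exp(-np.sum(eps ** 2, axis=1) / (2 * sigma ** 2)) / sigma ** 2   # Lambda diagonal
        HtL = H.T * lam                                         # H^T Lambda (broadcast over samples)
        beta_new = np.linalg.solve(HtL @ H + np.eye(L) / C, HtL @ X)
        if np.linalg.norm(beta_new - beta) < tol:               # stop once within the tolerance
            return beta_new
        beta = beta_new
    return beta
```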
Step 7, depth feature fusion
7-1. Eight pressure sensors are evenly arranged around the circumference of the casing of an axial flow compressor, and the vibration signals of the compressor gradually evolving from the normal working condition to rotating stall are collected, yielding 8 vibration signal sequences, as shown in fig. 4.
7-2. After the vibration signals are filtered and preprocessed, a sliding window method is used to extract the time domain statistical indexes and the nonlinear feature entropies (sample entropy, approximate entropy, fuzzy entropy and permutation entropy) of each window signal; with 12 feature values per sensor channel, a 96-dimensional feature data set is obtained. The time domain statistical indexes are shown in the table of the specification.
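An illustrative sketch of the sliding-window feature extraction in 7-2 follows; the particular time-domain statistics shown, the window length and step, and the permutation-entropy parameters are assumptions for illustration and do not reproduce the patent's full set of 12 features per channel:

```python
import numpy as np
from itertools import permutations

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy of a 1-D window (one of the nonlinear feature entropies)."""
    patterns = list(permutations(range(order)))
    counts = np.zeros(len(patterns))
    for i in range(len(x) - (order - 1) * delay):
        pat = tuple(int(v) for v in np.argsort(x[i:i + order * delay:delay]))
        counts[patterns.index(pat)] += 1
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p)) / np.log(len(patterns))

def window_features(x):
    """A few illustrative time-domain statistics plus one entropy for a single window."""
    rms = np.sqrt(np.mean(x ** 2))
    return np.array([x.mean(), x.std(), rms,
                     np.max(np.abs(x)) / rms,      # crest factor
                     permutation_entropy(x)])

def sliding_window_features(signal, win=1024, step=1024):
    """Segment one sensor channel into windows and stack the per-window feature vectors."""
    rows = [window_features(signal[i:i + win])
            for i in range(0, len(signal) - win + 1, step)]
    return np.vstack(rows)
```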
7-3. Three ELM-AEs are stacked to form a deep ELM-AE neural network based on the maximum correlation entropy criterion, as shown in fig. 5. The output weight of the upper ELM-AE is used as the input of the next layer, and the numbers of hidden layer neurons of the three ELM-AEs are set to 60, 30 and 10, respectively.
7-4. The tolerance error of the loss function is set to $10^{-3}$; the 96-dimensional feature data set is used as the feature input, and when the preset tolerance error range is reached, the hidden layer node output is the fused 10-dimensional feature matrix.
7-5. To verify the effectiveness of the method, t-SNE is used to perform a visualization analysis of the output feature matrix, and the effect is evaluated from the perspective of clustering performance. The features before and after feature fusion are shown in three-dimensional feature maps in figs. 6 and 7.
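A sketch of the t-SNE visualization step in 7-5 is given below; scikit-learn and matplotlib are assumed to be available, and the perplexity value and class labels are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize_features(features, labels, title):
    """Embed a feature matrix into 3-D with t-SNE and plot it, coloured by class label."""
    emb = TSNE(n_components=3, perplexity=30, random_state=0).fit_transform(features)
    ax = plt.figure().add_subplot(projection="3d")
    for cls in np.unique(labels):
        pts = emb[labels == cls]
        ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], label=str(cls), s=8)
    ax.set_title(title)
    ax.legend()
    plt.show()

# e.g. visualize_features(raw_features_96d, y, "before fusion")
#      visualize_features(fused_features_10d, y, "after fusion")
```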
Compared with the prior art, the invention has the following advantages and beneficial effects:
the invention adopts maximum correlation entropy to replace minimum mean square error as a loss function of ELM-AE, and a plurality of ELM-AE stacks form a deep neural network for feature fusion of multidimensional data. The algorithm is a more efficient feature fusion algorithm than the traditional self-coding (AE) algorithm, can effectively reduce the dimension of input data, and can flexibly set the number of hidden layer nodes so as to realize secondary feature fusion of a feature data set. The feature learning capability is enhanced by arranging a plurality of ELM-AE neural networks in a superposition way, so that the method is not only suitable for small data sets, but also suitable for multidimensional large data sets, and has better generalization performance.
The invention adopts the maximum correlation entropy instead of the minimum mean square error as the loss function of ELM-AE, which solves the problem that the minimum mean square error is not robust to non-Gaussian noise and distant outliers in data under complex conditions. Feature fusion is realized through dimension reduction, different weights are assigned according to the different sample errors, similar sample points are aggregated more compactly and the distance between samples of different classes is increased, which benefits later pattern recognition.
The invention tests the feature fusion effect on the vibration signals of an axial flow compressor gradually evolving from the normal working condition to rotating stall. After preprocessing the collected measured vibration signals, the time domain statistical indexes and nonlinear feature entropies of the normal part and the rotating stall part are extracted, feature fusion is realized with the deep neural network based on the maximum correlation entropy, the effectiveness of the method is verified through feature visualization, and the accuracy of the classifier is improved.
Drawings
Fig. 1: a traditional self-encoder structure schematic diagram;
fig. 2: a single hidden layer feedforward neural network schematic diagram;
fig. 3: network structure of single ELM-AE;
fig. 4: a measured vibration signal of rotating stall of an axial flow compressor;
fig. 5: deep neural network structure diagram based on maximum correlation entropy;
fig. 6: feature space distribution diagram before feature fusion;
fig. 7: feature space distribution diagram after feature fusion;
Detailed Description
In order to make the technical problems, technical schemes and beneficial effects to be solved more clear and obvious, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
This embodiment is a feature fusion method for multidimensional signals using a deep neural network based on the maximum correlation entropy criterion. The maximum correlation entropy criterion replaces the minimum mean square error as the loss function of ELM-AE, several ELM-AEs form a deep neural network, feature fusion is performed on the multidimensional features extracted from the original vibration signals, and the fused features are visualized to verify the effectiveness of the method.
The invention is further described below with reference to the drawings and examples.
Fig. 1 is a schematic structural diagram of a conventional self-encoder, in which the first layer is the input layer, the middle layer is the hidden layer, and the last layer is the output layer. Feature-level fusion can be achieved by reducing the feature dimension through the encoding and decoding processes of the self-encoder. Fig. 2 shows the basic structure of a single hidden layer feedforward neural network, i.e. the structure of ELM. The advantage of ELM is that the input weights of the network and the biases of the hidden neurons do not need to be adjusted, and a unique optimal solution is produced. Fig. 3 shows the network structure of a single ELM-AE, i.e. a self-encoder that uses ELM as its optimization algorithm, which speeds up the training of the self-encoder and avoids the drawback of gradient descent falling into a local optimum. Fig. 4 shows the vibration signals collected by 8 sensors as the axial flow compressor gradually evolves from the normal rotation speed to rotating stall. Fig. 5 is a schematic diagram of the deep neural network structure adopting the maximum correlation entropy criterion according to the invention; the output weight of the upper ELM-AE is taken as the input of the next layer, and the hidden layer output of the last ELM-AE is the fused feature. Fig. 6 shows the multidimensional feature space distribution before feature fusion; it can be seen that some distant outliers exist before fusion, and there is mixing between the normal class and the abnormal class. Fig. 7 shows the feature space distribution after fusion by the deep neural network; from the perspective of clustering performance, similar data points are aggregated more compactly and the spatial distance between data of different classes is increased, so the deep neural network feature fusion method based on the maximum correlation entropy obviously improves the data clustering performance and also helps to improve the accuracy of the classifier at a later stage.
The invention mainly comprises the following steps:
step 1, summarizing ELM algorithm:
1-1. Given a training set $\{(x_i,t_i)\mid x_i\in R^n, t_i\in R^m, i=1,2,\dots,N\}$, a hidden layer node output function $G(w,b,x)$ and the number of hidden layer nodes $L$;
1-2. Randomly assign the parameters $(w_i,b_i)$, $i=1,2,\dots,L$, of the hidden layer nodes;
1-3. Calculate the hidden layer output matrix $H$;
1-4. Calculate the weight $\beta=H^{+}T$ between the hidden layer nodes and the output nodes;
where $H^{+}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$, which can be calculated by methods such as orthogonal projection, orthogonalization or singular value decomposition.
Step 2, minimum mean square error principle
2-1. Given training samples $X=\{x_1,x_2,\dots,x_m\}$ (where $x_i=[x_1,x_2,\dots,x_D]^T$), the self-encoder encodes the input vector $X$ into the hidden layer representation $h=\{h_1,h_2,\dots,h_m\}$ (where $h_i=[h_1,h_2,\dots,h_d]^T$) through the Sigmoid activation function:
$h=S_f(\omega^{(1)}x+b^{(1)})$ (1)
$S_f(t)=1/(1+e^{-t})$ (2)
where $x$ is a $D$-dimensional vector, $h$ is a $d$-dimensional vector, $\omega^{(1)}$ is a $d\times D$ weight matrix and $b^{(1)}$ is a $d$-dimensional bias vector;
2-2. The vector $h$ is decoded through $z=S_f(\omega^{(2)}h+b^{(2)})$ to obtain the reconstruction vector $Z=\{z_1,z_2,z_3,\dots,z_m\}$ (where $z_i=[z_1,z_2,\dots,z_D]^T$), where $z$ is a $D$-dimensional vector, $\omega^{(2)}$ is a $D\times d$ weight matrix and $b^{(2)}$ is a $D$-dimensional bias vector;
2-3. The training phase of the self-encoder aims at optimizing the parameter set $\theta=\{\omega^{(1)},b^{(1)},\omega^{(2)},b^{(2)}\}$ to minimize the reconstruction error; the minimum Mean Square Error (MSE) is typically used as the loss function of the conventional self-encoder:
step 3, ELM-AE multilayer deep neural network
3-1. To address the insufficient representation capability of a single-layer ELM-AE on complex, noise-contaminated abnormal data sets, a multi-layer deep neural network algorithm of stacked ELM-AEs is adopted; for each layer of ELM-AE, the objective function is as follows:
where $C$ is the regularization coefficient; an orthogonality constraint is imposed on the randomly generated input weights $\omega_A$ and biases $b_A$, and an F-norm regularization term on the output weights $\beta_A$ is introduced to improve the generalization performance of ELM-AE;
3-2. The output weight $\beta_A$ can be obtained accordingly; when the number of input samples is far greater than the number of hidden nodes, the output weight can be derived from the following equation;
3-3. The encoded output of ELM-AE is $Y=g(\beta_A\cdot X)$. For the stacked ELM-AE structure, the encoded features of each ELM-AE serve as the input of the next ELM-AE. If $K$ ELM-AEs are used in total in the feature extraction stage, let $Y^{(k)}$ denote the coding features of the $k$-th ELM-AE (with $Y^{(0)}=X$); then the coding output of the $(k+1)$-th ELM-AE is $Y^{(k+1)}=g(\beta_{k+1}\cdot Y^{(k)})$, where $k=0,1,\dots,K-1$.
Step 4, kernel function theory
Any function satisfying Mercer's theorem can be used as a kernel function to compute the dot product in a feature space; such a function must be continuous and positive definite. Through a kernel function, the original data points can be mapped into a high-dimensional Hilbert space, and the inner product of data samples in that space is computed as:
$K(x_1,x_2)=\langle\Phi(x_1),\Phi(x_2)\rangle_H$ (5)
where $\Phi(\cdot)$ denotes the nonlinear mapping into the high-dimensional space.
Step 5, maximum correlation entropy criterion
5-1. Given two random variables $X$ and $Y$, their correlation entropy, a correlation measure defined over the kernel space, can be expressed as:
$V(X,Y)=E[\langle\Phi(X),\Phi(Y)\rangle_H]=E[\kappa(X,Y)]$ (6)
where $E[\cdot]$ denotes the expectation operator, $\kappa(\cdot,\cdot)$ is a Mercer kernel function, i.e. $\kappa(X,Y)=\langle\Phi(X),\Phi(Y)\rangle_H$, $\Phi(\cdot)$ is the nonlinear mapping and $H$ denotes the Hilbert space;
5-2. The widely used Gaussian kernel is adopted as the nonlinear kernel function: $\kappa_\sigma(X-Y)=\exp\left(-\|X-Y\|^2/(2\sigma^2)\right)$, where $\sigma$ is the Gaussian kernel bandwidth;
5-3. A loss function $C(X,Y)$ over the kernel space is defined as:
From the above equation, minimizing $C(X,Y)$ is equivalent to maximizing the correlation entropy $\kappa_\sigma(X-Y)$, and this is therefore referred to as the maximum correlation entropy criterion (MCC).
Step 6, redefining the loss function
6-1, using the maximum correlation entropy criterion instead of the minimum mean square error principle, the loss function of ELM-AE can be defined as:
6-2. Taking the negative of the correlation entropy, maximizing the correlation entropy is equivalent to minimizing the negative correlation entropy; using a Gaussian kernel, equation (7) can be redefined as:
Differentiating equation (8) with respect to the output weight, setting the derivative to 0 and rearranging gives:
Denoting $\varepsilon_i=X-h_i\beta$ and rearranging, we obtain:
Writing $G_\sigma(\varepsilon_i)/\sigma^2$ for the ratio of the Gaussian kernel function to the square of the bandwidth, we then have:
6-3. Collecting the terms containing $\beta_A$ in the above formula, we obtain:
where $\Lambda$ is a diagonal matrix with $\Lambda_{ii}=G_\sigma(\varepsilon_i)/\sigma^2$; since $\Lambda$ contains terms related to $\beta$, equation (12) is not a closed-form solution but a fixed-point iterative equation, from which the output weights can be solved.
Step 7, depth feature fusion
7-1. Eight pressure sensors are evenly arranged around the circumference of the casing of an axial flow compressor, and the vibration signals of the compressor gradually evolving from the normal working condition to rotating stall are collected, yielding 8 vibration signal sequences.
7-2. After the vibration signals are filtered and preprocessed, a sliding window method is used to extract the time domain statistical indexes and the nonlinear feature entropies (sample entropy, approximate entropy, fuzzy entropy and permutation entropy) of each window signal; with 12 feature values per sensor channel, a 96-dimensional feature data set is obtained. The time domain statistical indexes are shown in the table of the specification.
7-3. Three ELM-AEs are stacked to construct a deep ELM-AE neural network based on the maximum correlation entropy criterion. The output weight of the upper ELM-AE is used as the input of the next layer, and the numbers of hidden layer neurons of the three ELM-AEs are set to 60, 30 and 10, respectively.
7-4. The tolerance error of the loss function is set to $10^{-3}$; the 96-dimensional feature data set is used as the feature input, and when the preset tolerance error range is reached, the hidden layer node output is the fused 10-dimensional feature matrix.
7-5. To verify the effectiveness of the method, t-SNE is used to perform a visualization analysis of the output feature matrix, and the effect is evaluated from the perspective of clustering performance.
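Putting steps 7-3 and 7-4 together, a self-contained sketch of the fusion stage might look as follows; the orthogonalized random input weights, the assumed form of the fixed-point update for the MCC output weights, and the default values of C, sigma and the tolerance are illustrative assumptions rather than the patent's exact settings:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def mcc_beta(H, X, C=1.0, sigma=1.0, tol=1e-3, max_iter=50):
    """Fixed-point iteration for the MCC output weights (assumed form, see step 6)."""
    L = H.shape[1]
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ X)
    for _ in range(max_iter):
        lam = np.exp(-np.sum((X - H @ beta) ** 2, axis=1) / (2 * sigma ** 2)) / sigma ** 2
        HtL = H.T * lam
        beta_new = np.linalg.solve(HtL @ H + np.eye(L) / C, HtL @ X)
        if np.linalg.norm(beta_new - beta) < tol:   # stop once the update is within the tolerance
            return beta_new
        beta = beta_new
    return beta

def deep_mcc_elm_ae(X, layer_sizes=(60, 30, 10), seed=0):
    """Fuse a 96-dimensional feature set down to 10 dimensions with three stacked MCC ELM-AEs."""
    rng = np.random.default_rng(seed)
    Y = X
    for L in layer_sizes:
        A, _ = np.linalg.qr(rng.standard_normal((Y.shape[1], L)))  # orthogonal random input weights
        b = rng.standard_normal(L)
        H = sigmoid(Y @ A + b)
        Y = sigmoid(Y @ mcc_beta(H, Y).T)   # coded output of this layer feeds the next
    return Y                                 # (N, 10) fused feature matrix
```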
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.

Claims (1)

1. A multi-layer deep neural network based on the maximum correlation entropy, characterized by introducing kernel function theory, adopting the maximum correlation entropy criterion to replace the minimum mean square error as the loss function of the extreme learning machine self-encoder (ELM-AE) algorithm, and being used for feature fusion of multi-dimensional data, comprising the following steps:
1-1. Using the maximum correlation entropy criterion instead of the minimum mean square error principle, the loss function of ELM-AE is defined as:
where $\beta_A$ is the output weight, $C$ is the regularization coefficient, $X$ denotes the input vector and $H$ denotes the hidden layer output matrix;
1-2. Taking the negative of the correlation entropy, finding the maximum correlation entropy is equivalent to minimizing the negative correlation entropy, and formula (1) is redefined as follows:
where $N$ denotes the number of training samples, $h_i=[h_1,h_2,\dots,h_d]^T$ is the hidden layer representation of the input vector $X$ obtained by encoding, and $G(\cdot)$ denotes the Gaussian kernel function;
Differentiating formula (2) with respect to the output weight, setting the derivative to 0 and rearranging gives:
Denoting $\varepsilon_i=X-h_i\beta$ and rearranging, we obtain:
Writing $G_\sigma(\varepsilon_i)/\sigma^2$ for the ratio of the Gaussian kernel function to the square of the bandwidth, we then have:
where $\sigma$ is the Gaussian kernel bandwidth and $\varepsilon_i$ is the training error of the input $X$;
1-3. Collecting the terms containing $\beta_A$ in formula (5), we obtain:
where $\Lambda$ is a diagonal matrix with $\Lambda_{ii}=G_\sigma(\varepsilon_i)/\sigma^2$, $N$ is the number of training samples, $I$ is the identity matrix, $C$ is the regularization coefficient, $X$ is the input vector and $H$ is the hidden layer output matrix, with $h_i$ the hidden layer output of the $i$-th sample; since $\Lambda$ contains terms related to $\beta$, equation (6) is not a closed-form solution but a fixed-point iterative equation, and the output weights can be solved by the fixed-point iteration method;
the feature fusion is realized specifically as follows:
2-1, uniformly arranging 8 pressure sensors in the circumferential direction of a designated axial flow compressor casing, collecting vibration signals of the axial flow compressor gradually evolving from normal working conditions to rotating stall, and obtaining 8 vibration signal sequences;
2-2, after filtering pretreatment of the vibration signals, extracting a time domain statistical index and nonlinear characteristic entropy of each window signal by utilizing a sliding window method, wherein the total number of the characteristic values is 12, and a 96-dimensional characteristic data set is obtained; the nonlinear characteristic entropy comprises sample entropy, approximate entropy, fuzzy entropy and permutation entropy;
2-3. Three ELM-AEs are stacked to construct a multi-layer ELM-AE neural network based on the maximum correlation entropy criterion; the feature data set serves as the input of the first layer of ELM-AE, the output weight of the upper layer of ELM-AE then serves as the input of the next layer, and the numbers of hidden layer neurons of the three ELM-AEs are set to 60, 30 and 10, respectively;
2-4. The tolerance error of the loss function is set to $10^{-3}$, the 96-dimensional feature data set is taken as the feature input, and when the preset tolerance error range is reached, the hidden layer nodes output the fused 10-dimensional feature matrix.
CN202110381734.9A 2021-04-09 2021-04-09 Multidimensional signal feature fusion method based on maximum correlation entropy criterion Active CN113191397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110381734.9A CN113191397B (en) 2021-04-09 2021-04-09 Multidimensional signal feature fusion method based on maximum correlation entropy criterion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110381734.9A CN113191397B (en) 2021-04-09 2021-04-09 Multidimensional signal feature fusion method based on maximum correlation entropy criterion

Publications (2)

Publication Number Publication Date
CN113191397A CN113191397A (en) 2021-07-30
CN113191397B true CN113191397B (en) 2024-02-13

Family

ID=76975218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110381734.9A Active CN113191397B (en) 2021-04-09 2021-04-09 Multidimensional signal feature fusion method based on maximum correlation entropy criterion

Country Status (1)

Country Link
CN (1) CN113191397B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643722B (en) * 2021-08-27 2024-04-19 杭州电子科技大学 Urban noise identification method based on multilayer matrix random neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447272A (en) * 2018-09-29 2019-03-08 西安交通大学 A kind of extreme learning machine method based on center of maximum cross-correlation entropy criterion
CN111783837A (en) * 2020-06-05 2020-10-16 西安电子科技大学 Feature fusion method based on multi-core learning
CN112086100A (en) * 2020-08-17 2020-12-15 杭州电子科技大学 Quantization error entropy based urban noise identification method of multilayer random neural network
CN112435054A (en) * 2020-11-19 2021-03-02 西安理工大学 Nuclear extreme learning machine electricity sales amount prediction method based on generalized maximum correlation entropy criterion


Also Published As

Publication number Publication date
CN113191397A (en) 2021-07-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant