CN117516937A - Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement - Google Patents

Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement

Info

Publication number
CN117516937A
CN117516937A (application CN202311395313.7A)
Authority
CN
China
Prior art keywords
data
feature
fusion
modal
discriminant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311395313.7A
Other languages
Chinese (zh)
Inventor
牛迪
于树松
聂婕
石硕
王成龙
刘晓菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202311395313.7A priority Critical patent/CN117516937A/en
Publication of CN117516937A publication Critical patent/CN117516937A/en
Pending legal-status Critical Current

Abstract

The invention belongs to the technical field of fault diagnosis and discloses a rolling bearing unknown fault detection method based on multi-modal feature fusion enhancement. The method comprises two stages, a pre-training stage and a new class discovery stage. The pre-training stage performs supervised training of a depth encoder model on the labeled set; the depth encoder model comprises a signal depth encoder, an image depth encoder and a saliency correlation and complementary fusion module, and is used for multi-modal learning and complementary fusion of multi-modal representations. The new class discovery stage identifies and discovers new classes: it uses the labeled and unlabeled mixed data of the mixed data set as input data to perform unsupervised clustering joint training. After the joint training is finished, a fault detection network model for identifying new classes is obtained; the unlabeled set data is input into the fault detection network model to obtain a pseudo-label class, which is the new class to which the unlabeled data belongs. The invention improves the accuracy of discovering new classes of unknown faults.

Description

Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement
Technical Field
The invention belongs to the technical field of fault diagnosis, and particularly relates to a rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement.
Background
Fault diagnosis technology is of great significance for improving the reliability and stability of equipment and for reducing equipment losses and maintenance costs, and is an indispensable technology in many fields. Unknown fault detection analyzes and processes fault data of known classes when the fault mode is unknown, and then applies new class discovery to the learned feature representations to detect faults that may be present.
Traditional fault diagnosis methods rely on professional skills and experience, often require a great deal of manpower, material resources and time, and have limited accuracy. As the highly automated, intelligent equipment on which modern production and services depend becomes more and more complex, fault diagnosis becomes more and more difficult and poses great challenges to fault diagnosis technology. Many deep learning-based methods are gradually replacing conventional fault diagnosis techniques, such as fault diagnosis methods based on stacked denoising auto-encoders (SDA), on competitive (winner-take-all) sparse auto-encoders (WTA-AE), on restricted Boltzmann machines (RBM), on improved multi-scale cascade convolutional neural networks (MC-CNN), and on transfer learning.
However, the unknown fault types that occur under actual fault diagnosis conditions are still not handled, and the following problems remain. First, the diversity of the data is not fully utilized: traditional fault diagnosis methods often rely on a single-modality data source, and under complex fault conditions the accuracy and reliability of diagnosis may be low. Moreover, the fusion of multi-modal data representations is usually simplistic, and data redundancy arises when the information of the multi-modal data is fused. Second, feature extraction is insufficient and the multi-scale structure is shallow: traditional multi-scale feature extraction structures are simple, multi-scale extraction is mostly applied to shallow features, and the discriminability of deep semantic features for complex problems is ignored. Third, in the pseudo-label generation process of unsupervised clustering, only a single modality is used to generate pseudo labels, which can only reflect the features and information of that modality; during classification the probability distributions of the categories tend to be uniform and their overlapping areas are large, so the classification results are inaccurate.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a rolling bearing unknown fault detection method based on multi-modal feature fusion enhancement, which solves the problem of inaccurate new-class results by applying the knowledge learned from supervised data to the unsupervised clustering process.
In order to solve the technical problems, the invention adopts the following technical scheme:
the rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement comprises two stages of pre-training and new class discovery:
S1, a pre-training stage for supervised training of a depth encoder model on the labeled set, the depth encoder model comprising a signal depth encoder f_s, an image depth encoder f_v and a saliency correlation and complementary fusion module, and being used for multi-modal learning and complementary fusion of multi-modal representations;
the labeled set data are labeled vibration signal data, and the multi-modal learning is as follows: first, a fast Fourier transform converts the original vibration signal into a time-frequency domain image, realizing modal data expansion and yielding the two-modality input data; then the signal depth encoder f_s and the image depth encoder f_v encode the signal data and the image data into a signal feature vector and an image feature vector, respectively. The complementary fusion of multi-modal representations is as follows: data fusion is carried out by the saliency correlation and complementary fusion module, which realizes the saliency correlation alignment of the multi-modal data and performs balanced complementary fusion of the aligned multi-modal feature vectors; the contribution of each modal feature in the fusion is controlled by learned weight parameters, preserving the salient features and data diversity of each modality, and the aligned and fused multi-modal feature vector is obtained. Finally, a classification layer with C_l outputs produces the prediction, and the cross-entropy loss with respect to the true labels Y_l of the labeled set is calculated, realizing supervised training of the model;
S2, a new class discovery stage, used for identifying and discovering new classes, in which unsupervised clustering joint training is performed using the labeled and unlabeled mixed data of the mixed data set D as input data;
at this stage, the signal depth encoder f_s and the image depth encoder f_v trained in the pre-training stage are used as the feature extractors for the signal data and the image data in the mixed data set, respectively;
S21, the labeled set D_l in the mixed data set D is used as the input of f_s and f_v, and the saliency correlation and complementary fusion module is applied to obtain the multi-modal feature vector; this vector is then passed through an old-class classifier with C_l outputs to obtain the old-class prediction;
S22, the unlabeled set D_u is input to f_s and f_v for feature extraction to obtain the signal feature vector and the image feature vector; each of them is then classified by a new-class classifier with C_u outputs to obtain the two new-class prediction outputs;
S23, the two new-class outputs are processed by the pseudo-label generation module to obtain the pseudo labels of the unlabeled set; the pseudo-label generation module comprises a discriminant enhancement module and a pseudo-label allocation module, where the discriminant enhancement module enhances the discriminant relationships of the two modal outputs, an entropy term is added as a penalty in the pseudo-label allocation module, and the final output result is taken as the pseudo labels of the unlabeled set D_u;
S24, the two prediction outputs are concatenated as the prediction output of the joint training; the labels of the labeled set and the pseudo labels of the unlabeled set obtained in step S23 are concatenated as the labels of the joint training, and the joint training is performed by calculating the cross-entropy loss between these labels and the joint prediction output;
and S3, after the joint training is finished, a fault detection network model capable of identifying new classes is obtained; the unlabeled set data are input into the fault detection network model, and the pseudo-label class assigned by the model is obtained, which is the new class to which the unlabeled data belong.
Further, in the pre-training stage, the data fusion in the saliency correlation and complementary fusion module is divided into two steps: data saliency correlation alignment, and balanced complementary fusion of the modal data. For the obtained signal feature vector and image feature vector, in the data saliency correlation alignment step one of them is transposed and cross-multiplied with the other to obtain the original correlation matrix S_origin of the vibration signal and the time-frequency domain image;
sparsification is then applied to obtain the correlation matrix S_mask: in the sparsification process, every element s_ij of the original correlation matrix that is greater than or equal to 0.5 is set to 1 and the others are set to 0, so as to highlight the correlated vectors and sparsify the uncorrelated ones;
alignment of the multi-modal data is achieved through this correlation matrix, yielding the aligned signal feature vector and the aligned image feature vector.
After the saliency correlation alignment of the multi-modal data, balanced complementary fusion of the modal data is performed on the aligned signal feature vector and image feature vector, and the contribution of each modal feature in the fusion is controlled by learned weight parameters; specifically, a weight value is calculated for the feature vector of each modality, the weight representing the importance of that modal feature to the final fusion result, modal features of high importance being given large weights and those of low importance being given small weights, so that the quality and effect of the fusion are improved; here W_1 and W_2 denote learnable parameter matrices, λ denotes the corresponding weight coefficient, and the result is the multi-modal feature vector finally fused after alignment.
Further, the signal depth encoder f_s is divided into three shallow branches of different scales, while the image depth encoder f_v adopts only one shallow scale; the two encoders perform semantic feature extraction at the same scales in the deep semantic stage, and both the image depth encoder f_v and the signal depth encoder f_s adopt a 1:1:3:1 structure. The two encoders each perform multi-scale operations on the shallow information and the deep semantic information simultaneously, specifically as follows:
for the vibration signal data, three convolution kernels k_1, k_2, k_3 of different sizes extract shallow feature information at three different scales, dividing the signal features into three scales while also denoising the vibration signal, where k_1, k_2, k_3 denote the three convolution kernels of different sizes and f denotes the activation function used to obtain the shallow feature information at the three scales;
deep features are then extracted sequentially at the respective scales, and in the deep semantic stage deep feature information at different semantic scales is extracted by dilated convolutions with different dilation coefficients; finally, the three pieces of deep feature information of different scales are concatenated by a mapping function and mapped, together with the image feature information extracted by the image depth encoder, into the same feature space for adaptive multi-modal fusion;
for the time-frequency domain image, the vertical structure of the image depth encoder f_v is the same as that of the signal depth encoder f_s, and deep semantic feature extraction is carried out in the same way.
Further, the discriminant enhancement module of the pseudo-label generation module exploits the complementarity of the features of the multiple modalities to optimize the classification scores of the discriminant vectors from a multi-view perspective, improving the saliency of the inter-class discriminant relationships in the classification score vectors; the specific method is as follows:
the signal feature vector and the image feature vector are passed through the new-class classifier to obtain their discriminant vectors, and the discriminant probability distributions P_s and P_v are calculated for each of them, where C_u denotes the number of new classes and each entry of P_v (respectively P_s) is the probability that the image-modality (respectively signal-modality) discriminant vector belongs to class 1, class 2, ..., class C_u;
first, from the multi-modal, multi-view perspective, the discriminant probability relations of the samples among the categories are considered so as to improve the saliency of the discriminant vectors:
Q = Softmax(P_v · P_s)
where Q represents the class probabilities obtained after the different modalities guide each other;
the weights of the different modalities are then determined according to their information entropies, and the discriminant relation vectors of the different modalities are fused, where α and β are weight parameters computed from the information entropies of the image-modality and signal-modality discriminant vectors, H(l_i) denotes the sum of the information entropies of the image and the signal, and α + β = 1; finally, the inter-class discriminant probability relation is used as the weight on the fused discriminant vectors for optimization, obtaining the final discriminant vector whose discriminant relationships are obviously enhanced.
further, in the pseudo tag allocation module, for the case that discriminant vectors are equal to each other, an optimal transportation algorithm is used, an entropy term is added for punishing the case that all discriminant vectors are equal to each other, and all C is excited u Uniformly dividing pseudo tags on the clusters; the specific method comprises the following steps:
is provided withIs a matrix calculated for a new class header for a sample of size B, providedFor the unknown pseudo tag matrix of the current lot, +.>The solution of (2) is:
where H is the entropy function of the scattering pseudo tag, tr is the trace of the matrix, ε > 0 is the superparameter, Γ is the transport polytype, defined as:
wherein 1 is B Andrepresents B and C u A vector matrix of all 1's, Y representing a joint probability matrix of Cu x B; the pseudo tag thus produced is composed of +.>Each row y of i A representation;
and finally splicing the pseudo tag allocated for the data of the label-free set with the tag of the label set, and training the pseudo tag as a joint training tag and the prediction output of the joint training in a mode of calculating cross entropy loss.
Compared with the prior art, the invention has the advantages that:
(1) The invention provides a new class discovery method for fault detection that utilizes multi-modal data information, which can effectively exploit the complementarity between different modalities and enrich the feature information. A multi-modal data fusion module based on saliency correlation, namely the saliency correlation and complementary fusion module, is provided; it assigns different weight parameters to the data of different modalities according to different working-environment conditions, solves the redundancy problem of data fusion, and makes full use of the complementarity between different modalities.
(2) The invention provides a novel multi-scale deep feature extractor, namely the signal depth encoder and the image depth encoder, which adopt different multi-scale feature extraction structures for data of different modalities and combine shallow multi-scale information with deep semantic multi-scale information, improving the feature-extraction capability for global information and the clustering accuracy of the new class discovery stage; multi-scale feature extraction also enhances the anti-interference capability of the model in different complex environments.
(3) The invention provides a new multi-modal pseudo-label generation module. The weight hyperparameters used to fuse the multi-modal discriminant relation vectors of the different modalities are first computed from the difference in information entropy between the modalities, and the saliency of the inter-class discriminant relations of the fused vector is then improved using the inter-class discriminant probabilities. The module fuses the multi-modal discriminant vectors effectively and reasonably and improves the final classification and clustering effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a training method in a pre-training phase of the present invention;
FIG. 2 is a schematic diagram of the training method of the new class discovery phase of the present invention;
FIG. 3 is a schematic diagram of a significance-dependent and complementary fusion module according to the present invention;
FIG. 4a is an image depth encoder of the present invention;
FIG. 4b is a schematic diagram of a signal depth encoder according to the present invention;
fig. 5 is a schematic diagram of a pseudo tag generating module according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples.
Given a data set D = {x_i, i = 1, ..., M} that mixes labeled and unlabeled samples (in this embodiment these are fault data sets of a rolling bearing, typically vibration-signal data sets), the aim of the invention is to automatically divide these data into C (C = C_l + C_u) different classes. In addition, a labeled data set D_l is provided in which the class assignments are known. By learning feature representations of the data in the labeled data set, the goal of automatically clustering the unlabeled data D_u is achieved. Of course, the present invention can also be used for other fault diagnosis tasks; this embodiment merely takes rolling bearing fault diagnosis as an example. When other faults are diagnosed, the input data is the corresponding fault information.
Therefore, this embodiment provides a rolling bearing unknown fault detection method based on multi-modal feature fusion enhancement. The end-to-end training of the invention consists of two important parts: fully supervised training on the labeled data set, and unsupervised clustering training on the mixed data set, where the model used for the unsupervised clustering is the model obtained after the fully supervised training. The method is described in detail below.
The training process at each stage is described in detail below in conjunction with fig. 1 and 2, respectively.
S1, a pre-training stage for supervised training of a depth encoder model on the labeled set. As shown in FIG. 1, the depth encoder model comprises a signal depth encoder f_s, an image depth encoder f_v and the saliency correlation and complementary fusion module, and is used for multi-modal learning and complementary fusion of multi-modal representations. The labeled-set data in the pre-training stage are labeled vibration-signal data.
The multi-modal learning is as follows: first, a fast Fourier transform converts the original vibration signal into a time-frequency domain image, realizing the modal data expansion and yielding the two-modality input data; then the signal depth encoder f_s and the image depth encoder f_v encode the signal data and the image data into a signal feature vector and an image feature vector, respectively.
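The conversion step can be illustrated with a short sketch. The snippet below is a minimal, illustrative way of expanding a one-dimensional vibration signal into a time-frequency image with a sliding-window FFT; the window length, hop size and log scaling are assumptions made for illustration and are not values specified in this description.

```python
import numpy as np

def signal_to_time_frequency_image(x, window=256, hop=64):
    """Expand a 1-D vibration signal into a time-frequency image (magnitude spectrogram).

    A sliding-window FFT is used here as one way to realise the FFT-based modal
    expansion described above; window and hop sizes are illustrative assumptions.
    """
    frames = []
    for start in range(0, len(x) - window + 1, hop):
        segment = x[start:start + window] * np.hanning(window)  # windowed frame
        frames.append(np.abs(np.fft.rfft(segment)))             # magnitude spectrum of one frame
    tf_image = np.stack(frames, axis=1)                          # (frequency bins, time frames)
    return np.log1p(tf_image)                                    # log scale for a better-conditioned image

# usage: x_signal stands for one labelled vibration-signal sample (hypothetical data)
x_signal = np.random.randn(4096)
x_image = signal_to_time_frequency_image(x_signal)               # image modality fed to f_v
```

In practice the resulting image would be resized or normalized to whatever input format the image depth encoder f_v expects.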
The complementary fusion of multi-modal representations is as follows: data fusion is carried out by the saliency correlation and complementary fusion module, which realizes the saliency correlation alignment of the multi-modal data and performs balanced complementary fusion of the aligned multi-modal feature vectors; the contribution of each modal feature in the fusion is controlled by learned weight parameters, preserving the salient features and data diversity of each modality, and the aligned and fused multi-modal feature vector is obtained. Finally, a classification layer with C_l outputs produces the prediction, and the cross-entropy loss with respect to the true labels Y_l of the labeled set is calculated, realizing supervised training of the depth encoder model.
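The supervised pre-training objective itself fits in a few lines. The sketch below assumes hypothetical modules encoder_s, encoder_v, fusion and classifier standing in for f_s, f_v, the saliency correlation and complementary fusion module and the C_l-way classification layer; it only illustrates the forward pass and the cross-entropy loss.

```python
import torch.nn.functional as F

def pretrain_step(encoder_s, encoder_v, fusion, classifier, x_signal, x_image, y_true):
    """One supervised pre-training step on a labeled mini-batch (sketch)."""
    z_s = encoder_s(x_signal)               # signal feature vector from f_s
    z_v = encoder_v(x_image)                # image feature vector from f_v
    z = fusion(z_s, z_v)                    # aligned and fused multi-modal feature vector
    logits = classifier(z)                  # classification layer with C_l outputs
    return F.cross_entropy(logits, y_true)  # cross-entropy against the true labels Y_l
```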
Single-modality learning may be difficult when dealing with relatively complex problems, because it provides a limited amount of information, is more constrained by the data, and may also suffer from noise, distortion and the like. Compared with single-modality learning, multi-modal learning can extract feature information of different forms across modalities and fuse the information of multiple data sources to extract deeper and richer semantic features, making full use of the correlation and complementarity between modalities and realizing more efficient feature expression and learning.
In the complementary fusion stage of multi-modal representations, data are generally integrated by concatenation or feature addition; although this preserves the original information of each modality and provides a more comprehensive and rich feature representation, it also causes several problems: (a) high dimensionality: the feature dimension increases significantly, increasing the complexity of computation and storage; (b) feature weight balancing: features of different modalities have different importance and expressive power, and simple addition or concatenation cannot handle the weight relationship between features; (c) feature redundancy: the features of different modalities contain redundant information, and direct concatenation or addition leads to feature redundancy, reducing the discriminative ability and generalization of the features. Therefore, the invention provides a multi-modal representation complementary fusion module based on saliency correlation, namely the saliency correlation and complementary fusion module, to improve the effect and robustness of the fusion result.
As a preferred embodiment, as shown in FIG. 3, in the pre-training stage the data fusion in the saliency correlation and complementary fusion module is divided into two steps: data saliency correlation alignment, and balanced complementary fusion of the modal data. For the obtained signal feature vector and image feature vector, whose dimensions satisfy m = n, in the data saliency correlation alignment step one of them is transposed and cross-multiplied with the other to obtain the original correlation matrix S_origin of the vibration signal and the time-frequency domain image;
sparsification is then applied to obtain the correlation matrix S_mask: in the sparsification process, every element s_ij of the original correlation matrix that is greater than or equal to 0.5 is set to 1 and the others are set to 0, so as to highlight the correlated vectors and sparsify the uncorrelated ones;
alignment of the multi-modal data is achieved through this correlation matrix, yielding the aligned signal feature vector and the aligned image feature vector.
After the saliency correlation alignment of the multi-modal data, balanced complementary fusion of the modal data is performed on the aligned signal feature vector and image feature vector, and the contribution of each modal feature in the fusion is controlled by learned weight parameters; specifically, a weight value is calculated for the feature vector of each modality, the weight representing the importance of that modal feature to the final fusion result, modal features of high importance being given large weights and those of low importance being given small weights, so that the quality and effect of the fusion are improved; here W_1 and W_2 denote learnable parameter matrices, λ denotes the corresponding weight coefficient, and the result is the multi-modal feature vector finally fused after alignment, which contains a richer information representation and improves the robustness and expressive power of the features.
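A compact sketch of this two-step fusion is given below. Because the extracted text does not preserve the exact alignment and fusion equations, the feature normalization, the mask-weighted re-projection used for alignment, and the sigmoid-gated weighted sum used for the balanced complementary fusion are assumptions; only the correlation matrix, the 0.5 thresholding, and the roles of W_1, W_2 and λ follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyCorrelationFusion(nn.Module):
    """Sketch of saliency-correlation alignment + balanced complementary fusion."""

    def __init__(self, dim):
        super().__init__()
        self.W1 = nn.Linear(dim, dim, bias=False)   # learnable parameter matrix W_1 (signal side)
        self.W2 = nn.Linear(dim, dim, bias=False)   # learnable parameter matrix W_2 (image side)
        self.weight_head = nn.Linear(2 * dim, 1)    # produces the balance coefficient lambda

    def forward(self, z_s, z_v):
        # z_s, z_v: (n, dim) sets of signal / image feature vectors for one sample (m = n assumed);
        # features are L2-normalised here so correlations fall into [0, 1] (assumption)
        z_s, z_v = F.normalize(z_s, dim=-1), F.normalize(z_v, dim=-1)
        s_origin = z_s @ z_v.t()                    # original correlation matrix S_origin
        s_mask = (s_origin >= 0.5).float()          # sparsification: s_ij >= 0.5 -> 1, otherwise 0
        z_s_aligned = s_mask @ z_v                  # signal features aligned to correlated image vectors (assumed)
        z_v_aligned = s_mask.t() @ z_s              # image features aligned to correlated signal vectors (assumed)
        pooled = torch.cat([z_s_aligned.mean(0), z_v_aligned.mean(0)])
        lam = torch.sigmoid(self.weight_head(pooled))            # learned weight coefficient lambda
        return lam * self.W1(z_s_aligned.mean(0)) \
            + (1.0 - lam) * self.W2(z_v_aligned.mean(0))         # fused multi-modal feature vector
```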
As a preferred implementation, the method extracts features of the samples of the different modalities at multiple scales, effectively avoiding noise interference and information loss in complex environments. The deep multi-scale feature extraction process of the invention is the same for the signals and the images, but for the signals the shallow features are additionally processed by three convolution kernels; that is, the two encoder structures are vertically identical, except that in the horizontal direction the signal encoder has three branches. In FIGS. 4a and 4b, the marked symbols respectively represent convolution operations (different numbers denoting convolutions of different scales), dilated convolution operations (different numbers denoting dilated convolutions of different scales), and pooling operations.
The signal depth encoder f_s is divided into three shallow branches of different scales, while the image depth encoder f_v adopts only one shallow scale; the two encoders perform semantic feature extraction at the same scales in the deep semantic stage, and both the image depth encoder f_v and the signal depth encoder f_s adopt a 1:1:3:1 structure. The two encoders each perform multi-scale operations on the shallow information and the deep semantic information simultaneously, specifically as follows:
for the vibration signal data, three convolution kernels k_1, k_2, k_3 of different sizes extract shallow feature information at three different scales, dividing the signal features into three scales while also denoising the vibration signal, where k_1, k_2, k_3 denote the three convolution kernels of different sizes and f denotes the activation function used to obtain the shallow feature information at the three scales.
Deep features are then extracted sequentially at the respective scales, and in the more semantic deep stage deep feature information at different semantic scales is extracted by dilated convolutions with different dilation coefficients (such as 1, 3 and 5); finally, the three pieces of deep feature information of different scales are concatenated by a mapping function and mapped, together with the image feature information extracted by the image depth encoder, into the same feature space for adaptive multi-modal fusion.
For the time-frequency domain image, the multi-scale feature extraction is mainly applied to the deep semantic features: the shallow features mainly reflect the basic characteristics of the signal, while the deep semantic features are more discriminative for complex signals and fault diagnosis. The vertical structure of the image depth encoder f_v is the same as that of the signal depth encoder f_s, and deep semantic feature extraction is carried out in the same way, so it is not described again here. The invention performs multi-scale operations on the shallow information and the deep semantic information simultaneously, effectively improving the feature-extraction capability of the model.
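A minimal PyTorch sketch of the signal depth encoder f_s is given below: three parallel shallow branches with different 1-D convolution kernel sizes, a deep semantic stage of dilated convolutions, and concatenation followed by a projection into the shared feature space. The kernel sizes, channel widths, pooling layers and the one-dilation-per-branch pairing are illustrative assumptions rather than the exact structure of FIG. 4b.

```python
import torch
import torch.nn as nn

class SignalDepthEncoder(nn.Module):
    """Sketch of f_s: three shallow multi-scale branches + dilated deep semantic convolutions."""

    def __init__(self, out_dim=128, kernel_sizes=(3, 7, 15), dilations=(1, 3, 5)):
        super().__init__()
        # shallow branches: three 1-D convolution kernels of different sizes (k_1, k_2, k_3)
        self.shallow = nn.ModuleList([
            nn.Sequential(nn.Conv1d(1, 16, k, padding=k // 2), nn.ReLU(), nn.MaxPool1d(2))
            for k in kernel_sizes
        ])
        # deep semantic stage: dilated convolutions with different dilation coefficients
        self.deep = nn.ModuleList([
            nn.Sequential(nn.Conv1d(16, 32, 3, padding=d, dilation=d), nn.ReLU(),
                          nn.AdaptiveAvgPool1d(1))
            for d in dilations
        ])
        self.project = nn.Linear(3 * 32, out_dim)    # maps the concatenated branches into the shared space

    def forward(self, x):
        # x: (batch, 1, signal_length) raw vibration signal
        feats = []
        for shallow, deep in zip(self.shallow, self.deep):
            h = shallow(x)                           # shallow features at one scale (also denoises)
            feats.append(deep(h).flatten(1))         # deep semantic features at that scale
        return self.project(torch.cat(feats, dim=1))  # multi-scale signal feature vector

# usage: SignalDepthEncoder()(torch.randn(8, 1, 4096)) -> (8, 128) feature vectors
```

The image depth encoder f_v would keep the same vertical layout but with a single shallow branch of 2-D convolutions over the time-frequency image.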
S2, a new class discovery stage for identifying and discovering new classes; it uses the labeled and unlabeled mixed data of the mixed data set D as input data to perform unsupervised clustering joint training.
As shown in FIG. 2, at this stage the mixed data are first processed using the depth encoder model pre-trained in S1, i.e. the signal depth encoder f_s and the image depth encoder f_v trained in the pre-training stage are used as the feature extractors for the signal data and the image data in the mixed data set, respectively.
The processing of the labeled data and the unlabeled data is described below.
S21, the labeled set D_l in the mixed data set D is used as the input of f_s and f_v, and the saliency correlation and complementary fusion module is applied to obtain the multi-modal feature vector; this vector is then passed through an old-class classifier with C_l outputs to obtain the old-class prediction.
S22, the unlabeled set D_u is input to f_s and f_v for feature extraction to obtain the signal feature vector and the image feature vector; each of them is then classified by a new-class classifier with C_u outputs to obtain the two new-class prediction outputs.
S23, the two new-class outputs are processed by the pseudo-label generation module to obtain the pseudo labels of the unlabeled set.
The pseudo-label generation module comprises a discriminant enhancement module and a pseudo-label allocation module: the discriminant enhancement module enhances the discriminant relationships of the two modal outputs, an entropy term is added as a penalty in the pseudo-label allocation module, and the final output result is taken as the pseudo labels of the unlabeled set D_u.
As a preferred embodiment, as shown in FIGS. 1 and 5, the pseudo-label generation module of the present invention includes a discriminant enhancement module and a pseudo-label allocation module, where the discriminant enhancement module exploits the complementarity of the features of the different modalities to optimize the classification scores of the discriminant vectors from a multi-view perspective, improving the saliency of the inter-class discriminant relationships in the classification score vectors. The specific method is as follows:
the signal feature vector and the image feature vector are passed through the new-class classifier to obtain their discriminant vectors, and the discriminant probability distributions P_s and P_v are calculated for each of them, where C_u denotes the number of new classes and each entry of P_v (respectively P_s) is the probability that the image-modality (respectively signal-modality) discriminant vector belongs to class 1, class 2, ..., class C_u.
First, from the multi-modal, multi-view perspective, the discriminant probability relations of the samples among the categories are considered so as to improve the saliency of the discriminant vectors:
Q = Softmax(P_v · P_s)
where Q represents the class probabilities obtained after the different modalities guide each other.
The weights of the different modalities are then determined according to their information entropies, and the discriminant relation vectors of the different modalities are fused, where α and β are weight parameters computed from the information entropies of the image-modality and signal-modality discriminant vectors, H(l_i) denotes the sum of the information entropies of the image and the signal, and α + β = 1. Finally, the inter-class discriminant probability relation is used as the weight on the fused discriminant vectors for optimization, obtaining the final discriminant vector whose discriminant relationships are obviously enhanced.
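A small sketch of this enhancement step is given below. The softmax probabilities, the cross-modal guidance Q = Softmax(P_v · P_s) and the entropy-based weights with α + β = 1 follow the description above; the element-wise reading of the product P_v · P_s, the particular entropy-to-weight mapping, and the final re-weighting of the fused vector by Q are assumptions, since those equations are not preserved here.

```python
import torch
import torch.nn.functional as F

def enhance_discriminant(l_v, l_s):
    """Sketch of multi-modal discriminant enhancement for one unlabeled sample.

    l_v, l_s: (C_u,) new-class logits (discriminant vectors) from the image and signal heads.
    """
    p_v = F.softmax(l_v, dim=-1)                       # P_v: image-modality class probabilities
    p_s = F.softmax(l_s, dim=-1)                       # P_s: signal-modality class probabilities

    q = F.softmax(p_v * p_s, dim=-1)                   # Q = Softmax(P_v . P_s), cross-modal guidance

    # modality weights from information entropy (lower entropy -> more confident -> larger weight)
    h_v = -(p_v * p_v.clamp_min(1e-8).log()).sum()
    h_s = -(p_s * p_s.clamp_min(1e-8).log()).sum()
    alpha = h_s / (h_v + h_s)                          # assumed weighting with alpha + beta = 1
    beta = 1.0 - alpha

    fused = alpha * l_v + beta * l_s                   # fused multi-modal discriminant relation vector
    return q * fused                                   # saliency-enhanced discriminant vector (assumed form)
```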
in the pseudo tag allocation module, for the case that discriminant vectors are equal to each other, an optimal transportation algorithm is used, an entropy item is added for punishing the case that all discriminant vectors are equal, and all C are excited u Pseudo tags on clusters are uniformly divided.
The specific method comprises the following steps: is provided withIs a matrix calculated for a new class header for a sample of size B, set +.>For the unknown pseudo tag matrix of the current lot, +.>The solution of (2) is:
where H is the entropy function of the scattering pseudo tag, tr is the trace of the matrix, ε > 0 is the superparameter, Γ is the transport polytype, defined as:
wherein 1 is B Andrepresents B and C u Vector matrix of all 1's, Y represents C u A joint probability matrix of x B; the pseudo tag thus produced is composed of +.>Each row y of i And (3) representing. And finally splicing the pseudo tag allocated for the data of the label-free set with the tag of the label set, and training the pseudo tag as a joint training tag and the prediction output of the joint training in a mode of calculating cross entropy loss.
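Entropy-regularized assignments over such a transport polytope are commonly computed with a few Sinkhorn-Knopp normalization iterations. The sketch below is one illustrative way to do so; the number of iterations and the value of ε are arbitrary choices, not values taken from this description.

```python
import torch

@torch.no_grad()
def assign_pseudo_labels(p_u, epsilon=0.05, n_iters=3):
    """Sketch of optimal-transport pseudo-label assignment (Sinkhorn-Knopp style).

    p_u: (C_u, B) new-class prediction matrix for a batch of B unlabeled samples.
    Returns a (C_u, B) soft assignment whose columns sum to 1 and whose rows are
    approximately balanced, i.e. pseudo labels spread uniformly over the C_u clusters.
    """
    c_u, b = p_u.shape
    y = torch.exp(p_u / epsilon)                 # entropy-regularised scores
    y /= y.sum()
    for _ in range(n_iters):
        y /= y.sum(dim=1, keepdim=True)          # row normalisation: equal total mass per cluster
        y /= c_u
        y /= y.sum(dim=0, keepdim=True)          # column normalisation: unit mass per sample
        y /= b
    return y * b                                 # each column is now a probability vector over the C_u clusters

# usage (hypothetical names): hard pseudo labels for the unlabeled batch
# pseudo = assign_pseudo_labels(new_head_outputs.t()).argmax(dim=0)
```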
S24, the two prediction outputs (old-class and new-class) are concatenated as the prediction output of the joint training; the labels of the labeled set and the pseudo labels of the unlabeled set obtained in step S23 are concatenated as the labels of the joint training, and the joint training is carried out by calculating the cross-entropy loss between these labels and the joint prediction output. The purpose of the joint training is to reinforce the knowledge of the labeled data while fine-tuning the signal depth encoder and the image depth encoder with the unlabeled data;
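The joint objective can be sketched as follows. The mapping of both heads into a common (C_l + C_u)-dimensional output space and the shifting of pseudo labels by C_l are assumptions about how the concatenation is realised; only the idea of concatenating predictions and labels and computing a single cross-entropy loss is taken from the description above.

```python
import torch
import torch.nn.functional as F

def joint_training_loss(out_labeled, out_unlabeled, y_labeled, y_pseudo, c_l, c_u):
    """Sketch of the joint cross-entropy over labeled and unlabeled data.

    out_labeled:   (B_l, C_l + C_u) predictions for labeled samples
    out_unlabeled: (B_u, C_l + C_u) predictions for unlabeled samples
    y_labeled:     (B_l,) ground-truth labels in [0, C_l)
    y_pseudo:      (B_u,) pseudo labels in [0, C_u) from the pseudo-label module
    """
    logits = torch.cat([out_labeled, out_unlabeled], dim=0)   # joint prediction output
    targets = torch.cat([y_labeled, y_pseudo + c_l], dim=0)   # pseudo labels shifted into the new-class range
    assert logits.size(1) == c_l + c_u
    return F.cross_entropy(logits, targets)
```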
S3, after the joint training is finished, a fault detection network model capable of identifying new classes is obtained; the unlabeled set data are input into the fault detection network model, and the pseudo-label class assigned by the model is obtained, which is the new class to which the unlabeled data belong.
In summary, the invention makes full use of the feature information of multi-modal data; the information of the different modalities complements each other, so that more comprehensive and accurate information is obtained, the expression and diagnosis of fault characteristics are enhanced, and the accuracy of fault diagnosis is improved. First, a saliency-related multi-modal representation complementary fusion method is provided to process multi-modal data simultaneously; different weight parameters are given to the data of different modalities according to different working-environment conditions, reducing data redundancy. Second, the feature-extraction capability for global information is improved, making up for the weak global dependency modeling of convolutional neural networks, and the generalization and anti-interference capability of the model in different complex environments are enhanced through multi-scale feature extraction. Third, the entanglement of the two modalities' feature information effectively resolves the ambiguous discriminant relations caused by single-modality feature information entropy, enhances the class saliency of the discriminant vectors of the unlabeled data, fuses the multi-modal discriminant vectors effectively and reasonably, and improves the final classification and clustering effect.
It should be understood that the above description is not intended to limit the invention to the particular embodiments disclosed, and that various changes, modifications, additions and substitutions can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (5)

1. The rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement is characterized by comprising two stages of pre-training and new class discovery:
S1, a pre-training stage for supervised training of a depth encoder model on the labeled set, the depth encoder model comprising a signal depth encoder f_s, an image depth encoder f_v and a saliency correlation and complementary fusion module, and being used for multi-modal learning and complementary fusion of multi-modal representations;
the labeled set data are labeled vibration signal data, and the multi-modal learning is as follows: first, a fast Fourier transform converts the original vibration signal into a time-frequency domain image, realizing modal data expansion and yielding the two-modality input data; then the signal depth encoder f_s and the image depth encoder f_v encode the signal data and the image data into a signal feature vector and an image feature vector, respectively; the complementary fusion of multi-modal representations is as follows: data fusion is carried out by the saliency correlation and complementary fusion module, which realizes the saliency correlation alignment of the multi-modal data and performs balanced complementary fusion of the aligned multi-modal feature vectors, the contribution of each modal feature in the fusion being controlled by learned weight parameters so that the salient features and data diversity of each modality are preserved, and the aligned and fused multi-modal feature vector is obtained; finally, a classification layer with C_l outputs produces the prediction, and the cross-entropy loss with respect to the true labels Y_l of the labeled set is calculated, realizing supervised training of the model;
S2, a new class discovery stage, used for identifying and discovering new classes, in which unsupervised clustering joint training is performed using the labeled and unlabeled mixed data of the mixed data set D as input data;
at this stage, the signal depth encoder f_s and the image depth encoder f_v trained in the pre-training stage are used as the feature extractors for the signal data and the image data in the mixed data set, respectively;
S21, the labeled set D_l in the mixed data set D is used as the input of f_s and f_v, and the saliency correlation and complementary fusion module is applied to obtain the multi-modal feature vector; this vector is then passed through an old-class classifier with C_l outputs to obtain the old-class prediction;
S22, the unlabeled set D_u is input to f_s and f_v for feature extraction to obtain the signal feature vector and the image feature vector; each of them is then classified by a new-class classifier with C_u outputs to obtain the two new-class prediction outputs;
S23, the two new-class outputs are processed by the pseudo-label generation module to obtain the pseudo labels of the unlabeled set; the pseudo-label generation module comprises a discriminant enhancement module and a pseudo-label allocation module, where the discriminant enhancement module enhances the discriminant relationships of the two modal outputs, an entropy term is added as a penalty in the pseudo-label allocation module, and the final output result is taken as the pseudo labels of the unlabeled set D_u;
S24, the two prediction outputs are concatenated as the prediction output of the joint training; the labels of the labeled set and the pseudo labels of the unlabeled set obtained in step S23 are concatenated as the labels of the joint training, and the joint training is performed by calculating the cross-entropy loss between these labels and the joint prediction output;
and S3, after the joint training is finished, a fault detection network model capable of identifying new classes is obtained; the unlabeled set data are input into the fault detection network model, and the pseudo-label class assigned by the model is obtained, which is the new class to which the unlabeled data belong.
2. The method for detecting unknown faults of rolling bearings based on multi-modal feature fusion enhancement according to claim 1, wherein in the pre-training stage the saliency correlation and complementary fusion module performs data fusion in two steps: data saliency correlation alignment, and balanced complementary fusion of the modal data; for the obtained signal feature vector and image feature vector, in the data saliency correlation alignment step one of them is transposed and cross-multiplied with the other to obtain the original correlation matrix S_origin of the vibration signal and the time-frequency domain image;
sparsification is then applied to obtain the correlation matrix S_mask: in the sparsification process, every element s_ij of the original correlation matrix that is greater than or equal to 0.5 is set to 1 and the others are set to 0, so as to highlight the correlated vectors and sparsify the uncorrelated ones;
alignment of the multi-modal data is achieved through this correlation matrix, yielding the aligned signal feature vector and the aligned image feature vector;
after the saliency correlation alignment of the multi-modal data, balanced complementary fusion of the modal data is performed on the aligned signal feature vector and image feature vector, and the contribution of each modal feature in the fusion is controlled by learned weight parameters; specifically, a weight value is calculated for the feature vector of each modality, the weight representing the importance of that modal feature to the final fusion result, modal features of high importance being given large weights and those of low importance being given small weights, so that the quality and effect of the fusion are improved,
where W_1 and W_2 denote learnable parameter matrices, λ denotes the corresponding weight coefficient, and the result is the multi-modal feature vector finally fused after alignment.
3. The method for detecting unknown faults of rolling bearings based on multi-modal feature fusion enhancement according to claim 1, wherein the signal depth encoder f_s is divided into three shallow branches of different scales while the image depth encoder f_v adopts only one shallow scale; the two encoders perform semantic feature extraction at the same scales in the deep semantic stage, both the image depth encoder f_v and the signal depth encoder f_s adopt a 1:1:3:1 structure, and the two encoders each perform multi-scale operations on the shallow information and the deep semantic information simultaneously, specifically as follows:
for the vibration signal data, three convolution kernels k_1, k_2, k_3 of different sizes extract shallow feature information at three different scales, dividing the signal features into three scales while also denoising the vibration signal, where k_1, k_2, k_3 denote the three convolution kernels of different sizes and f denotes the activation function used to obtain the shallow feature information at the three scales;
deep features are then extracted sequentially at the respective scales, and in the deep semantic stage deep feature information at different semantic scales is extracted by dilated convolutions with different dilation coefficients; finally, the three pieces of deep feature information of different scales are concatenated by a mapping function and mapped, together with the image feature information extracted by the image depth encoder, into the same feature space for adaptive multi-modal fusion;
for the time-frequency domain image, the vertical structure of the image depth encoder f_v is the same as that of the signal depth encoder f_s, and deep semantic feature extraction is carried out in the same way.
4. The rolling bearing unknown fault detection method based on multi-modal feature fusion enhancement according to claim 1, wherein the discriminant enhancement module of the pseudo-label generation module exploits the complementarity of the features of the multiple modalities to optimize the classification scores of the discriminant vectors from a multi-view perspective; the specific method is as follows:
the signal feature vector and the image feature vector are passed through the new-class classifier to obtain their discriminant vectors, and the discriminant probability distributions P_s and P_v are calculated for each of them, where C_u denotes the number of new classes and each entry of P_v (respectively P_s) is the probability that the image-modality (respectively signal-modality) discriminant vector belongs to class 1, class 2, ..., class C_u;
first, from the multi-modal, multi-view perspective, the discriminant probability relations of the samples among the categories are considered so as to improve the saliency of the discriminant vectors:
Q = Softmax(P_v · P_s)
where Q represents the class probabilities obtained after the different modalities guide each other;
the weights of the different modalities are then determined according to their information entropies, and the discriminant relation vectors of the different modalities are fused, where α and β are weight parameters computed from the information entropies of the image-modality and signal-modality discriminant vectors, H(l_i) denotes the sum of the information entropies of the image and the signal, and α + β = 1; finally, the inter-class discriminant probability relation is used as the weight on the fused discriminant vectors for optimization, obtaining the final discriminant vector whose discriminant relationships are obviously enhanced.
5. The method for detecting unknown faults of rolling bearings based on multi-modal feature fusion enhancement according to claim 4, wherein in the pseudo-label allocation module an optimal-transport algorithm is used for the case in which discriminant vectors are equal to each other: an entropy term is added to penalize the case where all discriminant vectors are equal, and the pseudo labels are encouraged to be divided uniformly over all C_u clusters; the specific method is as follows:
let P denote the matrix computed by the new-class head for a batch of B samples, and let Y denote the unknown pseudo-label matrix of the current batch; Y is obtained as the solution of
Y = argmax_{Y ∈ Γ} tr(Y^T P) + ε H(Y),
where H is the entropy function that scatters the pseudo labels, tr is the trace of a matrix, ε > 0 is a hyperparameter, and Γ is the transport polytope, defined as
Γ = { Y ∈ R_+^{C_u × B} : Y 1_B = (1/C_u) 1_{C_u}, Y^T 1_{C_u} = (1/B) 1_B },
where 1_B and 1_{C_u} denote all-ones vectors of dimension B and C_u respectively, and Y represents a C_u × B joint probability matrix; the pseudo labels thus produced are represented by the rows y_i of the resulting matrix;
the pseudo labels allocated to the unlabeled-set data are finally concatenated with the labels of the labeled set, used as the joint-training labels, and trained against the joint-training prediction output by calculating the cross-entropy loss.
CN202311395313.7A 2023-10-26 2023-10-26 Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement Pending CN117516937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311395313.7A CN117516937A (en) 2023-10-26 2023-10-26 Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311395313.7A CN117516937A (en) 2023-10-26 2023-10-26 Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement

Publications (1)

Publication Number Publication Date
CN117516937A true CN117516937A (en) 2024-02-06

Family

ID=89765335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311395313.7A Pending CN117516937A (en) 2023-10-26 2023-10-26 Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement

Country Status (1)

Country Link
CN (1) CN117516937A (en)

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
Di Mattia et al. A survey on gans for anomaly detection
CN109639739B (en) Abnormal flow detection method based on automatic encoder network
CN111126482B (en) Remote sensing image automatic classification method based on multi-classifier cascade model
CN113076994B (en) Open-set domain self-adaptive image classification method and system
CN110633708A (en) Deep network significance detection method based on global model and local optimization
CN108520215B (en) Single-sample face recognition method based on multi-scale joint feature encoder
CN111428071A (en) Zero-sample cross-modal retrieval method based on multi-modal feature synthesis
CN112633382A (en) Mutual-neighbor-based few-sample image classification method and system
CN111325264A (en) Multi-label data classification method based on entropy
Li et al. A review of deep learning methods for pixel-level crack detection
CN111639697B (en) Hyperspectral image classification method based on non-repeated sampling and prototype network
CN112784921A (en) Task attention guided small sample image complementary learning classification algorithm
CN114863091A (en) Target detection training method based on pseudo label
CN112116950A (en) Protein folding identification method based on depth measurement learning
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN110647897B (en) Zero sample image classification and identification method based on multi-part attention mechanism
CN113657473A (en) Web service classification method based on transfer learning
CN113158878B (en) Heterogeneous migration fault diagnosis method, system and model based on subspace
CN113255701B (en) Small sample learning method and system based on absolute-relative learning framework
CN113076490B (en) Case-related microblog object-level emotion classification method based on mixed node graph
CN117516937A (en) Rolling bearing unknown fault detection method based on multi-mode feature fusion enhancement
CN115098681A (en) Open service intention detection method based on supervised contrast learning
CN114283289A (en) Image classification method based on multi-model fusion
CN111931788A (en) Image feature extraction method based on complex value

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination