CN110414383A

CN110414383A - Convolutional neural network adversarial transfer learning method based on Wasserstein distance and its application

Info

Publication number: CN110414383A
Application number: CN201910624662.9A
Authority: CN
Inventors: 袁烨; 周倍同; 程骋; 李星毅; 马贵君
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2019-07-11
Filing date: 2019-07-11
Publication date: 2019-11-05

Abstract

The present invention relates to a convolutional neural network adversarial transfer learning method based on Wasserstein distance and its application. The method includes: using the convolutional neural network to be transferred to obtain the source-domain feature set and source-domain fault judgment set of a labeled source-domain sample set, and the target feature set of a target-domain sample set; and, with the objectives of maximizing the Wasserstein distance between the source-domain feature set and the target feature set and of minimizing the sum of the Wasserstein distance and the judgment loss value of the source-domain fault judgment set, carrying out adversarial transfer learning of the convolutional neural network based on a convergence criterion. The invention introduces the Wasserstein distance into the transfer learning of convolutional neural networks. Maximizing the Wasserstein distance improves the sensitivity with which the features extracted from the two sample sets are distinguished; minimizing the sum of the Wasserstein distance and the loss value of the source-domain fault judgment set then improves the judgment accuracy of the convolutional neural network. While ensuring fault diagnosis capability, the method places low requirements on sample data and network structure, is applicable to transfer between multiple working conditions, and has strong practical applicability.

Description

Convolutional neural network adversarial transfer learning method based on Wasserstein distance and its application

Technical field

The invention belongs to the field of industrial process fault diagnosis, and more particularly relates to a convolutional neural network adversarial transfer learning method based on Wasserstein distance and its application.

Background art

Fault diagnosis aims to isolate faults in a system by monitoring and analyzing machine state using acquired measurement data and other information. Doing this requires highly skilled, experienced experts, which increases the demand for using artificial intelligence techniques to make fault diagnosis decisions. Deploying a real-time fault diagnosis framework allows maintenance teams to act in advance to replace or repair affected components, thereby improving production efficiency and ensuring safe operation. Accurate diagnosis of bearing faults is therefore of critical significance to the reliability and safety of mechanical manufacturing systems.

Many advanced signal processing and machine learning techniques have been used for fault diagnosis. In the past few years, deep learning models (for example, deep belief networks, sparse autoencoders, and especially convolutional neural networks) have shown better fitting and learning ability in fault diagnosis tasks than rule-based and model-based methods. The aim is to use knowledge from a related source domain (with sufficient labeled data) to learn in a target domain (with insufficient or no labeled data), which saves much of the time and computational cost of rebuilding a new fault diagnosis model from scratch.

However, the above deep learning models still face two difficulties. 1) Most methods rely on the independent and identically distributed assumption, that is, the sample sets of the source-domain and target-domain tasks must have the same distribution. The adaptability of a pre-trained network is therefore limited when facing a new diagnostic task, where the different operating conditions and physical characteristics of the new task may cause a distribution difference between the new sample set (target sample set) and the original sample set (source sample set). As a result, for a new fault diagnosis task the deep learning model usually has to be trained from scratch, which wastes computing resources and training time. 2) Insufficient labeled or unlabeled data in the target domain is another common problem. In real industrial settings, it is extremely difficult to collect enough typical samples for a new diagnostic task to build a large-scale, high-quality data set for training a network, because it is hard to install enough sensors in enclosed equipment and because labeling industrial data usually requires expensive manual work. The challenge of domain adaptation is therefore that no labeled data, or only a small amount of labeled data, can be collected in the target domain.

In response to these challenges, some improvements have been made to deep transfer learning algorithms. The first kind is the transfer network based on acquired data, which mainly assigns appropriate weights to the data collected in the source domain and transfers selected source-domain data to the target domain as supplementary data; this transfer learning method does not depend on the model but places high requirements on data similarity. The second kind is the transfer network based on network structure, which transfers part of a network pre-trained in the source domain, including its network structure and connection parameters, and converts it into part of the deep neural network used in the target domain; however, this method works well only for certain network structures, such as LeNet, AlexNet, VGG, Inception and ResNet. The third kind is the transfer network based on domain fusion, of which maximum mean discrepancy (MMD) is the most common; however, the computational cost of MMD increases biquadratically with the number of samples, which limits the applicability of MMD in many practical applications with large data sets.

Therefore, developing a deep transfer learning algorithm with strong industrial applicability and high fault diagnosis capability is a technical problem urgently to be solved in the current field of industrial process fault diagnosis.

Summary of the invention

The present invention provides a convolutional neural network adversarial transfer learning method based on Wasserstein distance and its application, to solve the technical problem that existing deep transfer learning methods, while ensuring industrial process fault diagnosis capability, place high requirements on actual sample data and/or neural network structure and are therefore difficult to apply to actual industrial processes.

The technical solution of the present invention to the above technical problem is as follows: a convolutional neural network adversarial transfer learning method based on Wasserstein distance, comprising:

Step 1: determining a labeled source-domain sample set and a target-domain sample set from a labeled source-domain data set and a target-domain data set;

Step 2: using the convolutional neural network to be transferred, obtaining the source-domain feature set and source-domain fault judgment set of the labeled source-domain sample set, and the target feature set of the target-domain sample set;

Step 3: with the objectives of maximizing the Wasserstein distance between the source-domain feature set and the target feature set and of minimizing the sum of the Wasserstein distance and the judgment loss value of the source-domain fault judgment set, adjusting the parameters of the convolutional neural network and, based on a convergence criterion, either repeating from step 1 or completing the adversarial transfer learning of the convolutional neural network.

The beneficial effects of the present invention are as follows. The invention introduces the Wasserstein distance into the transfer learning of convolutional neural networks. Maximizing the Wasserstein distance improves the sensitivity with which the features extracted from the two sample sets are distinguished, and minimizing the sum of the Wasserstein distance and the judgment loss value of the source-domain fault judgment set then improves the judgment accuracy of the convolutional neural network. The parameters of the convolutional neural network are optimized against these objectives; during training the two objectives oppose each other, which realizes deep adversarial transfer learning. The convolutional neural network trained by the invention can therefore minimize the distribution difference between the source domain and the target domain, and unsupervised transfer learning using unlabeled target-domain data is sufficient to obtain fault diagnosis capability with high accuracy. To a certain extent, this solves the prior-art problems of wasted computing resources and training time caused by rebuilding a deep learning model from scratch for each new fault diagnosis task, and of lacking sufficient labeled data in the target domain. While ensuring industrial process fault diagnosis capability, the method places no particularly high requirements on sample data or network structure and is suitable for fault diagnosis of actual industrial processes.

On the basis of the above technical solution, the present invention can be further improved as follows.

Further, the Wasserstein distance is obtained as follows:

Using a domain similarity evaluation neural network, the source-domain feature set is mapped to a source-domain real number set and the target feature set is mapped to a target real number set, and the mean of the target real number set is subtracted from the mean of the source-domain real number set.

A further beneficial effect of the present invention is as follows: the invention introduces a domain similarity evaluation neural network that maps the source-domain judgment set and the target judgment set to real numbers, and the Wasserstein distance is calculated from these real numbers. Training the domain similarity evaluation neural network with the objective of maximizing the Wasserstein distance improves its sensitivity in distinguishing the features extracted from the two sample sets. Calculating the Wasserstein distance from the means of the real numbers is simple and highly reliable.

Further, step 3 includes:

Step 3.1: with the objective of increasing the Wasserstein distance between the source-domain feature set and the target feature set, optimizing the parameters of the domain similarity evaluation neural network to obtain the new domain similarity evaluation neural network corresponding to a preset number of iterations;

Step 3.2: based on the labeled source-domain sample set, the target-domain sample set and the new domain similarity evaluation neural network, with the objective of reducing the sum of the Wasserstein distance and the judgment loss value of the convolutional neural network, optimizing the parameters of the convolutional neural network to obtain the new convolutional neural network corresponding to a preset number of iterations;

Step 3.3: based on the convergence criteria of the domain similarity evaluation neural network and the convolutional neural network, either stopping training and completing the adversarial transfer learning of the convolutional neural network, or repeating from step 1.

A further beneficial effect of the present invention is as follows: the domain similarity evaluation neural network is trained with the objective of maximizing the Wasserstein distance, which improves its sensitivity in distinguishing the features extracted from the two sample sets. The parameters of the domain similarity evaluation neural network are then fixed and the parameters of the convolutional neural network are adjusted: on the one hand, the judgment loss value of the source-domain sample set is obtained from the convolutional neural network; on the other hand, the source-domain feature set and target-domain feature set extracted by the convolutional neural network are fed to the trained domain similarity evaluation neural network to obtain the Wasserstein distance. The fault judgment convolutional neural network is trained with the objective of minimizing the sum of the Wasserstein distance and the judgment loss value. Training thus alternates, first the domain similarity evaluation neural network and then the convolutional neural network, over repeated iterations until both networks converge, which completes the training of the convolutional neural network and gives it high fault judgment capability.

Further, in step 3.1, the Wasserstein distance is the gradient-penalized difference between the mean of the source-domain real number set and the mean of the target real number set;

In step 3.2, the Wasserstein distance is the difference between the mean of the source-domain real number set and the mean of the target real number set.

A further beneficial effect of the present invention is as follows: the invention introduces a gradient penalty to constrain the domain similarity evaluation neural network, which prevents its parameters from becoming overly complex, reduces computational complexity, and improves the convergence speed of the domain similarity evaluation neural network and the convolutional neural network. In addition, the gradient penalty is ignored when optimizing the parameters of the convolutional neural network, so that it does not affect the representation learning process and does not impair the fault diagnosis capability of the convolutional neural network.

Further, the Wasserstein distance is expressed as:

The mean of the source-domain real number subset minus the mean of the target real number subset;

Where the source-domain real number subset is obtained from the source-domain real number set under a Lipschitz constraint, and the target real number subset is obtained from the target real number set under the Lipschitz constraint.

A further beneficial effect of the present invention is as follows: the real number sets required to calculate the Wasserstein distance are obtained under the Lipschitz constraint, which reduces the data dimension and, while ensuring the fault diagnosis capability of the convolutional neural network, improves the convergence speed of the domain similarity evaluation neural network and the convolutional neural network.

Further, the Lipschitz constraint is specifically a first-order Lipschitz (1-Lipschitz) constraint.

A further beneficial effect of the present invention is as follows: the Wasserstein distance is calculated under the first-order Lipschitz constraint to guarantee the continuity and differentiability of the convolutional neural network, which, while ensuring the fault diagnosis capability of the convolutional neural network, improves the convergence speed of the domain similarity evaluation neural network and the convolutional neural network.

Further, when the target-domain sample set includes some labeled samples, step 2 is:

Using the convolutional neural network to be transferred, obtaining the source-domain feature set and source-domain fault judgment set of the labeled source-domain sample set, and the target feature set and target fault judgment set of the target-domain sample set;

Step 3 is:

With the objectives of maximizing the Wasserstein distance between the source-domain feature set and the target feature set and of minimizing the sum of the Wasserstein distance, the judgment loss value of the source-domain fault judgment set and the judgment loss value of the target fault judgment set, adjusting the parameters of the convolutional neural network and, based on a convergence criterion, either repeating from step 1 or completing the adversarial transfer learning of the convolutional neural network.

A further beneficial effect of the present invention is as follows: when the deep adversarial transfer learning method faces transfer between related tasks that are not similar enough (for example, transfer learning between different sensor locations), adding only a small number of labeled target-domain samples greatly improves the accuracy of transfer learning compared with the unsupervised case that uses a large number of unlabeled samples. The invention is therefore applicable not only when the target-domain sample set is unlabeled but also when it is partially labeled, which makes it flexible to apply. Using a partially labeled target-domain sample set can further improve the accuracy with which the convolutional neural network obtained by adversarial transfer training judges industrial process faults, which makes the method highly practical.

The present invention also provides an industrial process fault diagnosis convolutional neural network, obtained by training with any convolutional neural network adversarial transfer learning method based on Wasserstein distance as described above.

The beneficial effects of the present invention are as follows: based on the correlation between tasks, and using any of the above convolutional neural network adversarial transfer learning methods based on Wasserstein distance, the convolutional neural network corresponding to the source domain is transferred by deep adversarial transfer learning into a convolutional neural network of higher accuracy that is suitable for both the source domain and the target domain. To a certain extent, this solves the prior-art problems of wasted computing resources and training time caused by rebuilding a deep learning model from scratch for each new fault diagnosis task, and of lacking sufficient sample data in the target domain.

The present invention also provides an industrial process fault diagnosis method which, based on any industrial process fault diagnosis convolutional neural network as described above, obtains the industrial process fault judgment result corresponding to a new sample when a new sample of the target domain described in any of the above convolutional neural network adversarial transfer learning methods based on Wasserstein distance is received.

The beneficial effects of the present invention are as follows: carrying out industrial process fault diagnosis with a convolutional neural network trained by any of the above convolutional neural network adversarial transfer learning methods based on Wasserstein distance yields higher fault diagnosis accuracy while ensuring safety and efficiency.

The present invention also provides a storage medium storing instructions which, when read by a computer, cause the computer to execute any of the above convolutional neural network adversarial transfer learning methods based on Wasserstein distance and/or any industrial process fault diagnosis method as described above.

Brief description of the drawings

Fig. 1 is a schematic flow diagram of a convolutional neural network adversarial transfer learning method based on Wasserstein distance provided by an embodiment of the present invention;

Fig. 2 is a flow chart of the convolutional neural network adversarial transfer learning method based on Wasserstein distance provided by an embodiment of the present invention;

Fig. 3 is a schematic visualization of the output of the convolutional neural network for transfer task US(C) → US(A) provided by an embodiment of the present invention;

Fig. 4 is a schematic visualization of the output of the convolutional neural network for transfer task US(E) → US(F) provided by an embodiment of the present invention;

Fig. 5 shows curves of diagnostic accuracy versus sample size for task (a) US(E) → US(F) and task (b) S(E) → S(F) provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with each other as long as they do not conflict.

Embodiment one

A convolutional neural network adversarial transfer learning method 100 based on Wasserstein distance, as shown in Fig. 1, includes:

Step 110: determining a labeled source-domain sample set and a target-domain sample set from a labeled source-domain data set and a target-domain data set;

Step 120: using the convolutional neural network to be transferred, obtaining the source-domain feature set and source-domain fault judgment set of the labeled source-domain sample set, and the target feature set of the target-domain sample set;

Step 130: with the objectives of maximizing the Wasserstein distance between the source-domain feature set and the target feature set and of minimizing the sum of the Wasserstein distance and the judgment loss value of the source-domain fault judgment set, adjusting the parameters of the convolutional neural network and, based on a convergence criterion, either repeating from step 110 or completing the adversarial transfer learning of the convolutional neural network.

It should be noted that, before the method is carried out, a convolutional neural network must first be trained on the source-domain data set; this network is the convolutional neural network to be transferred.

Specifically, the convolutional neural network model is first pre-trained using the labeled source-domain data set $X^s$. The convolutional layer contains a filter $w$ of size $k$ and a bias $b$, which are used to compute features. An output feature $v_i$ is obtained through the filter $w$ and a nonlinear activation function $\Gamma$, expressed as $v_i = \Gamma(w * u_j + b)$, where $u_j$ is the input data of the $j$-th sub-vector of the source-domain data set $X^s$ and $*$ denotes the convolution operation. The rectified linear unit (ReLU) is used as the nonlinear activation function to reduce the risk of vanishing gradients, which would impair learning during optimization. The feature map is therefore defined as $v = [v_1, v_2, \ldots, v_L]$, where $L = (pN - s)/I_{cv} + 1$ is the number of features, $p$ is the padding size, $N$ is the input dimension, $s$ is the filter size, and $I_{cv}$ is the stride of the convolution operation.
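As an illustrative, non-limiting sketch of the convolutional layer described above, the following Python code slides a filter over a one-dimensional input and applies ReLU to each result. The function names, the sample signal and the filter values are hypothetical and chosen only for illustration; the sketch assumes no padding, so the number of output features is `(N - k) // stride + 1`, and it computes the sliding dot product without flipping the filter, as is common in deep learning practice.

```python
def relu(x):
    """Rectified linear unit: returns x when positive, otherwise 0."""
    return x if x > 0.0 else 0.0


def conv1d_relu(signal, w, b, stride=1):
    """Slide filter w over signal and apply ReLU to each w . u_j + b.

    Assumes no padding; the filter is not flipped (cross-correlation).
    """
    k = len(w)
    features = []
    for start in range(0, len(signal) - k + 1, stride):
        u_j = signal[start:start + k]  # j-th sub-vector of the input
        pre_activation = sum(wi * ui for wi, ui in zip(w, u_j)) + b
        features.append(relu(pre_activation))
    return features


# Hypothetical input and filter, for illustration only.
signal = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]
features = conv1d_relu(signal, w=[1.0, 0.0, -1.0], b=0.0, stride=1)
```

With an input of length 8, a filter of size 3 and a stride of 1, the sketch yields 6 output features.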

Then, a max-pooling layer is applied to the feature map to extract the maximum feature value within each stride range. The max-pooling filter size is $\beta$ and the stride is $I_{pl}$.
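The max-pooling operation can be sketched as follows. This is a minimal illustration under assumptions: the function name `max_pool1d` and the sample feature map are hypothetical, `pool_size` plays the role of the pooling filter size and `stride` the role of the pooling stride, and no padding is applied.

```python
def max_pool1d(features, pool_size, stride):
    """Return the maximum of each pooling window of length pool_size."""
    pooled = []
    for start in range(0, len(features) - pool_size + 1, stride):
        pooled.append(max(features[start:start + pool_size]))
    return pooled


# Hypothetical feature map, for illustration only.
feature_map = [0.0, 0.0, 2.0, 2.0, 2.0, 0.0]
pooled = max_pool1d(feature_map, pool_size=2, stride=2)
```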

By alternately stacking multiple convolutional layers and max-pooling layers (with variable filter sizes), a multilayer structure for describing features is constructed. The output features of the multilayer structure are flattened and passed to fully connected layers for classification, producing a final output that is a probability distribution over the labels. The convolutional neural network pre-trained on the source domain uses the Softmax function to obtain the final classification result. To measure the difference between the predicted labels and the true labels in the source domain, the loss $l_c$ is computed using cross entropy.
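A minimal sketch of the Softmax output and a cross-entropy loss is given below. It assumes the common form of the loss, namely the mean negative log-probability assigned to the true class over a mini-batch; the exact expression used in the embodiment is not reproduced here, and the function names and sample values are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(batch_logits, labels):
    """Mean negative log-probability of the true class over a batch."""
    losses = []
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


# Hypothetical scores for two samples and three fault classes.
batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
loss = cross_entropy_loss(batch_logits, labels)
```

The loss is small when the network assigns high probability to the true fault class and grows as that probability falls.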

The transferable features of the target domain, which has unlabeled data, are obtained directly by the feature extractor of the convolutional neural network trained above (that is, the network structure before the fully connected layers, comprising the multilayer structure of alternately stacked convolutional and max-pooling layers).

The next step is to measure the difference between the feature distributions of the source data set and the target data set. Specifically, adversarial training with the Wasserstein distance can be carried out in the common latent space between the two feature distributions in order to learn to extract invariant features. The feature extractor of the pre-trained convolutional neural network model is used to learn the features from the two domains. For $n < N^s, N^t$, given two mini-batches of instances $\{x_i^s\}_{i=1}^{n}$ and $\{x_i^t\}_{i=1}^{n}$ from $X^s$ and $X^t$, both are passed through a feature extractor $r_f$ with parameters $\theta_f$, which directly generates the source-domain features $h^s = r_f(x^s)$ and the target-domain features $h^t = r_f(x^t)$. Let $\mathbb{P}_{h^s}$ and $\mathbb{P}_{h^t}$ be the distributions of $h^s$ and $h^t$, respectively. Here $(\cdot)^s$ and $(\cdot)^t$ denote source-domain and target-domain information, respectively.

The purpose of domain adaptation by Wasserstein distance is to optimize the parameters $\theta_f$ to reduce the distance between the distributions $\mathbb{P}_{h^s}$ and $\mathbb{P}_{h^t}$, that is, to learn features that are invariant across the two domains.

This embodiment optimizes the model by backpropagation. The Wasserstein distance is introduced into the transfer learning of convolutional neural networks. Maximizing the Wasserstein distance improves the sensitivity with which the features extracted from the two sample sets are distinguished, and minimizing the sum of the Wasserstein distance and the judgment loss value of the source-domain fault judgment set then improves the judgment accuracy of the convolutional neural network. The parameters of the convolutional neural network are changed according to these objectives; during training the two objectives oppose each other, which realizes deep adversarial transfer learning. The convolutional neural network trained in this embodiment can therefore minimize the distribution difference between the source domain and the target domain, and unsupervised transfer learning using unlabeled target-domain data is sufficient to obtain fault diagnosis capability with high accuracy. To a certain extent, this solves the prior-art problems of wasted computing resources and training time caused by rebuilding a deep learning model from scratch for each new fault diagnosis task, and of lacking sufficient sample data in the target domain.

Preferably, the Wasserstein distance is obtained as follows:

Using a domain similarity evaluation neural network, the source-domain feature set is mapped to a source-domain real number set and the target feature set is mapped to a target real number set, and the mean of the target real number set is subtracted from the mean of the source-domain real number set.

This embodiment introduces a domain similarity evaluation neural network that maps the source-domain judgment set and the target judgment set to real numbers, and the Wasserstein distance is calculated from these real numbers. Training the domain similarity evaluation neural network with the objective of maximizing the Wasserstein distance improves its sensitivity in distinguishing the features extracted from the two sample sets. Calculating the Wasserstein distance from the means of the real numbers is simple and highly reliable.
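The mean-difference calculation described above can be sketched as follows. This is an illustration only: the stand-in `linear_critic`, its weights and the sample feature vectors are hypothetical, and in the embodiment the mapping to real numbers is performed by a trained domain similarity evaluation neural network rather than a fixed linear function.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def empirical_wasserstein(critic, source_features, target_features):
    """Mean critic output on source features minus mean on target features."""
    source_scores = [critic(h) for h in source_features]
    target_scores = [critic(h) for h in target_features]
    return mean(source_scores) - mean(target_scores)


def linear_critic(h, weights=(0.6, 0.8)):
    """Hypothetical stand-in critic: a fixed linear map to a real number."""
    return sum(w * x for w, x in zip(weights, h))


# Hypothetical two-dimensional features, for illustration only.
source_features = [(1.0, 2.0), (3.0, 2.0)]
target_features = [(0.0, 0.0), (2.0, 0.0)]
distance = empirical_wasserstein(linear_critic, source_features, target_features)
```

When the two feature sets coincide the estimate is zero, which is the situation the feature extractor is trained to approach.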

Preferably, step 130 includes:

Step 131: with the objective of increasing the Wasserstein distance between the source-domain feature set and the target feature set, adjusting the parameters of the domain similarity evaluation neural network to obtain the new domain similarity evaluation neural network corresponding to a preset number of iterations;

Step 132: based on the labeled source-domain sample set, the target-domain sample set and the new domain similarity evaluation neural network, with the objective of reducing the sum of the Wasserstein distance and the judgment loss value of the convolutional neural network, changing the parameters of the convolutional neural network to obtain the new convolutional neural network corresponding to a preset number of iterations;

Step 133: based on the convergence criteria of the domain similarity evaluation neural network and the convolutional neural network, either stopping training and completing the adversarial transfer learning of the convolutional neural network, or repeating from step 110.

The domain similarity evaluation neural network is trained with the objective of maximizing the Wasserstein distance, which improves its sensitivity in distinguishing the features extracted from the two sample sets. The parameters of the domain similarity evaluation neural network are then fixed and the parameters of the convolutional neural network are adjusted: on the one hand, the judgment loss value of the source-domain sample set is obtained from the convolutional neural network; on the other hand, the source-domain feature set and target-domain feature set extracted by the convolutional neural network are fed to the trained domain similarity evaluation neural network to obtain the Wasserstein distance. The fault judgment convolutional neural network is trained with the objective of minimizing the sum of the Wasserstein distance and the judgment loss value. Training thus alternates, first the domain similarity evaluation neural network and then the convolutional neural network, over repeated iterations until both networks converge, which completes the training of the convolutional neural network and gives it high fault judgment capability.
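The alternating scheme of steps 131 to 133 can be sketched as a control-flow skeleton. This is not a full training implementation: the three callables, the default iteration counts and the stub functions in the usage example are hypothetical placeholders for the actual network updates and convergence criterion.

```python
def adversarial_transfer(train_critic, train_extractor, has_converged,
                         critic_steps=5, extractor_steps=1, max_rounds=100):
    """Alternate critic and CNN updates until the convergence test passes.

    Returns the number of rounds that were run.
    """
    for round_index in range(1, max_rounds + 1):
        for _ in range(critic_steps):
            train_critic()      # step 131: increase the Wasserstein estimate
        for _ in range(extractor_steps):
            train_extractor()   # step 132: critic fixed, reduce distance + loss
        if has_converged():     # step 133: stop, otherwise repeat with new batches
            return round_index
    return max_rounds


# Usage with hypothetical stubs that only count how often they are called.
calls = {"critic": 0, "extractor": 0}


def fake_critic_step():
    calls["critic"] += 1


def fake_extractor_step():
    calls["extractor"] += 1


def fake_converged():
    return calls["extractor"] >= 3


rounds = adversarial_transfer(fake_critic_step, fake_extractor_step, fake_converged)
```

Running several critic updates per CNN update is one common choice in adversarial training; the preset numbers of iterations in steps 131 and 132 correspond to `critic_steps` and `extractor_steps` in this sketch.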

Preferably, the Wasserstein distance is expressed as: the mean of the source-domain real number subset minus the mean of the target real number subset; where the source-domain real number subset is obtained from the source-domain real number set under a Lipschitz constraint, and the target real number subset is obtained from the target real number set under the Lipschitz constraint.

The real number sets required to calculate the Wasserstein distance are obtained under the Lipschitz constraint, which reduces the data dimension and, while ensuring the fault diagnosis capability of the convolutional neural network, improves the convergence speed of the domain similarity evaluation neural network and the convolutional neural network.

Preferably, the Lipschitz constraint is specifically a first-order Lipschitz (1-Lipschitz) constraint.

Calculating the Wasserstein distance under the first-order Lipschitz constraint reduces computational complexity and, while ensuring the fault diagnosis capability of the convolutional neural network, improves the convergence speed of the domain similarity evaluation neural network and the convolutional neural network.

Preferably, the Wasserstein distance is a gradient-penalized Wasserstein distance.

A gradient penalty is introduced to constrain the neural network, which prevents its parameters from becoming overly complex, reduces computational complexity, and improves the convergence speed of the domain similarity evaluation neural network and the convolutional neural network.

Specifically, as shown in Fig. 2, step 130 introduces a neural network that evaluates domain similarity (denoted Domain Critic, abbreviated DC) to learn a mapping $f_c$ with parameters $\theta_c$, which maps the features of the source domain and the target domain to real numbers.

The Wasserstein distance can be computed as

$$W_1(\mathbb{P}_{h^s}, \mathbb{P}_{h^t}) = \sup_{\|f_c\|_L \le 1} \mathbb{E}_{\mathbb{P}_{h^s}}[f_c(h)] - \mathbb{E}_{\mathbb{P}_{h^t}}[f_c(h)],$$

where the supremum is taken over all 1-Lipschitz functions $f_c$. The empirical Wasserstein distance can be approximated as

$$l_{wd}(X^s, X^t) = \frac{1}{n^s}\sum_{x^s \in X^s} f_c(h^s) - \frac{1}{n^t}\sum_{x^t \in X^t} f_c(h^t),$$

where $l_{wd}$ denotes the DC loss between the source domain data $X^s$ and the target domain data $X^t$ (named the empirical Wasserstein-1 distance), $h^s$ and $h^t$ are the features extracted from $x^s$ and $x^t$, and $n^s$ and $n^t$ are the numbers of source and target samples. The maximum of $l_{wd}$ must now be found subject to the Lipschitz constraint. Practice has shown that the DC parameters $\theta_c$ can be trained in combination with the gradient penalty

$$l_{grad}(h) = \left(\|\nabla_h f_c(h)\|_2 - 1\right)^2,$$

where the feature representations $h$ consist of the generated source and target domain features (i.e. $h^s$ and $h^t$) and points $h^r$ selected at random on the straight lines between pairs of $h^s$ and $h^t$. Since the Wasserstein-1 distance is continuous and differentiable almost everywhere, the DC is trained by solving the following optimization problem:

$$\max_{\theta_c}\left\{ l_{wd} - \rho\, l_{grad} \right\},$$

where $\rho$ is the balancing coefficient.
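The DC objective described in this paragraph (the empirical Wasserstein-1 distance minus ρ times the gradient penalty) can be illustrated with a small dependency-free sketch. Everything concrete in it is an assumption made for illustration: the critic is a toy one-hidden-layer tanh network rather than the DC of the embodiment, its gradient is written out analytically instead of using automatic differentiation, and all function names are hypothetical.

```python
import math
import random

def _hidden(h, W, b):
    """tanh(W h + b) for a feature vector h (plain Python lists)."""
    return [math.tanh(sum(w * x for w, x in zip(row, h)) + bi) for row, bi in zip(W, b)]

def critic(h, W, b, v):
    """Toy domain critic f_c(h) = v . tanh(W h + b); maps a feature vector to a real number."""
    return sum(vi * zi for vi, zi in zip(v, _hidden(h, W, b)))

def critic_grad(h, W, b, v):
    """Analytic gradient of the toy critic with respect to its input h."""
    scale = [vi * (1.0 - zi * zi) for vi, zi in zip(v, _hidden(h, W, b))]
    return [sum(scale[i] * W[i][j] for i in range(len(W))) for j in range(len(h))]

def empirical_wd(hs, ht, params):
    """l_wd: mean critic output on source features minus mean on target features."""
    mean_s = sum(critic(h, *params) for h in hs) / len(hs)
    mean_t = sum(critic(h, *params) for h in ht) / len(ht)
    return mean_s - mean_t

def gradient_penalty(hs, ht, params, rng):
    """Mean of (||grad f_c(h)|| - 1)^2 over h^s, h^t and random points h^r between pairs."""
    points = list(hs) + list(ht)
    for s, t in zip(hs, ht):  # pairs are formed in order; extra samples are ignored
        eps = rng.random()
        points.append([eps * a + (1.0 - eps) * c for a, c in zip(s, t)])
    total = 0.0
    for h in points:
        norm = math.sqrt(sum(g * g for g in critic_grad(h, *params)))
        total += (norm - 1.0) ** 2
    return total / len(points)

def critic_objective(hs, ht, params, rho, rng):
    """Quantity the DC maximizes: l_wd - rho * l_grad."""
    return empirical_wd(hs, ht, params) - rho * gradient_penalty(hs, ht, params, rng)
```

A training loop would take gradient ascent steps on `critic_objective` with respect to the critic parameters `(W, b, v)`; that update step is omitted here.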

The above uses the unlabeled target domain sample set and is a form of unsupervised feature learning for domain adaptation, but the features learned in the two domains may still not be sufficiently discriminative. The final goal of this embodiment is to develop an accurate deep transfer learning classifier for the target domain, which requires incorporating supervised learning on the labeled source domain data (and target domain data, if available) into the domain-invariant feature learning problem. A discriminator (with two fully connected layers) is then used to further reduce the distance between the source domain and target domain feature distributions. In this step, the DC parameters θ_c are those obtained in the above training, and the parameters θ_f are updated to optimize the minimization operator.

The final objective function can be expressed in terms of the cross-entropy loss $l_c$ of the discriminator and the above empirical Wasserstein distance $l_{wd}$ associated with the domain discrepancy, namely:

$$\min_{\theta_f, \theta_d}\left\{ l_c + \lambda \max_{\theta_c}\left[ l_{wd} - \rho\, l_{grad} \right] \right\},$$

where $\theta_d$ denotes the parameters of the discriminator and $\lambda$ is the hyperparameter that determines the degree of domain confusion. When optimizing the above minimization operator, the gradient penalty $l_{grad}$ is ignored (by setting $\rho$ equal to 0), because it should not affect the representation learning process.
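As a worked illustration of the minimization step, the sketch below evaluates the quantity that the feature extractor and discriminator minimize once the gradient penalty has been dropped, i.e. l_c + λ·l_wd. It is a minimal sketch under assumptions: the function names are hypothetical, the cross-entropy is computed from raw class scores (logits), and l_wd is passed in as an already computed number.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample: -log softmax(logits)[label], computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def classification_loss(batch_logits, labels):
    """l_c: mean cross-entropy over a labeled batch."""
    losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

def extractor_objective(batch_logits, labels, l_wd, lam):
    """l_c + lambda * l_wd, with the gradient penalty omitted (rho = 0)."""
    return classification_loss(batch_logits, labels) + lam * l_wd
```

For example, with four equally scored classes the cross-entropy term equals log 4, so a λ of 0.1 and an l_wd of 2.0 give log 4 + 0.2.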

Preferably, when the target domain sample set includes some labeled samples, step 120 is:

Using the convolutional neural network to be transferred, obtaining the source domain feature set and the source domain fault judgment set of the source domain labeled sample set, and the target feature set and the target fault judgment set of the target domain sample set. Step 130 is: with the objectives of maximizing the Wasserstein distance between the source domain feature set and the target feature set, and of minimizing the sum of the Wasserstein distance, the judgment loss value of the source domain fault judgment set and the judgment loss value of the target fault judgment set, optimizing the parameters of the convolutional neural network and, based on the convergence criterion, either repeating step 110 or completing the adversarial transfer learning of the convolutional neural network.

With the deep adversarial transfer learning method of this embodiment, when facing transfer between tasks that are related but not similar enough (for example, transfer learning between different sensor positions), merely adding a small number of labeled target domain samples greatly improves the accuracy of transfer learning compared with the unsupervised case that uses a large number of unlabeled samples. The present invention is therefore applicable not only when the target domain sample set is unlabeled but also when it is partly labeled, and can be applied flexibly. Moreover, when a partly labeled target domain sample set is used, the diagnostic accuracy for industrial process faults of the convolutional neural network obtained by adversarial transfer training can be further improved, which gives the method strong practicability.

Embodiment two

An industrial process fault diagnosis convolutional neural network, obtained by training with any of the Wasserstein-distance-based convolutional neural network adversarial transfer learning methods described in Embodiment one above.

Based on the correlation between tasks, training with any of the above Wasserstein-distance-based convolutional neural network adversarial transfer learning methods transfers the convolutional neural network corresponding to the source domain, through deep adversarial transfer learning, into a convolutional neural network that is applicable to both the source domain and the target domain and has high accuracy. This alleviates, to a certain extent, the technical problems of the prior art that a deep learning model must be rebuilt from scratch for each new fault diagnosis task, wasting computing resources and training time, and that the target domain lacks sufficient sample data.

Embodiment three

An industrial process fault diagnosis method, based on any industrial process fault diagnosis convolutional neural network described in Embodiment two above: when a new sample of the target domain referred to in any of the Wasserstein-distance-based convolutional neural network adversarial transfer learning methods described in Embodiment one above is received, the industrial process fault judgment result corresponding to the new sample is obtained.

Performing industrial process fault diagnosis with a convolutional neural network trained by any of the above Wasserstein-distance-based convolutional neural network adversarial transfer learning methods gives higher fault diagnosis accuracy while ensuring safety and efficiency.

To verify the effectiveness of the above deep adversarial transfer learning for fault diagnosis problems, this embodiment uses the benchmark bearing fault data set obtained from the Case Western Reserve University Bearing Data Center. Four types of bearing condition are monitored (normal, inner race fault, outer race fault and roller fault). The digital signals are sampled at 12 kHz, and the drive-end bearing fault data are also acquired at a sampling rate of 48 kHz. Each fault type is run with different fault severities (fault diameters of 0.007 inch, 0.014 inch and 0.021 inch). Each type of faulty bearing is fitted to a test motor, which runs at four different motor speeds (1797 rpm, 1772 rpm, 1750 rpm and 1730 rpm). The vibration signal of each experiment is recorded for fault diagnosis.

Data preprocessing: simple data preprocessing techniques are applied to the bearing data set: 1) the signals are divided into samples so that each sample in the source domain and target domain data sets has 2000 measurement points; 2) the frequency-domain power spectrum of each sample is computed using the fast Fourier transform (FFT); 3) the left half of the symmetric power spectrum computed by the FFT is used as the input of the deep transfer learning model. Each input sample therefore has 1000 measurement values.
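The three preprocessing steps can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the samples are assumed to be non-overlapping, the power spectrum is taken as the unnormalized squared magnitude, and a direct discrete Fourier transform is used in place of an FFT so that the example needs no external library (it yields the same values but is much slower for 2000-point samples).

```python
import cmath
import math

def split_into_samples(signal, length=2000):
    """Cut a 1-D signal into non-overlapping samples of `length` points, dropping the remainder."""
    count = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(count)]

def half_power_spectrum(sample):
    """Return |X[k]|^2 for k = 0 .. N/2 - 1 using a direct DFT (O(N^2), for illustration)."""
    n = len(sample)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(sample))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def preprocess(signal, length=2000):
    """Steps 1) to 3): split the signal, then keep the left half of each power spectrum."""
    return [half_power_spectrum(s) for s in split_into_samples(signal, length)]
```

With the default `length=2000`, each returned feature vector has 1000 values, matching the input size stated above.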

Verification is carried out on two unsupervised scenarios and one supervised scenario (see Table 1). The scenarios are:

(1) Unsupervised transfer between motor speeds (US-Speed): in this case, the data acquired at the motor drive end are tested, and fault severity is ignored. A four-class classification task (normal, plus the three fault conditions of inner race, outer race and roller faults) is constructed across four domains with different motor speeds: 1797 rpm (US(A)), 1772 rpm (US(B)), 1750 rpm (US(C)) and 1730 rpm (US(D)).

(2) Unsupervised transfer between two sensor positions (US-Location): this case concerns domain adaptation between different sensor positions, ignoring differences in fault severity and motor speed. Again, a four-class classification task (normal and three fault types) is constructed for the source and target domains, where the vibration acceleration data are acquired by two sensors located at the drive end (US(E)) and the fan end (US(F)) of the motor housing, respectively.

(3) Supervised transfer between the data sets of two sensor positions (S-Location): this scenario uses the same settings as the previous scenario US-Location, but a small amount of labeled target domain data (about 0.5%) is added to the source domain in order to improve classification performance.

Table 1

For comparison, other methods are also tested on the same data sets, including:

(1) Convolutional neural network (CNN): this model is the pre-trained network described in Embodiment one. It is trained on the labeled source domain data and applied directly to classify the test target domain.

(2) Deep adaptation network (DAN): learns transferable features through MK-MMD in a deep neural network. The MMD metric is an integral probability metric that measures the distance between two probability distributions by mapping the samples into a reproducing kernel Hilbert space (RKHS).
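To make the MMD measure used by the DAN baseline concrete, the sketch below computes a biased estimate of the squared MMD between two samples. It is a simplification under stated assumptions: a single Gaussian kernel with a hypothetical bandwidth `sigma` is used rather than the multi-kernel MK-MMD of DAN, and the function names are illustrative.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of MMD^2: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy
```

The estimate is zero when both samples are identical and grows as the two samples move apart in feature space.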

In addition, to evaluate the feature extraction capability of the convolutional neural network against conventional statistical features, this embodiment also compares the results with traditional transfer learning methods that use statistical (handcrafted) features, including transfer component analysis (TCA), joint distribution adaptation (JDA) and correlation alignment (CORAL).

The specific implementation details are as follows:

TensorFlow is used as the software framework for the experiments, and all models are trained with Adam. Each method is tested five times with 5000 iterations, and the best result of each test is recorded. The mean and the 95% confidence interval of the classification accuracy are used for comparison. The sample sizes of the motor speed tasks (A), (B), (C) and (D) are 1026, 1145, 1390 and 1149, respectively. The sample sizes of the sensor position tasks (E) and (F) are 3790 and 4710, respectively. For all experiments, the sample batch size n for each training or diagnosis step is fixed at 32.
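The mean and 95% confidence interval over repeated runs could be computed as in the sketch below. The text does not state how the interval was obtained, so this assumes a two-sided Student-t interval on the sample mean; the function name and the accuracies in the usage note are made up for illustration.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t distribution, keyed by degrees of freedom.
T_CRITICAL_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}

def mean_and_ci95(accuracies):
    """Return (mean, half-width of the 95% confidence interval) for 2 to 5 runs."""
    n = len(accuracies)
    mean = statistics.mean(accuracies)
    standard_error = statistics.stdev(accuracies) / math.sqrt(n)
    return mean, T_CRITICAL_95[n - 1] * standard_error
```

For five hypothetical runs such as `mean_and_ci95([95.0, 96.0, 97.0, 98.0, 99.0])`, the function returns the mean together with the half-width that would be reported after the ± sign.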

Convolutional neural network (CNN): the architecture consists of two convolutional layers (Conv1-Conv2), two max pooling layers (Pool1-Pool2) and two fully connected layers (FC1-FC2). The activation function of the output layer is Softmax, and ReLU is used for the convolutional layers. FC1 and FC2 have 128 and 4 neurons, respectively. The number of filters, kernel size and stride of each layer are given in Table 2. Before transfer, the CNN model is fine-tuned to reach the best validation accuracy for all transfer scenarios.

Deep adaptation network (DAN): the convolutional layers (Conv1-Conv2) of the convolutional neural network are used as the feature extractor. Then, to minimize the domain distance between the source domain and the target domain, FC1 is used as the hidden layer for adaptation. The final representations of the hidden layer in the two domains are embedded into the RKHS to reduce the MK-MMD distance. The final objective function is the combination of the MK-MMD loss and the classification loss.

The Wasserstein-distance-based deep adversarial transfer learning model of the present application (WD-DTL): as in the deep adaptation network, the convolutional layers (Conv1-Conv2) are used to extract features. The hidden layers of the DC network have 128 and 1 nodes, respectively. The number of training steps C per batch is set to 10. The learning rates of the discriminator and of the DC are α₁ = 10^-3 and α₂ = 2 × 10^-4, respectively. The gradient penalty coefficient ρ is set to 10. The coefficient λ used when optimizing the minimization operator is 0.1 for motor speed transfer and 0.8 for sensor position transfer.

For the traditional transfer learning methods TCA, JDA and CORAL, the regularization term λ is selected from {0.001, 0.01, 0.1, 1.0, 10, 100}. In TCA and CORAL, an SVM is used for classification.

Table 2

Layer	Number of filters	Kernel size	Stride
Conv1/2	8/16	1x20	2
Pool1/2	-	1x2	2
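The layer settings in Table 2 determine how the 1000-point input shrinks before it reaches FC1. The sketch below traces the feature-map length through Conv1, Pool1, Conv2 and Pool2. The padding mode is not stated in the text, so both common conventions ("valid", no padding, and "same", length divided by stride and rounded up) are supported as assumptions; the function and constant names are hypothetical.

```python
import math

# (layer name, kernel width, stride, output channels), taken from Table 2.
LAYERS = [("Conv1", 20, 2, 8), ("Pool1", 2, 2, 8), ("Conv2", 20, 2, 16), ("Pool2", 2, 2, 16)]

def out_length(length, kernel, stride, padding="valid"):
    """Output length of a 1-D convolution or pooling layer under the given padding mode."""
    if padding == "valid":
        return (length - kernel) // stride + 1
    if padding == "same":
        return math.ceil(length / stride)
    raise ValueError("padding must be 'valid' or 'same'")

def trace_shapes(input_length=1000, padding="valid"):
    """Return [(layer name, feature-map length, channels)] for the four layers of Table 2."""
    shapes = []
    length = input_length
    for name, kernel, stride, channels in LAYERS:
        length = out_length(length, kernel, stride, padding)
        shapes.append((name, length, channels))
    return shapes
```

Under the "valid" assumption the lengths are 491, 245, 113 and 56, so 56 × 16 = 896 values would be flattened into FC1; under "same" they are 500, 250, 125 and 63, giving 1008 values.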

Based on implementation above details, following result is obtained:

The transfer task results of WD-DTL and the other methods are shown in Table 3. For the transfer tasks with unlabeled data in the target domain (i.e. US-Speed and US-Location), it can be observed that the deep transfer learning model is clearly better than the convolutional neural network, with the average accuracy increasing by about 13.6% and 25% on the motor speed and sensor position transfer tasks, respectively. In addition, except for transfer task US(D)→US(A), where its accuracy is about 1% lower than that of DAN, the transfer accuracy of WD-DTL is better than that of DAN in most results (an average increase of about 5%).

Table 3

Task	TCA	JDA	CORAL	CNN	DAN	WD-DTL
US(A)→US(B)	26.55	65.07(±7.55)	59.18	82.75(±6.77)	92.97(±3.88)	97.52(±3.09)
US(A)→US(C)	46.80	51.31(±1.56)	62.14	78.65(±4.54)	85.32(±5.26)	94.43(±2.99)
US(A)→US(D)	26.57	57.70(±8.59)	49.83	82.99(±5.89)	89.39(±4.37)	95.05(±2.12)
US(B)→US(A)	26.63	71.19(±1.21)	53.57	84.14(±6.63)	94.43(±2.95)	96.80(±1.10)
US(B)→US(C)	26.60	69.80(±5.67)	57.28	85.41(±9.44)	90.43(±4.62)	99.69(±0.59)
US(B)→US(D)	26.57	88.50(±1.96)	60.53	86.09(±4.63)	87.37(±5.42)	95.51(±2.52)
US(C)→US(A)	26.63	56.42(±2.52)	54.03	76.50(±3.76)	89.88(±1.57)	92.16(±2.61)
US(C)→US(B)	26.66	69.18(±1.90)	76.66	82.75(±5.51)	92.93(±1.57)	96.03(±6.27)
US(C)→US(D)	46.75	77.45(±0.83)	70.34	87.04(±6.81)	90.66(±5.24)	97.56(±3.31)
US(D)→US(A)	46.74	61.72(±5.48)	59.78	79.23(±6.96)	90.88(±1.82)	89.82(±2.41)
US(D)→US(B)	46.79	74.03(±0.86)	59.73	79.73(±5.49)	87.91(±2.42)	95.16(±3.67)
US(D)→US(C)	26.60	65.24(±4.18)	63.02	80.64(±4.23)	92.94(±3.96)	99.62(±0.80)
Average	33.32	67.35(±3.53)	56.01	82.10(±5.89)	90.42(±3.59)	95.75(±2.62)
US(E)→US(F)	19.05	57.35(±0.47)	47.97	39.07(±2.22)	56.89(±2.73)	64.17(±7.16)
US(F)→US(E)	20.45	66.34(±4.47)	39.87	39.95(±3.84)	55.97(±3.17)	64.24(±3.87)
Average	19.75	61.85(±2.47)	43.92	39.51(±3.03)	56.43(±2.95)	64.20(±5.52)
S(E)→S(F)	20.43	65.48(±0.57)	51.77	54.04(±7.67)	59.68(±4.61)	65.69(±3.74)
S(F)→S(E)	19.02	59.07(±0.56)	47.88	50.47(±5.74)	58.78(±5.67)	64.15(±5.52)
Average	19.73	62.28(±0.57)	49.83	52.26(±6.71)	59.23(±5.14)	64.92(±4.63)

In summary, the following observations can be made: 1) WD-DTL achieves the best transfer accuracy, with an average of 95.75%; 2) without domain adaptation, the convolutional neural network, owing to its strong feature extraction capability, is still able to achieve good classification performance on the motor speed transfer tasks; 3) the WD-DTL method proposed by the present invention shows a good ability to solve supervised problems with a small amount of labeled data. The supervised transfer tasks S(E)→S(F) and S(F)→S(E) use only 0.5% of the sample size of the unsupervised case, yet achieve performance as good as the unsupervised case with 100% unlabeled samples.

Based on the above results, the experiments are analyzed as follows:

Feature visualization: t-distributed stochastic neighbor embedding (t-SNE) is used to perform nonlinear dimensionality reduction for network visualization. For the transfer tasks between motor speeds, i.e. US-Speed, task US(C)→US(A) is randomly selected to visualize the feature representations learned at different motor speeds. Fig. 3 shows the comparison. It can be observed that the clusters in Fig. 3(c), formed by the WD-DTL proposed by the present invention, are better separated than the CNN results in Fig. 3(a) and the DAN domain adaptation results in Fig. 3(b). More importantly, the domain adaptation in Fig. 3(c) is significantly improved, because the source domain and target domain features lie almost in the same clusters.

For the transfer tasks between different sensor positions, i.e. US-Location and S-Location, the t-SNE results for transfer task US(E)→US(F) are shown in Fig. 4. Although fault types 1, 2 and 3 are difficult to separate clearly into individual clusters, WD-DTL shows better clustering results than CNN and DAN. It must be emphasized that these results were obtained using 100% of the samples (4710) in the target domain, and even so the performance is not satisfactory. This raises the question of how to enhance transfer learning performance when the signals in the source domain and the target domain are related but not similar enough.

Influence of sample size on unsupervised and supervised accuracy: Fig. 5 shows the accuracy curves of WD-DTL for tasks US(E)→US(F) and S(E)→S(F), corresponding to US-Location and S-Location, as the number of samples increases from 10 to 2500. When the number of samples exceeds 2500, the diagnostic accuracy saturates near a fixed value, so only the results for 10 to 2500 are shown. In Fig. 5(a), the accuracy of WD-DTL increases from 59.47% to a final test accuracy of about 64%. As the sample size increases, the fault diagnosis accuracy of the WD-DTL method remains higher than that of DAN and CNN. This analysis shows that, in the unsupervised case, increasing the sample size can improve the accuracy of transfer learning, but the improvement is limited (less than 5%) even when 100% of the target domain samples are used. To address this, in Fig. 5(b) a small amount of labeled data is used to improve fault diagnosis accuracy, which corresponds to the situation in practical industrial applications where labeled data are limited. The figure shows that when the number of labeled samples exceeds 20 (of 4710 samples in total), the transfer learning accuracy of WD-DTL exceeds that of the case in Fig. 5(a) with 100% of the samples (the blue region in Fig. 5(a)). More specifically, with only 100 labeled samples (equivalent to 25 per fault class), a transfer learning accuracy of 80% can be achieved, which shows that the WD-DTL proposed by the present invention is also an excellent framework for supervised transfer tasks.

Embodiment four

A storage medium in which instructions are stored; when a computer reads the instructions, the computer is caused to execute any of the Wasserstein-distance-based convolutional neural network adversarial transfer learning methods of Embodiment one above and/or any of the industrial process fault diagnosis methods described in Embodiment three above.

A convolutional neural network architecture is first constructed to extract features, and the Wasserstein distance is introduced to learn domain-invariant feature representations. Through the adversarial training process, the domain discrepancy is significantly reduced. The related technical solutions are the same as above and are not repeated here.

Those skilled in the art will readily understand that the foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A convolutional neural network adversarial transfer learning method based on Wasserstein distance, characterized by comprising:

Step 1: determining a source domain labeled sample set and a target domain sample set from a source domain labeled data set and a target domain data set;

Step 2: using the convolutional neural network to be transferred, obtaining a source domain feature set and a source domain fault judgment set of the source domain labeled sample set, and a target feature set of the target domain sample set;

Step 3: with the objectives of maximizing the Wasserstein distance between the source domain feature set and the target feature set, and of minimizing the sum of the Wasserstein distance and the judgment loss value of the source domain fault judgment set, optimizing the parameters of the convolutional neural network and, based on a convergence criterion, either repeating Step 1 or completing the adversarial transfer learning of the convolutional neural network.

2. The convolutional neural network adversarial transfer learning method based on Wasserstein distance according to claim 1, characterized in that the Wasserstein distance is obtained as follows:

Using a domain similarity evaluation neural network, the source domain feature set is mapped to a source domain real-number set and the target feature set is mapped to a target real-number set, and the mean of the target real-number set is subtracted from the mean of the source domain real-number set.

3. The convolutional neural network adversarial transfer learning method based on Wasserstein distance according to claim 2, characterized in that Step 3 comprises:

Step 3.1: with the objective of increasing the Wasserstein distance between the source domain feature set and the target feature set, optimizing the parameters of the domain similarity evaluation neural network to obtain a new domain similarity evaluation neural network corresponding to a preset number of iterations;

Step 3.2: based on the source domain labeled sample set, the target domain sample set and the new domain similarity evaluation neural network, with the objective of reducing the sum of the Wasserstein distance and the judgment loss value of the convolutional neural network, optimizing the parameters of the convolutional neural network to obtain a new convolutional neural network corresponding to a preset number of iterations;

Step 3.3: based on the convergence criteria of the domain similarity evaluation neural network and the convolutional neural network, either stopping training and completing the adversarial transfer learning of the convolutional neural network, or repeating Step 1.

4. The convolutional neural network adversarial transfer learning method based on Wasserstein distance according to claim 3, characterized in that, in Step 3.1, the Wasserstein distance is the difference, with gradient penalty, between the mean of the source domain real-number set and the mean of the target real-number set;

In Step 3.2, the Wasserstein distance is the difference between the mean of the source domain real-number set and the mean of the target real-number set.

5. The convolutional neural network adversarial transfer learning method based on Wasserstein distance according to claim 2, characterized in that the Wasserstein distance is expressed as the mean of a source domain real-number subset minus the mean of a target real-number subset;

Wherein the source domain real-number subset is obtained from the source domain real-number set under a Lipschitz constraint, and the target real-number subset is obtained from the target real-number set under the Lipschitz constraint.

6. The convolutional neural network adversarial transfer learning method based on Wasserstein distance according to claim 5, characterized in that the Lipschitz constraint is specifically a 1-Lipschitz constraint.

7. The convolutional neural network adversarial transfer learning method based on Wasserstein distance according to any one of claims 1 to 6, characterized in that, when the target domain sample set includes some labeled samples, Step 2 is:

Using the convolutional neural network to be transferred, obtaining the source domain feature set and the source domain fault judgment set of the source domain labeled sample set, and the target feature set and a target fault judgment set of the target domain sample set;

Step 3 is:

With the objectives of maximizing the Wasserstein distance between the source domain feature set and the target feature set, and of minimizing the sum of the Wasserstein distance, the judgment loss value of the source domain fault judgment set and the judgment loss value of the target fault judgment set, optimizing the parameters of the convolutional neural network and, based on a convergence criterion, either repeating Step 1 or completing the adversarial transfer learning of the convolutional neural network.

8. An industrial process fault diagnosis convolutional neural network, characterized in that it is obtained by training with the convolutional neural network adversarial transfer learning method based on Wasserstein distance according to any one of claims 1 to 7.

9. An industrial process fault diagnosis method, characterized in that, based on the industrial process fault diagnosis convolutional neural network according to claim 8, when a new sample of the target domain referred to in the convolutional neural network adversarial transfer learning method based on Wasserstein distance according to any one of claims 1 to 7 is received, the industrial process fault judgment result corresponding to the new sample is obtained.

10. A storage medium, characterized in that instructions are stored in the storage medium; when a computer reads the instructions, the computer is caused to execute the convolutional neural network adversarial transfer learning method based on Wasserstein distance according to any one of claims 1 to 7 and/or the industrial process fault diagnosis method according to claim 9.