CN108664999A

CN108664999A - A kind of training method and its device, computer server of disaggregated model

Info

Publication number: CN108664999A
Application number: CN201810412797.4A
Authority: CN
Inventors: 王乃岩; 樊峻崧
Original assignee: Beijing Tusimple Future Technology Co Ltd
Current assignee: Beijing Tusimple Technology Co Ltd
Priority date: 2018-05-03
Filing date: 2018-05-03
Publication date: 2018-10-16
Anticipated expiration: 2038-05-03
Also published as: CN108664999B

Abstract

The present invention discloses a kind of training method and its device, computer server of disaggregated model, the technical issues of to solve the prior art low by semi-supervised learning technique drill disaggregated model computational efficiency, narrow application range.Method includes：Build preliminary classification model, preliminary classification model includes the identical at least one single mode disaggregated model of classification task, and the corresponding modal data training set of each single mode disaggregated model includes label training data and without label training data；The method for having label training data and the feature coding distribution without label training data in modal data training set based on each single mode disaggregated model of alignment, is trained to obtain object-class model preliminary classification model.This programme can not only improve the efficiency of disaggregated model training and the scope of application is wider.

Description

A kind of training method and its device, computer server of disaggregated model

Technical field

The present invention relates to deep learning field, more particularly to a kind of training method of disaggregated model, a kind of disaggregated model Training device and a kind of computer server.

Background technology

Currently, training neural network usually requires largely to mark sample data, it is necessary first to acquire a large amount of sample number According to, then collected sample data is labeled by manually to obtain the mark sample data for training neural network, Acquisition and mark need higher human cost and time cost.

To solve the technical problem, it includes label training data and the training data without label training data to use at present Set pair neural network is trained, be not necessarily to a large amount of label training data, so as to alleviate to largely have label data according to Rely, it is higher with the mark sample data cost and time cost that solve the problems, such as the prior art.

Semi-supervised learning technology in current existing deep learning, mainly introduces during input and feature construction Random noise or various stochastic transformations, while the output for constraining neural network should have robustness, invariance, be utilized with reaching No label training data carries out the purpose of supplemental training, for example, Takeru Miyato et al. uses are to resisting sample, Mehdi Sajjadi et al. introduces disturbance using stochastic transformation, Samuli Laine et al. using random noise.

However, there are following technological deficiencies for existing semi-supervised learning technology：To be supervised from no label training data Information is superintended and directed, needs to carry out multiple forward calculation to same group of training sample, it is less efficient；Meanwhile single mode number can only be directed to According to progress learning training, narrow application range.

Invention content

In view of the above problems, the present invention provides a kind of training method and its device, computer server of disaggregated model, with The technical issues of solving the prior art low by semi-supervised learning technique drill disaggregated model computational efficiency, narrow application range.

The embodiment of the present invention, in a first aspect, providing a kind of training method of disaggregated model, this method includes：

Preliminary classification model is built, the preliminary classification model includes the identical at least one single mode classification of classification task Model, wherein the corresponding modal data training set of each single mode disaggregated model includes label training data and without label training Data；

There is label training data and without label based on being aligned in the modal data training set of each single mode disaggregated model The method of the feature coding distribution of training data, using first described in the corresponding mode training data set pair of each single mode disaggregated model Beginning disaggregated model is trained to obtain object-class model.

The embodiment of the present invention, second aspect, a kind of training device of disaggregated model, including：

Model construction unit, for building preliminary classification model, the preliminary classification model includes that classification task is identical At least one single mode disaggregated model, wherein the corresponding modal data training set of each single mode disaggregated model includes label instruction Practice data and without label training data；

Training unit, for having label training in the modal data training set based on each single mode disaggregated model of alignment The method of data and feature coding distribution without label training data, number is trained using the corresponding mode of each single mode disaggregated model It is trained according to preliminary classification model described in set pair to obtain object-class model.

The embodiment of the present invention, the third aspect provide a kind of computer server, including memory, and with the storage The one or more processors of device communication connection；

The instruction that can be executed by one or more of processors is stored in the memory, described instruction is by described one A or multiple processors execute, so that one or more of processors realize the training method of aforesaid class model.

Technical solution of the present invention has label instruction based on being aligned in the modal data training set of each single mode disaggregated model The method for practicing data and the feature coding distribution without label training data, is trained using the corresponding mode of each single mode disaggregated model Data set is trained to obtain object-class model the preliminary classification model.That is, using technical solution of the present invention, in conjunction with There is label training data and confrontation constrained learning is carried out to feature coding device without label training data so that encoder can learn To having label training data and largely there is without label training data the feature representation of good uniformity, unlike the prior art needs Multiple forward calculation is carried out to same group of training sample, so as to improve the training effectiveness of disaggregated model, in addition it can needle Study is trained to multi-modal data, use scope is wider.

Description of the drawings

Attached drawing is used to provide further understanding of the present invention, and a part for constitution instruction, the reality with the present invention It applies example to be used to explain the present invention together, not be construed as limiting the invention.

Fig. 1 is the flow chart of the training method of disaggregated model in the embodiment of the present invention；

Fig. 2 is one of the structural schematic diagram of preliminary classification model in the embodiment of the present invention；

Fig. 3 is the second structural representation of preliminary classification model in the embodiment of the present invention；

Fig. 4 is the flow chart being trained based on Fig. 2/preliminary classification shown in Fig. 3 model in the embodiment of the present invention；

Fig. 5 is the object-class model obtained based on preliminary classification model training shown in Fig. 3 in the embodiment of the present invention；

Fig. 6 is in the embodiment of the present invention, and the same object corresponds to the schematic diagram of multiple modalities data representation respectively；

The third structural representation of preliminary classification model in Fig. 7 embodiment of the present invention；

Fig. 8 is the flow chart being trained based on preliminary classification model shown in Fig. 7 in the embodiment of the present invention；

Fig. 9 is four of the structural schematic diagram of preliminary classification model in the embodiment of the present invention；

Figure 10 is the object-class model obtained based on preliminary classification model training shown in Fig. 9 in the embodiment of the present invention；

Figure 11 is five of the structural schematic diagram of preliminary classification model in the embodiment of the present invention；

Figure 12 is six of the structural schematic diagram of preliminary classification model in the embodiment of the present invention；

Figure 13 is seven of the structural schematic diagram of preliminary classification model in the embodiment of the present invention；

Figure 14 is eight of the structural schematic diagram of preliminary classification model in the embodiment of the present invention；

Figure 15 is the structural schematic diagram of the training device of disaggregated model in the embodiment of the present invention；

Figure 16 is the structural schematic diagram of Computer server of the embodiment of the present invention.

Specific implementation mode

In order to make those skilled in the art more fully understand the technical solution in the present invention, below in conjunction with of the invention real The attached drawing in example is applied, technical scheme in the embodiment of the invention is clearly and completely described, it is clear that described implementation Example is only a part of the embodiment of the present invention, instead of all the embodiments.Based on the embodiments of the present invention, this field is common The every other embodiment that technical staff is obtained without making creative work, should all belong to protection of the present invention Range.

Embodiment one

It is the flow chart of the training method of disaggregated model in the embodiment of the present invention referring to Fig. 1, this method includes：

Step 101, structure preliminary classification model, the preliminary classification model include the identical at least one list of classification task Mode disaggregated model, wherein the corresponding modal data training set of each single mode disaggregated model includes label training data and nothing Label training data；

There is label training data in step 102, the modal data training set based on each single mode disaggregated model of alignment The method being distributed with the feature coding of no label training data, using the corresponding mode training dataset of each single mode disaggregated model The preliminary classification model is trained to obtain object-class model.

In the embodiment of the present invention, the modal data of each single mode disaggregated model in preliminary classification model to respective type Classify, the corresponding modal data type of different single mode disaggregated models is different, but multiple single mode disaggregated models pair The classification task answered is identical.For example, multi-modal disaggregated model includes three single mode disaggregated models, model A, Model B are used respectively It is indicated with MODEL C, wherein model A is for classifying to image data, and Model B classifies to lteral data, and MODEL C is to regarding Frequency is according to classifying, but model A, Model B are identical with the classification task of MODEL C, such as classification task includes pedestrian, vehicle , traffic lights etc., i.e., each model identifies pedestrian, vehicle and traffic lights etc. from the modal data of corresponding types.

Based on method flow shown in FIG. 1, in the embodiment of the present invention, the structure of preliminary classification model can there are many, under Face is trained to obtain object-class model to the preliminary classification model of different structure respectively with multiple examples and carries out detailed retouch It states, those skilled in the art can extend other alternative schemes with example based on the embodiment of the present invention, but alternative As long as method of the scheme based on the feature coding distribution for being aligned multiple single mode disaggregated models, all falls within claimed by the application In the range of.

Example 1

In example 1, the structure of the preliminary classification model can only include a single mode disaggregated model as shown in Figure 2, Can also include more than two single mode disaggregated models, either Fig. 2 or structure shown in Fig. 3, Mei Gedan as shown in Figure 3 Mode disaggregated model is described to sentence comprising feature coding device and respectively with the cascade grader of the feature coding device and arbiter Other device is used to judge the feature coding of feature coding device output from having label training data or without label training data, The output end of the arbiter is provided with the first-loss function for being trained to the arbiter and is used for the feature The second loss function that encoder is trained, the first-loss function and the confrontation setting of the second loss function.

In the example 1, the step 102 is using described in the corresponding mode training data set pair of each single mode disaggregated model Preliminary classification model is trained to obtain object-class model, specifically can by but be not limited only to following manner realize, the party Formula includes step 102a~step 102b, as shown in Figure 4：

Step 102a, using preliminary classification model described in the corresponding mode training data set pair of each single mode disaggregated model into Row iteration is trained；

Step 102b, the arbiter in the obtained disaggregated model of training in each single mode disaggregated model is deleted, obtain as Object-class model shown in fig. 5.

In example 1, step 102a specifically can by but be not limited only to following manner realize：

Repeatedly following repetitive exercise is carried out to the preliminary classification model, an iteration training specifically may include following steps A1~step A2, wherein：

Step A1, it is directed to each single mode disaggregated model, from the modal data training set of the single mode disaggregated model It obtains a training data to be input in the feature coding device of the single mode disaggregated model, and according to single mode classification mould The value of the loss function of the grader of type to the parameter of feature coding device and grader in the single mode disaggregated model into Row adjustment；And the value pair of the value and the second loss function of the first-loss function based on the single mode disaggregated model The arbiter of the single mode disaggregated model and the parameter of feature coding device are adjusted；

Step A2, based on the preliminary classification model after parameter adjustment, next iteration training is carried out.

Preferably, in abovementioned steps A1, the value and second of the first-loss function based on the single mode disaggregated model The value of loss function is adjusted the arbiter of the single mode disaggregated model and the parameter of feature coding device, can specifically pass through But it is not limited only to any one following mode (mode B1~mode B2) to realize：

Mode B1, the feature coding that the feature coding device is exported according to the arbiter of the single mode disaggregated model into The value of first-loss function after row differentiation, adjusts the parameter of the arbiter；Based on the arbiter after parameter adjustment The value that the second loss function after major punishments not other is carried out to the feature coding of feature coding device output adjusts the feature and compiles The parameter of code device.

Mode B2, the feature coding that the feature coding device is exported according to the arbiter of the single mode disaggregated model into The value of first-loss function after row differentiation adjusts the parameter of the arbiter, should according to the adjustment of the value of the second loss function The parameter of the feature coding device of single mode disaggregated model.

In practical applications, the same object may be used different modal data expression, for example, image, video, voice, Word etc..As shown in fig. 6, same room with them can be expressed with image, Freehandhand-drawing picture and these three modal datas of word respectively.By The different characteristic of the same object can be characterized in different modalities data, can be complemented each other between each other, the embodiment of the present invention In, when disaggregated model includes more than two single mode disaggregated models, to improve the performance of each single mode disaggregated model, instructing During white silk, for what can be distributed based on the feature coding for being aligned multiple single mode disaggregated models between single mode disaggregated model Method carries out cross-module state training, i.e., carries out confrontation constrained learning to the distribution of the feature coding of multiple single mode disaggregated models.It is different Single mode disaggregated model correspond to the modal data training sets of different modalities, pass through the difference for being aligned multiple single mode disaggregated models The feature coding of modal data is distributed, and can implicitly utilize the training of different modalities between multiple single mode disaggregated models jointly Data, share different modalities training data characteristic information so that multiple single mode disaggregated models can coorinated training, The performance of each single mode disaggregated model is mutually improved using multi-modal data.This training method can not only improve each list The accuracy that mode disaggregated model is classified, and do not need each training sample while possessing multiple modalities data representation (i.e. technical solution of the present invention does not need the multiple modalities alignment of data of training sample), training sample acquisition are relatively easy to, apply Range is more extensive.For the cross-module state training program, can be by the structure setting of preliminary classification model as Fig. 7, such as Fig. 9, As Figure 11, as shown in figure 12, respectively use example 2, example 3, example 4 and example 5 respectively to shown in Fig. 7, Fig. 9, Figure 11 and Figure 12 Preliminary classification model be described in detail.

Example 2

Be arranged preliminary classification model structure can as shown in fig. 7, each single mode disaggregated model include feature coding device with And respectively with the cascade grader of the feature coding device and arbiter, the arbiter is for judging that the feature coding device is defeated The feature coding gone out, which derives from, label training data or without label training data, the output end of the arbiter, which is provided with, to be used for The first-loss function be trained to the arbiter and the second loss function being trained for the feature coding device, The first-loss function and the confrontation setting of the second loss function；And the feature coding device of multiple single mode disaggregated models is also distinguished It is connected on the same cross-module state arbiter, the cross-module state arbiter is used to differentiate the feature coding of each single mode disaggregated model The corresponding modality type of feature coding of device output, is provided in the output end of cross-module state arbiter for cross-module state arbiter The third loss function being trained and the 4th damage for being trained to the feature coding device in each single mode disaggregated model Lose function, the third loss function and the confrontation setting of the 4th loss function.

In aforementioned flow shown in FIG. 1, the corresponding mode training dataset of each single mode disaggregated model is used in step 102 The preliminary classification model is trained to obtain object-class model, specifically can by but be not limited only to following manner reality Existing, which includes step 102c~step 102d, as shown in Figure 8：

Step 102c, using preliminary classification model described in the corresponding mode training data set pair of each single mode disaggregated model into Row iteration is trained；

Step 102d, the arbiter in the disaggregated model for obtaining training in each single mode disaggregated model, cross-module state differentiate Device is deleted.

Preferably, the step 102c specifically can by but be not limited only to following manner realize：

Repeatedly following repetitive exercise is carried out to the preliminary classification model, an iteration training includes step C1~step C3, wherein：

Step C1, it is directed to each single mode disaggregated model, from the corresponding modal data training set of single mode disaggregated model It obtains a training data to be input in the feature coding device of the single mode disaggregated model, and according to the single mode disaggregated model The value of the loss function of grader is adjusted the parameter of feature coding device and grader in the single mode disaggregated model； And the value of the first-loss function based on the single mode disaggregated model and the value of the second loss function are to the single mode The arbiter of disaggregated model and the parameter of feature coding device are adjusted；

Step C2, the value of value and the 4th loss function based on third loss function is to cross-module state arbiter and each list The parameter of the feature coding device of mode disaggregated model is adjusted；

Step C3, based on the preliminary classification model after parameter adjustment, next iteration training is carried out.

Preferably, the value of the first-loss function based on the single mode disaggregated model and the second damage in the step C1 The value for losing function is adjusted the arbiter of the single mode disaggregated model and the parameter of feature coding device, and for details, reference can be made to show Mode B1 in example 1 or mode B2, details are not described herein.

Preferably, in the step C2, the value of value and the 4th loss function based on third loss function is to cross-module The parameter of the feature coding device of state arbiter and each single mode disaggregated model is adjusted, specifically can by but be not limited only to following Any one mode (mode D1~mode D2) is realized：

Mode D1, according to cross-module state arbiter to the feature coding of the feature coding device of each single mode disaggregated model output into The value of third loss function after row differentiation, adjusts the parameter of the cross-module state arbiter；Based on the cross-module after parameter adjustment State arbiter carries out the feature coding of the feature coding device output of each single mode disaggregated model in the 4th loss letter after major punishments not other Several values adjusts the parameter of the feature coding device of each single mode disaggregated model.

Mode D2, according to cross-module state arbiter to the feature coding of the feature coding device of each single mode disaggregated model output into The value of third loss function after row differentiation adjusts the parameter of the cross-module state arbiter, and according to the 4th loss function Value adjusts the parameter of the feature coding device of each single mode disaggregated model.

Example 3

Be arranged preliminary classification model structure can as shown in figure 9, each single mode disaggregated model include feature coding device with And respectively with the cascade grader of this feature encoder and the 5th loss function, the value of the 5th loss function indicates the single mode The corresponding modal data training of disaggregated model is concentrated with the feature coding distribution of label training data and no label training data Consistency；The feature coding device of multiple single mode disaggregated models is all connected on the same cross-module state arbiter, the cross-module state Arbiter is used to differentiate the corresponding modality type of feature coding of the feature coding device output of each single mode disaggregated model, in cross-module The output end of state arbiter is provided with the third loss function for being trained to cross-module state arbiter and is used for each single mode The 4th loss function that feature coding device in state disaggregated model is trained, the third loss function and the 4th loss function Confrontation setting.

In the example 3, the corresponding mode training data set pair institute of each single mode disaggregated model is used in the step 102 Preliminary classification model is stated to be trained to obtain object-class model, specifically can by but be not limited only to following manner realize, institute The mode of stating includes step 102e~step 102f：

Step 102e, using preliminary classification model described in the corresponding mode training data set pair of each single mode disaggregated model into Row iteration is trained；

Step 102f, the cross-module state arbiter in the disaggregated model for obtaining training is deleted, and obtains mesh as shown in Figure 10 Mark disaggregated model；Alternatively, by the obtained disaggregated model of training cross-module state arbiter and each single mode disaggregated model in 5th loss function is deleted, and object-class model as shown in Figure 5 is obtained.

Preferably, the step 102e specifically can by but be not limited only to following manner realize：

Repeatedly following repetitive exercise is carried out to the preliminary classification model, an iteration training includes step E1~step E3：

Step E1, it is directed to each single mode disaggregated model, from the corresponding modal data training set of single mode disaggregated model It obtains a training data to be input in the feature coding device of the single mode disaggregated model, and based on according to single mode classification mould The value of the loss function of the grader of type carries out the parameter of feature coding device and grader in the single mode disaggregated model Adjustment；And the value of the 5th loss function according to the single mode disaggregated model, the feature of the single mode disaggregated model is compiled The parameter of code device is adjusted；

Step E2, the value of value and the 4th loss function based on third loss function is to cross-module state arbiter and each list The parameter of the feature coding device of mode disaggregated model is adjusted；

Step E3, based on the preliminary classification model after parameter adjustment, next iteration training is carried out.

The step E2 specific implementations can be found in mode D1 or mode D2 in example 2, and details are not described herein.

Example 4

In example 4, the structure of the preliminary classification model can be as shown in figure 11, and each single mode disaggregated model includes spy Levy encoder and respectively with the cascade grader of this feature encoder and the 5th loss function, the value table of the 5th loss function Show that the corresponding modal data training of the single mode disaggregated model is concentrated with the feature of label training data and no label training data Encode the consistency of distribution；The feature coding device of multiple single mode disaggregated models is all connected on same 6th loss function, The value of 6th loss function indicates the consistency of the feature coding distribution of the feature coding device output of each single mode disaggregated model.

In the example 4, the corresponding mode training data set pair institute of each single mode disaggregated model is used in the step 102 Preliminary classification model is stated to be trained to obtain object-class model, specifically can by but be not limited only to following manner realize, should Mode includes step 102g~step 102h, wherein

Step 102g, using preliminary classification model described in the corresponding mode training data set pair of each single mode disaggregated model into Row iteration is trained；

Step 102h, the 6th loss function in the multi-modal disaggregated model for obtaining training and each single mode classification mould The 5th loss function in type is deleted, and object-class model is obtained.

Preferably, the step 102g specifically can by but be not limited only to following manner realize：To the preliminary classification mould Type carries out repeatedly following repetitive exercise, and an iteration training includes step F1~step F3, wherein：

Step F1, it is directed to each single mode disaggregated model, from the corresponding modal data training set of single mode disaggregated model It obtains a training data to be input in the feature coding device of the single mode disaggregated model, and according to the single mode disaggregated model The value of the loss function of grader is adjusted the parameter of feature coding device and grader in the single mode disaggregated model； And the value of the 5th loss function according to the single mode disaggregated model, to the feature coding device of the single mode disaggregated model Parameter be adjusted；

Step F2, the parameter of the feature coding device of each single mode disaggregated model is carried out according to the value of the 6th loss function Adjustment；

Step F3, based on the preliminary classification model after parameter adjustment, next iteration training is carried out.

Example 5

In example 5, the structure of the preliminary classification model can be as shown in figure 12, and each single mode disaggregated model includes spy Levy encoder and respectively with the cascade grader of this feature encoder and arbiter, the arbiter for differentiate cascaded with it The output of feature coding device feature coding from having label training data or without label training data, and the output of arbiter End is provided with first-loss function for being trained to the arbiter and for being trained to the feature coding device The second loss function, the first-loss function and the second loss function are that confrontation is arranged；Multiple single mode disaggregated models Feature coding device is all connected on same 6th loss function, and the value of the 6th loss function indicates each single mode disaggregated model Feature coding device output feature coding distribution consistency.

In the example 5, the corresponding mode training data set pair institute of each single mode disaggregated model is used in the step 102 Preliminary classification model is stated to be trained to obtain object-class model, specifically can by but be not limited only to following manner realize, should Mode includes step 102i~step 102j：

Step 102i, using preliminary classification model described in the corresponding mode training data set pair of each single mode disaggregated model into Row iteration is trained；

Step 102j, the arbiter in each single mode disaggregated model in the disaggregated model for obtaining training is deleted, and is obtained Object-class model；Alternatively, by the obtained disaggregated model of training the 6th loss function and each single mode disaggregated model in Arbiter delete, to obtain object-class model.

Preferably, the step 102i specifically can by but be not limited only to following manner realize：

Repeatedly following repetitive exercise is carried out to the preliminary classification model, an iteration training includes step G1~step G3：

Step G1, it is directed to each single mode disaggregated model, from the corresponding modal data training set of single mode disaggregated model It obtains a training data to be input in the feature coding device of the single mode disaggregated model, and according to the single mode disaggregated model The value of the loss function of grader adjusts the parameter of feature coding device and grader in the single mode disaggregated model It is whole；And the value of the first-loss function based on the single mode disaggregated model and the value of the second loss function are to the list The arbiter of mode disaggregated model and the parameter of feature coding device are adjusted；

Step G2, according to the value of the 6th loss function, to the parameter of the feature coding device of each single mode disaggregated model into Row adjustment；

Step G3, based on the preliminary classification model after parameter adjustment, next iteration training is carried out.

In the embodiment of the present invention, the value of the first-loss function based on the single mode disaggregated model in the step G1 The arbiter of the single mode disaggregated model and the parameter of feature coding device are adjusted with the value of the second loss function, specifically Realize the mode B1 or mode B2 that can be found in example 1, details are not described herein.

Example 6

In example 6, the structure of the preliminary classification model can be as shown in Figure 13 or Figure 14, regardless of Figure 13 or Figure 14 institutes The preliminary classification model shown, each single mode disaggregated model include cascade feature coding device and the 5th loss function, the 5th damage The value for losing function indicates that the corresponding modal data training of single mode disaggregated model is concentrated with label training data and is instructed without label Practice the consistency of the feature coding distribution of data.

In the example 6, the corresponding mode training data set pair institute of each single mode disaggregated model is used in the step 102 Preliminary classification model is stated to be trained to obtain object-class model, specifically can by but be not limited only to following manner realize, should Mode includes step 102k~step 102l：

Step 102k, using preliminary classification model described in the corresponding mode training data set pair of each single mode disaggregated model into Row iteration is trained；

Step 102l, the 5th loss function in the disaggregated model for obtaining training in each single mode disaggregated model is deleted, Obtain object-class model.

In example 6, step 102k specifically can by but be not limited only to following manner realize：

Repeatedly following repetitive exercise is carried out to the preliminary classification model, an iteration training specifically may include following steps H1~step H2, wherein：

Step H1, it is directed to each single mode disaggregated model, from the corresponding modal data training set of single mode disaggregated model It obtains a training data to be input in the feature coding device of the single mode disaggregated model, and based on according to single mode classification mould The value of the loss function of the grader of type carries out the parameter of feature coding device and grader in the single mode disaggregated model Adjustment；And the feature of the single mode disaggregated model is compiled according to the value of the 5th loss function of the single mode disaggregated model The parameter of code device is adjusted；

Step H2, based on the preliminary classification model after parameter adjustment, next iteration training is carried out.

In the embodiment of the present invention, the structure for the preliminary classification model that aforementioned exemplary 1 and example 6 are built is relatively simple, training Speed, but the complementarity between multiple modalities data cannot be made full use of to be assisted between each single mode disaggregated model With study.Although the structure for the preliminary classification model that 2~example of example 5 is built is relative complex, it is different single mode classification Model corresponds to the modal data training set of different modalities, the spy of the different modalities data by being aligned multiple single mode disaggregated models Assemble-publish code is distributed, and can implicitly be utilized the training data of different modalities between multiple single mode disaggregated models jointly, be shared not With the characteristic information of the training data of mode so that multiple single mode disaggregated models can coorinated training, utilization is multi-modal Data mutually improve the performance of each single mode disaggregated model.Aforementioned different example, can be combined with label training data Confrontation constrained learning is carried out to feature coding device with no label training data so that encoder can learn to there is label training number According to largely without label training data have good uniformity feature representation, unlike the prior art is needed to same group of training sample The multiple forward calculation of this progress, so as to improve the training effectiveness of disaggregated model, in addition it can be directed to multi-modal data into Row training study, use scope are wider；On this basis not only respectively but also have advantage and disadvantage, those skilled in the art can be according to reality Demand chooses any one preliminary classification model in aforementioned exemplary.

In the embodiment of the present invention one, in aforementioned each example, the loss function of the grader in each single mode disaggregated model It could be provided as identical.Feature coding device and grader in mode that each single mode is classified are denoted as f respectively_eAnd f_c, feature coding The learning parameter of device and grader is respectively θ_eAnd θ_c.In the embodiment of the present invention, to the encoder in each single mode disaggregated model With grader can be used cross entropy loss function on having label training data according to true value label to parameter θ_eAnd θ_cIt carries out excellent Change, uses L_c(X；θ_e,θ_c) indicate single mode disaggregated model in grader loss function, then can set the loss function to As shown in formula (1)：

In formula (1), N_lIndicate there is label training data in the corresponding modal data training set of single mode disaggregated model Sum, C is the classification number of classification task,Indicate training sample x_iClass label, if x_iBelong to kth class thenIt takes Value is 1, if x_iIt is not belonging to kth class thenValue is 0.

Preferably, in example 1, example 2 and example 5, the feature coding device in mode that each single mode is classified and classification Device is denoted as f respectively_eAnd f_c, the learning parameter of feature coding device, grader and arbiter is respectively θ_e、θ_cAnd φ, use L_d(X；φ) It indicates first-loss function, then can be as shown in formula (2) by the first-loss function setup：

In formula (2), N_lThe sum of label data, N are concentrated with for the corresponding modal data training of single mode disaggregated model_u For the sum without label data in the corresponding modal data training set of single mode disaggregated model, z_iFor a scalar, if x_iTo there is mark Sign data then z_iValue is 1, if x_iFor no label data then z_iValue is 0.

Second loss function can use L_e(X；θ_e) indicate, the second loss function is arranged with the confrontation of first-loss function, therefore It can set second loss function to as shown in formula (3)：

In formula (3), N_lThe sum of label data, N are concentrated with for the corresponding modal data training of single mode disaggregated model_u For the sum without label data in the corresponding modal data training set of single mode disaggregated model, z_iFor a scalar, if x_iTo there is mark Sign data then z_iValue is 1, if x_iFor no label data then z_iValue is 0.

Preferably, in example 2 and example 3, feature coding device and grader in mode that each single mode is classified are distinguished It is denoted as f_eAnd f_c, the learning parameter of feature coding device, grader and cross-module state arbiter is respectively θ_e、θ_cWith φ ', L is used_d′(X； φ ') indicate third loss function, then it can set the third loss function to as shown in formula (4)：

In formula (4), N be all single mode disaggregated models modal data training set in include all training samples Sum, J are the sum of single mode disaggregated model,It indicates in the corresponding modal data training set of j-th of single mode disaggregated model Including the sum for having label training data,It indicates to wrap in the corresponding modal data training set of j-th of single mode disaggregated model The sum without label training data contained,WithThe feature coding device and feature of j-th of single mode disaggregated model are indicated respectively The learning parameter of encoder.

The 4th loss function in example 2 and example 3 is arranged with the confrontation of third loss function, and the 4th loss function can be used L_m(X) it indicates, then can set as shown in formula (5) the 4th loss function to：

In formula (5), d'_(k)Indicate k-th of element of cross-module state arbiter output vector；J is single mode disaggregated model Sum,Indicate that include in the corresponding modal data training set of j-th of single mode disaggregated model has the total of label training data Number,Indicate the sum without label training data for including in the corresponding modal data training set of j-th of single mode disaggregated model,WithIndicate that the learning parameter of the feature coding device and feature coding device of j-th of single mode disaggregated model, φ ' are cross-module respectively The learning parameter of state arbiter.

Preferably, in example 3, example 4 and example 6, the feature coding device in mode that each single mode is classified and classification Device is denoted as f respectively_eAnd f_c, feature coding device, grader learning parameter be respectively θ_e、θ_c, in each single mode disaggregated model Five loss functions can use L_mmd(X；θ_e) indicate, the 5th loss function could be provided as shown in formula (6)：

In formula (6), k () is kernel function, and x indicates that label training data, y indicate no label training data, N_l The sum of label data, N are concentrated with for the corresponding modal data training of single mode disaggregated model_uIt is corresponded to for single mode disaggregated model Modal data training set in the sum without label data.

Preferably, in example 4 and example 5, the expression formula of the 6th loss function can use L_mmd' (X) indicates that the 6th damages Function is lost to could be provided as shown in formula (7)：

In formula (7), the corresponding multiple modal data training sets of multiple single mode disaggregated models are constituted one between any two A group, N is used for each group_aAnd N_bThe training sample that two mode training datasets in this group are included is indicated respectively Quantity, x and y indicate that the training sample for being belonging respectively to two different modalities training datas concentration, k () are kernel function.

Aforementioned formula (1)~formula (7) is only an example, and it is identical that other realizations can also be used in those skilled in the art The formula of function substitutes, and the application does not make considered critical.

Embodiment two

The identical design of training method based on a kind of disaggregated model that previous embodiment one provides, the embodiment of the present invention two A kind of training device of disaggregated model is provided, the structure of the device can be as described in Figure 15, including model construction unit 1 and training list Member 2, wherein：

Model construction unit 1, for building preliminary classification model, the preliminary classification model includes that classification task is identical At least one single mode disaggregated model, wherein the corresponding modal data training set of each single mode disaggregated model includes label instruction Practice data and without label training data；

Training unit 2, for having label instruction in the modal data training set based on each single mode disaggregated model of alignment The method for practicing data and the feature coding distribution without label training data, is trained using the corresponding mode of each single mode disaggregated model Data set is trained to obtain object-class model the preliminary classification model.

Based on the training device of disaggregated model shown in figure 15, the structure of preliminary classification model can be in the embodiment of the present invention There are many, respectively the preliminary classification model of different structure is trained to obtain object-class model respectively with multiple examples below It is described in detail, those skilled in the art can extend other alternative sides with example based on the embodiment of the present invention Case, as long as but method of the alternative scheme based on the feature coding distribution for being aligned multiple single mode disaggregated models, belong to It falls in the application scope of the claimed.

Example 1A

Example 1A is corresponding with the example 1 in embodiment one, and the structure of initial mode disaggregated model can be as shown in Figure 2 or Figure 3, Particular content is referring to example 1 in embodiment one, and details are not described herein.

In example 1A, training unit 2 shown in figure 15 specifically includes：

Training subelement, for using preliminary classification described in the corresponding mode training data set pair of each single mode disaggregated model Model is iterated training, and subelement is deleted in triggering at the end of training；

Subelement is deleted, for by sentencing in each single mode disaggregated model in the disaggregated model for training subelement training to obtain Other device is deleted.

In example 1A, the trained subelement is specifically used for：

Repeatedly following repetitive exercise is carried out to the preliminary classification model：

For each single mode disaggregated model, one is obtained from the modal data training set of the single mode disaggregated model Training data is input in the feature coding device of the single mode disaggregated model, and according to the classification of the single mode disaggregated model The value of the loss function of device is adjusted the parameter of feature coding device and grader in the single mode disaggregated model；With And the value of the first-loss function based on the single mode disaggregated model and the value of the second loss function are to the single mode point The arbiter of class model and the parameter of feature coding device are adjusted；

Based on the preliminary classification model after parameter adjustment, next iteration training is carried out.

Wherein, the value of first-loss function of the training subelement based on the single mode disaggregated model and the second loss letter Several values is adjusted the arbiter of the single mode disaggregated model and the parameter of feature coding device, and specific implementation, which can be found in, to be shown Mode B1 in example 1 or mode B2, details are not described herein.

Example 2A

Example 2A is corresponding with the example 2 in embodiment one, and the structure of initial mode disaggregated model can be as shown in fig. 7, specific Content is referring to example 2 in embodiment one, and details are not described herein.

In example 2A, training unit 2 shown in figure 15 specifically includes：

Subelement is deleted, for by sentencing in each single mode disaggregated model in the disaggregated model for training subelement training to obtain Other device, cross-module state arbiter are deleted.

In example 2A, the trained subelement is specifically used for：

For each single mode disaggregated model, one is obtained from the corresponding modal data training set of single mode disaggregated model Training data is input in the feature coding device of the single mode disaggregated model, and according to the grader of the single mode disaggregated model The value of loss function is adjusted the parameter of feature coding device and grader in the single mode disaggregated model；And base In the value of the first-loss function of the single mode disaggregated model and the value of the second loss function to single mode classification mould The arbiter of type and the parameter of feature coding device are adjusted；

The value of value and the 4th loss function based on third loss function is to cross-module state arbiter and each single mode point The parameter of the feature coding device of class model is adjusted；

In example 2A, the value pair of value and fourth loss function of the trained subelement based on third loss function The parameter of the feature coding device of cross-module state arbiter and each single mode disaggregated model is adjusted, and specific implementation can be found in example 2 In mode D1 or mode D2, details are not described herein.

In example 2A, the value of first-loss function of the trained subelement based on the single mode disaggregated model and The value of second loss function is adjusted the arbiter of the single mode disaggregated model and the parameter of feature coding device, specific real Now reference can be made to the mode B1 in example 2 or mode B2, details are not described herein.

Example 3A

Example 3A is corresponding with the example 3 in embodiment one, and the structure of initial mode disaggregated model can be as shown in figure 9, specific Content is referring to example 3 in embodiment one, and details are not described herein.

In example 3A, training unit shown in figure 15 specifically may include：

Subelement is deleted, for the cross-module state arbiter in the disaggregated model that subelement training obtains will to be trained to delete, is obtained To object-class model；Alternatively, the cross-module state arbiter in the disaggregated model that training subelement training is obtained and each single mode The 5th loss function in state disaggregated model is deleted, and object-class model is obtained.

In example 3A, the trained subelement is specifically used for：

For each single mode disaggregated model, one is obtained from the corresponding modal data training set of single mode disaggregated model Training data is input in the feature coding device of the single mode disaggregated model, and based on the classification according to the single mode disaggregated model The value of the loss function of device is adjusted the parameter of feature coding device and grader in the single mode disaggregated model；With And the value of the 5th loss function according to the single mode disaggregated model, to the feature coding device of the single mode disaggregated model Parameter is adjusted；

In example 3A, the value of value and fourth loss function of the training subelement based on third loss function is to cross-module state The parameter of the feature coding device of arbiter and each single mode disaggregated model is adjusted, and specific implementation can be found in the side in example 2 Formula D1 or mode D2, details are not described herein

Example 4A

Example 4A is corresponding with the example 4 in embodiment one, and the structure of initial mode disaggregated model can be as shown in figure 11, specifically Content is referring to example 4 in embodiment one, and details are not described herein.

In example 4A, training unit shown in figure 15 specifically may include：

Delete subelement, for will train the 6th loss function in the multi-modal disaggregated model that subelement training obtains with And the 5th loss function in each single mode disaggregated model is deleted, and object-class model is obtained.

In example 4A, training subelement is specifically used for：

For each single mode disaggregated model, one is obtained from the corresponding modal data training set of single mode disaggregated model Training data is input in the feature coding device of the single mode disaggregated model, and according to the grader of the single mode disaggregated model The value of loss function is adjusted the parameter of feature coding device and grader in the single mode disaggregated model；And root According to the value of the 5th loss function of the single mode disaggregated model, to the parameter of the feature coding device of the single mode disaggregated model into Row adjustment；

The parameter of the feature coding device of each single mode disaggregated model is adjusted according to the value of the 6th loss function；

Example 5A

Example 5A is corresponding with the example 5 in embodiment one, and the structure of initial mode disaggregated model can be as shown in figure 12, specifically Content is referring to example 5 in embodiment one, and details are not described herein.

In example 5A, training unit shown in figure 15 specifically may include：

Subelement is deleted, for that will train in each single mode disaggregated model in the disaggregated model that subelement training obtains Arbiter is deleted, and object-class model is obtained；Alternatively, the 6th loss letter in the disaggregated model that training subelement training is obtained Arbiter in several and each single mode disaggregated model is deleted, to obtain object-class model.

In example 5A, training subelement is specifically used for：

According to the value of the 6th loss function, the parameter of the feature coding device of each single mode disaggregated model is adjusted；

In example 5A, the value of first-loss function of the training subelement based on the single mode disaggregated model and the second damage The value for losing function is adjusted the arbiter of the single mode disaggregated model and the parameter of feature coding device, and specific implementation can be joined See the mode B1 or mode B2 in example 2, details are not described herein.

Example 6A

Example 6A is corresponding with the example 6 in embodiment one, and the structure of initial mode disaggregated model can be such as Figure 13 or Figure 14 institutes Show, particular content is referring to example 6 in embodiment one, and details are not described herein.

In example 6A, training unit shown in figure 15 specifically may include：

Subelement is deleted, for the in each single mode disaggregated model will to be trained in the disaggregated model that subelement training obtains Five loss functions are deleted, and object-class model is obtained.

In example 6A, training subelement is specifically used for：

For each single mode disaggregated model, one is obtained from the corresponding modal data training set of single mode disaggregated model Training data is input in the feature coding device of the single mode disaggregated model, and based on the classification according to the single mode disaggregated model The value of the loss function of device is adjusted the parameter of feature coding device and grader in the single mode disaggregated model；And According to the value of the 5th loss function of the single mode disaggregated model to the feature coding device of the single mode disaggregated model Parameter is adjusted；

In the embodiment of the present invention, the structure of the preliminary classification model of aforementioned exemplary 1A and example 6A structures is relatively simple, instruction Practice speed, but the complementarity between multiple modalities data cannot be made full use of between each single mode disaggregated model to carry out Cooperative Study.Although the structure of the preliminary classification model of example 2A~example 5A structures is relative complex, it is different single mode Disaggregated model corresponds to the modal data training set of different modalities, by the different modalities data for being aligned multiple single mode disaggregated models Feature coding distribution, the training data of different modalities can be implicitly utilized between multiple single mode disaggregated models jointly, altogether Enjoy the characteristic information of the training data of different modalities so that multiple single mode disaggregated models can coorinated training, using more Modal data mutually improves the performance of each single mode disaggregated model.Aforementioned different example, can be combined with label training Data and confrontation constrained learning is carried out to feature coding device without label training data so that encoder can learn to there is label instruction Practice data and largely there is without label training data the feature representation of good uniformity, unlike the prior art is needed to same group of instruction Practice sample and carry out multiple forward calculation, so as to improve the training effectiveness of disaggregated model, in addition it can be directed to multi-modal number According to study is trained, use scope is wider；On this basis not only respectively but also have advantage and disadvantage, those skilled in the art can basis Any one preliminary classification model in aforementioned exemplary is chosen in actual demand.

Embodiment three

The embodiment of the present invention three also provides a kind of computer server, and as shown in figure 16, which includes depositing Reservoir, and the one or more processors with memory communication connection；

The instruction that can be executed by one or more of processors is stored in the memory, described instruction is by described one A or multiple processors execute, so that one or more of processors realize in previous embodiment one any one multi-modal point The training method of class model.

In the embodiment of the present invention three, which can be PC machine, notebook, tablet computer, FPGA (Field- Programmable Gate Array, i.e. field programmable gate array), the hardware devices such as industrial computer or smart mobile phone.

The basic principle that the present invention is described above in association with specific embodiment, however, it is desirable to, it is noted that general to this field For logical technical staff, it is to be understood that the whole either any steps or component of methods and apparatus of the present invention can be any Computing device (including processor, storage medium etc.) either in the network of computing device with hardware firmware, software or they Combination is realized that this is that those of ordinary skill in the art use the basic of them in the case where having read the explanation of the present invention What programming skill can be achieved with.

One of ordinary skill in the art will appreciate that all or part of step that realization above-described embodiment method carries is can To instruct relevant hardware to complete by program, the program can be stored in a kind of computer readable storage medium, The program includes the steps that one or a combination set of embodiment of the method when being executed.

In addition, each functional unit in each embodiment of the present invention can be integrated in a processing module, it can also That each unit physically exists alone, can also two or more units be integrated in a module.Above-mentioned integrated mould The form that hardware had both may be used in block is realized, can also be realized in the form of software function module.The integrated module is such as Fruit is realized in the form of software function module and when sold or used as an independent product, can also be stored in a computer In read/write memory medium.

It should be understood by those skilled in the art that, the embodiment of the present invention can be provided as method, system or computer program Product.Therefore, complete hardware embodiment, complete software embodiment or reality combining software and hardware aspects can be used in the present invention Apply the form of example.Moreover, the present invention can be used in one or more wherein include computer usable program code computer The shape for the computer program product implemented in usable storage medium (including but not limited to magnetic disk storage and optical memory etc.) Formula.

The present invention be with reference to according to the method for the embodiment of the present invention, the flow of equipment (system) and computer program product Figure and/or block diagram describe.It should be understood that can be realized by computer program instructions every first-class in flowchart and/or the block diagram The combination of flow and/or box in journey and/or box and flowchart and/or the block diagram.These computer programs can be provided Instruct the processor of all-purpose computer, special purpose computer, Embedded Processor or other programmable data processing devices to produce A raw machine so that the instruction executed by computer or the processor of other programmable data processing devices is generated for real The device for the function of being specified in present one flow of flow chart or one box of multiple flows and/or block diagram or multiple boxes.

These computer program instructions, which may also be stored in, can guide computer or other programmable data processing devices with spy Determine in the computer-readable memory that mode works so that instruction generation stored in the computer readable memory includes referring to Enable the manufacture of device, the command device realize in one flow of flow chart or multiple flows and/or one box of block diagram or The function of being specified in multiple boxes.

These computer program instructions also can be loaded onto a computer or other programmable data processing device so that count Series of operation steps are executed on calculation machine or other programmable devices with generate computer implemented processing in computer or The instruction executed on other programmable devices is provided for realizing in one flow of flow chart or multiple flows and/or block diagram one The step of function of being specified in a box or multiple boxes.

Although the above embodiment of the present invention has been described, created once a person skilled in the art knows basic Property concept, then additional changes and modifications can be made to these embodiments.So it includes upper that the following claims are intended to be interpreted as It states embodiment and falls into all change and modification of the scope of the invention.

Obviously, various changes and modifications can be made to the invention without departing from essence of the invention by those skilled in the art God and range.In this way, if these modifications and changes of the present invention belongs to the range of the claims in the present invention and its equivalent technologies Within, then the present invention is also intended to include these modifications and variations.

Claims

1. a kind of training method of disaggregated model, which is characterized in that including：

Preliminary classification model is built, the preliminary classification model includes the identical at least one single mode classification mould of classification task Type, wherein the corresponding modal data training set of each single mode disaggregated model includes label training data and without label training number According to；

There is label training data and without label training based on being aligned in the modal data training set of each single mode disaggregated model The method of the feature coding distribution of data, initially divides using described in the corresponding mode training data set pair of each single mode disaggregated model Class model is trained to obtain object-class model.

2. according to the method described in claim 1, it is characterized in that, each single mode disaggregated model include feature coding device and Respectively with the cascade grader of the feature coding device and arbiter, the arbiter is for judging the feature coding device output Feature coding from having label training data or without label training data, the output end of the arbiter be provided with for pair The first-loss function that the arbiter is trained and the second loss function being trained for the feature coding device, institute State first-loss function and the confrontation setting of the second loss function；

Preliminary classification model described in the corresponding mode training data set pair of each single mode disaggregated model is used to be trained to obtain Object-class model specifically includes：

Training is iterated using preliminary classification model described in the corresponding mode training data set pair of each single mode disaggregated model；

Arbiter in the disaggregated model that training is obtained in each single mode disaggregated model is deleted.

3. according to the method described in claim 2, it is characterized in that, training number using the corresponding mode of each single mode disaggregated model It is iterated training according to preliminary classification model described in set pair, is specifically included：

For each single mode disaggregated model, a training is obtained from the modal data training set of the single mode disaggregated model Data are input in the feature coding device of the single mode disaggregated model, and according to the grader of the single mode disaggregated model The value of loss function is adjusted the parameter of feature coding device and grader in the single mode disaggregated model；And The value of first-loss function based on the single mode disaggregated model and the value of the second loss function classify to the single mode The arbiter of model and the parameter of feature coding device are adjusted；

4. according to the method described in claim 2, it is characterized in that, the single mode disaggregated model that the preliminary classification model includes Feature coding device for two or more, the multiple single mode disaggregated model is also respectively connected to the same cross-module state arbiter On, the cross-module state arbiter is used to differentiate the corresponding mould of feature coding of the feature coding device output of each single mode disaggregated model State type, the output end of cross-module state arbiter be provided with third loss function for being trained to cross-module state arbiter and The 4th loss function for being trained to the feature coding device in each single mode disaggregated model, the third loss function and The confrontation setting of 4th loss function；

The method further includes：Cross-module state arbiter in the classification mode that training is obtained is deleted.

5. according to the method described in claim 4, it is characterized in that, training number using the corresponding mode of each single mode disaggregated model It is iterated training according to preliminary classification model described in set pair, is specifically included：

For each single mode disaggregated model, a training is obtained from the corresponding modal data training set of single mode disaggregated model Data are input in the feature coding device of the single mode disaggregated model, and the loss of the grader according to the single mode disaggregated model The value of function is adjusted the parameter of feature coding device and grader in the single mode disaggregated model；And it is based on institute The value of the first-loss function of single mode disaggregated model and the value of the second loss function are stated to the single mode disaggregated model The parameter of arbiter and feature coding device is adjusted；

The value of value and the 4th loss function based on third loss function is to cross-module state arbiter and each single mode classification mould The parameter of the feature coding device of type is adjusted；

6. according to the method described in claim 5, it is characterized in that, the value based on third loss function and the 4th loss function Value the parameter of the feature coding device of cross-module state arbiter and each single mode disaggregated model is adjusted, specifically include：

After being differentiated to the feature coding of the feature coding device output of each single mode disaggregated model according to cross-module state arbiter The value of third loss function adjusts the parameter of the cross-module state arbiter；

Based on the cross-module state arbiter after parameter adjustment to the feature coding of the feature coding device output of each single mode disaggregated model The value for carrying out the 4th loss function after major punishments not other, adjusts the parameter of the feature coding device of each single mode disaggregated model.

7. the method according to claim 3 or 5, which is characterized in that the first-loss based on the single mode disaggregated model The value of function and the value of the second loss function to the parameter of the arbiter of the single mode disaggregated model and feature coding device into Row adjustment, specifically includes：

After the feature coding exported to the feature coding device according to the arbiter of the single mode disaggregated model differentiates The value of first-loss function adjusts the parameter of the arbiter；

The feature coding exported to the feature coding device based on the arbiter after parameter adjustment carries out the after major punishments not other The value of two loss functions adjusts the parameter of the feature coding device.

8. a kind of training device of disaggregated model, which is characterized in that including：

Model construction unit, for building preliminary classification model, the preliminary classification model includes that classification task is identical at least One single mode disaggregated model, wherein the corresponding modal data training set of each single mode disaggregated model includes label training number According to no label training data；

Training unit, for having label training data in the modal data training set based on each single mode disaggregated model of alignment The method being distributed with the feature coding of no label training data, using the corresponding mode training dataset of each single mode disaggregated model The preliminary classification model is trained to obtain object-class model.

9. device according to claim 8, which is characterized in that each single mode disaggregated model include feature coding device and Respectively with the cascade grader of the feature coding device and arbiter, the arbiter is for judging the feature coding device output Feature coding from having label training data or without label training data, the output end of the arbiter be provided with for pair The first-loss function that the arbiter is trained and the second loss function being trained for the feature coding device, institute State first-loss function and the confrontation setting of the second loss function；

The training unit, specifically includes：

Training subelement, for using preliminary classification model described in the corresponding mode training data set pair of each single mode disaggregated model It is iterated training, triggering deletion subelement after training；

Subelement is deleted, each single mode disaggregated model in the disaggregated model for training to be obtained to the trained subelement training In arbiter delete.

10. device according to claim 9, which is characterized in that the trained subelement is specifically used for：

11. device according to claim 9, which is characterized in that the single mode classification mould that the preliminary classification model includes Type is two or more, and the feature coding device of the multiple single mode disaggregated model is also respectively connected to the same cross-module state arbiter On, the cross-module state arbiter is used to differentiate the corresponding mould of feature coding of the feature coding device output of each single mode disaggregated model State type, the output end of cross-module state arbiter be provided with third loss function for being trained to cross-module state arbiter and The 4th loss function for being trained to the feature coding device in each single mode disaggregated model, the third loss function and The confrontation setting of 4th loss function；

The deletion subelement is further used for：Cross-module state arbiter in the classification mode that training is obtained is deleted.

12. according to the devices described in claim 11, which is characterized in that the trained subelement is specifically used for：

13. device according to claim 12, which is characterized in that trained subelement the taking based on third loss function Value and the value of the 4th loss function carry out the parameter of the feature coding device of cross-module state arbiter and each single mode disaggregated model Adjustment, specifically includes：

After being differentiated to the feature coding of the feature coding device output of each single mode disaggregated model according to cross-module state arbiter The value of first-loss function adjusts the parameter of the cross-module state arbiter；

Based on the cross-module state arbiter after parameter adjustment to the feature coding of the feature coding device output of each single mode disaggregated model The value for carrying out the second loss function after major punishments not other, adjusts the parameter of the feature coding device of each single mode disaggregated model.

14. the device according to claim 10 or 12, which is characterized in that the trained subelement is based on the single mode point The arbiter and spy of the value of the first-loss function of class model and the value of the second loss function to the single mode disaggregated model The parameter of sign encoder is adjusted, and is specifically included：

15. a kind of computer server, which is characterized in that including memory, and one with memory communication connection Or multiple processors；

Be stored with the instruction that can be executed by one or more of processors in the memory, described instruction by one or Multiple processors execute, so that one or more of processors are realized multi-modal point as described in any one of claim 1~7 The training method of class model.