CN109840530A - Method and apparatus for training a multi-label classification model - Google Patents

Method and apparatus for training a multi-label classification model

Info

Publication number
CN109840530A
Authority
CN
China
Prior art keywords
matrix
sample
label
network
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711187395.0A
Other languages
Chinese (zh)
Inventor
刘晓阳
胡晓林
王月红
曹忆南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Huawei Technologies Co Ltd
Original Assignee
Tsinghua University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Huawei Technologies Co Ltd filed Critical Tsinghua University
Priority to CN201711187395.0A priority Critical patent/CN109840530A/en
Priority to PCT/CN2018/094400 priority patent/WO2019100724A1/en
Publication of CN109840530A publication Critical patent/CN109840530A/en


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application provides a method and apparatus for training a multi-label classification model that can learn image features dynamically, so that the feature extraction network better adapts to the task requirements and achieves good multi-label classification performance. The method comprises: determining, in a training data set, n samples and a label matrix Y_{c×n} corresponding to the n samples, where element y_{i,j} of Y_{c×n} indicates whether the i-th sample contains the object indicated by the j-th label, and c is the number of labels associated with the samples; extracting the feature matrix X_{d×n} of the n samples using a feature extraction network with weight parameters Z; obtaining the prediction label matrix Ŷ of X_{d×n} using a feature mapping network, where element ŷ_{i,j} of Ŷ indicates the confidence that the i-th sample contains the object indicated by the j-th label; and updating the weight parameters Z and the feature mapping matrix M_{c×d} according to the label matrix Y_{c×n} and the prediction label matrix Ŷ, thereby training the multi-label classification model.

Description

Method and apparatus for training a multi-label classification model
Technical field
This application relates to the field of computers, and more particularly to a method and apparatus for training a multi-label classification model in the computer field.
Background art
As the processing performance of smartphones improves, more and more applications place demands on image recognition. For example, if a smartphone can accurately identify the objects within the camera's field of view while taking a photograph, it can apply targeted operations to their color and shape to improve the shooting result. In the machine learning of intelligent systems, training models to recognize objects in images has therefore become a very important topic. Typically, machine learning sets labels for the objects contained in a large number of images, and then continuously adjusts recognition parameters through the self-evolution of the computer, gradually improving the recognition accuracy for objects.
Due to the complexity and ambiguity of real-world objects themselves, many objects in real life may be related to multiple class labels at the same time. To better capture the multiple semantics of a practical object, an appropriate label subset (containing multiple relevant semantic labels) is often used to describe the object, which gives rise to the so-called multi-label classification problem. Each sample then corresponds to a label subset composed of multiple labels, and the goal of learning is to predict the corresponding label subset for an unknown sample.
In practical multi-label classification, a series of training data is given first; the set formed by this series of training data may be called the training data set. However, because the labels in the training data set are annotated by different people, or because some objects are overlooked during labeling, labels may be missing, so the accuracy of multi-label classification can be improved by completing the labels in the training data set. There are many methods for completing known labels in multi-label classification. One of them constrains the rank of the prediction label matrix through the nuclear norm and computes the feature mapping matrix by minimizing the multi-label classification loss function, obtaining a low-rank prediction label matrix, achieving label completion, and thereby improving multi-label classification performance. However, this method needs to first extract the features of the images and then compute the feature mapping matrix from those features. Once the image features have been extracted, they are fixed, so the method cannot dynamically learn the feature information of the input images according to the labels.
Summary of the invention
The present application provides a method and apparatus for training a multi-label classification model that can learn image features dynamically, so that the feature extraction network better adapts to the task requirements and achieves good multi-label classification performance.
In a first aspect, a method of training a multi-label classification model is provided, comprising:
Determining, in a training data set, n samples and a label matrix Y_{c×n} corresponding to the n samples, where element y_{i,j} of the label matrix Y_{c×n} indicates whether the i-th sample contains the object indicated by the j-th label, and c is the number of labels associated with the samples in the training data set.
Extracting the feature matrix X_{d×n} of the n samples using a feature extraction network, where the feature extraction network has weight parameters Z and d denotes the feature dimension of X_{d×n}.
Here, the feature extraction network may be any neural network capable of extracting image features, such as a convolutional neural network or a multi-layer perceptron; the embodiments of this application do not limit this. The weights of the feature extraction network may be denoted Z; specifically, Z may comprise multiple weight matrices. The parameters of the weight matrices may be generated by random initialization, or pre-trained model parameters may be used. Here, pre-trained model parameters are the parameters of an already-trained model, such as the parameters of a VGG16 network trained on the ImageNet data set.
Obtaining the prediction label matrix Ŷ of the feature matrix X_{d×n} using a feature mapping network, where element ŷ_{i,j} of Ŷ indicates the confidence that the i-th sample contains the object indicated by the j-th label. The weight matrix of the feature mapping network is the low-rank feature mapping matrix M_{c×d}, which may represent the association weights between feature attributes and class labels in the multi-label classification model; its initial value may be generated at random. As an example, the feature mapping network may be a mapping network whose weight matrix is the low-rank feature mapping matrix M_{c×d}, such as a fully connected layer.
Specifically, the feature mapping network may be denoted FC_M. The feature matrix X_{d×n} output by the feature extraction network may be input to FC_M, which maps it to the prediction label space to obtain the prediction label matrix Ŷ, that is:

Ŷ = M_{c×d} · X_{d×n}    (1)
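As a minimal numerical sketch of this mapping step (the dimensions d, c, n and the random values below are illustrative, not taken from the patent), obtaining the prediction label matrix from the feature matrix is a single matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)
d, c, n = 8, 4, 5                  # feature dimension, number of labels, batch size

X = rng.standard_normal((d, n))    # feature matrix X_{d x n} from the extraction network
M = rng.standard_normal((c, d))    # feature mapping matrix M_{c x d} (FC-layer weights)

Y_hat = M @ X                      # prediction label matrix, one confidence per (label, sample)
```

Each column of `Y_hat` is the predicted label vector of one sample, so `Y_hat` has shape (c, n).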
Updating the weight parameters Z and the feature mapping matrix M_{c×d} according to the label matrix Y_{c×n} and the prediction label matrix Ŷ, thereby training the multi-label classification model.
Here, n, c, i, j and d are positive integers; i ranges from 1 to n and j ranges from 1 to c.
Therefore, the neural network system provided by the embodiments of this application can train the model directly from the input data, without additional intermediate steps; that is, the neural network system is an end-to-end neural system. The advantage of end-to-end training is that the feature extraction, the feature mapping matrix and the low-rank label correlation matrix can be optimized simultaneously. In other words, the embodiments of this application can learn image features dynamically, so that the feature extraction network better adapts to the task requirements and achieves good multi-label classification performance.
Optionally, the low-rank feature mapping network includes a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network and the second sub-mapping network have the following relationship:

M_{c×d} = H_{c×r} · W_{r×d}    (2)
where the weight matrix of the first sub-mapping network is W_{r×d} and the weight matrix of the second sub-mapping network is H_{c×r}. Here, to guarantee the low rank of M_{c×d} and H_{c×r}, r may be set to a positive integer with r ≤ min(d, c).
In a specific embodiment, the first sub-mapping network may be a fully connected layer with weight matrix W_{r×d}, and the second sub-mapping network may be a fully connected layer with weight matrix H_{c×r}; the initial values of W_{r×d} and H_{c×r} may be generated at random.
In other words, in the embodiments of this application the label matrix can be completed by means of low-rank matrix decomposition, i.e. by decomposing the prediction label matrix Ŷ as:

Ŷ = H_{c×r} · W_{r×d} · X_{d×n}
Here, setting r ≤ min(d, c) makes W_{r×d} and H_{c×r} low-rank. Since the rank of the product of two matrices is no greater than the rank of either factor, this makes H_{c×r}W_{r×d} (i.e. M_{c×d}) low-rank, and in turn makes M_{c×d}X_{d×n}, i.e. Ŷ, low-rank. An optimal value of r can be found through repeated training.
That is, the embodiments of this application can map X (i.e. X_{d×n}) through the preset feature mapping matrix M (i.e. M_{c×d}) to obtain the prediction label matrix Ŷ, i.e. Ŷ = MX. Because the rank of Ŷ is at most the rank of M or of X, performing a low-rank decomposition of M guarantees that Ŷ is low-rank whenever M is. Therefore M can itself be decomposed, and formula (2) above is equivalent to decomposing M into the product of two low-dimensional matrices, thereby guaranteeing that Ŷ is low-rank.
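The rank argument above can be checked numerically. In this sketch (dimensions and random values are illustrative), the factored mapping matrix M = H·W can never exceed rank r, because the rank of a product is bounded by the rank of each factor:

```python
import numpy as np

rng = np.random.default_rng(1)
d, c, r = 10, 6, 3                 # r <= min(d, c) enforces the low-rank constraint

W = rng.standard_normal((r, d))    # weights of the first sub-network (FC_W)
H = rng.standard_normal((c, r))    # weights of the second sub-network (FC_H)

M = H @ W                          # factored feature mapping matrix M_{c x d}
# rank(M) <= min(rank(H), rank(W)) <= r, so M is guaranteed low-rank
rank_M = np.linalg.matrix_rank(M)
```

Any prediction Ŷ = M·X then inherits the same rank bound, which is exactly what the label-completion scheme relies on.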
Optionally, updating the weight parameters Z and the feature mapping matrix M_{c×d} according to the label matrix Y_{c×n} and the prediction label matrix Ŷ comprises:
Determining the Euclidean distance loss function between the prediction label matrix Ŷ and the label matrix Y_{c×n}; the expression of the loss function is the following formula (3):

ε_n = ||Ŷ − Y_{c×n}||_F²    (3)
Alternatively, the expression of the loss function is the following formula (4), a normalized variant of formula (3):

ε_n = (1/n) ||Ŷ − Y_{c×n}||_F²    (4)
Then, the weight parameters Z and the weight matrices W_{r×d} and H_{c×r} are updated according to the Euclidean distance loss function.
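A minimal sketch of the Euclidean (squared-Frobenius) distance between a predicted and a given label matrix, using tiny hand-picked matrices rather than any values from the patent:

```python
import numpy as np

def multilabel_loss(Y_hat, Y):
    """Squared Frobenius-norm distance between predicted and given label matrices."""
    return float(np.sum((Y_hat - Y) ** 2))

Y     = np.array([[1.0, 0.0],
                  [0.0, 1.0]])       # ground-truth labels for 2 classes x 2 samples
Y_hat = np.array([[0.9, 0.1],
                  [0.2, 0.8]])       # predicted confidences
# element-wise squared errors: 0.01 + 0.01 + 0.04 + 0.04 = 0.10
loss = multilabel_loss(Y_hat, Y)
```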
Optionally, updating the weight parameters Z and the weight matrices W_{r×d} and H_{c×r} according to the Euclidean distance loss function comprises:
Determining the sum of the Euclidean distance loss function and a regularization term as the optimization function L_n of the n samples, where the regularization term constrains the weight parameters Z and the weight matrices W_{r×d} and H_{c×r}; the expression of L_n is shown in formula (7) or formula (8):

L_n = ε_n + λ(||Z||² + ||W_{r×d}||_F² + ||H_{c×r}||_F²)    (7)/(8)

where ε_n takes the form of formula (3) or formula (4) respectively, and λ > 0 is a regularization coefficient.
The first term of the optimization function L_n is the loss function ε_n described above, and the second term is the regularization term, which constrains the weight parameters Z and the weight matrices W_{r×d} and H_{c×r} to prevent overfitting.
Here, the loss function L_n can be minimized using the error backpropagation algorithm; the weight parameters Z, weight matrix W_{r×d} and weight matrix H_{c×r} corresponding to the minimum value of the optimization function are taken as the updated Z, W_{r×d} and H_{c×r}.
Next, it is determined whether a stop condition has been reached.
Here, the stop condition is: L_n no longer decreases, or its decrease is smaller than a preset threshold, or the maximum number of training iterations has been reached. If the stop condition has not been reached, training is repeated until it is. In the embodiments of this application, inputting all pictures once counts as one round (epoch) of training, and several rounds of training are usually needed.
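The update-until-stop procedure can be sketched with plain gradient descent on the regularized Frobenius loss. This is a simplified illustration under stated assumptions: the feature matrix X is held fixed (the real method also backpropagates into the extraction network weights Z), and all dimensions, the learning rate and λ are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
d, c, r, n = 6, 4, 2, 20
X = rng.standard_normal((d, n))                 # fixed features for this sketch
Y = (rng.random((c, n)) > 0.5).astype(float)    # binary label matrix Y_{c x n}

W = 0.1 * rng.standard_normal((r, d))           # FC_W weights, random init
H = 0.1 * rng.standard_normal((c, r))           # FC_H weights, random init
lam, lr, tol, max_epochs = 1e-3, 1e-3, 1e-6, 500

E0 = H @ W @ X - Y                              # initial error, for comparison
L0 = float(np.sum(E0**2) + lam * (np.sum(W**2) + np.sum(H**2)))

prev = np.inf
for epoch in range(max_epochs):
    E = H @ W @ X - Y
    L = float(np.sum(E**2) + lam * (np.sum(W**2) + np.sum(H**2)))
    if prev - L < tol:                          # loss no longer decreasing: stop
        break
    prev = L
    grad_H = 2 * E @ (W @ X).T + 2 * lam * H    # dL/dH
    grad_W = 2 * H.T @ E @ X.T + 2 * lam * W    # dL/dW
    H -= lr * grad_H
    W -= lr * grad_W
```

The loop terminates either when the per-epoch improvement falls below `tol` or when `max_epochs` is exhausted, mirroring the stop condition above.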
Optionally, determining the n samples and the label matrix Y_{c×n} of the n samples in the training data set comprises:
Determining a training data set, the training data set including D samples and a label vector for each of the D samples, where element y_j in the label vector of each sample indicates whether that sample contains the object indicated by the j-th label, and D is a positive integer not smaller than n;
Randomly selecting n samples from the training data set, and generating the label matrix Y_{c×n} of the n samples, the label matrix Y_{c×n} including the label vector corresponding to each of the n samples.
Therefore, in the embodiments of this application it is not necessary to input the entire training data set at once for computation; pictures only need to be input and processed in batches, so the entire data set can be input batch by batch for training. That is to say, the model can be trained by inputting partial data from the data set over multiple batches, where the data input each time can be randomly selected from the picture samples in the data set that have not yet been input. Since a training data set generally contains a large number of samples, inputting it in batches reduces the resources occupied during model training and greatly reduces the demand on memory resources, which can effectively solve the computational problem of the low-rank label correlation matrix under large-scale data.
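The batching scheme described above can be sketched as sampling without replacement: every batch draws only indices that have not yet been used in the current epoch. The helper name and sizes below are illustrative, not from the patent:

```python
import numpy as np

def minibatches(D, n, seed=0):
    """Yield random index batches of size n that together cover all D samples once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(D)          # random order over the whole data set
    for start in range(0, D, n):
        yield order[start:start + n]    # next n not-yet-used sample indices

batches = [b.tolist() for b in minibatches(D=10, n=4)]
# 10 samples in batches of 4 -> batch sizes 4, 4, 2; each index appears exactly once
```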
Optionally, the method further includes: extracting a first feature matrix of a first sample using the feature extraction network, where the first sample does not belong to the n samples;
Obtaining a first prediction label matrix of the first feature matrix using the first mapping network, where an element in the first prediction label matrix indicates the confidence that the first sample contains the object indicated by the j-th label.
Specifically, in the test phase, a test picture only needs to be input to the feature extraction network in the trained neural network model; the feature extraction network extracts the first feature matrix of the test picture, the first feature matrix is input to the feature mapping network (which may specifically include FC_W and FC_H), and the feature mapping network obtains and outputs the prediction label matrix of the first feature matrix, where an element in the prediction label matrix indicates the confidence that the test picture contains the object indicated by the j-th label. Here, the test picture may be one or more pictures, and may not belong to the training data set.
In a second aspect, a device for training a multi-label classification model is provided, the device being configured to execute the method in any possible implementation of the first aspect. Specifically, the device may include modules for executing the method in any possible implementation of the first aspect.
In a third aspect, a device for training a multi-label classification model is provided, the device including a memory and a processor, the memory being configured to store instructions and the processor being configured to execute the instructions stored in the memory, such execution causing the processor to perform the method in any possible implementation of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the method in any possible implementation of the first aspect.
In a fifth aspect, a computer program product comprising instructions is provided; when the computer program product runs on a computer, it causes the computer to execute the method in any possible implementation of the first aspect.
Brief description of the drawings
Fig. 1 shows a schematic diagram of single-label and multi-label classification problems.
Fig. 2 shows a schematic flowchart of a method of training a multi-label classification model provided by an embodiment of this application.
Fig. 3 shows a schematic diagram of a multi-label classification model provided by an embodiment of this application.
Fig. 4 shows a schematic diagram of a multi-label classification model provided by an embodiment of this application.
Fig. 5 shows a schematic block diagram of a device for training a multi-label classification model provided by an embodiment of this application.
Fig. 6 shows a schematic block diagram of another device for training a multi-label classification model provided by an embodiment of this application.
Specific embodiment
The technical solutions in this application are described below with reference to the accompanying drawings.
Fig. 1 shows the schematic diagram of single labeling and multi-tag classification problem.As shown in figure 1 shown in (a), single labeling Often assume that sample corresponds only to a class label, that is, has unique semantic meaning.Then this hypothesis is in many reality In the case of may and it is invalid, particularly contemplate the existing semantic diversity of objective objects itself, object be likely to simultaneously with Multiple and different class labels is related.Therefore in multi-tag problem, as shown in figure 1 shown in (b), multiple relevant classifications are often used Label describes semantic information corresponding to each object, for example, each image may correspond to multiple semantic labels simultaneously, such as " meadow ", " sky " and " sea ", per song segment may contain there are many mood, such as " pleasure " and " light ".
In the multi-label classification problem, a series of training data is given first; the set formed by this series of training data may be called the training data set. By learning the given training data, the corresponding label subset can be predicted for an unknown sample. Here, the training data set may correspond to a label set, which may contain c different class labels related to the training data, c being a positive integer. The training data set may include D samples and the label subset corresponding to each sample, D being a positive integer. It can be understood that the label subset here is a subset of the label set. That is, by learning the multiple samples in the given training data set and the label subset corresponding to each sample, the label subset of an unknown sample can be predicted.
In the embodiments of this application, a label subset can be expressed as a label vector. In other words, the label vector of a sample indicates which labels the sample has, or which classes it belongs to. For example, if the label vector of an image is [0 1 0 0 1 0], there are six classes in total, each element of the label vector represents one class or one label, 0 indicates that the image does not contain that class or label, and 1 indicates that it does. Since this label vector has two 1 entries, there are two kinds of objects in the image, belonging to the second and fifth classes respectively. In this way, each of the D samples in the training data set can correspond to a label vector whose element y_j indicates whether the sample contains the object indicated by the j-th label, where j ranges from 1 to c. It should be understood that in the embodiments of this application, whether a sample contains the object indicated by the j-th label is equivalent to whether the sample contains the j-th label.
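The label-vector encoding above can be sketched directly; the helper name is invented for illustration, and the example reproduces the [0 1 0 0 1 0] vector from the text (labels 2 and 5 of 6 classes, i.e. zero-based indices 1 and 4):

```python
def label_vector(sample_labels, c):
    """Binary indicator vector of length c: 1 at position j if label j applies."""
    v = [0] * c
    for j in sample_labels:
        v[j] = 1
    return v

vec = label_vector({1, 4}, c=6)   # an image with the 2nd and 5th labels
```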
In this way, the label vectors of all or some of the samples in the training data set form a label matrix Y, each column of which is the label vector of one sample:

Y = [y^(1), y^(2), …, y^(n)]
In addition, a prediction label vector is the output of the multi-label classifier; its dimension is the same as that of the label vector, and it represents the classifier's prediction of the classes to which the image belongs. The elements of the prediction label vector take real values: if a value exceeds a given threshold, the corresponding position belongs to the respective class, otherwise it does not. For example, for the prediction label vector [0.7 0.2 0.1 0.8 1.0 0.0] and threshold 0.5, the number at each position is compared with the threshold, and a value greater than the threshold means the sample belongs to that class. The predicted classes are thus the first, fourth and fifth classes. If the label vector corresponding to this prediction label vector is [1 0 0 1 1 0], the prediction label vector is completely correct.
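The thresholding step can be sketched as follows, using the confidences and threshold from the example above (the helper name is invented for illustration):

```python
def predict_labels(scores, threshold=0.5):
    """Map real-valued confidences to a binary label vector by thresholding."""
    return [1 if s > threshold else 0 for s in scores]

pred = predict_labels([0.7, 0.2, 0.1, 0.8, 1.0, 0.0])
# positions 1, 4 and 5 exceed 0.5 -> the 1st, 4th and 5th classes are predicted
```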
In practical problems, especially when a large number of class labels is involved, it is often extremely difficult to provide complete label information for every sample in the data. Therefore, the label information corresponding to the samples in the training data set is likely to be incomplete. That is, in the label matrix of the data, the absence of a certain label for a sample does not mean that the sample is actually unrelated to that label. It is therefore necessary to complete the label matrix from the data existing in the training data set, so as to obtain a prediction label matrix containing richer label information; the label information of unknown samples can then be predicted more accurately from this richer prediction label matrix.
When completing the label matrix, the prior art needs to first extract the features of the images and then compute the low-rank feature mapping matrix from those features. Once the image features have been extracted, they are fixed, so the feature information of the input images cannot be learned dynamically according to the labels. Based on this, the embodiments of this application design a neural network for multi-label classification that realizes a multi-label classification algorithm by learning the feature mapping matrix and optimizing the feature extraction network.
A neural network system is an intelligent recognition system that accumulates training results through repeated training, so as to improve its ability to recognize various target objects or sounds. Convolutional neural networks are one of the main directions of neural network development. A convolutional neural network generally comprises convolutional layers, rectified linear unit (ReLU) layers, pooling layers and fully connected (FC) layers, where the convolutional, ReLU and pooling layers may be alternated and repeated several times.
The convolutional layer can be regarded as the core of the convolutional neural network. When used for image recognition, its input receives image data, which it identifies through filters. The image data here may be the conversion result of an image captured by a camera, or the output of the layer preceding the convolutional layer. Image data is usually a three-dimensional matrix, e.g. 32×32×3, where 32×32 is the two-dimensional size (width and height) of the image and the depth is 3 because an image is usually divided into green, red and blue data channels. A convolutional layer contains multiple filters; different filters correspond to different image features (boundaries, colors, shapes, etc.) and scan the input image data with a certain stride. Different filters have different weight matrices, which the neural network generates for specific image features during learning. At each step a filter scans one region of the image, obtaining a three-dimensional input matrix (M×N×3, where M and N are determined by the size of the scanning region); the convolutional network takes the dot product of the input matrix and the weight matrix to obtain one result value, and then scans the next region with the given stride, for example moving two cells at a time. After one filter has scanned all regions with the given stride, its result values form a two-dimensional matrix; after all filters have finished scanning, the result values form a three-dimensional matrix, which is the output of the current convolutional layer. Each depth slice of this three-dimensional matrix corresponds to the scanning result of one filter (i.e. the two-dimensional matrix formed after that filter's scan).
The output of the convolutional layer can then be sent to a ReLU layer for processing (limiting the numerical range of the output through the max(0, x) function), and to a pooling layer, which reduces its size by down-sampling. Before reaching the FC layer, the image data may pass through multiple convolutional layers so that image features are recognized at deeper levels (for example, the first convolutional layer recognizes only the contour features of the image, the second convolutional layer begins to recognize shapes, and so on), and finally enters the FC layer. The FC layer is similar to but slightly different from the convolutional layer: it also applies weights to the input data through multiple filters, but unlike a convolutional filter, each FC filter does not scan different regions step by step; instead it scans all regions of the input image data at once and computes one result value with its weight matrix. The final FC layer outputs a 1×1×N matrix, which is in fact a data sequence in which each element corresponds to a different target object, and its value can be regarded as a score for the presence of that target object. Weight matrices are used in both the convolutional and FC layers, and the neural network maintains multiple weight matrices through self-training.
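The sliding dot-product scan described above can be sketched as a naive single-channel, single-filter convolution; the image values, the toy left-right difference filter and the stride of 2 below are all invented for illustration:

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Naive valid convolution of one channel with one filter: a sliding dot product."""
    H, W = image.shape
    k = kernel.shape[0]
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride + k, j*stride:j*stride + k]
            out[i, j] = np.sum(patch * kernel)   # dot product of region and weights
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
diff = np.array([[1.0, -1.0],
                 [1.0, -1.0]])                   # toy filter: left-right intensity change
fmap = conv2d_single(img, diff, stride=2)        # stride 2, i.e. "moving two cells"
```

A real convolutional layer stacks one such two-dimensional result per filter into the three-dimensional output described in the text.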
The method of training a multi-label classification model of the embodiments of this application is described in detail below with reference to Fig. 2 and Fig. 3.
Fig. 2 shows a schematic flowchart of a method of training a multi-label classification model provided by an embodiment of this application. It should be understood that Fig. 2 shows steps or operations of the method, but these steps or operations are only examples; the embodiments of this application may also perform other operations, or variations of the operations in Fig. 2. Moreover, the steps in Fig. 2 may be executed in an order different from that presented in Fig. 2, and it may not be necessary to execute all the operations in Fig. 2.
Fig. 3 shows a schematic diagram of a multi-label classification model 300 provided by an embodiment of this application. The multi-label classification model 300 is specifically a neural network system. The multi-label classification model 300 comprises a feature extraction network 301, a feature mapping network 302 and a processing unit 305, where the feature mapping network 302 may include FC_W 303 and FC_H 304. It should be understood that the multi-label classification model 300 shown in Fig. 3 is only an example; the embodiments of this application may also include other modules or units, or variations of the modules or units in Fig. 3.
It should be noted that the multi-label classification method in the embodiments of this application can be applied to many fields such as image annotation, image recognition, sound recognition and text classification; accordingly, the samples in the training data set may be images, sounds, documents, etc., and the embodiments of this application do not limit this. For convenience, the description below uses image samples for image recognition as an example, but this does not limit the solutions of the embodiments of this application.
210: Initialize the weights of the multi-label classification model 300.
Initializing the weights of the multi-label classification model 300 means initializing the weights of the feature extraction network 301 and the feature mapping network 302 (i.e. FC_W 303 and FC_H 304) in the system.
Here, the feature extraction network 301 may be any neural network capable of extracting image features, such as a convolutional neural network or a multi-layer perceptron; the embodiments of this application do not limit this. The weights of the feature extraction network 301 may be denoted Z; specifically, Z may comprise multiple weight matrices. The parameters of the weight matrices may be generated by random initialization, or pre-trained model parameters may be used. Here, pre-trained model parameters are the parameters of an already-trained model, such as the parameters of a VGG16 network trained on the ImageNet data set.
In addition, the feature mapping network 302 may be a mapping network whose weight matrix is the low-rank feature mapping matrix M_{c×d}, such as a fully connected layer, where M_{c×d} may represent the association weights between feature attributes and class labels in the multi-label classification model and its initial value may be generated at random. In a specific embodiment, the feature mapping network 302 may include FC_W 303 and FC_H 304, where FC_W 303 denotes a fully connected layer with weight matrix W_{r×d} and FC_H 304 denotes a fully connected layer with weight matrix H_{c×r}; the initial values of W_{r×d} and H_{c×r} may be generated at random. Here, to guarantee the low rank of M_{c×d} and H_{c×r}, r ≤ min(d, c) may be set.
220: Input n pictures.
Owing to the characteristics of neural networks, it is not necessary to input the entire training data set at once for computation; pictures only need to be input and processed in batches, so the embodiments of this application can input the entire data set batch by batch for training. That is to say, the model can be trained by inputting partial data from the data set over multiple batches, where the data input each time can be randomly selected from the picture samples in the data set that have not yet been input. Since a training data set generally contains a large number of samples, inputting it in batches reduces the resources occupied during model training.
At this moment, the number that a batch is input to the sample of multi-tag disaggregated model 300 can be n.When sample is figure When piece, which can be expressed as image_n, and more specifically, and image_n can be the D sample from training dataset The n picture randomly selected in this, also, the value of n can be much smaller than D.Specifically, the size of n can be according to more marks The ability for signing disaggregated model 300 determines.For example, n can if the data-handling capacity of the multi-tag disaggregated model 300 is stronger With the bigger of setting, to shorten the time of training pattern.In another example if the data processing of the multi-tag disaggregated model 300 Ability is weaker, then n can be set smaller, to reduce resource consumed by training pattern.In this way, the embodiment of the present application can The value of n is neatly set according to the data-handling capacity of multi-tag disaggregated model 300.
Further, the label matrix corresponding to the n samples may be denoted Y_{c×n}, where element y_{ij} of label matrix Y_{c×n} indicates whether the i-th sample includes the object indicated by the j-th label; here the value of i ranges from 1 to n, and the value of j ranges from 1 to c. For a detailed description of the label matrix, refer to the foregoing description; to avoid repetition, details are not described again here.
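As a concrete illustration of such a label matrix with missing entries (the label names and values below are illustrative, not taken from this application), the rows index the c labels and the columns index the n samples, with missing observations marked NaN:

```python
import numpy as np

# Hypothetical example: c = 3 labels, n = 4 samples.
# Y[j, i] = 1 -> sample i contains the object of label j,
# Y[j, i] = 0 -> it does not, np.nan -> the label is missing.
Y = np.array([
    [1.0, 0.0, np.nan, 1.0],     # label 0 (e.g. "sheep")
    [0.0, 1.0, 1.0,    np.nan],  # label 1 (e.g. "grass")
    [0.0, 0.0, 1.0,    0.0],     # label 2 (e.g. "sky")
])

# Omega: the set of observed positions, used later by the projection P_Omega.
observed = ~np.isnan(Y)
print(Y.shape)         # (3, 4)
print(observed.sum())  # 10 observed entries out of 12
```

The boolean mask `observed` is one convenient representation of the observed-entry set Ω used by the projection operator later in the description.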
In this embodiment of this application, the training data may be input to multi-label classification model 300 shown in Fig. 3. Specifically, the n pictures in the training data set and the label matrix Y_{c×n} of the n pictures may be separately input to multi-label classification model 300.
230: Extract features of the pictures.
Specifically, the n pictures may be input to feature extraction network 301. Through the effect of convolutional layers, activation-function layers, pooling layers, fully connected layers, and batch-norm layers, feature extraction network 301 can extract the features of the n pictures and output a feature matrix X_{d×n}, where d is a positive integer indicating the feature dimension of the feature matrix X_{d×n}.
240: Compute a prediction label matrix of the pictures according to the features of the pictures.
Specifically, the feature matrix X_{d×n} output by feature extraction network 301 may be input to feature mapping network 302. Because the weight matrix of the feature mapping network is the low-rank feature mapping matrix M_{c×d}, and M_{c×d} may represent the association weights between the feature attributes and the class labels in the multi-label classification model, feature mapping network 302 can map the input feature matrix X_{d×n} to the prediction label space to obtain a prediction label matrix Ŷ_{c×n}:

    Ŷ_{c×n} = M_{c×d} X_{d×n}    (1)
Here, the prediction label matrix Ŷ_{c×n} may be a label matrix containing richer label information, in which each element ŷ_{ij} indicates the confidence that the i-th sample includes the object indicated by the j-th label. Therefore, the prediction label matrix Ŷ_{c×n} may be called the completed label matrix, and the label matrix Y_{c×n} may be called the label matrix with missing entries.
It should be noted that in practical problems the labels in a label matrix are not mutually independent but semantically correlated. For example, sheep and grass are very likely to appear in the same picture, mountain and sky are also very likely to appear together, while sheep and office are very unlikely to appear together; this correlation can be used to improve the accuracy of multi-label classification. It follows that the labels in the completed label matrix Ŷ_{c×n} are correlated, that is, Ŷ_{c×n} is low-rank. Therefore Ŷ_{c×n} can be obtained from Y_{c×n} according to the low-rank structure of the matrix, and this process may be called completion of the label matrix.
In this embodiment of this application, the label matrix may be completed by means of matrix low-rank decomposition, that is, the prediction label matrix Ŷ_{c×n} is written as a low-rank factorization:

    Ŷ_{c×n} = H_{c×r} W_{r×d} X_{d×n}    (2)

That is, in this embodiment of this application, the low-rank feature mapping network may include a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network have the following relationship:

    M_{c×d} = H_{c×r} W_{r×d}

where the weight matrix of the first sub-mapping network is W_{r×d} and the weight matrix of the second sub-mapping network is H_{c×r}; here, to guarantee that M_{c×d} and H_{c×r} are low-rank, r may be set to a positive integer with r ≤ min(d, c).
In a specific embodiment, the first sub-mapping network may be a fully connected layer whose weight matrix is W_{r×d}, denoted FC_W; the second sub-mapping network may be a fully connected layer whose weight matrix is H_{c×r}, denoted FC_H; and the initial values of W_{r×d} and H_{c×r} may be generated randomly.
Here, setting r ≤ min(d, c) makes W_{r×d} and H_{c×r} low-rank. Because the rank of the product of two matrices is no greater than the rank of either matrix, H_{c×r} W_{r×d} (that is, M_{c×d}) is low-rank, so that H_{c×r} W_{r×d} X_{d×n}, that is, Ŷ_{c×n}, is low-rank. Here, an optimal value of r can be found through repeated training.
In other words, in this embodiment of this application, X (that is, X_{d×n}) may be mapped by a preset feature mapping matrix M (that is, M_{c×d}) to obtain the prediction label matrix Ŷ (that is, Ŷ_{c×n}), that is, Ŷ = M X. Because the rank of Ŷ is no greater than the rank of M or of X, performing a low-rank decomposition on M makes M low-rank while guaranteeing that Ŷ is low-rank. Therefore a low-rank decomposition may be applied to M; formula (2) above is thus equivalent to decomposing M into the product of two low-dimensional matrices, which in turn guarantees that Ŷ is low-rank.
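The rank argument above can be checked numerically with random stand-in matrices (the dimensions d, c, r, n below are illustrative, not from this application):

```python
import numpy as np

rng = np.random.default_rng(0)
d, c, r, n = 8, 6, 2, 5            # illustrative sizes with r <= min(d, c)

W = rng.standard_normal((r, d))    # weight of FC_W
H = rng.standard_normal((c, r))    # weight of FC_H
X = rng.standard_normal((d, n))    # feature matrix

M = H @ W                          # c x d feature mapping matrix
Y_hat = M @ X                      # c x n prediction label matrix

# rank(AB) <= min(rank(A), rank(B)), so both M and Y_hat have rank <= r.
print(np.linalg.matrix_rank(M))      # 2
print(np.linalg.matrix_rank(Y_hat))  # 2
```

For generic (random Gaussian) factors the rank equals r exactly, which is why the factorization enforces the low-rank structure regardless of how W and H are later trained.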
250: Compute the optimization function.
Specifically, in this embodiment of this application, the preset matrix M may be replaced by the preset matrices H_{c×r} and W_{r×d}. However, because Ŷ is obtained with the current matrices H_{c×r} and W_{r×d}, the prediction label matrix Ŷ obtained in this way is inaccurate; it therefore needs to be compared against the reference label matrix Y during learning, so as to learn and update the matrices H_{c×r} and W_{r×d}.
At this point, processing unit 305 may update the weight parameter Z and the feature mapping matrix M_{c×d} according to the label matrix Y_{c×n} and the prediction label matrix Ŷ_{c×n}, to train multi-label classification model 300.
Specifically, processing unit 305 may determine a Euclidean-distance loss function between the prediction label matrix Ŷ_{c×n} and the label matrix Y_{c×n}, whose effect is to constrain Ŷ_{c×n} to be close to Y_{c×n}. The expression of the loss function is the following formula (3):

    ε_n = ||Ŷ_{c×n} − Y_{c×n}||_F²    (3)

Alternatively, the expression of the loss function is the following formula (4):

    ε_n = ||P_Ω(Ŷ_{c×n} − Y_{c×n})||_F²    (4)

Here, for ease of description, the subscripts of Ŷ, Y, H_{c×r}, and W_{r×d} are omitted where unambiguous. ||·||_F is the Frobenius norm of a matrix; the Frobenius norm of a matrix A_{m×n} is defined as:

    ||A||_F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} A_{ij}² )^{1/2}    (5)

where A_{ij} is an element of matrix A, so that formula (3) is a Euclidean-distance loss function.
In addition, P_Ω in formula (4) is a projection operator: the observed elements remain unchanged, and the unobserved elements are set to 0, so that only the observed elements take part in the computation. Its concrete form is:

    [P_Ω(A)]_{ij} = A_{ij} if (i, j) ∈ Ω, and 0 otherwise    (6)

where Ω is the set of positions of the observed elements.
For example, assume Y = [1 0 0 ? 0 1], where "?" denotes a missing element. Then P_Ω(Ŷ) and P_Ω(Y) set the elements of Ŷ and Y at the "?" position to 0 during computation. In this way, the elements of Ŷ and Y at that position do not take part in the computation (they are treated as 0), which prevents missing elements from inflating the value of the loss function and thus improves the accuracy of the computation.
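A minimal sketch of the projection operator P_Ω and the masked loss of formula (4), under the assumption that missing labels are marked with NaN (the Ŷ values below are illustrative, since the original values of the example are not recoverable):

```python
import numpy as np

def p_omega(A, observed):
    """Projection P_Omega: keep observed entries, zero out the rest."""
    return np.where(observed, A, 0.0)

def masked_loss(Y_hat, Y, observed):
    """||P_Omega(Y_hat - Y)||_F^2: only observed entries contribute."""
    diff = p_omega(Y_hat - np.where(observed, Y, 0.0), observed)
    return float(np.sum(diff ** 2))

Y     = np.array([[1.0, 0.0, 0.0, np.nan, 0.0, 1.0]])  # "?" stored as NaN
Y_hat = np.array([[0.9, 0.1, 0.2, 0.7,    0.1, 0.8]])  # illustrative confidences
obs   = ~np.isnan(Y)

# The "?" entry (0.7 vs NaN) is skipped; the other five entries contribute:
# 0.01 + 0.01 + 0.04 + 0.01 + 0.04 = 0.11
print(round(masked_loss(Y_hat, Y, obs), 4))  # 0.11
```

Without the mask, the NaN (or an arbitrary fill value) at the missing position would either poison the sum or bias the loss upward, which is exactly what P_Ω avoids.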
Further, the sum of the foregoing loss function and a regularization term may be determined as the loss function L_n of the n samples. Here, the loss function L_n may also be called the optimization function L_n. Specifically, the expression of L_n is shown in formula (7) or formula (8), where λ denotes the regularization coefficient:

    L_n = ||Ŷ − Y||_F² + λ ( ||Z||² + ||W||_F² + ||H||_F² )    (7)

    L_n = ||P_Ω(Ŷ − Y)||_F² + λ ( ||Z||² + ||W||_F² + ||H||_F² )    (8)

The first term of the optimization function L_n is the foregoing loss function ε_n, and the second term is the regularization term, which constrains the weight parameter Z and the weight matrices W_{r×d} and H_{c×r} to prevent overfitting.
260: Update the weight parameters using the error backpropagation algorithm.
The error backpropagation algorithm is a method for training multilayer neural networks. Based on gradient descent, it learns and updates the weights of every layer of the neural network by optimizing the loss function.
Specifically, the error backpropagation algorithm may be used to minimize the loss function L_n: the weight parameter Z corresponding to the minimum value of the optimization function is taken as the updated weight parameter Z, and the feature mapping matrix M_{c×d} corresponding to the minimum value of the optimization function is taken as the updated feature mapping matrix M_{c×d}.
When M_{c×d} = H_{c×r} W_{r×d}, correspondingly: the weight matrix W_{r×d} corresponding to the minimum value of the optimization function is taken as the updated weight matrix W_{r×d}, and the weight matrix H_{c×r} corresponding to the minimum value of the optimization function is taken as the updated weight matrix H_{c×r}.
To apply the error backpropagation algorithm, the variables in formula (7) are differentiated below, taking the input of one picture and an l2-norm regularization term as an example.
Let L₁ denote the optimization function for one picture. Then:

    L₁ = ||ŷ − y||₂² + λ ( ||W||_F² + ||H||_F² )

where the square of the Frobenius norm of a matrix corresponds, for a vector, to the square of its l₂ norm.
Differentiating L₁ with respect to each element of W_{r×d} and H_{c×r}, with p = W x, gives:

    ∂L₁/∂h_{kj} = 2 (ŷ_k − y_k) p_j + 2λ h_{kj}
    ∂L₁/∂w_{ji} = 2 Σ_k (ŷ_k − y_k) h_{kj} x_i + 2λ w_{ji}

where h_{kj} is an element of matrix H_{c×r}, w_{ji} is an element of matrix W_{r×d}, x_i is an element of vector x_d, p_j is an element of vector p_r, ŷ_k and y_k are elements of vectors ŷ_c and y_c, and x_d, p_r, ŷ_c, and y_c are the column vectors of the matrices X_{d×n}, P_{r×n}, Ŷ_{c×n}, and Y_{c×n}, respectively. The error back-derivative of the feature extraction network weights Z can be obtained by propagating ∂L₁/∂x backward. The elements of H_{c×r} and W_{r×d} are then updated as:

    h_{kj} ← h_{kj} − η₁ ∂L₁/∂h_{kj}
    w_{ji} ← w_{ji} − η₂ ∂L₁/∂w_{ji}

where the left-hand side is the value obtained by this update and the right-hand side uses the value of the previous update; η₁ and η₂ are the learning rates of H_{c×r} and W_{r×d}, respectively, and control the update rate. The update of the feature extraction network weights Z is similar.
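The per-sample updates above can be sketched with explicit gradients of the data term ||ŷ − y||² with respect to H and W (sizes, seeds, and learning rates are illustrative; the regularization terms and the feature extraction network are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
d, c, r = 8, 6, 2
W = 0.1 * rng.standard_normal((r, d))
H = 0.1 * rng.standard_normal((c, r))

x = rng.standard_normal(d)                # feature vector of one picture
y = rng.integers(0, 2, c).astype(float)   # its binary label vector

def forward(W, H, x):
    p = W @ x                 # r-dimensional intermediate vector p = W x
    return H @ p, p           # y_hat = H W x

def loss(W, H, x, y):
    y_hat, _ = forward(W, H, x)
    return float(np.sum((y_hat - y) ** 2))

eta1 = eta2 = 0.01
y_hat, p = forward(W, H, x)
e = y_hat - y                             # error vector
grad_H = 2.0 * np.outer(e, p)             # dL/dh_kj = 2 (y_hat_k - y_k) p_j
grad_W = 2.0 * np.outer(H.T @ e, x)       # dL/dw_ji = 2 sum_k (y_hat_k - y_k) h_kj x_i
H_new = H - eta1 * grad_H                 # gradient-descent step on H
W_new = W - eta2 * grad_W                 # gradient-descent step on W

print(loss(W_new, H_new, x, y) < loss(W, H, x, y))  # True: the step reduces the loss
```

With a small enough step size the loss decreases, which is the behavior the backpropagation update in the description relies on.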
In this way, the weights Z of the feature extraction network and the feature mapping matrices W_{r×d} and H_{c×r} can be learned, completing the missing labels and improving the multi-label classification capability.
270: Determine whether a stop condition is reached.
Here, the stop condition is: L_n no longer decreases, or its decrease is smaller than a preset threshold, or the maximum number of training iterations is reached. If the stop condition is not reached, steps 220 to 260 are repeated until it is reached. In this embodiment of this application, inputting all pictures once may be regarded as one epoch of training, and several epochs of training are usually needed.
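The step-270 control flow can be sketched as follows; `train_one_round` is a hypothetical stand-in for one pass of steps 220–260 that returns the current value of L_n:

```python
import numpy as np

def train(train_one_round, max_rounds=100, tol=1e-4):
    """Repeat steps 220-260 until L_n stops decreasing or max_rounds is hit."""
    prev_loss = np.inf
    for rnd in range(1, max_rounds + 1):
        loss = train_one_round()
        if prev_loss - loss < tol:    # L_n no longer (noticeably) decreases
            return rnd, loss
        prev_loss = loss
    return max_rounds, prev_loss      # maximum number of rounds reached

# Toy stand-in: a loss sequence that halves every round, then plateaus.
losses = iter([1.0, 0.5, 0.25, 0.125, 0.12499])
rounds, final = train(lambda: next(losses))
print(rounds, final)   # 5 0.12499
```

Either branch of the stop condition (plateau below `tol`, or `max_rounds` exhausted) terminates the loop, matching the three-part condition in the text.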
After training is complete, in the test phase only steps 220 and 230 need to be performed: a test picture is input to the feature extraction network in the neural network model, the feature extraction network extracts a first feature matrix of the test picture, the first feature matrix is input to the feature mapping network, and the prediction label matrix of the first feature matrix is obtained and output by the feature mapping network, where an element of the prediction label matrix indicates the confidence that the test picture includes the object indicated by the j-th label. Here, the test picture may be one or more pictures and need not belong to the training data set.
Specifically, looking at a single prediction vector ŷ of the prediction label matrix, processing ŷ yields the one or more classes to which the picture belongs. For example, if one or more element values of ŷ are greater than a preset threshold, the picture is given the class labels at the positions corresponding to those elements, that is, the picture belongs to those one or more classes. Here, the preset threshold may be 0.5 or another value; this is not limited in this embodiment of this application.
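The thresholding step can be sketched as follows (the confidences and the 0.5 threshold are illustrative):

```python
import numpy as np

def predict_labels(y_hat, threshold=0.5):
    """Return the indices j whose confidence exceeds the threshold."""
    return [j for j, conf in enumerate(y_hat) if conf > threshold]

y_hat = np.array([0.92, 0.08, 0.61, 0.30])  # confidences for c = 4 labels
print(predict_labels(y_hat))                 # [0, 2]
```

The picture is thus assigned every label whose confidence clears the threshold, which is what makes the output multi-label rather than single-class.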
Therefore, the neural network system provided in this embodiment of this application can train the model directly from the input data without additional intermediate steps; that is, the neural network system is an end-to-end neural system. The advantage of being end-to-end is that the weight parameters of the feature extraction network and the feature mapping matrix can be optimized simultaneously; in other words, this embodiment of this application can dynamically learn image features, so that the feature extraction network better adapts to the task requirements and a good multi-label classification effect is achieved.
In addition, this embodiment of this application can compute the feature mapping matrix using the image features of picture samples batch by batch, without using the image features of the entire data set as input at once; that is, training need not be performed with the image features of all samples at one time. This greatly reduces the demand on memory resources during model training and can effectively solve the computation problem of multi-label classification on large-scale data.
Fig. 4 shows a schematic diagram of a multi-label classification model 500 provided in an embodiment of this application. The feature extraction network part of model 500 uses the VGG16 network, and the output of the Dropout layer after the penultimate fully connected layer of the VGG16 network is used as the feature matrix X. In addition, the weight parameters Z of the feature extraction network use the weight parameters trained on the ImageNet data set and are then fine-tuned (fine-tuning means fixing the weights of the first several layers, or adjusting them only slightly, while fully training the last one or two layers of the network). The initial values of the weight matrices H and W may be initialized with a Gaussian distribution, and the values of H and W are trained fully. The regularization term may use the Frobenius norm.
Specifically, during training, the weights of the feature extraction network VGG16 (with the last fully connected layer removed) use the weights pre-trained on the ImageNet data set.
n RGB three-channel pictures image_n of pixel size 224×224 are input to the VGG16 network, where 1 ≤ n ≤ N and N is the number of pictures in the training set. The pictures may be represented as a four-dimensional array such as n×C×h×w or h×w×C×n, where C is the number of channels (3 for RGB images), h is the picture height (224 pixels), and w is the picture width (224 pixels). After multiple operations such as convolution, activation, and pooling, the image feature matrix X_{4096×n} is obtained through two fully connected layers and a Dropout layer.
X_{4096×n} passes through the fully connected layers whose weight matrices are W_{r×4096} and H_{c×r} (FC_W 503 and FC_H 504), respectively, to obtain the prediction label matrix Ŷ_{c×n}.
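The shape bookkeeping of this pipeline (VGG16 features → FC_W → FC_H) can be checked with random stand-in matrices; the VGG16 feature extractor itself is replaced here by a random X_{4096×n}, and n, c, r are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n, c, r = 4, 20, 10                       # batch size, labels, rank (illustrative)

X = rng.standard_normal((4096, n))        # stand-in for the VGG16 feature matrix
W = rng.standard_normal((r, 4096))        # FC_W weight, r x 4096
H = rng.standard_normal((c, r))           # FC_H weight, c x r

Y_hat = H @ (W @ X)                       # prediction label matrix, c x n
print(Y_hat.shape)                        # (20, 4)
print(np.linalg.matrix_rank(Y_hat) <= r)  # True
```

Composing the two fully connected layers in this order keeps the intermediate activation at r×n, which is much smaller than the c×4096 matrix a single dense mapping would require.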
Processing unit 505 obtains the optimization function according to the label matrix Y_{c×n} and the prediction label matrix Ŷ_{c×n}:

    L_n = ||P_Ω(Ŷ − Y)||_F² + λ ( ||Z||² + ||W||_F² + ||H||_F² )

Then, the error backpropagation algorithm is used to minimize the above optimization function and update the weight parameters Z and the weight matrices W_{r×4096} and H_{c×r}. For the specific optimization process, refer to the foregoing description; to avoid repetition, details are not described again here.
After the weight parameters Z and the weight matrices W_{r×4096} and H_{c×r} have been updated, whether the stop condition is reached is determined; if not, the steps are repeated until the stop condition is reached. For the stop condition, refer to the foregoing description; to avoid repetition, details are not described again here.
After training is complete, a test picture can be input to feature extraction network 501; the features extracted by the feature extraction network are input to FC_W 503 and FC_H 504, and the prediction label matrix is obtained through FC_W 503 and FC_H 504.
It should be noted that in this embodiment of this application, the structure of the feature extraction network may be replaced with another network, such as AlexNet, GoogLeNet, ResNet, or a custom network; this is not limited in this embodiment of this application. The layer whose output is taken as the features may also use the output of a certain layer of the foregoing networks, on the basis of which several convolutional layers or fully connected layers may be added or removed. In addition, different regularization terms may also be used in this embodiment of this application.
Therefore, the neural network system provided in this embodiment of this application can train the model directly from the input data without additional intermediate steps; that is, the neural network system is an end-to-end neural system. The advantage of being end-to-end is that the weight parameters of the feature extraction network and the feature mapping matrix can be optimized simultaneously; in other words, this embodiment of this application can dynamically learn image features, so that the feature extraction network better adapts to the task requirements and a good multi-label classification effect is achieved.
In addition, this embodiment of this application can compute the feature mapping matrix using the image features of picture samples batch by batch, without using the image features of the entire data set as input at once; that is, training need not be performed with the image features of all samples at one time. This greatly reduces the demand on memory resources during model training and can effectively solve the computation problem of multi-label classification on large-scale data.
It should be noted that this embodiment of this application does not limit the specific product form; the multi-label classification method of this embodiment of this application can be deployed on a general-purpose computer node. The initially constructed multi-label classification model may be stored in a hard-disk memory, and the algorithm is run by a processor and memory to learn from an existing training data set and obtain the multi-label classification model. The labels of unknown samples can be predicted through the multi-label classification model, and the prediction results are stored in the hard-disk memory, thereby completing the existing label set and predicting the labels corresponding to unknown samples.
Fig. 5 shows a schematic block diagram of an apparatus 600 for training a multi-label classification model provided in an embodiment of this application. Apparatus 600 includes a determination unit 610, an extraction unit 620, an obtaining unit 630, and an updating unit 640.
The determination unit 610 is configured to determine, in a training data set, n samples and a label matrix Y_{c×n} corresponding to the n samples, where element y_{ij} of the label matrix Y_{c×n} indicates whether the i-th sample includes the object indicated by the j-th label, and c indicates the number of labels related to the samples in the training data set.
The extraction unit 620 is configured to extract a feature matrix X_{d×n} of the n samples using a feature extraction network, where the feature extraction network has weight parameters Z, and d indicates the feature dimension of the feature matrix X_{d×n}.
The obtaining unit 630 is configured to obtain a prediction label matrix Ŷ_{c×n} of the feature matrix X_{d×n} using a feature mapping network, where element ŷ_{ij} of the prediction label matrix Ŷ_{c×n} indicates the confidence that the i-th sample includes the object indicated by the j-th label, and the weight matrix of the feature mapping network is the low-rank feature mapping matrix M_{c×d}.
The updating unit 640 is configured to update the weight parameters Z and the feature mapping matrix M_{c×d} according to the label matrix Y_{c×n} and the prediction label matrix Ŷ_{c×n}, to train the multi-label classification model.
Here, n, c, i, j, and d are positive integers, the value of i ranges from 1 to n, and the value of j ranges from 1 to c.
Therefore, the neural network system provided in this embodiment of this application can train the model directly from the input data without additional intermediate steps; that is, the neural network system is an end-to-end neural system. The advantage of being end-to-end is that the feature extraction, the feature mapping matrix, and the low-rank label correlation matrix can be optimized simultaneously; in other words, this embodiment of this application can dynamically learn image features, so that the feature extraction network better adapts to the task requirements and a good multi-label classification effect is achieved.
Optionally, the low-rank feature mapping network includes a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network have the following relationship:

    M_{c×d} = H_{c×r} W_{r×d}

where the weight matrix of the first sub-mapping network is W_{r×d}, the weight matrix of the second sub-mapping network is H_{c×r}, and r is a positive integer with r ≤ min(d, c).
Optionally, the updating unit is specifically configured to:
determine a Euclidean-distance loss function between the prediction label matrix Ŷ_{c×n} and the label matrix Y_{c×n}; and
update the weight parameters Z and the weight matrices W_{r×d} and H_{c×r} according to the Euclidean-distance loss function.
Optionally, the updating unit is further specifically configured to:
determine the sum of the Euclidean-distance loss function and a regularization term as the optimization function of the n samples, where the regularization term constrains the weight parameters Z and the weight matrices W_{r×d} and H_{c×r}; and
take the weight parameters Z corresponding to the minimum value of the optimization function as the updated weight parameters Z, take the weight matrix W_{r×d} corresponding to the minimum value of the optimization function as the updated weight matrix W_{r×d}, and take the weight matrix H_{c×r} corresponding to the minimum value of the optimization function as the updated weight matrix H_{c×r}.
Optionally, the determination unit is specifically configured to:
determine a training data set, where the training data set includes D samples and a label vector of each of the D samples, element y_j of the label vector of each sample indicates whether the sample includes the object indicated by the j-th label, and D is a positive integer not less than n; and
randomly select n samples from the training data set and generate the label matrix Y_{c×n} of the n samples, where the label matrix Y_{c×n} includes the label vector corresponding to each of the n samples.
Therefore, in this embodiment of this application, it is unnecessary to input the entire training data set at once for computation; the pictures need only be input and processed batch by batch, so the entire data set can be input in batches for training. Because a training data set usually includes a large quantity of samples, inputting the training data set in batches can reduce the resources occupied during model training, greatly reduces the demand on memory resources during model training, and can effectively solve the computation problem of the low-rank label correlation matrix on large-scale data.
Optionally, the extraction unit is further configured to extract a first feature matrix of a first sample using the feature extraction network, where the first sample does not belong to the n samples; and
the obtaining unit is further configured to obtain a first prediction label matrix of the first feature matrix using the feature mapping network, where an element of the first prediction label matrix indicates the confidence that the first sample includes the object indicated by the j-th label.
It should be noted that in this embodiment of the present invention, the determination unit 610, the extraction unit 620, the obtaining unit 630, and the updating unit 640 may be implemented by a processor. As shown in Fig. 6, an apparatus 700 for training a multi-label classification model may include a processor 710, a memory 720, and a communication interface 730. The memory 720 may be configured to store instructions or code to be executed by the processor 710. When the instructions or code are executed, the processor 710 is configured to perform the method provided in the foregoing method embodiments, and the processor 710 is further configured to control the communication interface 730 to communicate with the outside.
During implementation, the steps of the foregoing method may be completed by integrated logic circuits of hardware in the processor 710 or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of the present invention may be directly embodied as being performed by a hardware processor, or performed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 720, and the processor 710 reads the information in the memory 720 and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here.
The apparatus 600 for training a multi-label classification model shown in Fig. 5 or the apparatus 700 for training a multi-label classification model shown in Fig. 6 can implement each process corresponding to the foregoing method embodiments. Specifically, for the apparatus 600 or the apparatus 700 for training a multi-label classification model, refer to the foregoing description; to avoid repetition, details are not described again here.
It should be understood that the sequence numbers of the foregoing processes do not mean an execution order in the embodiments of this application; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of this application.
An embodiment of this application further provides a computer-readable storage medium, including a computer program; when the computer program is run on a computer, the computer is enabled to perform the method provided in the foregoing method embodiments.
An embodiment of this application further provides a computer program product including instructions; when the computer program product is run on a computer, the computer is enabled to perform the method provided in the foregoing method embodiments.
It should be understood that the processor mentioned in the embodiments of the present invention may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
It should also be understood that the memory mentioned in the embodiments of the present invention may be a volatile memory or a nonvolatile memory, or may include both a volatile memory and a nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as a static random access memory (Static RAM, SRAM), a dynamic random access memory (Dynamic RAM, DRAM), a synchronous dynamic random access memory (Synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), and a direct rambus random access memory (Direct Rambus RAM, DR RAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated into the processor.
It should be noted that the memory described in this specification is intended to include, but is not limited to, these and any other suitable types of memory.
It should also be understood that the terms "first" and "second" and the various numerals mentioned in this specification are merely distinctions made for ease of description and are not intended to limit the scope of this application.
A person of ordinary skill in the art may be aware that the units and algorithm steps described with reference to the examples disclosed in the embodiments of this specification can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.
It may be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the foregoing systems, apparatuses, and units, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the described apparatus embodiments are merely examples. For example, the unit division is merely logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces; the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of this application, but the protection scope of this application is not limited thereto. Any person familiar with the art can readily conceive of changes or replacements within the technical scope disclosed in this application, and such changes or replacements shall all fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (14)

1. A method for training a multi-label classification model, comprising:
determining, in a training data set, n samples and a label matrix Yc*n corresponding to the n samples, wherein an element yi*j in the label matrix Yc*n indicates whether an i-th sample comprises an object indicated by a j-th label, and c indicates the number of labels related to the samples in the training data set;
extracting a feature matrix Xd*n of the n samples by using a feature extraction network, wherein the feature extraction network has a weight parameter Z, and d indicates the feature dimension of the feature matrix Xd*n;
obtaining a prediction label matrix Ŷc*n of the feature matrix Xd*n by using a feature mapping network, wherein an element ŷi*j in the prediction label matrix Ŷc*n indicates a confidence that the i-th sample comprises the object indicated by the j-th label, and a weight matrix of the feature mapping network is a low-rank feature mapping matrix Mc*d; and
updating the weight parameter Z and the feature mapping matrix Mc*d according to the label matrix Yc*n and the prediction label matrix Ŷc*n, to train the multi-label classification model;
wherein n, c, i, j, and d are positive integers, the value range of i is 1 to n, and the value range of j is 1 to c.
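The forward pass described in claim 1 can be sketched numerically. The following is a minimal NumPy illustration, not the patented implementation: the feature extraction network with weight parameter Z is stood in for by a single random linear layer, the feature mapping network is assumed to apply its weight matrix Mc*d linearly, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, c = 8, 16, 5        # samples, feature dimension, number of labels
raw_dim = 32              # raw input dimension (illustrative)

# Stand-in for the feature extraction network with weight parameter Z:
# a single random linear layer mapping raw inputs to d-dim features.
Z = rng.normal(size=(d, raw_dim))
samples = rng.normal(size=(raw_dim, n))
X = Z @ samples           # feature matrix Xd*n, shape (d, n)

# Feature mapping network: assumed here to apply its weight matrix
# Mc*d linearly, giving the prediction label matrix.
M = rng.normal(size=(c, d))
Y_hat = M @ X             # prediction label matrix, shape (c, n)

print(X.shape, Y_hat.shape)  # prints (16, 8) (5, 8)
```

Each column of Y_hat collects the c per-label confidence scores for one sample, matching the element-wise reading of the claim.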
2. The method according to claim 1, wherein the low-rank feature mapping network comprises a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network satisfy the relationship Mc*d = Hc*r × Wr*d;
wherein the weight matrix of the first sub-mapping network is Wr*d, the weight matrix of the second sub-mapping network is Hc*r, r is a positive integer, and r ≤ min(d, c).
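The decomposition in claim 2 can be checked in a few lines: composing a c×r matrix with an r×d matrix yields a mapping matrix whose rank is at most r, which is what makes Mc*d low-rank. The symbols W and H follow the claim; the concrete sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, c, r = 16, 5, 3           # r <= min(d, c), as required by the claim

W = rng.normal(size=(r, d))  # first sub-mapping network weights, Wr*d
H = rng.normal(size=(c, r))  # second sub-mapping network weights, Hc*r

M = H @ W                    # composed feature mapping matrix Mc*d
# The composition caps the rank at r, making the mapping low-rank.
print(M.shape, np.linalg.matrix_rank(M) <= r)  # prints (5, 16) True
```

Storing H and W instead of a full c×d matrix also reduces the parameter count from c·d to r·(c + d) whenever r is small.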
3. The method according to claim 2, wherein updating the weight parameter Z and the feature mapping matrix Mc*d according to the label matrix Yc*n and the prediction label matrix Ŷc*n comprises:
determining a Euclidean distance loss function between the prediction label matrix Ŷc*n and the label matrix Yc*n; and
updating the weight parameter Z and the weight matrices Wr*d and Hc*r according to the Euclidean distance loss function.
4. The method according to claim 3, wherein updating the weight parameter Z and the weight matrices Wr*d and Hc*r according to the Euclidean distance loss function comprises:
determining the sum of the Euclidean distance loss function and a regularization term as an optimization function of the n samples, wherein the regularization term is used to constrain the weight parameter Z and the weight matrices Wr*d and Hc*r; and
taking the weight parameter Z corresponding to the minimum value of the optimization function as the updated weight parameter Z, taking the weight matrix Wr*d corresponding to the minimum value of the optimization function as the updated weight matrix Wr*d, and taking the weight matrix Hc*r corresponding to the minimum value of the optimization function as the updated weight matrix Hc*r.
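A hedged sketch of the optimization in claims 3 and 4, with the feature extraction weights Z held fixed for brevity: the optimization function here is the squared Euclidean distance between Ŷ = H W X and Y plus a Frobenius-norm regularizer (one common choice; the claims do not fix the regularizer's exact form), minimized by plain gradient descent over W and H. The learning rate, regularization strength, and sizes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, c, r = 20, 10, 6, 3
lam = 1e-2                    # regularization strength (assumed)

X = rng.normal(size=(d, n))                    # features, held fixed
Y = (rng.random((c, n)) < 0.3).astype(float)   # binary label matrix

W = 0.1 * rng.normal(size=(r, d))
H = 0.1 * rng.normal(size=(c, r))

def objective(W, H):
    # Euclidean distance loss plus a Frobenius-norm regularizer.
    E = H @ W @ X - Y
    return 0.5 * np.sum(E**2) + 0.5 * lam * (np.sum(W**2) + np.sum(H**2))

lr = 1e-3
losses = [objective(W, H)]
for _ in range(200):
    E = H @ W @ X - Y                 # residual, shape (c, n)
    grad_H = E @ X.T @ W.T + lam * H  # d(objective)/dH
    grad_W = H.T @ E @ X.T + lam * W  # d(objective)/dW
    H -= lr * grad_H
    W -= lr * grad_W
    losses.append(objective(W, H))

print(losses[-1] < losses[0])  # the optimization function decreased
```

In a full implementation Z would be updated in the same loop by backpropagating the residual through the feature extraction network.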
5. The method according to any one of claims 1 to 4, wherein determining, in the training data set, the n samples and the label matrix Yc*n of the n samples comprises:
determining a training data set, wherein the training data set comprises D samples and a label vector of each of the D samples, an element yj in the label vector of each sample indicates whether the sample comprises the object indicated by the j-th label, and D is a positive integer not less than n; and
randomly selecting the n samples from the training data set, and generating the label matrix Yc*n of the n samples, wherein the label matrix Yc*n comprises the label vector corresponding to each of the n samples.
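The sampling step in claim 5 amounts to drawing n column indices without replacement and slicing the full set of label vectors. A small NumPy sketch follows; the sizes and the 0.2 label density are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
D, n, c = 100, 16, 5      # D samples in the set, n <= D are drawn

# Label vector for every sample in the set: entry j is 1 iff the
# sample contains the object indicated by the j-th label.
all_labels = (rng.random((c, D)) < 0.2).astype(int)

idx = rng.choice(D, size=n, replace=False)  # randomly select n samples
Y = all_labels[:, idx]                      # label matrix Yc*n

print(Y.shape)  # prints (5, 16)
```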
6. The method according to any one of claims 1 to 5, further comprising:
extracting a first feature matrix of a first sample by using the feature extraction network, wherein the first sample does not belong to the n samples; and
obtaining a first prediction label matrix of the first feature matrix by using the feature mapping network, wherein an element in the first prediction label matrix indicates a confidence that the first sample comprises the object indicated by the j-th label.
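Inference on an unseen sample, as in claim 6, reuses the trained networks: extract a feature column, then apply the low-rank mapping. The sketch below uses random stand-in weights; in the method, W and H would come from training, and the sigmoid remark is a suggestion rather than part of the claim.

```python
import numpy as np

rng = np.random.default_rng(4)
d, c, r = 16, 5, 3

# Stand-ins for the trained low-rank mapping; in the method these
# would be the W and H obtained from training.
W = rng.normal(size=(r, d))
H = rng.normal(size=(c, r))

# Feature column of a first sample not among the n training samples,
# as produced by the (omitted here) feature extraction network.
x_first = rng.normal(size=(d, 1))

y_first = H @ W @ x_first   # first prediction label matrix, shape (c, 1)
# Each entry is a raw confidence for one label; a sigmoid could squash
# it to (0, 1) if probabilities are needed.
print(y_first.shape)        # prints (5, 1)
```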
7. An apparatus for training a multi-label classification model, comprising:
a determination unit, configured to determine, in a training data set, n samples and a label matrix Yc*n corresponding to the n samples, wherein an element yi*j in the label matrix Yc*n indicates whether an i-th sample comprises an object indicated by a j-th label, and c indicates the number of labels related to the samples in the training data set;
an extraction unit, configured to extract a feature matrix Xd*n of the n samples by using a feature extraction network, wherein the feature extraction network has a weight parameter Z, and d indicates the feature dimension of the feature matrix Xd*n;
an acquiring unit, configured to obtain a prediction label matrix Ŷc*n of the feature matrix Xd*n by using a feature mapping network, wherein an element ŷi*j in the prediction label matrix Ŷc*n indicates a confidence that the i-th sample comprises the object indicated by the j-th label, and a weight matrix of the feature mapping network is a low-rank feature mapping matrix Mc*d; and
an updating unit, configured to update the weight parameter Z and the feature mapping matrix Mc*d according to the label matrix Yc*n and the prediction label matrix Ŷc*n, to train the multi-label classification model;
wherein n, c, i, j, and d are positive integers, the value range of i is 1 to n, and the value range of j is 1 to c.
8. The apparatus according to claim 7, wherein the low-rank feature mapping network comprises a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network satisfy the relationship Mc*d = Hc*r × Wr*d;
wherein the weight matrix of the first sub-mapping network is Wr*d, the weight matrix of the second sub-mapping network is Hc*r, r is a positive integer, and r ≤ min(d, c).
9. The apparatus according to claim 8, wherein the updating unit is specifically configured to:
determine a Euclidean distance loss function between the prediction label matrix Ŷc*n and the label matrix Yc*n; and
update the weight parameter Z and the weight matrices Wr*d and Hc*r according to the Euclidean distance loss function.
10. The apparatus according to claim 9, wherein the updating unit is further configured to:
determine the sum of the Euclidean distance loss function and a regularization term as an optimization function of the n samples, wherein the regularization term is used to constrain the weight parameter Z and the weight matrices Wr*d and Hc*r; and
take the weight parameter Z corresponding to the minimum value of the optimization function as the updated weight parameter Z, take the weight matrix Wr*d corresponding to the minimum value of the optimization function as the updated weight matrix Wr*d, and take the weight matrix Hc*r corresponding to the minimum value of the optimization function as the updated weight matrix Hc*r.
11. The apparatus according to any one of claims 7 to 10, wherein the determination unit is specifically configured to:
determine a training data set, wherein the training data set comprises D samples and a label vector of each of the D samples, an element yj in the label vector of each sample indicates whether the sample comprises the object indicated by the j-th label, and D is a positive integer not less than n; and
randomly select the n samples from the training data set, and generate the label matrix Yc*n of the n samples, wherein the label matrix Yc*n comprises the label vector corresponding to each of the n samples.
12. The apparatus according to any one of claims 7 to 11, wherein:
the extraction unit is further configured to extract a first feature matrix of a first sample by using the feature extraction network, wherein the first sample does not belong to the n samples; and
the acquiring unit is further configured to obtain a first prediction label matrix of the first feature matrix by using the feature mapping network, wherein an element in the first prediction label matrix indicates a confidence that the first sample comprises the object indicated by the j-th label.
13. A computer-readable storage medium comprising a computer program, wherein when the computer program is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 6.
14. A computer program product comprising instructions, wherein when the computer program product is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 6.
CN201711187395.0A 2017-11-24 2017-11-24 Method and apparatus for training a multi-label classification model Pending CN109840530A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711187395.0A CN109840530A (en) 2017-11-24 2017-11-24 Method and apparatus for training a multi-label classification model
PCT/CN2018/094400 WO2019100724A1 (en) 2017-11-24 2018-07-04 Method and device for training multi-label classification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711187395.0A CN109840530A (en) 2017-11-24 2017-11-24 Method and apparatus for training a multi-label classification model

Publications (1)

Publication Number Publication Date
CN109840530A true CN109840530A (en) 2019-06-04

Family

ID=66630474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711187395.0A Pending CN109840530A (en) Method and apparatus for training a multi-label classification model

Country Status (2)

Country Link
CN (1) CN109840530A (en)
WO (1) WO2019100724A1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929785A (en) * 2019-11-21 2020-03-27 中国科学院深圳先进技术研究院 Data classification method and device, terminal equipment and readable storage medium
CN111353076A (en) * 2020-02-21 2020-06-30 华为技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111368976A (en) * 2020-02-27 2020-07-03 杭州国芯科技股份有限公司 Data compression method based on neural network feature recognition
CN111652315A (en) * 2020-06-04 2020-09-11 广州虎牙科技有限公司 Model training method, object classification method, model training device, object classification device, electronic equipment and storage medium
CN111737520A (en) * 2020-06-22 2020-10-02 Oppo广东移动通信有限公司 Video classification method, video classification device, electronic equipment and storage medium
CN111898703A (en) * 2020-08-14 2020-11-06 腾讯科技(深圳)有限公司 Multi-label video classification method, model training method, device and medium
CN112115997A (en) * 2020-09-11 2020-12-22 苏州浪潮智能科技有限公司 Training method, system and device of object recognition model
CN112308299A (en) * 2020-10-19 2021-02-02 新奥数能科技有限公司 Sample data extraction method and device for power system load prediction model
CN112633419A (en) * 2021-03-09 2021-04-09 浙江宇视科技有限公司 Small sample learning method and device, electronic equipment and storage medium
CN113139433A (en) * 2021-03-29 2021-07-20 西安天和防务技术股份有限公司 Method and device for determining direction of arrival
WO2022227777A1 (en) * 2021-04-26 2022-11-03 华为技术有限公司 Model processing method and apparatus
CN115550014A (en) * 2022-09-22 2022-12-30 中国电信股份有限公司 Application program protection method and related equipment
CN117078359A (en) * 2023-10-16 2023-11-17 山东大学 Product recommendation method, system, equipment and medium based on user group classification
CN117217288A (en) * 2023-09-21 2023-12-12 摩尔线程智能科技(北京)有限责任公司 Fine tuning method and device for large model, electronic equipment and storage medium
CN117274726A (en) * 2023-11-23 2023-12-22 南京信息工程大学 Picture classification method and system based on multi-view supplementary tag

Families Citing this family (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10878296B2 (en) * 2018-04-12 2020-12-29 Discovery Communications, Llc Feature extraction and machine learning for automated metadata analysis
CN112149705A (en) * 2019-06-28 2020-12-29 京东数字科技控股有限公司 Method and system for training classification model, computer equipment and storage medium
CN112183757B (en) * 2019-07-04 2023-10-27 创新先进技术有限公司 Model training method, device and system
CN111797881A (en) * 2019-07-30 2020-10-20 华为技术有限公司 Image classification method and device
CN110659667A (en) * 2019-08-14 2020-01-07 平安科技(深圳)有限公司 Picture classification model training method and system and computer equipment
CN110688893A (en) * 2019-08-22 2020-01-14 成都通甲优博科技有限责任公司 Detection method for wearing safety helmet, model training method and related device
CN110569764B (en) * 2019-08-28 2023-12-22 北京工业大学 Mobile phone model identification method based on convolutional neural network
CN112529029A (en) * 2019-09-18 2021-03-19 华为技术有限公司 Information processing method, neural network training method, device and storage medium
CN111027582B (en) * 2019-09-20 2023-06-27 哈尔滨理工大学 Semi-supervised feature subspace learning method and device based on low-rank graph learning
CN112579746A (en) * 2019-09-29 2021-03-30 京东数字科技控股有限公司 Method and device for acquiring behavior information corresponding to text
CN110765935A (en) * 2019-10-22 2020-02-07 上海眼控科技股份有限公司 Image processing method, image processing device, computer equipment and readable storage medium
CN112825144A (en) * 2019-11-20 2021-05-21 深圳云天励飞技术有限公司 Picture labeling method and device, electronic equipment and storage medium
CN112884159A (en) * 2019-11-30 2021-06-01 华为技术有限公司 Model updating system, model updating method and related equipment
CN112994701B (en) * 2019-12-02 2024-05-03 阿里巴巴集团控股有限公司 Data compression method, device, electronic equipment and computer readable medium
CN113010500A (en) * 2019-12-18 2021-06-22 中国电信股份有限公司 Processing method and processing system for DPI data
CN111291618B (en) * 2020-01-13 2024-01-09 腾讯科技(深圳)有限公司 Labeling method, labeling device, server and storage medium
CN111275089B (en) * 2020-01-16 2024-03-05 北京小米松果电子有限公司 Classification model training method and device and storage medium
CN111339362B (en) * 2020-02-05 2023-07-18 天津大学 Short video multi-label classification method based on deep collaborative matrix decomposition
CN112016040A (en) * 2020-02-06 2020-12-01 李迅 Weight matrix construction method, device, equipment and storage medium
CN111340131B (en) * 2020-03-09 2023-07-14 北京字节跳动网络技术有限公司 Image labeling method and device, readable medium and electronic equipment
CN111461191B (en) * 2020-03-25 2024-01-23 杭州跨视科技有限公司 Method and device for determining image sample set for model training and electronic equipment
GB2606792A (en) 2020-03-26 2022-11-23 Shenzhen Inst Adv Tech Time series data generation method and device based on multi-condition constraints, and medium
CN111475496B (en) * 2020-03-26 2023-07-21 深圳先进技术研究院 Time sequence data generation method, device and medium based on multi-condition constraint
CN111159773B (en) * 2020-04-01 2020-11-03 支付宝(杭州)信息技术有限公司 Picture classification method and device for protecting data privacy
CN111581488B (en) * 2020-05-14 2023-08-04 上海商汤智能科技有限公司 Data processing method and device, electronic equipment and storage medium
CN111611386B (en) * 2020-05-28 2024-03-29 北京明略昭辉科技有限公司 Text classification method and device
CN111860572B (en) * 2020-06-04 2024-01-26 北京百度网讯科技有限公司 Data set distillation method, device, electronic equipment and storage medium
CN111709475B (en) * 2020-06-16 2024-03-15 全球能源互联网研究院有限公司 N-gram-based multi-label classification method and device
CN111914885B (en) * 2020-06-19 2024-04-26 合肥工业大学 Multi-task personality prediction method and system based on deep learning
CN111931809A (en) * 2020-06-29 2020-11-13 北京大米科技有限公司 Data processing method and device, storage medium and electronic equipment
CN111916144B (en) * 2020-07-27 2024-02-09 西安电子科技大学 Protein classification method based on self-attention neural network and coarsening algorithm
CN111898707A (en) * 2020-08-24 2020-11-06 鼎富智能科技有限公司 Model training method, text classification method, electronic device and storage medium
CN112215795B (en) * 2020-09-02 2024-04-09 苏州超集信息科技有限公司 Intelligent detection method for server component based on deep learning
CN112069319B (en) * 2020-09-10 2024-03-22 杭州中奥科技有限公司 Text extraction method, text extraction device, computer equipment and readable storage medium
CN112182214B (en) * 2020-09-27 2024-03-19 中国建设银行股份有限公司 Data classification method, device, equipment and medium
CN112149692B (en) * 2020-10-16 2024-03-05 腾讯科技(深圳)有限公司 Visual relationship identification method and device based on artificial intelligence and electronic equipment
CN112434722B (en) * 2020-10-23 2024-03-19 浙江智慧视频安防创新中心有限公司 Label smooth calculation method and device based on category similarity, electronic equipment and medium
CN112307133A (en) * 2020-10-29 2021-02-02 平安普惠企业管理有限公司 Security protection method and device, computer equipment and storage medium
CN112560966B (en) * 2020-12-18 2023-09-15 西安电子科技大学 Polarized SAR image classification method, medium and equipment based on scattering map convolution network
CN112668509B (en) * 2020-12-31 2024-04-02 深圳云天励飞技术股份有限公司 Training method and recognition method of social relation recognition model and related equipment
CN113033318B (en) * 2021-03-01 2023-09-26 深圳大学 Human body motion detection method, device and computer readable storage medium
CN113095364B (en) * 2021-03-12 2023-12-19 西安交通大学 High-speed rail seismic event extraction method, medium and equipment using convolutional neural network
CN112948937B (en) * 2021-03-12 2024-03-01 中建西部建设贵州有限公司 Intelligent pre-judging method and device for concrete strength
CN113095210A (en) * 2021-04-08 2021-07-09 北京一起教育科技有限责任公司 Method and device for detecting pages of exercise book and electronic equipment
CN113157788B (en) * 2021-04-13 2024-02-13 福州外语外贸学院 Big data mining method and system
CN113516239A (en) * 2021-04-16 2021-10-19 Oppo广东移动通信有限公司 Model training method and device, storage medium and electronic equipment
CN115481746B (en) * 2021-06-15 2023-09-01 华为技术有限公司 Model training method, related system and storage medium
CN113469249B (en) * 2021-06-30 2024-04-09 阿波罗智联(北京)科技有限公司 Image classification model training method, classification method, road side equipment and cloud control platform
CN113657087B (en) * 2021-08-25 2023-12-15 平安科技(深圳)有限公司 Information matching method and device
CN113821664A (en) * 2021-08-30 2021-12-21 湖南军芃科技股份有限公司 Image classification method, system, terminal and readable storage medium based on histogram statistical frequency
CN113837394A (en) * 2021-09-03 2021-12-24 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Multi-feature view data label prediction method, system and readable storage medium
CN114648635A (en) * 2022-03-15 2022-06-21 安徽工业大学 Multi-label image classification method fusing strong correlation among labels
CN116070120B (en) * 2023-04-06 2023-06-27 湖南归途信息科技有限公司 Automatic identification method and system for multi-tag time sequence electrophysiological signals
CN117076994B (en) * 2023-10-18 2024-01-26 清华大学深圳国际研究生院 Multi-channel physiological time sequence classification method
CN117312865B (en) * 2023-11-30 2024-02-27 山东理工职业学院 Nonlinear dynamic optimization-based data classification model construction method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150039613A1 (en) * 2013-07-31 2015-02-05 LinkedIn Corporation Framework for large-scale multi-label classification
CN104899596A (en) * 2015-03-16 2015-09-09 景德镇陶瓷学院 Multi-label classification method and apparatus thereof
CN105320967A (en) * 2015-11-04 2016-02-10 中科院成都信息技术股份有限公司 Multi-label AdaBoost integration method based on label correlation
US20160140451A1 (en) * 2014-11-17 2016-05-19 Yahoo! Inc. System and method for large-scale multi-label learning using incomplete label assignments

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8379994B2 (en) * 2010-10-13 2013-02-19 Sony Corporation Digital image analysis utilizing multiple human labels
CN105825502B (en) * 2016-03-12 2018-06-15 浙江大学 A kind of Weakly supervised method for analyzing image of the dictionary study based on conspicuousness guidance
CN107292322B (en) * 2016-03-31 2020-12-04 华为技术有限公司 Image classification method, deep learning model and computer system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150039613A1 (en) * 2013-07-31 2015-02-05 LinkedIn Corporation Framework for large-scale multi-label classification
US20160140451A1 (en) * 2014-11-17 2016-05-19 Yahoo! Inc. System and method for large-scale multi-label learning using incomplete label assignments
CN104899596A (en) * 2015-03-16 2015-09-09 景德镇陶瓷学院 Multi-label classification method and apparatus thereof
CN105320967A (en) * 2015-11-04 2016-02-10 中科院成都信息技术股份有限公司 Multi-label AdaBoost integration method based on label correlation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAO Hongge et al., "Image Feature Extraction Based on Wavelet Analysis and BP Neural Network", Journal of Xi'an Technological University *
WANG Zhen, "Multi-Label Classification Algorithms Based on Learning Label Correlations", Master's thesis, University of Science and Technology of China *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929785A (en) * 2019-11-21 2020-03-27 中国科学院深圳先进技术研究院 Data classification method and device, terminal equipment and readable storage medium
WO2021098618A1 (en) * 2019-11-21 2021-05-27 中国科学院深圳先进技术研究院 Data classification method and apparatus, terminal device and readable storage medium
CN110929785B (en) * 2019-11-21 2023-12-05 中国科学院深圳先进技术研究院 Data classification method, device, terminal equipment and readable storage medium
CN111353076A (en) * 2020-02-21 2020-06-30 华为技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111353076B (en) * 2020-02-21 2023-10-10 华为云计算技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111368976A (en) * 2020-02-27 2020-07-03 杭州国芯科技股份有限公司 Data compression method based on neural network feature recognition
CN111368976B (en) * 2020-02-27 2022-09-02 杭州国芯科技股份有限公司 Data compression method based on neural network feature recognition
CN111652315A (en) * 2020-06-04 2020-09-11 广州虎牙科技有限公司 Model training method, object classification method, model training device, object classification device, electronic equipment and storage medium
CN111737520A (en) * 2020-06-22 2020-10-02 Oppo广东移动通信有限公司 Video classification method, video classification device, electronic equipment and storage medium
CN111737520B (en) * 2020-06-22 2023-07-25 Oppo广东移动通信有限公司 Video classification method, video classification device, electronic equipment and storage medium
CN111898703A (en) * 2020-08-14 2020-11-06 腾讯科技(深圳)有限公司 Multi-label video classification method, model training method, device and medium
CN111898703B (en) * 2020-08-14 2023-11-10 腾讯科技(深圳)有限公司 Multi-label video classification method, model training method, device and medium
CN112115997A (en) * 2020-09-11 2020-12-22 苏州浪潮智能科技有限公司 Training method, system and device of object recognition model
CN112115997B (en) * 2020-09-11 2022-12-02 苏州浪潮智能科技有限公司 Training method, system and device of object recognition model
CN112308299A (en) * 2020-10-19 2021-02-02 新奥数能科技有限公司 Sample data extraction method and device for power system load prediction model
CN112308299B (en) * 2020-10-19 2024-04-19 新奥数能科技有限公司 Sample data extraction method and device for power system load prediction model
CN112633419B (en) * 2021-03-09 2021-07-06 浙江宇视科技有限公司 Small sample learning method and device, electronic equipment and storage medium
CN112633419A (en) * 2021-03-09 2021-04-09 浙江宇视科技有限公司 Small sample learning method and device, electronic equipment and storage medium
CN113139433A (en) * 2021-03-29 2021-07-20 西安天和防务技术股份有限公司 Method and device for determining direction of arrival
WO2022227777A1 (en) * 2021-04-26 2022-11-03 华为技术有限公司 Model processing method and apparatus
CN115550014B (en) * 2022-09-22 2024-03-19 中国电信股份有限公司 Application program protection method and related equipment
CN115550014A (en) * 2022-09-22 2022-12-30 中国电信股份有限公司 Application program protection method and related equipment
CN117217288A (en) * 2023-09-21 2023-12-12 摩尔线程智能科技(北京)有限责任公司 Fine tuning method and device for large model, electronic equipment and storage medium
CN117217288B (en) * 2023-09-21 2024-04-05 摩尔线程智能科技(北京)有限责任公司 Fine tuning method and device for large model, electronic equipment and storage medium
CN117078359B (en) * 2023-10-16 2024-01-12 山东大学 Product recommendation method, system, equipment and medium based on user group classification
CN117078359A (en) * 2023-10-16 2023-11-17 山东大学 Product recommendation method, system, equipment and medium based on user group classification
CN117274726B (en) * 2023-11-23 2024-02-23 南京信息工程大学 Picture classification method and system based on multi-view supplementary tag
CN117274726A (en) * 2023-11-23 2023-12-22 南京信息工程大学 Picture classification method and system based on multi-view supplementary tag

Also Published As

Publication number Publication date
WO2019100724A1 (en) 2019-05-31

Similar Documents

Publication Publication Date Title
CN109840530A (en) Method and apparatus for training a multi-label classification model
CN109840531B (en) Method and device for training multi-label classification model
CN110188795A (en) Image classification method, data processing method and device
CN110309856A (en) Image classification method, the training method of neural network and device
US10002313B2 (en) Deeply learned convolutional neural networks (CNNS) for object localization and classification
Kao et al. Visual aesthetic quality assessment with a regression model
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN110378381A (en) Object detecting method, device and computer storage medium
CN108765278A (en) A kind of image processing method, mobile terminal and computer readable storage medium
CN107844784A (en) Face identification method, device, computer equipment and readable storage medium storing program for executing
CN109583483A (en) A kind of object detection method and system based on convolutional neural networks
CN106845418A (en) A kind of hyperspectral image classification method based on deep learning
CN108921061A (en) A kind of expression recognition method, device and equipment
CN107563280A (en) Face identification method and device based on multi-model
CN110619059B (en) Building marking method based on transfer learning
CN109271991A (en) A kind of detection method of license plate based on deep learning
CN105320945A (en) Image classification method and apparatus
CN110023989B (en) Sketch image generation method and device
CN110222717A (en) Image processing method and device
CN110222718B (en) Image processing method and device
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN113505768A (en) Model training method, face recognition method, electronic device and storage medium
CN109492589A (en) The recognition of face working method and intelligent chip merged by binary features with joint stepped construction
CN108647571A (en) Video actions disaggregated model training method, device and video actions sorting technique
EP3867808A1 (en) Method and device for automatic identification of labels of image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190604