WO2019100723A1 - Method and device for training a multi-label classification model - Google Patents

Method and device for training a multi-label classification model

Info

Publication number
WO2019100723A1
WO2019100723A1 PCT/CN2018/094309 CN2018094309W WO2019100723A1 WO 2019100723 A1 WO2019100723 A1 WO 2019100723A1 CN 2018094309 W CN2018094309 W CN 2018094309W WO 2019100723 A1 WO2019100723 A1 WO 2019100723A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
label
network
feature
weight
Prior art date
Application number
PCT/CN2018/094309
Other languages
English (en)
Chinese (zh)
Inventor
刘晓阳
胡晓林
王月红
曹忆南
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2019100723A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present application relates to the field of computers and, more particularly, to methods and apparatus for training a multi-label classification model in the computer field.
  • Due to the complexity and ambiguity of objects themselves, many objects in real life may be related to multiple category tags at the same time.
  • an appropriate subset of tags (including multiple related semantic tags) is often used to describe the object, which forms the so-called multi-label classification problem.
  • each sample corresponds to a related subset of tags consisting of multiple tags.
  • the goal of learning is to predict the corresponding subset of tags for unknown samples.
  • the labels in the subset of labels are not independent of each other, but are semantically related. For example, sheep and grass are very likely to appear together in a picture, mountains and sky are also very likely to appear together, and sheep and an office are very unlikely to appear together, so this correlation can be used to increase the accuracy of label classification.
  • There are several ways to calculate tag correlation in multi-label classification. One of them is to learn a low-rank tag correlation matrix: the low-rank tag correlation matrix is calculated by minimizing the loss function of multi-label classification, which improves the performance of multi-label classification.
  • this method needs to extract the features of the image first, and then calculate the feature mapping matrix and the low rank label correlation matrix according to the features of the image. After the features of the image are extracted, the features of the image are fixed and thus the feature information of the input image cannot be dynamically learned from the tags.
  • the present application provides a method and apparatus for training a multi-label classification model, which can dynamically learn image features, make the feature extraction network more suitable for task requirements, and achieve a good multi-label classification effect.
  • a method of training a multi-label classification model is provided, comprising:
  • determining, from the training data set, n samples and a tag matrix Y c*n corresponding to the n samples, where the element y i*j in the tag matrix Y c*n indicates whether the i-th sample contains the object indicated by the j-th tag, and c represents the number of tags associated with the samples in the training data set.
  • the feature matrix X d*n of the n samples is extracted by a feature extraction network, wherein the feature extraction network has a weight parameter Z, and d represents a feature dimension of the feature matrix X d*n .
  • the feature extraction network may be any neural network capable of extracting image features, and may be, for example, a convolutional neural network or a multi-layer perceptron, which is not limited by the embodiment of the present application.
  • the weight of the feature extraction network may be represented as Z. Specifically, Z may include multiple weight matrixes.
  • the parameters of the weight matrix can be randomly initialized, or pre-trained model parameters can be used.
  • the pre-trained model parameters refer to the parameters of the already trained model, such as the model parameters trained by the vgg16 network on the ImageNet data set.
  • a prediction label matrix of the n samples is obtained from the feature matrix X d*n by using a first mapping network, where each element of the prediction label matrix indicates the confidence that the i-th sample contains the object indicated by the j-th label, the weight matrix of the first mapping network is a feature mapping matrix M c*d, and M c*d may represent the correlation weights between feature attributes and category labels in the multi-label classification model.
  • the first mapping network may be represented as an FCM.
  • the feature matrix X d*n output by the feature extraction network can be input to the FCM, and then the FCM maps the input feature matrix X d*n to the prediction label space to obtain a prediction label matrix. That is, the prediction label matrix is the product of M c*d and X d*n.
  • the prediction label matrix can be a label matrix with richer label information, each element of which indicates the confidence that the i-th sample contains the object indicated by the j-th label.
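  • a low rank label matrix of the n samples is obtained from the label matrix Y c*n by using a second mapping network, where the weight matrix of the second mapping network is a low rank label correlation matrix S, and the low rank label correlation matrix S is used to describe a relationship between the c labels. That is, the low rank label matrix is the product of S and Y c*n.
  • the weight parameter Z, the feature mapping matrix M c*d, and the low rank label correlation matrix S are updated according to the label matrix Y c*n, the prediction label matrix, and the low rank label matrix, to train the multi-label classification model.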
  • n, c, i, j, and d are all positive integers, and i ranges from 1 to n, and j ranges from 1 to c.
  • the neural network system provided by the embodiment of the present application can directly train the model from the input data without an additional intermediate step, that is, the neural network system is an end-to-end neural network system.
  • the advantage of the end-to-end design is that the feature extraction, the feature mapping matrix, and the low-rank label correlation matrix can be optimized at the same time. That is, the embodiment of the present application can dynamically learn image features and make the feature extraction network more suitable for task requirements, so the multi-label classification effect is good.
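  • As a minimal sketch of the first mapping network described above, the following NumPy snippet maps a feature matrix to the prediction label space with a bias-free fully connected layer. The sizes, the random data, and the names `fcm_forward` and `Y_pred` are illustrative assumptions; the feature extraction network itself is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)

c, d, n = 4, 6, 3            # number of labels, feature dimension, batch size (assumed)
X = rng.normal(size=(d, n))  # stands in for the feature matrix X d*n
M = rng.normal(size=(c, d))  # feature mapping matrix M c*d, randomly initialized


def fcm_forward(M, X):
    """Bias-free fully connected layer: maps features to the label space."""
    return M @ X


Y_pred = fcm_forward(M, X)   # prediction label matrix, one column per sample
print(Y_pred.shape)
```

  • Each column of `Y_pred` holds the c confidence values of one sample, which matches the c*n shape of the label matrix.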
  • the second mapping network includes a first sub-map network and a second sub-map network, where the weight matrix S of the second mapping network is the product of the weight matrix of the first sub-mapping network and the weight matrix H c*r of the second sub-mapping network, and r is a positive integer less than or equal to c.
  • the first sub-mapping network may be a fully connected layer, and the second sub-map network may be a fully connected layer with a weight matrix of H c*r.
  • the initial values of H c*r can be randomly generated. Because the rank of the matrix obtained by multiplying two matrices is no greater than the rank of either of the two matrices, setting the size of r (i.e., r ≤ c) bounds the rank of the weight matrix of the first sub-mapping network and of H c*r, which in turn makes their product low rank. That is, the label correlation matrix S is low rank, and the optimal value of r can be found through multiple training runs.
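  • The rank argument above can be checked numerically. The sketch below assumes the weight matrix of the first sub-mapping network has shape r*c and calls it `W` (a hypothetical name); the sizes and random values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

c, r = 8, 3                      # number of labels and chosen rank bound, r <= c
H = rng.normal(size=(c, r))      # weight matrix H c*r of the second sub-mapping network
W = rng.normal(size=(r, c))      # assumed r*c weight matrix of the first sub-mapping network

S = H @ W                        # c*c label correlation matrix
rank_S = np.linalg.matrix_rank(S)
print(S.shape, rank_S)           # the rank can never exceed r
```

  • Factoring S this way stores 2*c*r parameters instead of c*c, and the low rank constraint holds by construction rather than through an extra penalty term.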
  • updating the weight parameter Z, the feature mapping matrix M c*d, and the low rank label correlation matrix S according to the label matrix Y c*n, the prediction label matrix, and the low rank label matrix includes:
  • determining the Euclidean distance loss function between the prediction label matrix and the low rank label matrix as the first loss function;
  • determining the Euclidean distance loss function between the tag matrix Y c*n and the low rank tag matrix as the second loss function;
  • updating, according to the first loss function and the second loss function, the weight parameter Z, the feature mapping matrix M c*d, the weight matrix of the first sub-map network, and the weight matrix H c*r of the second sub-mapping network.
  • updating the weight parameter Z, the feature mapping matrix M c*d, the weight matrix of the first sub-map network, and the weight matrix H c*r of the second sub-mapping network according to the first loss function and the second loss function includes minimizing an optimization function L n that is the sum of the first loss function, the second loss function, and a regular term:
  • the first term of the optimization function L n is the first loss function described above
  • the second term is the second loss function described above.
  • the third term is a regular term that is used to constrain the weight parameter Z and the feature mapping matrix M c*d to prevent overfitting.
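  • The three terms of the optimization function can be sketched as follows. The exact expressions are not reproduced in this text, so the snippet assumes squared Frobenius distances, equal weighting of the two loss terms, a hypothetical coefficient `lam`, and a regular term on M only (the feature extraction weights Z are not modeled). `W` is an assumed name for the weight matrix of the first sub-mapping network.

```python
import numpy as np


def euclidean_loss(A, B):
    """Sum of squared element-wise differences between two matrices."""
    return float(np.sum((A - B) ** 2))


def total_loss(Y, X, M, H, W, lam=1e-3):
    """First loss + second loss + regular term, under the stated assumptions."""
    Y_pred = M @ X                 # prediction label matrix
    Y_low = H @ (W @ Y)            # low rank label matrix
    first = euclidean_loss(Y_pred, Y_low)
    second = euclidean_loss(Y, Y_low)
    regular = lam * float(np.sum(M ** 2))
    return first + second + regular
```

  • With this form, the first term pulls the prediction toward the low rank label matrix, and the second term keeps the low rank label matrix close to the given labels.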
  • the error back propagation algorithm may be utilized to minimize the loss function L n, and the weight parameter Z, the feature mapping matrix M c*d, the weight matrix of the first sub-map network, and the weight matrix H c*r of the second sub-map network that correspond to the minimum value of the optimization function are used as the updated weight parameter Z, the updated feature mapping matrix M c*d, the updated weight matrix of the first sub-map network, and the updated weight matrix H c*r of the second sub-map network, respectively.
  • the stop condition is: L n no longer decreases, or the decrease is smaller than a preset threshold, or the maximum number of training iterations is reached. If the stop condition is not met, the training is repeated until it is.
  • all the pictures are input once and counted as one round of training, and usually several rounds need to be trained.
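  • The stop condition above can be sketched as a small helper. The function name `should_stop` and the default values of `threshold` and `max_rounds` are assumptions chosen for illustration.

```python
def should_stop(loss_history, threshold=1e-4, max_rounds=100):
    """Return True when training should stop.

    loss_history holds the value of L n after each training round.
    """
    if len(loss_history) >= max_rounds:
        return True                      # maximum number of trainings reached
    if len(loss_history) < 2:
        return False                     # nothing to compare yet
    decrease = loss_history[-2] - loss_history[-1]
    # A non-positive decrease means the loss is no longer falling;
    # a small positive decrease means it fell by less than the threshold.
    return decrease < threshold
```

  • For a positive threshold, the single comparison covers both the "no longer falling" case and the "falling by less than the threshold" case.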
  • the determining, from the training data set, of the n samples and the label matrix Y c*n of the n samples includes:
  • determining a training data set comprising D samples and a label vector for each of the D samples, wherein an element y j in the label vector of each sample represents whether that sample contains the object indicated by the j-th label, where D is a positive integer greater than n;
  • randomly extracting n samples from the training data set, and generating the label matrix Y c*n of the n samples, where the label matrix Y c*n includes the label vector corresponding to each of the n samples.
  • the embodiment of the present application can input the entire data set for training in batches. That is to say, in the embodiment of the present application, the model may be trained by inputting part of the data in the data set by multiple batches, wherein each input data may be randomly extracted from the image samples that are not input in the data set. Since the training data set usually includes a large number of samples, the embodiment of the present application can reduce the occupation of resources in the process of training the model by inputting the training data set in batches, thereby greatly reducing the demand for memory resources in the process of training the model. It can effectively solve the calculation problem of low rank label correlation matrix under large-scale data.
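  • A minimal sketch of batch-wise input, assuming samples are addressed by index and that each round draws every sample exactly once in random order. The name `iterate_batches` and the `seed` parameter are hypothetical.

```python
import random


def iterate_batches(num_samples, n, seed=0):
    """Yield lists of sample indices; each sample is drawn once per round."""
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)   # random order without replacement
    for start in range(0, num_samples, n):
        yield order[start:start + n]     # the last batch may be smaller than n


batches = list(iterate_batches(num_samples=10, n=4))
print([len(batch) for batch in batches])
```

  • Only the n samples of the current batch need to be held in memory at a time, which is the resource saving the text describes.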
  • the method further includes: extracting, by using the feature extraction network, a first feature matrix of the first sample, where the first sample does not belong to the n samples;
  • the test picture is input only to the feature extraction network in the neural network model, and the first feature matrix of the test picture is extracted by using the feature extraction network. The first feature matrix is then input to the FCM, and the predictive label matrix of the first feature matrix is obtained and output by the FCM, where the elements in the predictive label matrix represent the confidence that the test picture contains the object indicated by the j-th label.
  • the test picture may be one or more pictures and may not belong to the training data set.
  • an apparatus for training a multi-label classification model is provided, the apparatus being for performing the method of the first aspect or any of the possible implementations of the first aspect.
  • the apparatus may comprise means for performing the method of the first aspect or any of the possible implementations of the first aspect.
  • an apparatus for training a multi-label classification model is provided, comprising a memory and a processor, the memory for storing instructions and the processor for executing the instructions stored by the memory, where execution of the instructions stored in the memory causes the processor to perform the method of the first aspect or any of the possible implementations of the first aspect.
  • in a fourth aspect, a computer readable storage medium is provided, storing instructions that, when executed on a computer, cause the computer to perform the method of the first aspect or any of the possible implementations of the first aspect.
  • a computer program product comprising instructions for causing a computer to perform the method of the first aspect or any of the possible implementations of the first aspect, when the computer program product is run on a computer.
  • Figure 1 shows a schematic diagram of the single label classification and multi-label classification problems.
  • FIG. 2 is a schematic flowchart of a method for training a multi-label classification model provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a multi-label classification model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a complementary tag in the embodiment of the present application.
  • FIG. 5 is a schematic diagram of a multi-label classification model provided by an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of an apparatus for training a multi-label classification model provided by an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of another apparatus for training a multi-label classification model provided by an embodiment of the present application.
  • Figure 1 shows a schematic diagram of the single label classification and multi-label classification problems.
  • single-label classification often assumes that the sample only corresponds to one category label, that is, it has a unique semantic meaning. However, this assumption may not be true in many practical situations, especially considering the semantic diversity of the objective object itself, since the object is likely to be related to multiple different category labels at the same time. Therefore, in the multi-label problem, as shown in FIG. 1(b), a plurality of related category labels are often used to describe the semantic information corresponding to each object. For example, each image may correspond to multiple semantic labels at the same time, such as “Grass”, “Sky” and “Sea”, and each piece of music may contain a variety of emotions, such as “pleasure” and “easy”.
  • the training data set may correspond to a label set, and the label set may include c different categories of labels related to the training data, and c is a positive integer.
  • the training data set may include D samples and a subset of tags corresponding to each sample, where D is a positive integer. It can be understood that the subset of tags here is a subset of the set of tags. That is, by learning a plurality of samples in a given training data set and a subset of tags corresponding to each sample, a subset of tags of unknown samples can be predicted.
  • the label subset may be represented as a label vector.
  • the label vector of the sample can indicate which labels the sample has or which categories it belongs to. For example, if the label vector of an image is [0 1 0 0 1 0], it means that there are 6 categories, where each element in the label vector represents a category or a label: 0 means that the image does not have this category or label, and 1 means that the image has this category or label. Since the tag vector contains two 1s, there are two kinds of objects in the image, belonging to the second class and the fifth class respectively.
  • each of the D samples in the training data set may correspond to a tag vector whose element y j indicates whether the sample contains the object indicated by the j-th tag, where j ranges from 1 to c. It should be understood that, in the embodiment of the present application, whether the sample includes the object indicated by the j-th label is equivalent to whether the sample has the j-th label.
  • the label vectors of all or part of the samples in the training data set form a label matrix Y, in which each sample contributes one label vector.
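  • The label vector example above, and the assembly of label vectors into a label matrix, can be sketched in plain Python. The snippet assumes that each label vector becomes one column of the c*n matrix, which follows from the stated shape Y c*n; the function names are hypothetical.

```python
def labels_present(label_vector):
    """Return the 1-based categories whose entry in the label vector is 1."""
    return [j + 1 for j, flag in enumerate(label_vector) if flag == 1]


def label_matrix(label_vectors):
    """Stack n label vectors of length c as the columns of a c x n matrix."""
    c = len(label_vectors[0])
    return [[vec[j] for vec in label_vectors] for j in range(c)]


present = labels_present([0, 1, 0, 0, 1, 0])
Y = label_matrix([[0, 1, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0]])
print(present, len(Y), len(Y[0]))
```

  • Here `present` lists the second and fifth categories, matching the example in the text, and `Y` has 6 rows (labels) and 2 columns (samples).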
  • the predictive label vector is the output of the multi-label classifier and represents the classifier's prediction of the categories to which the sample belongs; its dimension is the same as that of the label vector.
  • the value of each element of the prediction label vector is a real value. If the real value exceeds a given threshold, the sample belongs to the category at the corresponding position; otherwise it does not belong to that category.
  • for example, if the predictive label vector is [0.7 0.2 0.1 0.8 1.0 0.0] and the threshold is 0.5, the number at each position is compared with the threshold, and a number greater than the threshold means that the sample belongs to the corresponding category.
  • the categories predicted in this way are the first, fourth and fifth categories. If the label vector corresponding to the predicted label vector is [1 0 0 1 1 0], the predicted label vector is completely correct.
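  • The thresholding step in this example can be written as a short sketch. The function name `predict_categories` is hypothetical, and "exceeds the threshold" is read as a strict comparison.

```python
def predict_categories(pred_vector, threshold=0.5):
    """Return the 1-based positions whose confidence exceeds the threshold."""
    return [j + 1 for j, score in enumerate(pred_vector) if score > threshold]


predicted = predict_categories([0.7, 0.2, 0.1, 0.8, 1.0, 0.0])
print(predicted)  # [1, 4, 5]
```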
  • the tag information corresponding to the samples in the training data set is likely to be incomplete. That is to say, in the tag matrix of the data, the fact that the sample does not contain a tag does not mean that the sample is not related to the tag in actual situations. Therefore, it is necessary to learn the correlation between the tags by training on the existing data in the data set, and then use the tag correlation to obtain a tag matrix containing richer tag information. The tag matrix containing the richer tag information can then be used to predict the tag information of an unknown sample more accurately.
  • the embodiment of the present application designs a neural network for multi-label classification, which can implement a multi-label classification algorithm by learning a feature mapping matrix, a low rank label correlation matrix, and an optimized feature extraction network.
  • the neural network system is an intelligent recognition system that accumulates training results by means of repeated training to improve the ability to recognize various target objects or sounds.
  • Convolutional neural networks are one of the mainstream directions in the development of neural networks.
  • the convolutional neural network generally includes a Convolutional Layer, a Rectified Linear Units (ReLU) layer, a Pooling layer, and a Fully Connected (FC) layer.
  • the convolutional layer, the ReLU layer and the Pooling layer may be repeated multiple times.
  • the convolutional layer can be considered as the core of a convolutional neural network, and when used for image recognition, its input receives image data for identifying the image by a filter.
  • the image data here may be the image conversion result captured by the camera, or may be the processing result of the layer before the convolution layer.
  • the image data is a three-dimensional image array, such as 32x32x3, where 32x32 is the two-dimensional size of the image represented by the image data, i.e. width and height, and the depth value 3 arises because the image is usually divided into three data channels: green, red and blue.
  • a plurality of filters are provided in the convolution layer, and different filters correspond to different image features (boundary, color, shape, etc.) to scan the input image data in a certain step size.
  • Different weight matrices are provided in different filters, which are generated for a specific image feature in the learning process of the neural network.
  • Each filter scans an area of each image and obtains a three-dimensional input matrix (MxNx3, M and N determine the size of the scan area).
  • the convolutional network computes the dot product of the input matrix and the weight matrix to obtain a result value, and the filter then moves on to scan the next area in a specific step size, for example, moving by two positions.
  • when a filter has scanned all regions according to a specific step size, the resulting values form a two-dimensional matrix; and when all filters have finished scanning, the resulting values form a three-dimensional matrix as the output of the current convolutional layer.
  • each depth layer of this matrix corresponds to the scan result of one filter (i.e., the two-dimensional matrix formed after that filter's scan).
  • the output of the convolutional layer is sent to the ReLU layer for processing (the value range of the output is limited by the max(0,x) function) and then to the Pooling layer, which reduces its size by downsampling. Before being sent to the FC layer, the image data may also go through multiple convolutional layers to characterize the image features in depth (for example, the first convolutional layer identifies only the outline features of the image, the second convolutional layer begins to recognize patterns, and so on), and it finally enters the FC layer. The FC layer is similar to the convolutional layer but slightly different: it also weights the input data through multiple filters, but its filters do not scan the input step by step as the filters of the convolutional layer do.
  • the final FC layer outputs a 1x1xN matrix, which is actually a data sequence. Each position of the data sequence corresponds to a different target object, and the value at that position can be regarded as the score of that target object.
  • weight matrices are used in both the convolutional layers and the FC layer, and the neural network can maintain multiple weight matrices through self-training.
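  • The filter scan and the ReLU step described above can be sketched in plain Python. The snippet is a simplified illustration with one filter, no padding, and nested lists indexed as [row][column][channel]; the function names are hypothetical.

```python
def conv2d_single_filter(image, kernel, stride=1):
    """Slide one filter over an H x W x C image; return a 2-D result matrix."""
    height, width = len(image), len(image[0])
    m, n = len(kernel), len(kernel[0])
    out = []
    for top in range(0, height - m + 1, stride):
        row = []
        for left in range(0, width - n + 1, stride):
            total = 0.0
            # Dot product of the scanned M x N x C area with the filter weights.
            for i in range(m):
                for j in range(n):
                    for ch in range(len(kernel[i][j])):
                        total += image[top + i][left + j][ch] * kernel[i][j][ch]
            row.append(total)
        out.append(row)
    return out


def relu(matrix):
    """Apply max(0, x) to every element of a 2-D matrix."""
    return [[max(0.0, value) for value in row] for row in matrix]
```

  • Running several filters and stacking their two-dimensional results along a third axis would give the three-dimensional output of a convolutional layer.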
  • FIG. 2 is a schematic flowchart of a method for training a multi-label classification model provided by an embodiment of the present application. It should be understood that FIG. 2 illustrates steps or operations of a method of training a multi-label classification model, but these steps or operations are merely examples, and other embodiments of the present application may also perform other operations or variations of the various operations in FIG. 2. Moreover, the various steps in FIG. 2 may be performed in a different order than that presented in FIG. 2, and it is possible that not all operations in FIG. 2 are to be performed.
  • FIG. 3 is a schematic diagram of a multi-label classification model 300 provided by an embodiment of the present application.
  • the multi-label classification model 300 is specifically a neural network system.
  • the multi-label classification model 300 includes a feature extraction network 301, an FCM 302, a mapping network 31, and a processing unit 305, wherein the mapping network 31 can include an FCW 303 and an FCH 304.
  • the multi-label classification model 300 illustrated in FIG. 3 is merely an example, and embodiments of the present application may further include other modules or units or variations of the various modules or units in FIG. 3.
  • the multi-label classification method in the embodiment of the present application can be applied to multiple fields such as image annotation, image recognition, voice recognition, and text classification.
  • the samples in the corresponding training data set may be images, sounds, documents, and the like. This embodiment of the present application does not limit this. For convenience of description, the following description will be made by taking image recognition using image samples as an example, but this does not limit the scheme of the embodiment of the present application.
  • the weights of the multi-label classification model 300 are initialized, that is, the weights of the feature extraction network 301, the FCM 302, and the mapping network 31 (i.e., FCW 303 and FCH 304) in the system are initialized.
  • the feature extraction network 301 may be any neural network capable of extracting image features, and may be, for example, a convolutional neural network or a multi-layer perceptron, which is not limited by the embodiment of the present application.
  • the weight of the feature extraction network 301 can be represented as Z.
  • Z can include multiple weight matrices.
  • the parameters of the weight matrix can be randomly initialized, or pre-trained model parameters can be used.
  • the pre-trained model parameters refer to the parameters of the already trained model, such as the model parameters trained by the vgg16 network on the ImageNet data set.
  • the FCM 302 is a fully connected layer whose weight matrix is the feature mapping matrix M c*d, where M c*d can represent the correlation weights between feature attributes and category labels in the multi-label classification model, and its initial value can be randomly generated.
  • the FCW 303 is a fully connected layer, and the FCH 304 is a fully connected layer whose weight matrix is H c*r.
  • the initial values of H c*r can be randomly generated.
  • r is a user-set value and needs to satisfy r ≤ c.
  • the embodiment of the present application can input the entire data set for training in batches. That is to say, in the embodiment of the present application, the model may be trained by inputting part of the data in the data set by multiple batches, wherein each input data may be randomly extracted from the image samples that are not input in the data set. Since the training data set usually includes a large number of samples, the embodiment of the present application can reduce the occupation of resources in the process of training the model by inputting the training data set in batches.
  • the number of samples of one batch input to the multi-label classification model 300 may be n.
  • the n samples may be represented as image_n, and more specifically, image_n may be n pictures randomly extracted from D samples of the training data set, and the value of n may be much smaller than D.
  • the size of n can be determined based on the capabilities of the multi-label classification model 300. For example, if the data processing capability of the multi-label classification model 300 is strong, n can be set relatively large to shorten the training model time. For another example, if the data processing capability of the multi-label classification model 300 is weak, n can be set relatively small to reduce the resources consumed by the training model. In this way, the embodiment of the present application can flexibly set the value of n according to the data processing capability of the multi-label classification model 300.
  • the label matrix corresponding to the n samples may be represented as Y c*n
  • the element y i*j in the label matrix Y c*n indicates whether the i-th sample contains the object indicated by the j-th label, where i ranges from 1 to n and j ranges from 1 to c.
  • the description of the label matrix can be referred to the above description. To avoid repetition, details are not described herein again.
  • the training data may be input to the multi-label classification model 300 shown in FIG. 3.
  • the n pictures in the training data set and the label matrix Y c*n of the n pictures may be input to the multi-label classification model 300, respectively.
  • the n pictures can be input to the feature extraction network 301, and the feature extraction network 301 can extract the features of the n pictures through the functions of a convolution layer, an activation function layer, a Pooling layer, a fully connected layer, and a Batchnorm layer, and output the feature matrix X d*n.
  • d is a positive integer and represents the feature dimension of the feature matrix X d*n .
  • the feature matrix X d*n output by the feature extraction network 301 can be input to the FCM 302. Since the FCM 302 is a fully connected layer whose weight matrix is the feature mapping matrix M c*d, and M c*d can represent the correlation weights between feature attributes and category labels in the multi-label classification model, the FCM 302 can map the input feature matrix X d*n to the prediction label space to obtain a prediction label matrix. That is, the prediction label matrix is the product of M c*d and X d*n.
  • the prediction label matrix can be a label matrix with richer label information, each element of which indicates the confidence that the i-th sample contains the object indicated by the j-th label.
  • the label matrix Y c*n of the n samples may be input to the mapping network 31, and the output of the mapping network 31 is a low rank label matrix that is related to the label matrix Y c*n through label correlation.
  • the weight matrix of the mapping network 31 is a label correlation matrix S, which is used to describe the relationship between the c labels; that is, the low rank label matrix is the product of S and Y c*n.
  • when there is a correlation between the elements of a matrix, the matrix is low rank. Since each element in the tag correlation matrix S describes the relationship between two tags, the tag correlation matrix S is a low rank matrix. Specifically, the rank of a low rank matrix is smaller than the number of its rows or columns. The missing elements of such a matrix can be recovered according to its low rank structure, and this recovery process can be called matrix completion. The low rank label matrix can therefore be called the completion label matrix, and it is likely to contain richer tag information. Each element of the completion label matrix can indicate the confidence that the i-th sample contains the object indicated by the j-th label.
  • FIG. 4 is a schematic structural diagram of a complementary tag in the embodiment of the present application. Assume that, in the original incomplete label matrix Y, picture 1 is known to contain only the label "fish", and that the completion label matrix is then obtained from the label correlation structure by the method in (2). Because the label "fish" is found to be strongly correlated with the label "ocean" during construction, the completion label matrix is more likely to predict that picture 1 also contains the "ocean" label.
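  • The "fish" and "ocean" example can be illustrated with a toy label correlation matrix. The three labels and every numeric value in `S` below are invented for illustration and are not taken from the application; in the described method S is learned during training.

```python
import numpy as np

labels = ["fish", "ocean", "office"]

# Hypothetical label correlation matrix S (rank 2): "fish" and "ocean"
# are strongly correlated, "office" is unrelated to both.
S = np.array([
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
])

y = np.array([1.0, 0.0, 0.0])    # picture 1 is annotated with "fish" only
y_completed = S @ y              # one column of the completion label matrix

for name, score in zip(labels, y_completed):
    print(name, score)
```

  • The completed vector assigns a non-zero confidence to "ocean" even though the original annotation did not include it, while "office" stays at zero.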
  • the mapping network 31 may specifically include an FCW 303 and an FCH 304.
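  • because the rank of the matrix obtained by multiplying two matrices is no greater than the rank of either of the two matrices, setting the size of r (i.e., r ≤ c) bounds the rank of the weight matrix of the FCW 303 and of H c*r, which in turn makes their product low rank. That is, the label correlation matrix S is low rank, and the optimal value of r can be found through multiple training runs.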
  • the input of the FCW 303 is the label matrix Y c*n corresponding to the image_n, and the output of the FCW 303 can be represented as P r*n. The P r*n can be directly input to the FCH 304, and the FCH 304 outputs the low rank label matrix.
  • the processing unit 305 can update the weight parameter Z, the feature mapping matrix M c*d and the low rank label correlation matrix S based on the label matrix Y c*n, the prediction label matrix, and the low rank label matrix, to train the multi-label classification model 300.
  • the processing unit 305 can determine the Euclidean distance loss function between the prediction label matrix and the low rank label matrix as the first loss function, whose role is to constrain the prediction label matrix and the low rank label matrix to be similar. In the Euclidean distance loss function, A ij denotes an element of the matrix A.
  • the processing unit 305 can also determine the Euclidean distance loss function between the label matrix Y c*n and the low rank label matrix as the second loss function.
  • the sum of the first loss function, the second loss function and the regular term may be determined as the loss function L n of the n samples.
  • the loss function L n may also be referred to as an optimization function L n .
  • the expression of L n is as follows:
  • the first term of the optimization function L n is the first loss function described above
  • the second term is the second loss function described above.
  • the third term is a regularization term that constrains the weight parameter Z and the feature mapping matrix M c*d to prevent overfitting.
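  • The three terms can be combined numerically as in the sketch below. The function name `optimization_function`, the coefficient `alpha`, the squared Frobenius form of the regularizer, and the unweighted sum are assumptions; the regularization of the feature extraction weights Z is omitted for brevity:

```python
import numpy as np

def optimization_function(Y, Y_pred, Y_low, M, alpha=1e-3):
    """Sum of the two Euclidean distance losses and a regularization term.

    alpha and the squared-Frobenius regularizer are assumptions; the
    regularization of the feature extraction weights Z is left out.
    """
    first_loss = np.sum((Y_pred - Y_low) ** 2)   # prediction vs. low rank labels
    second_loss = np.sum((Y - Y_low) ** 2)       # given labels vs. low rank labels
    regularizer = alpha * np.sum(M ** 2)         # constrains M c*d
    return first_loss + second_loss + regularizer

Y = np.array([[1.0, 0.0], [0.0, 1.0]])        # c = 2 labels, n = 2 samples
Y_pred = np.array([[0.9, 0.2], [0.1, 0.7]])   # prediction label matrix
Y_low = np.array([[1.0, 0.1], [0.2, 0.8]])    # low rank label matrix
M = np.ones((2, 3))                           # c = 2, d = 3

print(optimization_function(Y, Y_pred, Y_low, M))
```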
  • the error back propagation algorithm is a method for multi-layer neural network training. Based on the gradient descent method, the weight of each layer of the neural network is learned and updated by optimizing the loss function.
  • the error back propagation algorithm may be used to minimize the loss function L n. The weight parameter Z, the feature mapping matrix M c*d, and the weight matrix S corresponding to the minimum value of the optimization function are used as the updated weight parameter Z, the updated feature mapping matrix M c*d, and the updated weight matrix S, respectively.
  • the derivatives with respect to the variables in (9) are derived below, taking a single input picture and a regularization term that uses the l 2 norm as an example.
  • the square of the Frobenius norm of the matrix corresponds to the square of the l 2 norm of the vector
  • the l 2,1 norm of the matrix corresponds to the l 2 norm of the vector
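  • Both correspondences can be checked numerically for a vector stored as a single-column matrix. The sketch assumes the convention that the l 2,1 norm is the sum of the l 2 norms of the columns; some texts sum over rows instead:

```python
import numpy as np

x = np.array([3.0, 4.0])          # a vector with d = 2
A = x.reshape(-1, 1)              # the same values as a d*1 matrix

frobenius_sq = np.linalg.norm(A, "fro") ** 2   # squared Frobenius norm of the matrix
l2_sq = np.linalg.norm(x) ** 2                 # squared l 2 norm of the vector

# l 2,1 norm taken here as the sum of the l 2 norms of the columns
# (an assumed convention).
l21 = np.sum(np.linalg.norm(A, axis=0))

print(frobenius_sq, l2_sq, l21)
```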
  • where m ji is an element of the matrix M c*d; h kj is an element of the matrix H c*r; x i is an element of the vector x d; w ji is an element of the matrix W; and p j is an element of the vector p.
  • the two remaining symbols denote elements of the prediction label vector and of the low rank label vector, respectively.
  • the back-propagated error for the feature extraction network weight Z can be obtained through M c*d.
  • the elements of M c*d, H c*r, and W are updated as follows:
  • the update for m ji distinguishes the value obtained by the current update from the value obtained by the previous update; the updates for h ji and w ji are similar.
  • the three learning rates, subscripted 1, 2, and 3, belong to M c*d, H c*r, and W respectively, and are used to control the update rate.
  • the update of the feature extraction network part weight Z is similar.
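  • One gradient descent step for the three weight matrices can be sketched as follows, holding the features X fixed so that Z is not updated. The loss form (unweighted squared Frobenius terms with coefficient `alpha`), the names `lr_m`, `lr_h`, `lr_w` for the three learning rates, and the r*c shape of `W` are assumptions rather than the expressions of the application:

```python
import numpy as np

def loss(M, H, W, X, Y, alpha):
    Y_pred, Y_low = M @ X, H @ (W @ Y)
    return (np.sum((Y_pred - Y_low) ** 2)
            + np.sum((Y - Y_low) ** 2)
            + alpha * np.sum(M ** 2))

def gradients(M, H, W, X, Y, alpha):
    P = W @ Y                        # FCW output, r*n
    Y_pred, Y_low = M @ X, H @ P     # prediction and low rank label matrices
    E1 = Y_pred - Y_low              # residual of the first loss
    G = 2.0 * ((Y_low - Y) - E1)     # derivative w.r.t. Y_low from both losses
    grad_M = 2.0 * (E1 @ X.T + alpha * M)
    grad_H = G @ P.T
    grad_W = H.T @ G @ Y.T
    return grad_M, grad_H, grad_W

def update(M, H, W, X, Y, alpha=1e-3, lr_m=1e-3, lr_h=1e-3, lr_w=1e-3):
    """One gradient descent step with a separate learning rate per matrix."""
    grad_M, grad_H, grad_W = gradients(M, H, W, X, Y, alpha)
    return M - lr_m * grad_M, H - lr_h * grad_H, W - lr_w * grad_W

rng = np.random.default_rng(0)
c, d, r, n = 4, 5, 2, 3
X = rng.normal(size=(d, n))                        # stand-in features
Y = rng.integers(0, 2, size=(c, n)).astype(float)  # label matrix
M, H, W = (rng.normal(scale=0.1, size=s) for s in [(c, d), (c, r), (r, c)])

before = loss(M, H, W, X, Y, 1e-3)
M, H, W = update(M, H, W, X, Y)
after = loss(M, H, W, X, Y, 1e-3)
print(before, after)
```

  • In a full model the same residuals would be propagated further back through M c*d into the feature extraction network to update Z.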
  • the updated feature mapping matrix M c*d improves the ability to classify multiple labels, and missing labels can also be completed by using label correlation.
  • 270. Determine whether the stop condition is reached.
  • the stop condition is that L n no longer decreases, or that the decrease is less than a preset threshold, or that the maximum number of training rounds is reached. If the stop condition is not reached, steps 220 through 260 are repeated until it is. In the embodiment of the present application, inputting all the pictures once counts as one round of training, and several rounds of training are usually needed.
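  • The stop condition can be sketched as a small helper. The function name `should_stop` and the default values of `threshold` and `max_rounds` are illustrative assumptions:

```python
def should_stop(loss_history, threshold=1e-4, max_rounds=100):
    """Return True when training should stop.

    loss_history holds the value of L n after each completed round.
    threshold and max_rounds are illustrative defaults, not values from the text.
    """
    if len(loss_history) >= max_rounds:          # maximum number of rounds reached
        return True
    if len(loss_history) < 2:                    # not enough rounds to compare
        return False
    decrease = loss_history[-2] - loss_history[-1]
    return decrease < threshold                  # no longer decreasing, or too slowly

print(should_stop([1.0, 0.5]))          # still improving
print(should_stop([1.0, 0.99999]))      # improvement below the threshold
```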
  • the test picture is input to the feature extraction network in the neural network model, and the first feature matrix of the test picture is extracted by the feature extraction network. The first feature matrix is then input to the FCM, which computes and outputs the prediction label matrix of the first feature matrix; an element of the prediction label matrix represents the confidence that the test picture includes the object indicated by the jth label.
  • the test picture may be one or more pictures and may not belong to the training data set.
  • the preset threshold may be 0.5, or other values, which are not limited by the embodiment of the present application.
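  • A minimal sketch of turning confidences into labels is given below. It assumes that a label is predicted when its confidence exceeds the preset threshold; the function name `predict_labels` and the label names are hypothetical:

```python
import numpy as np

def predict_labels(Y_pred, label_names, threshold=0.5):
    """Return, for each sample (column), the labels whose confidence exceeds threshold."""
    results = []
    for column in Y_pred.T:                      # one column per test picture
        results.append([name for name, conf in zip(label_names, column)
                        if conf > threshold])
    return results

label_names = ["sheep", "grass", "office"]      # hypothetical label set
Y_pred = np.array([[0.9, 0.1],                  # confidences, shape c*n
                   [0.7, 0.2],
                   [0.05, 0.8]])

print(predict_labels(Y_pred, label_names))
```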
  • the neural network system provided by the embodiment of the present application can train the model directly from the input data without an additional intermediate step; that is, the neural network system is an end-to-end neural network system.
  • the advantage of end-to-end training is that the feature extraction network, the feature mapping matrix, and the low rank label correlation matrix can be optimized at the same time. That is, the embodiment of the present application can learn image features dynamically, so that the feature extraction network is better suited to the task requirements and the multi-label classification effect is good.
  • the embodiment of the present application can calculate the low rank label correlation matrix and the feature mapping matrix by using the image features of the picture samples in batches, without having to use the image features of the entire data set as input, that is, without training on the image features of all samples at once. This greatly reduces the memory resource requirements in the process of training the model, and can effectively solve the problem of computing the low rank label correlation matrix on large-scale data.
  • FIG. 5 is a schematic diagram of a multi-label classification model 500 provided by an embodiment of the present application.
  • the feature extraction network portion of the model 500 employs a VGG16 network, and the output of the Dropout layer after the penultimate fully connected layer of the VGG16 network is taken as the feature matrix X.
  • the weight parameter Z of the feature extraction network is initialized with the weight parameters pre-trained on the ImageNet dataset and then fine-tuned (fine-tuning refers to fixing the weights of the earlier layers, or adjusting them only slightly, while fully training the last one or two layers of the network).
  • the initial values of the weight matrices M, H, and W can be initialized with a Gaussian distribution, and the values of M, H, and W are fully trained.
  • the regularization term can use the Frobenius norm.
  • the weights of the feature extraction network VGG16 (excluding the last fully connected layer) are initialized with weights pre-trained on the ImageNet data set.
  • n RGB three-channel pictures image_n with a pixel size of 224*224 are input into the VGG16 network, where 1 ≤ n ≤ N and N is the number of pictures in the training set. The batch can be expressed as a four-dimensional array such as n*C*h*w or h*w*C*n, where C is the number of channels (3 for an RGB image), h is the height of the image (224 pixels), and w is the width of the image (224 pixels).
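  • The two layouts hold the same data with the axes in a different order. A short sketch of converting between them, with a zero-filled array standing in for real pictures and the variable names chosen for illustration:

```python
import numpy as np

n, C, h, w = 2, 3, 224, 224
batch_nchw = np.zeros((n, C, h, w), dtype=np.float32)   # n*C*h*w layout

# Reorder the axes to obtain the h*w*C*n layout of the same data.
batch_hwcn = np.transpose(batch_nchw, (2, 3, 1, 0))

print(batch_nchw.shape, batch_hwcn.shape)
```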
  • the image is then subjected to two fully connected layers and a Dropout layer to obtain an image feature matrix X 4096*n .
  • X 4096*n passes through a fully connected layer (FCM 502) whose weight matrix is M c*4096 , and the prediction label matrix is obtained.
  • Y c*n passes in turn through two fully connected layers (FCW 503 and FCH 504), whose weight matrices are W and H c*r respectively, yielding the low rank label correlation matrix (the product of the two weight matrices) and the low rank label matrix, which carries label correlation.
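  • The shapes in this embodiment can be traced with the sketch below. Random values stand in for the VGG16 features, and the label count `c`, the bottleneck size `r`, the batch size `n`, the r*c shape of `W`, and the standard deviation of the Gaussian initialization are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, r, d = 8, 20, 5, 4096        # batch size, labels, bottleneck, feature dimension

# Stand-in for the VGG16 output; a real run would extract X from the pictures.
X = rng.normal(size=(d, n))                        # X 4096*n
Y = rng.integers(0, 2, size=(c, n)).astype(float)  # Y c*n

# Gaussian initialization of the three weight matrices (scale is an assumption).
M = rng.normal(scale=0.01, size=(c, d))            # FCM 502, M c*4096
W = rng.normal(scale=0.01, size=(r, c))            # FCW 503
H = rng.normal(scale=0.01, size=(c, r))            # FCH 504, H c*r

Y_pred = M @ X          # prediction label matrix, c*n
Y_low = H @ (W @ Y)     # low rank label matrix, c*n
print(Y_pred.shape, Y_low.shape)
```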
  • the processing unit 505 obtains the optimization function based on the label matrix Y c*n, the prediction label matrix, and the low rank label matrix:
  • after the weight parameter Z, the feature mapping matrix M c*d, and the weight matrices W and H c*r are updated, it is determined whether the stop condition is reached; if not, the steps are repeated until the stop condition is reached.
  • the stop condition can be referred to the description above, and to avoid repetition, details are not described herein again.
  • a test picture may be input to the feature extraction network 501; the features of the picture extracted by the feature extraction network are input to the FCM 502, and the prediction label matrix is obtained from the FCM 502.
  • the structure of the feature extraction network may be replaced by other networks, such as AlexNet, GoogLeNet, ResNet, or a custom network.
  • the feature output layer may use the output of a certain layer of the above networks, or several convolutional layers or fully connected layers may be added or removed on that basis.
  • different regularization terms may also be adopted in the embodiments of the present application.
  • the embodiment of the present application does not limit the specific product form, and the method for multi-label classification in the embodiment of the present application can be deployed on a general-purpose computer node.
  • the initially constructed multi-label classification model can be stored in hard disk storage, and the processor and memory run the algorithm to learn from the existing training data set and obtain the multi-label classification model.
  • the multi-label classification model can predict the label of the unknown sample, store the prediction result in the hard disk storage, complete the existing label set, and predict the label corresponding to the unknown sample.
  • FIG. 6 is a schematic block diagram of an apparatus 600 for training a multi-label classification model provided by an embodiment of the present application.
  • Apparatus 600 includes:
  • a determining unit 610, configured to determine, from the training data set, n samples and a label matrix Y c*n corresponding to the n samples, where an element y i*j in the label matrix Y c*n represents whether the ith sample includes the object indicated by the jth label, and c represents the number of labels associated with the samples in the training data set;
  • the extracting unit 620 is configured to extract the feature matrix X d*n of the n samples by using a feature extraction network, where the feature extraction network has a weight parameter Z, and d represents a feature dimension of the feature matrix X d*n ;
  • a first acquiring unit 630, configured to acquire, by using a first mapping network, a prediction label matrix of the feature matrix X d*n, where an element in the prediction label matrix indicates the confidence that the ith sample contains the object indicated by the jth label, and the weight matrix of the first mapping network is the feature mapping matrix M c*d;
  • a second acquiring unit 640, configured to acquire a low rank label matrix of the label matrix Y c*n by using a second mapping network, where the weight matrix of the second mapping network is a low rank label correlation matrix S, and the low rank label correlation matrix S is used to describe the relationship between the c labels;
  • an updating unit 650, configured to update the weight parameter Z, the feature mapping matrix M c*d, and the low rank label correlation matrix S according to the label matrix Y c*n, the prediction label matrix, and the low rank label matrix, so as to train the multi-label classification model;
  • n, c, i, j, and d are all positive integers, and i ranges from 1 to n, and j ranges from 1 to c.
  • the neural network system provided by the embodiment of the present application can train the model directly from the input data without an additional intermediate step; that is, the neural network system is an end-to-end neural network system.
  • the advantage of end-to-end training is that the feature extraction network, the feature mapping matrix, and the low rank label correlation matrix can be optimized at the same time. That is, the embodiment of the present application can learn image features dynamically, making the feature extraction network better suited to the task requirements, and the multi-label classification effect is good.
  • the second mapping network includes a first sub-mapping network and a second sub-mapping network, where the second mapping network, the first sub-mapping network, and the second sub-mapping network have the following relationship:
  • H c*r is the weight matrix of the second sub-mapping network
  • r is a positive integer less than or equal to c.
  • the updating unit 650 is specifically configured to:
  • determine the Euclidean distance loss function between the prediction label matrix and the low rank label matrix as a first loss function;
  • determine the Euclidean distance loss function between the label matrix Y c*n and the low rank label matrix as a second loss function;
  • update the weight parameter Z, the feature mapping matrix M c*d, the weight matrix of the first sub-mapping network, and the weight matrix H c*r of the second sub-mapping network.
  • the updating unit 650 is further specifically configured to:
  • use the weight parameter Z corresponding to the minimum value of the optimization function as the updated weight parameter Z;
  • use the feature mapping matrix M c*d corresponding to the minimum value of the optimization function as the updated feature mapping matrix M c*d;
  • use the weight matrix of the first sub-mapping network corresponding to the minimum value of the optimization function as the updated weight matrix of the first sub-mapping network;
  • use the weight matrix H c*r of the second sub-mapping network corresponding to the minimum value of the optimization function as the updated weight matrix H c*r of the second sub-mapping network.
  • the determining unit 610 is specifically configured to:
  • determine a training data set comprising D samples and the label vector of each of the D samples, where an element y j in the label vector of each sample represents whether that sample includes the object indicated by the jth label, and D is a positive integer greater than n;
  • randomly select n samples from the training data set and generate the label matrix Y c*n of the n samples, where the label matrix Y c*n comprises the label vector of each of the n samples.
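  • The random selection of a batch and the construction of its label matrix can be sketched as follows. The function name `sample_batch` and the D*c storage of the label vectors are assumptions made for illustration:

```python
import numpy as np

def sample_batch(label_vectors, n, rng):
    """Randomly pick n of the D samples and build the label matrix Y c*n.

    label_vectors has shape D*c (one label vector per row); the returned
    matrix stores the chosen label vectors as columns.
    """
    D = label_vectors.shape[0]
    indices = rng.choice(D, size=n, replace=False)   # n distinct sample indices
    Y = label_vectors[indices].T                     # shape c*n
    return indices, Y

rng = np.random.default_rng(0)
label_vectors = rng.integers(0, 2, size=(10, 4))     # D = 10 samples, c = 4 labels
indices, Y = sample_batch(label_vectors, n=3, rng=rng)
print(indices, Y.shape)
```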
  • in the embodiment of the present application, it is not necessary to input the entire training data set for calculation at one time; only the pictures of the current batch need to be input. The entire data set can therefore be input for training in batches. Since the training data set usually includes a large number of samples, inputting it in batches greatly reduces the demand for memory resources in the process of training the model, and can effectively solve the problem of computing the low rank label correlation matrix on large-scale data.
  • the extracting unit 620 is further configured to extract, by using the feature extraction network, a first feature matrix of the first sample, where the first sample does not belong to the n samples;
  • the first acquiring unit 630 is further configured to acquire, by using the first mapping network, a first prediction label matrix of the first feature matrix, where an element in the first prediction label matrix indicates the confidence that the first sample includes the object indicated by the jth label.
  • apparatus 700 for training a multi-label classification model can include a processor 710, a memory 720, and a communication interface 730.
  • the memory 720 can be used to store instructions or codes and the like executed by the processor 710.
  • the processor 710 is configured to execute the method provided by the foregoing method embodiment, and the processor 710 is further configured to control the communication interface 730 to communicate with the outside world.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 710 or an instruction in a form of software.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented as a hardware processor, or may be performed by a combination of hardware and software modules in the processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in memory 720, and processor 710 reads the information in memory 720 and, in conjunction with its hardware, performs the steps of the above method. To avoid repetition, it will not be described in detail here.
  • the apparatus 600 for training the multi-label classification model shown in FIG. 6 or the apparatus 700 for training the multi-label classification model shown in FIG. 7 can implement the respective processes corresponding to the foregoing method embodiments. Specifically, for the apparatus 600 and the apparatus 700, reference may be made to the above description. To avoid repetition, details are not described herein again.
  • the magnitude of the sequence numbers of the foregoing processes does not imply the order of execution; the order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the embodiment of the present application further provides a computer readable storage medium, comprising a computer program, when executed on a computer, causing the computer to execute the method provided by the foregoing method embodiment.
  • the embodiment of the present application further provides a computer program product comprising instructions, when the computer program product is run on a computer, causing the computer to execute the method provided by the foregoing method embodiment.
  • the processor mentioned in the embodiments of the present invention may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory referred to in the embodiments of the present invention may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory can be a Random Access Memory (RAM) that acts as an external cache.
  • many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM).
  • it should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated in the processor.
  • the memories described herein are intended to comprise, without being limited to, these and any other suitable types of memory.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a device for training a multi-label classification model, which can dynamically learn image features, make a feature extraction network better suited to task requirements, and achieve good multi-label classification performance. The method comprises: determining, from a training data set, n samples and a label matrix Yc*n corresponding to the n samples, where an element yi*j in the label matrix Yc*n indicates whether the ith sample includes an object indicated by the jth label, and c denotes the number of labels associated with the samples; using a feature extraction network to extract a feature matrix Xd*n of the n samples; using a first mapping network to acquire a prediction label matrix of the feature matrix Xd*n; using a second mapping network to acquire a low rank label matrix of the label matrix Yc*n; and updating, according to the label matrix Yc*n, the prediction label matrix, and the low rank label matrix, a weight parameter Z, a feature mapping matrix Mc*d, and a low rank label correlation matrix S, so as to train a multi-label classification model.
PCT/CN2018/094309 2017-11-24 2018-07-03 Method and device for training a multi-label classification model WO2019100723A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711187818.9 2017-11-24
CN201711187818.9A CN109840531B (zh) 2017-11-24 2017-11-24 训练多标签分类模型的方法和装置

Publications (1)

Publication Number Publication Date
WO2019100723A1 true WO2019100723A1 (fr) 2019-05-31

Family

ID=66631376

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094309 WO2019100723A1 (fr) 2017-11-24 2018-07-03 Method and device for training a multi-label classification model

Country Status (2)

Country Link
CN (1) CN109840531B (fr)
WO (1) WO2019100723A1 (fr)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11341358B2 (en) * 2019-09-30 2022-05-24 International Business Machines Corporation Multiclassification approach for enhancing natural language classifiers
CN113269215B (zh) * 2020-02-17 2023-08-01 百度在线网络技术(北京)有限公司 一种训练集的构建方法、装置、设备和存储介质
CN113496232B (zh) * 2020-03-18 2024-05-28 杭州海康威视数字技术股份有限公司 标签校验方法和设备
CN111524187B (zh) * 2020-04-22 2023-06-02 北京三快在线科技有限公司 一种视觉定位模型的训练方法及装置
CN111667399B (zh) * 2020-05-14 2023-08-25 华为技术有限公司 风格迁移模型的训练方法、视频风格迁移的方法以及装置
CN111797910B (zh) * 2020-06-22 2023-04-07 浙江大学 一种基于平均偏汉明损失的多维标签预测方法
CN112465126B (zh) * 2020-07-27 2023-12-19 国电内蒙古东胜热电有限公司 用于跑冒滴漏检测的加载预训练卷积网络检测方法及装置
CN112215795B (zh) * 2020-09-02 2024-04-09 苏州超集信息科技有限公司 一种基于深度学习的服务器部件智能检测方法
CN112353402B (zh) * 2020-10-22 2022-09-27 平安科技(深圳)有限公司 心电信号分类模型的训练方法、心电信号分类方法及装置
CN113222068B (zh) * 2021-06-03 2022-12-27 西安电子科技大学 基于邻接矩阵指导标签嵌入的遥感图像多标签分类方法
CN113076426B (zh) * 2021-06-07 2021-08-13 腾讯科技(深圳)有限公司 多标签文本分类及模型训练方法、装置、设备及存储介质
CN114723987A (zh) * 2022-03-17 2022-07-08 Oppo广东移动通信有限公司 图像标签分类网络的训练方法、图像标签分类方法及设备
CN115205011B (zh) * 2022-06-15 2023-08-08 海南大学 基于lsf-fc算法的银行用户画像模型生成方法
CN115841596B (zh) * 2022-12-16 2023-09-15 华院计算技术(上海)股份有限公司 多标签图像分类方法及其模型的训练方法、装置
CN118364393B (zh) * 2024-06-20 2024-09-10 成都工业学院 一种基于相关性增强特征学习的多标签分类方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120093396A1 (en) * 2010-10-13 2012-04-19 Shengyang Dai Digital image analysis utilizing multiple human labels
CN105825502A (zh) * 2016-03-12 2016-08-03 浙江大学 一种基于显著性指导的词典学习的弱监督图像解析方法
CN107292322A (zh) * 2016-03-31 2017-10-24 华为技术有限公司 一种图像分类方法、深度学习模型及计算机系统

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8805845B1 (en) * 2013-07-31 2014-08-12 LinedIn Corporation Framework for large-scale multi-label classification
US10325220B2 (en) * 2014-11-17 2019-06-18 Oath Inc. System and method for large-scale multi-label learning using incomplete label assignments
CN104899596B (zh) * 2015-03-16 2018-09-14 景德镇陶瓷大学 一种多标签分类方法及其装置
CN105320967A (zh) * 2015-11-04 2016-02-10 中科院成都信息技术股份有限公司 基于标签相关性的多标签AdaBoost集成方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG, ZHEN: "Learning Label Correlations for Multi-Label Classification", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE , CHINA MASTER'S THESES FULL-TEXT DATABASE, 15 September 2015 (2015-09-15), pages 1140 - 75, ISSN: 1674-0246 *
YAO, HONGGE ET AL.: "Intelligent Identification of Image Character Based on Wavelet Analysis and BP Neural Network", JOURNAL OFXI'AN TECHNOLOGICAL UNIVERSITY, vol. 28, no. 6, 31 December 2008 (2008-12-31), ISSN: 1673-9965 *

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10878296B2 (en) * 2018-04-12 2020-12-29 Discovery Communications, Llc Feature extraction and machine learning for automated metadata analysis
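CN110334186A (zh) * 2019-07-08 2019-10-15 北京三快在线科技有限公司 Data query method and apparatus, computer device, and computer-readable storage medium
CN110334186B (zh) * 2019-07-08 2021-09-28 北京三快在线科技有限公司 Data query method and apparatus, computer device, and computer-readable storage medium
CN110647992A (zh) * 2019-09-19 2020-01-03 腾讯云计算(北京)有限责任公司 Training method for convolutional neural network, image recognition method, and corresponding apparatuses
CN111626279B (zh) * 2019-10-15 2023-06-02 西安网算数据科技有限公司 Negative-sample labeling training method and highly automated bill recognition method
CN110929785A (zh) * 2019-11-21 2020-03-27 中国科学院深圳先进技术研究院 Data classification method and apparatus, terminal device, and readable storage medium
CN110929785B (zh) * 2019-11-21 2023-12-05 中国科学院深圳先进技术研究院 Data classification method and apparatus, terminal device, and readable storage medium
CN111199244A (zh) * 2019-12-19 2020-05-26 北京航天测控技术有限公司 Data classification method and apparatus, storage medium, and electronic apparatus
CN111199244B (zh) * 2019-12-19 2024-04-09 北京航天测控技术有限公司 Data classification method and apparatus, storage medium, and electronic apparatus
CN111275183B (zh) * 2020-01-14 2023-06-16 北京迈格威科技有限公司 Visual task processing method and apparatus, and electronic system
CN111275183A (zh) * 2020-01-14 2020-06-12 北京迈格威科技有限公司 Visual task processing method and apparatus, and electronic system
CN112785441B (zh) * 2020-04-20 2023-12-05 招商证券股份有限公司 Data processing method and apparatus, terminal device, and storage medium
CN112785441A (zh) * 2020-04-20 2021-05-11 招商证券股份有限公司 Data processing method and apparatus, terminal device, and storage medium
CN111783831A (zh) * 2020-05-29 2020-10-16 河海大学 Accurate classification method for complex images based on multi-source multi-label shared subspace learning
CN111783831B (zh) * 2020-05-29 2022-08-05 河海大学 Accurate classification method for complex images based on multi-source multi-label shared subspace learning
CN112132188B (zh) * 2020-08-31 2024-04-16 浙江工业大学 E-commerce user classification method based on network attributes
CN112132188A (zh) * 2020-08-31 2020-12-25 浙江工业大学 E-commerce user classification method based on network attributes
CN112365931B (zh) * 2020-09-18 2024-04-09 昆明理工大学 Multi-label data classification method for predicting protein function
CN112365931A (zh) * 2020-09-18 2021-02-12 昆明理工大学 Multi-label data classification method for predicting protein function
CN112182214B (zh) * 2020-09-27 2024-03-19 中国建设银行股份有限公司 Data classification method, apparatus, device, and medium
CN112182214A (zh) * 2020-09-27 2021-01-05 中国建设银行股份有限公司 Data classification method, apparatus, device, and medium
CN112541055A (zh) * 2020-12-17 2021-03-23 中国银联股份有限公司 Method and apparatus for determining text labels
CN112598076A (zh) * 2020-12-29 2021-04-02 北京易华录信息技术股份有限公司 Motor vehicle attribute recognition method and system
CN112598076B (zh) * 2020-12-29 2023-09-19 北京易华录信息技术股份有限公司 Motor vehicle attribute recognition method and system
CN112766383A (zh) * 2021-01-22 2021-05-07 浙江工商大学 Label enhancement method based on feature clustering and label similarity
CN113034406B (zh) * 2021-04-27 2024-05-14 中国平安人寿保险股份有限公司 Distorted document restoration method, apparatus, device, and medium
CN113034406A (zh) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Distorted document restoration method, apparatus, device, and medium
CN113469225A (zh) * 2021-06-16 2021-10-01 浙江工业大学 Image conversion method based on cross-domain feature correlation analysis
CN113469225B (zh) * 2021-06-16 2024-03-22 浙江工业大学 Image conversion method based on cross-domain feature correlation analysis
CN113326698A (zh) * 2021-06-18 2021-08-31 深圳前海微众银行股份有限公司 Method for detecting entity relationships, model training method, and electronic device
CN113326698B (zh) * 2021-06-18 2023-05-09 深圳前海微众银行股份有限公司 Method for detecting entity relationships, model training method, and electronic device
CN113327666A (zh) * 2021-06-21 2021-08-31 青岛科技大学 Multi-label local-to-global learning method for a chest X-ray disease multi-classification network
CN113920210B (zh) * 2021-06-21 2024-03-08 西北工业大学 Image low-rank reconstruction method based on adaptive graph learning principal component analysis
CN113920210A (zh) * 2021-06-21 2022-01-11 西北工业大学 Image low-rank reconstruction method based on adaptive graph learning principal component analysis
CN114401205B (zh) * 2022-01-21 2024-01-16 中国人民解放军国防科技大学 Method and apparatus for detecting drift in unlabeled multi-source network traffic data
CN114401205A (zh) * 2022-01-21 2022-04-26 中国人民解放军国防科技大学 Method and apparatus for detecting drift in unlabeled multi-source network traffic data
CN114648635A (zh) * 2022-03-15 2022-06-21 安徽工业大学 Multi-label image classification method incorporating strong correlations between labels
CN115797709A (zh) * 2023-01-19 2023-03-14 苏州浪潮智能科技有限公司 Image classification method, apparatus, device, and computer-readable storage medium
CN115934809A (zh) * 2023-03-08 2023-04-07 北京嘀嘀无限科技发展有限公司 Data processing method and apparatus, and electronic device
CN117876797A (zh) * 2024-03-11 2024-04-12 中国地质大学(武汉) Multi-label image classification method and apparatus, and storage medium
CN117876797B (zh) * 2024-03-11 2024-06-04 中国地质大学(武汉) Multi-label image classification method and apparatus, and storage medium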

Also Published As

Publication number Publication date
CN109840531B (zh) 2023-08-25
CN109840531A (zh) 2019-06-04

Similar Documents

Publication Publication Date Title
WO2019100723A1 (fr) Method and device for training a multi-label classification model
WO2019100724A1 (fr) Method and device for training a multi-label classification model
CN110866140B (zh) Image feature extraction model training method, image search method, and computer device
CN114529825B (zh) Object detection model, method, and application for fire-passage occupancy object detection
WO2020228446A1 (fr) Model training method and apparatus, terminal, and storage medium
CN111950453B (zh) Arbitrary-shape text recognition method based on a selective attention mechanism
Sameen et al. Classification of very high resolution aerial photos using spectral‐spatial convolutional neural networks
EP3388978B1 (fr) Image classification method, electronic device, and storage medium
CN111523621A (zh) Image recognition method and apparatus, computer device, and storage medium
US10169683B2 (en) Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium
US20160224903A1 (en) Hyper-parameter selection for deep convolutional networks
Jiang et al. Hyperspectral image classification with spatial consistence using fully convolutional spatial propagation network
US20170061254A1 (en) Method and device for processing an image of pixels, corresponding computer program product and computer-readable medium
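CN110222718B (zh) Image processing method and apparatus
WO2020077940A1 (fr) Method and device for automatic identification of image labels
CN112418327B (zh) Training method and apparatus for image classification model, electronic device, and storage medium
JP6107531B2 (ja) Feature extraction program and information processing apparatus
CN113326930A (zh) Data processing method, neural network training method, and related apparatus and device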
US11842540B2 (en) Adaptive use of video models for holistic video understanding
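CN110968734A (zh) Person re-identification method and apparatus based on deep metric learning
CN113095370A (zh) Image recognition method and apparatus, electronic device, and storage medium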
Huo et al. Semisupervised learning based on a novel iterative optimization model for saliency detection
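CN113705596A (zh) Image recognition method and apparatus, computer device, and storage medium
CN109101984B (zh) Image recognition method and apparatus based on convolutional neural network
CN112749737A (zh) Image classification method and apparatus, electronic device, and storage medium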

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18881329; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 18881329; Country of ref document: EP; Kind code of ref document: A1