WO2019100724A1 - Method and apparatus for training a multi-label classification model - Google Patents

Method and apparatus for training a multi-label classification model

Info

Publication number
WO2019100724A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
label
feature
samples
network
Prior art date
Application number
PCT/CN2018/094400
Other languages
English (en)
French (fr)
Inventor
刘晓阳
胡晓林
王月红
曹忆南
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2019100724A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present application relates to the field of computers and, more particularly, to methods and apparatus for training a multi-label classification model in the computer field.
  • due to the complexity and ambiguity of real-world objects, many objects in real life may be related to multiple category labels at the same time.
  • an appropriate subset of tags (including multiple related semantic tags) is often used to describe the object, which forms the so-called multi-label classification problem.
  • each sample corresponds to a related subset of tags consisting of multiple tags.
  • the goal of learning is to predict the corresponding subset of tags for unknown samples.
  • the set of training data is called a training data set.
  • because the labels in the training data set are annotated by different people, or some objects are overlooked during labeling, labels may be missing. The accuracy of multi-label classification can therefore be improved by completing the labels in the training data set.
  • there are several methods for completing known labels in multi-label classification. One of them constrains the rank of the prediction label matrix with a nuclear norm and calculates the feature mapping matrix by minimizing the loss function of multi-label classification. The low-rank prediction label matrix implements label completion, thereby improving the performance of multi-label classification.
  • this method needs to extract the features of the image first, and then calculate the feature mapping matrix according to the features of the image. After the features of the image are extracted, the features of the image are fixed and thus the feature information of the input image cannot be dynamically learned from the tags.
  • the present application provides a method and apparatus for training a multi-label classification model, which can dynamically learn image features, make the feature extraction network better suited to the task requirements, and achieve a good multi-label classification effect.
  • a method of training a multi-label classification model comprising:
  • the element y i*j in the label matrix Y c*n indicates whether the i-th sample contains the object indicated by the j-th label, and c represents the number of labels associated with the samples in the training data set.
  • the feature matrix X d*n of the n samples is extracted by a feature extraction network, wherein the feature extraction network has a weight parameter Z, and d represents a feature dimension of the feature matrix X d*n .
  • the feature extraction network may be any neural network capable of extracting image features, and may be, for example, a convolutional neural network or a multi-layer perceptron, which is not limited by the embodiment of the present application.
  • the weight of the feature extraction network may be represented as Z. Specifically, Z may include multiple weight matrixes.
  • the parameters of the weight matrix can be randomly initialized, or pre-trained model parameters can be used.
  • the pre-trained model parameters refer to the parameters of the already trained model, such as the model parameters trained by the vgg16 network on the ImageNet data set.
  • the weight matrix of the feature mapping network is a low-rank feature mapping matrix M c*d
  • M c*d may represent the correlation weights between feature attributes and category labels in the multi-label classification model, and its initial values can be randomly generated.
  • the feature mapping network may be a mapping network in which the weight matrix is a low-rank feature mapping matrix M c*d , and may be, for example, a fully connected layer.
  • the feature mapping network can be represented as FCM.
  • the feature matrix X d*n output by the feature extraction network can be input to the FCM, and the FCM then maps the input feature matrix X d*n to the prediction label space to obtain a prediction label matrix Ŷ c*n. That is:
    $$\hat{Y}_{c\times n} = M_{c\times d}\, X_{d\times n}$$
  • the weight parameter Z and the feature mapping matrix M c*d are updated to train the multi-label classification model.
  • n, c, i, j, and d are all positive integers, and i ranges from 1 to n, and j ranges from 1 to c.
  • the neural network system provided by the embodiment of the present application can train the model directly from the input data without an additional intermediate step, that is, the neural network system is an end-to-end neural network system.
  • the advantage of end-to-end training is that the feature extraction, the feature mapping matrix, and the low-rank label correlation matrix can be optimized at the same time. That is, the embodiment of the present application can dynamically learn image features, make the feature extraction network better suited to the task requirements, and achieve a good multi-label classification effect.
  • the low-rank feature mapping network includes a first sub-mapping network and a second sub-mapping network, where the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network have the following relationship:
    $$M_{c\times d} = H_{c\times r}\, W_{r\times d}$$
  • the weight matrix of the first sub-mapping network is W r*d and the weight matrix of the second sub-mapping network is H c*r, where, to ensure the low rank of M c*d, W r*d, and H c*r, r can be set to a positive integer with r ≤ min(d, c).
  • the first sub-mapping network may be a fully connected layer whose weight matrix is W r*d
  • the second sub-mapping network may be a fully connected layer whose weight matrix is H c*r
  • the initial values of W r*d and H c*r can be randomly generated.
  • the label matrix can be completed by a matrix low-rank decomposition method, that is, low-rank decomposition is performed on the prediction label matrix Ŷ c*n, i.e.:
    $$\hat{Y}_{c\times n} = H_{c\times r}\, W_{r\times d}\, X_{d\times n}$$
  • the optimal value of r can be found by training multiple times.
  • the embodiment of the present application can map X (i.e., X d*n) to obtain the prediction label matrix Ŷ (i.e., Ŷ c*n) by using a preset feature mapping matrix M (i.e., M c*d), that is, Ŷ = MX. Because the rank of Ŷ is less than or equal to the rank of M or of X, making M low rank also guarantees that Ŷ is low rank. A low-rank decomposition can therefore be applied to M, which is the above formula (2); this is equivalent to decomposing Ŷ into a product of two low-dimensional matrices, thereby ensuring that Ŷ is low rank.
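  • the rank argument above can be checked numerically. The following is a minimal sketch, assuming NumPy and small hypothetical sizes for c, d, n, and r; the random matrices only stand in for the feature matrix and the two sub-mapping weight matrices and are not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
c, d, n, r = 6, 8, 10, 2   # labels, feature dim, samples, chosen rank (all hypothetical)

H = rng.standard_normal((c, r))   # stands in for the second sub-mapping weights
W = rng.standard_normal((r, d))   # stands in for the first sub-mapping weights
X = rng.standard_normal((d, n))   # stands in for the feature matrix, one column per sample

M = H @ W          # feature mapping matrix built from the two factors
Y_hat = M @ X      # prediction label matrix

rank_M = np.linalg.matrix_rank(M)
rank_Y_hat = np.linalg.matrix_rank(Y_hat)
print(rank_M, rank_Y_hat)
```

  Because M is a product with inner dimension r, its rank cannot exceed r, and the rank of the product M X cannot exceed the rank of M.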
  • updating the weight parameter Z and the feature mapping matrix M c*d according to the label matrix Y c*n and the prediction label matrix Ŷ c*n includes:
  • updating the weight parameter Z, the weight matrix W r*d, and the weight matrix H c*r includes:
  • the first term of the optimization function L n is the above loss function
  • the second term is a regular term
  • the regular term is used to constrain the weight parameter Z and the weight matrices W r*d and H c*r to prevent overfitting.
  • the error back propagation algorithm may be used to minimize the optimization function L n. The weight parameter Z corresponding to the minimum value of the optimization function is taken as the updated weight parameter Z, and the weight matrix W r*d corresponding to the minimum value of the optimization function is taken as the updated weight matrix W r*d
  • the weight matrix H c*r corresponding to the minimum value of the optimization function is taken as the updated weight matrix H c*r .
  • the stop condition is: L n no longer decreases, or the decrease is smaller than a preset threshold, or the maximum number of training iterations is reached. If the stop condition is not met, training is repeated until it is.
  • all the pictures are input once and counted as one round of training, and usually several rounds need to be trained.
  • the determining, in the training data set, the label matrix Y c*n of the n samples and the n samples includes:
  • a training data set comprising D samples and a label vector for each of the D samples, wherein an element y j in the label vector of each sample indicates whether that sample contains the object indicated by the j-th label, and D is a positive integer not less than n;
  • n samples are randomly extracted from the training data set and the label matrix Y c*n of the n samples is generated, where the label matrix Y c*n comprises the label vector corresponding to each of the n samples.
  • the embodiment of the present application can input the entire data set for training in batches. That is to say, in the embodiment of the present application, the model may be trained by inputting part of the data in the data set by multiple batches, wherein each input data may be randomly extracted from the image samples that are not input in the data set. Since the training data set usually includes a large number of samples, the embodiment of the present application can reduce the occupation of resources in the process of training the model by inputting the training data set in batches, thereby greatly reducing the demand for memory resources in the process of training the model. It can effectively solve the calculation problem of low rank label correlation matrix under large-scale data.
  • the method further includes: extracting, by using the feature extraction network, a first feature matrix of the first sample, where the first sample does not belong to the n samples;
  • the test picture is input only to the feature extraction network in the neural network model, and the first feature matrix of the test picture is extracted by the feature extraction network. The first feature matrix is then input to the feature mapping network (which may specifically include FCW and FCH), and the feature mapping network obtains and outputs the prediction label matrix of the first feature matrix. Each element in the prediction label matrix indicates the confidence that the test picture contains the object indicated by the j-th label.
  • the test picture may be one or more pictures and may not belong to the training data set.
  • an apparatus for training a multi-label classification model is provided, the apparatus being configured to perform the method of the first aspect or any possible implementation of the first aspect.
  • the apparatus may comprise means for performing the method of the first aspect or any of the possible implementations of the first aspect.
  • an apparatus for training a multi-label classification model comprising a memory and a processor, the memory being configured to store instructions and the processor being configured to execute the instructions stored in the memory, where execution of the instructions stored in the memory causes the processor to perform the method of the first aspect or any of the possible implementations of the first aspect.
  • in a fourth aspect, a computer readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method of the first aspect or any of the possible implementations of the first aspect.
  • a computer program product comprising instructions for causing a computer to perform the method of the first aspect or any of the possible implementations of the first aspect, when the computer program product is run on a computer.
  • Figure 1 shows a schematic diagram of the single label classification and multi-label classification problems.
  • FIG. 2 is a schematic flowchart of a method for training a multi-label classification model provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a multi-label classification model provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a multi-label classification model provided by an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of an apparatus for training a multi-label classification model provided by an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of another apparatus for training a multi-label classification model provided by an embodiment of the present application.
  • Figure 1 shows a schematic diagram of the single label classification and multi-label classification problems.
  • single-label classification often assumes that a sample corresponds to only one category label, that is, it has a unique semantic meaning. However, this assumption may not hold in many practical situations, especially considering the semantic diversity of objects themselves: an object is likely to be related to multiple different category labels at the same time. Therefore, in the multi-label problem, as shown in FIG. 1(b), a plurality of related category labels are often used to describe the semantic information corresponding to each object. For example, each image may correspond to multiple semantic labels at the same time, such as “Grass”, “Sky” and “Sea”, and each piece of music may contain a variety of emotions, such as “pleasure” and “easy”.
  • the training data set may correspond to a label set, and the label set may include c different categories of labels related to the training data, and c is a positive integer.
  • the training data set may include D samples and a subset of tags corresponding to each sample, where D is a positive integer. It can be understood that the subset of tags here is a subset of the set of tags. That is, by learning a plurality of samples in a given training data set and a subset of tags corresponding to each sample, a subset of tags of unknown samples can be predicted.
  • the label subset may be represented as a label vector.
  • the label vector of a sample indicates which labels the sample has, or which categories it belongs to. For example, if the label vector of an image is [0 1 0 0 1 0], there are 6 categories, and each element in the label vector represents one category or label: 0 means that the category or label is absent from the image, and 1 means that it is present. Since the label vector contains two 1 entries, there are two kinds of objects in the image, belonging to the second and the fifth category respectively.
  • each of the D samples in the training data set may correspond to a label vector whose element y j indicates whether the sample contains the object indicated by the j-th label, where j ranges from 1 to c. It should be understood that, in the embodiment of the present application, whether the sample contains the object indicated by the j-th label is the same as whether the sample has the j-th label.
  • the label vectors of all or part of the samples in the training data set are stacked as columns to form a label matrix Y c*n.
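  • a minimal sketch of how label vectors could be assembled into a c x n label matrix, assuming NumPy. The helper name `build_label_matrix` and the representation of each sample as a set of 1-based label indices are assumptions made for illustration only.

```python
import numpy as np

def build_label_matrix(label_sets, num_labels):
    """Stack per-sample 0/1 label vectors as columns of a c x n matrix."""
    Y = np.zeros((num_labels, len(label_sets)), dtype=np.int64)
    for sample_idx, labels in enumerate(label_sets):
        for j in labels:            # j is a 1-based label index, as in the text
            Y[j - 1, sample_idx] = 1
    return Y

# Sample 0 carries labels 2 and 5; sample 1 carries label 1 only.
Y = build_label_matrix([{2, 5}, {1}], num_labels=6)
print(Y[:, 0])
```

  The first column reproduces the six-category label vector used as the example above.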
  • the prediction label vector is the output of the multi-label classifier. It has the same dimensions as the label vector and represents the classifier's prediction of the categories to which the sample belongs.
  • the value of each element of the prediction label vector is a real number. If the value exceeds a given threshold, the sample is considered to belong to the category at that position; otherwise it does not belong to that category.
  • for example, if the prediction label vector is [0.7 0.2 0.1 0.8 1.0 0.0] and the threshold is 0.5, the value at each position is compared with the threshold, and a value greater than the threshold means that the sample belongs to the corresponding category.
  • the categories predicted in this way are the first, fourth and fifth categories. If the label vector corresponding to the prediction label vector is [1 0 0 1 1 0], the prediction label vector is completely correct.
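  • the thresholding step can be sketched as follows, assuming NumPy. The function name `predicted_categories` is hypothetical; the scores and the threshold of 0.5 are the ones used in the example above.

```python
import numpy as np

def predicted_categories(scores, threshold=0.5):
    """Return the 1-based positions whose score exceeds the threshold."""
    scores = np.asarray(scores, dtype=float)
    return (np.flatnonzero(scores > threshold) + 1).tolist()

scores = [0.7, 0.2, 0.1, 0.8, 1.0, 0.0]
print(predicted_categories(scores))  # [1, 4, 5]
```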
  • the label information corresponding to the samples in the training data set is likely to be incomplete. That is to say, in the label matrix of the data, the fact that a sample is not marked with a label does not mean that the sample is unrelated to that label in practice. Therefore, it is necessary to complete the label matrix by training on the existing data in the data set to obtain a prediction label matrix containing richer label information, and then to predict the label information of unknown samples more accurately by using that prediction label matrix.
  • the embodiment of the present application designs a neural network for multi-label classification, which can implement a multi-label classification algorithm by learning a feature mapping matrix and optimizing a feature extraction network.
  • the neural network system is an intelligent recognition system that accumulates training results by means of repeated training to improve the ability to recognize various target objects or sounds.
  • Convolutional neural networks are one of the mainstream directions in the development of neural networks.
  • the convolutional neural network generally includes a Convolutional Layer, a Rectified Linear Units (ReLU) layer, a Pooling layer, and a Fully Connected (FC) layer.
  • the convolutional layer, the ReLU layer and the Pooling layer may be stacked repeatedly.
  • the convolutional layer can be considered as the core of a convolutional neural network, and when used for image recognition, its input receives image data for identifying the image by a filter.
  • the image data here may be the image conversion result captured by the camera, or may be the processing result of the layer before the convolution layer.
  • the image data is a three-dimensional array, such as 32x32x3, where 32x32 is the two-dimensional size (width and height) of the image represented by the image data, and the depth value 3 arises because the image is usually divided into three data channels: red, green and blue.
  • a plurality of filters are provided in the convolution layer, and different filters correspond to different image features (boundary, color, shape, etc.) to scan the input image data in a certain step size.
  • Different weight matrices are provided in different filters, which are generated for a specific image feature in the learning process of the neural network.
  • Each filter scans an area of each image and obtains a three-dimensional input matrix (MxNx3, M and N determine the size of the scan area).
  • the convolutional network takes the dot product of the input matrix and the weight matrix to obtain a result value, and then scans the next area with a specific step size, for example, shifting by two positions.
  • when a filter has scanned all regions according to a specific step size, the resulting values form a two-dimensional matrix; and when all filters have finished scanning, the resulting values form a three-dimensional matrix as the output of the current convolutional layer.
  • the different depth layers of the matrix correspond to the scan results of one filter (ie, the two-dimensional matrix formed after each filter scan).
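  • the scanning procedure described above can be sketched with plain loops, assuming NumPy, no padding, and hypothetical filter sizes. The function name `conv_layer` is illustrative; real frameworks implement the same operation far more efficiently.

```python
import numpy as np

def conv_layer(image, filters, stride=1):
    """image: (H, W, C); filters: (K, M, N, C). Returns (H_out, W_out, K)."""
    height, width, _ = image.shape
    num_filters, m, n, _ = filters.shape
    out_h = (height - m) // stride + 1
    out_w = (width - n) // stride + 1
    out = np.zeros((out_h, out_w, num_filters))
    for k in range(num_filters):          # one depth layer per filter
        for i in range(out_h):
            for j in range(out_w):
                top, left = i * stride, j * stride
                patch = image[top:top + m, left:left + n, :]
                # Dot product of the scanned region with the filter weights.
                out[i, j, k] = np.sum(patch * filters[k])
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))      # a 32x32 image with 3 channels
filters = rng.standard_normal((4, 5, 5, 3))   # 4 hypothetical 5x5x3 filters
feature_maps = conv_layer(image, filters, stride=2)
print(feature_maps.shape)
```

  Each filter yields one two-dimensional slice of the output, and the slices are stacked along the depth axis.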
  • the output of the convolutional layer is sent to the ReLU layer for processing (the value range of the output is limited by the max(0,x) function) and then to the Pooling layer, which reduces its size by downsampling. Before being sent to the FC layer, the image data may also pass through multiple convolutional layers to characterize the image features in depth (for example, the first convolutional layer identifies only the outline features of the image, the second convolutional layer begins to recognize patterns, and so on), and finally enters the FC layer. The FC layer is similar to the convolutional layer but slightly different: it also weights the input data through multiple filters, but its filters do not scan step by step as the convolutional-layer filters do.
  • the final FC layer outputs a 1x1xN matrix, which is in effect a data sequence. Each position of the data sequence corresponds to a different target object, and the value at that position can be regarded as the score of that target object.
  • a weight matrix is used, and the neural network can maintain multiple weight matrices through self-training.
  • FIG. 2 is a schematic flowchart of a method for training a multi-label classification model provided by an embodiment of the present application. It should be understood that FIG. 2 illustrates steps or operations of a method of training a multi-label classification model, but these steps or operations are merely examples, and other embodiments of the present application may also perform other operations or variations of the various operations in FIG. 2. Moreover, the various steps in FIG. 2 may be performed in a different order than that presented in FIG. 2, and it is possible that not all operations in FIG. 2 are to be performed.
  • FIG. 3 is a schematic diagram of a multi-label classification model 300 provided by an embodiment of the present application.
  • the multi-label classification model 300 is specifically a neural network system.
  • the multi-label classification model 300 includes a feature extraction network 301, a feature mapping network 302, and a processing unit 305, wherein the feature mapping network 302 can include a FCW 303 and an FCH 304.
  • the multi-label classification model 300 illustrated in FIG. 3 is merely an example, and embodiments of the present application may further include other modules or units or variations of the various modules or units in FIG. 3.
  • the multi-label classification method in the embodiment of the present application can be applied to multiple fields such as image annotation, image recognition, voice recognition, and text classification.
  • the samples in the corresponding training data set may be images, sounds, documents, and the like. This embodiment of the present application does not limit this. For convenience of description, the following description will be made by taking image recognition using image samples as an example, but this does not limit the scheme of the embodiment of the present application.
  • the weights of the multi-label classification model 300 are initialized, that is, the weights of the feature extraction network 301 and the feature mapping network 302 (i.e., FCW 303 and FCH 304) in the system are initialized.
  • the feature extraction network 301 may be any neural network capable of extracting image features, and may be, for example, a convolutional neural network or a multi-layer perceptron, which is not limited by the embodiment of the present application.
  • the weight of the feature extraction network 301 can be represented as Z.
  • Z can include multiple weight matrices.
  • the parameters of the weight matrix can be randomly initialized, or pre-trained model parameters can be used.
  • the pre-trained model parameters refer to the parameters of the already trained model, such as the model parameters trained by the vgg16 network on the ImageNet data set.
  • the feature mapping network 302 may be a mapping network whose weight matrix is the low-rank feature mapping matrix M c*d, for example a fully connected layer, where M c*d may represent the correlation weights between feature attributes and category labels in the multi-label classification model, and its initial values can be randomly generated.
  • the feature mapping network 302 can include an FCW 303 and an FCH 304, wherein the FCW 303 represents a fully connected layer whose weight matrix is W r*d, the FCH 304 represents a fully connected layer whose weight matrix is H c*r, and the initial values of W r*d and H c*r can be randomly generated.
  • to ensure the low rank of M c*d, W r*d, and H c*r, r ≤ min(d, c) can be set.
  • the embodiment of the present application can input the entire data set for training in batches. That is to say, in the embodiment of the present application, the model may be trained by inputting part of the data in the data set by multiple batches, wherein each input data may be randomly extracted from the image samples that are not input in the data set. Since the training data set usually includes a large number of samples, the embodiment of the present application can reduce the occupation of resources in the process of training the model by inputting the training data set in batches.
  • the number of samples of one batch input to the multi-label classification model 300 may be n.
  • the n samples may be represented as image_n, and more specifically, image_n may be n pictures randomly extracted from D samples of the training data set, and the value of n may be much smaller than D.
  • the size of n can be determined based on the capabilities of the multi-label classification model 300. For example, if the data processing capability of the multi-label classification model 300 is strong, n can be set relatively large to shorten the training model time. For another example, if the data processing capability of the multi-label classification model 300 is weak, n can be set relatively small to reduce the resources consumed by the training model. In this way, the embodiment of the present application can flexibly set the value of n according to the data processing capability of the multi-label classification model 300.
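  • batch-wise input can be sketched as follows, assuming NumPy. The generator name `iterate_batches` and the sizes D = 10 and n = 4 are hypothetical; the sketch draws each batch at random from the samples that have not yet been input, so one pass covers every sample exactly once.

```python
import numpy as np

def iterate_batches(num_samples, batch_size, rng):
    """Yield index arrays covering every sample once, in random order."""
    order = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
D, n = 10, 4   # hypothetical data set size and batch size
batches = list(iterate_batches(D, n, rng))
print([len(b) for b in batches])
```

  Only the indices of one batch need to be materialized at a time, which is what keeps the memory footprint small when D is large.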
  • the label matrix corresponding to the n samples may be represented as Y c*n, where the element y i*j in the label matrix Y c*n indicates whether the i-th sample contains the object indicated by the j-th label, i ranges from 1 to n, and j ranges from 1 to c.
  • the description of the label matrix can be referred to the above description. To avoid repetition, details are not described herein again.
  • the training data may be input to the multi-label classification model 300 shown in FIG.
  • the n pictures in the training data set and the label matrix Y c*n of the n pictures may be input to the multi-label classification model 300, respectively.
  • the n pictures can be input to the feature extraction network 301, and the feature extraction network 301 can extract the features of the n pictures through the operations of a convolution layer, an activation function layer, a Pooling layer, a fully connected layer, and a Batchnorm layer, and output the feature matrix X d*n .
  • d is a positive integer and represents the feature dimension of the feature matrix X d*n .
  • the feature matrix X d*n output by the feature extraction network 301 can be input to the feature mapping network 302. Since the weight matrix of the feature mapping network is a low-rank feature mapping matrix M c*d, and M c*d can represent the correlation weights between feature attributes and category labels in the multi-label classification model, the feature mapping network 302 can map the input feature matrix X d*n to the prediction label space to obtain a prediction label matrix Ŷ c*n. That is:
    $$\hat{Y}_{c\times n} = M_{c\times d}\, X_{d\times n}$$
  • the prediction label matrix Ŷ c*n can be a label matrix with richer label information, each element of which indicates the confidence that the i-th sample contains the object indicated by the j-th label.
  • the prediction label matrix Ŷ c*n is called the completed label matrix, and the label matrix Y c*n is called the missing label matrix.
  • the labels in the label matrix are not independent of each other but are semantically related. For example, sheep and grass are very likely to appear in the same picture, mountains and sky are also likely to appear together, and sheep and an office are unlikely to appear together. This correlation can be used to increase the accuracy of label classification. Because the labels in the completed label matrix Ŷ c*n are correlated, Ŷ c*n is low rank, so Ŷ c*n can be obtained from Y c*n according to the low-rank structure of the matrix, and this process can be called the completion of the label matrix.
  • the label matrix may be completed by a matrix low-rank decomposition method, that is, low-rank decomposition is performed on the prediction label matrix Ŷ c*n, i.e.:
    $$\hat{Y}_{c\times n} = H_{c\times r}\, W_{r\times d}\, X_{d\times n}$$
  • the low-rank feature mapping network may include a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network have the following relationship:
    $$M_{c\times d} = H_{c\times r}\, W_{r\times d}$$
  • the weight matrix of the first sub-mapping network is W r*d and the weight matrix of the second sub-mapping network is H c*r, where, to ensure the low rank of M c*d, W r*d, and H c*r, r can be set to a positive integer with r ≤ min(d, c).
  • the first sub-mapping network may be a fully connected layer whose weight matrix is W r*d, denoted as FCW
  • the second sub-mapping network may be a fully connected layer whose weight matrix is H c*r, denoted as FCH, and the initial values of W r*d and H c*r can be randomly generated.
  • the optimal value of r can be found by training multiple times.
  • the embodiment of the present application can map X (i.e., X d*n) to obtain the prediction label matrix Ŷ (i.e., Ŷ c*n) by using a preset feature mapping matrix M (i.e., M c*d), that is, Ŷ = MX. Because the rank of Ŷ is less than or equal to the rank of M or of X, making M low rank also guarantees that Ŷ is low rank. A low-rank decomposition can therefore be applied to M, which is the above formula (2); this is equivalent to decomposing Ŷ into a product of two low-dimensional matrices, thereby ensuring that Ŷ is low rank.
  • the preset matrices H c*r and W r*d are used instead of the preset matrix M.
  • because the matrices H c*r and W r*d are preset, the resulting prediction label matrix Ŷ c*n is not yet accurate.
  • the processing unit 305 may be configured to update the weight parameter Z and the feature mapping matrix M c*d according to the label matrix Y c*n and the prediction label matrix Ŷ c*n, so as to train the multi-label classification model 300.
  • the processing unit 305 can determine the Euclidean distance loss function between the prediction label matrix Ŷ c*n and the label matrix Y c*n, whose role is to constrain Ŷ c*n to be similar to Y c*n. The expression of the loss function is as shown in formula (3):
  • in the loss function, ‖·‖ F is the Frobenius norm of a matrix, and the Frobenius norm of the matrix A m*n is defined as:
    $$\lVert A \rVert_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^{2}}$$
  • a ij is an element of the matrix A, so the loss is a Euclidean distance loss function.
  • P Ω in formula (4) is a projection operator: observed elements remain unchanged and unobserved elements are set to 0, so that only the observed elements participate in the calculation.
  • the specific form is:
    $$[P_\Omega(A)]_{ij} = \begin{cases} a_{ij}, & (i,j) \in \Omega \\ 0, & \text{otherwise} \end{cases}$$
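  • the projection operator and a Frobenius-norm loss restricted to observed entries can be sketched as follows, assuming NumPy. The function names and the boolean `observed` mask are hypothetical, and the sketch omits any constant scaling factor the original formula may carry; which entries count as observed is also left to the caller.

```python
import numpy as np

def project_observed(A, observed):
    """Projection operator: keep entries where observed is True, zero the rest."""
    return np.where(observed, A, 0.0)

def masked_squared_frobenius(Y_hat, Y, observed):
    """Squared Frobenius norm of the projected residual Y_hat - Y."""
    residual = project_observed(Y_hat - Y, observed)
    return float(np.sum(residual ** 2))

Y = np.array([[1.0, 0.0], [0.0, 1.0]])
Y_hat = np.array([[0.5, 0.9], [0.0, 1.0]])
observed = np.array([[True, False], [True, True]])   # entry (0, 1) is unobserved
loss = masked_squared_frobenius(Y_hat, Y, observed)
print(loss)
```

  The large error at the unobserved entry does not contribute, so a missing label is not treated as a confirmed negative.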
  • the sum of the above loss function and the regular term may be determined as the loss function L n of the n samples.
  • the loss function L n may also be referred to as an optimization function L n .
  • the expression of L n is as shown in equation (7) or (8):
  • the first term of the optimization function L n is the above loss function
  • the second term is a regular term
  • the regular term is used to constrain the weight parameter Z and the weight matrices W r*d and H c*r to prevent overfitting.
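  • one plausible way to write an objective with this structure is sketched below. This is an illustration under stated assumptions, not the formula of the application: the symbol ε_n for the loss, the factor 1/2, the regularization coefficient λ, and the choice of a squared l2 (Frobenius) regular term are all assumptions, and the norm of Z is meant as the sum over all weight matrices that Z contains.

```latex
\varepsilon_n = \tfrac{1}{2}\,\bigl\lVert P_\Omega\bigl(H_{c\times r}\,W_{r\times d}\,X_{d\times n} - Y_{c\times n}\bigr)\bigr\rVert_F^{2},
\qquad
L_n = \varepsilon_n + \tfrac{\lambda}{2}\,\bigl(\lVert Z\rVert_F^{2} + \lVert W_{r\times d}\rVert_F^{2} + \lVert H_{c\times r}\rVert_F^{2}\bigr)
```

  Under these assumptions the first term pulls the prediction toward the observed labels, while the second term keeps the weights small.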
  • the error back propagation algorithm is a method for training multi-layer neural networks. Based on gradient descent, it learns and updates the weights of each layer of the neural network by optimizing the loss function.
  • the error back propagation algorithm may be used to minimize the loss function $L_n$: the weight parameter Z at which the optimization function attains its minimum is taken as the updated weight parameter Z, and the feature mapping matrix $M_{c*d}$ at which the optimization function attains its minimum is taken as the updated feature mapping matrix $M_{c*d}$.
  • to apply the error back propagation algorithm, the derivatives with respect to the variables in formula (7) are computed below, taking the input of a single picture and a regularization term that uses the $l_2$ norm as an example.
  • H c*r is the element of the matrix H c*r and w ji is the matrix Element
  • $x_i$ is an element of the vector $x_d$
  • p j is the vector Elements
  • the reverse derivation of the error of the feature extraction network weight Z can be passed Passed.
  • H c*r and The elements are updated to:
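  • as a worked illustration of such an update, the gradients can be written out under an assumed single-picture objective. The form of $L_1$ below, its factors of $\tfrac{1}{2}$, the regularization coefficient $\lambda$, the learning rate $\eta$, and the auxiliary symbols $u$ and $e$ are assumptions made for this sketch and are not the formulas of the embodiment; the regularization of Z is left out for brevity:

```latex
\begin{aligned}
L_1 &= \tfrac{1}{2}\,\lVert H W x - y \rVert_2^2
      + \tfrac{\lambda}{2}\left(\lVert H \rVert_F^2 + \lVert W \rVert_F^2\right),
\qquad u = W x \in \mathbb{R}^{r},\quad e = H u - y \in \mathbb{R}^{c},\\[4pt]
\frac{\partial L_1}{\partial h_{kj}} &= e_k\,u_j + \lambda\,h_{kj},
\qquad
\frac{\partial L_1}{\partial w_{ji}} = \Big(\sum_{k=1}^{c} e_k\,h_{kj}\Big)\,x_i + \lambda\,w_{ji},
\qquad
\frac{\partial L_1}{\partial x_i} = \sum_{j=1}^{r}\Big(\sum_{k=1}^{c} e_k\,h_{kj}\Big)\,w_{ji},\\[4pt]
h_{kj} &\leftarrow h_{kj} - \eta\,\frac{\partial L_1}{\partial h_{kj}},
\qquad
w_{ji} \leftarrow w_{ji} - \eta\,\frac{\partial L_1}{\partial w_{ji}}.
\end{aligned}
```

Here $k$ runs over the c labels, $j$ over the r intermediate dimensions, and $i$ over the d feature dimensions. Under these assumptions, the gradient with respect to the features $x$ is the quantity that would be propagated back into the feature extraction network to update Z.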
  • the stop condition is: $L_n$ no longer decreases, or its decrease is smaller than a preset threshold, or the maximum number of training iterations is reached. If the stop condition is not met, steps 220 through 260 are repeated until it is. In the embodiment of the present application, inputting all the pictures once counts as one round of training, and several rounds of training are usually needed.
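  • the three stop conditions can be combined into a single check. The sketch below is one possible reading, assuming the loss is recorded once per training round; the names `should_stop`, `min_decrease`, and `max_rounds` are hypothetical:

```python
def should_stop(loss_history, min_decrease, max_rounds):
    """Return True once any of the stop conditions described above holds.

    loss_history holds the value of the optimization function after each round.
    """
    rounds_done = len(loss_history)
    if rounds_done >= max_rounds:
        return True                     # maximum number of training rounds reached
    if rounds_done < 2:
        return False                    # nothing to compare against yet
    decrease = loss_history[-2] - loss_history[-1]
    # A non-positive decrease means the loss is no longer falling; a small
    # positive decrease means it fell by less than the preset threshold.
    return decrease < min_decrease


print(should_stop([1.0, 0.5], min_decrease=0.01, max_rounds=10))    # False
print(should_stop([1.0, 0.995], min_decrease=0.01, max_rounds=10))  # True
```

With a positive `min_decrease`, the single comparison covers both the "no longer falling" case and the "falling by less than the threshold" case.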
  • the test picture is input to the feature extraction network in the neural network model, the first feature matrix of the test picture is extracted by the feature extraction network, and the first feature matrix is input to the FCM; the FCM acquires and outputs the prediction label matrix of the first feature matrix, and each element of the prediction label matrix represents the confidence that the test picture includes the object indicated by the j-th label.
  • the test picture may be one or more pictures and may not belong to the training data set.
  • the preset threshold may be 0.5, or other values, which are not limited by the embodiment of the present application.
  • the neural network system provided by the embodiment of the present application can train the model directly from the input data without additional intermediate steps; that is, the neural network system is an end-to-end neural network system.
  • the advantage of being end-to-end is that the weight parameters of the feature extraction network and the feature mapping matrix can be optimized at the same time; that is, the embodiment of the present application can dynamically learn image features, so that the feature extraction network better suits the task and the multi-label classification performs well.
  • the embodiment of the present application can compute the feature mapping matrix from the image features of picture samples in batches, without using the image features of the entire data set as input; that is, it is not necessary to train on the image features of all samples at once.
  • the demand for memory resources during model training is thereby greatly reduced, and the computation problem of multi-label classification on large-scale data can be effectively solved.
  • FIG. 4 is a schematic diagram of a multi-label classification model 500 provided by an embodiment of the present application.
  • the feature extraction network portion of the model 500 employs a VGG16 network, and the output of the Dropout layer after the penultimate fully connected layer of the VGG16 network is taken as the feature matrix X.
  • the weight parameter Z of the feature extraction network is initialized with weights trained on the ImageNet dataset and then fine-tuned (fine-tuning means keeping the weights of the earlier layers fixed or adjusting them only slightly, while fully training the last one or two layers of the network).
  • the initial values of the weight matrices H and W can be initialized with a Gaussian distribution, and the values of H and W are fully trained.
  • the regular term can use the Frobenius norm.
  • the weights of the feature extraction network VGG16 (excluding the last fully connected layer) are initialized with weights pre-trained on the ImageNet data set.
  • n RGB three-channel pictures image_n with a pixel size of 224*224 are input into the VGG16 network, where 1 ≤ n ≤ N and N is the number of pictures in the training set. The pictures can be represented as a four-dimensional array such as n*C*h*w or h*w*C*n, where C is the number of channels (3 for an RGB image), h is the height of the image (224 pixels), and w is the width of the image (224 pixels).
  • the images then pass through two fully connected layers and a Dropout layer to obtain the image feature matrix $X_{4096*n}$.
  • $X_{4096*n}$ then passes through the fully connected layers whose weight matrices are $W_{r*4096}$ and $H_{c*r}$ (FCW 503 and FCH 504) to obtain the prediction label matrix $\hat{Y}_{c*n}$.
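  • the shapes involved in this step can be traced with a few lines of bookkeeping. In the sketch below only the feature dimension 4096 comes from the text; the batch size, the number of labels, and the rank are hypothetical values chosen for illustration:

```python
def matmul_shape(left, right):
    """Shape of the product of two matrices given their (rows, cols) shapes."""
    if left[1] != right[0]:
        raise ValueError(f"inner dimensions differ: {left} x {right}")
    return (left[0], right[1])


d = 4096            # feature dimension of the Dropout-layer output (from the text)
n = 32              # batch size, hypothetical
c, r = 80, 16       # number of labels and rank, hypothetical

x_shape = (d, n)    # feature matrix X
w_shape = (r, d)    # weight matrix of FCW
h_shape = (c, r)    # weight matrix of FCH

y_hat_shape = matmul_shape(h_shape, matmul_shape(w_shape, x_shape))
print(y_hat_shape)  # (80, 32)

# Parameters of one full c*d mapping versus the two low-rank factors.
full_params = c * d
low_rank_params = r * d + c * r
print(full_params, low_rank_params)  # 327680 66816
```

With these illustrative sizes the two factors hold roughly a fifth of the parameters of a single c*d mapping matrix, which is one practical side effect of choosing a small r.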
  • the processing unit 505 obtains the optimization function from the label matrix $Y_{c*n}$ and the prediction label matrix $\hat{Y}_{c*n}$.
  • after the weight matrices $W_{r*4096}$ and $H_{c*r}$ are updated, it is judged whether the stop condition is reached; if not, the above steps are repeated until the stop condition is reached.
  • the stop condition can be referred to the description above, and to avoid repetition, details are not described herein again.
  • the test picture may be input to the feature extraction network 501, and the features of the picture extracted by the feature extraction network are input to the FCW 503 and the FCH 504, and the prediction tag matrix is obtained through the FCW 503 and the FCH 504.
  • the structure of the feature extraction network may be replaced by other networks, such as AlexNet, GoogLeNet, ResNet, or a custom network.
  • the feature output layer may be the output of a certain layer of the above networks, or several convolutional layers or fully connected layers may be added to or removed from them.
  • different regularization items may also be adopted in the embodiments of the present application.
  • the embodiment of the present application does not limit the specific product form, and the method for multi-label classification in the embodiment of the present application can be deployed on a general-purpose computer node.
  • the initially constructed multi-label classification model can be stored in hard disk storage, and the processor and memory run the algorithm to learn from the existing training data set and obtain the multi-label classification model.
  • the multi-label classification model can predict the labels of unknown samples and store the prediction results in hard disk storage, thereby completing the existing label set and predicting the labels corresponding to unknown samples.
  • FIG. 5 is a schematic block diagram of an apparatus 600 for training a multi-label classification model provided by an embodiment of the present application.
  • the apparatus 600 includes a determining unit 610, an extracting unit 620, an obtaining unit 630, and an updating unit 640.
  • a determining unit 610, configured to determine, in the training data set, n samples and a label matrix $Y_{c*n}$ corresponding to the n samples, where an element $y_{i*j}$ in the label matrix $Y_{c*n}$ represents whether the i-th sample contains the object indicated by the j-th label, and c represents the number of labels associated with the samples in the training data set.
  • an extracting unit 620, configured to extract the feature matrix $X_{d*n}$ of the n samples by using a feature extraction network, where the feature extraction network has a weight parameter Z, and d represents the feature dimension of the feature matrix $X_{d*n}$.
  • an obtaining unit 630, configured to acquire, by using a feature mapping network, a prediction label matrix $\hat{Y}_{c*n}$ of the feature matrix $X_{d*n}$, where an element $\hat{y}_{i*j}$ in the prediction label matrix $\hat{Y}_{c*n}$ represents the confidence that the i-th sample contains the object indicated by the j-th label, and the weight matrix of the feature mapping network is a low-rank feature mapping matrix $M_{c*d}$.
  • an updating unit 640, configured to update the weight parameter Z and the feature mapping matrix $M_{c*d}$ according to the label matrix $Y_{c*n}$ and the prediction label matrix $\hat{Y}_{c*n}$, so as to train the multi-label classification model.
  • n, c, i, j, and d are all positive integers, i ranges from 1 to n, and j ranges from 1 to c.
  • the neural network system provided by the embodiment of the present application can train the model directly from the input data without additional intermediate steps; that is, the neural network system is an end-to-end neural network system.
  • the advantage of being end-to-end is that the feature extraction, the feature mapping matrix, and the low-rank label correlation matrix can be optimized at the same time; that is, the embodiment of the present application can dynamically learn image features, so that the feature extraction network better suits the task and the multi-label classification performs well.
  • the low-rank feature mapping network includes a first sub-mapping network and a second sub-mapping network, where the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network have the following relationship: $M_{c*d} = H_{c*r} W_{r*d}$
  • the weight matrix of the first sub-mapping network is $W_{r*d}$, the weight matrix of the second sub-mapping network is $H_{c*r}$, r is a positive integer, and r ≤ min(d, c).
  • the updating unit is specifically configured to:
  • the updating unit is further configured to:
  • the weight parameter Z corresponding to the minimum value of the optimization function is used as the updated weight parameter Z, the weight matrix $W_{r*d}$ corresponding to the minimum value of the optimization function is used as the updated weight matrix $W_{r*d}$, and the weight matrix $H_{c*r}$ corresponding to the minimum value of the optimization function is used as the updated weight matrix $H_{c*r}$.
  • the determining unit is specifically configured to:
  • determine a training data set, the training data set comprising D samples and a label vector of each of the D samples, where an element $y_j$ in the label vector of each sample represents whether that sample contains the object indicated by the j-th label, and D is a positive integer not less than n;
  • randomly draw n samples from the training data set and generate the label matrix $Y_{c*n}$ of the n samples, where the label matrix $Y_{c*n}$ includes the label vector corresponding to each of the n samples.
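  • drawing a batch and assembling its label matrix can be sketched as follows. This is an illustrative reading that stores each sample's label vector as a column of the c*n matrix; the function name `sample_batch` and the toy label vectors are hypothetical:

```python
import random


def sample_batch(label_vectors, n, rng):
    """Draw n distinct sample indices at random and stack their label vectors as columns."""
    indices = rng.sample(range(len(label_vectors)), n)
    c = len(label_vectors[0])
    # Y has c rows and n columns: column i is the label vector of the i-th drawn sample.
    y = [[label_vectors[idx][j] for idx in indices] for j in range(c)]
    return indices, y


# D = 4 samples, c = 3 labels (made-up data).
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
indices, y = sample_batch(labels, n=2, rng=random.Random(0))
print(indices)
print(y)
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which is convenient when comparing training runs.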
  • in the embodiment of the present application, it is not necessary to input the entire training data set for calculation at one time; only one batch of pictures needs to be input at a time. The entire data set can therefore be input for training in batches. Since the training data set usually includes a large number of samples, inputting it in batches reduces the resources occupied during model training, greatly reduces the demand for memory resources, and can effectively solve the problem of computing the low-rank label correlation matrix on large-scale data.
  • the extracting unit is further configured to extract, by using the feature extraction network, a first feature matrix of a first sample, where the first sample does not belong to the n samples;
  • the acquiring unit is further configured to acquire, by using the first mapping network, a first prediction label matrix of the first feature matrix, where an element in the first prediction label matrix represents the confidence that the first sample contains the object indicated by the j-th label.
  • apparatus 700 for training a multi-label classification model can include a processor 710, a memory 720, and a communication interface 730.
  • the memory 720 can be used to store instructions or codes and the like executed by the processor 710.
  • the processor 710 is configured to execute the method provided by the foregoing method embodiment, and the processor 710 is further configured to control the communication interface 730 to communicate with the outside world.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 710 or an instruction in a form of software.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented as a hardware processor, or may be performed by a combination of hardware and software modules in the processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in memory 720, and processor 710 reads the information in memory 720 and, in conjunction with its hardware, performs the steps of the above method. To avoid repetition, it will not be described in detail here.
  • the apparatus 600 for training the multi-label classification model shown in FIG. 5 or the apparatus 700 for training the multi-label classification model shown in FIG. 6 can implement the processes corresponding to the foregoing method embodiments. Specifically, for the apparatus 600 and the apparatus 700 for training the multi-label classification model, reference may be made to the above description. To avoid repetition, details are not described herein again.
  • the size of the sequence numbers of the foregoing processes does not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the embodiment of the present application further provides a computer readable storage medium, comprising: a computer program, when the computer program is run on a computer, causing the computer to execute the method provided by the foregoing method embodiment.
  • the embodiment of the present application further provides a computer program product comprising instructions, wherein when the computer program product is run on a computer, the computer is caused to execute the method provided by the foregoing method embodiment.
  • the processor mentioned in the embodiments of the present invention may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory referred to in the embodiments of the present invention may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (RAM), which is used as an external cache.
  • many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM).
  • when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated in the processor.
  • the memories described herein are intended to include, without being limited to, these and any other suitable types of memory.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function.
  • in actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as standalone products, they may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

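The present application provides a method and an apparatus for training a multi-label classification model, which can dynamically learn image features so that the feature extraction network better suits the task, and which achieve good multi-label classification performance. The method includes: determining, in a training data set, n samples and a label matrix $Y_{c*n}$ corresponding to the n samples, where an element $y_{i*j}$ of the label matrix $Y_{c*n}$ indicates whether the i-th sample contains the object indicated by the j-th label, and c denotes the number of labels associated with the samples; extracting a feature matrix $X_{d*n}$ of the n samples by using a feature extraction network; obtaining a prediction label matrix $\hat{Y}_{c*n}$ of the feature matrix $X_{d*n}$ by using a feature mapping network, where an element $\hat{y}_{i*j}$ of the prediction label matrix indicates the confidence that the i-th sample contains the object indicated by the j-th label; and updating the weight parameter Z and the feature mapping matrix $M_{c*d}$ according to the label matrix $Y_{c*n}$ and the prediction label matrix $\hat{Y}_{c*n}$, so as to train the multi-label classification model.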

Description

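Method and apparatus for training a multi-label classification model
This application claims priority to Chinese Patent Application No. 201711187395.0, filed with the Chinese Patent Office on November 24, 2017 and entitled "Method and apparatus for training a multi-label classification model", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of computers and, more specifically, to a method and an apparatus for training a multi-label classification model in the field of computers.
Background
As the processing performance of smartphones improves, more and more applications place demands on image recognition. For example, when taking a photo with a mobile phone, if the smartphone can accurately recognize the objects within the shooting range, it can perform targeted operations on their color and shape and thereby improve the result. In the machine learning of intelligent systems, training to recognize objects in images has therefore become a very important aspect. Generally speaking, machine learning sets labels on a large number of existing images for the objects they contain, and the computer then continually adjusts its recognition parameters through self-evolution to gradually improve the accuracy of object recognition.
Owing to the complexity and ambiguity of real-world objects, many objects in real life may be related to multiple category labels at the same time. To better reflect the multiple semantics of an actual object, an appropriate subset of labels (containing multiple related semantic labels) is often used to describe the object, which gives rise to the so-called multi-label classification problem. In this case, each sample corresponds to a related label subset made up of multiple labels, and the goal of learning is to predict the corresponding label subset for unknown samples.
In practical multi-label classification, a series of training data is given first; the set formed by this training data may be called the training data set. However, because the labels in the training data set are annotated by different people, or because some objects are overlooked during annotation, labels may be missing. The accuracy of multi-label classification can therefore be improved by completing the labels in the training data set. There are several methods for completing known labels in multi-label classification. One of them constrains the rank of the prediction label matrix through the nuclear norm and computes the feature mapping matrix by minimizing the loss function of the multi-label classification, obtaining a low-rank prediction label matrix and thereby completing the labels and improving the performance of multi-label classification. However, this method needs to extract the features of the image first and then compute the feature mapping matrix from those features. Once the image features have been extracted they are fixed, so the feature information of the input image cannot be learned dynamically according to the labels.
Summary
The present application provides a method and an apparatus for training a multi-label classification model, which can dynamically learn image features so that the feature extraction network better suits the task, and which achieve good multi-label classification performance.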
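According to a first aspect, a method for training a multi-label classification model is provided, including:
determining, in a training data set, n samples and a label matrix $Y_{c*n}$ corresponding to the n samples, where an element $y_{i*j}$ of the label matrix $Y_{c*n}$ indicates whether the i-th sample contains the object indicated by the j-th label, and c denotes the number of labels associated with the samples in the training data set;
extracting a feature matrix $X_{d*n}$ of the n samples by using a feature extraction network, where the feature extraction network has a weight parameter Z, and d denotes the feature dimension of the feature matrix $X_{d*n}$.
Here, the feature extraction network may be any neural network capable of extracting image features, for example a convolutional neural network or a multi-layer perceptron, which is not limited in the embodiments of the present application. The weights of the feature extraction network may be denoted Z; specifically, Z may contain multiple weight matrices. The parameters of the weight matrices may be randomly initialized, or pre-trained model parameters may be used. Here, pre-trained model parameters are the parameters of a model that has already been trained, such as the parameters of the vgg16 network trained on the ImageNet data set.
obtaining a prediction label matrix $\hat{Y}_{c*n}$ of the feature matrix $X_{d*n}$ by using a feature mapping network, where an element $\hat{y}_{i*j}$ of the prediction label matrix $\hat{Y}_{c*n}$ indicates the confidence that the i-th sample contains the object indicated by the j-th label, and the weight matrix of the feature mapping network is a low-rank feature mapping matrix $M_{c*d}$. $M_{c*d}$ may represent the correlation weights between the feature attributes and the category labels in the multi-label classification model, and its initial value may be randomly generated. As an example, the feature mapping network may be a mapping network whose weight matrix is the low-rank feature mapping matrix $M_{c*d}$, for example a fully connected layer.
Specifically, the feature mapping network may be denoted FCM. The feature matrix $X_{d*n}$ output by the feature extraction network may be input to the FCM, which maps the input feature matrix $X_{d*n}$ to the prediction label space to obtain the prediction label matrix $\hat{Y}_{c*n}$, that is:
$$\hat{Y}_{c*n} = M_{c*d} X_{d*n}$$
updating the weight parameter Z and the feature mapping matrix $M_{c*d}$ according to the label matrix $Y_{c*n}$ and the prediction label matrix $\hat{Y}_{c*n}$, so as to train the multi-label classification model.
Here n, c, i, j, and d are all positive integers, i ranges from 1 to n, and j ranges from 1 to c.
Therefore, the neural network system provided by the embodiments of the present application can train the model directly from the input data without additional intermediate steps; that is, the neural network system is an end-to-end neural network system. The advantage of being end-to-end is that feature extraction, the feature mapping matrix, and the low-rank label correlation matrix can be optimized at the same time; that is, the embodiments of the present application can dynamically learn image features so that the feature extraction network better suits the task, and the multi-label classification performs well.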
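Optionally, the low-rank feature mapping network includes a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network have the following relationship:
$$M_{c*d} = H_{c*r} W_{r*d}$$
where the weight matrix of the first sub-mapping network is $W_{r*d}$ and the weight matrix of the second sub-mapping network is $H_{c*r}$. Here, to ensure that $M_{c*d}$, $W_{r*d}$, and $H_{c*r}$ are low-rank, r may be set to a positive integer with r ≤ min(d, c).
In a specific embodiment, the first sub-mapping network may be a fully connected layer whose weight matrix is $W_{r*d}$, the second sub-mapping network may be a fully connected layer whose weight matrix is $H_{c*r}$, and the initial values of $W_{r*d}$ and $H_{c*r}$ may be randomly generated.
In other words, in the embodiments of the present application, the label matrix can be completed by means of low-rank matrix decomposition, that is, the prediction label matrix $\hat{Y}_{c*n}$ is decomposed into low-rank factors:
$$\hat{Y}_{c*n} = H_{c*r} W_{r*d} X_{d*n}$$
Here, setting r ≤ min(d, c) makes $W_{r*d}$ and $H_{c*r}$ low-rank. Because the rank of a product of two matrices is no greater than the rank of either factor, $H_{c*r} W_{r*d}$ (that is, $M_{c*d}$) is low-rank, which in turn makes $\hat{Y}_{c*n}$ low-rank. Here, the best value of r can be found through multiple rounds of training.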
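The rank argument above can be checked numerically on a small case. The following is a minimal sketch rather than part of the embodiment: the matrices H, W, and X are made-up values with c = 3, d = 4, r = 2, and n = 3, and the helper names `matmul` and `rank` are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    found = 0
    for col in range(cols):
        pivot = next((i for i in range(found, rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[found], m[pivot] = m[pivot], m[found]
        for i in range(found + 1, rows):
            factor = m[i][col] / m[found][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[found])]
        found += 1
        if found == rows:
            break
    return found


H = [[1, 0], [0, 1], [1, 1]]                        # c x r
W = [[1, 2, 0, 1], [0, 1, 1, 3]]                    # r x d
X = [[1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 0, 1]]    # d x n

M = matmul(H, W)        # c x d feature mapping matrix
Y_hat = matmul(M, X)    # c x n prediction label matrix
print(rank(M), rank(Y_hat))  # 2 2
```

Although M has three rows and four columns, its rank cannot exceed r = 2, and the same bound carries over to the product with X.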
That is, the embodiments of the present application can map X (i.e., $X_{d*n}$) through the preset feature mapping matrix M (i.e., $M_{c*d}$) to obtain the prediction label matrix $\hat{Y}$ (i.e., $\hat{Y}_{c*n}$), that is, $\hat{Y} = MX$. Because the rank of $\hat{Y}$ is less than or equal to the rank of M and the rank of X, making M low-rank also guarantees that $\hat{Y}$ is low-rank. A low-rank decomposition can therefore be applied to M, as in formula (2) above, which is equivalent to decomposing $\hat{Y}$ into a product of two low-dimensional matrices, thereby ensuring that $\hat{Y}$ is low-rank.
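Optionally, updating the weight parameter Z and the feature mapping matrix $M_{c*d}$ according to the label matrix $Y_{c*n}$ and the prediction label matrix $\hat{Y}_{c*n}$ includes:
determining the Euclidean distance loss function between the prediction label matrix $\hat{Y}_{c*n}$ and the label matrix $Y_{c*n}$, where the loss function is expressed as formula (3):
Figure PCTCN2018094400-appb-000027
or the loss function is expressed as formula (4):
Figure PCTCN2018094400-appb-000028
and then updating the weight parameter Z and the weight matrices $W_{r*d}$ and $H_{c*r}$ according to the Euclidean distance loss function.
Optionally, updating the weight parameter Z and the weight matrices $W_{r*d}$ and $H_{c*r}$ according to the Euclidean distance loss function includes:
determining the sum of the Euclidean distance loss function and a regularization term as the optimization function $L_n$ of the n samples, where the regularization term is used to constrain the weight parameter Z and the weight matrices $W_{r*d}$ and $H_{c*r}$, and $L_n$ is expressed as formula (7) or formula (8):
Figure PCTCN2018094400-appb-000032
Figure PCTCN2018094400-appb-000033
where the first term of the optimization function $L_n$ is the above loss function $\varepsilon_n$ and the second term is the regularization term, which constrains the weight parameter Z and the weight matrices $W_{r*d}$ and $H_{c*r}$ to prevent overfitting.
Here, the error back propagation algorithm may be used to minimize the loss function $L_n$: the weight parameter Z at which the optimization function attains its minimum is taken as the updated weight parameter Z, the weight matrix $W_{r*d}$ at which the optimization function attains its minimum is taken as the updated weight matrix $W_{r*d}$, and the weight matrix $H_{c*r}$ at which the optimization function attains its minimum is taken as the updated weight matrix $H_{c*r}$.
Then, it is determined whether the stop condition is reached.
Here, the stop condition is: $L_n$ no longer decreases, or its decrease is smaller than a preset threshold, or the maximum number of training iterations is reached. If the stop condition is not reached, training is repeated until it is. In the embodiments of the present application, inputting all the pictures once counts as one round of training, and several rounds of training are usually needed.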
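Optionally, determining, in the training data set, n samples and the label matrix $Y_{c*n}$ of the n samples includes:
determining a training data set, where the training data set includes D samples and a label vector of each of the D samples, an element $y_j$ in the label vector of each sample indicates whether that sample contains the object indicated by the j-th label, and D is a positive integer not less than n;
randomly drawing n samples from the training data set and generating the label matrix $Y_{c*n}$ of the n samples, where the label matrix $Y_{c*n}$ includes the label vector corresponding to each of the n samples.
Therefore, in the embodiments of the present application, it is not necessary to input the entire training data set for calculation at one time; only batches of pictures need to be input, so the entire data set can be input for training in batches. That is, the model can be trained by inputting part of the data set in each of multiple batches, where the data input each time may be randomly drawn from the picture samples in the data set that have not yet been input. Since the training data set usually includes a large number of samples, inputting it in batches reduces the resources occupied during model training, greatly reduces the demand for memory resources, and can effectively solve the problem of computing the low-rank label correlation matrix on large-scale data.
Optionally, the method further includes: extracting a first feature matrix of a first sample by using the feature extraction network, where the first sample does not belong to the n samples; and
obtaining a first prediction label matrix of the first feature matrix by using the first mapping network, where an element of the first prediction label matrix indicates the confidence that the first sample contains the object indicated by the j-th label.
Specifically, after training is completed, in the test phase it is only necessary to input the test picture to the feature extraction network in the neural network model, extract the first feature matrix of the test picture by using the feature extraction network, and input the first feature matrix to the feature mapping network (which may specifically include FCW and FCH); the feature mapping network obtains and outputs the prediction label matrix of the first feature matrix, whose elements indicate the confidence that the test picture contains the object indicated by the j-th label. Here, the test picture may be one or more pictures and need not belong to the training data set.
According to a second aspect, an apparatus for training a multi-label classification model is provided, where the apparatus is configured to perform the method in the first aspect or any possible implementation of the first aspect. Specifically, the apparatus may include modules for performing the method in the first aspect or any possible implementation of the first aspect.
According to a third aspect, an apparatus for training a multi-label classification model is provided, where the apparatus includes a memory and a processor, the memory is configured to store instructions, the processor is configured to execute the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to perform the method in the first aspect or any possible implementation of the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided, where the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method in the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, a computer program product containing instructions is provided, where the computer program product, when run on a computer, causes the computer to perform the method in the first aspect or any possible implementation of the first aspect.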
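Brief Description of the Drawings
FIG. 1 is a schematic diagram of the single-label and multi-label classification problems.
FIG. 2 is a schematic flowchart of a method for training a multi-label classification model according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a multi-label classification model according to an embodiment of the present application.
FIG. 4 is a schematic diagram of a multi-label classification model according to an embodiment of the present application.
FIG. 5 is a schematic block diagram of an apparatus for training a multi-label classification model according to an embodiment of the present application.
FIG. 6 is a schematic block diagram of another apparatus for training a multi-label classification model according to an embodiment of the present application.
Detailed Description
The technical solutions in the present application are described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the single-label and multi-label classification problems. As shown in (a) of FIG. 1, single-label classification usually assumes that a sample corresponds to only one category label, that is, has a unique semantic meaning. However, this assumption may not hold in many practical situations; in particular, given the semantic diversity of real objects, an object is quite likely to be related to several different category labels at the same time. In the multi-label problem, therefore, as shown in (b) of FIG. 1, multiple related category labels are often used to describe the semantic information of each object. For example, an image may correspond to several semantic labels at once, such as "grassland", "sky", and "sea", and a piece of music may carry several moods, such as "pleasant" and "relaxed".
In the multi-label classification problem, a series of training data is given first; the set formed by this training data may be called the training data set. By learning from the given training data, the corresponding label subset can be predicted for unknown samples. Here, the training data set may correspond to a label set, which may include c different categories of labels related to the training data, where c is a positive integer. The training data set may include D samples and the label subset corresponding to each sample, where D is a positive integer. It can be understood that a label subset here is a subset of the label set. That is, by learning from the samples in the given training data set and the label subset of each sample, the label subset of an unknown sample can be predicted.
In the embodiments of the present application, a label subset may be represented as a label vector. In other words, the label vector of a sample indicates which labels the sample has or which categories it belongs to. For example, if the label vector of an image is [0 1 0 0 1 0], there are six categories in total, and each element of the label vector represents one category or one label: 0 means that the image does not have that category or label, and 1 means that it does. Since this label vector contains two 1s, the image contains two kinds of objects, belonging to the second and fifth categories. In this way, each of the D samples in the training data set may correspond to a label vector whose element $y_j$ indicates whether the sample contains the object indicated by the j-th label, where j ranges from 1 to c. It should be understood that, in the embodiments of the present application, whether a sample contains the object indicated by the j-th label is the same as whether the sample has the j-th label.
In this way, the label vectors of all or some of the samples in the training data set form a label matrix Y:
Figure PCTCN2018094400-appb-000037
In addition, the prediction label vector is the output of the multi-label classifier; it has the same dimension as the label vector and represents the classifier's prediction of the categories to which the image belongs. The elements of the prediction label vector are real values: if a value exceeds a given threshold, the sample is taken to belong to the category at that position; otherwise it does not. For example, suppose the prediction label vector is [0.7 0.2 0.1 0.8 1.0 0.0] and the threshold is 0.5. Each element is compared with the threshold, and an element greater than the threshold means that the sample belongs to that category. The predicted categories are then the first, fourth, and fifth. If the label vector corresponding to this prediction label vector is [1 0 0 1 1 0], the prediction is entirely correct.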
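The thresholding step in the example above can be written out in a few lines. This is a minimal sketch using the numbers from the example; the function names `predicted_classes` and `binarize` are illustrative only.

```python
def predicted_classes(scores, threshold=0.5):
    """Return the 1-based indices of labels whose confidence exceeds the threshold."""
    return [j for j, score in enumerate(scores, start=1) if score > threshold]


def binarize(scores, threshold=0.5):
    """Turn a prediction label vector into a 0/1 label vector."""
    return [1 if score > threshold else 0 for score in scores]


scores = [0.7, 0.2, 0.1, 0.8, 1.0, 0.0]
print(predicted_classes(scores))  # [1, 4, 5]
print(binarize(scores))           # [1, 0, 0, 1, 1, 0]
```

Comparing the binarized vector with the ground-truth label vector element by element shows whether the prediction is entirely correct.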
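In practical problems, especially when the data involves a large number of category labels, it is often very difficult to provide complete label information for every sample. The label information of the samples in the training data set is therefore quite likely to be incomplete. That is, in the label matrix of the data, the fact that a sample lacks a certain label does not mean that the sample is actually unrelated to that label. The label matrix therefore needs to be completed using the data already in the training data set to obtain a prediction label matrix containing richer label information, and this richer prediction label matrix can then be used to predict the label information of unknown samples more accurately.
When completing the label matrix, the prior art needs to extract the features of the image first and then compute a low-rank feature mapping matrix from those features. Once the image features have been extracted they are fixed, so the feature information of the input image cannot be learned dynamically according to the labels. For this reason, the embodiments of the present application design a neural network for multi-label classification that implements the multi-label classification algorithm by learning the feature mapping matrix and optimizing the feature extraction network.
A neural network system is an intelligent recognition system that accumulates training results through repeated training to improve its ability to recognize various target objects or sounds. Convolutional neural networks are one of the mainstream directions in the development of neural networks. A convolutional neural network generally includes convolutional layers, rectified linear unit (ReLU) layers, pooling layers, and fully connected (FC) layers. The convolutional, ReLU, and pooling layers may be arranged alternately and repeated many times.
The convolutional layer can be regarded as the core of a convolutional neural network. When used for image recognition, its input receives image data, and it identifies the image through filters. The image data may be the converted output of an image captured by a camera, or the processing result of the layer preceding the convolutional layer. Image data is usually a three-dimensional array, for example 32x32x3, where 32x32 is the two-dimensional size (width and height) of the image represented by the data, and the depth of 3 arises because an image is usually divided into three data channels: green, red, and blue. A convolutional layer has multiple filters; different filters correspond to different image features (edges, colors, shapes, and so on) and scan the input image data with a certain stride. Different filters hold different weight matrices, which the neural network generates for specific image features during learning. At each step a filter scans one region of the image and obtains a three-dimensional input matrix (MxNx3, where M and N determine the size of the scanned region); the convolutional network takes the dot product of the input matrix and the weight matrix to obtain one result value, and then scans the next region at the given stride, for example by shifting two cells sideways. After one filter has scanned all regions at the given stride, the result values form a two-dimensional matrix; after all filters have finished scanning, the result values form a three-dimensional matrix that is the output of the current convolutional layer, and each depth slice of this three-dimensional matrix corresponds to the scan result of one filter (that is, the two-dimensional matrix formed after that filter's scan).
The output of the convolutional layer is then sent to the ReLU layer for processing (the max(0, x) function limits the range of the output values) and to the pooling layer, which reduces its size by downsampling. Before being sent to the FC layer, the image data may pass through several more convolutional layers to identify image features at deeper levels (for example, the first convolutional layer identifies only the contour features of the image, and the second begins to recognize patterns), and is finally input to the FC layer. Similar to the convolutional layer but slightly different, the FC layer also applies weights to the input data through multiple filters, but each filter of the FC layer does not scan different regions by shifting at each step as a convolutional filter does; instead it scans all regions of the input image data at once and then computes one result value with its weight matrix. The FC layer finally outputs a 1x1xN matrix, which is in effect a data sequence; each position of the sequence corresponds to a different target object, and its value can be regarded as the score for the presence of that object. Weight matrices are used in both the convolutional layers and the FC layer, and the neural network can maintain multiple weight matrices through self-training.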
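The scanning behavior of a single filter, followed by the ReLU step, can be illustrated on a tiny input. The sketch below is a simplified illustration with made-up values: it uses a 4x4 single-channel image, one 2x2 filter, a stride of 2, and no padding, and the function names are hypothetical.

```python
def conv2d_single_filter(image, kernel, stride):
    """Slide one filter over an H x W x C image; each position yields one dot product."""
    height, width = len(image), len(image[0])
    k_height, k_width = len(kernel), len(kernel[0])
    output = []
    for top in range(0, height - k_height + 1, stride):
        row = []
        for left in range(0, width - k_width + 1, stride):
            acc = 0.0
            for i in range(k_height):
                for j in range(k_width):
                    for ch, weight in enumerate(kernel[i][j]):
                        acc += image[top + i][left + j][ch] * weight
            row.append(acc)
        output.append(row)
    return output


def relu(feature_map):
    """Apply max(0, x) to every value of a two-dimensional feature map."""
    return [[max(0.0, value) for value in row] for row in feature_map]


pixels = [[1, 2, 0, 1],
          [0, 1, 3, 1],
          [2, 0, 1, 0],
          [1, 1, 0, 2]]
image = [[[float(v)] for v in row] for row in pixels]      # 4 x 4 x 1
kernel = [[[1.0], [-1.0]],
          [[1.0], [0.0]]]                                   # 2 x 2 x 1

feature_map = conv2d_single_filter(image, kernel, stride=2)
print(feature_map)        # [[-1.0, 2.0], [3.0, 1.0]]
print(relu(feature_map))  # [[0.0, 2.0], [3.0, 1.0]]
```

Each filter produces one such two-dimensional map; stacking the maps of all filters gives the three-dimensional output of the layer described above.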
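The method for training a multi-label classification model according to the embodiments of the present application is described in detail below with reference to FIG. 2 and FIG. 3.
FIG. 2 is a schematic flowchart of a method for training a multi-label classification model according to an embodiment of the present application. It should be understood that FIG. 2 shows steps or operations of the method, but these steps or operations are only examples; the embodiments of the present application may also perform other operations or variations of the operations in FIG. 2. In addition, the steps in FIG. 2 may be performed in an order different from that presented in FIG. 2, and it is possible that not all the operations in FIG. 2 need to be performed.
FIG. 3 is a schematic diagram of a multi-label classification model 300 according to an embodiment of the present application. The multi-label classification model 300 is specifically a neural network system. It includes a feature extraction network 301, a feature mapping network 302, and a processing unit 305, where the feature mapping network 302 may include FCW 303 and FCH 304. It should be understood that the multi-label classification model 300 shown in FIG. 3 is only an example; the embodiments of the present application may also include other modules or units, or variations of the modules or units in FIG. 3.
It should be noted that the multi-label classification method in the embodiments of the present application can be applied to many fields, such as image annotation, image recognition, sound recognition, and text classification. Accordingly, the samples in the training data set may be images, sounds, documents, and so on, which is not limited in the embodiments of the present application. For ease of description, image recognition using image samples is used as the example below, but this does not limit the solutions of the embodiments of the present application.
210: Initialize the weights of the multi-label classification model 300.
Initializing the weights of the multi-label classification model 300 means initializing the weights of the feature extraction network 301 and of the feature mapping network (that is, FCW 303 and FCH 304) in the system.
Here, the feature extraction network 301 may be any neural network capable of extracting image features, for example a convolutional neural network or a multi-layer perceptron, which is not limited in the embodiments of the present application. The weights of the feature extraction network 301 may be denoted Z; specifically, Z may contain multiple weight matrices. The parameters of the weight matrices may be randomly initialized, or pre-trained model parameters may be used. Here, pre-trained model parameters are the parameters of a model that has already been trained, such as the parameters of the vgg16 network trained on the ImageNet data set.
In addition, the feature mapping network 302 may be a mapping network whose weight matrix is the low-rank feature mapping matrix $M_{c*d}$, for example a fully connected layer, where $M_{c*d}$ may represent the correlation weights between the feature attributes and the category labels in the multi-label classification model, and its initial value may be randomly generated. In a specific embodiment, the feature mapping network 302 may include FCW 303 and FCH 304, where FCW 303 denotes a fully connected layer whose weight matrix is $W_{r*d}$, FCH 304 denotes a fully connected layer whose weight matrix is $H_{c*r}$, and the initial values of $W_{r*d}$ and $H_{c*r}$ may be randomly generated. Here, to ensure that $M_{c*d}$, $W_{r*d}$, and $H_{c*r}$ are low-rank, r ≤ min(d, c) may be set.
220: Input n pictures.
Owing to the nature of neural networks, it is not necessary to input the entire training data set for calculation at one time; only batches of pictures need to be input, so the embodiments of the present application can input the entire data set for training in batches. That is, the model can be trained by inputting part of the data set in each of multiple batches, where the data input each time may be randomly drawn from the picture samples in the data set that have not yet been input. Since the training data set usually includes a large number of samples, inputting it in batches reduces the resources occupied during model training.
In this case, the number of samples input to the multi-label classification model 300 in one batch may be n. When the samples are pictures, the n samples may be denoted image_n; more specifically, image_n may be n pictures randomly drawn from the D samples of the training data set, and n may be much smaller than D. Specifically, the size of n may be determined according to the capability of the multi-label classification model 300. For example, if the data processing capability of the multi-label classification model 300 is strong, n may be set relatively large to shorten the training time. If its data processing capability is weak, n may be set relatively small to reduce the resources consumed by training. In this way, the embodiments of the present application can flexibly set the value of n according to the data processing capability of the multi-label classification model 300.
In addition, the label matrix corresponding to the n samples may be denoted $Y_{c*n}$, where an element $y_{i*j}$ of the label matrix $Y_{c*n}$ indicates whether the i-th sample contains the object indicated by the j-th label, i ranges from 1 to n, and j ranges from 1 to c. For the description of the label matrix, refer to the description above; to avoid repetition, details are not repeated here.
In the embodiments of the present application, the training data may be input to the multi-label classification model 300 shown in FIG. 3. Specifically, the n pictures in the training data set and the label matrix $Y_{c*n}$ of the n pictures may each be input to the multi-label classification model 300.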
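230: Extract the features of the pictures.
Specifically, the n pictures may be input to the feature extraction network 301. Through its convolutional layers, activation function layers, pooling layers, fully connected layers, and Batchnorm layers, the feature extraction network 301 can extract the features of the n pictures and output the feature matrix $X_{d*n}$, where d is a positive integer and denotes the feature dimension of the feature matrix $X_{d*n}$.
240: Compute the prediction label matrix of the pictures from their features.
Specifically, the feature matrix $X_{d*n}$ output by the feature extraction network 301 may be input to the feature mapping network 302. Because the weight matrix of the feature mapping network is the low-rank feature mapping matrix $M_{c*d}$, which may represent the correlation weights between the feature attributes and the category labels in the multi-label classification model, the feature mapping network 302 can map the input feature matrix $X_{d*n}$ to the prediction label space to obtain the prediction label matrix $\hat{Y}_{c*n}$, that is:
$$\hat{Y}_{c*n} = M_{c*d} X_{d*n}$$
Here, the prediction label matrix $\hat{Y}_{c*n}$ may be a label matrix containing richer label information, in which each element $\hat{y}_{i*j}$ indicates the confidence that the i-th sample contains the object indicated by the j-th label. The prediction label matrix $\hat{Y}_{c*n}$ may therefore be called the completed label matrix, and the label matrix $Y_{c*n}$ the label matrix with missing entries.
It should be noted that, in practical problems, the labels in the label matrix are not independent of one another but are semantically related. For example, sheep and grass are very likely to appear in the same picture, as are mountains and sky, whereas sheep and an office are very unlikely to appear together, and this correlation can be used to improve the accuracy of multi-label classification. It follows that the labels in the completed label matrix $\hat{Y}_{c*n}$ are correlated, that is, $\hat{Y}_{c*n}$ is low-rank, so $\hat{Y}_{c*n}$ can be obtained from $Y_{c*n}$ according to the low-rank structure of the matrix; this process may be called completion of the label matrix.
In the embodiments of the present application, the label matrix can be completed by means of low-rank matrix decomposition, that is, the prediction label matrix $\hat{Y}_{c*n}$ is decomposed into low-rank factors:
$$\hat{Y}_{c*n} = H_{c*r} W_{r*d} X_{d*n}$$
That is, in the embodiments of the present application, the low-rank feature mapping network may include a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network have the following relationship:
$$M_{c*d} = H_{c*r} W_{r*d}$$
where the weight matrix of the first sub-mapping network is $W_{r*d}$ and the weight matrix of the second sub-mapping network is $H_{c*r}$. Here, to ensure that $M_{c*d}$, $W_{r*d}$, and $H_{c*r}$ are low-rank, r may be set to a positive integer with r ≤ min(d, c).
In a specific embodiment, the first sub-mapping network may be a fully connected layer whose weight matrix is $W_{r*d}$, denoted FCW, the second sub-mapping network may be a fully connected layer whose weight matrix is $H_{c*r}$, denoted FCH, and the initial values of $W_{r*d}$ and $H_{c*r}$ may be randomly generated.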
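The relationship between the two fully connected layers and the single mapping matrix can be checked on a toy case. The sketch below assumes bias-free layers and uses made-up integer matrices with c = 3, r = 2, d = 4, and n = 2; the helper name `matmul` is hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


H = [[1, 0], [0, 1], [1, 1]]            # weight matrix of FCH, c x r
W = [[1, 2, 0, 1], [0, 1, 1, 3]]        # weight matrix of FCW, r x d
X = [[1, 0], [0, 1], [1, 1], [2, 0]]    # feature matrix, d x n

# Passing the features through FCW and then FCH ...
two_step = matmul(H, matmul(W, X))
# ... gives the same result as a single layer with weight matrix M = H W.
one_step = matmul(matmul(H, W), X)

print(two_step)              # [[3, 2], [7, 2], [10, 4]]
print(two_step == one_step)  # True
```

The intermediate result W X has only r rows, so the two-layer form never materializes the full c x d matrix during the forward pass.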
Here, setting r ≤ min(d, c) makes $W_{r*d}$ and $H_{c*r}$ low-rank. Because the rank of a product of two matrices is no greater than the rank of either factor, $H_{c*r} W_{r*d}$ (that is, $M_{c*d}$) is low-rank, which in turn makes $\hat{Y}_{c*n}$ low-rank. Here, the best value of r can be found through multiple rounds of training.
In other words, the embodiments of the present application can map X (i.e., $X_{d*n}$) through the preset feature mapping matrix M (i.e., $M_{c*d}$) to obtain the prediction label matrix $\hat{Y}$ (i.e., $\hat{Y}_{c*n}$), that is, $\hat{Y} = MX$. Because the rank of $\hat{Y}$ is less than or equal to the rank of M and the rank of X, making M low-rank also guarantees that $\hat{Y}$ is low-rank. A low-rank decomposition can therefore be applied to M, as in formula (2) above, which is equivalent to decomposing $\hat{Y}$ into a product of two low-dimensional matrices, thereby ensuring that $\hat{Y}$ is low-rank.
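250: Compute the optimization function.
Specifically, in the embodiments of this application, the preset matrices $H_{c\times r}$ and $W_{r\times d}$ may be used in place of the preset matrix M. However, because $\hat{Y}_{c\times n}$ is computed from the ready-made matrices $H_{c\times r}$ and $W_{r\times d}$, the predicted label matrix $\hat{Y}_{c\times n}$ obtained in this way is not accurate. During learning it must be compared against the reference label matrix Y in order to learn and update $H_{c\times r}$ and $W_{r\times d}$. At this point, the processing unit 305 may update the weight parameter Z and the feature mapping matrix $M_{c\times d}$ according to the label matrix $Y_{c\times n}$ and the predicted label matrix $\hat{Y}_{c\times n}$, so as to train the multi-label classification model 300.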
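Specifically, the processing unit 305 may determine a Euclidean distance loss function between the predicted label matrix $\hat{Y}_{c\times n}$ and the label matrix $Y_{c\times n}$, whose role is to constrain $\hat{Y}_{c\times n}$ to be close to $Y_{c\times n}$. The loss function is expressed by formula (3):
Figure PCTCN2018094400-appb-000075
Alternatively, the loss function is expressed by formula (4):
Figure PCTCN2018094400-appb-000076
Here, for ease of description, the superscripts and subscripts of $W_{r\times d}$ and $H_{c\times r}$ are omitted. $\|\cdot\|_F$ is the Frobenius norm of a matrix; the Frobenius norm of a matrix $A_{m\times n}$ is defined as:
$$\|A\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} A_{ij}^{2}}$$
where $A_{ij}$ is an element of the matrix A. The loss above is therefore a Euclidean distance loss function.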
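In addition, $P_\Omega$ in formula (4) is a projection operator: observed elements are left unchanged and unobserved elements are set to 0, so that only observed elements take part in the computation. Its specific form is:
$$[P_\Omega(A)]_{ij} = \begin{cases} A_{ij}, & (i,j) \in \Omega \\ 0, & \text{otherwise} \end{cases}$$
where $\Omega$ denotes the set of positions of the observed elements. For example, suppose
Figure PCTCN2018094400-appb-000081
Y=[1 0 0 ? 0 1], where ? is a missing element; then
Figure PCTCN2018094400-appb-000082
that is, the position of ? is set to 0 during the computation. In this way, the elements of $\hat{Y}$ and Y at that position do not take part in the computation (they are treated as 0), which prevents the missing element from inflating the value of the loss function and thereby improves the accuracy of the computation.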
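The effect of the projection operator can be illustrated with a short Python sketch. It reuses the label vector Y=[1 0 0 ? 0 1] from the text; the predicted values in `y_hat`, the helper names, and the unscaled sum-of-squares form of the loss are assumptions made for the example, since the original formula images are not available.

```python
def project_observed(values, observed):
    """P_Omega: keep observed entries and replace unobserved ones with 0."""
    return [v if seen else 0.0 for v, seen in zip(values, observed)]

def masked_squared_distance(y_hat, y, observed):
    """Sum of squared differences, computed over observed positions only."""
    p_hat = project_observed(y_hat, observed)
    p_y = project_observed(y, observed)
    return sum((a - b) ** 2 for a, b in zip(p_hat, p_y))

y = [1.0, 0.0, 0.0, None, 0.0, 1.0]      # None marks the missing label "?"
observed = [v is not None for v in y]
y_hat = [0.9, 0.1, 0.0, 0.7, 0.2, 0.8]   # illustrative predictions, not from the source
loss = masked_squared_distance(y_hat, y, observed)
```

The prediction 0.7 at the missing position contributes nothing to `loss`, whatever its value, because both projected vectors hold 0 there.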
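Further, the sum of the above loss function and a regularization term may be determined as the loss function $L_n$ of the n samples. Here, the loss function $L_n$ may also be called the optimization function $L_n$. Specifically, $L_n$ is expressed by formula (7) or formula (8):
Figure PCTCN2018094400-appb-000084
Figure PCTCN2018094400-appb-000085
where the first term of the optimization function $L_n$ is the above loss function $\varepsilon_n$ and the second term is the regularization term, which constrains the weight parameter Z and the weight matrices $W_{r\times d}$ and $H_{c\times r}$ in order to prevent overfitting.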
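A minimal NumPy sketch of an objective with this structure (data term plus regularization term) follows. The 1/2 scaling, the regularization weights `lam_w` and `lam_h`, and the choice of squared Frobenius penalties are assumptions for illustration; the regularization of the feature extraction weights Z is left out because Z lives inside the extraction network.

```python
import numpy as np

def objective(H, W, X, Y, lam_w=1e-3, lam_h=1e-3):
    """Euclidean data term plus squared-Frobenius penalties on W and H.

    The 1/2 factors and the lam_* weights are assumed, not taken from the source.
    """
    data_term = 0.5 * np.linalg.norm(H @ W @ X - Y, "fro") ** 2
    reg_term = (0.5 * lam_w * np.linalg.norm(W, "fro") ** 2
                + 0.5 * lam_h * np.linalg.norm(H, "fro") ** 2)
    return data_term + reg_term

# Tiny hand-checkable case: c = 2 labels, r = 1, d = 2 features, n = 1 sample.
H = np.array([[1.0], [0.0]])
W = np.array([[1.0, 0.0]])
X = np.array([[2.0], [3.0]])
Y = np.array([[1.0], [0.0]])

unregularised = objective(H, W, X, Y, lam_w=0.0, lam_h=0.0)
regularised = objective(H, W, X, Y, lam_w=1.0, lam_h=1.0)
```

In this case HWX is the column (2, 0), so the data term is 0.5 and each penalty adds 0.5 when its weight is set to 1.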
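260: Update the weight parameters using the error back-propagation algorithm.
The error back-propagation algorithm is a method for training multi-layer neural networks. It is based on gradient descent and learns and updates the weights of each layer of the neural network by optimizing a loss function.
Specifically, the error back-propagation algorithm may be used to minimize the loss function $L_n$: the weight parameter Z at which the optimization function attains its minimum is taken as the updated weight parameter Z, and the feature mapping matrix $M_{c\times d}$ at which the optimization function attains its minimum is taken as the updated feature mapping matrix $M_{c\times d}$. When $M_{c\times d} = H_{c\times r} W_{r\times d}$, the weight matrix $W_{r\times d}$ at which the optimization function attains its minimum is taken as the updated weight matrix $W_{r\times d}$, and the weight matrix $H_{c\times r}$ at which the optimization function attains its minimum is taken as the updated weight matrix $H_{c\times r}$.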
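To apply the error back-propagation algorithm, the derivatives with respect to the variables in formula (7) are computed below, taking as an example the case where one picture is input and the regularization term uses the $l_2$ norm.
Let $L_1$ denote the optimization function of one picture; then:
Figure PCTCN2018094400-appb-000090
where the square of the Frobenius norm of a matrix corresponds to the square of the $l_2$ norm of a vector.
Differentiating with respect to each element of $W_{r\times d}$ and $H_{c\times r}$ gives:
Figure PCTCN2018094400-appb-000092
Figure PCTCN2018094400-appb-000093
where $h_{kj}$ is an element of the matrix $H_{c\times r}$, $w_{ji}$ is an element of the matrix $W_{r\times d}$, $x_i$ is an element of the vector $x_d$, $p_j$ is an element of the vector $p_r$, $\hat{y}_j$/$\hat{y}_k$ is an element of the vector $\hat{y}_c$, and $y_j$/$y_k$ is an element of the vector $y_c$; $x_d$, $p_r$, $\hat{y}_c$, and $y_c$ are column vectors of the matrices $X_{d\times n}$, $P_{r\times n}$, $\hat{Y}_{c\times n}$, and $Y_{c\times n}$, respectively. The back-propagated derivative with respect to the feature extraction network weights Z can be obtained by propagating
Figure PCTCN2018094400-appb-000100
The elements of $H_{c\times r}$ and $W_{r\times d}$ are then updated as:
Figure PCTCN2018094400-appb-000102
Figure PCTCN2018094400-appb-000103
Figure PCTCN2018094400-appb-000104
is the value obtained in the current update and
Figure PCTCN2018094400-appb-000105
is the value obtained in the previous update; the same applies to $w_{ji}$. $\eta_1$ and $\eta_2$ are the learning rates of $H_{c\times r}$ and $W_{r\times d}$, respectively, and control the update rate. The weights Z of the feature extraction network are updated in a similar way.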
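The element-wise derivatives and the update rule for a single picture can be sketched with NumPy as follows. This is a sketch under stated assumptions rather than the exact formulas of the application: the per-picture objective is taken to be half the squared error plus half of `lam` times the squared norms of H and W, one regularization weight `lam` is shared by both matrices, and the learning rates `eta_h` and `eta_w` (playing the roles of η1 and η2) are arbitrary illustrative values. The update of the feature extraction weights Z is omitted.

```python
import numpy as np

def loss_single(H, W, x, y, lam=1e-3):
    """Assumed per-picture objective: squared error plus l2 penalties, each halved."""
    e = H @ (W @ x) - y
    return 0.5 * float(e @ e) + 0.5 * lam * (np.sum(H ** 2) + np.sum(W ** 2))

def gradients_single(H, W, x, y, lam=1e-3):
    """Gradients of loss_single with respect to H and W."""
    p = W @ x                                  # p_j, the output of FCW
    e = H @ p - y                              # prediction error per label k
    grad_H = np.outer(e, p) + lam * H          # dL/dh_kj = e_k * p_j + lam * h_kj
    grad_W = np.outer(H.T @ e, x) + lam * W    # dL/dw_ji = (sum_k e_k h_kj) * x_i + lam * w_ji
    return grad_H, grad_W

def sgd_step(H, W, x, y, eta_h=0.01, eta_w=0.01, lam=1e-3):
    """One gradient-descent update of H and W with separate learning rates."""
    grad_H, grad_W = gradients_single(H, W, x, y, lam)
    return H - eta_h * grad_H, W - eta_w * grad_W

rng = np.random.default_rng(0)
d, c, r = 6, 4, 2                              # illustrative sizes only
H = rng.normal(scale=0.1, size=(c, r))
W = rng.normal(scale=0.1, size=(r, d))
x = rng.normal(size=d)                         # stand-in for one extracted feature vector
y = np.array([1.0, 0.0, 0.0, 1.0])             # made-up label vector

loss_before = loss_single(H, W, x, y)
H_new, W_new = sgd_step(H, W, x, y)
loss_after = loss_single(H_new, W_new, x, y)
```

In a full model, an automatic differentiation framework would compute these gradients and also propagate the error into the feature extraction network; the explicit formulas here only make the chain rule through p = Wx visible.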
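In this way, the weights Z of the feature extraction network and the feature mapping matrix $M_{c\times d} = H_{c\times r} W_{r\times d}$ can be learned, the missing labels can be filled in, and the multi-label classification capability is improved.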
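270: Determine whether the stopping condition is met.
Here, the stopping condition is that $L_n$ no longer decreases, or that its decrease is smaller than a preset threshold, or that the maximum number of training iterations is reached. If the condition is not met, steps 220 to 260 are repeated until it is. In the embodiments of this application, inputting all pictures once counts as one round of training, and several rounds are usually required.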
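The three stopping conditions can be combined in one small check. The following Python sketch assumes the loss is recorded once per training round; the function name and the default values of `min_decrease` and `max_rounds` are hypothetical.

```python
def should_stop(loss_history, min_decrease=1e-4, max_rounds=100):
    """Return True when training should stop.

    Training stops when the round budget is used up, or when the latest round
    failed to lower the loss by at least min_decrease (which also covers the
    case where the loss did not decrease at all).
    """
    if len(loss_history) >= max_rounds:
        return True
    if len(loss_history) < 2:
        return False
    decrease = loss_history[-2] - loss_history[-1]
    return decrease < min_decrease
```

A training loop would append the value of the optimization function after each round and break out of the loop as soon as `should_stop` returns True.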
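After training is complete, only steps 220 and 230 need to be performed in the test phase: a test picture is input to the feature extraction network of the neural network model, the feature extraction network extracts a first feature matrix of the test picture, the first feature matrix is input to FCM, and FCM obtains and outputs the predicted label matrix of the first feature matrix. An element of the predicted label matrix represents the confidence that the test picture contains the object indicated by the j-th label. Here, there may be one or more test pictures, and they need not belong to the training data set.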
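More specifically, consider a single prediction vector $\hat{y}$ of the predicted label matrix. Processing $\hat{y}$ yields the one or more categories to which the picture belongs. For example, if one or more elements of $\hat{y}$ are greater than a preset threshold, the picture has category labels at the positions of those elements and belongs to the corresponding category or categories. Here, the preset threshold may be 0.5 or another value, which is not limited in the embodiments of this application.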
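The thresholding step can be written as a one-line filter. In this Python sketch the label names and scores are invented for illustration; only the default threshold of 0.5 comes from the text.

```python
def labels_above_threshold(scores, label_names, threshold=0.5):
    """Return the labels whose confidence is strictly greater than the threshold."""
    return [name for name, score in zip(label_names, scores) if score > threshold]

label_names = ["sheep", "grass", "office"]   # hypothetical label set
scores = [0.9, 0.7, 0.1]                      # hypothetical prediction vector for one picture
predicted = labels_above_threshold(scores, label_names)
```

A score exactly equal to the threshold is not selected, which matches the wording "greater than a preset threshold".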
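Therefore, the neural network system provided by the embodiments of this application can train a model directly from the input data without additional intermediate steps; that is, it is an end-to-end neural network system. The advantage of end-to-end training is that the weight parameters of the feature extraction network and the feature mapping matrix can be optimized simultaneously. In other words, the embodiments of this application can learn image features dynamically, so that the feature extraction network adapts better to the task and multi-label classification performs well.
In addition, the embodiments of this application can compute the feature mapping matrix from the image features of picture samples batch by batch, without taking the image features of the entire data set as input at once. Because training does not need the image features of all samples at the same time, the memory required during model training is greatly reduced, which effectively addresses the computational problem of multi-label classification on large-scale data.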
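FIG. 4 shows a schematic diagram of a multi-label classification model 500 provided by an embodiment of this application. The feature extraction network of the model 500 uses the VGG16 network, and the output of the Dropout layer that follows the second-to-last fully connected layer of VGG16 is taken as the feature matrix X. In addition, the weight parameter Z of the feature extraction network is initialized with weights trained on the ImageNet data set and then fine-tuned (fine-tuning means fixing the weights of the earlier layers, or adjusting them only slightly, while fully training the last one or two layers). The initial values of the weight matrices H and W may be drawn from a Gaussian distribution, and H and W must be fully trained. The regularization term may use the Frobenius norm.
Specifically, during training, the weights of the feature extraction network VGG16 (excluding the last fully connected layer) are those pre-trained on the ImageNet data set.
n RGB three-channel pictures image_n of 224*224 pixels are input into the VGG16 network, where 1≤n≤N and N is the number of pictures in the training set. The input can be represented as a four-dimensional array such as n*C*h*w or h*w*C*n, where C is the number of channels (3 for RGB images), h is the picture height (224 pixels), and w is the picture width (224 pixels). After several convolution, activation, and pooling operations, the pictures pass through two fully connected layers and a Dropout layer to yield the image feature matrix $X_{4096\times n}$.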
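$X_{4096\times n}$ passes through the fully connected layers whose weight matrices are $W_{r\times 4096}$ and $H_{c\times r}$ (FCW 503 and FCH 504), which yields the predicted label matrix $\hat{Y}_{c\times n}$. The processing unit 505 obtains the optimization function from the label matrix $Y_{c\times n}$ and the predicted label matrix $\hat{Y}_{c\times n}$:
Figure PCTCN2018094400-appb-000114
The error back-propagation algorithm is then used to minimize this optimization function and update the weight parameter Z and the weight matrices $W_{r\times 4096}$ and $H_{c\times r}$. For the details of the optimization process, refer to the description above; to avoid repetition, they are not described again here.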
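After the weight parameter Z and the weight matrices W and H have been updated, it is determined whether the stopping condition is met; if not, the steps are repeated until it is. For the stopping condition, refer to the description above; to avoid repetition, it is not described again here.
After training is complete, a test picture may be input to the feature extraction network 501, and the picture features extracted by the feature extraction network are input to FCW 503 and FCH 504, which produce the predicted label matrix.
It should be noted that, in the embodiments of this application, the feature extraction network may be replaced by another network, such as AlexNet, GoogLeNet, ResNet, or a custom network, which is not limited in the embodiments of this application. The feature output layer may be the output of any layer of such a network, and convolutional or fully connected layers may be added or removed on that basis. In addition, the embodiments of this application may also use different regularization terms.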
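It should be noted that the embodiments of this application are not limited to a particular product form; the multi-label classification method of the embodiments may be deployed on a general-purpose computer node. The initially constructed multi-label classification model may be stored on a hard disk, and the algorithm is run by the processor and memory to learn from the existing training data set and obtain the multi-label classification model. The model can then predict the labels of unknown samples, and the prediction results are stored on the hard disk, so that the existing label set is completed and the labels of unknown samples are predicted.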
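FIG. 5 shows a schematic block diagram of an apparatus 600 for training a multi-label classification model provided by an embodiment of this application. The apparatus 600 includes a determining unit 610, an extracting unit 620, an obtaining unit 630, and an updating unit 640.
The determining unit 610 is configured to determine, in a training data set, n samples and a label matrix $Y_{c\times n}$ corresponding to the n samples, where an element $y_{ij}$ of the label matrix $Y_{c\times n}$ indicates whether the i-th sample contains the object indicated by the j-th label, and c denotes the number of labels related to the samples in the training data set.
The extracting unit 620 is configured to extract a feature matrix $X_{d\times n}$ of the n samples by using a feature extraction network, where the feature extraction network has a weight parameter Z and d denotes the feature dimension of the feature matrix $X_{d\times n}$.
The obtaining unit 630 is configured to obtain a predicted label matrix $\hat{Y}_{c\times n}$ of the feature matrix $X_{d\times n}$ by using a feature mapping network, where an element $\hat{y}_{ij}$ of the predicted label matrix $\hat{Y}_{c\times n}$ represents the confidence that the i-th sample contains the object indicated by the j-th label, and the weight matrix of the feature mapping network is a low-rank feature mapping matrix $M_{c\times d}$.
The updating unit 640 is configured to update the weight parameter Z and the feature mapping matrix $M_{c\times d}$ according to the label matrix $Y_{c\times n}$ and the predicted label matrix $\hat{Y}_{c\times n}$, so as to train the multi-label classification model.
Here, n, c, i, j, and d are all positive integers, i ranges from 1 to n, and j ranges from 1 to c.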
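Therefore, the neural network system provided by the embodiments of this application can train a model directly from the input data without additional intermediate steps; that is, it is an end-to-end neural network system. The advantage of end-to-end training is that feature extraction, the feature mapping matrix, and the low-rank label correlation matrix can be optimized simultaneously. In other words, the embodiments of this application can learn image features dynamically, so that the feature extraction network adapts better to the task and multi-label classification performs well.
Optionally, the low-rank feature mapping network includes a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network satisfy the following relationship:
$$M_{c\times d} = H_{c\times r} W_{r\times d}$$
where the weight matrix of the first sub-mapping network is $W_{r\times d}$, the weight matrix of the second sub-mapping network is $H_{c\times r}$, and r is a positive integer with $r \le \min(d, c)$.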
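Optionally, the updating unit is specifically configured to:
determine a Euclidean distance loss function between the predicted label matrix $\hat{Y}_{c\times n}$ and the label matrix $Y_{c\times n}$; and
update the weight parameter Z and the weight matrices $W_{r\times d}$ and $H_{c\times r}$ according to the Euclidean distance loss function.
Optionally, the updating unit is further specifically configured to:
determine the sum of the Euclidean distance loss function and a regularization term as the optimization function of the n samples, where the regularization term constrains the weight parameter Z and the weight matrices $W_{r\times d}$ and $H_{c\times r}$; and
take the weight parameter Z at which the optimization function attains its minimum as the updated weight parameter Z, take the weight matrix $W_{r\times d}$ at which the optimization function attains its minimum as the updated weight matrix $W_{r\times d}$, and take the weight matrix $H_{c\times r}$ at which the optimization function attains its minimum as the updated weight matrix $H_{c\times r}$.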
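Optionally, the determining unit is specifically configured to:
determine a training data set, where the training data set includes D samples and a label vector for each of the D samples, an element $y_j$ of each sample's label vector indicates whether that sample contains the object indicated by the j-th label, and D is a positive integer not less than n; and
randomly draw n samples from the training data set and generate the label matrix $Y_{c\times n}$ of the n samples, where the label matrix $Y_{c\times n}$ includes the label vector corresponding to each of the n samples.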
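Drawing a batch and assembling its label matrix can be sketched in plain Python. The function name, the `seed` parameter, and the toy label vectors are assumptions for illustration; the sketch stores the c x n label matrix as a list of c rows, with one column per drawn sample.

```python
import random

def sample_batch(label_vectors, n, seed=None):
    """Draw n of the D samples without replacement.

    Returns the drawn sample indices and the c x n label matrix whose
    columns are the label vectors of the drawn samples.
    """
    rng = random.Random(seed)
    indices = rng.sample(range(len(label_vectors)), n)
    c = len(label_vectors[0])
    Y = [[label_vectors[i][j] for i in indices] for j in range(c)]
    return indices, Y

# Toy data set: D = 4 samples, c = 3 labels each (made-up values).
label_vectors = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
indices, Y = sample_batch(label_vectors, n=2, seed=0)
```

Calling `sample_batch` once per training step gives the batch-wise training described in the text, since only n label vectors (and the corresponding n pictures) need to be held in memory at a time.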
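Therefore, in the embodiments of this application, the entire training data set does not have to be input at once for computation; the pictures only need to be input batch by batch, so the whole data set can be used for training in batches. Because a training data set usually contains a large number of samples, batch-wise input reduces the resources occupied during model training, greatly lowers the memory requirement, and effectively addresses the problem of computing the low-rank label correlation matrix on large-scale data.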
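Optionally, the extracting unit is further configured to extract a first feature matrix of a first sample by using the feature extraction network, where the first sample does not belong to the n samples;
the obtaining unit is further configured to obtain a first predicted label matrix of the first feature matrix by using the first mapping network, where an element of the first predicted label matrix represents the confidence that the first sample contains the object indicated by the j-th label.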
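It should be noted that, in the embodiments of the present invention, the determining unit 610, the extracting unit 620, the obtaining unit 630, and the updating unit 640 may be implemented by a processor. As shown in FIG. 6, an apparatus 700 for training a multi-label classification model may include a processor 710, a memory 720, and a communication interface 730. The memory 720 may be used to store instructions or code executed by the processor 710. When the instructions or code are executed, the processor 710 performs the method provided by the foregoing method embodiments, and the processor 710 also controls the communication interface 730 to communicate with the outside.
In implementation, the steps of the foregoing method may be completed by integrated logic circuits of hardware in the processor 710 or by instructions in the form of software. The steps of the method disclosed in the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may reside in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 720; the processor 710 reads the information in the memory 720 and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here.
The apparatus 600 for training a multi-label classification model shown in FIG. 5 and the apparatus 700 for training a multi-label classification model shown in FIG. 6 can implement the processes of the foregoing method embodiments. For details of the apparatus 600 or the apparatus 700, refer to the description above; to avoid repetition, they are not described again here.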
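It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the numbering does not limit the implementation of the embodiments of this application in any way.
An embodiment of this application further provides a computer-readable storage medium, characterized by including a computer program that, when run on a computer, causes the computer to perform the method provided by the foregoing method embodiments.
An embodiment of this application further provides a computer program product containing instructions, characterized in that, when the computer program product runs on a computer, the computer is caused to perform the method provided by the foregoing method embodiments.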
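It should be understood that the processor mentioned in the embodiments of the present invention may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
It should also be understood that the memory mentioned in the embodiments of the present invention may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM).
It should be noted that, when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated in the processor.
It should be noted that the memory described herein is intended to include, without being limited to, these and any other suitable types of memory.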
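It should also be understood that "first", "second", and the various numerals used herein are distinctions made only for convenience of description and are not intended to limit the scope of this application.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not described again here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The foregoing describes only specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. The protection scope of this application shall therefore be subject to the protection scope of the claims.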

Claims (14)

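  1. A method for training a multi-label classification model, characterized by comprising:
    determining, in a training data set, n samples and a label matrix $Y_{c\times n}$ corresponding to the n samples, wherein an element $y_{ij}$ of the label matrix $Y_{c\times n}$ indicates whether the i-th sample contains the object indicated by the j-th label, and c denotes the number of labels related to the samples in the training data set;
    extracting a feature matrix $X_{d\times n}$ of the n samples by using a feature extraction network, wherein the feature extraction network has a weight parameter Z, and d denotes the feature dimension of the feature matrix $X_{d\times n}$;
    obtaining a predicted label matrix $\hat{Y}_{c\times n}$ of the feature matrix $X_{d\times n}$ by using a feature mapping network, wherein an element $\hat{y}_{ij}$ of the predicted label matrix $\hat{Y}_{c\times n}$ represents the confidence that the i-th sample contains the object indicated by the j-th label, and the weight matrix of the feature mapping network is a low-rank feature mapping matrix $M_{c\times d}$;
    updating the weight parameter Z and the feature mapping matrix $M_{c\times d}$ according to the label matrix $Y_{c\times n}$ and the predicted label matrix $\hat{Y}_{c\times n}$, so as to train the multi-label classification model;
    wherein n, c, i, j, and d are all positive integers, i ranges from 1 to n, and j ranges from 1 to c.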
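  2. The method according to claim 1, characterized in that the low-rank feature mapping network comprises a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network satisfy the following relationship:
    $$M_{c\times d} = H_{c\times r} W_{r\times d}$$
    wherein the weight matrix of the first sub-mapping network is $W_{r\times d}$, the weight matrix of the second sub-mapping network is $H_{c\times r}$, and r is a positive integer with $r \le \min(d, c)$.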
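  3. The method according to claim 2, characterized in that updating the weight parameter Z and the feature mapping matrix $M_{c\times d}$ according to the label matrix $Y_{c\times n}$ and the predicted label matrix $\hat{Y}_{c\times n}$ comprises:
    determining a Euclidean distance loss function between the predicted label matrix $\hat{Y}_{c\times n}$ and the label matrix $Y_{c\times n}$;
    updating the weight parameter Z and the weight matrices $W_{r\times d}$ and $H_{c\times r}$ according to the Euclidean distance loss function.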
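  4. The method according to claim 3, characterized in that updating the weight parameter Z and the weight matrices $W_{r\times d}$ and $H_{c\times r}$ according to the Euclidean distance loss function comprises:
    determining the sum of the Euclidean distance loss function and a regularization term as the optimization function of the n samples, wherein the regularization term constrains the weight parameter Z and the weight matrices $W_{r\times d}$ and $H_{c\times r}$;
    taking the weight parameter Z at which the optimization function attains its minimum as the updated weight parameter Z, taking the weight matrix $W_{r\times d}$ at which the optimization function attains its minimum as the updated weight matrix $W_{r\times d}$, and taking the weight matrix $H_{c\times r}$ at which the optimization function attains its minimum as the updated weight matrix $H_{c\times r}$.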
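  5. The method according to any one of claims 1 to 4, characterized in that determining, in the training data set, the n samples and the label matrix $Y_{c\times n}$ of the n samples comprises:
    determining a training data set, wherein the training data set includes D samples and a label vector for each of the D samples, an element $y_j$ of each sample's label vector indicates whether that sample contains the object indicated by the j-th label, and D is a positive integer not less than n;
    randomly drawing n samples from the training data set and generating the label matrix $Y_{c\times n}$ of the n samples, wherein the label matrix $Y_{c\times n}$ includes the label vector corresponding to each of the n samples.
  6. The method according to any one of claims 1 to 5, characterized by further comprising:
    extracting a first feature matrix of a first sample by using the feature extraction network, wherein the first sample does not belong to the n samples;
    obtaining a first predicted label matrix of the first feature matrix by using the first mapping network, wherein an element of the first predicted label matrix represents the confidence that the first sample contains the object indicated by the j-th label.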
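  7. An apparatus for training a multi-label classification model, characterized by comprising:
    a determining unit, configured to determine, in a training data set, n samples and a label matrix $Y_{c\times n}$ corresponding to the n samples, wherein an element $y_{ij}$ of the label matrix $Y_{c\times n}$ indicates whether the i-th sample contains the object indicated by the j-th label, and c denotes the number of labels related to the samples in the training data set;
    an extracting unit, configured to extract a feature matrix $X_{d\times n}$ of the n samples by using a feature extraction network, wherein the feature extraction network has a weight parameter Z, and d denotes the feature dimension of the feature matrix $X_{d\times n}$;
    an obtaining unit, configured to obtain a predicted label matrix $\hat{Y}_{c\times n}$ of the feature matrix $X_{d\times n}$ by using a feature mapping network, wherein an element $\hat{y}_{ij}$ of the predicted label matrix $\hat{Y}_{c\times n}$ represents the confidence that the i-th sample contains the object indicated by the j-th label, and the weight matrix of the feature mapping network is a low-rank feature mapping matrix $M_{c\times d}$;
    an updating unit, configured to update the weight parameter Z and the feature mapping matrix $M_{c\times d}$ according to the label matrix $Y_{c\times n}$ and the predicted label matrix $\hat{Y}_{c\times n}$, so as to train the multi-label classification model;
    wherein n, c, i, j, and d are all positive integers, i ranges from 1 to n, and j ranges from 1 to c.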
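  8. The apparatus according to claim 7, characterized in that the low-rank feature mapping network comprises a first sub-mapping network and a second sub-mapping network, and the low-rank feature mapping network, the first sub-mapping network, and the second sub-mapping network satisfy the following relationship:
    $$M_{c\times d} = H_{c\times r} W_{r\times d}$$
    wherein the weight matrix of the first sub-mapping network is $W_{r\times d}$, the weight matrix of the second sub-mapping network is $H_{c\times r}$, and r is a positive integer with $r \le \min(d, c)$.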
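  9. The apparatus according to claim 8, characterized in that the updating unit is specifically configured to:
    determine a Euclidean distance loss function between the predicted label matrix $\hat{Y}_{c\times n}$ and the label matrix $Y_{c\times n}$;
    update the weight parameter Z and the weight matrices $W_{r\times d}$ and $H_{c\times r}$ according to the Euclidean distance loss function.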
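  10. The apparatus according to claim 9, characterized in that the updating unit is further specifically configured to:
    determine the sum of the Euclidean distance loss function and a regularization term as the optimization function of the n samples, wherein the regularization term constrains the weight parameter Z and the weight matrices $W_{r\times d}$ and $H_{c\times r}$;
    take the weight parameter Z at which the optimization function attains its minimum as the updated weight parameter Z, take the weight matrix $W_{r\times d}$ at which the optimization function attains its minimum as the updated weight matrix $W_{r\times d}$, and take the weight matrix $H_{c\times r}$ at which the optimization function attains its minimum as the updated weight matrix $H_{c\times r}$.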
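  11. The apparatus according to any one of claims 7 to 10, characterized in that the determining unit is specifically configured to:
    determine a training data set, wherein the training data set includes D samples and a label vector for each of the D samples, an element $y_j$ of each sample's label vector indicates whether that sample contains the object indicated by the j-th label, and D is a positive integer not less than n;
    randomly draw n samples from the training data set and generate the label matrix $Y_{c\times n}$ of the n samples, wherein the label matrix $Y_{c\times n}$ includes the label vector corresponding to each of the n samples.
  12. The apparatus according to any one of claims 7 to 11, characterized in that:
    the extracting unit is further configured to extract a first feature matrix of a first sample by using the feature extraction network, wherein the first sample does not belong to the n samples;
    the obtaining unit is further configured to obtain a first predicted label matrix of the first feature matrix by using the first mapping network, wherein an element of the first predicted label matrix represents the confidence that the first sample contains the object indicated by the j-th label.
  13. An apparatus for training a multi-label classification model, characterized by comprising a memory and a processor, wherein the memory is configured to store instructions and the processor is configured to execute the instructions stored in the memory, so that the processor performs the method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, characterized by including a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 6.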
PCT/CN2018/094400 2017-11-24 2018-07-04 Method and apparatus for training a multi-label classification model WO2019100724A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711187395.0A CN109840530A (zh) 2017-11-24 2017-11-24 Method and apparatus for training a multi-label classification model
CN201711187395.0 2017-11-24

Publications (1)

Publication Number Publication Date
WO2019100724A1 true WO2019100724A1 (zh) 2019-05-31

Family

ID=66630474

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094400 WO2019100724A1 (zh) 2017-11-24 2018-07-04 Method and apparatus for training a multi-label classification model

Country Status (2)

Country Link
CN (1) CN109840530A (zh)
WO (1) WO2019100724A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120093396A1 (en) * 2010-10-13 2012-04-19 Shengyang Dai Digital image analysis utilizing multiple human labels
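CN105825502A (zh) * 2016-03-12 2016-08-03 浙江大学 Weakly supervised image parsing method based on saliency-guided dictionary learning
CN107292322A (zh) * 2016-03-31 2017-10-24 华为技术有限公司 Image classification method, deep learning model, and computer system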

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8805845B1 (en) * 2013-07-31 2014-08-12 LinkedIn Corporation Framework for large-scale multi-label classification
US10325220B2 (en) * 2014-11-17 2019-06-18 Oath Inc. System and method for large-scale multi-label learning using incomplete label assignments
CN104899596B (zh) * 2015-03-16 2018-09-14 景德镇陶瓷大学 Multi-label classification method and apparatus thereof
CN105320967A (zh) * 2015-11-04 2016-02-10 中科院成都信息技术股份有限公司 Multi-label AdaBoost ensemble method based on label correlation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG, ZHEN: "Learning Label Correlations for Multi-label Classification", INFORMATION & TECHNOLOGY, CHINA MASTER'S THESES FULL-TEXT DATABASE, 15 September 2015 (2015-09-15), pages 1140 - 75, XP55613693, ISSN: 1674-0246 *
YAO, HONGGE ET AL.: "Image Feature Extraction Based on Wavelet Analysis and BP Neural Network", JOURNAL OF XI'AN TECHNOLOGICAL UNIVERSITY, vol. 28, no. 6, 31 December 2008 (2008-12-31), ISSN: 1673-9965 *

Also Published As

Publication number Publication date
CN109840530A (zh) 2019-06-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18881043

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18881043

Country of ref document: EP

Kind code of ref document: A1