WO2020073951A1 - Model training method and apparatus for image recognition, network device, and storage medium - Google Patents

Model training method and apparatus for image recognition, network device, and storage medium

Info

Publication number
WO2020073951A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
image
label
model
target
Prior art date
Application number
PCT/CN2019/110361
Other languages
English (en)
French (fr)
Inventor
陈卫东 (Chen Weidong)
吴保元 (Wu Baoyuan)
刘威 (Liu Wei)
樊艳波 (Fan Yanbo)
张勇 (Zhang Yong)
张潼 (Zhang Tong)
Original Assignee
Tencent Technology (Shenzhen) Company Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited
Publication of WO2020073951A1
Priority to US17/083,180 (published as US20210042580A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7753Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • The present application relates to the field of communication technology, and in particular to a model training method, apparatus, network device, and storage medium for image recognition.
  • Deep neural network models trained on large-scale multi-label image training sets currently offer strong visual representation performance.
  • The quality of the large-scale multi-label image dataset largely determines the visual performance and accuracy of the deep neural network model.
  • The currently published large-scale multi-label image dataset ML-Images includes 11,166 labels and 18,019,881 training images; the industry generally trains deep neural network models on this image dataset.
  • Embodiments of the present application provide a method for training a model for image recognition, which is executed by a network device and includes:
  • acquiring a multi-label image training set, where the multi-label image training set includes multiple training images and each training image is labeled with multiple sample labels; selecting multiple training images from the multi-label image training set as target training images for training the current model; and using the model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image;
  • obtaining a cross-entropy loss function corresponding to the multiple sample labels of each target training image, where the positive label loss in the cross-entropy loss function is set with a weight, and the weight is greater than 1 so that the loss of a positive label is greater than the loss of a negative label;
  • converging the predicted labels and the sample labels of each target training image according to the cross-entropy loss function, and updating the parameters of the model to obtain a trained model.
  • An embodiment of the present application also provides a model training device for image recognition, including:
  • An image acquisition unit configured to acquire a multi-label image training set, the multi-label image training set includes multiple training images, and each training image is labeled with multiple sample labels;
  • a prediction unit configured to use the model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image;
  • a function obtaining unit configured to obtain a cross-entropy loss function corresponding to the multiple sample labels of each target training image, where the positive label loss in the cross-entropy loss function is set with a weight, and the weight is greater than 1 so that the loss of a positive label is greater than the loss of a negative label;
  • the training unit is configured to converge the predicted label and the sample label of each target training image according to the cross-entropy loss function, update the parameters of the model, and obtain the trained model.
  • An embodiment of the present application provides a network device, including a processor and a memory connected to the processor; the memory stores machine-readable instructions, and the machine-readable instructions can be executed by the processor to perform the method described above.
  • An embodiment of the present application further provides a storage medium that stores a plurality of instructions, the instructions being suitable for being loaded by a processor to perform the method described in the embodiments of the present application.
  • FIG. 1A is a schematic diagram of a scene of a model training method provided by an embodiment of the present application.
  • FIG. 1B is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 1C is a flowchart of preprocessing the target training image provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a first residual block in each convolution stage of a residual network provided by an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of a residual network provided by an embodiment of the present application.
  • FIG. 4 is another schematic flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 5A is a schematic structural diagram of a model training device provided by an embodiment of the present application.
  • FIG. 5B is another schematic structural diagram of a model training device provided by an embodiment of the present application.
  • FIG. 5C is another schematic structural diagram of a model training device provided by an embodiment of the present application.
  • FIG. 5D is another schematic structural diagram of a model training device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers or digital-computer-controlled machines to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
  • Artificial intelligence technology is a comprehensive subject covering a wide range of fields, including both hardware-level and software-level technologies.
  • Basic technologies of artificial intelligence generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation / interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology and machine learning / deep learning.
  • Large-scale multi-label image datasets suffer from a severe category imbalance problem, for example, an imbalance between positive and negative labels within a category: for a training image in the dataset, the number of negative labels (that is, the categories that do not exist in the image) is generally much larger than the number of positive labels (the categories that do exist in the image). Because of this category imbalance, the accuracy and visual performance of deep network models trained on large-scale multi-label image datasets are low.
  • Embodiments of the present application provide a model training method, device, and storage medium for image recognition.
  • the model training device can be specifically integrated in network equipment such as terminals or servers.
  • The network device can obtain a multi-label image training set; for example, the network device can search for multi-label images through an image search engine to obtain the multi-label image training set (a set of images each labeled with multiple labels). It then selects multiple training images from the multi-label image training set as the target training images for training the current model.
  • The model is used to perform label prediction on each target training image to obtain multiple predicted labels for each target training image, and the cross-entropy loss function corresponding to the multiple sample labels of each target training image is obtained.
  • In the cross-entropy loss function, the positive label loss is set with a weight, and the weight is greater than 1 so that the loss weight of a positive label is greater than the loss weight of a negative label. According to the cross-entropy loss function, the predicted labels and the sample labels of each target training image are converged to update the model's parameters and obtain the trained model.
  • the current model may be a deep neural network model.
  • Multiple training images are selected from the multi-label image training set as the target training images for training the current model; during training, the model learns from the training images batch by batch. Therefore, the above sample label may be a label of a target training image currently being learned by the model. For example, if the current target training image has the labels "person" and "dog", the above sample labels are "person" and "dog".
  • The aforementioned cross-entropy loss function is a way to measure the difference between the predicted values and the actual values of an artificial neural network (ANN: Artificial Neural Network).
  • Cross entropy describes the distance between two probability distributions, that is, the smaller the value of cross entropy, the closer the two probability distributions. Therefore, the distance between the predicted value and the actual value can be judged by the value of cross entropy.
  • In the embodiments of the present application, a weight is added to the positive label loss in the cross-entropy loss function, and the weight is greater than 1. Its role is to make the loss of a positive label greater than the loss of a negative label, that is, to set the error cost of a positive label higher than the error cost of a negative label. Here, a positive label is a label that is the same as a sample label of the training image, such as an object category present in the image, and a negative label is a label that differs from the sample labels of the training image, such as an object category that does not exist in the image.
  • the model may be trained in batch training, that is, multiple training images are used to train the model at a time.
  • the network device can select the target training image for the current batch training from the multi-label image training set; and train the deep neural network model according to the target training image.
  • The network device can also adaptively attenuate the cross-entropy loss based on the consecutive occurrence of images with a given label in adjacent batch trainings.
  • In this case, the cross-entropy loss function also includes a cross-entropy loss attenuation parameter. The network device can obtain the overall type of the first training images for a sample label in the adjacent batch training and the number of consecutive times training images with that sample label have appeared; obtain the overall type of the second training images for the sample label in the current batch training; update the cross-entropy loss attenuation parameter according to the overall type of the first training images, the overall type of the second training images, and the number of times to obtain an updated cross-entropy loss function; and converge the predicted labels and the sample labels of the target training images using the updated cross-entropy loss function.
  • the model training device may be specifically integrated in network equipment such as a terminal or a server.
  • a model training method is provided.
  • The deep neural network model obtained by the model training method can be used for image recognition tasks such as visual representation tasks, for example, image quality evaluation and recommendation for articles, in-game object recognition, and so on.
  • This method can be executed by a network device, as shown in FIG. 1B, the specific process of the model training method can be as follows:
  • the multi-label image training set includes multiple training images, and each training image is labeled with multiple sample labels.
  • the multi-label image training set may include at least one image labeled with multiple labels (such as multiple object categories), and the image may be referred to as a multi-label image.
  • the multi-label image training set may include multiple multi-label images and cover multiple object categories.
  • The multi-label image training set can be a large-scale multi-label image training set that has been published in the industry; for example, it can be the Open Images v3 multi-label image training set (including 9M images and covering 6K object categories), or the ML-Images large-scale multi-label image training set, which covers a total of 11,166 labels and 18,019,881 images.
  • In some embodiments, a multi-label image training set may be formed by searching for open multi-label images through an image search engine or the like. As another example, a published multi-label image training set, such as ML-Images, can be directly downloaded or pulled.
  • the label content of the image may include the category to which the image belongs (may be the object category), for example, may include the category number and / or category name.
  • For example, the label may include the category number /m/056mk and the corresponding category name "metropolis".
  • one or more training images can be selected from the multi-label image training set for model training.
  • In some embodiments, multiple batches of training images can be used to train the model; that is, different batches of target training images are used to achieve batch training, and multiple target training images are selected from the multi-label image training set for each batch.
  • the step "select multiple training images from the multi-label image training set" may include: selecting the multiple training images from the multi-label image training set as the current target training image for batch training.
  • 100 training images can be selected each time as the target training image to batch train the model.
  • The number of training images selected for each batch of training can be the same, for example, 100 each time, or different, for example, 100 training images are selected the first time and 200 training images the second time.
  • In some embodiments, in order to improve the efficiency and accuracy of model training, the training images may be pre-processed before model training; for example, before using the deep neural network model to perform label prediction on the target training images, the method may also include: pre-processing the target training images.
  • FIG. 1C is a flowchart of preprocessing the target training image according to an embodiment of the present application. As shown in FIG. 1C, the preprocessing process may include the following steps:
  • An area image that occupies a predetermined proportion of the target training image and has a predetermined aspect ratio can be cropped from the target training image.
  • The predetermined proportion may be a value randomly selected from a predetermined proportion interval, and/or the predetermined aspect ratio may be a value randomly selected from a predetermined aspect-ratio interval; for example, the predetermined proportion interval may be [0.7, 1.0], and the predetermined aspect-ratio interval may be [3/4, 4/3].
  • The predetermined size can be set according to actual needs; for example, the cropped area images are uniformly scaled to a size of 224×224.
  • The random disturbance processing may include: rotating the scaled image by a random angle with a first processing probability, where the random angle is a value randomly selected from a predetermined angle interval; and/or adjusting image attributes of the scaled image with corresponding processing probabilities; and/or scaling the pixel values of the scaled image to a preset pixel-value range.
  • the image attributes include saturation, contrast, brightness and chroma.
  • the preset pixel value range can be [-1,1]. Each processing probability can be set according to actual needs.
  • the predetermined angle interval can be set according to actual needs, for example, it can be between [-45,45] degrees.
  • the image may be an RGB image.
  • For example, the preprocessing process can be as follows:
  • The image is rotated by a random angle with a first processing probability of, for example, 0.25, and the random angle is randomly determined within [-45, 45] degrees.
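  • For illustration, the following is a minimal Python sketch of the preprocessing described above (random crop by area proportion and aspect ratio, scaling to 224×224, probabilistic rotation, attribute jitter, and pixel scaling to [-1, 1]). The use of Pillow/NumPy, the helper name, and the per-attribute jitter probabilities and ranges are assumptions for the sketch, not part of the original disclosure:

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def preprocess(img: Image.Image) -> np.ndarray:
    # Random crop: area proportion in [0.7, 1.0], aspect ratio in [3/4, 4/3].
    w, h = img.size
    area = random.uniform(0.7, 1.0) * w * h
    ratio = random.uniform(3 / 4, 4 / 3)
    cw = min(int(round((area * ratio) ** 0.5)), w)
    ch = min(int(round((area / ratio) ** 0.5)), h)
    x = random.randint(0, w - cw)
    y = random.randint(0, h - ch)
    img = img.crop((x, y, x + cw, y + ch))

    # Uniformly scale the cropped region to 224x224.
    img = img.resize((224, 224), Image.BILINEAR)

    # With probability 0.25, rotate by a random angle in [-45, 45] degrees.
    if random.random() < 0.25:
        img = img.rotate(random.uniform(-45, 45))

    # Jitter image attributes (saturation, contrast, brightness); the per-attribute
    # probability and factor range are assumptions.
    for enhancer in (ImageEnhance.Color, ImageEnhance.Contrast, ImageEnhance.Brightness):
        if random.random() < 0.5:
            img = enhancer(img).enhance(random.uniform(0.8, 1.2))

    # Scale pixel values from [0, 255] to the preset range [-1, 1].
    return np.asarray(img, dtype=np.float32) / 127.5 - 1.0
```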
  • S103 Use the model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image.
  • the model may be a deep neural network model.
  • the deep neural network model may be used to classify each target training image to obtain a predicted category of each target training image, that is, a predicted label.
  • “using the model to perform label prediction on each target training image” may include: using the model to perform label prediction on each pre-processed target training image.
  • the deep neural network model may include an output layer, and the output layer may include multiple output functions, and each output function is used to output a prediction result of a corresponding label such as a category, such as a prediction label, a prediction probability corresponding to the prediction label, and so on.
  • The output layer of the deep network model may include m output functions, such as Sigmoid functions, where m is the number of labels corresponding to the multi-label image training set; for example, when the labels are categories, m is the number of categories of the multi-label image training set, and m is a positive integer.
  • The output of each output function, such as a Sigmoid function, may indicate whether a given training image belongs to a certain label such as an object category, and/or a probability value, that is, the predicted probability.
  • The deep neural network model can be a model based on a deep learning network such as a convolutional neural network; for example, it can be a residual neural network (ResNet: Residual Neural Network) model. ResNet is a neural network proposed by He Kaiming et al.; the ResNet structure can greatly accelerate the training of ultra-deep neural networks, and the accuracy of the model is also greatly improved.
  • In the original residual network, the size of the convolution kernel of the first convolution layer in the convolution branch is 1×1 with a convolution step size of 2, and the size of the convolution kernel of the second convolution layer is 3×3 with a convolution step size of 1; in this arrangement, downsampling through the stride-2 1×1 convolution skips feature points.
  • In the embodiments of the present application, the residual network contains multiple consecutive residual blocks, each residual block contains a convolution branch and a residual branch, and the convolution kernel size of the first convolution layer in the convolution branch is smaller than the convolution kernel size of the second convolution layer after the first convolution layer.
  • The convolution step size of the second convolution layer is larger than the convolution step size of the first convolution layer and smaller than the convolution kernel width of the second convolution layer. The residual branch in the residual block is directed from the input of the convolution branch to the output of the convolution branch.
  • the residual network may be a deep residual network.
  • In some embodiments, the residual network also includes an initial convolution layer before the multiple residual blocks, and the output of the initial convolution layer is the input of the first of the multiple residual blocks.
  • Since the second convolution layer in the residual block is already capable of downsampling, the pooling layer before the residual blocks in the original residual network can be removed, simplifying the structure of the residual network.
  • In some embodiments, the multiple residual blocks in the residual network constitute multiple convolution stages, and the residual branch included in the first residual block of each convolution stage contains a batch normalization (BN) layer and a target convolution layer connected after it.
  • In general, the residual branch is an identity map; but if its input and output sizes are not the same, the input needs to be mapped to the output size through a convolution operation. Usually, in the first residual block of each convolution stage, a non-identity mapping (that is, a residual branch with an added convolution layer) is needed to ensure that the input and output of the residual block are consistent.
  • The convolution layers in the residual block can be arranged so that the downsampling is carried out by the second convolution layer without skipping any feature point, which in turn ensures that no loss of the representation capability of the feature network is caused, thereby ensuring the accuracy of image feature extraction and improving the accuracy of image recognition.
  • As shown in FIG. 2, the residual block specifically includes a convolution branch 201 and a residual branch 202, where the residual branch 202 is directed from the input of the convolution branch 201 to the output of the convolution branch 201.
  • The convolution branch 201 includes a first convolution layer 2011, a second convolution layer 2012, and a third convolution layer 2013. For each of the first convolution layer 2011, the second convolution layer 2012, and the third convolution layer 2013, a BN layer is set before the convolution layer, and the output of the BN layer is processed by a ReLU (Rectified Linear Unit) activation.
  • The first convolution layer 2011 has a convolution kernel size of 1×1 and a convolution step size of 1.
  • The second convolution layer 2012 has a convolution kernel size of 3×3 and a convolution step size of 2.
  • The third convolution layer 2013 has a convolution kernel size of 1×1 and a convolution step size of 1. Since the second convolution layer 2012 can both implement downsampling and ensure that no feature point is skipped, the residual block in the embodiment of the present application ensures that no loss of the representation capability of the feature network is caused.
  • The residual branch 202 includes a convolution layer 2021 and a BN layer set before the convolution layer; after being processed by the BN layer, the input is processed by the ReLU function.
  • the outputs of the convolution branch 201 and the residual branch 202 perform an addition operation at the element level to obtain the output of each residual block.
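  • As a concrete illustration of the block in FIG. 2, below is a minimal PyTorch-style sketch (the framework choice is an assumption; the patent names none) of a pre-activation residual block in which the 3×3 middle convolution carries the stride-2 downsampling, and a BN layer precedes each convolution on both the convolution branch and the non-identity residual branch:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-activation bottleneck: BN+ReLU before each conv; the 3x3 conv downsamples."""
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        # Convolution branch: 1x1 (stride 1) -> 3x3 (stride 2) -> 1x1 (stride 1).
        self.branch = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1, stride=1, bias=False),
        )
        # Non-identity residual branch (first block of a stage): BN + 1x1 conv
        # to match the spatial size and channel count of the convolution branch.
        self.shortcut = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise addition of the two branch outputs.
        return self.branch(x) + self.shortcut(x)
```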
  • FIG. 3 is a schematic structural diagram of a residual network according to an embodiment of the present application.
  • The structure includes, sequentially connected: an initial convolution layer 301, a convolution stage 302, a convolution stage 303, a convolution stage 304, a convolution stage 305, a global average pooling layer (Global Average Pool) 306, and a fully connected layer (Fully-Connected Layer) 307.
  • The initial convolution layer 301 has a convolution kernel size of 7×7, a convolution step size of 2, and 64 channels; each of the convolution stage 302, the convolution stage 303, the convolution stage 304, and the convolution stage 305 contains multiple residual blocks.
  • the number of residual blocks included in different convolution stages may be different.
  • For example, the convolution stage 302 contains 3 residual blocks, the convolution stage 303 contains 4 residual blocks, the convolution stage 304 contains 23 residual blocks, and the convolution stage 305 contains 4 residual blocks.
  • The structure of the first residual block in each convolution stage is shown in FIG. 2; the residual branches in the other residual blocks are identity maps, while their convolution branches are the same as the convolution branch 201 shown in FIG. 2.
  • The residual network in the embodiment of the present application removes the maximum pooling layer after the initial convolution layer 301 of the original residual network, and moves the downsampling into the first convolution stage, that is, the convolution stage 302, specifically into the second convolution layer 2012 of the first residual block in the convolution stage 302.
  • Placing the downsampling in the 3×3 second convolution layer ensures that the downsampling will not skip any feature point and will not cause a loss of the representation capability of the feature network.
  • In addition, a BN layer is added not only to the convolution branch but also to the residual branch of the non-identity mapping, so that an offset term can be added before the convolution layer through the BN layer, which ensures the best processing effect.
  • That is, the BN operation is performed not only on the convolution branch of each block, but also on the residual branch of the non-identity mapping.
  • In a multi-label image training set, for a given sample label, the number of positive images tends to be much smaller than the number of negative images over the entire dataset; for some labels, the ratio of positive to negative images can even reach 1 to several thousand.
  • a positive image is an image that contains content corresponding to the sample label such as a category
  • a negative image refers to an image that does not contain content corresponding to the sample label such as a category.
  • For example, for the sample label "dog", a positive image refers to an image that includes a dog, and a negative image refers to an image that does not include a dog.
  • A positive training image is a training image, among the target training images used for model training, that contains the content corresponding to the sample label; that is, a positive training image is a training image whose labels include the sample label. A negative training image is a training image, among the target training images used for model training, that does not contain the content corresponding to the sample label; that is, a negative training image is a training image whose labels do not include the sample label.
  • For example, for the sample label "dog", a positive training image refers to a target training image that contains a dog, and a negative training image refers to a target training image that does not contain a dog.
  • In step S103, the model is used to perform label prediction on the target training images to obtain multiple predicted labels for each target training image.
  • In some embodiments, when every training image in the target training images is a negative training image for a sample label, the parameters of the output function corresponding to that sample label are updated according to a preset processing probability, for example, a processing probability of 0.1, to obtain an updated model.
  • the updated model is used to perform label prediction on each target training image to obtain multiple predicted labels for each target training image.
  • the step "use the model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image” may include:
  • the negative training images, which do not have the sample label, in the target training images are randomly down-sampled to obtain down-sampled target training images;
  • the model is used to perform label prediction on the down-sampled target training images to obtain multiple predicted labels for each target training image.
  • the positive training image is a training image with the same label (eg category) as the sample label
  • the negative training image is a training image without the same label (eg category) as the sample label.
  • the step "random downsampling of negative training images that do not have sample labels in each target training image” may include:
  • the negative training images without sample labels in each target training image are randomly down-sampled.
  • the ratio of positive and negative training images of labels can be the ratio of the number of positive training images of labels (such as categories) and negative training images. Can be set according to actual needs.
  • the negative training images of the sample labels in each target training image may be randomly down-sampled, so that the positive and negative training ratio is not less than the preset positive and negative training image ratio, for example, the preset ratio is not less than 1: 5.
  • The downsampling of negative training images ensures that all categories are trained under an approximately matched distribution of positive and negative data, which alleviates the imbalance between categories to a certain extent and improves the accuracy and visual performance of the model.
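  • A minimal sketch of this per-label negative downsampling, assuming a batch represented as a list of examples with label sets (this representation and the helper name are illustrative; the 1:5 ratio follows the text):

```python
import random
from typing import Dict, List

def downsample_negatives(batch: List[Dict], label: str,
                         max_neg_per_pos: int = 5) -> List[Dict]:
    """Randomly drop negatives for `label` so positives:negatives >= 1:max_neg_per_pos.

    `batch` items are assumed to look like {"image": ..., "labels": set_of_labels}.
    """
    positives = [ex for ex in batch if label in ex["labels"]]
    negatives = [ex for ex in batch if label not in ex["labels"]]
    # Keep at least one negative even when the batch has no positives.
    keep = max(len(positives) * max_neg_per_pos, 1)
    if len(negatives) > keep:
        negatives = random.sample(negatives, keep)
    return positives + negatives
```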
  • S104: Obtain a cross-entropy loss function corresponding to the multiple sample labels of each target training image, where the positive label loss in the cross-entropy loss function is set with a weight, and the weight is greater than 1 so that the loss of a positive label is greater than the loss of a negative label.
  • the target training image generally has multiple sample labels, and the cross-entropy loss function covers these multiple sample labels.
  • The acquisition timing of the cross-entropy loss function is not limited by the step number; its position in the time sequence of the model training process can be set according to actual needs. For example, after the training images are selected, the cross-entropy loss functions corresponding to the multiple sample labels can be obtained.
  • A positive label is a label that is the same as a sample label of the training image; for example, when the label is category j, a positive label is a label of the same category as category j of the training image. A negative label is a label that differs from the sample labels of the training image; for example, when the label is category j, a negative label is a label of a category different from category j of the training image.
  • the cross-entropy loss function may include a positive label loss and a negative label loss, and the positive label loss and the negative label loss may be obtained based on the label prediction probability of the training image and the sample label.
  • The embodiments of the present application may use the cross-entropy loss function corresponding to the sample labels to perform the convergence.
  • For example, the definition of the cross-entropy loss function covering the m sample labels can be as follows:

$$\ell(W)=-\sum_{i}\sum_{j=1}^{m}\Big[\eta\,y_{ij}\log\hat{y}_{ij}+(1-y_{ij})\log\big(1-\hat{y}_{ij}\big)\Big]$$

  • where $\hat{y}_{ij}$ is the predicted probability that the i-th training image $x_i$ carries the j-th label, that is, the output of the j-th Sigmoid output function.
  • W represents the set of trainable parameters of the model.
  • $y_i \in \{0,1\}^m$ represents the given label vector of the i-th training image $x_i$ (that is, the sample label group of the i-th training image $x_i$); if the j-th object category exists in the image, the j-th element of $y_i$ is 1, otherwise it is 0.
  • m is the number of labels, that is, the number of categories, of the multi-label image training set.
  • $\eta$ is the weight parameter of the positive label loss, and its value represents the weight of the positive label loss.
  • The role of the weight parameter $\eta > 1$ is to make the loss of a positive label greater than the loss of a negative label, that is, to set the error cost of a positive label (that is, an object category present in the image) to be greater than that of a negative label (that is, an object category that does not exist in the image).
  • The reasons for this are: a) for image annotation, the correct prediction of positive labels is valued more; b) the number of negative labels is much larger than the number of positive labels, and η > 1 can suppress this imbalance to a certain extent; c) positive labels are more reliable than negative labels, because many negative labels are actually missing positive labels. In practical applications, η is preferably set to 12, which can suppress the imbalance of positive and negative labels within a category.
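  • The weighted loss above can be sketched as follows; this is a minimal PyTorch-style illustration (the framework is an assumption), with eta = 12 following the preferred value in the text:

```python
import torch

def weighted_multilabel_ce(logits: torch.Tensor, targets: torch.Tensor,
                           eta: float = 12.0) -> torch.Tensor:
    """Cross-entropy over m Sigmoid outputs with the positive-label loss weighted by eta > 1.

    logits:  (batch, m) raw model outputs; targets: (batch, m) in {0, 1}.
    """
    probs = torch.sigmoid(logits).clamp(1e-7, 1 - 1e-7)  # avoid log(0)
    pos_loss = eta * targets * torch.log(probs)          # weighted positive-label term
    neg_loss = (1 - targets) * torch.log(1 - probs)      # negative-label term
    return -(pos_loss + neg_loss).sum(dim=1).mean()
```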
  • Besides the imbalance of positive and negative labels within a category, there is also an imbalance of positive images between labels such as categories: the difference in the number of positive images corresponding to different categories is very large.
  • For example, for some large categories, the proportion of positive images in the entire dataset may exceed 10%, while for some rare small categories, the proportion of positive images may be one thousandth. As a result, the accuracy and visual expressiveness of the trained model will be reduced.
  • To suppress this imbalance, the embodiments of the present application can adaptively attenuate the cross-entropy loss of certain labels, such as categories; for example, a cross-entropy loss attenuation parameter is added to the cross-entropy loss function to attenuate the cross-entropy loss of the corresponding category.
  • The cross-entropy loss attenuation parameter can be updated at every batch training to adaptively attenuate the cross-entropy loss of the category.
  • Specifically, the method of the embodiments of the present application may further include: obtaining the overall type of the first training images, which indicates whether one or more consecutive training images with the same label as the sample label exist in the target training images of the adjacent batch, together with the number of consecutive occurrences; and updating the cross-entropy loss attenuation parameter to obtain an updated cross-entropy loss function.
  • The adjacent batch refers to the adjacent batch training, and the current batch refers to the current batch training.
  • If one or more consecutive training images among the target training images of the adjacent batch training have the same label as the sample label, the overall type of the first training images for that sample label is positive; if none of the target training images of the adjacent batch training has the same label as the sample label, the overall type of the first training images for that sample label is negative.
  • Likewise, if one or more consecutive training images among the target training images of the current batch have the same label as the sample label, the overall type of the second training images is positive; if none of the target training images of the current batch has the same label as the sample label, the overall type of the second training images is negative.
  • For example, suppose 10,000 target training images for model training are selected from the multi-label image training set, and the sample labels of the currently learned target training image are "person" and "dog".
  • Suppose the adjacent batch has five target training images; the overall type of the first training images for the sample label "person" indicates whether each of the five target training images of the adjacent batch has the label "person". For example, when a training image has the label "person", its type identifier is 1, and when it does not, its type identifier is 0.
  • If the type identifiers of the five target training images of the adjacent batch are 01110, then for the sample label "person" the overall type of the first training images is "positive"; if none of the five target training images of the adjacent batch has the label "person", for example, the type identifiers are 00000, then for the sample label "person" the overall type of the first training images is "negative".
  • The overall type of the second training images indicates whether one or more consecutive training images with the same label as the sample label exist in the target training images of the current batch; it is determined in the same way as the overall type of the first training images, which will not be repeated here.
  • Note that the count is kept separately for each sample label; for example, when the sample labels are "person" and "dog", the number of consecutive occurrences of training images with the label "person" is counted for the sample label "person", and the number of consecutive occurrences of training images with the label "dog" is counted for the sample label "dog".
  • The overall training image type is the overall type, for a given label such as a category, of the training images in a batch, and it can be positive or negative; for example, it may be the overall training image type corresponding to the object category j in the batch of training images.
  • If one or more of the multiple training images (that is, the batch of training images) have the category j, the overall type of the training images for j is positive, that is, the sign of j is positive; if none of the training images has the category j, the overall type of the training images for the object category j is negative, that is, the sign is negative.
  • The number of consecutive occurrences of training images for a sample label is the number of times training images corresponding to the sample label (or the non-sample label) appear consecutively in the batch training images, that is, for a certain label, the number of consecutive occurrences of positive (negative) training images in the batch training images. For example, if the categories of the batch training images are j, j+1, j, j, j, j+1, j, j, j, j, j, j, then the number of consecutive occurrences of positive training images of category j in this batch is 6.
  • The adjacent batch training is the batch training adjacent to the current batch training, for example, the previous batch training of the current batch training, that is, the last batch.
  • The embodiments of the present application can obtain, for each sample label, the overall type of the training images in the current batch of training images and the overall type of the training images in the adjacent batch of training images, together with the number of consecutive occurrences of positive (negative) training images for the sample label; then, the cross-entropy loss attenuation parameter, such as $r_t^j$, is updated based on the overall image types and the number of occurrences.
  • Here $r_t^j$ is the cross-entropy loss adaptive attenuation parameter (also called the adaptive weight parameter of the cross-entropy loss), and t represents the number of consecutive occurrences of positive (negative) samples of category j up to the current batch of training images. The value of the cross-entropy loss adaptive attenuation parameter is obtained through t; that is, the adaptive attenuation parameter is updated through t.
  • t is related to the overall types of the training images of the current batch training and of the adjacent batch training. Specifically, the step "updating the cross-entropy loss attenuation parameter according to the overall type of the first training images, the overall type of the second training images, and the number of times, to obtain the updated cross-entropy loss function" may include: comparing the overall type of the first training images with the overall type of the second training images to obtain a comparison result; obtaining, according to the comparison result and the number of times, the target number of consecutive occurrences of training images with the same label as the sample label in the current batch training; and updating the cross-entropy loss attenuation parameter according to the target number to obtain the updated cross-entropy loss function.
  • The target number is the above t, which is obtained based on the comparison result of the overall image types of the current batch training and the adjacent historical batch training, and the number of consecutive occurrences of the positive (negative) training images of the sample label in the adjacent historical batch training.
  • For example, when the overall types are the same, the number of consecutive occurrences of positive (negative) training images of the sample label counted in the adjacent historical batch training is incremented by 1.
  • In this way, the cross-entropy loss is adaptively attenuated: an adaptive weight parameter is added to achieve adaptive attenuation of the cross-entropy loss, where t represents the number of consecutive occurrences of positive (negative) samples.
  • The adaptive attenuation of the cross-entropy loss of the corresponding labels, such as categories, can weaken the model's overfitting to large categories (through the updates based on positive samples) and can also weaken the model's suppression of small categories (through the updates based on negative samples), thereby suppressing the imbalance between categories and improving the accuracy and visual performance of the model.
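  • The bookkeeping for t can be sketched as follows. Since the text does not give the closed form of the attenuation parameter, the exponential decay `beta ** t` used here is a stand-in assumption, meant only to show an attenuation that grows with the number of consecutive occurrences t:

```python
from typing import Dict, List, Set

def update_attenuation(counts: Dict[str, int], prev_type: Dict[str, bool],
                       batch_labels: List[Set[str]], all_labels: List[str],
                       beta: float = 0.9) -> Dict[str, float]:
    """Update per-label consecutive-occurrence counts t and return r_t per label.

    prev_type[label]: overall type of the adjacent (previous) batch, True = positive.
    beta ** t is a hypothetical decay; the text only states r_t is updated through t.
    """
    r = {}
    for label in all_labels:
        # Overall type of the current batch: positive if any image carries the label.
        cur_type = any(label in labels for labels in batch_labels)
        if label in prev_type and prev_type[label] == cur_type:
            counts[label] = counts.get(label, 0) + 1  # same overall type: increment t
        else:
            counts[label] = 1                         # type changed: restart the run
        prev_type[label] = cur_type
        r[label] = beta ** counts[label]              # assumed attenuation, decreasing in t
    return r
```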
  • S105 Converge the predicted label and the sample label of each target training image according to the cross-entropy loss function, update the parameters of the model, and obtain the trained model.
  • For each sample label, such as a category, of the training images, the corresponding cross-entropy loss function can be obtained; then, based on the cross-entropy loss function, the predicted labels and the sample labels of the training images are converged to train the model parameters of the model and obtain the trained model.
  • the cross-entropy loss of the predicted label and the sample label of the training image is obtained according to the cross-entropy loss function; the model parameters in the deep neural network model are trained according to the cross-entropy loss.
  • In some embodiments, a back-propagation algorithm can be used together with a stochastic gradient descent algorithm with momentum to train the model. For example, the gradient of the cross-entropy loss of the predicted labels and the sample labels of the training images can be obtained according to the cross-entropy loss function (for example, by differentiating the loss function), and then the model parameters in the deep neural network model are trained based on this gradient; specifically, the model parameters are updated based on the cross-entropy loss gradient and the learning rate corresponding to the model parameters (that is, the learning rate corresponding to the layer where the model parameters are located).
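  • A minimal sketch of one such training step (PyTorch-style, as before an assumption), using the built-in `pos_weight` argument to realize the positive-label weight eta, and momentum SGD for the update:

```python
import torch
import torch.nn.functional as F

m = 11166                                # number of labels in ML-Images
model = torch.nn.Linear(2048, m)         # stand-in for the real backbone + output layer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
pos_weight = torch.full((m,), 12.0)      # eta = 12: positive-label loss weight

def train_step(features: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    logits = model(features)
    # Weighted cross-entropy: pos_weight scales the positive-label term, as eta does.
    loss = F.binary_cross_entropy_with_logits(logits, targets, pos_weight=pos_weight)
    loss.backward()    # back-propagation of the loss gradient
    optimizer.step()   # stochastic gradient descent with momentum
    return loss.item()
```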
  • In some embodiments, the trained deep neural network model with multi-label prediction or output can be changed into a classification model with single-label prediction or output through transfer learning, which can improve the generality of the model.
  • the method of the embodiment of the present application may further include:
  • changing the multiple output functions in the output layer of the trained deep neural network model into a single-label classifier to obtain a changed network model;
  • adaptively adjusting the learning rate of each layer in the changed network model according to the principle that the learning rate of a higher layer is greater than the learning rate of a lower layer, to obtain an adjusted network model;
  • training the model parameters of the adjusted network model on a single-label training image set to obtain a single-label image classification model.
  • For example, the multi-label-output ResNet-101 model trained on the multi-label image training set ML-Images can be transferred in the above manner, so that it can help other visual tasks, such as single-label image classification.
  • the model parameters include: the parameters of the single label classifier (that is, a single Softmax function), and other model parameters.
  • The layered adaptive learning-rate fine-tuning method adjusts the learning rate of each layer in the changed network model according to the principle that the learning rate of a higher layer is greater than the learning rate of a lower layer; that is, the closer a layer is to the output, the larger its learning rate.
  • Compared with a single-label classification model obtained by the traditional method, the single-label classification model obtained through the above transfer learning method alleviates the negative effects caused by the difference between multi-label data and single-label datasets, and has advantages such as superior performance, high classification accuracy, and high quality.
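  • The layered learning rates can be expressed with per-layer parameter groups. In the following PyTorch-style sketch, the stage layout and the concrete rate values are assumptions; only the principle (the closer to the output, the larger the learning rate) comes from the text:

```python
import torch
import torch.nn as nn

# Stand-in stages: lower layers first, new single-label classifier (Softmax head) last.
backbone_low = nn.Sequential(nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU())
backbone_high = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2048))
classifier = nn.Linear(2048, 1000)  # replaces the multi-label output functions

# Layer-wise learning rates: the closer to the output, the larger the rate.
optimizer = torch.optim.SGD([
    {"params": backbone_low.parameters(), "lr": 1e-4},   # bottom layers: smallest lr
    {"params": backbone_high.parameters(), "lr": 1e-3},  # upper backbone: larger lr
    {"params": classifier.parameters(), "lr": 1e-2},     # classifier: largest lr
], lr=1e-3, momentum=0.9)  # global lr is a fallback; per-group values take precedence
```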
  • the model training method provided by the embodiments of the present application may be applicable to visual-related services.
  • For example, in image quality evaluation and recommendation for articles and in in-game object recognition, the models trained by the method of the embodiments of the present application have achieved good results.
  • the model will provide an excellent initial model for other broader visual services, including image understanding and video understanding.
  • As can be seen from the above, the embodiments of the present application can obtain a multi-label image training set, where the multi-label image training set includes multiple training images and each training image is labeled with multiple sample labels; select multiple training images from the multi-label image training set as the target training images for training the current model; use the model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image; obtain the cross-entropy loss function corresponding to the multiple sample labels of each target training image, where the positive label loss in the cross-entropy loss function is set with a weight greater than 1 so that the loss of a positive label is greater than the loss of a negative label; and, according to the cross-entropy loss function, converge the predicted labels and the sample labels of the training images and update the parameters of the model to obtain the trained model.
  • This scheme trains the model parameters of the deep neural network model with a weighted cross-entropy loss function whose weight parameter is greater than 1; therefore, it can suppress the imbalance of positive and negative labels within a category and improve the accuracy and visual expressiveness of the model.
  • the scheme can also suppress the problem of category imbalance through adaptive attenuation of cross-entropy loss and negative sample downsampling, further improving the accuracy and visual performance of the model.
  • In the following description, the model training device is specifically integrated in a network device as an example.
  • The loss used in this embodiment is the weighted cross-entropy loss with adaptive attenuation:

$$\ell(W)=-\sum_{i}\sum_{j=1}^{m} r_t^j\Big[\eta\,y_{ij}\log\hat{y}_{ij}+(1-y_{ij})\log\big(1-\hat{y}_{ij}\big)\Big]$$

  • W represents the set of trainable parameters of the model; $y_i \in \{0,1\}^m$ represents the given label vector of the i-th training image $x_i$ (that is, the sample label group of the i-th training image $x_i$), where the j-th element of $y_i$ is 1 if the j-th object category exists in the image and 0 otherwise; m is the number of labels, that is, the number of categories, of the multi-label image training set; and $\hat{y}_{ij}$ is the predicted probability for the j-th label.
  • $\eta$ is the weight parameter of the positive label loss, and its value represents the weight of the positive label loss; setting η to 12 is preferred and can suppress the imbalance of positive and negative labels within a category.
  • $r_t^j$ is the cross-entropy loss attenuation parameter, also called the cross-entropy loss adaptive attenuation parameter or the adaptive weight parameter of the cross-entropy loss; t represents the number of consecutive occurrences of positive (negative) samples of category j up to the current batch of training images. The value of the adaptive attenuation parameter is obtained through t; that is, it is updated through t.
  • the network device obtains a multi-label image training set.
  • the multi-label image training set includes multiple training images, and each training image is labeled with multiple sample labels.
  • the multi-label image training set may include at least one image labeled with multiple labels (such as multiple object categories), and the image may be referred to as a multi-label image.
  • the multi-label image training set may include multiple multi-label images and cover multiple object categories. For example, it can be the ML-Images multi-label image training set.
  • the network device selects multiple training images from the multi-label image training set as the current batch training target training images.
  • the network device can use multiple batch training images to train the model, that is, multiple target training images for model training can be selected in the multi-label image training set each time.
  • The number of training images selected for each batch training can be the same, for example, 100 each time, or different, for example, 100 training images are selected the first time and 400 the second time.
  • the network device updates the cross-entropy loss attenuation parameter in the cross-entropy loss function corresponding to the multiple sample labels of each target training image.
  • Specifically, the network device obtains, for each sample label of the target training images in the adjacent batch training, the overall type of the first training images and the number of consecutive occurrences of training images with the same label as the sample label; obtains the overall type of the second training images for each sample label of the target training images in the current batch training; compares the overall type of the first training images with the overall type of the second training images to obtain a comparison result; obtains, according to the comparison result and the number of times, the target number of consecutive occurrences of training images with the same label as the sample label in the current batch training; and updates the cross-entropy loss attenuation parameter according to the target number to obtain the updated cross-entropy loss function.
  • The target number is the above t, which is obtained based on the comparison result of the overall image types of the current batch training and the adjacent historical batch training, and the number of consecutive occurrences of the positive (negative) training images of the sample label in the adjacent historical batch training.
  • the network device preprocesses each target training image in the current batch training.
  • the image preprocessing can refer to the above description, for example, the corresponding area image can be extracted from the target training image, and the area image is scaled to a predetermined size to obtain the scaled image; the scaled image is randomly disturbed, etc. .
  • the network device performs negative sample downsampling on each target training image in the current batch training.
  • As described above, the deep neural network model includes an output layer, and the output layer includes multiple output functions.
  • When, for a sample label, every training image in the target training images is a negative training image without that sample label, the parameters of the output function corresponding to the sample label are updated according to a preset processing probability.
  • In addition, the negative training images without the sample label in the target training images are randomly down-sampled.
  • the network device uses the model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image.
  • The network device converges the predicted labels and the sample labels of each target training image according to the cross-entropy loss function corresponding to the multiple sample labels of each target training image, and updates the parameters of the deep neural network model.
  • In this way, the deep neural network model after training can be obtained.
  • the cross-entropy loss function can refer to the above introduction.
  • The deep neural network model may be a model based on a deep learning network such as a convolutional neural network; for example, it may be a ResNet (Residual Neural Network) model.
  • The structure of the residual network can refer to the above introduction.
  • In some embodiments, a back-propagation algorithm can be used together with a stochastic gradient descent algorithm with momentum to train the model. For example, the gradient of the cross-entropy loss of the predicted labels and the sample labels of the training images can be obtained according to the cross-entropy loss function (for example, by differentiating the loss function), and then the model parameters in the deep neural network model are trained based on this gradient; specifically, the model parameters can be updated based on the cross-entropy loss gradient and the learning rate corresponding to the model parameters (that is, the learning rate corresponding to the layer where the parameters are located).
  • for the training algorithm and hyperparameters, the commonly used back-propagation algorithm, combined with stochastic gradient descent with momentum, is used to train the ResNet-101 model.
  • the training hyperparameters are as follows.
  • the batch size is 4096.
  • the learning rate adopts a warm-up strategy.
  • the initial learning rate is 0.01 and is multiplied by 1.297 each epoch until it reaches 0.08 at the 9th epoch; thereafter the learning rate is decayed by a factor of 0.1 every 25 epochs until the 60th epoch.
  • Momentum is 0.9.
  • when updating the batch-normalization parameters, the decay factor of the moving average is 0.9, and 1e-5 is added to the variance in the denominator to avoid zero variance.
  • an L2 regularization term can be added to all trainable parameters, with weight parameter 0.0001.
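As a concrete illustration of the schedule above, a minimal sketch (assuming 1-based epoch indices; note that the stated multiplier 1.297 is approximately (0.08/0.01)^(1/8), which reproduces the warm-up exactly):

```python
def learning_rate(epoch: int) -> float:
    """Warm-up then step decay, per the hyperparameters above (epoch is 1-based)."""
    if epoch <= 9:
        # 0.01 * 1.297**8 ~= 0.08 at the 9th epoch
        return 0.01 * (1.297 ** (epoch - 1))
    # after warm-up: decay by a factor of 0.1 every 25 epochs
    return 0.08 * (0.1 ** ((epoch - 9) // 25))
```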
  • Measurement criteria: to verify the performance of the ResNet-101 model trained on the multi-label data ML-Images, it can be tested on the ML-Images validation set with three commonly used multi-label measurement criteria: precision, recall, and the F1 score. Since the output of each Sigmoid function is a continuous value between 0 and 1, i.e., the posterior probability for each category, the posterior probability vector must first be converted into a binary vector before measurement. Given a continuous-valued posterior probability vector, the elements corresponding to the top k maximum values can be set to 1, indicating positive-label predictions, and the other elements set to 0, indicating negative-label predictions. For the i-th test image, a binary prediction vector is thus obtained, and the three example-based measurement criteria are defined on it (see the sketch below).
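The defining formulas survive in the source only as images; the following is a minimal sketch of the standard top-k binarization and example-based precision/recall/F1 computation the text describes (the exact formulas are assumed, not reproduced from the source):

```python
import numpy as np

def topk_multilabel_metrics(probs, labels, k=5):
    """probs: (N, m) sigmoid posteriors; labels: (N, m) binary ground truth."""
    pred = np.zeros_like(labels)
    topk = np.argsort(-probs, axis=1)[:, :k]      # indices of the k largest posteriors
    np.put_along_axis(pred, topk, 1, axis=1)      # binarize: top-k become positive labels
    tp = (pred * labels).sum(axis=1)              # true positives per image
    precision = float((tp / k).mean())            # exactly k predicted positives per image
    recall = float((tp / np.maximum(labels.sum(axis=1), 1)).mean())
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1
```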
  • the specific experimental results are shown in Table 1 in the description below. Notably, none of the figures is very high, mainly because: 1) the annotations in ML-Images themselves contain noise; 2) for many categories the training samples are insufficient (about 5,000 classes have no more than 1,000 training images each).
  • through transfer learning, the multi-label prediction (output) deep neural network model can be changed into a single-label prediction (output) classification model, which improves the versatility of the model.
  • the network device can change the multiple output functions in the output layer of the trained deep neural network model into a single-label classifier to obtain the changed network model; adaptively adjust the learning rate of each layer of the changed network model according to the principle that higher layers have larger learning rates than lower layers, to obtain the adjusted network model; and train the model parameters of the adjusted network model on a single-label training image set to obtain the single-label image classification model.
  • the model parameters include: the parameters of the single-label classifier (that is, a single Softmax function), and the other model parameters.
  • the layered adaptive learning-rate fine-tuning method adaptively adjusts the learning rate of each layer in the changed network model according to the principle that the learning rate of an upper layer is greater than that of a lower layer; that is, the closer a layer is to the output, the larger its learning rate.
  • the multi-label-output ResNet-101 model trained on the multi-label image training set ML-Images can be transferred in the above manner, so that it can help other visual tasks, such as single-label image classification.
  • Model 1: train the ResNet-101 model with single-label output directly on the ImageNet training dataset and test it on the ImageNet validation set.
  • Model 2: replace the output layer of the multi-label-output ResNet-101 model trained on ML-Images (that is, multiple independent Sigmoid functions) with a single-label classifier (that is, a single Softmax function), train the parameters of the Softmax function on the ImageNet dataset, and fine-tune the other layer parameters with a uniform learning rate (see below).
  • Model 3: replace the output layer of the multi-label-output ResNet-101 model trained on ML-Images (that is, multiple independent Sigmoid functions) with a single-label classifier (that is, a single Softmax function), train the parameters of the Softmax function on the ImageNet dataset, and fine-tune the other layer parameters with hierarchical adaptive learning rates (see below).
  • Fine-tuning the learning rate: in transfer learning with deep neural networks, fine-tuning the model parameters is a very important and critical step; it both preserves the visual representation ability of the initial parameters and adjusts them according to the differences between the original dataset and the target dataset.
  • the hyperparameter setting of the commonly used fine-tuning algorithm is: set a larger initial learning rate for the output-layer parameters, and set the learning rate of all other layer parameters to a smaller value. Because the learning rates outside the output layer are uniform, this standard algorithm is called the uniform-learning-rate fine-tuning algorithm.
  • the embodiment of the present application proposes a hierarchical adaptive learning-rate fine-tuning algorithm.
  • high-layer parameters are more related to the training dataset, so a larger learning rate is set for them; bottom-layer parameters represent low-level visual information and are less tied to the training dataset, so a smaller learning rate is set.
  • the hyperparameter settings of the above three models are listed in Table 2 in the description below.
  • Google's Model 2 is pre-trained on the JFT-300M dataset containing 300 million images, whereas ML-Images contains only 18 million images. This application surpasses Google's performance while using only about 1/17 of the data volume, fully demonstrating the effectiveness of the model implementation and training algorithm of this application.
  • the model training method provided by the embodiments of the present application may be applicable to visual-related services.
  • for example, in image quality assessment and recommendation for articles and in in-game object recognition, models trained by the method of the embodiments of the present application have all achieved good results.
  • the model will provide an excellent initial model for other broader visual services, including image understanding and video understanding.
  • the embodiment of the present application can add a weight greater than 1 to the positive-label loss in the cross-entropy loss function, and converge the predicted labels and sample labels of the target training images according to the cross-entropy loss function to obtain the trained deep neural network model.
  • this scheme trains the model parameters of the deep neural network model with a weighted cross-entropy loss function whose weight parameter exceeds 1; it can therefore suppress the imbalance of positive and negative labels within a category and improve the model's accuracy and visual expressiveness.
  • the scheme can also suppress the problem of category imbalance through adaptive attenuation of the cross-entropy loss and negative-sample downsampling, further improving the accuracy and visual expressiveness of the model.
  • the embodiments of the present application also provide a model training device.
  • the model training device may be specifically integrated in a network device such as a terminal or a server.
  • the terminal may include a mobile phone, tablet computer, laptop computer, PC, or similar device.
  • the model training device may include an image acquisition unit 501, a selection unit 502, a prediction unit 503, a function acquisition unit 504, and a training unit 505, as follows:
  • the image acquisition unit 501 is used to acquire a multi-label image training set, the multi-label image training set includes multiple training images, and each training image is labeled with multiple sample labels;
  • the selection unit 502 is used to select multiple training images from the multi-label image training set as target training images for training the current model;
  • the prediction unit 503 is configured to use the model to perform label prediction on each of the target training images to obtain multiple predicted labels for each target training image;
  • the function acquisition unit 504 is configured to obtain a cross-entropy loss function corresponding to the multiple sample labels of each of the target training images, the positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label;
  • the training unit 505 is configured to converge the predicted label and the sample label of each target training image according to the cross-entropy loss function, update the parameters of the model, and obtain the trained model.
  • the cross-entropy loss function further includes: cross-entropy loss attenuation parameters;
  • the model training apparatus may further include: a first-type acquisition unit 506, a second-type acquisition unit 507, and a parameter updating unit 508;
  • the selection unit 502 may be specifically used to: use the selected training images as target training images of the current batch;
  • the first-type acquisition unit 506 is configured to, before the training unit 505 converges the predicted labels and sample labels of the target training images according to the cross-entropy loss function, obtain for each sample label of each target training image of the adjacent batch the overall type of the first training images and the number of consecutive occurrences of training images having the same label as the sample label; the overall type of the first training images corresponding to each sample label is used to indicate whether one or more consecutive training images having the same label as the sample label exist among the target training images of the adjacent batch;
  • the second-type acquisition unit 507 is configured to obtain the overall type of the second training images corresponding to each sample label of each target training image of the current batch; the overall type of the second training images corresponding to each sample label is used to indicate whether one or more consecutive training images having the same label as the sample label exist among the target training images of the current batch;
  • the parameter updating unit 508 is configured to update the cross-entropy loss attenuation parameter according to the overall type of the first training image, the overall type of the second training image, and the number of times.
  • the parameter updating unit 508 may be specifically configured to: compare the overall type of the first training images with the overall type of the second training images to obtain a comparison result; obtain, according to the comparison result and the count, the target number of consecutive occurrences of the sample label's training images in the current batch; and update the cross-entropy loss attenuation parameter according to the target number to obtain the updated cross-entropy loss function.
  • the deep neural network model includes an output layer, and the output layer includes multiple output functions; the prediction unit 503 may be specifically configured to, for each sample label of each target training image: when all target training images are negative training images lacking the sample label, update the parameters of the output function corresponding to the sample label according to a preset processing probability to obtain an updated model, and use the updated model for label prediction; when positive training images carrying the sample label exist, randomly downsample the negative training images and use the model to perform label prediction on the downsampled target training images, obtaining multiple predicted labels for each target training image.
  • the prediction unit 503 may be specifically configured to: randomly downsample the negative training images that do not have the sample label among the target training images, according to the preset ratio of positive to negative training images corresponding to the sample label.
  • the model training apparatus may further include: a preprocessing unit 509;
  • the preprocessing unit 509 may be specifically configured to: extract the corresponding region image from the target training image; scale the region image to a predetermined size to obtain a scaled image; and perform random perturbation on the scaled image to obtain the preprocessed training image.
  • the prediction unit 503 may be specifically used to: use a model to perform label prediction on each pre-processed training image.
  • the preprocessing unit 509 performing random perturbation on the scaled image may include: horizontally flipping the scaled image according to a first processing probability; rotating the flipped image by a random angle according to a second processing probability, the angle being randomly selected from a predetermined angle interval; perturbing the attributes of the rotated image according to a third processing probability; and scaling the pixel values of the processed image into a preset pixel-value range.
  • the deep neural network model includes a deep residual network model; the deep residual network model includes multiple sequentially connected residual blocks, each residual block containing a convolution branch and a residual branch; the kernel size of the first convolution layer in the convolution branch is smaller than that of the second convolution layer following the first convolution layer, and the convolution stride of the second convolution layer is greater than that of the first convolution layer and smaller than the kernel width of the second convolution layer.
  • the model training apparatus may further include: a transfer learning unit 510;
  • the transfer learning unit 510 may be specifically configured to: change the multiple output functions in the output layer of the trained model into a single-label classifier to obtain the changed network model; adaptively adjust the learning rate of each layer of the changed network model according to the principle that the learning rate of an upper layer is greater than that of a lower layer, obtaining the adjusted network model; and train the model parameters of the adjusted network model on a single-label training image set to obtain the single-label image classification model.
  • the training unit 505 obtains, according to the cross-entropy loss function, the cross-entropy loss descent gradient between the predicted labels and sample labels of each target training image; the model parameters in the model are trained based on the cross-entropy loss descent gradient and updated to obtain the trained model.
  • the above units can be implemented as independent entities, or combined arbitrarily and implemented as one or several entities.
  • for the specific implementation of the above units, refer to the foregoing method embodiments, which are not repeated here.
  • the model training apparatus of this embodiment obtains the multi-label image training set through the image acquisition unit 501; the selection unit 502 selects multiple training images from the multi-label image training set as target training images for training the current model; the prediction unit 503 uses the model to perform label prediction on each of the target training images, obtaining multiple predicted labels for each target training image; the function acquisition unit 504 obtains the cross-entropy loss function corresponding to the multiple sample labels of each target training image, the positive-label loss in the cross-entropy loss function carrying a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label; the training unit 505 converges the predicted labels and sample labels of each target training image according to the cross-entropy loss function and updates the parameters of the model to obtain the trained model.
  • this scheme trains the model parameters of the image-recognition model with a weighted cross-entropy loss function whose weight parameter exceeds 1; it can therefore suppress the imbalance of positive and negative labels within a category and improve the model's accuracy and visual expressiveness.
  • An embodiment of the present application further provides a network device, and the network device may be a device such as a server or a terminal.
  • FIG. 6 shows a schematic structural diagram of the network device involved in an embodiment of the present application. Specifically:
  • the network device may include a processor 601 with one or more processing cores, a memory 602 with one or more computer-readable storage media, a power supply 603, an input unit 604, and other components.
  • the processor 601 is the control center of the network device; it connects the various parts of the entire network device through various interfaces and lines, and performs the various functions of the network device and processes data by running or executing the software programs and/or modules stored in the memory 602 and calling the data stored in the memory 602, thereby monitoring the network device as a whole.
  • the processor 601 may include one or more processing cores; the processor 601 may integrate an application processor, which mainly handles the operating system, user interface, and application programs, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 601.
  • the memory 602 may be used to store software programs and modules.
  • the processor 601 runs the software programs and modules stored in the memory 602 to execute various functional applications and data processing.
  • the memory 602 may mainly include a program storage area, which may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and a data storage area, which may store data created by the use of the network device, and so on.
  • the memory 602 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the memory 602 may also include a memory controller to provide the processor 601 access to the memory 602.
  • the network device also includes a power supply 603 that supplies power to various components.
  • the power supply 603 can be logically connected to the processor 601 through a power management system, so as to realize functions such as charging, discharging, and power consumption management through the power management system.
  • the power supply 603 may also include any component such as one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
  • the network device may further include an input unit 604, which may be used to receive input numeric or character information, and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
  • the network device may further include a display unit and the like, which will not be repeated here.
  • the processor 601 in the network device loads the executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and runs the application programs stored in the memory 602 to implement various functions, as follows:
  • obtain a multi-label image training set, the multi-label image training set including multiple training images, each annotated with multiple sample labels; select multiple training images from the multi-label image training set as target training images for training the current model; use the model to perform label prediction on each of the target training images, obtaining multiple predicted labels for each target training image; obtain the cross-entropy loss function corresponding to the multiple sample labels of each target training image, the positive-label loss in the cross-entropy loss function carrying a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label; and converge the predicted labels and sample labels of each target training image according to the cross-entropy loss function, updating the parameters of the model to obtain the trained model.
  • the network device of this embodiment can obtain a multi-label image training set; select multiple training images from it as target training images for training the current model; use the model to perform label prediction on each target training image, obtaining multiple predicted labels for each; obtain the cross-entropy loss function corresponding to the multiple sample labels of each target training image, with the positive-label loss weighted by a factor greater than 1 so that it exceeds the negative-label loss; and converge the predicted labels and sample labels of each target training image according to the cross-entropy loss function, updating the parameters of the model to obtain the trained model.
  • this scheme trains model parameters for image recognition with a weighted cross-entropy loss function whose weight parameter exceeds 1; it can therefore suppress the imbalance of positive and negative labels within a category and improve the model's accuracy and visual expressiveness.
  • an embodiment of the present application provides a storage medium in which multiple instructions are stored, and the instruction can be loaded by a processor to perform steps in any of the model training methods provided in the embodiments of the present application.
  • the instruction can perform the following steps:
  • obtain a multi-label image training set, the multi-label image training set including multiple training images, each annotated with multiple sample labels; select multiple training images from the multi-label image training set as target training images for training the current model; use the model to perform label prediction on each of the target training images, obtaining multiple predicted labels for each target training image; obtain the cross-entropy loss function corresponding to the multiple sample labels of each target training image, the positive-label loss in the cross-entropy loss function carrying a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label; and converge the predicted labels and sample labels of each target training image according to the cross-entropy loss function, updating the parameters of the model to obtain the trained model.
  • the storage medium may include: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk, etc.

Abstract

A training method, apparatus, and storage medium for a model used in image recognition. The method includes: obtaining a multi-label image training set; selecting multiple training images from the multi-label image training set as target training images for training the current model; using the current model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image; obtaining the cross-entropy loss function corresponding to the multiple sample labels of each target training image, the positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label; and converging the predicted labels and sample labels of each target training image according to the cross-entropy loss function, updating the parameters of the model to obtain the trained model.

Description

Training method, apparatus, network device, and storage medium for a model used in image recognition
This application claims priority to Chinese Patent Application No. 201811180282.2, entitled "Model training method, apparatus, and storage medium", filed with the China Patent Office on October 10, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of communication technologies, and in particular to a training method, apparatus, network device, and storage medium for a model used in image recognition.
Background
With the development of deep learning models and training methods, great progress has been made in the field of computer vision, and the research focus has gradually shifted from low-level image processing and image recognition toward higher-level visual understanding. Complex visual tasks require deep neural network models with better potential for visual representation.
Deep neural network models trained on large-scale multi-label image training sets have better visual representation capability, and the quality of the large-scale multi-label image dataset determines the visual expressiveness and accuracy of the deep neural network model. The published large-scale multi-label image dataset ML-Images contains 11,166 labels and 18,019,881 training images; the industry generally trains deep neural network models on this dataset.
Summary
An embodiment of this application provides a training method for a model used in image recognition, performed by a network device, including:
obtaining a multi-label image training set, the multi-label image training set including multiple training images, each training image annotated with multiple sample labels;
selecting multiple training images from the multi-label image training set as target training images for training the current model;
using the model to perform label prediction on each of the target training images to obtain multiple predicted labels for each target training image;
obtaining a cross-entropy loss function corresponding to the multiple sample labels of each of the target training images, the positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label;
converging the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function, and updating the parameters of the model to obtain the trained model.
An embodiment of this application further provides a model training apparatus for image recognition, including:
an image acquisition unit, configured to obtain a multi-label image training set, the multi-label image training set including multiple training images, each training image annotated with multiple sample labels;
a selection unit, configured to select multiple training images from the multi-label image training set as target training images for training the current model;
a prediction unit, configured to use the model to perform label prediction on each of the target training images to obtain multiple predicted labels for each target training image;
a function acquisition unit, configured to obtain a cross-entropy loss function corresponding to the multiple sample labels of each of the target training images, the positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label;
a training unit, configured to converge the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function, and to update the parameters of the model to obtain the trained model.
An embodiment of this application provides a network device, including a processor and a memory connected to the processor; the memory stores machine-readable instructions that can be executed by the processor to perform the method described in the embodiments of this application.
In addition, an embodiment of this application further provides a storage medium storing multiple instructions suitable for loading by a processor to perform the method described in the embodiments of this application.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of this application, and those skilled in the art may derive other drawings from them without creative effort.
FIG. 1A is a schematic scenario diagram of the model training method provided by an embodiment of this application;
FIG. 1B is a schematic flowchart of the model training method provided by an embodiment of this application;
FIG. 1C is a flowchart of preprocessing a target training image provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of the first residual block in each convolution stage of the residual network provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of the residual network provided by an embodiment of this application;
FIG. 4 is another schematic flowchart of the model training method provided by an embodiment of this application;
FIG. 5A is a schematic structural diagram of the model training apparatus provided by an embodiment of this application;
FIG. 5B is another schematic structural diagram of the model training apparatus provided by an embodiment of this application;
FIG. 5C is another schematic structural diagram of the model training apparatus provided by an embodiment of this application;
FIG. 5D is another schematic structural diagram of the model training apparatus provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of the network device provided by an embodiment of this application.
Description of Embodiments
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
Deep learning is one of the technologies and research fields of machine learning; it realizes artificial intelligence (AI) in computer systems by building artificial neural networks with hierarchical structures.
Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, AI is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, endowing machines with the capabilities of perception, reasoning, and decision-making.
AI technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Owing to the successful application of deep learning (DL) in the vision field, researchers have also introduced it into image description, training deep learning neural network models on large-scale multi-label image datasets so that they can complete image-recognition-related tasks.
Current large-scale multi-label image datasets suffer from severe category imbalance, for example imbalance between positive and negative labels within a category: for a given training image in the dataset, the number of its negative labels (i.e., categories that do not exist in the image) is generally far greater than the number of its positive labels (categories that do exist in the image). Because of this category imbalance, deep network models trained on large-scale multi-label image datasets have low accuracy and weak visual expressiveness.
Embodiments of this application provide a training method, apparatus, and storage medium for a model used in image recognition.
The model training apparatus may be integrated in a network device such as a terminal or a server. For example, referring to FIG. 1A, the network device may obtain a multi-label image training set, for instance by searching for multi-label images through an image search engine (the multi-label image training set includes images annotated with multiple labels); then select multiple training images from the multi-label image training set as target training images for training the current model; train the current model on each target training image to obtain multiple predicted labels for each target training image; obtain the cross-entropy loss function corresponding to the multiple sample labels of each target training image, in which the positive-label loss carries a weight greater than 1 so that the loss weight of the positive label is greater than that of the negative label; converge the predicted labels and sample labels of each target training image according to the cross-entropy loss function, and update the parameters of the model to obtain the trained model. The current model may be a deep neural network model.
In some embodiments, although multiple training images are selected from the multi-label image training set as target training images for training the current model, during training the model learns the training images one by one. The above sample labels may therefore be the labels of the target training image the model is currently learning; for example, if the current target training image carries the labels "person" and "dog", the sample labels are "person" and "dog".
In some embodiments, the above cross-entropy loss function is a way of measuring the difference between the predicted values and the actual values of an artificial neural network (ANN). Cross entropy characterizes the distance between two probability distributions: the smaller the cross-entropy value, the closer the two distributions. The distance between the predicted and actual values can therefore be judged from the cross-entropy value.
In some embodiments, the positive-label loss in the cross-entropy loss function is assigned a weight greater than 1 so that the loss of the positive label is greater than that of the negative label; that is, the misclassification cost of a positive label is set higher than that of a negative label. A positive label is a label identical to one of the training image's sample labels, e.g., an object category present in the image; a negative label is a label different from the sample labels, e.g., an object category absent from the image.
In the embodiments of this application, the model may be trained in batches, i.e., multiple training images are used in each round of training. The network device may select target training images for the current batch from the multi-label image training set and train the deep neural network model on them.
In addition, to further suppress the imbalance of positive and negative training images across categories, the network device may adaptively attenuate the cross-entropy loss based on the consecutive occurrence of same-label images across batch rounds. Specifically, the cross-entropy loss function further includes a cross-entropy loss attenuation parameter; the network device may obtain the overall type of the first training images for the sample label in the adjacent batch, together with the number of consecutive occurrences of the sample label's training images; obtain the overall type of the second training images for the sample label in the current batch; update the attenuation parameter according to the first overall type, the second overall type, and the count, obtaining the updated cross-entropy loss function; and converge the predicted labels and sample labels of the target training images according to the updated cross-entropy loss function.
In the embodiments of this application, the description is given from the perspective of the model training apparatus, which may be integrated in a network device such as a terminal or a server.
In one embodiment, a model training method is provided. The deep neural network model obtained by this method can be used for image-recognition-related tasks such as visual representation, for example image quality assessment and recommendation for articles, or in-game object recognition. The method may be performed by a network device; as shown in FIG. 1B, the specific flow of the model training method may be as follows:
S101. Obtain a multi-label image training set, the multi-label image training set including multiple training images, each training image annotated with multiple sample labels.
The multi-label image training set may include at least one image annotated with multiple labels (e.g., multiple object categories), which may be called a multi-label image. The multi-label image training set may include many multi-label images covering many object categories.
In practice, a published large-scale multi-label image training set may be used, such as the Open Images v3 multi-label image training set (9M images covering 6K object categories) or the ML-Images large-scale multi-label image training set (11,166 labels and 18,019,881 images in total).
In the embodiments of this application, the multi-label image training set may be obtained in various ways: for example, by searching public multi-label images through an image search engine, or by directly downloading or pulling a published multi-label image training set such as ML-Images.
In the embodiments of this application, the label content of an image may include the category the image belongs to (which may be an object category), for example a category number and/or a category name; for instance, a label may include the category number /m/056mk and the corresponding category name "metropolis".
S102. Select multiple training images from the multi-label image training set as target training images for training the current model.
For example, one or more training images may be selected from the multi-label image training set for model training.
In one embodiment, the model may be trained with multiple batches of training images, i.e., batch training with different batches of target training images: in each round, multiple target training images are selected from the multi-label image training set for model training. For example, the step "selecting multiple training images from the multi-label image training set" may include: selecting the multiple training images from the multi-label image training set as the target training images of the current batch.
For example, 100 training images may be selected each time as target training images for batch training of the model.
In practice, the number of training images selected for each batch may be the same, e.g., 100 per batch, or different, e.g., 100 training images for the first batch and 200 for the second.
In one embodiment, to improve the efficiency and accuracy of model training, the training images may be preprocessed before training; for example, before using the deep neural network model to perform label prediction on the target training images, the method may further include: preprocessing the target training images.
FIG. 1C is a flowchart of preprocessing a target training image according to an embodiment of this application. As shown in FIG. 1C, the preprocessing may include the following steps:
S111. Extract a corresponding region image from the target training image.
Specifically, a region image occupying a predetermined proportion of the target training image and having a predetermined aspect ratio may be cropped from the target training image.
In one embodiment of this application, the predetermined proportion may be a value randomly selected from a predetermined proportion interval, and/or the predetermined aspect ratio may be a value randomly selected from a predetermined aspect-ratio interval; for example, the proportion interval may be [0.7, 1.0] and the aspect-ratio interval may be [3/4, 4/3].
S112. Scale the region image to a predetermined size to obtain a scaled image.
The predetermined size may be set according to actual needs; for example, the extracted region images may be uniformly scaled to 224*224.
S113. Perform random perturbation on the scaled image to obtain the preprocessed training image.
The random perturbation may include:
horizontally flipping the scaled image with a first processing probability; and/or
rotating the scaled image by a random angle with a second processing probability, the random angle being a value randomly selected from a predetermined angle interval; and/or
perturbing the attributes of the scaled image with a third processing probability; and/or
scaling the pixel values of the scaled image into a preset pixel-value range.
The attributes of the image include saturation, contrast, brightness, hue, and the like. The preset pixel-value range may be [-1, 1]. Each processing probability may be set according to actual needs, as may the angle interval, e.g., [-45, 45] degrees.
For example, for a target training image, which may be an RGB image, the preprocessing may proceed as follows (a code sketch follows this list):
① Randomly crop from the image a region whose area is any proportion in [0.7, 1.0] of the total area and whose aspect ratio is any value in [3/4, 4/3].
② Resize the cropped image to 224*224.
③ Horizontally flip the image with the first processing probability, e.g., 0.5.
④ Rotate the image by a random angle with the second processing probability, e.g., 0.25, the random angle being decided randomly within [-45, 45] degrees.
⑤ Perturb the saturation, contrast, brightness, and hue of the image with the third processing probability, e.g., 0.5.
⑥ Scale the pixel values of the image into [-1, 1]; for example, if the image has been binarized, the pixel values can be scaled from [0, 1] to [-1, 1].
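As an illustrative, non-limiting sketch of steps ① to ⑥ using torchvision (the ColorJitter magnitudes are assumptions; torchvision applies ColorJitter unconditionally, so it is wrapped in RandomApply to obtain the stated probability):

```python
from torchvision import transforms

preprocess = transforms.Compose([
    # 1-2: random crop covering [0.7, 1.0] of the area, aspect ratio [3/4, 4/3], resized to 224x224
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0), ratio=(3 / 4, 4 / 3)),
    # 3: horizontal flip with probability 0.5
    transforms.RandomHorizontalFlip(p=0.5),
    # 4: rotation by a random angle in [-45, 45] degrees, applied with probability 0.25
    transforms.RandomApply([transforms.RandomRotation(45)], p=0.25),
    # 5: perturb saturation, contrast, brightness, hue with probability 0.5
    transforms.RandomApply(
        [transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05)], p=0.5),
    # 6: convert to a tensor in [0, 1], then scale pixel values to [-1, 1]
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```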
S103. Use the model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image.
For example, the model may be a deep neural network model. When the labels include image categories, the deep neural network model may classify each target training image to obtain its predicted categories, i.e., its predicted labels.
In one embodiment, after the target training images are preprocessed, "using the model to perform label prediction on each target training image" may include: using the model to perform label prediction on each preprocessed target training image.
The deep neural network model may include an output layer, and the output layer may include multiple output functions, each output function outputting the prediction result for a corresponding label such as a category, e.g., the predicted label and the predicted probability corresponding to it.
For example, the output layer of the deep network model may include m output functions such as Sigmoid functions, where m is the number of labels in the multi-label image training set (e.g., when the labels are categories, m is the number of categories), m being a positive integer. The output of each Sigmoid function may include whether the given training image belongs to a certain label such as an object category, and/or a probability value, i.e., the predicted probability.
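As an illustrative, non-limiting sketch of such an output layer (the feature dimension and class count are placeholders):

```python
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """m independent Sigmoid outputs on top of a shared feature vector."""
    def __init__(self, feat_dim: int, m: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, m)  # one logit per label

    def forward(self, feats):
        # per-label posterior probabilities, each in (0, 1)
        return torch.sigmoid(self.fc(feats))
```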
The deep neural network model may be based on a deep learning network such as a convolutional neural network; for example, it may be a ResNet (Residual Neural Network) model. ResNet is the neural network proposed by Kaiming He et al.; its structure can greatly accelerate the training of very deep neural networks and substantially improve model accuracy.
In one embodiment, in the original residual-network structure, the first convolution layer in the convolution branch has a 1×1 kernel with convolution stride 2, and the second convolution layer has a 3×3 kernel with stride 1; when the first convolution layer performs its convolution, one feature point is skipped between two successive convolution positions, causing a loss in the feature network. The residual network can therefore be structurally improved as follows:
The residual network contains multiple sequentially connected residual blocks, each containing a convolution branch and a residual branch. The kernel size of the first convolution layer in the convolution branch is smaller than that of the second convolution layer following it; the convolution stride of the second convolution layer is greater than that of the first convolution layer and smaller than the kernel width of the second convolution layer. The residual branch in a residual block points from the input of the convolution branch to its output.
In one embodiment, the residual network may be a deep residual network that also contains an initial convolution layer located before the residual blocks, whose output serves as the input of the first residual block. In this embodiment, because the second convolution layer in the residual block can already perform downsampling, the pooling layer preceding the residual blocks in the original residual network can be removed, simplifying the structure of the residual network.
In one embodiment of this application, the residual blocks form multiple convolution stages; the residual branch of the first residual block in each stage contains a batch-normalization layer and a target convolution layer connected in sequence.
In this embodiment, for a residual block whose input and output have the same dimensions (size, channels, etc.), the residual branch is an identity mapping; if the dimensions differ, a convolution operation is needed to map the input to the output dimensions. Usually, the first residual block in each convolution stage needs a non-identity residual branch (i.e., an added convolution layer) to keep the block's input and output consistent. Moreover, because the convolution operation of the convolution layer has no bias term, a BN (Batch Normalization) layer can be added before the convolution layer to supply a bias term, ensuring optimal processing.
With this residual-network structure, the convolution layers in the residual block can perform downsampling through the second convolution layer without skipping any feature point, avoiding any loss of the feature network's representational capability, ensuring accurate image feature extraction and improving image-recognition accuracy.
Based on the residual-network structure introduced above, in a specific embodiment of this application, FIG. 2 shows the structure of the first residual block in each convolution stage of the residual network, including a convolution branch 201 and a residual branch 202, where the residual branch 202 points from the input of the convolution branch 201 to its output.
The convolution branch 201 includes a first convolution layer 2011, a second convolution layer 2012, and a third convolution layer 2013. A BN layer is placed before each of these convolution layers, and after the BN processing a ReLU (Rectified Linear Unit) is applied. The first convolution layer 2011 has a 1×1 kernel with stride 1; the second convolution layer 2012 has a 3×3 kernel with stride 2; the third convolution layer 2013 has a 1×1 kernel with stride 1. Because the second convolution layer 2012 performs downsampling without skipping any feature point, the residual block of this embodiment guarantees no loss of the feature network's representational capability.
The residual branch 202 includes a convolution layer 2021 and a BN layer placed before it; after the BN processing, a ReLU function is applied.
The outputs of the convolution branch 201 and the residual branch 202 undergo element-wise addition to give the output of each residual block.
In one embodiment of this application, FIG. 3 shows the structure of the residual network: a sequentially connected initial convolution layer 301, convolution stage 302, convolution stage 303, convolution stage 304, convolution stage 305, a Global Average Pool Layer 306, and a Fully-Connected Layer 307. The initial convolution layer 301 has a 7×7 kernel, stride 2, and 64 channels. Each of the convolution stages 302 to 305 contains multiple residual blocks, and different stages may contain different numbers of blocks; for example, in ResNet-101, stage 302 contains 3 residual blocks, stage 303 contains 4, stage 304 contains 23, and stage 305 contains 3. It should be noted that the structure of the first residual block in each convolution stage is as shown in FIG. 2; in the other residual blocks, the residual branch is an identity mapping and the convolution branch is the same as the branch 201 in FIG. 2.
From the structures shown in FIG. 2 and FIG. 3, it can be seen that, compared with the original residual network, the residual network of this embodiment removes the max-pooling layer after the initial convolution layer 301 and moves the downsampling into the first convolution stage 302, specifically into the second convolution layer 2012 of the first residual block of stage 302. Meanwhile, in every residual block, the downsampling is moved into the 3×3 second convolution layer, guaranteeing that the downsampling never skips a feature point and causes no loss of representational capability. In addition, BN layers are added not only in the convolution branch but also in the non-identity residual branch, so that a bias term is supplied before the convolution layer through BN, ensuring optimal processing.
According to the above description, in practice the residual-network structure is mainly improved as follows (a code sketch follows this list):
Remove the initial Max Pooling operation and move the downsampling into the first stage.
Replace the blocks with convolution stride 2, moving the downsampling into the 3*3 convolution operation.
Apply the BN operation not only to the convolution branch of each block but also to the non-identity residual branch.
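As an illustrative, non-limiting sketch of the first (downsampling) block of a stage with these three changes, i.e., BN (followed by ReLU) before every convolution, stride 2 moved into the 3×3 convolution, and a BN'd non-identity shortcut (channel sizes are placeholders):

```python
import torch.nn as nn

def bn_relu_conv(c_in, c_out, k, stride):
    # BN before the convolution supplies the bias term; ReLU follows the BN
    return nn.Sequential(
        nn.BatchNorm2d(c_in),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2, bias=False),
    )

class FirstResidualBlock(nn.Module):
    """First block of a stage: downsampling happens only in the 3x3 convolution."""
    def __init__(self, c_in, c_mid, c_out, stride=2):
        super().__init__()
        self.conv_branch = nn.Sequential(
            bn_relu_conv(c_in, c_mid, 1, 1),        # 1x1, stride 1 (no feature point skipped)
            bn_relu_conv(c_mid, c_mid, 3, stride),  # 3x3, stride 2: downsampling here
            bn_relu_conv(c_mid, c_out, 1, 1),       # 1x1, stride 1
        )
        self.shortcut = bn_relu_conv(c_in, c_out, 1, stride)  # non-identity branch, also BN'd

    def forward(self, x):
        return self.conv_branch(x) + self.shortcut(x)  # element-wise addition
```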
In one embodiment, the imbalance of positive and negative training images across labels (e.g., categories) is considered. For most categories, the positive images are far fewer than the negative images over the whole dataset; in particular, some rare small categories have only a few thousand positive images, with a positive-to-negative ratio as extreme as 1 to several thousand.
In some embodiments, for a sample label of a target training image, a positive image is an image containing the content corresponding to that sample label (e.g., category), and a negative image is an image that does not contain it. For example, when the sample label is "dog", a positive image is one containing a dog, while a negative image is one that does not.
In some embodiments, for a sample label of a target training image, a positive training image is a target training image used for model training that contains the content corresponding to the sample label, i.e., a training image carrying the same label as the sample label; a negative training image is a target training image that does not contain that content, i.e., a training image whose labels differ from the sample label. For example, if 10,000 target training images are selected from the multi-label image training set for model training and the sample label is "dog", the positive training images are those containing a dog and the negative training images are those that do not.
Therefore, to suppress the imbalance of positive and negative training images across labels such as categories and improve model accuracy and visual expressiveness, operations such as downsampling the negative training images of the negative sample set may also be performed. For example, when the deep neural network model includes an output layer with multiple output functions, step S103, "using the model to perform label prediction on the target training images to obtain multiple predicted labels for the target training images", may include:
for each sample label of each target training image:
when all target training images are negative training images lacking the sample label, updating the parameters of the output function corresponding to the sample label according to a preset processing probability to obtain an updated model;
in some embodiments, for a sample label of a target training image, when none of the target training images contains the sample label, i.e., every target training image is a negative training image, the parameters of the output function corresponding to that sample label are updated according to a preset processing probability, e.g., 0.1;
using the updated model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image.
Alternatively, the step "using the model to perform label prediction on each target training image to obtain multiple predicted labels for each target training image" may include:
for each sample label of each target training image:
when positive training images carrying the sample label exist among the target training images, randomly downsampling the negative training images that lack the sample label to obtain downsampled target training images;
using the model to perform label prediction on the downsampled target training images to obtain multiple predicted labels for each target training image.
A positive training image is one carrying the same label (e.g., category) as the sample label; a negative training image is one that does not carry the same label (e.g., category) as the sample label.
In one embodiment, the step "randomly downsampling the negative training images lacking the sample label among the target training images" may include:
randomly downsampling the negative training images lacking the sample label according to a preset positive-to-negative training-image ratio corresponding to the sample label.
The positive-to-negative training-image ratio of a label (e.g., category) may be the ratio of the number of its positive training images to the number of its negative training images, and may be set according to actual needs.
For example, the negative training images for the sample label may be randomly downsampled so that the positive-to-negative ratio is not lower than the preset ratio, e.g., not lower than 1:5.
In practice, within each batch of training images, most training images are negative for any given category, i.e., the category is absent from them; it may even happen that all training images in the batch are negative for the category. To suppress this data imbalance, per the description above, the following measures can be taken (a code sketch follows this list):
a) If all images in the current batch are negative for a category, i.e., none of the training images of the current batch contains the content of that category, the parameters of the Sigmoid function corresponding to that category are updated according to the preset processing probability 0.1;
b) If positive training images exist, the negative training images are randomly downsampled so that the positive-to-negative training-image ratio is not lower than 1:5.
Therefore, although the numbers of positive training images corresponding to different categories are very unbalanced, downsampling the negative training images ensures that all categories are trained under approximately the same positive-negative data distribution, alleviating inter-category imbalance to some extent and improving the model's accuracy and visual expressiveness.
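As an illustrative, non-limiting sketch, measures a) and b) can be realized as a per-batch, per-class loss mask (a plausible implementation; the source does not prescribe this exact mechanism):

```python
import torch

def class_balance_mask(targets, max_neg_per_pos=5, all_neg_keep_prob=0.1):
    """targets: (B, m) binary batch labels. Returns a (B, m) 0/1 loss mask."""
    B, m = targets.shape
    mask = torch.ones_like(targets, dtype=torch.float)
    for j in range(m):
        pos = targets[:, j] > 0
        n_pos = int(pos.sum())
        if n_pos == 0:
            # a) all-negative class: keep (update) its output with probability 0.1
            mask[:, j] = float(torch.rand(()).item() < all_neg_keep_prob)
            continue
        neg_idx = torch.nonzero(~pos).squeeze(1)
        limit = max_neg_per_pos * n_pos  # b) enforce pos:neg >= 1:5
        if neg_idx.numel() > limit:
            drop = neg_idx[torch.randperm(neg_idx.numel())][limit:]
            mask[drop, j] = 0.0
    return mask
```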
S104. Obtain the cross-entropy loss function corresponding to the multiple sample labels of each target training image; the positive-label loss in the cross-entropy loss function carries a weight greater than 1 so that the loss of the positive label is greater than that of the negative label.
A target training image generally has multiple sample labels, and the cross-entropy loss function covers all of them. The time at which the loss function is obtained is not restricted by the step number and can be placed at any suitable point in the training flow according to actual needs; for example, the cross-entropy loss function corresponding to the multiple sample labels can be obtained right after the training images are selected.
A positive label is a label identical to a sample label of the training image; for example, when the label is category j, the positive label is the same category j. A negative label is a label different from the sample labels of the training image; for example, when the label is category j, a negative label is a category other than j.
In the embodiments of this application, the cross-entropy loss function may include a positive-label loss and a negative-label loss, both of which can be obtained from the label prediction probabilities of the training image and its sample labels.
For each kind of sample label of a training image, say the i-th training image x_i, the embodiment of this application may use the cross-entropy function corresponding to that sample label for convergence. For example, taking object categories as labels, the cross-entropy loss function covering m sample labels may be defined as:

$$\mathcal{L}(x_i, y_i; W) = -\sum_{j=1}^{m} r_t^{j}\left[\,\eta\, y_i^{j}\log p_j(x_i) + \left(1-y_i^{j}\right)\log\!\left(1-p_j(x_i)\right)\right]$$

where $p_j(x_i)$ denotes the posterior probability for the j-th class, i.e., the predicted probability; W denotes the set of trainable parameters of the model; $y_i \in \{0,1\}^m$ denotes the given label vector of the i-th training image x_i (i.e., its sample-label group): if the j-th object exists in the image, the j-th element of $y_i$ is 1, otherwise 0; and m is the number of label kinds, i.e., the number of categories, of the multi-label image training set.
η is the weight parameter of the positive-label loss; its value represents the weight of the positive-label loss. Setting η > 1 makes the loss of the positive label greater than that of the negative label; that is, it sets the misclassification cost of a positive label (an object category present in the image) higher than that of a negative label (a category absent from the image). The reasons are: a) for image annotation, the correct prediction of positive labels matters more; b) negative labels far outnumber positive labels, and η > 1 can suppress this imbalance to some extent; c) positive labels are more reliable than negative labels, because many negative labels are actually missed positive labels. In practice, η is preferably set to 12, which suppresses the imbalance between positive and negative labels within a category.
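As an illustrative, non-limiting PyTorch sketch of this weighted loss, with η = 12 and the per-class attenuation r passed in (the clamping epsilon is an implementation detail, not from the source):

```python
import torch

def weighted_multilabel_bce(logits, targets, eta=12.0, r=None, eps=1e-7):
    """logits, targets: (B, m); r: optional (m,) attenuation weights r_t^j."""
    p = torch.sigmoid(logits)                                  # per-class posteriors
    pos = eta * targets * torch.log(p.clamp(min=eps))          # weighted positive-label loss
    neg = (1.0 - targets) * torch.log((1.0 - p).clamp(min=eps))
    loss = -(pos + neg)                                        # (B, m) cross-entropy terms
    if r is not None:
        loss = loss * r                                        # adaptive attenuation per class
    return loss.mean()
```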
In one embodiment, the imbalance of positive images across labels, e.g., across categories, is also considered. The numbers of positive images corresponding to different categories differ enormously: for some common large categories (e.g., animals, plants), the proportion of positive images in the whole dataset may exceed 10%, while for some rare small categories the proportion of positive images may be one in several thousand. This reduces the accuracy and visual expressiveness of the trained model.
To further suppress this inter-label (inter-category) image imbalance and improve accuracy and visual expressiveness, the embodiments of this application may adaptively attenuate the cross-entropy loss of certain labels such as categories; for example, a cross-entropy loss attenuation parameter may be added to the cross-entropy loss function to attenuate the cross-entropy loss of the corresponding category. Referring to the loss function above, $r_t^j$ is the cross-entropy loss attenuation parameter.
In practice, the attenuation parameter may be updated in every batch round so that the category's cross-entropy loss is adaptively attenuated. Specifically, in one embodiment, before converging the predicted labels and sample labels of the target training images according to the cross-entropy loss function, the method of the embodiment of this application may further include:
obtaining, for each sample label of the target training images of the adjacent batch, the overall type of the first training images and the number of consecutive occurrences of training images carrying the same label as the sample label, the overall type of the first training images corresponding to each sample label indicating whether one or more consecutive training images carrying the same label as the sample label exist among the target training images of the adjacent batch;
obtaining the overall type of the second training images corresponding to each sample label of the target training images of the current batch, the overall type of the second training images corresponding to each sample label indicating whether one or more consecutive training images carrying the same label as the sample label exist among the target training images of the current batch;
updating the cross-entropy loss attenuation parameter according to the first overall type, the second overall type, and the count, to obtain the updated cross-entropy loss function.
In some embodiments, the adjacent batch refers to the adjacent round of batch training, and the current batch refers to the current round of batch training.
In some embodiments, if one or more consecutive training images among the target training images of the adjacent batch carry the same label as the sample label, the first overall type for that sample label is positive; if none of the training images of the adjacent batch carries the same label as the sample label, the first overall type for that sample label is negative.
In some embodiments, if one or more consecutive training images among the target training images of the current batch carry the same label as the sample label, the second overall type for that sample label is positive; if none does, the second overall type for that sample label is negative.
For example, suppose 10,000 target training images are selected from the multi-label image training set for model training, and the sample labels of the currently learned target training image are "person" and "dog". For ease of illustration, and without loss of generality, suppose the adjacent batch has 5 target training images. The first overall type for the sample label "person" then reflects, for each of the 5 images, whether it carries the "person" label: if a training image carries the "person" label, its type flag is 1; otherwise its type flag is 0. If one or more of the 5 images carry the "person" label, e.g., the type flags of the 5 images are 01110, the first overall type for "person" is "positive"; if none of the 5 images carries the "person" label, e.g., the flags are 00000, the first overall type for "person" may be "negative".
Likewise, the second overall type indicates whether one or more consecutive training images carrying the same label as the sample label exist among the target training images of the current batch; it is determined in the same way as the first overall type and is not repeated here.
It should be noted that when counting the consecutive occurrences of training images carrying the same label as the sample label and determining the first and second overall types, the statistics must be computed separately for each sample label: for the sample labels "person" and "dog", the consecutive occurrences of "person" training images and of "dog" training images are counted separately.
The overall training-image type may be the overall type for a certain label such as a category within the batch of training images, and may be positive or negative; for example, it may be the overall training-image type corresponding to object category j in the batch of training images.
That is, the overall training-image type is the sign of a certain label such as a category in the batch of training images, which may be positive or negative.
For example, taking object category j: in a given round of batch training, multiple training images, i.e., the batch of training images, are obtained; if one or more consecutive training images of category j appear among them, the overall training-image type for category j is positive, i.e., the sign of j is positive; if no training image in the batch is of category j, the overall type for category j is negative, i.e., the sign is negative.
The number of consecutive occurrences of the sample label's training images is the current count of consecutive positive (negative) training images corresponding to the sample label in the batch training images; that is, for a given label, the count of consecutive positive (negative) training images. For example, if the categories of the batch training images are, in order, j, j+1, j, j, j, j+1, j, j, j, j, j, j, then the current count of consecutive positive training images of category j in the batch is 6.
The adjacent batch is the batch round adjacent to the current one; for example, it may be the previous batch round of the current batch training.
The embodiment of this application may obtain the overall training-image type of the sample label, e.g., category, in the current batch of training images, the overall training-image type of the sample label in the adjacent batch, and the count of consecutive occurrences of the sample label's positive (negative) training images; then update the cross-entropy loss attenuation parameter, e.g., $r_t^j$, based on the overall image types and the count.
For example, an adaptive attenuation parameter for the cross-entropy loss (which may also be called an adaptive weight parameter of the cross-entropy loss), $r_t^j$, is added to the loss function above, where t denotes the current count of consecutive occurrences of positive (negative) samples of category j in the batch training images.
[Equation image not recoverable: the definition of $r_t^j$ as a decreasing function of the streak length t.]
From the formula, the value of the adaptive cross-entropy-loss parameter can be obtained from t; that is, the adaptive parameter is updated through t.
Therefore, obtaining the adaptive parameter first requires the count of consecutive occurrences of positive (negative) samples of category j in the current batch of training images, i.e., t. In practice, t is related to the overall training-image types of the current batch and the adjacent batch. Specifically, the step "updating the cross-entropy loss attenuation parameter according to the first overall type, the second overall type, and the count, to obtain the updated cross-entropy loss function" may include:
comparing the first overall type with the second overall type to obtain a comparison result;
obtaining, according to the comparison result and the count, the target number of consecutive occurrences of the sample label's current training images in the current batch;
updating the cross-entropy loss attenuation parameter according to the target number to obtain the updated cross-entropy loss function.
The target number is the t above; it can be obtained from the comparison between the overall image types of the current batch and the adjacent historical batch, together with the count of consecutive occurrences of the sample label's positive (negative) training images in the adjacent historical batch.
For example, when the first and second overall types agree, e.g., both signs are positive, the count of consecutive positive (negative) training images of the sample label in the adjacent historical batch is incremented by 1 to give the count for the current batch, i.e., t = t + 1; when the first and second overall types disagree, e.g., one sign is positive and the other is negative, the count of consecutive positive (negative) training images of the sample label in the current batch equals 1, i.e., t = 1.
For example, per the description above, for the j-th class, if positive samples appear consecutively, or consecutively fail to appear (all samples negative), across adjacent batches of training images, its cross-entropy loss is adaptively attenuated. The adaptive weight parameter $r_t^j$ added to the loss function realizes this adaptive attenuation of the cross-entropy loss, where t denotes the number of consecutive occurrences of positive (negative) samples.
In practice, common large categories are more likely to see consecutive positive samples, while rare small categories are more likely to see consecutive all-negative batches. Adaptively attenuating the cross-entropy loss of the corresponding label, e.g., category, therefore weakens both the model's overfitting to large categories (updates driven by positive samples) and the model's suppression of small categories (updates driven by negative samples), thereby curbing inter-category imbalance and improving the model's accuracy and visual expressiveness.
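As an illustrative, non-limiting sketch of the streak bookkeeping: since the defining equation for $r_t^j$ survives only as an image, the exponential decay used here is an assumed stand-in for whatever decreasing function of t the source specifies.

```python
def update_attenuation(prev_sign, cur_sign, t, decay=0.9, floor=0.01):
    """prev_sign/cur_sign: per-class overall batch types (+1/-1); t: per-class streak lengths."""
    r = [0.0] * len(t)
    for j in range(len(t)):
        t[j] = t[j] + 1 if cur_sign[j] == prev_sign[j] else 1  # t = t + 1, else reset to 1
        r[j] = max(floor, decay ** (t[j] - 1))                 # assumed decreasing function of t
    return t, r
```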
S105. Converge the predicted labels and sample labels of each target training image according to the cross-entropy loss function, and update the parameters of the model to obtain the trained model.
For example, for each sample label, e.g., category, of a training image, its corresponding cross-entropy loss function can be obtained; the predicted labels and sample labels of the training image are then converged based on the cross-entropy loss function, so as to train the model parameters of the model and obtain the trained model.
Specifically, in one embodiment, the cross-entropy loss between the predicted labels and sample labels of the training image is obtained according to the cross-entropy loss function, and the model parameters of the deep neural network model are trained according to the cross-entropy loss.
In the embodiments of this application, a back-propagation algorithm together with stochastic gradient descent with momentum may be used to train the model; for example, the cross-entropy loss descent gradient between the predicted labels and sample labels of the training images can be obtained from the cross-entropy loss function (by differentiating the loss function), and the model parameters of the deep neural network model are then trained based on this descent gradient. Specifically, each model parameter can be updated based on the cross-entropy loss descent gradient and the learning rate corresponding to the model parameter (i.e., the learning rate of the layer in which the parameter resides), as sketched below.
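For concreteness, one common formulation of the momentum update described here (a sketch only; frameworks differ in details such as dampening):

```python
def sgd_momentum_step(param, grad, velocity, lr, momentum=0.9):
    """One parameter update: v <- momentum * v + grad; w <- w - lr * v."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity
```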
In one embodiment, after the deep neural network model is trained in the above manner, transfer learning can change the multi-label prediction (output) deep neural network model into a single-label prediction (output) classification model, improving the versatility of the model. For example, the method of the embodiment of this application may further include:
changing the multiple output functions in the output layer of the trained deep neural network model into a single-label classifier to obtain the changed network model;
adaptively adjusting the learning rate of each layer of the changed network model according to the principle that the learning rate of a higher layer is greater than that of a lower layer, to obtain the adjusted network model;
training the model parameters of the adjusted network model on a single-label training image set to obtain a single-label image classification model.
In this way, the multi-label-output ResNet-101 model trained on the multi-label image training set ML-Images can be transferred so that it can help other visual tasks, such as single-label image classification.
Specifically, the output layer of the multi-label-output ResNet-101 model trained on ML-Images (i.e., multiple independent Sigmoid functions) can be replaced with a single-label classifier (i.e., a single Softmax function); then hierarchical adaptive learning-rate fine-tuning is applied to the learning rate of each layer of the changed network model; next, the model parameters of the adjusted network model are trained on a single-label training image set such as the ImageNet dataset, yielding the single-label image classification model. The model parameters include: the parameters of the single-label classifier (i.e., the single Softmax function), and the other model parameters.
Hierarchical adaptive learning-rate fine-tuning means adaptively adjusting the learning rate of each layer of the changed network model according to the principle that the learning rate of a higher layer is greater than that of a lower layer; specifically, the learning rates of the high layers are set greater than those of the bottom layers, i.e., the closer a layer is to the output, the larger its learning rate.
Compared with a single-label classification model obtained in the traditional way, the single-label classification model obtained through this transfer alleviates the negative effect of the differences between multi-label and single-label datasets, and offers the advantages of superior performance, high classification accuracy, and high quality.
The model training method provided by the embodiments of this application is applicable to vision-related services, such as image quality assessment and recommendation for articles and in-game object recognition; models trained with the method of the embodiments of this application have achieved good results in all of them. In addition, the model will provide an excellent initial model for other, broader visual services, including image understanding and video understanding.
As can be seen from the above, the embodiments of this application can obtain a multi-label image training set in which each training image is annotated with multiple sample labels; select multiple training images from it as target training images for training the current model; use the model to perform label prediction on each target training image to obtain multiple predicted labels for each; obtain the cross-entropy loss function corresponding to the multiple sample labels of each target training image, with the positive-label loss weighted by a factor greater than 1 so that it exceeds the negative-label loss; and converge the predicted labels and sample labels of each target training image according to the cross-entropy loss function, updating the parameters of the model to obtain the trained model. This scheme trains the model parameters of the deep neural network model with a weighted cross-entropy loss function whose weight parameter exceeds 1, so it can suppress the imbalance of positive and negative labels within a category and improve the model's accuracy and visual expressiveness.
In addition, the scheme can also suppress the problem of category imbalance through adaptive attenuation of the cross-entropy loss and negative-sample downsampling, further improving the model's accuracy and visual expressiveness.
Based on the method described in the above embodiment, further details are given below by way of example.
In this embodiment, the model training apparatus is integrated in a network device.
First, define the cross-entropy loss function corresponding to a label, e.g., category, as follows:

$$\mathcal{L}(x_i, y_i; W) = -\sum_{j=1}^{m} r_t^{j}\left[\,\eta\, y_i^{j}\log p_j(x_i) + \left(1-y_i^{j}\right)\log\!\left(1-p_j(x_i)\right)\right]$$

where $p_j(x_i)$ denotes the posterior probability for the j-th class, i.e., the predicted probability; W denotes the set of trainable parameters of the model; $y_i \in \{0,1\}^m$ denotes the given label vector of the i-th training image x_i (i.e., its sample-label group): if the j-th object exists in the image, the j-th element of $y_i$ is 1, otherwise 0; and m is the number of label kinds, i.e., the number of categories, of the multi-label image training set.
η is the weight parameter of the positive-label loss; its value represents the weight of the positive-label loss. In practice, η is preferably set to 12, which suppresses the imbalance between positive and negative labels within a category.
$r_t^j$ is the cross-entropy loss attenuation parameter: the adaptive attenuation parameter of the cross-entropy loss (which may also be called the adaptive weight parameter of the cross-entropy loss) added to the loss function above, where t denotes the current count of consecutive occurrences of positive (negative) samples of category j in the batch training images. [Equation image not recoverable: the definition of $r_t^j$ as a decreasing function of t.] From the formula, the value of the adaptive parameter can be obtained from t, i.e., the adaptive parameter is updated through t.
Then the model is trained with the above cross-entropy loss function. The specific flow of the model training method, shown in FIG. 4, is as follows:
S401. The network device obtains a multi-label image training set, which includes multiple training images, each annotated with multiple sample labels.
The multi-label image training set may include at least one image annotated with multiple labels (e.g., multiple object categories), which may be called a multi-label image; it may include many multi-label images covering many object categories, for example the ML-Images multi-label image training set.
S402. The network device selects multiple training images from the multi-label image training set as the target training images of the current batch.
The network device may train the model with multiple batches of training images, i.e., select multiple target training images from the multi-label image training set for each round of training.
In practice, the number of training images selected for each batch may be the same, e.g., 100 per batch, or different, e.g., 100 training images for the first batch and 400 for the second.
S403. The network device updates the cross-entropy loss attenuation parameter in the cross-entropy loss function corresponding to the multiple sample labels of each target training image.
For example, the network device obtains, for each sample label of each target training image of the adjacent batch, the overall type of the first training images and the count of consecutive occurrences of training images carrying the same label as the sample label; obtains the overall type of the second training images for each sample label in the current batch; compares the first overall type with the second overall type to obtain a comparison result; obtains, according to the comparison result and the count, the target number of consecutive occurrences, in the current batch, of training images carrying the same label as the sample label; and updates the attenuation parameter according to the target number to obtain the updated cross-entropy loss function.
The target number is the t above, obtained from the comparison between the overall image types of the current batch and the adjacent historical batch, together with the count of consecutive occurrences of the sample label's positive (negative) training images in the adjacent historical batch.
The specific update of the cross-entropy attenuation parameter can follow the description of the above embodiment and is not repeated here.
S404. The network device preprocesses each target training image of the current batch.
The image preprocessing can follow the description above: for example, the corresponding region image can be extracted from the target training image and scaled to a predetermined size to obtain the scaled image; the scaled image is then randomly perturbed, and so on.
S405. The network device downsamples the negative samples of the target training images of the current batch.
To suppress the imbalance of positive and negative training images across labels, e.g., categories, and improve model accuracy and visual expressiveness, operations such as downsampling the negative training images can be performed. Specifically:
when the deep neural network model includes an output layer with multiple output functions, and every training image among the target training images is a negative training image lacking the sample label, the parameters of the output function corresponding to the sample label are updated according to a preset processing probability;
when positive training images carrying the sample label exist among the target training images, the negative training images lacking the sample label are randomly downsampled.
In practice, within each batch of training images, most images are negative for any given category, i.e., the category is absent from them; it may even happen that all images are negative for the category. To suppress this data imbalance, per the description above, the following measures can be taken:
a) if all of the current batch data are negative for a category, the parameters of the Sigmoid function corresponding to that category are updated with probability 0.1;
b) if positive images exist, the negative images are randomly downsampled so that the positive-to-negative ratio is not lower than 1:5.
S406. The network device uses the model to perform label prediction on each target training image, obtaining multiple predicted labels for each target training image.
S407. The network device converges the predicted labels and sample labels of each target training image according to the cross-entropy loss function corresponding to the multiple sample labels of each target training image, updates the parameters of the deep neural network model, and obtains the trained model.
By continually selecting each batch of training images and training the model parameters in the above manner, the embodiment of this application can obtain the trained deep neural network model.
The cross-entropy loss function follows the introduction above.
The deep neural network model may be based on a deep learning network such as a convolutional neural network; for example, it may be a ResNet (Residual Neural Network) model. Specifically, the residual-network structure follows the introduction above.
In the embodiments of this application, a back-propagation algorithm together with stochastic gradient descent with momentum may be used to train the model; for example, the cross-entropy loss descent gradient between the predicted labels and sample labels of the training images can be obtained from the cross-entropy loss function (by differentiating the loss function), and the model parameters of the deep neural network model are then trained based on this descent gradient; specifically, each model parameter is updated based on the cross-entropy loss descent gradient and the learning rate corresponding to the model parameter (i.e., the learning rate of the layer in which the parameter resides).
The experiments below verify that the model training method provided by the embodiments of this application offers high accuracy and strong visual expressiveness.
First, determine the training algorithm and hyperparameters: the commonly used back-propagation algorithm, combined with stochastic gradient descent with momentum, is used to train the ResNet-101 model. The training hyperparameters are as follows. The batch size is 4096. The learning rate uses a warm-up strategy: the initial learning rate is 0.01 and is multiplied by 1.297 each epoch until it reaches 0.08 at the 9th epoch; thereafter the learning rate is decayed by a factor of 0.1 every 25 epochs until the 60th epoch. The momentum is 0.9. When updating the batch-normalization parameters, the decay factor of the moving average is 0.9, and 1e-5 is added to the variance in the denominator to avoid zero variance. In addition, an L2 regularization term with weight parameter 0.0001 can be added to all trainable parameters.
Measurement criteria: to verify the performance of the ResNet-101 model trained on the multi-label data ML-Images, it can be tested on the ML-Images validation set with three commonly used multi-label measurement criteria: precision, recall, and the F1 score. Since the output of each Sigmoid function is a continuous value between 0 and 1, i.e., the posterior probability for each category, the posterior probability vector must first be converted into a binary vector before measurement. Given a continuous-valued posterior probability vector, the elements corresponding to the top k maximum values can be set to 1, indicating positive-label predictions, and the other elements set to 0, indicating negative-label predictions. For the i-th test image, a binary prediction vector $\hat{y}_i$ is thus obtained. The three example-based measurement criteria are defined as:

$$\text{Precision} = \frac{1}{N}\sum_{i=1}^{N}\frac{\hat{y}_i^{\top}y_i}{\lVert\hat{y}_i\rVert_1},\qquad \text{Recall} = \frac{1}{N}\sum_{i=1}^{N}\frac{\hat{y}_i^{\top}y_i}{\lVert y_i\rVert_1},\qquad F_1 = \frac{2\cdot\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}$$

Experimental results: two sets of results are shown for the model training method provided by the embodiments of this application, for k = 5 and k = 10. The specific experimental results are shown in the table below. Notably, none of the figures is very high, mainly because: 1) the annotations in ML-Images themselves contain noise; 2) for many categories the training samples are insufficient (about 5,000 classes have no more than 1,000 training images each).
                        Precision    Recall    F1 score
  Top-5 predictions     43.7%        22.9%     29.5%
  Top-10 predictions    33.7%        35.6%     33.9%
Table 1
In one embodiment, after the deep neural network model is trained in the above manner, transfer learning can change the multi-label prediction (output) deep neural network model into a single-label prediction (output) classification model, improving the versatility of the model.
For example, the network device can change the multiple output functions in the output layer of the trained deep neural network model into a single-label classifier to obtain the changed network model; adaptively adjust the learning rate of each layer of the changed network model according to the principle that higher layers have larger learning rates than lower layers, obtaining the adjusted network model; and train the model parameters of the adjusted network model on a single-label training image set to obtain the single-label image classification model.
Specifically, the output layer of the multi-label-output ResNet-101 model trained on ML-Images (i.e., multiple independent Sigmoid functions) can be replaced with a single-label classifier (i.e., a single Softmax function); then hierarchical adaptive learning-rate fine-tuning is applied to each layer of the changed network model; next, the model parameters of the adjusted network model are trained on a single-label training image set such as the ImageNet dataset, yielding the single-label image classification model. The model parameters include: the parameters of the single-label classifier (i.e., the single Softmax function), and the other model parameters.
Hierarchical adaptive learning-rate fine-tuning means adaptively adjusting the per-layer learning rates according to the principle that the learning rate of a higher layer is greater than that of a lower layer; specifically, the learning rates of the high layers are set greater than those of the bottom layers, i.e., the closer a layer is to the output, the larger its learning rate.
In this way, the multi-label-output ResNet-101 model trained on the multi-label image training set ML-Images can be transferred so that it can help other visual tasks, such as single-label image classification.
The experiments below verify the effectiveness and advantages of the transfer approach provided by this application.
For example, three different models can be set up for comparison:
(1) Model 1: train a single-label-output ResNet-101 model directly on the ImageNet training set and test it on the ImageNet validation set.
(2) Model 2: replace the output layer of the multi-label-output ResNet-101 model trained on ML-Images (i.e., multiple independent Sigmoid functions) with a single-label classifier (i.e., a single Softmax function), train the parameters of the Softmax function on the ImageNet dataset, and fine-tune the other layer parameters with a uniform learning rate (see below).
(3) Model 3: replace the output layer of the multi-label-output ResNet-101 model trained on ML-Images (i.e., multiple independent Sigmoid functions) with a single-label classifier (i.e., a single Softmax function), train the parameters of the Softmax function on the ImageNet dataset, and fine-tune the other layer parameters with hierarchical adaptive learning rates (see below).
Fine-tuning the learning rate: in transfer learning with deep neural networks, fine-tuning the model parameters is a very important and critical step; it both preserves the visual representation ability of the initial parameters and adjusts them according to the differences between the original dataset and the target dataset. The hyperparameter setting of the commonly used fine-tuning algorithm is: set a larger initial learning rate for the output-layer parameters, and set the learning rates of all other layer parameters to a smaller value. Because the learning rates outside the output layer are uniform, this standard fine-tuning algorithm is called the uniform-learning-rate fine-tuning algorithm. However, considering the differences between the pre-training dataset ML-Images and the target dataset (ImageNet), including differences in images and annotations, the embodiments of this application propose a hierarchical adaptive learning-rate fine-tuning algorithm. Specifically, high-layer parameters are more related to the training dataset, so a larger learning rate is set for them; bottom-layer parameters represent low-level visual information and are less tied to the training dataset, so a smaller learning rate is set.
Other hyperparameter settings: the hyperparameter settings of the above three models are shown in the table below:
[Table 2: hyperparameter settings of the three models; the table survives only as an image and its contents are not recoverable.]
Experimental results: the results, and comparisons with other methods and third-party implementations, are shown in the table below. The performance of Model 1 as implemented in this application exceeds the Model 1 implementations of MSRA and Google, showing that the improved ResNet model of this application outperforms the original ResNet model and that the model implementation quality of this application is high. Model 2 as implemented here suffers a large performance drop relative to Model 1, which illustrates the difference between ML-Images and ImageNet. Model 3 as implemented here achieves the best performance, even exceeding Google's Model 2, showing that the hierarchical adaptive learning-rate fine-tuning algorithm proposed by this application can effectively mitigate the differences between datasets. Notably, Google's Model 2 was pre-trained on the JFT-300M dataset containing 300 million images, whereas ML-Images contains only 18 million images; this application surpasses Google's performance with only about 1/17 of the data volume, fully demonstrating the effectiveness of the model implementation and training algorithm of this application.
[Table 3: ImageNet validation results of the three models and comparisons with other implementations; the table survives only as an image and its contents are not recoverable.]
The model training method provided by the embodiments of this application is applicable to vision-related services, such as image quality assessment and recommendation for articles and in-game object recognition; models trained with the method of the embodiments of this application have achieved good results in all of them. In addition, the model will provide an excellent initial model for other, broader visual services, including image understanding and video understanding.
As can be seen from the above, the embodiments of this application can add a weight greater than 1 to the positive-label loss in the cross-entropy loss function, and converge the predicted labels and sample labels of the target training images according to the cross-entropy loss function to obtain the trained deep neural network model. This scheme trains the model parameters of the deep neural network model with a weighted cross-entropy loss function whose weight parameter exceeds 1, so it can suppress the imbalance of positive and negative labels within a category and improve the model's accuracy and visual expressiveness. In addition, the scheme can also suppress the problem of category imbalance through adaptive attenuation of the cross-entropy loss and negative-sample downsampling, further improving the model's accuracy and visual expressiveness.
To better implement the above method, an embodiment of this application further provides a model training apparatus, which may be integrated in a network device such as a terminal or a server; the terminal may include a mobile phone, tablet computer, laptop computer, PC, or similar device.
For example, as shown in FIG. 5A, the model training apparatus may include an image acquisition unit 501, a selection unit 502, a prediction unit 503, a function acquisition unit 504, and a training unit 505, as follows:
the image acquisition unit 501 is configured to obtain a multi-label image training set, the multi-label image training set including multiple training images, each annotated with multiple sample labels;
the selection unit 502 is configured to select multiple training images from the multi-label image training set as target training images for training the current model;
the prediction unit 503 is configured to use the model to perform label prediction on each of the target training images, obtaining multiple predicted labels for each target training image;
the function acquisition unit 504 is configured to obtain the cross-entropy loss function corresponding to the multiple sample labels of each of the target training images, the positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label;
the training unit 505 is configured to converge the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function, and to update the parameters of the model to obtain the trained model.
In one embodiment, referring to FIG. 5B, the cross-entropy loss function further includes a cross-entropy loss attenuation parameter, and the model training apparatus may further include a first-type acquisition unit 506, a second-type acquisition unit 507, and a parameter updating unit 508;
the selection unit 502 may be specifically configured to take the selected multiple training images as the target training images of the current batch;
the first-type acquisition unit 506 is configured to, before the training unit 505 converges the predicted labels and sample labels of the target training images according to the cross-entropy loss function, obtain, for each sample label of each target training image of the adjacent batch, the overall type of the first training images and the count of consecutive occurrences of training images carrying the same label as the sample label, the overall type of the first training images corresponding to each sample label indicating whether one or more consecutive training images carrying the same label as the sample label exist among the target training images of the adjacent batch;
the second-type acquisition unit 507 is configured to obtain the overall type of the second training images corresponding to each sample label of each target training image of the current batch, the overall type of the second training images corresponding to each sample label indicating whether one or more consecutive training images carrying the same label as the sample label exist among the target training images of the current batch;
the parameter updating unit 508 is configured to update the cross-entropy loss attenuation parameter according to the first overall type, the second overall type, and the count.
In one embodiment, the parameter updating unit 508 may be specifically configured to:
compare the overall type of the first training images with the overall type of the second training images to obtain a comparison result;
obtain, according to the comparison result and the count, the target number of consecutive occurrences of the sample label's current training images in the current batch;
update the cross-entropy loss attenuation parameter according to the target number to obtain the updated cross-entropy loss function.
In one embodiment, the deep neural network model includes an output layer, and the output layer includes multiple output functions; the prediction unit 503 may be specifically configured to, for each sample label of each target training image:
when all of the target training images are negative training images lacking the sample label, update the parameters of the output function corresponding to the sample label according to a preset processing probability to obtain an updated model;
use the updated model to perform label prediction on each of the target training images, obtaining multiple predicted labels for each target training image.
In one embodiment, the deep neural network model includes an output layer, and the output layer includes multiple output functions; the prediction unit 503 may be specifically configured to, for each sample label of each target training image:
when positive training images carrying the sample label exist among the target training images, randomly downsample the negative training images lacking the sample label to obtain downsampled target training images;
use the model to perform label prediction on the downsampled target training images, obtaining multiple predicted labels for each target training image.
In one embodiment, the prediction unit 503 may be specifically configured to: randomly downsample the negative training images lacking the sample label among the target training images according to the preset positive-to-negative training-image ratio corresponding to the sample label.
In one embodiment, referring to FIG. 5C, the model training apparatus may further include a preprocessing unit 509;
the preprocessing unit 509 may be specifically configured to:
extract the corresponding region image from the target training image;
scale the region image to a predetermined size to obtain a scaled image;
perform random perturbation on the scaled image to obtain a preprocessed training image;
in this case, the prediction unit 503 may be specifically configured to use the model to perform label prediction on each preprocessed training image.
In one embodiment, the preprocessing unit 509 performing random perturbation on the scaled image may include:
horizontally flipping the scaled image according to a first processing probability to obtain a flipped image;
rotating the flipped image by a random angle according to a second processing probability to obtain a rotated image, the random angle being an angle randomly selected from a predetermined angle interval;
perturbing the attributes of the rotated image according to a third processing probability to obtain a processed image;
scaling the pixel values of the processed image into a preset pixel-value range.
In one embodiment, the deep neural network model includes a deep residual network model; the deep residual network model includes multiple sequentially connected residual blocks, each residual block containing a convolution branch and a residual branch; the kernel size of the first convolution layer in the convolution branch is smaller than that of the second convolution layer following the first convolution layer, and the convolution stride of the second convolution layer is greater than that of the first convolution layer and smaller than the kernel width of the second convolution layer.
In one embodiment, referring to FIG. 5D, the model training apparatus may further include a transfer learning unit 510;
the transfer learning unit 510 may be specifically configured to:
change the multiple output functions in the output layer of the trained model into a single-label classifier to obtain the changed network model;
adaptively adjust the learning rate of each layer of the changed network model according to the principle that the learning rate of a higher layer is greater than that of a lower layer, obtaining the adjusted network model;
train the model parameters of the adjusted network model on a single-label training image set to obtain the single-label image classification model.
In some embodiments, the training unit 505 obtains, according to the cross-entropy loss function, the cross-entropy loss descent gradient between the predicted labels and sample labels of each target training image;
the model parameters in the model are trained based on the cross-entropy loss descent gradient and updated to obtain the trained model.
In specific implementations, the above units may be implemented as independent entities, or combined arbitrarily and implemented as one or several entities; for the specific implementations of the above units, refer to the foregoing method embodiments, which are not repeated here.
As can be seen from the above, the model training apparatus of this embodiment obtains the multi-label image training set through the image acquisition unit 501; the selection unit 502 selects multiple training images from the multi-label image training set as target training images for training the current model; the prediction unit 503 uses the model to perform label prediction on each of the target training images, obtaining multiple predicted labels for each target training image; the function acquisition unit 504 obtains the cross-entropy loss function corresponding to the multiple sample labels of each of the target training images, the positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label; the training unit 505 converges the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function and updates the parameters of the model to obtain the trained model. This scheme trains the model parameters of the image-recognition model with a weighted cross-entropy loss function whose weight parameter exceeds 1, so it can suppress the imbalance of positive and negative labels within a category and improve the model's accuracy and visual expressiveness.
An embodiment of this application further provides a network device, which may be a server, a terminal, or a similar device. FIG. 6 shows a schematic structural diagram of the network device involved in an embodiment of this application. Specifically:
the network device may include a processor 601 with one or more processing cores, a memory 602 with one or more computer-readable storage media, a power supply 603, an input unit 604, and other components. Those skilled in the art will understand that the network device structure shown in FIG. 6 does not constitute a limitation on the network device, which may include more or fewer components than shown, combine certain components, or arrange components differently. Specifically:
the processor 601 is the control center of the network device; it connects the various parts of the entire network device through various interfaces and lines, and performs the various functions of the network device and processes data by running or executing the software programs and/or modules stored in the memory 602 and calling the data stored in the memory 602, thereby monitoring the network device as a whole. The processor 601 may include one or more processing cores; the processor 601 may integrate an application processor, which mainly handles the operating system, user interface, and application programs, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 601.
The memory 602 may be configured to store software programs and modules; the processor 601 executes various functional applications and data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created by the use of the network device, and so on. In addition, the memory 602 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 601 access to the memory 602.
The network device further includes a power supply 603 that supplies power to the various components; the power supply 603 may be logically connected to the processor 601 through a power management system, so that functions such as charging, discharging, and power-consumption management are realized through the power management system. The power supply 603 may also include one or more DC or AC power sources, a recharging system, a power-failure detection circuit, a power converter or inverter, a power status indicator, and any other component.
The network device may further include an input unit 604, which may be configured to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
Although not shown, the network device may also include a display unit and the like, which are not repeated here. Specifically, in this embodiment, the processor 601 in the network device loads the executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and runs the application programs stored in the memory 602 to implement various functions, as follows:
obtain a multi-label image training set, the multi-label image training set including multiple training images, each annotated with multiple sample labels; select multiple training images from the multi-label image training set as target training images for training the current model; use the model to perform label prediction on each of the target training images, obtaining multiple predicted labels for each target training image; obtain the cross-entropy loss function corresponding to the multiple sample labels of each of the target training images, the positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label; converge the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function, and update the parameters of the model to obtain the trained model.
For the specific implementation of each of the above operations, refer to the foregoing embodiments, which are not repeated here.
As can be seen from the above, the network device of this embodiment can obtain a multi-label image training set; select multiple training images from it as target training images for training the current model; use the model to perform label prediction on each target training image, obtaining multiple predicted labels for each; obtain the cross-entropy loss function corresponding to the multiple sample labels of each target training image, with the positive-label loss weighted by a factor greater than 1 so that the loss of the positive label is greater than the loss of the negative label; and converge the predicted labels and sample labels of each target training image according to the cross-entropy loss function, updating the parameters of the model to obtain the trained model. This scheme trains model parameters for image recognition with a weighted cross-entropy loss function whose weight parameter exceeds 1, so it can suppress the imbalance of positive and negative labels within a category and improve the model's accuracy and visual expressiveness.
Those of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments can be completed through instructions, or through instructions controlling the relevant hardware; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of this application provides a storage medium storing multiple instructions that can be loaded by a processor to perform the steps of any model training method provided in the embodiments of this application. For example, the instructions may perform the following steps:
obtain a multi-label image training set, the multi-label image training set including multiple training images, each annotated with multiple sample labels; select multiple training images from the multi-label image training set as target training images for training the current model; use the model to perform label prediction on each of the target training images, obtaining multiple predicted labels for each target training image; obtain the cross-entropy loss function corresponding to the multiple sample labels of each of the target training images, the positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label; converge the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function, and update the parameters of the model to obtain the trained model.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the storage medium can perform the steps of any model training method provided in the embodiments of this application, they can achieve the beneficial effects achievable by any such method; see the foregoing embodiments for details, which are not repeated here.
The model training method, apparatus, and storage medium provided by the embodiments of this application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of this application; the descriptions of the above embodiments are only intended to help understand the method and core idea of this application. Meanwhile, those skilled in the art may make changes to the specific implementations and application scope according to the idea of this application. In summary, the contents of this specification should not be construed as limiting this application.

Claims (14)

  1. A training method for a model used in image recognition, performed by a network device, comprising:
    obtaining a multi-label image training set, the multi-label image training set comprising multiple training images, each training image being annotated with multiple sample labels;
    selecting multiple training images from the multi-label image training set as target training images for training the current model;
    using the model to perform label prediction on each of the target training images to obtain multiple predicted labels for each of the target training images;
    obtaining a cross-entropy loss function corresponding to the multiple sample labels of each of the target training images, a positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label;
    converging the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function, and updating the parameters of the model to obtain the trained model.
  2. The model training method according to claim 1, wherein the cross-entropy loss function further comprises a cross-entropy loss attenuation parameter;
    the selected multiple training images are the target training images of the current batch;
    before converging the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function, the method further comprises:
    for each sample label of the target training images of an adjacent batch, obtaining an overall type of first training images corresponding to the sample label and the number of consecutive occurrences of training images having the same label as the sample label, the overall type of the first training images corresponding to the sample label indicating whether one or more consecutive training images having the same label as the sample label exist among the target training images of the adjacent batch;
    obtaining an overall type of second training images corresponding to each sample label of the target training images of the current batch, the overall type of the second training images corresponding to each sample label indicating whether one or more consecutive training images having the same label as the sample label exist among the target training images of the current batch;
    updating the cross-entropy loss attenuation parameter according to the overall type of the first training images, the overall type of the second training images, and the number of occurrences.
  3. The model training method according to claim 2, wherein updating the cross-entropy loss attenuation parameter according to the overall type of the first training images, the overall type of the second training images, and the number of occurrences comprises:
    comparing the overall type of the first training images with the overall type of the second training images to obtain a comparison result;
    obtaining, according to the comparison result and the number of occurrences, a target number of consecutive occurrences of the sample label's current training images in the current batch training;
    updating the cross-entropy loss attenuation parameter according to the target number to obtain the updated cross-entropy loss function.
  4. The model training method according to claim 1, wherein the model comprises an output layer, the output layer comprising multiple output functions;
    using the model to perform label prediction on each of the target training images to obtain multiple predicted labels for each of the target training images comprises:
    for each sample label of each of the target training images:
    when all of the target training images are negative training images that do not have the sample label, updating the parameters of the output function corresponding to the sample label according to a preset processing probability to obtain an updated model;
    using the updated model to perform label prediction on each of the target training images to obtain multiple predicted labels for each target training image.
  5. The model training method according to claim 1, wherein using the model to perform label prediction on each of the target training images to obtain multiple predicted labels for each of the target training images comprises:
    for each sample label of each of the target training images:
    when positive training images having the sample label exist among the target training images, randomly downsampling the negative training images that do not have the sample label to obtain downsampled target training images;
    using the model to perform label prediction on the downsampled target training images to obtain multiple predicted labels for each target training image.
  6. The model training method according to claim 5, wherein randomly downsampling the negative training images that do not have the sample label comprises: randomly downsampling the negative training images that do not have the sample label among the target training images according to a preset ratio of positive to negative training images corresponding to the sample label.
  7. The model training method according to claim 1, wherein before using the model to perform label prediction on the target training images, the method further comprises:
    extracting a corresponding region image from the target training image;
    scaling the region image to a predetermined size to obtain a scaled image;
    performing random perturbation on the scaled image to obtain a preprocessed training image;
    wherein the model performing label prediction on each of the target training images comprises: using the model to perform label prediction on each preprocessed training image.
  8. The model training method according to claim 7, wherein performing random perturbation on the scaled image comprises:
    horizontally flipping the scaled image according to a first processing probability to obtain a flipped image;
    rotating the flipped image by a random angle according to a second processing probability to obtain a rotated image, the random angle being an angle randomly selected from a predetermined angle interval;
    perturbing the attributes of the rotated image according to a third processing probability to obtain a processed image;
    scaling the pixel values of the processed image into a preset pixel-value range.
  9. The model training method according to claim 1, wherein the model comprises a deep residual network model; the deep residual network model comprises multiple sequentially connected residual blocks, each residual block comprising a convolution branch and a residual branch; the kernel size of a first convolution layer in the convolution branch is smaller than the kernel size of a second convolution layer following the first convolution layer; and the convolution stride of the second convolution layer is greater than the convolution stride of the first convolution layer and smaller than the kernel width of the second convolution layer.
  10. The model training method according to claim 1, further comprising:
    changing the multiple output functions in the output layer of the trained model into a single-label classifier to obtain a changed network model;
    adaptively adjusting the learning rate of each layer in the changed network model according to the principle that the learning rate of a higher layer is greater than that of a lower layer, to obtain an adjusted network model;
    training the model parameters of the adjusted network model on a single-label training image set to obtain a single-label image classification model.
  11. The model training method according to claim 1, wherein converging the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function and updating the parameters of the model to obtain the trained model comprises:
    obtaining, according to the cross-entropy loss function, a cross-entropy loss descent gradient between the predicted labels and sample labels of each of the target training images;
    training the model parameters in the model based on the cross-entropy loss descent gradient, and updating the model parameters in the model to obtain the trained model.
  12. A training apparatus for a model used in image recognition, comprising:
    an image acquisition unit, configured to obtain a multi-label image training set, the multi-label image training set comprising multiple training images, each training image being annotated with multiple sample labels;
    a selection unit, configured to select multiple training images from the multi-label image training set as target training images for training the current model;
    a prediction unit, configured to use the model to perform label prediction on each of the target training images to obtain multiple predicted labels for each target training image;
    a function acquisition unit, configured to obtain a cross-entropy loss function corresponding to the multiple sample labels of each of the target training images, a positive-label loss in the cross-entropy loss function being assigned a weight greater than 1 so that the loss of the positive label is greater than the loss of the negative label;
    a training unit, configured to converge the predicted labels and sample labels of each of the target training images according to the cross-entropy loss function, and to update the parameters of the model to obtain the trained model.
  13. A network device, comprising a processor;
    and a memory connected to the processor; the memory stores machine-readable instructions that can be executed by the processor to complete the method according to any one of claims 1 to 11.
  14. A storage medium storing multiple instructions, the instructions being suitable for loading by a processor to perform the method according to any one of claims 1 to 11.
PCT/CN2019/110361 2018-10-10 2019-10-10 Training method and apparatus for a model used in image recognition, network device, and storage medium WO2020073951A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/083,180 US20210042580A1 (en) 2018-10-10 2020-10-28 Model training method and apparatus for image recognition, network device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811180282.2 2018-10-10
CN201811180282.2A CN110163234B (zh) 2018-10-10 2018-10-10 Model training method, apparatus, and storage medium


Publications (1)

Publication Number Publication Date
WO2020073951A1 true WO2020073951A1 (zh) 2020-04-16

Family

ID=67645007

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/110361 WO2020073951A1 (zh) 2018-10-10 2019-10-10 Training method and apparatus for a model used in image recognition, network device, and storage medium

Country Status (3)

Country Link
US (1) US20210042580A1 (zh)
CN (1) CN110163234B (zh)
WO (1) WO2020073951A1 (zh)


Families Citing this family (102)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163234B (zh) 2018-10-10 2023-04-18 腾讯科技(深圳)有限公司 Model training method, apparatus, and storage medium
US11074456B2 (en) * 2018-11-14 2021-07-27 Disney Enterprises, Inc. Guided training for automation of content annotation
CN110619283B (zh) * 2019-08-26 2023-01-10 海南撰云空间信息技术有限公司 一种无人机正射影像道路自动提取方法
CN110516098A (zh) * 2019-08-26 2019-11-29 苏州大学 基于卷积神经网络及二进制编码特征的图像标注方法
CN110598603A (zh) * 2019-09-02 2019-12-20 深圳力维智联技术有限公司 人脸识别模型获取方法、装置、设备和介质
CN110598620B (zh) * 2019-09-06 2022-05-06 腾讯科技(深圳)有限公司 基于深度神经网络模型的推荐方法和装置
CN112529146B (zh) * 2019-09-18 2023-10-17 华为技术有限公司 神经网络模型训练的方法和装置
CN110599492B (zh) * 2019-09-19 2024-02-06 腾讯科技(深圳)有限公司 图像分割模型的训练方法、装置、电子设备及存储介质
CN110738263B (zh) * 2019-10-17 2020-12-29 腾讯科技(深圳)有限公司 一种图像识别模型训练的方法、图像识别的方法及装置
CN110879917A (zh) * 2019-11-08 2020-03-13 北京交通大学 一种基于迁移学习的电力系统暂态稳定自适应评估方法
CN110929847A (zh) * 2019-11-15 2020-03-27 国网浙江省电力有限公司电力科学研究院 一种基于深度卷积神经网络的换流变压器故障诊断方法
CN112825143A (zh) * 2019-11-20 2021-05-21 北京眼神智能科技有限公司 深度卷积神经网络压缩方法、装置、存储介质及设备
CN110889450B (zh) * 2019-11-27 2023-08-11 腾讯科技(深圳)有限公司 超参数调优、模型构建方法和装置
CN111260665B (zh) * 2020-01-17 2022-01-21 北京达佳互联信息技术有限公司 图像分割模型训练方法和装置
CN111460909A (zh) * 2020-03-09 2020-07-28 兰剑智能科技股份有限公司 基于视觉的货位管理方法和装置
CN111476285B (zh) * 2020-04-01 2023-07-28 深圳力维智联技术有限公司 一种图像分类模型的训练方法及图像分类方法、存储介质
CN111488985B (zh) * 2020-04-08 2023-11-14 华南理工大学 深度神经网络模型压缩训练方法、装置、设备、介质
CN111597887B (zh) * 2020-04-08 2023-02-03 北京大学 一种行人再识别方法及系统
CN111507419B (zh) * 2020-04-22 2022-09-30 腾讯科技(深圳)有限公司 图像分类模型的训练方法及装置
CN111523596B (zh) * 2020-04-23 2023-07-04 北京百度网讯科技有限公司 目标识别模型训练方法、装置、设备以及存储介质
CN113642592A (zh) * 2020-04-27 2021-11-12 武汉Tcl集团工业研究院有限公司 一种训练模型的训练方法、场景识别方法、计算机设备
CN111582342B (zh) * 2020-04-29 2022-08-26 腾讯科技(深圳)有限公司 一种图像识别方法、装置、设备以及可读存储介质
CN111639755B (zh) * 2020-06-07 2023-04-25 电子科技大学中山学院 一种网络模型训练方法、装置、电子设备及存储介质
CN111798414A (zh) * 2020-06-12 2020-10-20 北京阅视智能技术有限责任公司 显微图像的清晰度确定方法、装置、设备及存储介质
CN111696101A (zh) * 2020-06-18 2020-09-22 中国农业大学 一种基于SE-Inception的轻量级茄科病害识别方法
CN111738436B (zh) * 2020-06-28 2023-07-18 电子科技大学中山学院 一种模型蒸馏方法、装置、电子设备及存储介质
CN111898619A (zh) * 2020-07-13 2020-11-06 上海眼控科技股份有限公司 图片特征提取方法、装置、计算机设备和可读存储介质
CN111814736B (zh) * 2020-07-23 2023-12-29 上海东普信息科技有限公司 快递面单信息的识别方法、装置、设备及存储介质
US11272097B2 (en) * 2020-07-30 2022-03-08 Steven Brian Demers Aesthetic learning methods and apparatus for automating image capture device controls
CN112016622A (zh) * 2020-08-28 2020-12-01 中移(杭州)信息技术有限公司 模型训练的方法、电子设备和计算机可读存储介质
CN112070027B (zh) * 2020-09-09 2022-08-26 腾讯科技(深圳)有限公司 网络训练、动作识别方法、装置、设备及存储介质
CN112149738B (zh) * 2020-09-24 2021-04-27 北京建筑大学 一种用于改善图像识别模型领域变换现象的方法
CN112488173B (zh) * 2020-11-26 2022-09-27 华南师范大学 基于图像増广的模型训练方法、系统和存储介质
CN112257855B (zh) * 2020-11-26 2022-08-16 Oppo(重庆)智能科技有限公司 一种神经网络的训练方法及装置、电子设备及存储介质
CN112396123A (zh) * 2020-11-30 2021-02-23 上海交通大学 基于卷积神经网络的图像识别方法、系统、终端和介质
CN112508078A (zh) * 2020-12-02 2021-03-16 携程旅游信息技术(上海)有限公司 图像多任务多标签识别方法、系统、设备及介质
CN112614571B (zh) * 2020-12-24 2023-08-18 中国科学院深圳先进技术研究院 神经网络模型的训练方法、装置、图像分类方法和介质
CN112734035B (zh) * 2020-12-31 2023-10-27 成都佳华物链云科技有限公司 一种数据处理方法及装置、可读存储介质
CN112699945B (zh) * 2020-12-31 2023-10-27 青岛海尔科技有限公司 数据标注方法及装置、存储介质及电子装置
CN112699962A (zh) * 2021-01-13 2021-04-23 福州大学 一种在边缘节点上部署二值化分类网络的方法
CN112784984A (zh) * 2021-01-29 2021-05-11 联想(北京)有限公司 一种模型训练方法及装置
CN112949688A (zh) * 2021-02-01 2021-06-11 哈尔滨市科佳通用机电股份有限公司 一种动车组底板胶皮破损故障检测方法、系统及装置
CN113159275A (zh) * 2021-03-05 2021-07-23 深圳市商汤科技有限公司 网络训练方法、图像处理方法、装置、设备及存储介质
CN112948897B (zh) * 2021-03-15 2022-08-26 东北农业大学 一种基于drae与svm相结合的网页防篡改检测方法
CN112668675B (zh) * 2021-03-22 2021-06-22 腾讯科技(深圳)有限公司 一种图像处理方法、装置、计算机设备及存储介质
CN113033660B (zh) * 2021-03-24 2022-08-02 支付宝(杭州)信息技术有限公司 一种通用小语种检测方法、装置以及设备
CN113762502B (zh) * 2021-04-22 2023-09-19 腾讯科技(深圳)有限公司 神经网络模型的训练方法及装置
CN113221964B (zh) * 2021-04-22 2022-06-24 华南师范大学 单样本图像分类方法、系统、计算机设备及存储介质
CN113240081B (zh) * 2021-05-06 2022-03-22 西安电子科技大学 针对雷达载频变换的高分辨距离像目标稳健识别方法
CN113128472B (zh) * 2021-05-17 2022-09-20 北京邮电大学 一种基于智能协同学习的多标签标注方法
CN113139076B (zh) * 2021-05-20 2024-03-29 广东工业大学 一种深度特征学习多标签的神经网络影像自动标记方法
CN113298135B (zh) * 2021-05-21 2023-04-18 小视科技(江苏)股份有限公司 基于深度学习的模型训练方法、装置、存储介质及设备
CN113240027A (zh) * 2021-05-24 2021-08-10 北京有竹居网络技术有限公司 图像分类方法、装置、可读介质及电子设备
CN113222050B (zh) * 2021-05-26 2024-05-03 北京有竹居网络技术有限公司 图像分类方法、装置、可读介质及电子设备
CN113094758B (zh) * 2021-06-08 2021-08-13 华中科技大学 一种基于梯度扰动的联邦学习数据隐私保护方法及系统
CN113506245A (zh) * 2021-06-11 2021-10-15 国网浙江省电力有限公司嘉兴供电公司 一种基于深度残差网络的高压断路器机械性能评估方法
CN113221855B (zh) * 2021-06-11 2023-04-07 中国人民解放军陆军炮兵防空兵学院 基于尺度敏感损失与特征融合的小目标检测方法和系统
CN113111872B (zh) * 2021-06-16 2022-04-05 智道网联科技(北京)有限公司 图像识别模型的训练方法、装置及电子设备、存储介质
CN113111979B (zh) * 2021-06-16 2021-09-07 上海齐感电子信息科技有限公司 模型训练方法、图像检测方法及检测装置
CN113255701B (zh) * 2021-06-24 2021-10-22 军事科学院系统工程研究院网络信息研究所 一种基于绝对-相对学习架构的小样本学习方法和系统
CN113378971A (zh) * 2021-06-28 2021-09-10 燕山大学 近红外光谱的分类模型训练方法、系统及分类方法、系统
CN113469025A (zh) * 2021-06-29 2021-10-01 阿波罗智联(北京)科技有限公司 应用于车路协同的目标检测方法、装置、路侧设备和车辆
CN113255601B (zh) * 2021-06-29 2021-11-12 深圳市安软科技股份有限公司 一种车辆重识别模型的训练方法、系统及相关设备
CN113569913B (zh) * 2021-06-29 2023-04-25 西北大学 基于分层选择性Adaboost-DNNs的图像分类模型建立、分类方法及系统
CN113409194B (zh) * 2021-06-30 2024-03-22 上海汽车集团股份有限公司 泊车信息获取方法及装置、泊车方法及装置
CN113627073B (zh) * 2021-07-01 2023-09-19 武汉大学 一种基于改进的Unet++网络的水下航行器流场结果预测方法
CN113554078B (zh) * 2021-07-13 2023-10-17 浙江大学 一种基于对比类别集中提升连续学习下图分类精度的方法
CN113724197B (zh) * 2021-07-26 2023-09-15 南京邮电大学 基于元学习的螺纹旋合性判定方法
CN113537630A (zh) * 2021-08-04 2021-10-22 支付宝(杭州)信息技术有限公司 业务预测模型的训练方法及装置
CN113673591B (zh) * 2021-08-13 2023-12-01 上海交通大学 一种自调整采样优化的图像分类方法、设备及介质
CN113807194B (zh) * 2021-08-24 2023-10-10 哈尔滨工程大学 一种增强性电力传输线故障图像识别方法
CN113421192B (zh) * 2021-08-24 2021-11-19 北京金山云网络技术有限公司 对象统计模型的训练方法、目标对象的统计方法和装置
CN114332578A (zh) * 2021-09-15 2022-04-12 广州腾讯科技有限公司 图像异常检测模型训练方法、图像异常检测方法和装置
CN114283290B (zh) * 2021-09-27 2024-05-03 腾讯科技(深圳)有限公司 图像处理模型的训练、图像处理方法、装置、设备及介质
CN113723378B (zh) * 2021-11-02 2022-02-08 腾讯科技(深圳)有限公司 一种模型训练的方法、装置、计算机设备和存储介质
CN114445767B (zh) * 2021-11-08 2024-02-20 山东科技大学 一种传输带传输异物检测方法及系统
CN113869333B (zh) * 2021-11-29 2022-03-25 山东力聚机器人科技股份有限公司 基于半监督关系度量网络的图像识别方法及装置
WO2023108317A1 (zh) * 2021-12-13 2023-06-22 中国科学院深圳先进技术研究院 基于信息熵最大化正则机制的深度多标签分类网络鲁棒训练方法
CN114219026A (zh) * 2021-12-15 2022-03-22 中兴通讯股份有限公司 数据处理方法及其装置、计算机可读存储介质
CN114722189B (zh) * 2021-12-15 2023-06-23 南京审计大学 一种预算执行审计中多标记不平衡文本分类方法
CN114139656B (zh) * 2022-01-27 2022-04-26 成都橙视传媒科技股份公司 一种基于深度卷积分析的图片归类方法及播控平台
WO2023159527A1 (zh) * 2022-02-25 2023-08-31 京东方科技集团股份有限公司 检测器训练方法、装置及存储介质
CN114565827A (zh) * 2022-04-29 2022-05-31 深圳爱莫科技有限公司 基于图像识别的卷烟陈列防作弊检测方法及模型训练方法
CN114841970B (zh) * 2022-05-09 2023-07-18 抖音视界有限公司 检查图像的识别方法、装置、可读介质和电子设备
CN114743081B (zh) * 2022-05-10 2023-06-20 北京瑞莱智慧科技有限公司 模型训练方法、相关装置及存储介质
CN115082955B (zh) * 2022-05-12 2024-04-16 华南理工大学 一种深度学习全局优化方法、识别方法、装置及介质
CN115170795B (zh) * 2022-05-13 2023-03-21 深圳大学 一种图像小目标分割方法、装置、终端及存储介质
CN114998960B (zh) * 2022-05-28 2024-03-26 华南理工大学 一种基于正负样本对比学习的表情识别方法
CN114937288B (zh) * 2022-06-21 2023-05-26 四川大学 一种非典型类数据集平衡方法、装置、介质
CN115082736A (zh) * 2022-06-23 2022-09-20 平安普惠企业管理有限公司 垃圾识别分类方法、装置、电子设备及存储介质
CN115294644A (zh) * 2022-06-24 2022-11-04 北京昭衍新药研究中心股份有限公司 一种基于3d卷积参数重构的快速猴子行为识别方法
CN114863224B (zh) * 2022-07-05 2022-10-11 深圳比特微电子科技有限公司 训练方法、图像质量检测方法、装置和介质
CN115292728B (zh) * 2022-07-15 2023-08-04 浙江大学 一种基于生成对抗网络的图像数据隐私保护方法
CN115310547B (zh) * 2022-08-12 2023-11-17 中国电信股份有限公司 模型训练方法、物品识别方法及装置、电子设备、介质
CN115511012B (zh) * 2022-11-22 2023-04-07 南京码极客科技有限公司 一种最大熵约束的类别软标签识别训练方法
CN115546614B (zh) * 2022-12-02 2023-04-18 天津城建大学 一种基于改进yolov5模型的安全帽佩戴检测方法
CN115841596B (zh) * 2022-12-16 2023-09-15 华院计算技术(上海)股份有限公司 多标签图像分类方法及其模型的训练方法、装置
CN116070120B (zh) * 2023-04-06 2023-06-27 湖南归途信息科技有限公司 一种多标签时序电生理信号的自动识别方法及系统
CN116188916B (zh) * 2023-04-17 2023-07-28 杰创智能科技股份有限公司 细粒度图像识别方法、装置、设备及存储介质
CN116129375B (zh) * 2023-04-18 2023-07-21 华中科技大学 一种基于多曝光生成融合的弱光车辆检测方法
CN116167922B (zh) * 2023-04-24 2023-07-18 广州趣丸网络科技有限公司 一种抠图方法、装置、存储介质及计算机设备
CN116824306B (zh) * 2023-08-28 2023-11-17 天津大学 基于多模态元数据的笔石化石图像识别模型的训练方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160321522A1 (en) * 2015-04-30 2016-11-03 Canon Kabushiki Kaisha Devices, systems, and methods for pairwise multi-task feature learning
US10282462B2 (en) * 2016-10-31 2019-05-07 Walmart Apollo, Llc Systems, method, and non-transitory computer-readable storage media for multi-modal product classification
CN109284749A (zh) * 2017-07-19 2019-01-29 微软技术许可有限责任公司 精细化图像识别
CN107480725A (zh) * 2017-08-23 2017-12-15 京东方科技集团股份有限公司 基于深度学习的图像识别方法、装置和计算机设备
GB2566067B (en) * 2017-09-01 2020-02-19 Intelligent Play Ltd Maintenance of playing surfaces
CN107506740B (zh) * 2017-09-04 2020-03-17 北京航空航天大学 一种基于三维卷积神经网络和迁移学习模型的人体行为识别方法
CN108416384B (zh) * 2018-03-05 2021-11-05 苏州大学 一种图像标签标注方法、系统、设备及可读存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN107818314A (zh) * 2017-11-22 2018-03-20 北京达佳互联信息技术有限公司 脸部图像处理方法、装置及服务器
CN108416318A (zh) * 2018-03-22 2018-08-17 电子科技大学 基于数据增强的合成孔径雷达图像目标深度模型识别方法
CN110163234A (zh) * 2018-10-10 2019-08-23 腾讯科技(深圳)有限公司 Model training method, apparatus, and storage medium

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523483A (zh) * 2020-04-24 2020-08-11 北京邮电大学 中餐菜品图像识别方法及装置
CN111523483B (zh) * 2020-04-24 2023-10-03 北京邮电大学 中餐菜品图像识别方法及装置
CN113724128A (zh) * 2020-05-25 2021-11-30 Tcl科技集团股份有限公司 一种训练样本的扩充方法
CN113724128B (zh) * 2020-05-25 2024-03-15 Tcl科技集团股份有限公司 一种训练样本的扩充方法
CN111461265B (zh) * 2020-05-27 2023-07-25 东北大学 基于粗-细粒度多图多标签学习的场景图像标注方法
CN111461265A (zh) * 2020-05-27 2020-07-28 东北大学 基于粗-细粒度多图多标签学习的场景图像标注方法
CN111931900A (zh) * 2020-05-29 2020-11-13 西安电子科技大学 基于残差网络与多尺度特征融合的gis放电波形检测方法
CN111931900B (zh) * 2020-05-29 2023-09-19 西安电子科技大学 基于残差网络与多尺度特征融合的gis放电波形检测方法
CN111860573A (zh) * 2020-06-04 2020-10-30 北京迈格威科技有限公司 模型训练方法、图像类别检测方法、装置和电子设备
CN111860573B (zh) * 2020-06-04 2024-05-10 北京迈格威科技有限公司 模型训练方法、图像类别检测方法、装置和电子设备
CN111768005B (zh) * 2020-06-19 2024-02-20 北京康夫子健康技术有限公司 轻量级检测模型的训练方法、装置、电子设备及存储介质
CN111768005A (zh) * 2020-06-19 2020-10-13 北京百度网讯科技有限公司 轻量级检测模型的训练方法、装置、电子设备及存储介质
CN111860278A (zh) * 2020-07-14 2020-10-30 陕西理工大学 一种基于深度学习的人体行为识别算法
CN111860278B (zh) * 2020-07-14 2024-05-14 陕西理工大学 一种基于深度学习的人体行为识别算法
CN111950647A (zh) * 2020-08-20 2020-11-17 连尚(新昌)网络科技有限公司 分类模型训练方法和设备
TWI785739B (zh) * 2020-08-21 2022-12-01 大陸商北京市商湯科技開發有限公司 目標模型的獲取方法、電子設備與儲存媒體
CN112131961B (zh) * 2020-08-28 2023-02-03 中国海洋大学 一种基于单样本的半监督行人重识别方法
CN112131961A (zh) * 2020-08-28 2020-12-25 中国海洋大学 一种基于单样本的半监督行人重识别方法
CN112434729B (zh) * 2020-11-09 2023-09-19 西安交通大学 一种类不平衡样本下基于层再生网络的故障智能诊断方法
CN112434729A (zh) * 2020-11-09 2021-03-02 西安交通大学 一种类不平衡样本下基于层再生网络的故障智能诊断方法
CN114624768A (zh) * 2020-12-14 2022-06-14 中国石油化工股份有限公司 训练地震初至拾取模型的方法及装置
CN113033436B (zh) * 2021-03-29 2024-04-16 京东鲲鹏(江苏)科技有限公司 障碍物识别模型训练方法及装置、电子设备、存储介质
CN113033436A (zh) * 2021-03-29 2021-06-25 京东鲲鹏(江苏)科技有限公司 障碍物识别模型训练方法及装置、电子设备、存储介质
CN113065066B (zh) * 2021-03-31 2024-05-07 北京达佳互联信息技术有限公司 预测方法、装置、服务器及存储介质
CN113065066A (zh) * 2021-03-31 2021-07-02 北京达佳互联信息技术有限公司 预测方法、装置、服务器及存储介质
CN113192108B (zh) * 2021-05-19 2024-04-02 西安交通大学 一种针对视觉跟踪模型的人在回路训练方法及相关装置
CN113192108A (zh) * 2021-05-19 2021-07-30 西安交通大学 一种针对视觉跟踪模型的人在回路训练方法及相关装置
CN113449613B (zh) * 2021-06-15 2024-02-27 北京华创智芯科技有限公司 多任务长尾分布图像识别方法、系统、电子设备及介质
CN113449613A (zh) * 2021-06-15 2021-09-28 北京华创智芯科技有限公司 多任务长尾分布图像识别方法、系统、电子设备及介质
CN113436184A (zh) * 2021-07-15 2021-09-24 南瑞集团有限公司 基于改进孪生网络的电力设备图像缺陷判别方法及系统
CN113658109A (zh) * 2021-07-22 2021-11-16 西南财经大学 一种基于领域损失预测主动学习的玻璃缺陷检测方法
CN113887561B (zh) * 2021-09-03 2022-08-09 广东履安实业有限公司 一种基于数据分析的人脸识别方法、设备、介质、产品
CN113887561A (zh) * 2021-09-03 2022-01-04 广东履安实业有限公司 一种基于数据分析的人脸识别方法、设备、介质、产品
CN114118339B (zh) * 2021-11-12 2024-05-14 吉林大学 基于布谷鸟算法改进ResNet的无线电调制信号识别分类方法
CN114118339A (zh) * 2021-11-12 2022-03-01 吉林大学 基于布谷鸟算法改进ResNet的无线电调制信号识别分类方法
CN114266012A (zh) * 2021-12-21 2022-04-01 浙江大学 基于WiFi的非接触式博物馆多区域观众计数方法
CN114743043B (zh) * 2022-03-15 2024-04-26 北京迈格威科技有限公司 一种图像分类方法、电子设备、存储介质及程序产品
CN114743043A (zh) * 2022-03-15 2022-07-12 北京迈格威科技有限公司 一种图像分类方法、电子设备、存储介质及程序产品
CN114580484A (zh) * 2022-04-28 2022-06-03 西安电子科技大学 一种基于增量学习的小样本通信信号自动调制识别方法
CN115240078B (zh) * 2022-06-24 2024-05-07 安徽大学 一种基于轻量化元学习的sar图像小样本目标检测方法
CN115240078A (zh) * 2022-06-24 2022-10-25 安徽大学 一种基于轻量化元学习的sar图像小样本目标检测方法
CN116051486A (zh) * 2022-12-29 2023-05-02 抖音视界有限公司 内窥镜图像识别模型的训练方法、图像识别方法及装置
CN116612338B (zh) * 2023-07-21 2023-09-29 华中科技大学 基于网络状态索引卷积神经网络集的图像识别方法及系统
CN116612338A (zh) * 2023-07-21 2023-08-18 华中科技大学 基于网络状态索引卷积神经网络集的图像识别方法及系统
CN117173548B (zh) * 2023-08-10 2024-04-02 中国自然资源航空物探遥感中心 一种海底地貌智能分类模型构建方法、装置及分类方法
CN117173548A (zh) * 2023-08-10 2023-12-05 中国地质大学(武汉) 一种海底地貌智能分类模型构建方法、装置及分类方法
CN117152507B (zh) * 2023-08-25 2024-05-17 中山大学附属口腔医院 一种牙齿健康状态检测方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN110163234A (zh) 2019-08-23
CN110163234B (zh) 2023-04-18
US20210042580A1 (en) 2021-02-11

Similar Documents

Publication Publication Date Title
WO2020073951A1 (zh) Model training method and apparatus for image recognition, network device, and storage medium
WO2020244261A1 (zh) 高分辨率遥感图像的场景识别系统及模型生成方法
Wu et al. Object detection based on RGC mask R-CNN
WO2021078027A1 (zh) 构建网络结构优化器的方法、装置及计算机可读存储介质
CN112131978B (zh) 一种视频分类方法、装置、电子设备和存储介质
WO2021022521A1 (zh) 数据处理的方法、训练神经网络模型的方法及设备
WO2021051987A1 (zh) 神经网络模型训练的方法和装置
CN107766929B (zh) 模型分析方法及装置
WO2022042123A1 (zh) 图像识别模型生成方法、装置、计算机设备和存储介质
Zhou et al. A method of improved CNN traffic classification
CN111008640A (zh) 图像识别模型训练及图像识别方法、装置、终端及介质
CN112926584B (zh) 裂缝检测方法、装置、计算机设备及存储介质
WO2023179099A1 (zh) 一种图像检测方法、装置、设备及可读存储介质
CN113807399A (zh) 一种神经网络训练方法、检测方法以及装置
CN110135505A (zh) 图像分类方法、装置、计算机设备及计算机可读存储介质
CN112819063B (zh) 一种基于改进的Focal损失函数的图像识别方法
CN111340213B (zh) 神经网络的训练方法、电子设备、存储介质
WO2020135054A1 (zh) 视频推荐方法、装置、设备及存储介质
CN111522926A (zh) 文本匹配方法、装置、服务器和存储介质
Wang et al. Soft focal loss: Evaluating sample quality for dense object detection
CN114611692A (zh) 模型训练方法、电子设备以及存储介质
CN111104831B (zh) 一种视觉追踪方法、装置、计算机设备以及介质
CN109978058A (zh) 确定图像分类的方法、装置、终端及存储介质
CN111860601B (zh) 预测大型真菌种类的方法及装置
CN111783688B (zh) 一种基于卷积神经网络的遥感图像场景分类方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19871311; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19871311; Country of ref document: EP; Kind code of ref document: A1)