WO2022077646A1 - Method and device for training a student model for image processing - Google Patents
- Publication number: WO2022077646A1 (application PCT/CN2020/126837)
- Authority: WO, WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
Definitions
- The present application relates to the technical field of knowledge distillation and, in particular, to a method and device for training a student model for image processing.
- The main purpose of knowledge distillation is to transfer the knowledge learned by a complex model to a lightweight model, so that the lightweight model achieves performance similar to the complex model despite having far fewer parameters.
- Complex models are often referred to as teacher models, and lightweight models are often referred to as student models.
- Embodiments of the present application provide a method and device for training a student model for image processing, to address the problem in the related art that a student model trained with a teacher model performs relatively poorly at search.
- An embodiment of the present application provides a method for training a student model for image processing, including:
- acquiring parameters of a classification layer in a teacher model, where the teacher model is obtained by training it to classify target objects in multiple image samples, and the teacher model includes a convolution layer, a classification layer and a normalization layer connected in sequence;
- initializing parameters of a classification layer in a student model to be trained with the parameters of the classification layer in the teacher model, where the student model includes a convolution layer, a classification layer and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function;
- inputting at least part of the image samples into the student model to classify the target objects in those image samples; and
- adjusting, according to the classification loss value of the student model for the target object in each image sample, the parameters of the target layer located before the classification layer in the student model, so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model, and stopping training the student model once its classification error is determined to be less than a set error.
- Initializing the parameters of the classification layer in the student model to be trained with the parameters of the classification layer in the teacher model includes:
- using the parameters of the classification layer in the teacher model as the parameters of the classification layer in the student model to be trained.
- The classification loss value of the student model for the target object in each image sample is calculated as follows:
- for each image sample, calculating the angle between the feature vector of the target object input to the classification layer of the student model and each classification weight vector of that classification layer, and calculating the classification loss value of the student model for the target object according to a target angle and these angles, where the target angle is the angle between the feature vector of the target object in the image sample and the target classification weight vector, and the target classification weight vector is the classification weight vector corresponding to the labeled category of the target object in the image sample.
- The classification loss value loss_i of the student model for the target object in the i-th image sample is calculated according to the following formula:
- where y_i denotes the labeled category of the target object in the i-th image sample;
- θ_j denotes the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model;
- n denotes the total number of categories of target objects; s and m are preset constants; and i and j are integers.
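The formula itself is not reproduced in this text. One loss that uses exactly the symbols defined above (a margin m applied to the target angle θ_{y_i}, a scale s, and a sum over every j ≠ y_i) is the additive angular margin, or ArcFace-style, loss. It is shown here only as an assumption about the likely form; the filed formula may differ, for example by applying the margin as cos θ_{y_i} − m instead:

$$
\mathrm{loss}_i = -\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n} e^{s\cos\theta_j}}
$$

Under this assumed form, the loss is small only when the target angle θ_{y_i}, even after being enlarged by m, is clearly smaller than every other angle θ_j; that is, when the feature vector lies close to the feature center of its labeled category.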
- If both the teacher model and the student model further include a dimensionality reduction layer, located between the convolution layer and the classification layer, for performing dimensionality reduction, adjusting the parameters of the target layer located before the classification layer in the student model according to the classification loss value of the student model for the target object in each image sample includes:
- adjusting the parameters of the convolution layer and the dimensionality reduction layer in the student model.
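- An object search method, including:
- acquiring an image to be processed;
- extracting features of the target object in the image to be processed using the target layer located before the classification layer in the student model, where the student model is trained using any of the above training methods for a student model for image processing;
- comparing the extracted image features of the target object in the image to be processed with the image features of each candidate object, where the image features of each candidate object are extracted by the target layer located before the classification layer in the teacher model, and the teacher model is the model used to train the student model; and
- determining, according to the comparison result, the search result of the target object in the image to be processed among the candidate objects.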
- An embodiment of the present application provides a training device for a student model for image processing, including:
- an acquisition module, configured to acquire the parameters of the classification layer in the teacher model, where the teacher model is obtained by training it to classify the target objects in multiple image samples, and the teacher model includes a convolution layer, a classification layer and a normalization layer connected in sequence;
- an initialization module, configured to initialize the parameters of the classification layer in the student model to be trained with the parameters of the classification layer in the teacher model, where the student model includes a convolution layer, a classification layer and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function;
- an input module, configured to input at least part of the image samples into the student model to classify the target objects in those image samples; and
- an adjustment module, configured to adjust, according to the classification loss value of the student model for the target object in each image sample, the parameters of the target layer located before the classification layer in the student model, so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model, and to stop training the student model once its classification error is determined to be less than the set error.
- The initialization module is specifically configured to:
- use the parameters of the classification layer in the teacher model as the parameters of the classification layer in the student model to be trained.
- The adjustment module calculates the classification loss value of the student model for the target object in each image sample as follows:
- for each image sample, calculating the angle between the feature vector of the target object input to the classification layer of the student model and each classification weight vector of that layer, and calculating the classification loss value according to the target angle and these angles, where the target angle is the angle between the feature vector of the target object in the image sample and the target classification weight vector, and the target classification weight vector is the classification weight vector corresponding to the labeled category of the target object in the image sample.
- The adjustment module calculates the classification loss value loss_i of the student model for the target object in the i-th image sample according to the following formula:
- where y_i denotes the labeled category of the target object in the i-th image sample;
- θ_j denotes the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model;
- n denotes the total number of categories of target objects; s and m are preset constants; and i and j are integers.
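- If both the teacher model and the student model further include a dimensionality reduction layer located between the convolution layer and the classification layer, the adjustment module is specifically configured to:
- adjust the parameters of the convolution layer and the dimensionality reduction layer in the student model.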
- An object search device, including:
- an acquisition module, configured to acquire the image to be processed;
- a feature extraction module, configured to extract features of the target object in the image to be processed using the target layer located before the classification layer in the student model, where the student model is trained using any of the above training methods for a student model for image processing;
- a comparison module, configured to compare the extracted image features of the target object in the image to be processed with the image features of each candidate object, where the image features of each candidate object are extracted by the target layer located before the classification layer in the teacher model, and the teacher model is the model used to train the student model; and
- a determining module, configured to determine, according to the comparison result, the search result of the target object in the image to be processed among the candidate objects.
- An embodiment of the present application provides an electronic device, including at least one processor and a memory communicatively connected to the at least one processor, where:
- the memory stores instructions executable by the at least one processor, and the instructions enable the at least one processor to perform the above training method for a student model for image processing.
- An embodiment of the present application provides a storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device can execute the above training method for a student model for image processing.
- In the embodiments of the present application, the target objects in multiple image samples are classified and trained in advance to obtain a teacher model;
- the parameters of the classification layer in the teacher model are obtained;
- the parameters of the classification layer in the teacher model are used to initialize the parameters of the classification layer in the student model to be trained;
- at least part of the image samples are input into the student model to classify the target objects in them, and, according to the classification loss value of the student model for the target object in each image sample, the parameters of the target layer located before the classification layer in the student model are adjusted so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model, until the classification error of the student model is determined to be less than the set error.
- FIG. 1 is a flowchart of a method for training a student model for image processing provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of a training process of a student model for image processing provided by an embodiment of the present application.
- FIG. 3 is a flowchart of an object search method provided by an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of a training device for a student model for image processing provided by an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of an object search device provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram of the hardware structure of an electronic device for implementing the training method for a student model for image processing and/or the object search method provided by an embodiment of the present application.
- The embodiments of the present application provide a training method and apparatus for a student model for image processing.
- When a student model is used for object search, the target layer located before the classification layer (including at least the convolution layer) is generally used to extract the image features of the target object in the image to be processed, and the extracted image features are compared with the image features of each candidate object to find, among the candidate objects, objects matching the target object in the image to be processed.
- The image features of the candidate objects are generally extracted by the teacher model; that is, the image features extracted by the student model must be compared with image features extracted by the teacher model.
- In the related art, however, the correlation between the image features extracted by the student model and those extracted by the teacher model is not considered when training the student model; that is, the image features of the target object that the student model extracts for search are not optimized. As a result, the image features of the same target object extracted by the student model and by the teacher model are not aligned in feature space (they remain relatively far apart), and the final student model searches poorly for the target object.
- To address this, an embodiment of the present application provides a training method for a student model for image processing.
- Target objects in multiple image samples are classified and trained in advance to obtain a teacher model; the parameters of the classification layer in the teacher model are obtained and used to initialize the parameters of the classification layer in the student model to be trained, so that the classification-layer parameters of the two models are linked. At least part of the image samples are then input into the student model to classify the target objects in them, and the parameters of the target layer located before the classification layer in the student model are adjusted with the goal of bringing the image features of each class of target object learned by the target layer in the student model close to the image features of that class learned by the target layer in the teacher model.
- Both the student model and the teacher model include a convolution layer, a classification layer and a normalization layer connected in sequence.
- The normalization layers of the student model and the teacher model use the same normalization function. In this way, the difference between the image features of each class of target object extracted by the student model and by the teacher model is reduced; that is, the features of each class of target object learned by the student model are spatially aligned with those learned by the teacher model, which improves the search performance of the final student model for the target object.
- FIG. 1 is a flowchart of a method for training a student model for image processing provided by an embodiment of the present application, comprising the following steps:
- S101 Obtain the parameters of the classification layer in the teacher model, where the teacher model is obtained by training it to classify target objects in multiple image samples, and the teacher model includes a convolution layer, a classification layer and a normalization layer connected in sequence.
- The target object may be, for example, a human face, a plant, or an animal.
- Taking human faces as an example, the face of one person forms one class. The number of face classes the teacher model needs to classify depends on the number of classes of face image samples obtained, and each class may contain multiple face image samples so that the teacher model can better learn the image features of each class of face.
- The teacher model includes a convolution layer, a classification layer and a normalization layer connected in sequence from front to back. The convolution layer extracts features of the target object in the image sample; the classification layer classifies the image features of the target object extracted by the convolution layer; and the normalization layer, such as a softmax layer, normalizes the output of the classification layer to obtain a probability distribution over the categories of the target object in the image sample.
- The category with the highest probability is the category to which the target object in the image sample belongs.
- S102 Use the parameters of the classification layer in the teacher model to initialize the parameters of the classification layer in the student model to be trained, where the student model includes a convolution layer, a classification layer and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function.
- The student model likewise includes at least a convolution layer, a classification layer and a normalization layer connected in sequence from front to back.
- Each layer in the student model serves the same function as the corresponding layer in the teacher model, but the convolution layer of the student model is much simpler than that of the teacher model.
- The parameters of the classification layer in the teacher model can be used directly as the parameters of the classification layer in the student model to be trained; that is, the parameter matrix of the classification layer in the teacher model is used directly as the parameter matrix of the classification layer in the student model.
- This preserves, to the greatest extent, the connection between the classification layers of the student model and the teacher model, so that the image features of each class of target object subsequently extracted by the student model and the teacher model can be spatially aligned.
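As a minimal NumPy sketch of this initialization step (the helper name `init_student_classifier` is hypothetical, and the 512×1000 size is borrowed from the FIG. 2 example), the student's classification-layer parameter matrix is simply a copy of the teacher's. In a deep-learning framework the same idea would amount to copying the corresponding layer's weight tensor.

```python
import numpy as np

def init_student_classifier(teacher_weight: np.ndarray) -> np.ndarray:
    """Return the student's classification-layer parameter matrix, initialized
    as an exact copy of the teacher's. A copy (not a shared reference) keeps
    the teacher's parameters untouched by anything later done to the student."""
    return np.array(teacher_weight, copy=True)

# Hypothetical sizes following the FIG. 2 example: 512-dim features, 1000 classes.
rng = np.random.default_rng(0)
teacher_w = rng.standard_normal((512, 1000))   # stand-in for trained teacher parameters
student_w = init_student_classifier(teacher_w)

print(np.array_equal(student_w, teacher_w))  # identical values
print(student_w is teacher_w)                # separate arrays
```

Copying rather than aliasing the matrix is a design choice of this sketch: it lets the student's classification layer be frozen or inspected independently of the teacher.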
- S103 Input at least part of the image samples into the student model to classify the target objects in those image samples.
- The at least part of the image samples may be all of the image samples or only some of them.
- When only some image samples are used, they can include images of all classes of target objects.
- S104 Adjust the parameters of the target layer located before the classification layer in the student model according to the classification loss value of the student model for the target object in each image sample, so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model; stop training the student model once its classification error is determined to be less than the set error.
- The classification loss value of the student model for the target object in each image sample represents the difference between the image features of that target object extracted by the student model and the feature center, learned by the teacher model, of the category to which the target object belongs.
- The image features of the target object input to the classification layer of the teacher model are feature vectors. Each column of the parameter matrix of the teacher model's classification layer can be regarded as a classification weight vector; each classification weight vector corresponds to one category of target object and describes the feature center of target objects of that category.
- Likewise, the image features of the target object input to the classification layer of the student model are feature vectors, each column of the parameter matrix of the student model's classification layer can be regarded as a classification weight vector, and each classification weight vector corresponds to one category of target object and describes the feature center of target objects of that category.
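Treating each column of the classification-layer parameter matrix as a classification weight vector, the angle between a feature vector and every weight vector can be sketched as below (NumPy; the function name and the toy 2-dimensional, 3-class matrix are illustrative assumptions). Both sides are normalized first, so only direction matters.

```python
import numpy as np

def angles_to_class_centers(feature: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Angle (radians) between one feature vector and every classification
    weight vector, i.e. every column of `weight` (shape: feature_dim x n)."""
    f = feature / np.linalg.norm(feature)                       # unit feature vector
    w = weight / np.linalg.norm(weight, axis=0, keepdims=True)  # unit columns
    cos = np.clip(f @ w, -1.0, 1.0)  # clip guards arccos against rounding error
    return np.arccos(cos)

# Toy example: 2-dim features, 3 classes whose feature centers are the columns.
weight = np.array([[1.0, 0.0, -1.0],
                   [0.0, 1.0,  0.0]])
feature = np.array([2.0, 0.0])
theta = angles_to_class_centers(feature, weight)
print(theta)  # approximately [0, pi/2, pi]
```

The smallest angle identifies the category whose feature center the sample is closest to; the angle to the labeled category's column is the target angle.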
- The angle between the feature vector of the target object in each image sample input to the classification layer of the student model and each classification weight vector of that classification layer can be calculated; this angle characterizes how close the target object in the image sample is to the category corresponding to the classification weight vector. Then, the classification loss value of the student model for the target object in the image sample is calculated according to the target angle and the angles between the feature vector of the target object and each classification weight vector, where the target angle is the angle between the feature vector of the target object in the image sample and the target classification weight vector, and the target classification weight vector is the classification weight vector corresponding to the labeled category of the target object in the image sample.
- In the loss formula, y_i denotes the labeled category of the target object in the i-th image sample;
- θ_j denotes the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model;
- n denotes the total number of categories of target objects;
- j in the formula ranges from 1 to n with j ≠ y_i, meaning that j takes every one of the n categories except the labeled category of the target object in the i-th image sample.
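The loss formula is not reproduced in this text. The symbols defined for it (a target angle θ_{y_i}, preset constants s and m, and a sum over j ≠ y_i) match the additive angular margin (ArcFace-style) loss, so the sketch below assumes that form: loss_i = −log(e^{s·cos(θ_{y_i}+m)} / (e^{s·cos(θ_{y_i}+m)} + Σ_{j≠y_i} e^{s·cos θ_j})). The function name and the default values s = 64, m = 0.5 are hypothetical; the patent only calls s and m preset constants.

```python
import numpy as np

def angular_margin_loss(theta: np.ndarray, label: int,
                        s: float = 64.0, m: float = 0.5) -> float:
    """Assumed ArcFace-style loss for one image sample.

    theta: angles (radians) between the sample's feature vector and each of the
           n classification weight vectors, shape (n,).
    label: index y_i of the sample's labeled category.
    s, m:  hypothetical values for the preset scale and margin constants.
    """
    target = s * np.cos(theta[label] + m)          # margin applied to the target angle only
    others = s * np.cos(np.delete(theta, label))   # every j in 1..n with j != y_i
    logits = np.concatenate(([target], others))
    peak = logits.max()                            # log-sum-exp shift for numerical stability
    log_denominator = peak + np.log(np.exp(logits - peak).sum())
    return float(log_denominator - target)         # = -log(softmax probability of the target)

# Labeled category 0, but the feature lies closer to category 1: large loss.
theta = np.array([1.0, 1.2, 1.5])
print(angular_margin_loss(theta, label=0))
```

Shrinking the target angle lowers the loss, while a larger margin m raises it for the same angles, which forces features to sit well inside their own category's region rather than just past the decision boundary.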
- During training, the parameters of the classification layer in the student model can be kept unchanged, and a gradient descent algorithm is used to adjust the parameters of the target layer located before the classification layer according to the classification loss value of the student model for the target object in each image sample; training stops once the classification error of the student model is determined to be less than the set error.
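A toy sketch of this training loop follows. It rests on several assumptions that are not from the source: the "target layer" is a single linear projection `proj`, the loss is the ArcFace-style form assumed above (with small hypothetical s and m), the gradient is estimated by central finite differences instead of back-propagation, and each step uses a backtracking step size so the loss never increases. The frozen matrix `weight` plays the role of the classification layer copied from the teacher; all names are hypothetical.

```python
import numpy as np

def cosines(feats, weight):
    """Cosine between every feature row and every classification weight column."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    w = weight / np.linalg.norm(weight, axis=0, keepdims=True)
    return np.clip(f @ w, -1.0, 1.0)

def loss_and_error(proj, x, labels, weight, s=8.0, m=0.2):
    """Mean angular-margin loss and classification error of the toy student."""
    cos = cosines(x @ proj, weight)                      # (N, n)
    rows = np.arange(len(labels))
    logits = s * cos
    target_angle = np.arccos(cos[rows, labels])
    logits[rows, labels] = s * np.cos(target_angle + m)  # margin on the target angle only
    peak = logits.max(axis=1, keepdims=True)
    lse = peak[:, 0] + np.log(np.exp(logits - peak).sum(axis=1))
    loss = float(np.mean(lse - logits[rows, labels]))
    error = float(np.mean(cos.argmax(axis=1) != labels))  # error measured without margin
    return loss, error

def train_target_layer(proj, x, labels, weight, lr=0.005, set_error=0.05,
                       max_steps=300, eps=1e-6):
    """Gradient descent on the target layer `proj` only; `weight` stays frozen."""
    proj = proj.copy()
    for _ in range(max_steps):
        loss, error = loss_and_error(proj, x, labels, weight)
        if error < set_error:                 # stop once error < set error
            break
        grad = np.zeros_like(proj)
        for idx in np.ndindex(*proj.shape):   # central finite differences
            bump = np.zeros_like(proj)
            bump[idx] = eps
            up, _ = loss_and_error(proj + bump, x, labels, weight)
            down, _ = loss_and_error(proj - bump, x, labels, weight)
            grad[idx] = (up - down) / (2 * eps)
        step = lr
        while step > 1e-8:                    # backtracking: accept only non-increasing steps
            candidate = proj - step * grad
            if loss_and_error(candidate, x, labels, weight)[0] <= loss:
                proj = candidate
                break
            step /= 2
        else:
            break                             # no step helped: treat as converged
    return proj

# Toy data: frozen 2-dim class centers (3 classes), 10 noisy samples per class.
rng = np.random.default_rng(0)
weight = np.array([[1.0, -0.5, -0.5],
                   [0.0, 0.87, -0.87]])
labels = np.repeat(np.arange(3), 10)
x = weight.T[labels] + 0.1 * rng.standard_normal((30, 2))
proj0 = rng.standard_normal((2, 2))           # hypothetical untrained target layer
proj = train_target_layer(proj0, x, labels, weight)
print(loss_and_error(proj0, x, labels, weight), loss_and_error(proj, x, labels, weight))
```

Because `weight` is never written, only the target layer moves, which mirrors the idea of pulling the student's features toward class centers that were fixed by the teacher.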
- Because the parameters of the classification layer in the teacher model are used to initialize those of the student model, the classification-layer parameters of the two models are linked, and the image features of each class of target object learned by the target layer in the student model are required to approach those learned by the target layer in the teacher model; that is, the image features of each class of target object finally obtained by the target layer in the student model are required to be spatially aligned, as far as possible, with the image features of that class obtained by the target layer in the teacher model.
- FIG. 2 is a schematic diagram of a training process of a student model for image processing provided by an embodiment of the present application, in which the teacher model and the student model each include a convolution layer, a classification layer and a softmax layer connected in sequence; that is, the target layer shown in FIG. 2 includes only the convolution layer.
- The convolution layer of the teacher model is more complex than that of the student model, but both output image features of the target object with the same dimension, the classification layers of the two models have the same dimensions, and the softmax layers of the teacher model and the student model use the same normalization function.
- The convolution layer in the teacher model outputs a 1×512-dimensional feature vector of the target object in the image sample;
- the parameters of the classification layer in the teacher model form a 512×1000 parameter matrix;
- the softmax layer in the teacher model outputs 1×1000 normalized probability data;
- the 1×1000 probability data represent the probability distribution of the target object in the image sample over 1000 categories, and the category with the highest probability is the category of the target object as determined by the teacher model.
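The shapes in this example can be checked with a short NumPy sketch of the classification layer followed by the softmax layer. It assumes a plain linear classification layer and uses random stand-in values; the function name `classify` is hypothetical, and the angle-based loss used during training normalizes the vectors first.

```python
import numpy as np

def classify(feature: np.ndarray, weight: np.ndarray):
    """Classification layer followed by a softmax layer for one sample.

    feature: (1, 512) feature vector output by the convolution layer
    weight:  (512, 1000) classification-layer parameter matrix
    Returns the (1, 1000) probability row and the most probable category.
    """
    logits = feature @ weight                              # (1, 1000)
    shifted = logits - logits.max(axis=1, keepdims=True)   # avoid overflow in exp
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return probs, int(probs.argmax())

rng = np.random.default_rng(0)
feature = rng.standard_normal((1, 512))     # stand-in for a convolution-layer output
weight = rng.standard_normal((512, 1000))   # stand-in for trained parameters
probs, category = classify(feature, weight)
print(probs.shape, category)
```

Softmax is monotonic, so the most probable category is also the one with the largest classification-layer output.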
- The teacher model can be obtained by training on the image samples and the labeled categories of the target objects in them. The 512×1000 parameter matrix used by the classification layer of the teacher model is then obtained and used as the 512×1000 parameter matrix of the classification layer in the student model to be trained, after which at least part of the image samples are input into the student model to classify the target objects in them.
- Although the convolution layer of the student model is simpler than that of the teacher model, it also extracts a 1×512-dimensional feature vector for the target object in each image sample.
- Because the parameters of the student model's classification layer are the same as those of the teacher model's classification layer, and the softmax layers of the student and teacher models use the same normalization function, training drives the feature vector of the target object in each image sample entering the student model's classification layer as close as possible to the feature vector of the same target object entering the teacher model's classification layer, so that the image features of each class of target object extracted by the student model and the teacher model are spatially aligned as far as possible.
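One simple way to quantify this "spatial alignment" (an illustrative assumption, not a metric named in the source) is the mean cosine similarity between the student's and the teacher's feature vectors for the same images. The NumPy sketch below uses a hypothetical helper `mean_alignment` and random stand-in features.

```python
import numpy as np

def mean_alignment(student_feats: np.ndarray, teacher_feats: np.ndarray) -> float:
    """Mean cosine similarity between student and teacher features of the same
    images; row i of both arrays must come from image i. 1.0 means same directions."""
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    return float(np.mean(np.sum(s * t, axis=1)))

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 512))                          # 4 images, 512-dim features
aligned = 3.0 * teacher + 0.01 * rng.standard_normal((4, 512))   # same directions, other scale
unrelated = rng.standard_normal((4, 512))                        # random directions
print(mean_alignment(aligned, teacher), mean_alignment(unrelated, teacher))
```

Cosine similarity ignores vector length, which suits features that are compared by angle; a score near 1 means the student's features point where the teacher's do, while unrelated 512-dimensional features score near 0.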
- The classification loss of the student model for the target objects in the image samples can be calculated according to the following formula, where:
- N denotes the number of image samples;
- y_i denotes the labeled category of the target object in the i-th image sample;
- θ_j denotes the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model;
- n denotes the total number of categories of target objects.
- Both the teacher model and the student model may further include a dimensionality reduction layer, located between the convolution layer and the classification layer, for performing dimensionality reduction.
- The dimensionality reduction layer compresses the image features of the target object.
- FIG. 3 is a flowchart of an object search method provided by an embodiment of the present application, comprising the following steps:
- S301 Acquire an image to be processed.
- S302 Use the target layer located before the classification layer in the student model to extract features of the target object in the image to be processed, where the student model is obtained using the training method for a student model for image processing provided by the embodiments of the present application.
- If the student model includes a convolution layer, a classification layer and a normalization layer connected in sequence, only the convolution layer of the student model is used to extract features of the target object in the image to be processed.
- If the student model includes a convolution layer, a dimensionality reduction layer, a classification layer and a normalization layer connected in sequence, the convolution layer and the dimensionality reduction layer of the student model are used to extract features of the target object in the image to be processed.
- S303 Compare the extracted image feature of the target object in the image to be processed with the image features of each candidate object, wherein the image features of each candidate object are extracted by using the target layer located before the classification layer in the teacher model, and the teacher model is the model used to train the student model.
- if the teacher model includes a convolution layer, a classification layer and a normalization layer that are connected in sequence, the image features of each candidate object are extracted only by using the convolution layer of the teacher model
- if the teacher model includes a convolution layer, a dimensionality reduction layer, a classification layer and a normalization layer that are connected in sequence, the image features of each candidate object are extracted by using the convolutional layer and the dimensionality reduction layer of the teacher model.
- S304 According to the comparison result, determine the search result of the target object in the image to be processed in each candidate object.
- the candidate object whose image feature best matches the image feature of the target object in the image to be processed may be used as the search result of that target object among the candidate objects.
- the parameters of the classification layer in the teacher model can be used directly as the parameters of the classification layer in the student model, teaching the student model how to align with the feature space of the face image features extracted by the teacher model; the scheme can also be combined with various loss functions to ensure the distillation effect, so it has broad application prospects and considerable practical value.
- the parameters of the classification layer in the teacher model are used to initialize the parameters of the classification layer in the student model to be trained, linking the classification-layer parameters of the two models, and the image features of each class of target object learned by the target layer in the student model are required to approach the image features of that class learned by the target layer in the teacher model; that is, the image features of the various target objects finally obtained by the target layer in the student model are required to be spatially aligned with those obtained by the target layer in the teacher model. This reduces the difference between the image features of the various target objects extracted by the student model and the teacher model, which better suits search tasks, where the image feature of the target object extracted by the student model serves as the search feature and is matched against the image features extracted by the teacher model. Therefore, the search performance of the final student model for the target object can be improved.
- the electronic device may include multiple functional modules, and each functional module may include software, hardware, or a combination thereof.
- FIG. 4 is a schematic structural diagram of an apparatus for training a student model for image processing provided by an embodiment of the present application, including an acquisition module 401, an initialization module 402, an input module 403, and an adjustment module 404.
- the acquisition module 401 is used to acquire the parameters of the classification layer in the teacher model, the teacher model is obtained by classifying and training the target objects in the multiple image samples, and the teacher model includes the convolution layer, the classification layer and the normalization layer;
- the initialization module 402 is used to initialize the parameters of the classification layer in the student model to be trained with the parameters of the classification layer in the teacher model, where the student model includes a convolution layer, a classification layer and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function;
- an input module 403, configured to input at least part of the image samples into the student model, so as to classify the target object in the at least part of the image samples;
- the adjustment module 404 is used to adjust, according to the classification loss value of the student model for the target object in each image sample, the parameters of the target layer located before the classification layer in the student model, so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model, and to stop training the student model once it is determined that the classification error of the student model is less than the set error.
- the initialization module 402 is specifically configured to:
- the parameters of the classification layer in the teacher model are used as the parameters of the classification layer in the student model to be trained.
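- the adjustment module 404 calculates the classification loss value of the student model for the target object in each image sample in the following manner:
- the angle between the feature vector of the target object in each image sample input into the classification layer of the student model and each classification weight vector is calculated, the angle characterizing how close the target object is to the category corresponding to that classification weight vector;
- according to the target angle and the angles between the feature vector of the target object in the image sample and the classification weight vectors, the classification loss value of the student model for the target object in the image sample is calculated, where the target angle is the angle between the feature vector of the target object in the image sample and the target classification weight vector, and the target classification weight vector refers to the classification weight vector corresponding to the labeled category of the target object in the image sample.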
- the adjustment module 404 calculates the classification loss value $\mathrm{loss}_i$ of the student model for the target object in the i-th image sample according to the following formula:
$$\mathrm{loss}_i = -\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$
- $y_i$ represents the labeled category of the target object in the i-th image sample
- $\theta_{y_i}$ represents the target angle of the target object in the i-th image sample
- $\theta_j$ represents the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model
- n represents the total number of target-object categories, s and m are preset constants, and i and j are integers.
- if both the teacher model and the student model further include a dimensionality reduction layer located between the convolutional layer and the classification layer for performing dimensionality reduction processing, the adjustment module 404 is specifically configured to:
- adjust the parameters of the convolution layer and the dimensionality reduction layer in the student model according to the classification loss values of the student model for the target objects in the image samples.
- FIG. 5 is a schematic structural diagram of an object search apparatus provided by an embodiment of the present application, including an acquisition module 501, a feature extraction module 502, a comparison module 503, and a determination module 504.
- an acquisition module 501 configured to acquire an image to be processed;
- the feature extraction module 502 is configured to perform feature extraction on the target object in the image to be processed by using the target layer located before the classification layer in the student model, the student model being trained with any of the above training methods for a student model for image processing;
- the comparison module 503 is configured to compare the extracted image feature of the target object in the image to be processed with the image features of each candidate object, wherein the image features of each candidate object are extracted by using the target layer located before the classification layer in the teacher model, and the teacher model is the model used to train the student model;
- the determining module 504 is configured to determine, according to the comparison result, a search result of the target object in the to-be-processed image in each candidate object.
- the division into modules in the embodiments of the present application is illustrative and is only a logical functional division; in actual implementation, other divisions are possible.
- the functional modules in the various embodiments of the present application may be integrated into one processor, may each exist physically alone, or two or more modules may be integrated into one module.
- the coupling between the various modules may be implemented through some interfaces, which are usually electrical communication interfaces, but may be mechanical interfaces or other forms of interfaces.
- modules described as separate components may or may not be physically separate, and may be located in one place or distributed in different locations on the same or different devices.
- the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules.
- the electronic device includes physical devices such as a transceiver 601 and a processor 602, wherein the processor 602 may be a central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit, a programmable logic circuit, a large-scale integrated circuit, a digital processing unit, or the like.
- the transceiver 601 is used for data transmission and reception between electronic devices and other devices.
- the electronic device may also include a memory 603 for storing software instructions executed by the processor 602, and certainly may also store some other data required by the electronic device, such as identification information of the electronic device, encrypted information of the electronic device, user data, and the like.
- the memory 603 may be a volatile memory (Volatile Memory), such as a random-access memory (Random-Access Memory, RAM); the memory 603 may also be a non-volatile memory (Non-Volatile Memory), such as a read-only memory (Read-Only Memory, ROM), a flash memory (Flash Memory), a hard disk drive (Hard Disk Drive, HDD) or a solid-state drive (Solid-State Drive, SSD); or the memory 603 may be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- the memory 603 may be a combination of the above-described memories.
- the specific connection medium between the processor 602, the memory 603, and the transceiver 601 is not limited in this embodiment of the present application.
- the embodiment of the present application only takes the connection between the memory 603, the processor 602 and the transceiver 601 through the bus 604 as an example for description.
- the bus is represented by a thick line in FIG. 6; the connections between other components are only schematically illustrated and are not intended to be limiting.
- the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 6, but it does not mean that there is only one bus or one type of bus.
- the processor 602 may be dedicated hardware or a processor running software. When the processor 602 runs software, it reads the software instructions stored in the memory 603 and, driven by those instructions, executes the training method for a student model for image processing involved in the preceding embodiments.
- the embodiment of the present application also provides a storage medium; when the instructions in the storage medium are executed by the processor of an electronic device, the electronic device can perform the training method for a student model for image processing involved in the foregoing embodiments.
- various aspects of the training method for a student model for image processing provided by the present application can also be implemented in the form of a program product that includes program code; when the program product runs on an electronic device, the program code causes the electronic device to execute the training method for a student model for image processing involved in the foregoing embodiments.
- the program product may employ any combination of one or more readable media.
- the readable medium may be a readable signal medium or a readable storage medium.
- the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
- the program product for training the student model for image processing in the embodiments of the present application may adopt a portable compact disk read only memory (CD-ROM) and include program codes, and may be executed on a computing device.
- the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, carrying readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a readable signal medium can also be any readable medium, other than a readable storage medium, that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- Program code embodied on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Program code for carrying out the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
- the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
- the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- Human Computer Interaction (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- General Business, Economics & Management (AREA)
- Image Analysis (AREA)
Abstract
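A training method and apparatus for a student model for image processing, belonging to the technical field of knowledge distillation. The method includes: acquiring the parameters of the classification layer in a teacher model, the teacher model being obtained by classification training on target objects in multiple image samples; initializing the parameters of the classification layer in a student model to be trained with the acquired parameters; inputting at least some of the image samples into the student model for classification; and adjusting, according to the classification loss value of the student model, the parameters of the target layer located before the classification layer in the student model, so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model, until it is determined that the classification error of the student model is less than a set error, at which point training ends. Both the teacher model and the student model include a convolution layer, a classification layer and a normalization layer connected in sequence, and the normalization layers of the two models use the same normalization function.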
Description
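Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202011089981.3, entitled "Training method and apparatus for a student model for image processing", filed with the China Patent Office on October 13, 2020, the entire contents of which are incorporated herein by reference.
This application relates to the technical field of knowledge distillation, and in particular to a training method and apparatus for a student model for image processing.
In general, the key role of knowledge distillation is to transfer the knowledge learned by a complex model to a lightweight model, so that the lightweight model, despite having far fewer parameters, can achieve performance close to that of the complex model. The complex model is usually called the teacher model, and the lightweight model the student model.
Take classifying target objects in image samples as an example. In the related art, a teacher model is first trained with a large number of image samples and the labeled categories of the target objects in them. Once the classification accuracy of the teacher model meets the requirement, the labeled categories of the image samples, together with the output of the teacher model's normalization layer, are used as supervision information to train the student model. This provides the student model with as much prior information as possible, so that it learns the teacher's knowledge as quickly as possible. However, this kind of knowledge distillation remains at the academic research stage and has not been combined with actual search tasks, and the difference between the student model and the teacher model remains large, so the student model has difficulty achieving good search performance.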
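Summary
The embodiments of this application provide a training method and apparatus for a student model for image processing, to solve the problem in the related art that a student model trained with a teacher model has poor search performance.
In a first aspect, an embodiment of this application provides a training method for a student model for image processing, including:
acquiring the parameters of the classification layer in a teacher model, where the teacher model is obtained by classification training on target objects in multiple image samples and includes a convolution layer, a classification layer and a normalization layer connected in sequence;
initializing, with the parameters of the classification layer in the teacher model, the parameters of the classification layer in a student model to be trained, where the student model includes a convolution layer, a classification layer and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function;
inputting at least some of the image samples into the student model to classify the target objects in those image samples;
adjusting, according to the classification loss values of the student model for the target objects in the image samples, the parameters of the target layer located before the classification layer in the student model, so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model, and stopping training the student model once it is determined that the classification error of the student model is less than a set error.
In a possible implementation, if the parameters of the classification layers in the teacher model and the student model have the same dimensions, initializing the parameters of the classification layer in the student model to be trained with the parameters of the classification layer in the teacher model includes:
using the parameters of the classification layer in the teacher model as the parameters of the classification layer in the student model to be trained.
In a possible implementation, if the image feature of the target object input into the classification layer of the student model is a feature vector and the parameters of the classification layer of the student model include multiple classification weight vectors, the classification loss value of the student model for the target object in each image sample is calculated as follows:
calculating the angle between the feature vector of the target object in each image sample input into the classification layer of the student model and each classification weight vector, the angle characterizing how close the target object in that image sample is to the category corresponding to that classification weight vector;
calculating the classification loss value of the student model for the target object in that image sample according to the target angle and the angles between the feature vector of the target object in that image sample and the classification weight vectors, where the target angle is the angle between the feature vector of the target object in that image sample and the target classification weight vector, and the target classification weight vector is the classification weight vector corresponding to the labeled category of the target object in that image sample.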
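In a possible implementation, the classification loss value $\mathrm{loss}_i$ of the student model for the target object in the i-th image sample is calculated according to the following formula:
$$\mathrm{loss}_i = -\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$
where $y_i$ represents the labeled category of the target object in the i-th image sample, $\theta_{y_i}$ represents the target angle of the target object in the i-th image sample, $\theta_j$ represents the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model, n represents the total number of target-object categories, s and m are preset constants, and i and j are integers.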
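In a possible implementation, if both the teacher model and the student model further include a dimensionality reduction layer, located between the convolution layer and the classification layer, for performing dimensionality reduction, adjusting the parameters of the target layer located before the classification layer in the student model according to the classification loss values of the student model for the target objects in the image samples includes:
adjusting the parameters of the convolution layer and the dimensionality reduction layer in the student model according to the classification loss values of the student model for the target objects in the image samples.
In a second aspect, an embodiment of this application provides an object search method, including:
acquiring an image to be processed;
performing feature extraction on the target object in the image to be processed by using the target layer located before the classification layer in a student model, the student model being trained with any of the above training methods for a student model for image processing;
comparing the extracted image feature of the target object in the image to be processed with the image features of candidate objects, where the image features of the candidate objects are extracted with the target layer located before the classification layer in a teacher model, the teacher model being the model used to train the student model;
determining, according to the comparison result, the search result for the target object in the image to be processed among the candidate objects.
In a third aspect, an embodiment of this application provides a training apparatus for a student model for image processing, including:
an acquisition module, configured to acquire the parameters of the classification layer in a teacher model, where the teacher model is obtained by classification training on target objects in multiple image samples and includes a convolution layer, a classification layer and a normalization layer connected in sequence;
an initialization module, configured to initialize, with the parameters of the classification layer in the teacher model, the parameters of the classification layer in a student model to be trained, where the student model includes a convolution layer, a classification layer and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function;
an input module, configured to input at least some of the image samples into the student model to classify the target objects in those image samples;
an adjustment module, configured to adjust, according to the classification loss values of the student model for the target objects in the image samples, the parameters of the target layer located before the classification layer in the student model, so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model, and to stop training the student model once it is determined that the classification error of the student model is less than a set error.
In a possible implementation, if the parameters of the classification layers in the teacher model and the student model have the same dimensions, the initialization module is specifically configured to:
use the parameters of the classification layer in the teacher model as the parameters of the classification layer in the student model to be trained.
In a possible implementation, if the image feature of the target object input into the classification layer of the student model is a feature vector and the parameters of the classification layer of the student model include multiple classification weight vectors, the adjustment module calculates the classification loss value of the student model for the target object in each image sample as follows:
calculating the angle between the feature vector of the target object in each image sample input into the classification layer of the student model and each classification weight vector, the angle characterizing how close the target object in that image sample is to the category corresponding to that classification weight vector;
calculating the classification loss value of the student model for the target object in that image sample according to the target angle and the angles between the feature vector of the target object in that image sample and the classification weight vectors, where the target angle is the angle between the feature vector of the target object in that image sample and the target classification weight vector, and the target classification weight vector is the classification weight vector corresponding to the labeled category of the target object in that image sample.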
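In a possible implementation, the adjustment module calculates the classification loss value $\mathrm{loss}_i$ of the student model for the target object in the i-th image sample according to the following formula:
$$\mathrm{loss}_i = -\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$
where $y_i$ represents the labeled category of the target object in the i-th image sample, $\theta_{y_i}$ represents the target angle of the target object in the i-th image sample, $\theta_j$ represents the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model, n represents the total number of target-object categories, s and m are preset constants, and i and j are integers.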
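In a possible implementation, if both the teacher model and the student model further include a dimensionality reduction layer, located between the convolution layer and the classification layer, for performing dimensionality reduction, the adjustment module is specifically configured to:
adjust the parameters of the convolution layer and the dimensionality reduction layer in the student model according to the classification loss values of the student model for the target objects in the image samples.
In a fourth aspect, an embodiment of this application provides an object search apparatus, including:
an acquisition module, configured to acquire an image to be processed;
a feature extraction module, configured to perform feature extraction on the target object in the image to be processed by using the target layer located before the classification layer in a student model, the student model being trained with any of the above training methods for a student model for image processing;
a comparison module, configured to compare the extracted image feature of the target object in the image to be processed with the image features of candidate objects, where the image features of the candidate objects are extracted with the target layer located before the classification layer in a teacher model, the teacher model being the model used to train the student model;
a determination module, configured to determine, according to the comparison result, the search result for the target object in the image to be processed among the candidate objects.
In a fifth aspect, an embodiment of this application provides an electronic device, including at least one processor and a memory communicatively connected to the at least one processor, where:
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above training method for a student model for image processing.
In a sixth aspect, an embodiment of this application provides a storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is able to perform the above training method for a student model for image processing.
In the embodiments of this application, a teacher model is obtained in advance by classification training on target objects in multiple image samples, and the parameters of its classification layer are acquired and used to initialize the parameters of the classification layer in the student model to be trained. Then, at least some of the image samples are input into the student model to classify their target objects, and, according to the classification loss values of the student model for the target objects in the image samples, the parameters of the target layer located before the classification layer in the student model are adjusted so that the image features of each class of target object learned by the student's target layer approach the image features of that class learned by the teacher's target layer, until the classification error of the student model is determined to be less than a set error, at which point training stops. Both the student model and the teacher model include a convolution layer, a classification layer and a normalization layer connected in sequence, and their normalization layers use the same normalization function. In this way, initializing the student's classification-layer parameters with the teacher's links the classification-layer parameters of the two models, and adjusting the parameters of the student's target layer with the goal of bringing the image features of each class learned by the student's target layer close to those learned by the teacher's target layer narrows the difference between the image features the two models extract for each class of target object. This therefore improves the search performance of the final student model for target objects.
The drawings described here are provided for a further understanding of this application and constitute a part of it; the illustrative embodiments of this application and their descriptions are used to explain this application and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a training method for a student model for image processing provided by an embodiment of this application;
FIG. 2 is a schematic diagram of a training process for a student model for image processing provided by an embodiment of this application;
FIG. 3 is a flowchart of an object search method provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of a training apparatus for a student model for image processing provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of an object search apparatus provided by an embodiment of this application;
FIG. 6 is a schematic diagram of the hardware structure of an electronic device for implementing the training method for a student model for image processing and/or the object search method provided by an embodiment of this application.
To solve the problem in the related art that a student model trained with a teacher model has poor search performance, the embodiments of this application provide a training method and apparatus for a student model for image processing.
Preferred embodiments of this application are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain this application, not to limit it, and that the embodiments of this application and the features in them may be combined with each other where no conflict arises.
In the related art, knowledge distillation remains at the academic research stage, and the various distillation schemes proposed do not consider actual business scenarios. Because the key knowledge a student model should learn from the teacher model differs between business scenarios, distillation schemes that ignore the business scenario do not work very well in practice.
Take classifying target objects in image samples as an example. In the related art, a teacher model is first trained with a large number of image samples and the labeled categories of the target objects in them; once the classification accuracy of the teacher model meets the requirement, the labeled categories of the image samples, together with the output of the teacher model's normalization layer, are used as supervision information to train the student model. This provides the student model with as much prior information as possible, so that it learns the teacher's knowledge as quickly as possible.
In a search task, however, only the target layer located before the classification layer in the student model (including at least the convolution layer) is used to extract the image feature of the target object in the image to be processed, and the extracted feature is compared with the image features of candidate objects in order to find, among them, the object matching the target object in the image to be processed. To represent the image feature of each target object as accurately as possible, the image features of the candidate objects are generally extracted with the teacher model; that is, the image features extracted by the student model must be compared with those extracted by the teacher model. In the related art, however, training the student model does not take into account the relationship between the image features extracted by the student model and those extracted by the teacher model; that is, the image features of target objects that the student model extracts for search are not optimized, so the image features of the same target object extracted by the student model and the teacher model are not spatially aligned (they remain relatively far apart in feature space). As a result, the final student model does not search target objects well.
To solve the above problem, an embodiment of this application provides a training method for a student model for image processing. A teacher model is obtained in advance by classification training on target objects in multiple image samples; the parameters of its classification layer are acquired and used to initialize the parameters of the classification layer in the student model to be trained, linking the classification-layer parameters of the two models. Then, at least some of the image samples are input into the student model to classify their target objects, and the parameters of the target layer located before the classification layer in the student model are adjusted with the goal of bringing the image features of each class of target object learned by the student's target layer close to those learned by the teacher's target layer, until the classification error of the student model is determined to be less than a set error, at which point training stops. Both models include a convolution layer, a classification layer and a normalization layer connected in sequence, and their normalization layers use the same normalization function. This narrows the difference between the image features the two models extract for each class of target object, that is, it spatially aligns the features of each class learned by the student model with those learned by the teacher model, and therefore improves the search performance of the final student model for target objects.
FIG. 1 is a flowchart of a training method for a student model for image processing provided by an embodiment of this application, including the following steps:
S101: Acquire the parameters of the classification layer in the teacher model, where the teacher model is obtained by classification training on target objects in multiple image samples and includes a convolution layer, a classification layer and a normalization layer connected in sequence.
In a specific implementation, the target object may be, for example, a face, a plant or an animal. Taking faces as an example, each person's face forms one class; image samples must be acquired for as many classes of faces as the teacher model needs to distinguish, and each class may contain multiple face images so that the teacher model can better learn the image features of each class of face.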
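Generally, the teacher model includes, from front to back, a convolution layer, a classification layer and a normalization layer connected in sequence. The convolution layer performs feature extraction on the target object in an image sample; the classification layer classifies the image features of the target object extracted by the convolution layer; and the normalization layer, such as a softmax layer, normalizes the output of the classification layer to obtain a probability distribution over the categories to which the target object may belong, the category with the highest probability being the category of the target object in the image sample.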
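The normalization step can be illustrated with a minimal pure-Python sketch. It assumes the classification layer's output is available as a plain list of scores; `softmax` and `predict_class` are hypothetical helper names, not names used by the application.

```python
import math


def softmax(logits):
    """Normalize classification-layer outputs into a probability distribution."""
    # Subtracting the largest score avoids overflow and does not change the result.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return the index of the most probable category and the full distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


label, probs = predict_class([2.0, 0.5, -1.0])
print(label, round(sum(probs), 6))  # 0 1.0
```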
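S102: Initialize the parameters of the classification layer in the student model to be trained with the parameters of the classification layer in the teacher model, where the student model includes a convolution layer, a classification layer and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function.
In practice, the student model also includes, from front to back, at least a convolution layer, a classification layer and a normalization layer connected in sequence. Each layer of the student model plays the same role as the corresponding layer of the teacher model, but the student's convolution layer is much simpler than the teacher's.
In a specific implementation, if the parameters of the classification layers in the teacher model and the student model have the same dimensions, that is, the parameter matrix of the teacher's classification layer is the same size as that of the student's, the parameters of the teacher's classification layer can be used directly as the parameters of the classification layer in the student model to be trained; in other words, the teacher's classification-layer parameter matrix is used directly as the student's classification-layer parameter matrix. This preserves the link between the classification layers of the two models to the greatest extent, which facilitates the subsequent spatial alignment of the image features of each class of target object extracted by the student model and the teacher model.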
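Direct initialization can be sketched as a shape-checked copy of the teacher's parameter matrix. This is a minimal sketch assuming the matrix is stored as a list of rows with shape (feature_dim, num_classes), so that each column is one classification weight vector; the function name `init_student_classifier` is hypothetical.

```python
import copy


def init_student_classifier(teacher_weights, student_shape):
    """Copy the teacher's classification-layer matrix into the student.

    teacher_weights: list of rows, shape (feature_dim, num_classes); column j
    is the classification weight vector of category j.
    student_shape: (feature_dim, num_classes) expected by the student.
    """
    rows = len(teacher_weights)
    cols = len(teacher_weights[0]) if rows else 0
    if (rows, cols) != tuple(student_shape):
        # Direct copying only applies when the parameter dimensions match.
        raise ValueError(f"shape mismatch: teacher {(rows, cols)} vs student {tuple(student_shape)}")
    # Deep copy so that later edits to the student never modify the teacher.
    return copy.deepcopy(teacher_weights)


teacher = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # feature_dim=3, num_classes=2
student = init_student_classifier(teacher, (3, 2))
```

The deep copy keeps the two models independent in memory even though their classification layers start out numerically identical.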
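S103: Input at least some of the image samples into the student model to classify the target objects in those image samples.
In a specific implementation, the image samples input may be all of the image samples or only some of them. When only some are input, they may include images of target objects of every category, so that the student model learns the image features of every class of target object.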
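One way to pick a partial set of samples that still covers every category is per-class sampling. The sketch below assumes samples are (image_id, label) pairs and a fixed per-class quota; both the representation and the function name `sample_covering_all_classes` are hypothetical choices for illustration.

```python
import random


def sample_covering_all_classes(samples, per_class, seed=0):
    """Pick up to `per_class` samples of every labeled category.

    samples: list of (image_id, label) pairs. The returned subset contains
    every label present in `samples`.
    """
    rng = random.Random(seed)
    by_label = {}
    for image_id, label in samples:
        by_label.setdefault(label, []).append((image_id, label))
    subset = []
    for items in by_label.values():
        # Classes with fewer samples than the quota contribute all of them.
        subset.extend(rng.sample(items, min(per_class, len(items))))
    return subset


data = [("a1", "alice"), ("a2", "alice"), ("a3", "alice"), ("b1", "bob"), ("c1", "carol")]
subset = sample_covering_all_classes(data, per_class=2)
```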
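S104: According to the classification loss values of the student model for the target objects in the image samples, adjust the parameters of the target layer located before the classification layer in the student model, so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model, and stop training the student model once it is determined that the classification error of the student model is less than a set error.
Here, the classification loss value of the student model for the target object in each image sample characterizes the difference between the image feature of that target object extracted by the student model and the feature center, learned by the teacher model, of the category to which that target object belongs.
Generally, the image feature of the target object input into the classification layer of the teacher model is a feature vector, and each column of the teacher's classification-layer parameter matrix can be regarded as a classification weight vector; each classification weight vector corresponds to one category of target object and describes the feature center of that category. Similarly, the image feature of the target object input into the student's classification layer is also a feature vector, and each column of the student's classification-layer parameter matrix can also be regarded as a classification weight vector that corresponds to one category of target object and likewise describes the feature center of that category.
In a specific implementation, the angle between the feature vector of the target object in each image sample input into the student's classification layer and each classification weight vector of the student's classification layer can be calculated; this angle characterizes how close the target object in the image sample is to the category corresponding to that classification weight vector. Then, the classification loss value of the student model for the target object in that image sample is calculated according to the target angle and the angles between the feature vector of the target object and the classification weight vectors, where the target angle is the angle between the feature vector of the target object in the image sample and the target classification weight vector, and the target classification weight vector is the classification weight vector corresponding to the labeled category of the target object in the image sample.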
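The angle computation can be sketched in pure Python, assuming (as in the paragraph on classification weight vectors) that column j of the classification-layer matrix is the weight vector of category j. The helper name `class_angles` is hypothetical.

```python
import math


def class_angles(feature, weights):
    """Angle (radians) between a feature vector and every classification weight vector.

    feature: list of floats of length feature_dim.
    weights: list of rows with shape (feature_dim, num_classes); column j is the
    classification weight vector of category j.
    """
    f_norm = math.sqrt(sum(x * x for x in feature))
    angles = []
    for j in range(len(weights[0])):
        column = [row[j] for row in weights]
        w_norm = math.sqrt(sum(w * w for w in column))
        cos = sum(x * w for x, w in zip(feature, column)) / (f_norm * w_norm)
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return angles


# Columns are [1, 0], [0, 1] and [-1, 0]; the feature's length does not affect the angles.
angles = class_angles([2.0, 0.0], [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
```

A smaller angle means the sample lies closer to that category's feature center; the entry at the labeled category is the target angle.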
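For example, the classification loss value $\mathrm{loss}_i$ of the student model for the target object in the i-th image sample is calculated according to the following formula:
$$\mathrm{loss}_i = -\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$
where $y_i$ represents the labeled category of the target object in the i-th image sample; $\theta_{y_i}$ represents the angle between the feature vector of the target object in the i-th image sample and the target classification weight vector corresponding to its labeled category, that is, the target angle of the target object in the i-th image sample; $\theta_j$ represents the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model; n represents the total number of target-object categories; s and m are preset constants, for example s = 64 and m = 0.5; and i and j are integers.
It should be noted that in the formula, j ranges from 1 to n with $j \neq y_i$, meaning that j takes every one of the n categories except the labeled category of the target object in the i-th image sample.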
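The per-sample loss can be sketched as follows. This assumes the additive angular margin (ArcFace-style) form, in which the margin m is added to the target angle before the cosine is scaled by s, while the angles of the other categories (j ≠ y_i) enter unmargined; the function name `margin_loss` is hypothetical and the defaults reuse the example constants s = 64 and m = 0.5.

```python
import math


def margin_loss(angles, label, s=64.0, m=0.5):
    """Per-sample loss in an additive angular margin form.

    angles: theta_j for every category j (radians), e.g. from a feature vector
    and the classification weight vectors; label: index y_i of the labeled
    category; s, m: preset scale and margin constants.
    """
    target_logit = s * math.cos(angles[label] + m)
    other_logits = [s * math.cos(t) for j, t in enumerate(angles) if j != label]
    logits = [target_logit] + other_logits
    # log-sum-exp with max subtraction for numerical stability
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # -log(e^target / sum) == log(sum) - target
    return log_denominator - target_logit


close_to_label = margin_loss([0.1, 1.5, 1.5], 0)
far_from_label = margin_loss([1.0, 1.5, 1.5], 0)
```

A smaller target angle yields a smaller loss, so minimizing it pulls each feature toward its category's weight vector. Practical ArcFace implementations often add special handling for the case where the target angle plus m exceeds π; that refinement is omitted here.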
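Further, the parameters of the classification layer in the student model can be kept unchanged, and, according to the classification loss values of the student model for the target objects in the image samples, a gradient descent algorithm is used to adjust the parameters of the target layer located before the classification layer in the student model, until it is determined that the classification error of the student model is less than a set error, at which point training of the student model stops.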
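The frozen-classifier update rule can be illustrated with a toy gradient descent loop. This is a minimal sketch, not a real training pipeline: parameters are a dict of flat float lists, the entry named "classifier" stands for the teacher-initialized classification layer and is never updated, and `grad_fn` and `error_fn` are hypothetical callables that a real implementation would obtain from an autodiff framework and a validation pass.

```python
def train_target_layers(params, grad_fn, error_fn, lr=0.1, max_error=0.01, max_steps=1000):
    """Gradient descent that updates every layer except the frozen classifier.

    params: dict mapping a layer name to a flat list of floats.
    grad_fn(params) -> dict with the same keys holding gradients.
    error_fn(params) -> current classification error of the student.
    Returns the number of update steps performed before the error fell below max_error.
    """
    trainable = [name for name in params if name != "classifier"]
    for step in range(max_steps):
        if error_fn(params) < max_error:
            return step
        grads = grad_fn(params)
        for name in trainable:  # the classifier's gradient is ignored
            params[name] = [p - lr * g for p, g in zip(params[name], grads[name])]
    return max_steps


# Toy stand-in: the "error" is conv**2, so each step shrinks conv by a factor of 0.8.
params = {"conv": [2.0], "classifier": [5.0]}
steps = train_target_layers(
    params,
    grad_fn=lambda p: {"conv": [2 * p["conv"][0]], "classifier": [1.0]},
    error_fn=lambda p: p["conv"][0] ** 2,
)
print(steps, params["classifier"])  # 14 [5.0]
```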
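In the embodiments of this application, the parameters of the classification layer in the student model to be trained are initialized with those of the teacher model, linking the classification-layer parameters of the two models, and the image features of each class of target object learned by the student's target layer are required to approach the image features of that class learned by the teacher's target layer; that is, the image features of each class finally obtained by the student's target layer are required to be spatially aligned as closely as possible with those obtained by the teacher's target layer. This narrows the difference between the image features the two models extract for each class of target object and better fits search tasks, in which the image feature of a target object extracted by the student model serves as the search feature and is matched against the image features extracted by the teacher model; it therefore improves the search performance of the final student model for target objects. Moreover, teaching the student model using only the parameters of the teacher's classification layer does not constrain how the classification loss is computed; that is, this distillation scheme can be freely combined with the classification loss functions designed for existing classification tasks, so it is flexible and convenient to use.
The above process is described below with reference to a specific embodiment.
FIG. 2 is a schematic diagram of a training process for a student model for image processing provided by an embodiment of this application. Both the teacher model and the student model include a convolution layer, a classification layer and a softmax layer connected in sequence; that is, the target layer shown in FIG. 2 includes only the convolution layer. The teacher's convolution layer is more complex than the student's, but both output image features of the target object with the same dimension, the classification layers of the two models have the same dimensions, and the softmax layers of the two models use the same normalization function.
Suppose the convolution layer of the teacher model outputs a 1×512 feature vector for the target object in an image sample, and suppose target objects are divided into 1000 categories. The parameters of the teacher's classification layer then form a 512×1000 parameter matrix; what enters the teacher's softmax layer is 1×1000 score data, and the softmax layer outputs normalized 1×1000 probability data representing the probability distribution of the target object in the image sample over the 1000 categories. The category with the highest probability is the category to which the teacher model assigns the target object.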
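The dimensions in this example can be checked with a little shape arithmetic. The sketch assumes a bias-free classification layer (only the weight matrix is described), and `matmul_shape` is a hypothetical helper.

```python
def matmul_shape(a_shape, b_shape):
    """Shape of A @ B, or ValueError if the inner dimensions differ."""
    if a_shape[1] != b_shape[0]:
        raise ValueError(f"inner dimensions differ: {a_shape} @ {b_shape}")
    return (a_shape[0], b_shape[1])


FEATURE_SHAPE = (1, 512)        # convolution-layer output per image sample
CLASSIFIER_SHAPE = (512, 1000)  # one 512-d classification weight vector per category

logits_shape = matmul_shape(FEATURE_SHAPE, CLASSIFIER_SHAPE)
num_classifier_params = CLASSIFIER_SHAPE[0] * CLASSIFIER_SHAPE[1]
print(logits_shape, num_classifier_params)  # (1, 1000) 512000
```

Because the student's convolution layer also outputs 1×512 features, the same 512×1000 matrix can be copied into the student unchanged.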
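In a specific implementation, the teacher model can first be trained with the image samples and the labeled categories of their target objects; then the 512×1000 parameter matrix used by the teacher's classification layer is acquired and used as the 512×1000 parameter matrix of the classification layer in the student model to be trained; after that, at least some of the image samples are input into the student model to classify their target objects.
Although the student's convolution layer is simpler than the teacher's, it likewise extracts a 1×512 feature vector for the target object in each image sample; the parameters of the student's classification layer are identical to those of the teacher's, and the softmax layers of the two models use the same normalization function. Therefore, making the feature vector of the target object in each image sample that enters the student's classification layer as close as possible to the feature vector of the same target object that enters the teacher's classification layer aligns the image features of each class of target object extracted by the two models in space as far as possible.
To this end, the average classification loss value of the student model over the target objects in the image samples can be calculated from the image features of the target objects input into the student's classification layer, the parameters of the student's classification layer, and the labeled categories of the target objects in the image samples.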
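For example, the average classification loss value L of the student model over the target objects in the image samples is calculated according to the following formula:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$
where N represents the number of image samples; $y_i$ represents the labeled category of the target object in the i-th image sample; $\theta_{y_i}$ represents the angle between the feature vector of the target object in the i-th image sample and the target classification weight vector corresponding to its labeled category, that is, the target angle of the target object in the i-th image sample; $\theta_j$ represents the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model; n represents the total number of target-object categories, here 1000; s and m are constants, for example s = 64 and m = 0.5; and i and j are integers.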
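Further, the parameters of the student's classification layer are kept unchanged, and, according to the classification loss values of the student model for the target objects in the image samples, a gradient descent algorithm is used to adjust the parameters of the target layer located before the classification layer in the student model, until it is determined that the classification error of the student model is less than a set error, at which point training of the student model stops.
In addition, in a specific implementation, to reduce the amount of data processed in the search stage and speed up search, both the teacher model and the student model may further include a dimensionality reduction layer, located between the convolution layer and the classification layer, that compresses the image features of the target object. In this case, adjusting the parameters of the target layer located before the classification layer in the student model according to the classification loss values means adjusting the parameters of the student's convolution layer and dimensionality reduction layer.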
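Which layers count as the "target layer" follows directly from their position relative to the classification layer. The sketch below assumes a model described as an ordered list of layer names; the names "conv", "reduction", "classification" and "softmax" are hypothetical labels, not identifiers from the application. The same rule selects the layers that are trained and the layers used for feature extraction at search time.

```python
def target_layers(layers):
    """Return the layers that precede the classification layer.

    These are the layers adjusted during training (the classification layer
    stays fixed) and the layers used for feature extraction during search.
    """
    return layers[:layers.index("classification")]


without_reduction = ["conv", "classification", "softmax"]
with_reduction = ["conv", "reduction", "classification", "softmax"]
print(target_layers(without_reduction))  # ['conv']
print(target_layers(with_reduction))     # ['conv', 'reduction']
```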
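FIG. 3 is a flowchart of an object search method provided by an embodiment of this application, including the following steps:
S301: Acquire an image to be processed.
S302: Perform feature extraction on the target object in the image to be processed by using the target layer located before the classification layer in the student model, where the student model is trained with the training method for a student model for image processing provided by the embodiments of this application.
In a specific implementation, if the student model includes a convolution layer, a classification layer and a normalization layer, only the student's convolution layer is used to extract features of the target object in the image to be processed; if the student model includes a convolution layer, a dimensionality reduction layer, a classification layer and a normalization layer, the student's convolution layer and dimensionality reduction layer are used to extract features of the target object in the image to be processed.
S303: Compare the extracted image feature of the target object in the image to be processed with the image features of the candidate objects, where the image features of the candidate objects are extracted with the target layer located before the classification layer in the teacher model, the teacher model being the model used to train the student model.
In a specific implementation, if the teacher model includes a convolution layer, a classification layer and a normalization layer connected in sequence, the image features of the candidate objects are extracted using only the teacher's convolution layer; if the teacher model includes a convolution layer, a dimensionality reduction layer, a classification layer and a normalization layer connected in sequence, the image features of the candidate objects are extracted using the teacher's convolution layer and dimensionality reduction layer.
S304: According to the comparison result, determine the search result for the target object in the image to be processed among the candidate objects.
In a specific implementation, the candidate object whose image feature best matches the image feature of the target object in the image to be processed can be taken as the search result for that target object among the candidate objects.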
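Steps S303 and S304 can be sketched as a nearest-neighbor lookup. The application only speaks of a "matching degree"; using cosine similarity as that measure is an assumption here (a common choice for face features), and `cosine_similarity`, `search` and the gallery contents are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_feature, gallery):
    """Return (candidate_id, score) of the best-matching candidate.

    query_feature: feature extracted by the student's target layer.
    gallery: dict mapping candidate id -> feature extracted by the teacher's target layer.
    """
    best_id = max(gallery, key=lambda cid: cosine_similarity(query_feature, gallery[cid]))
    return best_id, cosine_similarity(query_feature, gallery[best_id])


gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [-1.0, 0.0]}
match, score = search([0.9, 0.1], gallery)
```

The comparison is only meaningful because the training method keeps student and teacher features in the same, spatially aligned feature space.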
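Take face recognition as an example.
In face recognition tasks, to achieve both fast and accurate recognition, a teacher model is usually trained on a large number of face images, and its knowledge is then transferred by knowledge distillation to a student model of much smaller capacity, so that the student model achieves high recognition speed and accuracy. During face search, the student model must compare the image feature of the face to be recognized with the image features of candidate faces that were extracted in advance by the teacher model and stored in a feature library. If the image features that the student model and the teacher model learn for face images of the same category differ considerably, the student's search performance suffers accordingly. It is therefore particularly important that the face image features extracted by the teacher model and the student model be spatially aligned.
In the embodiments of this application, the parameters of the teacher's classification layer can be used directly as the parameters of the student's classification layer, teaching the student model how to align with the feature space of the face image features extracted by the teacher model; the scheme can also be combined with various loss functions to ensure the distillation effect, so it has broad application prospects and considerable practical value.
In the embodiments of this application, when training the student model, the parameters of the classification layer in the student model to be trained are initialized with those of the teacher model, linking the classification-layer parameters of the two models, and the image features of each class of target object learned by the student's target layer are required to approach the image features of that class learned by the teacher's target layer; that is, the image features of each class finally obtained by the student's target layer are required to be spatially aligned with those obtained by the teacher's target layer. This narrows the difference between the image features the two models extract for each class of target object and better fits search tasks, in which the image feature of a target object extracted by the student model serves as the search feature and is matched against the image features extracted by the teacher model; it therefore improves the search performance of the final student model for target objects.
When the method provided in the embodiments of this application is implemented in software, hardware, or a combination of software and hardware, the electronic device may include multiple functional modules, each of which may include software, hardware, or a combination thereof.
FIG. 4 is a schematic structural diagram of a training apparatus for a student model for image processing provided by an embodiment of this application, including an acquisition module 401, an initialization module 402, an input module 403 and an adjustment module 404.
The acquisition module 401 is configured to acquire the parameters of the classification layer in a teacher model, where the teacher model is obtained by classification training on target objects in multiple image samples and includes a convolution layer, a classification layer and a normalization layer connected in sequence;
the initialization module 402 is configured to initialize, with the parameters of the classification layer in the teacher model, the parameters of the classification layer in the student model to be trained, where the student model includes a convolution layer, a classification layer and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function;
the input module 403 is configured to input at least some of the image samples into the student model to classify the target objects in those image samples;
the adjustment module 404 is configured to adjust, according to the classification loss values of the student model for the target objects in the image samples, the parameters of the target layer located before the classification layer in the student model, so that the image features of each class of target object learned by the target layer in the student model approach the image features of that class learned by the target layer in the teacher model, and to stop training the student model once it is determined that the classification error of the student model is less than a set error.
In a possible implementation, if the parameters of the classification layers in the teacher model and the student model have the same dimensions, the initialization module 402 is specifically configured to:
use the parameters of the classification layer in the teacher model as the parameters of the classification layer in the student model to be trained.
In a possible implementation, if the image feature of the target object input into the classification layer of the student model is a feature vector and the parameters of the classification layer of the student model include multiple classification weight vectors, the adjustment module 404 calculates the classification loss value of the student model for the target object in each image sample as follows:
calculating the angle between the feature vector of the target object in each image sample input into the classification layer of the student model and each classification weight vector, the angle characterizing how close the target object in that image sample is to the category corresponding to that classification weight vector;
calculating the classification loss value of the student model for the target object in that image sample according to the target angle and the angles between the feature vector of the target object in that image sample and the classification weight vectors, where the target angle is the angle between the feature vector of the target object in that image sample and the target classification weight vector, and the target classification weight vector is the classification weight vector corresponding to the labeled category of the target object in that image sample.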
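In a possible implementation, the adjustment module 404 calculates the classification loss value $\mathrm{loss}_i$ of the student model for the target object in the i-th image sample according to the following formula:
$$\mathrm{loss}_i = -\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$
where $y_i$ represents the labeled category of the target object in the i-th image sample, $\theta_{y_i}$ represents the target angle of the target object in the i-th image sample, $\theta_j$ represents the angle between the feature vector of the target object in the i-th image sample and the j-th classification weight vector in the classification layer of the student model, n represents the total number of target-object categories, s and m are preset constants, and i and j are integers.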
在一种可能的实施方式中,若所述教师模型和所述学生模型均还包括位于卷积层和分类层之间的用于进行降维处理的降维层,则所述调整模块404具体用于:
根据所述学生模型对各图像样本中目标对象的分类损失值,调整所述学生模型中卷积层和降维层的参数。
图5为本申请实施例提供的一种对象搜索装置的结构示意图,包括获取 模块501、特征提取模块502、比对模块503、确定模块504。
获取模块501,用于获取待处理图像;
特征提取模块502,用于利用学生模型中位于分类层之前的目标层对所述待处理图像中的目标对象进行特征提取,所述学生模型采用上述任一用于图像处理的学生模型的训练方法训练得到;
比对模块503,用于将提取的所述待处理图像中目标对象的图像特征与各候选对象的图像特征进行比对,其中,各候选对象的图像特征是利用教师模型中位于分类层之前的目标层提取的,所述教师模型是用于训练所述学生模型的模型;
确定模块504,用于根据比对结果,确定所述待处理图像中的目标对象在各候选对象中的搜索结果。
本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,另外,在本申请各个实施例中的各功能模块可以集成在一个处理器中,也可以是单独物理存在,也可以两个或两个以上模块集成在一个模块中。各个模块相互之间的耦合可以是通过一些接口实现,这些接口通常是电性通信接口,但是也不排除可能是机械接口或其它的形式接口。因此,作为分离部件说明的模块可以是或者也可以不是物理上分开的,既可以位于一个地方,也可以分布到同一个或不同设备的不同位置上。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
图6为本申请实施例提供的一种电子设备的结构示意图,该电子设备包括收发器601以及处理器602等物理器件,其中,处理器602可以是一个中央处理单元(Central Processing Unit,CPU)、微处理器、专用集成电路、可编程逻辑电路、大规模集成电路、或者为数字处理单元等等。收发器601用于电子设备和其他设备进行数据收发。
该电子设备还可以包括存储器603用于存储处理器602执行的软件指令,当然还可以存储电子设备需要的一些其他数据,如电子设备的标识信息、电 子设备的加密信息、用户数据等。存储器603可以是易失性存储器(Volatile Memory),例如随机存取存储器(Random-Access Memory,RAM);存储器603也可以是非易失性存储器(Non-Volatile Memory),例如只读存储器(Read-Only Memory,ROM),快闪存储器(Flash Memory),硬盘(Hard Disk Drive,HDD)或固态硬盘(Solid-State Drive,SSD)、或者存储器603是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器603可以是上述存储器的组合。
本申请实施例中不限定上述处理器602、存储器603以及收发器601之间的具体连接介质。本申请实施例在图6中仅以存储器603、处理器602以及收发器601之间通过总线604连接为例进行说明,总线在图6中以粗线表示,其它部件之间的连接方式,仅是进行示意性说明,并不引以为限。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示,图6中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
处理器602可以是专用硬件或运行软件的处理器,当处理器602可以运行软件时,处理器602读取存储器603存储的软件指令,并在所述软件指令的驱动下,执行前述实施例中涉及的用于图像处理的学生模型的训练方法。
本申请实施例还提供了一种存储介质,当所述存储介质中的指令由电子设备的处理器执行时,所述电子设备能够执行前述实施例中涉及的用于图像处理的学生模型的训练方法。
在一些可能的实施方式中,本申请提供的用于图像处理的学生模型的训练方法的各个方面还可以实现为一种程序产品的形式,所述程序产品中包括有程序代码,当所述程序产品在电子设备上运行时,所述程序代码用于使所述电子设备执行前述实施例中涉及的用于图像处理的学生模型的训练方法。
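The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The program product for training a student model used in image processing in the embodiments of this application may use a portable compact disc read-only memory (CD-ROM), may include program code, and may run on a computing device. However, the program product of this application is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wired, optical cable, or RF media, or any suitable combination of the above.
Program code for performing the operations of this application may be written in any combination of one or more programming languages. These include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a standalone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computing device (for example, through the Internet using an Internet service provider).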
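It should be noted that although several units or sub-units of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of this application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided and embodied by multiple units.
Furthermore, although the operations of the method of this application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.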
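Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, apparatus (system), and computer program product according to embodiments of this application. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, as well as combinations of flows and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine. The instructions executed by the processor of the computer or other programmable data processing device then produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.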
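Although preferred embodiments of this application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of this application.
Obviously, those skilled in the art can make various changes and variations to this application without departing from the spirit and scope of this application. Thus, if these modifications and variations of this application fall within the scope of the claims of this application and their technical equivalents, this application is also intended to include them.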
Claims (14)
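- A training method for a student model used in image processing, characterized by comprising: obtaining parameters of a classification layer in a teacher model, wherein the teacher model is obtained by performing classification training on target objects in a plurality of image samples, and the teacher model comprises a convolutional layer, a classification layer, and a normalization layer connected in sequence; initializing, by using the parameters of the classification layer in the teacher model, parameters of a classification layer in a student model to be trained, wherein the student model comprises a convolutional layer, a classification layer, and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function; inputting at least some of the image samples into the student model to classify the target objects in the at least some image samples; and adjusting, according to classification loss values of the student model for the target objects in the image samples, parameters of a target layer located before the classification layer in the student model, so that the image features of each class of target objects learned by the target layer in the student model approach the image features of the same class of target objects learned by the target layer in the teacher model, and stopping training the student model when it is determined that a classification error of the student model is less than a set error.
- The method according to claim 1, characterized in that, if the parameters of the classification layers in the teacher model and the student model have the same dimensionality, initializing the parameters of the classification layer in the student model to be trained by using the parameters of the classification layer in the teacher model comprises: using the parameters of the classification layer in the teacher model as the parameters of the classification layer in the student model to be trained.
- The method according to claim 1, characterized in that, if the image features of the target object input into the classification layer of the student model are feature vectors and the parameters of the classification layer of the student model comprise a plurality of classification weight vectors, the classification loss value of the student model for the target object in each image sample is calculated as follows: calculating an angle between each classification weight vector and the feature vector of the target object in each image sample that is input into the classification layer of the student model, wherein the angle represents how close the target object in the image sample is to the class corresponding to that classification weight vector; and calculating the classification loss value of the student model for the target object in the image sample according to a target angle and the angles between the feature vector of the target object in the image sample and the classification weight vectors, wherein the target angle is the angle between the feature vector of the target object in the image sample and a target classification weight vector, and the target classification weight vector is the classification weight vector corresponding to the labeled class of the target object in the image sample.
- The method according to any one of claims 1 to 4, characterized in that, if both the teacher model and the student model further comprise a dimension reduction layer, located between the convolutional layer and the classification layer, for performing dimension reduction, adjusting the parameters of the target layer located before the classification layer in the student model according to the classification loss values of the student model for the target objects in the image samples comprises: adjusting parameters of the convolutional layer and the dimension reduction layer in the student model according to the classification loss values of the student model for the target objects in the image samples.
- An object search method, characterized by comprising: obtaining an image to be processed; performing feature extraction on a target object in the image to be processed by using a target layer located before the classification layer in a student model, wherein the student model is trained by the method according to any one of claims 1 to 5; comparing the extracted image features of the target object in the image to be processed with image features of each candidate object, wherein the image features of each candidate object are extracted by using a target layer located before the classification layer in a teacher model, and the teacher model is the model used to train the student model; and determining, according to the comparison result, a search result of the target object in the image to be processed among the candidate objects.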
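- A training apparatus for a student model used in image processing, characterized by comprising: an obtaining module, configured to obtain parameters of a classification layer in a teacher model, wherein the teacher model is obtained by performing classification training on target objects in a plurality of image samples, and the teacher model comprises a convolutional layer, a classification layer, and a normalization layer connected in sequence; an initialization module, configured to initialize, by using the parameters of the classification layer in the teacher model, parameters of a classification layer in a student model to be trained, wherein the student model comprises a convolutional layer, a classification layer, and a normalization layer connected in sequence, and the normalization layers of the student model and the teacher model use the same normalization function; an input module, configured to input at least some of the image samples into the student model to classify the target objects in the at least some image samples; and an adjustment module, configured to adjust, according to classification loss values of the student model for the target objects in the image samples, parameters of a target layer located before the classification layer in the student model, so that the image features of each class of target objects learned by the target layer in the student model approach the image features of the same class of target objects learned by the target layer in the teacher model, and to stop training the student model when it is determined that a classification error of the student model is less than a set error.
- The apparatus according to claim 7, characterized in that, if the parameters of the classification layers in the teacher model and the student model have the same dimensionality, the initialization module is specifically configured to: use the parameters of the classification layer in the teacher model as the parameters of the classification layer in the student model to be trained.
- The apparatus according to claim 7, characterized in that, if the image features of the target object input into the classification layer of the student model are feature vectors and the parameters of the classification layer of the student model comprise a plurality of classification weight vectors, the adjustment module calculates the classification loss value of the student model for the target object in each image sample as follows: calculating an angle between each classification weight vector and the feature vector of the target object in each image sample that is input into the classification layer of the student model, wherein the angle represents how close the target object in the image sample is to the class corresponding to that classification weight vector; and calculating the classification loss value of the student model for the target object in the image sample according to a target angle and the angles between the feature vector of the target object in the image sample and the classification weight vectors, wherein the target angle is the angle between the feature vector of the target object in the image sample and a target classification weight vector, and the target classification weight vector is the classification weight vector corresponding to the labeled class of the target object in the image sample.
- The apparatus according to any one of claims 7 to 10, characterized in that, if both the teacher model and the student model further comprise a dimension reduction layer, located between the convolutional layer and the classification layer, for performing dimension reduction, the adjustment module is specifically configured to: adjust parameters of the convolutional layer and the dimension reduction layer in the student model according to the classification loss values of the student model for the target objects in the image samples.
- An object search apparatus, characterized by comprising: an obtaining module, configured to obtain an image to be processed; a feature extraction module, configured to perform feature extraction on a target object in the image to be processed by using a target layer located before the classification layer in a student model, wherein the student model is trained by the method according to any one of claims 1 to 5; a comparison module, configured to compare the extracted image features of the target object in the image to be processed with image features of each candidate object, wherein the image features of each candidate object are extracted by using a target layer located before the classification layer in a teacher model, and the teacher model is the model used to train the student model; and a determination module, configured to determine, according to the comparison result, a search result of the target object in the image to be processed among the candidate objects.
- An electronic device, characterized by comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 6.
- A storage medium, characterized in that, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method according to any one of claims 1 to 6.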
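The classifier initialization and the angle-based loss described in the method claims above can be sketched as follows. This is a minimal sketch, not the patented formula. The claims say only that the loss is computed from the target angle and the angles to all classification weight vectors. The sketch therefore assumes a normalized-softmax cross-entropy over `scale * cos(theta_j)` with no angular margin, and the `scale` value of 16.0 is an arbitrary placeholder. All function names are hypothetical. The student's classification weight vectors are copied from the teacher, covering the equal-dimensionality case, and are then left unchanged, because the claims describe adjusting only the layers before the classification layer.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def init_student_classifier(teacher_weights: Sequence[Sequence[float]]) -> List[Vector]:
    """Copy the teacher's classification weight vectors (one per class).

    Assumes the teacher and student classification layers have the same
    dimensionality, so the weights can be used directly.
    """
    return [list(w) for w in teacher_weights]


def angles_to_classes(feature: Sequence[float],
                      class_weights: Sequence[Sequence[float]]) -> Vector:
    """Angle (radians) between the feature vector and each class weight vector."""
    f = l2_normalize(feature)
    result = []
    for w in class_weights:
        cos = sum(a * b for a, b in zip(f, l2_normalize(w)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        result.append(math.acos(cos))
    return result


def angular_classification_loss(feature: Sequence[float],
                                class_weights: Sequence[Sequence[float]],
                                label: int, scale: float = 16.0) -> float:
    """Assumed loss: cross-entropy over scale * cos(theta_j), no margin.

    label: index of the labeled class, i.e. of the target classification
    weight vector; a smaller target angle gives a lower loss.
    """
    thetas = angles_to_classes(feature, class_weights)
    logits = [scale * math.cos(t) for t in thetas]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    teacher_w = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-class, 2-D weight vectors
    student_w = init_student_classifier(teacher_w)
    feat = [0.9, 0.1]  # toy student target-layer feature, close to class 0
    print(angular_classification_loss(feat, student_w, label=0))
    print(angular_classification_loss(feat, student_w, label=1))
```

The copied weight vectors stay fixed, and the loss depends only on the direction of the feature. Minimizing it can therefore only rotate the student's feature toward the weight vector of its labeled class, which is the same vector the teacher's features for that class were trained against. This is one plausible reading of how the claims make student and teacher features comparable.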
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011089981.3A (CN112184508B, zh) | 2020-10-13 | 2020-10-13 | Training method and apparatus for a student model used in image processing |
CN202011089981.3 | 2020-10-13 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022077646A1 (zh) | 2022-04-21 |
Family ID: 73949527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/126837 (WO2022077646A1, zh) | Training method and apparatus for a student model used in image processing | 2020-10-13 | 2020-11-05 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112184508B (zh) |
WO (1) | WO2022077646A1 (zh) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
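CN113408571B (zh) * | 2021-05-08 | 2022-07-19 | 浙江智慧视频安防创新中心有限公司 | Image classification method, apparatus, storage medium, and terminal based on model distillation |
CN113408570A (zh) * | 2021-05-08 | 2021-09-17 | 浙江智慧视频安防创新中心有限公司 | Image category recognition method, apparatus, storage medium, and terminal based on model distillation |
CN112949786B (zh) * | 2021-05-17 | 2021-08-06 | 腾讯科技(深圳)有限公司 | Data classification and recognition method, apparatus, device, and readable storage medium |
CN113361572B (zh) * | 2021-05-25 | 2023-06-27 | 北京百度网讯科技有限公司 | Training method and apparatus for an image processing model, electronic device, and storage medium |
CN113361384A (zh) * | 2021-06-03 | 2021-09-07 | 深圳前海微众银行股份有限公司 | Face recognition model compression method, device, medium, and computer program product |
CN113486978B (zh) * | 2021-07-26 | 2024-03-05 | 北京达佳互联信息技术有限公司 | Training method and apparatus for a text classification model, electronic device, and storage medium |
CN113657523A (zh) * | 2021-08-23 | 2021-11-16 | 科大讯飞股份有限公司 | Image target classification method, apparatus, device, and storage medium |
CN114298224B (zh) * | 2021-12-29 | 2024-06-18 | 云从科技集团股份有限公司 | Image classification method and apparatus, and computer-readable storage medium |
CN115272881B (zh) * | 2022-08-02 | 2023-03-21 | 大连理工大学 | Long-tailed remote sensing image target recognition method based on dynamic relation distillation |
CN116070138B (zh) * | 2023-03-06 | 2023-07-07 | 南方电网调峰调频发电有限公司检修试验分公司 | State monitoring method, apparatus, device, and medium for a pumped-storage unit |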
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
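CN109191453A (zh) * | 2018-09-14 | 2019-01-11 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating an image category detection model |
CN110647893A (zh) * | 2019-09-20 | 2020-01-03 | 北京地平线机器人技术研发有限公司 | Target object recognition method, apparatus, storage medium, and device |
CN111242297A (zh) * | 2019-12-19 | 2020-06-05 | 北京迈格威科技有限公司 | Knowledge-distillation-based model training method, image processing method, and apparatus |
CN111353542A (zh) * | 2020-03-03 | 2020-06-30 | 腾讯科技(深圳)有限公司 | Training method and apparatus for an image classification model, computer device, and storage medium |
CN111639710A (zh) * | 2020-05-29 | 2020-09-08 | 北京百度网讯科技有限公司 | Image recognition model training method, apparatus, device, and storage medium |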
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180268292A1 (en) * | 2017-03-17 | 2018-09-20 | Nec Laboratories America, Inc. | Learning efficient object detection models with knowledge distillation |
US11410029B2 (en) * | 2018-01-02 | 2022-08-09 | International Business Machines Corporation | Soft label generation for knowledge distillation |
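CN109034219B (zh) * | 2018-07-12 | 2021-08-24 | 上海商汤智能科技有限公司 | Multi-label category prediction method and apparatus for images, electronic device, and storage medium |
CA3076424A1 (en) * | 2019-03-22 | 2020-09-22 | Royal Bank Of Canada | System and method for knowledge distillation between neural networks |
CN111738401A (zh) * | 2019-03-25 | 2020-10-02 | 北京三星通信技术研究有限公司 | Model optimization method, group compression method, and corresponding apparatus and device |
CN110674880B (zh) * | 2019-09-27 | 2022-11-11 | 北京迈格威科技有限公司 | Network training method, apparatus, medium, and electronic device for knowledge distillation |
CN110852426B (zh) * | 2019-11-19 | 2023-03-24 | 成都晓多科技有限公司 | Knowledge-distillation-based ensemble acceleration method and apparatus for pre-trained models |
CN111210000B (zh) * | 2019-12-18 | 2021-11-23 | 浙江工业大学 | Incremental learning method for modulated signals based on fixed features |
CN111402311B (zh) * | 2020-03-09 | 2023-04-14 | 福建帝视信息科技有限公司 | Lightweight stereo disparity estimation method based on knowledge distillation |
CN111461212B (zh) * | 2020-03-31 | 2023-04-07 | 中国科学院计算技术研究所 | Compression method for a point cloud object detection model |
CN111667728B (zh) * | 2020-06-18 | 2021-11-30 | 思必驰科技股份有限公司 | Training method and apparatus for a speech post-processing module |
CN111738436B (zh) * | 2020-06-28 | 2023-07-18 | 电子科技大学中山学院 | Model distillation method and apparatus, electronic device, and storage medium |
CN111597374B (zh) * | 2020-07-24 | 2020-10-27 | 腾讯科技(深圳)有限公司 | Image classification method and apparatus, and electronic device |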
2020
- 2020-10-13: CN application CN202011089981.3A, published as CN112184508B (zh); status: Active
- 2020-11-05: WO application PCT/CN2020/126837, published as WO2022077646A1 (zh); status: Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
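CN115700845A (zh) * | 2022-11-15 | 2023-02-07 | 智慧眼科技股份有限公司 | Face recognition model training method, face recognition method, apparatus, and related device |
CN115700845B (zh) * | 2022-11-15 | 2023-08-11 | 智慧眼科技股份有限公司 | Face recognition model training method, face recognition method, apparatus, and related device |
CN117726884A (zh) * | 2024-02-09 | 2024-03-19 | 腾讯科技(深圳)有限公司 | Training method for an object category recognition model, object category recognition method, and apparatus |
CN117726884B (zh) * | 2024-02-09 | 2024-05-03 | 腾讯科技(深圳)有限公司 | Training method for an object category recognition model, object category recognition method, and apparatus |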
Also Published As
Publication number | Publication date |
---|---|
CN112184508A (zh) | 2021-01-05 |
CN112184508B (zh) | 2021-04-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20957424; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 20957424; Country of ref document: EP; Kind code of ref document: A1 |