CN114005015A - Model training method, electronic device, and computer-readable storage medium - Google Patents

Model training method, electronic device, and computer-readable storage medium Download PDF

Info

Publication number
CN114005015A
CN114005015A CN202111614740.0A CN202111614740A CN114005015A CN 114005015 A CN114005015 A CN 114005015A CN 202111614740 A CN202111614740 A CN 202111614740A CN 114005015 A CN114005015 A CN 114005015A
Authority
CN
China
Prior art keywords
model
feature
training
class
training samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111614740.0A
Other languages
Chinese (zh)
Other versions
CN114005015B (en
Inventor
浦煜
何武
付贤强
朱海涛
户磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Dilusense Technology Co Ltd
Original Assignee
Beijing Dilusense Technology Co Ltd
Hefei Dilusense Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dilusense Technology Co Ltd, Hefei Dilusense Technology Co Ltd filed Critical Beijing Dilusense Technology Co Ltd
Priority to CN202111614740.0A priority Critical patent/CN114005015B/en
Publication of CN114005015A publication Critical patent/CN114005015A/en
Application granted granted Critical
Publication of CN114005015B publication Critical patent/CN114005015B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application relates to the technical field of visual search, and discloses a model training method, electronic equipment and a computer-readable storage medium, wherein the method comprises the following steps: obtaining a training sample of a first model; labeling labels of feature classes for representing features of the training samples by the training samples; constructing a second model based on the network structure of the first model; obtaining a category central vector of each feature category corresponding to the third model according to the training sample and the third model; wherein, the first model and the third model are models with the same function; determining the classification layer weight of the second model according to the class central vector of each feature class; the second model is subjected to iterative training according to the training samples and the labels, parameters of the second model except for the weight of the classification layer are updated, the feature set extracted by the trained model can be directly compared with the feature library extracted by the used model, time and labor are saved, the cost is reduced, and the convenience of model industrial deployment is greatly improved.

Description

Model training method, electronic device, and computer-readable storage medium
Technical Field
The embodiment of the application relates to the technical field of visual matching and searching, in particular to a model training method, electronic equipment and a computer-readable storage medium.
Background
With the increasing maturity of visual matching and searching technology, recognition models based on visual matching and searching technology are widely used in many fields, such as image retrieval, pedestrian re-recognition, vehicle re-recognition, face recognition, etc., and these recognition models based on visual matching and searching technology are mapped to a feature embedding space through a deep neural network image, in this feature space, features of the same category are similar to each other and are gathered into a category, generally speaking, for large-scale image data in a data retrieval library, features of these image data are extracted in advance by the recognition models, the features of the image to be queried form a feature library gallery, and features of the image to be queried are extracted in real time by the recognition models, these features constitute a feature set probe of the image to be queried, the recognition models can traverse each feature in the probe, retrieve the features most similar to the template from the gallery, and returns corresponding information.
In an actual application scenario, in order to enable a user to obtain better use experience, an identification model based on a visual matching and search technology needs to be continuously updated in an iterative manner, however, after the identification model is updated, features in a probe are extracted by using a new model, but features in a galery are extracted by using an old model, in order to ensure that the features between the probe and the galery are consistent, technicians need to use the new model to extract features of original image data corresponding to the galery again, the process is very long in time-consuming and high in cost, and in some scenarios with high safety requirements, the original image data corresponding to the galery are automatically deleted after the galery is generated, the features cannot be extracted again, the features between the probe and the galery cannot be ensured to be consistent, and direct comparison cannot be performed.
Disclosure of Invention
An object of the embodiments of the present application is to provide a model training method, an electronic device, and a computer-readable storage medium, in which a feature set extracted from a trained model can be directly compared with a feature library extracted from an old model, so that time and labor are saved, cost is reduced, and convenience of model industrial deployment is greatly improved.
In order to solve the above technical problem, an embodiment of the present application provides a model training method, including the following steps: obtaining a training sample of a first model; wherein the training samples are marked with labels, and the labels are used for characterizing feature classes of the features of the training samples; constructing a second model based on the network structure of the first model; according to the training sample and a third model, obtaining a category central vector of each feature category corresponding to the third model; wherein the first model and the third model are functionally identical models; determining the classification layer weight of the second model according to the class central vector of each feature class; and performing iterative training on the second model according to the training samples and the labels, and updating parameters of the second model except the classification layer weight.
An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method described above.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the model training method described above.
The model training method, the electronic device, and the computer-readable storage medium provided in the embodiments of the present application, first obtain a training sample of a first model, label a feature class of the feature of the training sample on the training sample, then construct a second model based on a network structure of the first model, obtain a class center vector of each feature class corresponding to the third model according to the training sample of the first model and a third model having the same function as the first model, determine a classification level weight of the second model according to the class center vector of each feature class corresponding to the third model, and finally perform iterative training on the second model according to the training sample of the first model and the label labeled on the training sample, update parameters of the second model other than the classification level weight, in consideration of consistency between a feature set extracted by a new model and a feature library extracted by an old model, the method comprises the steps of constructing a compatible model based on a network structure of the new model, determining classification layer weights of the compatible model according to classification center vectors of all feature classes corresponding to the old model, performing iterative training on the compatible model by using training samples and labels of the new model, updating parameters except the classification layer weights of the compatible model, forcing the classification center of each feature class corresponding to the trained compatible model to be matched with the classification center of each feature class corresponding to the old model, directly comparing feature sets extracted by the trained compatible model with the feature library extracted by the old model without performing feature extraction again, time and labor are saved, the cost is reduced, and the convenience of industrial deployment of the model is greatly improved.
In addition, the number of the training samples is several, and the iteratively training the second model according to the training samples and the labels to update the parameters of the second model except the classification layer weight includes: inputting the training samples into the first model and the second model respectively, and obtaining first feature vectors extracted from the training samples by the first model and second feature vectors extracted from the training samples by the second model respectively; constructing a loss function according to the number of the training samples, the total number of the feature classes, the first feature vectors, the second feature vectors, the labels and the classification layer weight; the method comprises the steps of taking a loss function as supervision, carrying out iterative training on a second model based on a small-batch gradient descent method, updating parameters of the second model except for the weight of a classification layer until the loss function meets a preset convergence condition, taking a new model as a teacher model, taking a compatible model as a student model, and training the compatible model based on a model distillation principle, namely, when the compatible model is trained, referring to and using characteristics extracted from each training sample by the new model, so that the compatible model can quickly obtain the functions and performances of the new model, and the training effect of the compatible model is further improved.
Additionally, the loss function includes a first loss term, the constructing a loss function includes: according to the number of the training samples and the total number of the feature classes, the second feature vectors, the labels, the classification layer weights and a preset softmax cross entropy loss function, the first loss item is constructed, the softmax cross entropy loss function is used as the first loss item, the construction is simple, the calculation amount is small, and the speed of compatible model training can be improved.
In addition, the loss function further includes a second loss term, the loss function is a sum of the first loss term and the second loss term, and after the first loss term is constructed, the method further includes: calculating a first Euclidean distance between every two training samples of the same feature type in each feature type according to each first feature vector and the label, and determining the number of times of calculation corresponding to each feature type; calculating the mean value of the first Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each first Euclidean distance; calculating a second Euclidean distance between every two training samples of the same feature type in each feature type according to each second feature vector and the label; calculating the mean value of the second Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each second Euclidean distance; the second loss item is constructed according to the mean value of the first Euclidean distance, the mean value of the second Euclidean distance and a preset L1 loss function, and considering that if only a softmax cross entropy function is used as the loss function, a trained compatible model is restricted by the performance of an old model, namely the performance bottleneck of the old model cannot be broken, the high performance advantage of the new model is lost, therefore, the second loss item is further arranged in the embodiment, the compatible model is driven to learn the relative structure information of the feature distribution in the same feature category of the new model, the intra-class compactness is better, and the performance of the compatible model is greatly improved.
In addition, the number of the training samples is several, and the obtaining of the category center vector of each feature category corresponding to the third model according to the training samples and the third model includes: inputting each training sample into the third model, and obtaining each third feature vector extracted from each training sample by the third model; according to the third feature vectors and the labels, a third mean value of the third feature vectors corresponding to the same feature type in the feature types is calculated, and the third mean value of the third feature vectors corresponding to the same feature type is used as a type central vector of the feature type corresponding to the third model, so that the determined type central vector of each feature type can be more accurate and can represent the feature type of the user.
In addition, the determining the classification layer weight of the second model according to the class center vector of each feature class includes: respectively transposing the category central vectors of the feature categories; splicing the category central vectors of the feature categories after the conversion into a parameter matrix; the parameter matrix is used as the classification layer weight of the second model, the classification central vectors of the transformed feature classes are spliced into the parameter matrix to be used as the classification layer weight of the second model, so that the compatible model can be better matched with the features of the old model, and the consistency of the features between the feature set extracted by the compatible model and the features in the feature library extracted by the old model is further ensured.
Drawings
One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.
FIG. 1 is a flow diagram of model training according to an embodiment of the present application;
FIG. 2 is a flow diagram of iteratively training a second model based on training samples and labels to update parameters of the second model other than classification level weights, according to an embodiment of the present application;
FIG. 3 is a flow diagram of constructing a second lossy term in accordance with an embodiment of the present application;
FIG. 4 is a flowchart illustrating obtaining a class center vector of each feature class corresponding to a third model according to a training sample and the third model according to an embodiment of the present application;
FIG. 5 is a flow diagram for determining classification layer weights for a second model based on class center vectors for feature classes, according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. However, it will be appreciated by those of ordinary skill in the art that in the examples of the present application, numerous technical details are set forth in order to provide a better understanding of the present application. However, the technical solution claimed in the present application can be implemented without these technical details and various changes and modifications based on the following embodiments. The following embodiments are divided for convenience of description, and should not constitute any limitation to the specific implementation manner of the present application, and the embodiments may be mutually incorporated and referred to without contradiction.
An embodiment of the present application relates to a model training method, which is applied to an electronic device, where the electronic device may be a terminal or a server, and the electronic device in this embodiment and the following embodiments are described by taking the server as an example.
The model trained by the embodiment of the present application may be: an image retrieval model for performing image retrieval by using an image recognition image; a vehicle identification model for vehicle identification of traffic; a re-identification model for re-identifying pedestrians in shopping malls and stations; the face recognition model is used for security monitoring and face payment.
The flowchart of the model training method in this embodiment may be as shown in fig. 1, and includes:
step 101, a training sample of the first model is obtained, and the training sample is labeled with a label.
Specifically, when the server trains the second model, the training samples of the first model may be obtained first, the training samples of the first model are all labeled with labels, and the labels labeled on the training samples are used to characterize the feature categories of the features of the training samples.
In one example, in order to obtain a better user experience, the recognition model based on visual matching and search techniques needs to be continuously updated iteratively, such as by using more advanced algorithms, adapting old models to new models using new parameters, etc., resulting in new models, whereas when using new models, the consistency of the features extracted by the new model and the features extracted by the old model needs to be ensured, otherwise, the features cannot be matched, technicians need to use the new model to extract the features of the original data corresponding to the feature library extracted by the old model again, the process is very long in time consumption and very high in cost, and when the original data is deleted, the new model cannot extract features, so the method builds and trains a compatible model (namely, a second model) based on the new model (namely, a first model) so as to be compatible with the features of the old model (a third model).
Step 102, constructing a second model based on the network structure of the first model.
In a specific implementation, the server may obtain a model file of the first model while obtaining a training sample of the first model, so as to specify a network structure of the first model, and construct the second model based on the network structure of the first model.
In one example, the server may directly treat the first model as the second model.
In one example, the server may take the network structure of the first model as the network structure of the second model and initialize the parameters of the layers of the second model.
And 103, acquiring a category center vector of each feature category corresponding to the third model according to the training sample and the third model.
In particular, the first model and the third model are functionally identical models, i.e. the first model and the third model may perform the same task, but the first model is not identical to the third model.
In one example, the first model and the third model use different algorithms.
In one example, the first model differs from the third model in network structure.
In specific implementation, after obtaining the training sample of the first model, the server may obtain a model file of the third model to obtain the third model, and then input the training sample into the third model to obtain a category central vector of each feature category corresponding to the third model, where each feature category corresponding to the first model, each feature category corresponding to the second model, and each feature category corresponding to the third model are the same, that is, the feature categories that can be extracted by the three models are the same.
In an example, the server may first obtain a class center vector of each feature class corresponding to the third model according to the training sample and the third model, and then construct the second model based on the network structure of the first model.
And step 104, determining the classification layer weight of the second model according to the class center vector of each feature class.
In a specific implementation, after the server obtains the class center vector of each feature class corresponding to the third model, the server may determine the classification layer weight of the second model based on the class center vector of each feature class corresponding to the third model, and lock the classification layer weight of the second model, that is, the classification layer weight of the second model is not updated in the subsequent iterative training of the second model.
And 105, performing iterative training on the second model according to the training samples and the labels, and updating parameters of the second model except the classification layer weight.
In an example, after determining the classification layer weight of the second model, the server may input the training sample into the second model to obtain an output result of the second model, perform iterative training on the second model according to the output result of the second model, the label marked on the training sample, and a preset loss function, update all parameters of the second model except the classification layer weight, such as the weight of each convolution layer, the bias of each convolution layer, and the like, after each training, and force the class center of each feature class corresponding to the trained compatible model to match the class center of each feature class corresponding to the old model.
In one example, the server may iteratively train the second model based on a stochastic gradient descent method.
In one example, the number of training samples is several, and the server may iteratively train the second model based on a small batch gradient descent method.
In one example, the number of training samples is several, and the server may iteratively train the second model based on a batch gradient descent method.
In this embodiment, a server first obtains a training sample of a first model, a label of a feature class for characterizing features of the training sample is labeled on the training sample, then a second model is constructed based on a network structure of the first model, a class center vector of each feature class corresponding to the third model is obtained according to the training sample of the first model and a third model having the same function as the first model, a classification layer weight of the second model is determined according to the class center vector of each feature class corresponding to the third model, finally, iterative training is performed on the second model according to the training sample of the first model and the label labeled on the training sample, parameters of the second model except the classification layer weight are updated, in consideration of the fact that feature sets extracted by a new model are consistent with features extracted by an old model, feature extraction needs to be performed on original data corresponding to a feature base extracted by the old model again by the new model, but the whole process is very long in time consumption and very high in cost, and when the original data is deleted, the original data cannot be re-extracted, and the embodiment of the application constructs a compatible model based on the network structure of the new model, determining classification layer weight of a compatible model according to the class center vector of each feature class corresponding to the old model, performing iterative training on the compatible model by using a training sample and a label of a new model, updating parameters except the classification layer weight of the compatible model, forcing the class center of each feature class corresponding to the trained compatible model to be matched with the class center of each feature class corresponding to the old model, and directly comparing the feature set extracted by using the trained compatible model with a feature library extracted by using the old model without performing feature extraction again, so that the method is time-saving and labor-saving, reduces cost and greatly improves the convenience of model industrial deployment.
In one embodiment, the number of training samples is several, the server performs iterative training on the second model based on a small batch gradient descent method, the server performs iterative training on the second model according to the training samples and the labels, and updates parameters of the second model except for the classification layer weight, which can be implemented through the steps shown in fig. 2, and specifically includes:
step 201, inputting each training sample into the first model and the second model, respectively, and obtaining each first feature vector extracted from each training sample by the first model and each second feature vector extracted from each training sample by the second model, respectively.
In a specific implementation, the server may input each training sample, that is, a small batch (mini-batch) of training samples, into the first model and the second model, respectively, where the first model and the second model extract features from each training sample, and the server obtains each first feature vector extracted from each training sample by the first model, and each second feature vector extracted from each training sample by the second model.
In one example, each first feature vector extracted from each training sample by the first model and each second feature vector extracted from each training sample by the second model are N-dimensional feature vectors, and the first feature vector extracted from the training sample x by the first model is represented as f1(x) The second feature vector extracted from the training sample x by the second model is represented as f2(x)。
Step 202, constructing a loss function according to the number of the training samples, the total number of the feature classes, the first feature vectors, the second feature vectors, the labels of the training samples and the classification layer weight.
In the specific implementation, the server does not use a preset loss function for training, but constructs the loss function for training by itself, and after obtaining each first feature vector and each second feature vector, the server constructs the loss function by itself according to the number of training samples, the total number of feature classes, each first feature vector, each second feature vector, labels of the training samples and classification layer weights, that is, the server uses the first model as a teacher model, uses the second model as a student model, and refers to and uses the features extracted from each training sample by the first model during training based on the principle of model distillation, so that the second model can rapidly obtain the functions and performances of the first model.
And step 203, performing iterative training on the second model based on a small batch gradient descent method by taking the loss function as supervision, and updating parameters of the second model except for the weight of the classification layer until the loss function meets a preset convergence condition.
In a specific implementation, after each training, the server determines whether the constructed loss function meets a preset convergence condition, if the loss function meets the preset convergence condition, the server stores parameters of the second model at the time, issues the trained second model, and if the loss function does not meet the preset convergence condition, the server continues to perform iterative training on the second model, where the preset convergence condition may be set by a person skilled in the art according to actual needs, and the embodiment of the present application is not specifically limited to this.
In this embodiment, the training samples are a plurality of samples, and the iteratively training the second model according to the training samples and the labels to update the parameters of the second model except the weight of the classification layer includes: inputting the training samples into the first model and the second model respectively, and obtaining first feature vectors extracted from the training samples by the first model and second feature vectors extracted from the training samples by the second model respectively; constructing a loss function according to the number of the training samples, the total number of the feature classes, the first feature vectors, the second feature vectors, the labels and the classification layer weight; the method comprises the steps of taking a loss function as supervision, carrying out iterative training on a second model based on a small-batch gradient descent method, updating parameters of the second model except for the weight of a classification layer until the loss function meets a preset convergence condition, taking a new model as a teacher model, taking a compatible model as a student model, and training the compatible model based on a model distillation principle, namely, when the compatible model is trained, referring to and using characteristics extracted from each training sample by the new model, so that the compatible model can quickly obtain the functions and performances of the new model, and the training effect of the compatible model is further improved.
In one embodiment, the number of the training samples is several, the server performs iterative training on the second model based on a small batch gradient descent method, the loss function constructed by the server includes a first loss item, the server can construct the first loss item according to the number of the training samples, the total number of the feature classes, each second feature vector, the labels of the training samples, the classification layer weights and a preset softmax cross entropy loss function, the softmax cross entropy loss function is used as the first loss item, the construction is simple, the calculation amount is small, and the speed of compatible model training can be increased.
In one example, the first loss term constructed by the server can be expressed by the following formula:
Figure 781469DEST_PATH_IMAGE001
in the formula, LsoftmaxFor the first loss term to construct, R is the number of training samples, xiFor the ith training sample, yiFor the label of the ith training sample, wTAs classification level weight, f2(x) And K is the total number of the feature classes.
In one embodiment, the number of training samples is several, the server performs iterative training on the second model based on a small batch gradient descent method, the loss function constructed by the server includes a second loss term, the loss function constructed by the server is the sum of the first loss term and the second loss term, and after the server constructs the first loss term, the server may construct the second loss term according to the steps shown in fig. 3, which specifically includes:
step 301, calculating a first euclidean distance between every two training samples of the same feature type in each feature type according to each first feature vector and each label, and determining the number of times of calculation corresponding to each feature type.
In the specific implementation, the server determines the number of training samples, that is, the size of a batch of small-batch gradient descent is G, the feature classes that can be extracted by the first model, the second model and the third model are the same, the number of the feature classes that can be extracted by the three models is K, because the label is used for representing the feature class of the training samples, the server can determine the number of training samples corresponding to each feature class according to the label, the server sequentially traverses each feature class, calculates the first euclidean distance between every two training samples corresponding to the current feature class according to each first feature vector, and simultaneously determines the number of times that the current feature class needs to calculate the first euclidean distance.
In one example, three feature classes, feature class a, feature class b, and feature class c, are shared: 4 training samples are corresponding to the characteristic class A, and the server can calculate the number of times of calculation corresponding to the characteristic class A to be 6; the characteristic class B corresponds to 3 training samples, and the server can calculate the calculation times corresponding to the characteristic class B to be 3 times; the feature class C corresponds to 5 training samples, and the server can calculate the number of times of calculation corresponding to the feature class C to be 10.
Step 302, calculating a mean value of the first euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each first euclidean distance.
Specifically, after the server calculates each first euclidean distance and determines the number of calculations corresponding to each feature type, the server may calculate an average value of the first euclidean distances corresponding to each feature type based on the number of calculations corresponding to each feature type and each first euclidean distance.
In an example, the server calculates a mean value of the first euclidean distances corresponding to each feature class, which may be implemented by the following formula:
Figure 911099DEST_PATH_IMAGE002
Figure DEST_PATH_IMAGE003
where K is the total number of feature classes, N (K) is the set of training samples for the kth feature class, φ1k(xi,xj) Is the mean value of the first Euclidean distance corresponding to the kth feature class, BkThe number of calculations corresponding to the kth feature class, f1(xi) For the first feature vector of the i-th training sample in the k-th feature class, f1(xj) A first feature vector for a jth training sample in the kth feature class.
Step 303, calculating a second euclidean distance between every two training samples of the same feature type in each feature type according to each second feature vector and each label.
In a specific implementation, as the label is determined, the number of times that each feature type needs to calculate the first euclidean distance is the same as the number of times that each feature type needs to calculate the second euclidean distance, the server sequentially traverses each feature type, and calculates the second euclidean distance between every two training samples corresponding to the current feature type according to each second feature vector.
And 304, calculating the mean value of the second Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each second Euclidean distance.
Specifically, after the server calculates each second euclidean distance and determines the number of calculations corresponding to each feature type, the server may calculate an average value of the second euclidean distances corresponding to each feature type based on the number of calculations corresponding to each feature type and each second euclidean distance.
In an example, the server calculates a mean value of the second euclidean distances corresponding to each feature class, which may be implemented by the following formula:
Figure 928733DEST_PATH_IMAGE004
Figure 485617DEST_PATH_IMAGE005
where K is the total number of feature classes, N (K) is the set of training samples for the kth feature class, φ1k(xi,xj) Is the mean value of the first Euclidean distance corresponding to the kth feature class, BkThe number of calculations corresponding to the kth feature class, f2(xi) For the second feature vector of the i-th training sample in the k-th feature class, f2(xj) A second feature vector for a jth training sample in the kth feature class.
Step 305, constructing a second loss term according to the mean value of the first euclidean distance, the mean value of the second euclidean distance and a preset L1 loss function.
In one example, the server may construct the second loss term according to the mean value of the first euclidean distance, the mean value of the second euclidean distance, and a preset L1 loss function by the following formula:
Figure 549388DEST_PATH_IMAGE006
in the formula, Ldisλ is a preset balance factor, K is the total number of feature classes, n (K) is a set of training samples of the kth feature class, SmoothL1Is a preset L1 loss function, phi1k(xi,xj) Is the mean value of the first Euclidean distance, phi, corresponding to the kth feature class2k(xi,xj) Is the mean value of the second Euclidean distance corresponding to the kth feature class, BkThe number of calculations corresponding to the kth feature class, f1(xi) For the first feature vector of the i-th training sample in the k-th feature class, f2(xi) For the second feature vector of the i-th training sample in the k-th feature class, f1(xj) For the first feature vector of the jth training sample in the kth feature class, f1(xj) A second feature vector for the jth training sample in the kth feature class.
In one example, the server-constructed loss function includes a first loss term and a second loss term, and the loss function can be expressed as: l = Lsoftmax+Ldis,LsoftmaxIs the first loss term, LdisIs the second loss term.
In this embodiment, the loss function further includes a second loss term, and after constructing the first loss term, the method further includes: calculating a first Euclidean distance between every two training samples of the same feature type in each feature type according to each first feature vector and the label, and determining the number of times of calculation corresponding to each feature type; calculating the mean value of the first Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each first Euclidean distance; calculating a second Euclidean distance between every two training samples of the same feature type in each feature type according to each second feature vector and the label; calculating the mean value of the second Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each second Euclidean distance; the second loss item is constructed according to the mean value of the first Euclidean distance, the mean value of the second Euclidean distance and a preset L1 loss function, and considering that if only a softmax cross entropy function is used as the loss function, a trained compatible model is restricted by the performance of an old model, namely the performance bottleneck of the old model cannot be broken, the high performance advantage of the new model is lost, therefore, the second loss item is further arranged in the embodiment, the compatible model is driven to learn the relative structure information of the feature distribution in the same feature category of the new model, the intra-class compactness is better, and the performance of the compatible model is greatly improved.
In an embodiment, the number of the training samples is several, and the server obtains the category center vector of each feature category corresponding to the third model according to the training samples and the third model, which may be implemented by the steps shown in fig. 4, and specifically includes:
step 401, inputting each training sample into the third model, and obtaining each third feature vector extracted from each training sample by the third model.
In a specific implementation, the server may input each training sample, that is, a small batch (mini-batch) of training samples, into the third model, where the third model extracts features from each training sample, and the server obtains each third feature vector extracted from each training sample by the third model.
In one example, each of the third feature vectors extracted from the training samples by the third model is an N-dimensional feature vector, and the third feature vector extracted from the training sample x by the third model is represented as f3(x)。
Step 402, according to the third feature vectors and the labels, calculating a third mean value of the third feature vectors corresponding to the same feature type in the feature types, and using the third mean value of the third feature vectors corresponding to the same feature type as a type center vector of the feature type corresponding to the third model.
Specifically, after obtaining each third feature vector, the server may calculate a third mean value of the third feature vectors corresponding to the same feature class in each feature class according to each third feature vector and the label, and use the third mean value of the third feature vectors corresponding to the same feature class as a class center vector of the feature class corresponding to the third model, so that the determined class center vector of each feature class is more accurate and can represent the feature class of the server.
In one example, the server may calculate a third mean value of third feature vectors corresponding to the same feature class in each feature class and use the third mean value of the third feature vectors corresponding to the same feature class as a class center vector of the feature class corresponding to the third model by using the following formula:
Figure 482708DEST_PATH_IMAGE007
in the formula, wkClass center vector, m, for the kth feature classkThe number of training samples for the kth feature class, N (k) the set of training samples for the kth feature class, f3(x) K is the total number of feature classes as the third feature vector.
In an embodiment, the determining, by the server, the classification layer weight of the second model according to the class center vector of each feature class may be implemented by the steps shown in fig. 5, which specifically include:
in step 501, the class center vectors of the feature classes are transposed.
And 502, splicing the category central vectors of the transferred feature categories into a parameter matrix.
Step 503, using the parameter matrix as the classification layer weight of the second model.
In specific implementation, the server transposes the category center vectors of the feature categories, and splices the transposed category center vectors of the feature categories into a parameter matrix, where the spliced parameter matrix can be represented as W = [ W = [ ]1 T,w2 T,…,wK T]The server takes the parameter matrix as the secondThe classification layer weight of the model can enable the compatible model to better match the features of the old model, and further ensure the consistency of the features between the feature set extracted by the compatible model and the feature library extracted by the old model.
The steps of the above methods are divided for clarity, and the implementation may be combined into one step or split some steps, and the steps are divided into multiple steps, so long as the same logical relationship is included, which are all within the protection scope of the present patent; it is within the scope of the patent to add insignificant modifications to the algorithms or processes or to introduce insignificant design changes to the core design without changing the algorithms or processes.
Another embodiment of the present application relates to an electronic device, as shown in fig. 6, including: at least one processor 601; and a memory 602 communicatively coupled to the at least one processor 601; the memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 to enable the at least one processor 601 to perform the model training method in the above embodiments.
Where the memory and processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting together one or more of the various circuits of the processor and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.
Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the present application, and that various changes in form and details may be made therein without departing from the spirit and scope of the present application in practice.

Claims (10)

1. A method of model training, comprising:
obtaining a training sample of a first model; wherein the training samples are marked with labels, and the labels are used for characterizing feature classes of the features of the training samples;
constructing a second model based on the network structure of the first model;
according to the training sample and a third model, obtaining a category central vector of each feature category corresponding to the third model; wherein the first model and the third model are functionally identical models;
determining the classification layer weight of the second model according to the class central vector of each feature class;
and performing iterative training on the second model according to the training samples and the labels, and updating parameters of the second model except the classification layer weight.
2. The model training method according to claim 1, wherein the number of the training samples is several, and the iteratively training the second model according to the training samples and the labels to update the parameters of the second model except the classification layer weights comprises:
inputting the training samples into the first model and the second model respectively, and obtaining first feature vectors extracted from the training samples by the first model and second feature vectors extracted from the training samples by the second model respectively;
constructing a loss function according to the number of the training samples, the total number of the feature classes, the first feature vectors, the second feature vectors, the labels and the classification layer weight;
and carrying out iterative training on the second model based on a small-batch gradient descent method by taking the loss function as supervision, and updating parameters of the second model except the weight of the classification layer until the loss function meets a preset convergence condition.
3. The model training method of claim 2, wherein the loss function comprises a first loss term, and wherein constructing the loss function comprises:
and constructing the first loss item according to the number of the training samples, the total number of the feature classes, the second feature vectors, the labels, the classification layer weights and a preset softmax cross entropy loss function.
4. The model training method of claim 3, wherein the loss function further comprises a second loss term, the loss function being a sum of the first loss term and the second loss term, and further comprising, after constructing the first loss term:
calculating a first Euclidean distance between every two training samples of the same feature type in each feature type according to each first feature vector and the label, and determining the number of times of calculation corresponding to each feature type;
calculating the mean value of the first Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each first Euclidean distance;
calculating a second Euclidean distance between every two training samples of the same feature type in each feature type according to each second feature vector and the label;
calculating the mean value of the second Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each second Euclidean distance;
and constructing the second loss term according to the mean value of the first Euclidean distance, the mean value of the second Euclidean distance and a preset L1 loss function.
5. The model training method according to claim 4, wherein the second loss term is constructed from the mean value of the first Euclidean distance, the mean value of the second Euclidean distance, and a preset L1 loss function by the following formula:
Figure 512173DEST_PATH_IMAGE001
wherein L isdisλ is a preset balance factor, K is the total number of the feature classes, n (K) is a set of training samples of the kth feature class, SmoothL1Is the predetermined L1 loss function, phi1k(xi,xj) Is the mean value of the first Euclidean distance, phi, corresponding to the kth feature class2k(xi,xj) Is the mean value of the second Euclidean distance corresponding to the kth feature class, BkThe number of calculations corresponding to the kth feature class, f1(xi) For the first feature vector of the i-th training sample in the k-th feature class, f2(xi) A second feature vector, f, for the ith training sample in the kth feature class1(xj) A first feature vector, f, for a jth training sample in the kth feature class1(xj) A second feature vector for a jth training sample in the kth feature class.
6. The model training method according to any one of claims 1 to 5, wherein the number of the training samples is several, and the obtaining of the class center vector of each feature class corresponding to the third model according to the training samples and the third model includes:
inputting each training sample into the third model, and obtaining each third feature vector extracted from each training sample by the third model;
and calculating a third mean value of third feature vectors corresponding to the same feature type in the feature types according to the third feature vectors and the labels, and taking the third mean value of the third feature vectors corresponding to the same feature type as a type central vector of the feature type corresponding to the third model.
7. The model training method according to claim 6, wherein a third mean value of third feature vectors corresponding to a same feature class in the feature classes is calculated and used as a class center vector of the feature class corresponding to the third model by using the following formula:
Figure 577781DEST_PATH_IMAGE002
wherein, wkClass center vector, m, for the kth feature classkTraining samples for the kth feature classN (k) is the set of training samples of the kth feature class, f3(x) K is the total number of feature classes for the third feature vector.
8. The model training method of claim 6, wherein the determining the classification layer weight of the second model according to the class center vector of each feature class comprises:
respectively transposing the category central vectors of the feature categories;
splicing the category central vectors of the feature categories after the conversion into a parameter matrix;
and taking the parameter matrix as the classification layer weight of the second model.
9. An electronic device, comprising:
at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1 to 8.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the model training method of any one of claims 1 to 8.
CN202111614740.0A 2021-12-28 2021-12-28 Training method of image recognition model, electronic device and storage medium Active CN114005015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111614740.0A CN114005015B (en) 2021-12-28 2021-12-28 Training method of image recognition model, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111614740.0A CN114005015B (en) 2021-12-28 2021-12-28 Training method of image recognition model, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114005015A (en) 2022-02-01
CN114005015B CN114005015B (en) 2022-05-31

Family

ID=79932078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111614740.0A Active CN114005015B (en) 2021-12-28 2021-12-28 Training method of image recognition model, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114005015B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134469A1 (en) * 2018-10-30 2020-04-30 Samsung Sds Co., Ltd. Method and apparatus for determining a base model for transfer learning
US10565471B1 (en) * 2019-03-07 2020-02-18 Capital One Services, Llc Systems and methods for transfer learning of neural networks
WO2021012526A1 (en) * 2019-07-22 2021-01-28 平安科技(深圳)有限公司 Face recognition model training method, face recognition method and apparatus, device, and storage medium
CN111444958A (en) * 2020-03-25 2020-07-24 北京百度网讯科技有限公司 Model migration training method, device, equipment and storage medium
CN112395986A (en) * 2020-11-17 2021-02-23 广州像素数据技术股份有限公司 Face recognition method for quickly migrating new scene and preventing forgetting
CN112488209A (en) * 2020-11-25 2021-03-12 南京大学 Incremental image classification method based on semi-supervised learning
CN113343804A (en) * 2021-05-26 2021-09-03 武汉大学 Integrated migration learning classification method and system for single-view fully-polarized SAR data

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114565807A (en) * 2022-03-03 2022-05-31 腾讯科技(深圳)有限公司 Method and device for training target image retrieval model
CN114399058A (en) * 2022-03-25 2022-04-26 腾讯科技(深圳)有限公司 Model updating method, related device, equipment and storage medium
CN114399058B (en) * 2022-03-25 2022-06-10 腾讯科技(深圳)有限公司 Model updating method, related device, equipment and storage medium

Also Published As

Publication number Publication date
CN114005015B (en) 2022-05-31

Similar Documents

Publication Publication Date Title
CN114005015B (en) Training method of image recognition model, electronic device and storage medium
CN104899579A (en) Face recognition method and face recognition device
CN107545038B (en) Text classification method and equipment
CN110598603A (en) Face recognition model acquisition method, device, equipment and medium
CN109063113A (en) A kind of fast image retrieval method based on the discrete Hash of asymmetric depth, retrieval model and model building method
CN110232154B (en) Random forest-based product recommendation method, device and medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN110619059A (en) Building marking method based on transfer learning
CN114329029B (en) Object retrieval method, device, equipment and computer storage medium
CN115129883B (en) Entity linking method and device, storage medium and electronic equipment
CN112949740A (en) Small sample image classification method based on multilevel measurement
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN114626380A (en) Entity identification method and device, electronic equipment and storage medium
CN109784407A (en) The method and apparatus for determining the type of literary name section
CN112418291A (en) Distillation method, device, equipment and storage medium applied to BERT model
CN110399547A (en) For updating the method, apparatus, equipment and storage medium of model parameter
CN113254687B (en) Image retrieval and image quantification model training method, device and storage medium
CN113869609A (en) Method and system for predicting confidence of frequent subgraph of root cause analysis
CN112069412B (en) Information recommendation method, device, computer equipment and storage medium
CN111782774B (en) Method and device for recommending problems
CN110262906B (en) Interface label recommendation method and device, storage medium and electronic equipment
CN111159999A (en) Method and device for filling word slot, electronic equipment and storage medium
CN112417260B (en) Localized recommendation method, device and storage medium
CN110019809A (en) A kind of classification determines method, apparatus and the network equipment
CN113901175A (en) Article relation judging method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220418

Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province

Applicant after: Hefei lushenshi Technology Co.,Ltd.

Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing

Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.

Applicant before: Hefei lushenshi Technology Co., Ltd

GR01 Patent grant
GR01 Patent grant