CN114005015A

CN114005015A - Model training method, electronic device, and computer-readable storage medium

Info

Publication number: CN114005015A
Application number: CN202111614740.0A
Authority: CN
Inventors: 浦煜; 何武; 付贤强; 朱海涛; 户磊
Original assignee: Beijing Dilusense Technology Co Ltd; Hefei Dilusense Technology Co Ltd
Current assignee: Hefei Dilusense Technology Co Ltd
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2022-02-01
Anticipated expiration: 2041-12-28
Also published as: CN114005015B

Abstract

The embodiment of the application relates to the technical field of visual search, and discloses a model training method, electronic equipment and a computer-readable storage medium, wherein the method comprises the following steps: obtaining a training sample of a first model; labeling labels of feature classes for representing features of the training samples by the training samples; constructing a second model based on the network structure of the first model; obtaining a category central vector of each feature category corresponding to the third model according to the training sample and the third model; wherein, the first model and the third model are models with the same function; determining the classification layer weight of the second model according to the class central vector of each feature class; the second model is subjected to iterative training according to the training samples and the labels, parameters of the second model except for the weight of the classification layer are updated, the feature set extracted by the trained model can be directly compared with the feature library extracted by the used model, time and labor are saved, the cost is reduced, and the convenience of model industrial deployment is greatly improved.

Description

Model training method, electronic device, and computer-readable storage medium

Technical Field

The embodiment of the application relates to the technical field of visual matching and searching, in particular to a model training method, electronic equipment and a computer-readable storage medium.

Background

With the increasing maturity of visual matching and searching technology, recognition models based on visual matching and searching technology are widely used in many fields, such as image retrieval, pedestrian re-recognition, vehicle re-recognition, face recognition, etc., and these recognition models based on visual matching and searching technology are mapped to a feature embedding space through a deep neural network image, in this feature space, features of the same category are similar to each other and are gathered into a category, generally speaking, for large-scale image data in a data retrieval library, features of these image data are extracted in advance by the recognition models, the features of the image to be queried form a feature library gallery, and features of the image to be queried are extracted in real time by the recognition models, these features constitute a feature set probe of the image to be queried, the recognition models can traverse each feature in the probe, retrieve the features most similar to the template from the gallery, and returns corresponding information.

In an actual application scenario, in order to enable a user to obtain better use experience, an identification model based on a visual matching and search technology needs to be continuously updated in an iterative manner, however, after the identification model is updated, features in a probe are extracted by using a new model, but features in a galery are extracted by using an old model, in order to ensure that the features between the probe and the galery are consistent, technicians need to use the new model to extract features of original image data corresponding to the galery again, the process is very long in time-consuming and high in cost, and in some scenarios with high safety requirements, the original image data corresponding to the galery are automatically deleted after the galery is generated, the features cannot be extracted again, the features between the probe and the galery cannot be ensured to be consistent, and direct comparison cannot be performed.

Disclosure of Invention

An object of the embodiments of the present application is to provide a model training method, an electronic device, and a computer-readable storage medium, in which a feature set extracted from a trained model can be directly compared with a feature library extracted from an old model, so that time and labor are saved, cost is reduced, and convenience of model industrial deployment is greatly improved.

In order to solve the above technical problem, an embodiment of the present application provides a model training method, including the following steps: obtaining a training sample of a first model; wherein the training samples are marked with labels, and the labels are used for characterizing feature classes of the features of the training samples; constructing a second model based on the network structure of the first model; according to the training sample and a third model, obtaining a category central vector of each feature category corresponding to the third model; wherein the first model and the third model are functionally identical models; determining the classification layer weight of the second model according to the class central vector of each feature class; and performing iterative training on the second model according to the training samples and the labels, and updating parameters of the second model except the classification layer weight.

An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method described above.

Embodiments of the present application also provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the model training method described above.

The model training method, the electronic device, and the computer-readable storage medium provided in the embodiments of the present application, first obtain a training sample of a first model, label a feature class of the feature of the training sample on the training sample, then construct a second model based on a network structure of the first model, obtain a class center vector of each feature class corresponding to the third model according to the training sample of the first model and a third model having the same function as the first model, determine a classification level weight of the second model according to the class center vector of each feature class corresponding to the third model, and finally perform iterative training on the second model according to the training sample of the first model and the label labeled on the training sample, update parameters of the second model other than the classification level weight, in consideration of consistency between a feature set extracted by a new model and a feature library extracted by an old model, the method comprises the steps of constructing a compatible model based on a network structure of the new model, determining classification layer weights of the compatible model according to classification center vectors of all feature classes corresponding to the old model, performing iterative training on the compatible model by using training samples and labels of the new model, updating parameters except the classification layer weights of the compatible model, forcing the classification center of each feature class corresponding to the trained compatible model to be matched with the classification center of each feature class corresponding to the old model, directly comparing feature sets extracted by the trained compatible model with the feature library extracted by the old model without performing feature extraction again, time and labor are saved, the cost is reduced, and the convenience of industrial deployment of the model is greatly improved.

In addition, the number of the training samples is several, and the iteratively training the second model according to the training samples and the labels to update the parameters of the second model except the classification layer weight includes: inputting the training samples into the first model and the second model respectively, and obtaining first feature vectors extracted from the training samples by the first model and second feature vectors extracted from the training samples by the second model respectively; constructing a loss function according to the number of the training samples, the total number of the feature classes, the first feature vectors, the second feature vectors, the labels and the classification layer weight; the method comprises the steps of taking a loss function as supervision, carrying out iterative training on a second model based on a small-batch gradient descent method, updating parameters of the second model except for the weight of a classification layer until the loss function meets a preset convergence condition, taking a new model as a teacher model, taking a compatible model as a student model, and training the compatible model based on a model distillation principle, namely, when the compatible model is trained, referring to and using characteristics extracted from each training sample by the new model, so that the compatible model can quickly obtain the functions and performances of the new model, and the training effect of the compatible model is further improved.

Additionally, the loss function includes a first loss term, the constructing a loss function includes: according to the number of the training samples and the total number of the feature classes, the second feature vectors, the labels, the classification layer weights and a preset softmax cross entropy loss function, the first loss item is constructed, the softmax cross entropy loss function is used as the first loss item, the construction is simple, the calculation amount is small, and the speed of compatible model training can be improved.

In addition, the loss function further includes a second loss term, the loss function is a sum of the first loss term and the second loss term, and after the first loss term is constructed, the method further includes: calculating a first Euclidean distance between every two training samples of the same feature type in each feature type according to each first feature vector and the label, and determining the number of times of calculation corresponding to each feature type; calculating the mean value of the first Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each first Euclidean distance; calculating a second Euclidean distance between every two training samples of the same feature type in each feature type according to each second feature vector and the label; calculating the mean value of the second Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each second Euclidean distance; the second loss item is constructed according to the mean value of the first Euclidean distance, the mean value of the second Euclidean distance and a preset L1 loss function, and considering that if only a softmax cross entropy function is used as the loss function, a trained compatible model is restricted by the performance of an old model, namely the performance bottleneck of the old model cannot be broken, the high performance advantage of the new model is lost, therefore, the second loss item is further arranged in the embodiment, the compatible model is driven to learn the relative structure information of the feature distribution in the same feature category of the new model, the intra-class compactness is better, and the performance of the compatible model is greatly improved.

In addition, the number of the training samples is several, and the obtaining of the category center vector of each feature category corresponding to the third model according to the training samples and the third model includes: inputting each training sample into the third model, and obtaining each third feature vector extracted from each training sample by the third model; according to the third feature vectors and the labels, a third mean value of the third feature vectors corresponding to the same feature type in the feature types is calculated, and the third mean value of the third feature vectors corresponding to the same feature type is used as a type central vector of the feature type corresponding to the third model, so that the determined type central vector of each feature type can be more accurate and can represent the feature type of the user.

In addition, the determining the classification layer weight of the second model according to the class center vector of each feature class includes: respectively transposing the category central vectors of the feature categories; splicing the category central vectors of the feature categories after the conversion into a parameter matrix; the parameter matrix is used as the classification layer weight of the second model, the classification central vectors of the transformed feature classes are spliced into the parameter matrix to be used as the classification layer weight of the second model, so that the compatible model can be better matched with the features of the old model, and the consistency of the features between the feature set extracted by the compatible model and the features in the feature library extracted by the old model is further ensured.

Drawings

One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.

FIG. 1 is a flow diagram of model training according to an embodiment of the present application;

FIG. 2 is a flow diagram of iteratively training a second model based on training samples and labels to update parameters of the second model other than classification level weights, according to an embodiment of the present application;

FIG. 3 is a flow diagram of constructing a second lossy term in accordance with an embodiment of the present application;

FIG. 4 is a flowchart illustrating obtaining a class center vector of each feature class corresponding to a third model according to a training sample and the third model according to an embodiment of the present application;

FIG. 5 is a flow diagram for determining classification layer weights for a second model based on class center vectors for feature classes, according to an embodiment of the present application;

fig. 6 is a schematic structural diagram of an electronic device according to another embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. However, it will be appreciated by those of ordinary skill in the art that in the examples of the present application, numerous technical details are set forth in order to provide a better understanding of the present application. However, the technical solution claimed in the present application can be implemented without these technical details and various changes and modifications based on the following embodiments. The following embodiments are divided for convenience of description, and should not constitute any limitation to the specific implementation manner of the present application, and the embodiments may be mutually incorporated and referred to without contradiction.

An embodiment of the present application relates to a model training method, which is applied to an electronic device, where the electronic device may be a terminal or a server, and the electronic device in this embodiment and the following embodiments are described by taking the server as an example.

The model trained by the embodiment of the present application may be: an image retrieval model for performing image retrieval by using an image recognition image; a vehicle identification model for vehicle identification of traffic; a re-identification model for re-identifying pedestrians in shopping malls and stations; the face recognition model is used for security monitoring and face payment.

The flowchart of the model training method in this embodiment may be as shown in fig. 1, and includes:

step 101, a training sample of the first model is obtained, and the training sample is labeled with a label.

Specifically, when the server trains the second model, the training samples of the first model may be obtained first, the training samples of the first model are all labeled with labels, and the labels labeled on the training samples are used to characterize the feature categories of the features of the training samples.

In one example, in order to obtain a better user experience, the recognition model based on visual matching and search techniques needs to be continuously updated iteratively, such as by using more advanced algorithms, adapting old models to new models using new parameters, etc., resulting in new models, whereas when using new models, the consistency of the features extracted by the new model and the features extracted by the old model needs to be ensured, otherwise, the features cannot be matched, technicians need to use the new model to extract the features of the original data corresponding to the feature library extracted by the old model again, the process is very long in time consumption and very high in cost, and when the original data is deleted, the new model cannot extract features, so the method builds and trains a compatible model (namely, a second model) based on the new model (namely, a first model) so as to be compatible with the features of the old model (a third model).

Step 102, constructing a second model based on the network structure of the first model.

In a specific implementation, the server may obtain a model file of the first model while obtaining a training sample of the first model, so as to specify a network structure of the first model, and construct the second model based on the network structure of the first model.

In one example, the server may directly treat the first model as the second model.

In one example, the server may take the network structure of the first model as the network structure of the second model and initialize the parameters of the layers of the second model.

And 103, acquiring a category center vector of each feature category corresponding to the third model according to the training sample and the third model.

In particular, the first model and the third model are functionally identical models, i.e. the first model and the third model may perform the same task, but the first model is not identical to the third model.

In one example, the first model and the third model use different algorithms.

In one example, the first model differs from the third model in network structure.

In specific implementation, after obtaining the training sample of the first model, the server may obtain a model file of the third model to obtain the third model, and then input the training sample into the third model to obtain a category central vector of each feature category corresponding to the third model, where each feature category corresponding to the first model, each feature category corresponding to the second model, and each feature category corresponding to the third model are the same, that is, the feature categories that can be extracted by the three models are the same.

In an example, the server may first obtain a class center vector of each feature class corresponding to the third model according to the training sample and the third model, and then construct the second model based on the network structure of the first model.

And step 104, determining the classification layer weight of the second model according to the class center vector of each feature class.

In a specific implementation, after the server obtains the class center vector of each feature class corresponding to the third model, the server may determine the classification layer weight of the second model based on the class center vector of each feature class corresponding to the third model, and lock the classification layer weight of the second model, that is, the classification layer weight of the second model is not updated in the subsequent iterative training of the second model.

And 105, performing iterative training on the second model according to the training samples and the labels, and updating parameters of the second model except the classification layer weight.

In an example, after determining the classification layer weight of the second model, the server may input the training sample into the second model to obtain an output result of the second model, perform iterative training on the second model according to the output result of the second model, the label marked on the training sample, and a preset loss function, update all parameters of the second model except the classification layer weight, such as the weight of each convolution layer, the bias of each convolution layer, and the like, after each training, and force the class center of each feature class corresponding to the trained compatible model to match the class center of each feature class corresponding to the old model.

In one example, the server may iteratively train the second model based on a stochastic gradient descent method.

In one example, the number of training samples is several, and the server may iteratively train the second model based on a small batch gradient descent method.

In one example, the number of training samples is several, and the server may iteratively train the second model based on a batch gradient descent method.

In this embodiment, a server first obtains a training sample of a first model, a label of a feature class for characterizing features of the training sample is labeled on the training sample, then a second model is constructed based on a network structure of the first model, a class center vector of each feature class corresponding to the third model is obtained according to the training sample of the first model and a third model having the same function as the first model, a classification layer weight of the second model is determined according to the class center vector of each feature class corresponding to the third model, finally, iterative training is performed on the second model according to the training sample of the first model and the label labeled on the training sample, parameters of the second model except the classification layer weight are updated, in consideration of the fact that feature sets extracted by a new model are consistent with features extracted by an old model, feature extraction needs to be performed on original data corresponding to a feature base extracted by the old model again by the new model, but the whole process is very long in time consumption and very high in cost, and when the original data is deleted, the original data cannot be re-extracted, and the embodiment of the application constructs a compatible model based on the network structure of the new model, determining classification layer weight of a compatible model according to the class center vector of each feature class corresponding to the old model, performing iterative training on the compatible model by using a training sample and a label of a new model, updating parameters except the classification layer weight of the compatible model, forcing the class center of each feature class corresponding to the trained compatible model to be matched with the class center of each feature class corresponding to the old model, and directly comparing the feature set extracted by using the trained compatible model with a feature library extracted by using the old model without performing feature extraction again, so that the method is time-saving and labor-saving, reduces cost and greatly improves the convenience of model industrial deployment.

In one embodiment, the number of training samples is several, the server performs iterative training on the second model based on a small batch gradient descent method, the server performs iterative training on the second model according to the training samples and the labels, and updates parameters of the second model except for the classification layer weight, which can be implemented through the steps shown in fig. 2, and specifically includes:

step 201, inputting each training sample into the first model and the second model, respectively, and obtaining each first feature vector extracted from each training sample by the first model and each second feature vector extracted from each training sample by the second model, respectively.

In a specific implementation, the server may input each training sample, that is, a small batch (mini-batch) of training samples, into the first model and the second model, respectively, where the first model and the second model extract features from each training sample, and the server obtains each first feature vector extracted from each training sample by the first model, and each second feature vector extracted from each training sample by the second model.

In one example, each first feature vector extracted from each training sample by the first model and each second feature vector extracted from each training sample by the second model are N-dimensional feature vectors, and the first feature vector extracted from the training sample x by the first model is represented as f₁(x) The second feature vector extracted from the training sample x by the second model is represented as f₂(x)。

Step 202, constructing a loss function according to the number of the training samples, the total number of the feature classes, the first feature vectors, the second feature vectors, the labels of the training samples and the classification layer weight.

In the specific implementation, the server does not use a preset loss function for training, but constructs the loss function for training by itself, and after obtaining each first feature vector and each second feature vector, the server constructs the loss function by itself according to the number of training samples, the total number of feature classes, each first feature vector, each second feature vector, labels of the training samples and classification layer weights, that is, the server uses the first model as a teacher model, uses the second model as a student model, and refers to and uses the features extracted from each training sample by the first model during training based on the principle of model distillation, so that the second model can rapidly obtain the functions and performances of the first model.

And step 203, performing iterative training on the second model based on a small batch gradient descent method by taking the loss function as supervision, and updating parameters of the second model except for the weight of the classification layer until the loss function meets a preset convergence condition.

In a specific implementation, after each training, the server determines whether the constructed loss function meets a preset convergence condition, if the loss function meets the preset convergence condition, the server stores parameters of the second model at the time, issues the trained second model, and if the loss function does not meet the preset convergence condition, the server continues to perform iterative training on the second model, where the preset convergence condition may be set by a person skilled in the art according to actual needs, and the embodiment of the present application is not specifically limited to this.

In this embodiment, the training samples are a plurality of samples, and the iteratively training the second model according to the training samples and the labels to update the parameters of the second model except the weight of the classification layer includes: inputting the training samples into the first model and the second model respectively, and obtaining first feature vectors extracted from the training samples by the first model and second feature vectors extracted from the training samples by the second model respectively; constructing a loss function according to the number of the training samples, the total number of the feature classes, the first feature vectors, the second feature vectors, the labels and the classification layer weight; the method comprises the steps of taking a loss function as supervision, carrying out iterative training on a second model based on a small-batch gradient descent method, updating parameters of the second model except for the weight of a classification layer until the loss function meets a preset convergence condition, taking a new model as a teacher model, taking a compatible model as a student model, and training the compatible model based on a model distillation principle, namely, when the compatible model is trained, referring to and using characteristics extracted from each training sample by the new model, so that the compatible model can quickly obtain the functions and performances of the new model, and the training effect of the compatible model is further improved.

In one embodiment, the number of the training samples is several, the server performs iterative training on the second model based on a small batch gradient descent method, the loss function constructed by the server includes a first loss item, the server can construct the first loss item according to the number of the training samples, the total number of the feature classes, each second feature vector, the labels of the training samples, the classification layer weights and a preset softmax cross entropy loss function, the softmax cross entropy loss function is used as the first loss item, the construction is simple, the calculation amount is small, and the speed of compatible model training can be increased.

In one example, the first loss term constructed by the server can be expressed by the following formula:

in the formula, L_softmaxFor the first loss term to construct, R is the number of training samples, x_iFor the ith training sample, y_iFor the label of the ith training sample, w^TAs classification level weight, f₂(x) And K is the total number of the feature classes.

In one embodiment, the number of training samples is several, the server performs iterative training on the second model based on a small batch gradient descent method, the loss function constructed by the server includes a second loss term, the loss function constructed by the server is the sum of the first loss term and the second loss term, and after the server constructs the first loss term, the server may construct the second loss term according to the steps shown in fig. 3, which specifically includes:

step 301, calculating a first euclidean distance between every two training samples of the same feature type in each feature type according to each first feature vector and each label, and determining the number of times of calculation corresponding to each feature type.

In the specific implementation, the server determines the number of training samples, that is, the size of a batch of small-batch gradient descent is G, the feature classes that can be extracted by the first model, the second model and the third model are the same, the number of the feature classes that can be extracted by the three models is K, because the label is used for representing the feature class of the training samples, the server can determine the number of training samples corresponding to each feature class according to the label, the server sequentially traverses each feature class, calculates the first euclidean distance between every two training samples corresponding to the current feature class according to each first feature vector, and simultaneously determines the number of times that the current feature class needs to calculate the first euclidean distance.

In one example, three feature classes, feature class a, feature class b, and feature class c, are shared: 4 training samples are corresponding to the characteristic class A, and the server can calculate the number of times of calculation corresponding to the characteristic class A to be 6; the characteristic class B corresponds to 3 training samples, and the server can calculate the calculation times corresponding to the characteristic class B to be 3 times; the feature class C corresponds to 5 training samples, and the server can calculate the number of times of calculation corresponding to the feature class C to be 10.

Step 302, calculating a mean value of the first euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each first euclidean distance.

Specifically, after the server calculates each first euclidean distance and determines the number of calculations corresponding to each feature type, the server may calculate an average value of the first euclidean distances corresponding to each feature type based on the number of calculations corresponding to each feature type and each first euclidean distance.

In an example, the server calculates a mean value of the first euclidean distances corresponding to each feature class, which may be implemented by the following formula:

where K is the total number of feature classes, N (K) is the set of training samples for the kth feature class, φ_1k(x_i,x_j) Is the mean value of the first Euclidean distance corresponding to the kth feature class, B_kThe number of calculations corresponding to the kth feature class, f₁(x_i) For the first feature vector of the i-th training sample in the k-th feature class, f₁(x_j) A first feature vector for a jth training sample in the kth feature class.

Step 303, calculating a second euclidean distance between every two training samples of the same feature type in each feature type according to each second feature vector and each label.

In a specific implementation, as the label is determined, the number of times that each feature type needs to calculate the first euclidean distance is the same as the number of times that each feature type needs to calculate the second euclidean distance, the server sequentially traverses each feature type, and calculates the second euclidean distance between every two training samples corresponding to the current feature type according to each second feature vector.

And 304, calculating the mean value of the second Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each second Euclidean distance.

Specifically, after the server calculates each second euclidean distance and determines the number of calculations corresponding to each feature type, the server may calculate an average value of the second euclidean distances corresponding to each feature type based on the number of calculations corresponding to each feature type and each second euclidean distance.

In an example, the server calculates a mean value of the second euclidean distances corresponding to each feature class, which may be implemented by the following formula:

where K is the total number of feature classes, N (K) is the set of training samples for the kth feature class, φ_1k(x_i,x_j) Is the mean value of the first Euclidean distance corresponding to the kth feature class, B_kThe number of calculations corresponding to the kth feature class, f₂(x_i) For the second feature vector of the i-th training sample in the k-th feature class, f₂(x_j) A second feature vector for a jth training sample in the kth feature class.

Step 305, constructing a second loss term according to the mean value of the first euclidean distance, the mean value of the second euclidean distance and a preset L1 loss function.

In one example, the server may construct the second loss term according to the mean value of the first euclidean distance, the mean value of the second euclidean distance, and a preset L1 loss function by the following formula:

in the formula, L_disλ is a preset balance factor, K is the total number of feature classes, n (K) is a set of training samples of the kth feature class, Smooth_L1Is a preset L1 loss function, phi_1k(x_i,x_j) Is the mean value of the first Euclidean distance, phi, corresponding to the kth feature class_2k(x_i,x_j) Is the mean value of the second Euclidean distance corresponding to the kth feature class, B_kThe number of calculations corresponding to the kth feature class, f₁(x_i) For the first feature vector of the i-th training sample in the k-th feature class, f₂(x_i) For the second feature vector of the i-th training sample in the k-th feature class, f₁(x_j) For the first feature vector of the jth training sample in the kth feature class, f₁(x_j) A second feature vector for the jth training sample in the kth feature class.

In one example, the server-constructed loss function includes a first loss term and a second loss term, and the loss function can be expressed as: l = L_softmax+L_dis，L_softmaxIs the first loss term, L_disIs the second loss term.

In this embodiment, the loss function further includes a second loss term, and after constructing the first loss term, the method further includes: calculating a first Euclidean distance between every two training samples of the same feature type in each feature type according to each first feature vector and the label, and determining the number of times of calculation corresponding to each feature type; calculating the mean value of the first Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each first Euclidean distance; calculating a second Euclidean distance between every two training samples of the same feature type in each feature type according to each second feature vector and the label; calculating the mean value of the second Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each second Euclidean distance; the second loss item is constructed according to the mean value of the first Euclidean distance, the mean value of the second Euclidean distance and a preset L1 loss function, and considering that if only a softmax cross entropy function is used as the loss function, a trained compatible model is restricted by the performance of an old model, namely the performance bottleneck of the old model cannot be broken, the high performance advantage of the new model is lost, therefore, the second loss item is further arranged in the embodiment, the compatible model is driven to learn the relative structure information of the feature distribution in the same feature category of the new model, the intra-class compactness is better, and the performance of the compatible model is greatly improved.

In an embodiment, the number of the training samples is several, and the server obtains the category center vector of each feature category corresponding to the third model according to the training samples and the third model, which may be implemented by the steps shown in fig. 4, and specifically includes:

step 401, inputting each training sample into the third model, and obtaining each third feature vector extracted from each training sample by the third model.

In a specific implementation, the server may input each training sample, that is, a small batch (mini-batch) of training samples, into the third model, where the third model extracts features from each training sample, and the server obtains each third feature vector extracted from each training sample by the third model.

In one example, each of the third feature vectors extracted from the training samples by the third model is an N-dimensional feature vector, and the third feature vector extracted from the training sample x by the third model is represented as f₃(x)。

Step 402, according to the third feature vectors and the labels, calculating a third mean value of the third feature vectors corresponding to the same feature type in the feature types, and using the third mean value of the third feature vectors corresponding to the same feature type as a type center vector of the feature type corresponding to the third model.

Specifically, after obtaining each third feature vector, the server may calculate a third mean value of the third feature vectors corresponding to the same feature class in each feature class according to each third feature vector and the label, and use the third mean value of the third feature vectors corresponding to the same feature class as a class center vector of the feature class corresponding to the third model, so that the determined class center vector of each feature class is more accurate and can represent the feature class of the server.

In one example, the server may calculate a third mean value of third feature vectors corresponding to the same feature class in each feature class and use the third mean value of the third feature vectors corresponding to the same feature class as a class center vector of the feature class corresponding to the third model by using the following formula:

in the formula, w_kClass center vector, m, for the kth feature class_kThe number of training samples for the kth feature class, N (k) the set of training samples for the kth feature class, f₃(x) K is the total number of feature classes as the third feature vector.

In an embodiment, the determining, by the server, the classification layer weight of the second model according to the class center vector of each feature class may be implemented by the steps shown in fig. 5, which specifically include:

in step 501, the class center vectors of the feature classes are transposed.

And 502, splicing the category central vectors of the transferred feature categories into a parameter matrix.

Step 503, using the parameter matrix as the classification layer weight of the second model.

In specific implementation, the server transposes the category center vectors of the feature categories, and splices the transposed category center vectors of the feature categories into a parameter matrix, where the spliced parameter matrix can be represented as W = [ W = [ ]₁ ^T,w₂ ^T,…,w_K ^T]The server takes the parameter matrix as the secondThe classification layer weight of the model can enable the compatible model to better match the features of the old model, and further ensure the consistency of the features between the feature set extracted by the compatible model and the feature library extracted by the old model.

The steps of the above methods are divided for clarity, and the implementation may be combined into one step or split some steps, and the steps are divided into multiple steps, so long as the same logical relationship is included, which are all within the protection scope of the present patent; it is within the scope of the patent to add insignificant modifications to the algorithms or processes or to introduce insignificant design changes to the core design without changing the algorithms or processes.

Another embodiment of the present application relates to an electronic device, as shown in fig. 6, including: at least one processor 601; and a memory 602 communicatively coupled to the at least one processor 601; the memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 to enable the at least one processor 601 to perform the model training method in the above embodiments.

Where the memory and processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting together one or more of the various circuits of the processor and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.

The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.

Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.

That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the present application, and that various changes in form and details may be made therein without departing from the spirit and scope of the present application in practice.

Claims

1. A method of model training, comprising:

obtaining a training sample of a first model; wherein the training samples are marked with labels, and the labels are used for characterizing feature classes of the features of the training samples;

constructing a second model based on the network structure of the first model;

according to the training sample and a third model, obtaining a category central vector of each feature category corresponding to the third model; wherein the first model and the third model are functionally identical models;

determining the classification layer weight of the second model according to the class central vector of each feature class;

and performing iterative training on the second model according to the training samples and the labels, and updating parameters of the second model except the classification layer weight.

2. The model training method according to claim 1, wherein the number of the training samples is several, and the iteratively training the second model according to the training samples and the labels to update the parameters of the second model except the classification layer weights comprises:

inputting the training samples into the first model and the second model respectively, and obtaining first feature vectors extracted from the training samples by the first model and second feature vectors extracted from the training samples by the second model respectively;

constructing a loss function according to the number of the training samples, the total number of the feature classes, the first feature vectors, the second feature vectors, the labels and the classification layer weight;

and carrying out iterative training on the second model based on a small-batch gradient descent method by taking the loss function as supervision, and updating parameters of the second model except the weight of the classification layer until the loss function meets a preset convergence condition.

3. The model training method of claim 2, wherein the loss function comprises a first loss term, and wherein constructing the loss function comprises:

and constructing the first loss item according to the number of the training samples, the total number of the feature classes, the second feature vectors, the labels, the classification layer weights and a preset softmax cross entropy loss function.

4. The model training method of claim 3, wherein the loss function further comprises a second loss term, the loss function being a sum of the first loss term and the second loss term, and further comprising, after constructing the first loss term:

calculating a first Euclidean distance between every two training samples of the same feature type in each feature type according to each first feature vector and the label, and determining the number of times of calculation corresponding to each feature type;

calculating the mean value of the first Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each first Euclidean distance;

calculating a second Euclidean distance between every two training samples of the same feature type in each feature type according to each second feature vector and the label;

calculating the mean value of the second Euclidean distances corresponding to each feature type according to the calculated times corresponding to each feature type and each second Euclidean distance;

and constructing the second loss term according to the mean value of the first Euclidean distance, the mean value of the second Euclidean distance and a preset L1 loss function.

5. The model training method according to claim 4, wherein the second loss term is constructed from the mean value of the first Euclidean distance, the mean value of the second Euclidean distance, and a preset L1 loss function by the following formula:

wherein L is_disλ is a preset balance factor, K is the total number of the feature classes, n (K) is a set of training samples of the kth feature class, Smooth_L1Is the predetermined L1 loss function, phi_1k(x_i,x_j) Is the mean value of the first Euclidean distance, phi, corresponding to the kth feature class_2k(x_i,x_j) Is the mean value of the second Euclidean distance corresponding to the kth feature class, B_kThe number of calculations corresponding to the kth feature class, f₁(x_i) For the first feature vector of the i-th training sample in the k-th feature class, f₂(x_i) A second feature vector, f, for the ith training sample in the kth feature class₁(x_j) A first feature vector, f, for a jth training sample in the kth feature class₁(x_j) A second feature vector for a jth training sample in the kth feature class.

6. The model training method according to any one of claims 1 to 5, wherein the number of the training samples is several, and the obtaining of the class center vector of each feature class corresponding to the third model according to the training samples and the third model includes:

inputting each training sample into the third model, and obtaining each third feature vector extracted from each training sample by the third model;

and calculating a third mean value of third feature vectors corresponding to the same feature type in the feature types according to the third feature vectors and the labels, and taking the third mean value of the third feature vectors corresponding to the same feature type as a type central vector of the feature type corresponding to the third model.

7. The model training method according to claim 6, wherein a third mean value of third feature vectors corresponding to a same feature class in the feature classes is calculated and used as a class center vector of the feature class corresponding to the third model by using the following formula:

wherein, w_kClass center vector, m, for the kth feature class_kTraining samples for the kth feature classN (k) is the set of training samples of the kth feature class, f₃(x) K is the total number of feature classes for the third feature vector.

8. The model training method of claim 6, wherein the determining the classification layer weight of the second model according to the class center vector of each feature class comprises:

respectively transposing the category central vectors of the feature categories;

splicing the category central vectors of the feature categories after the conversion into a parameter matrix;

and taking the parameter matrix as the classification layer weight of the second model.

9. An electronic device, comprising:

at least one processor; and the number of the first and second groups,

a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1 to 8.

10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the model training method of any one of claims 1 to 8.