CN111553399A

CN111553399A - Feature model training method, device, equipment and storage medium

Info

Publication number: CN111553399A
Application number: CN202010319373.0A
Authority: CN
Inventors: 黄振杰; 李德紘; 张少文; 冯琰一
Original assignee: Guangdong Huazhiyuan Information Engineering Co ltd; Guangzhou Jiadu Technology Software Development Co ltd; Guangzhou Xinke Jiadu Technology Co Ltd; PCI Suntek Technology Co Ltd
Current assignee: Guangdong Huazhiyuan Information Engineering Co ltd; Guangzhou Jiadu Technology Software Development Co ltd; Guangzhou Xinke Jiadu Technology Co Ltd; PCI Suntek Technology Co Ltd
Priority date: 2020-04-21
Filing date: 2020-04-21
Publication date: 2020-08-18
Also published as: CN112949780A; CN112949780B

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for training a feature model, which relate to the technical field of metric learning and comprise the following steps: acquiring a training data set, wherein each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs; inputting the training data set into a neural network model to obtain a feature vector of each sample; constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs; determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function; the neural network model containing the loss function is trained until the loss function converges. By adopting the scheme, the technical problem that the convolutional neural network cannot expand the distance of the feature vectors between samples of different classes to cause that the samples of different classes cannot be effectively distinguished in the prior art can be solved.

Description

Feature model training method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of metric learning, in particular to a feature model training method, a device, equipment and a storage medium.

Background

Metric learning may also be understood as similarity learning, the objective of which is to make the degree of similarity between samples of the same category high and the degree of similarity between samples of different categories low. With the development of deep learning, metric learning can be realized by means of a convolutional neural network. In the prior art, a convolutional neural network obtains feature vectors of each sample in a processing process, and gathers the feature vectors of the same type, that is, the feature vectors of the same type of sample are closer to each other. However, for samples of different classes, the convolutional neural network cannot expand the distance of the feature vectors between the samples of different classes, so that the distance of the feature vectors between the samples of different classes in the convolutional neural network is short, and further the samples of different classes cannot be effectively distinguished based on the feature vectors, that is, the learning effect of metric learning is affected.

Disclosure of Invention

The invention provides a feature model training method, a device, equipment and a storage medium, which aim to solve the technical problem that samples of different classes cannot be effectively distinguished because a convolutional neural network cannot expand the distance of feature vectors between the samples of different classes in the prior art.

In a first aspect, an embodiment of the present invention provides a feature model training method, including:

acquiring a training data set, wherein each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs;

inputting the training data set into a neural network model to obtain a feature vector of each sample;

constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs;

determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function;

training the neural network model including the loss function until the loss function converges.

Further, the constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs includes:

acquiring a weight parameter matrix of a full connection layer in the neural network model, wherein the weight parameter of the pth column in the weight parameter matrix represents a weight parameter vector corresponding to the pth category;

determining cluster centers corresponding to all categories according to the weight parameter matrix;

determining an intra-class loss function according to the cluster centers and the feature vectors under the classes;

and determining an inter-class loss function according to each cluster center.

Further, the determining an intra-class loss function according to the cluster centers and the feature vectors in the categories includes:

determining a learnable angle according to the category to which the feature vector belongs;

determining a first class internal distance between the feature vector and the target cluster center according to the feature vector, the target cluster center and the learnable angle, wherein the target cluster center is a cluster center corresponding to the class to which the feature vector belongs, and each feature vector corresponds to one first class internal distance;

determining a second-class inner distance between the feature vector and the other cluster centers according to the feature vector and the other cluster centers, wherein the other cluster centers are cluster centers except the corresponding target cluster center in all the cluster centers, each feature vector corresponds to a group of other cluster centers, and each other cluster center corresponds to a second-class inner distance;

determining a penalty item according to the learnable angles of all categories;

and determining an intra-class loss function according to the first intra-class distance, the second intra-class distance and the penalty term.

Further, the calculation formula of the intra-class loss function is as follows:

wherein S is_intraRepresenting an intra-class loss function, M is the number of samples selected in each iteration process in the training process of the neural network model, s is a scale factor, and omega₁Is a first balance factor, R_jmAs the feature vector y of the jth sample_jThe corresponding in-class distance is the first,

is y_jAnd y_jThe angle between the centers of the corresponding target clusters,

W_ydenotes y_jCorresponding target cluster center, y_jBelongs to the y-th class, m₁Is the first hyperparameter, m_yLearnable angles, R, corresponding to the y-th category_jiIs y_jWith the ith other clusterThe second type of inner distance between the centers,

W_ifor the ith other cluster center, C is the total number of classes, L_mIn order to be a penalty term,

further, the determining an inter-class loss function according to each cluster center includes:

calculating first inter-class distances among different classes according to the cluster centers;

selecting a second inter-class distance larger than a distance threshold value from all first inter-class distances corresponding to each class, and calculating the second inter-class distance sum value of each class;

and determining an inter-class loss function according to the second inter-class distance and the value corresponding to each class.

Further, the calculation formula of the inter-class loss function is as follows:

wherein S is_interRepresenting the loss function between classes, C being the total number of classes, D_zFor the second inter-class distance and value corresponding to the z-th class,

is the first inter-class distance, m, between the z-th class and the q-th class₂Is the second hyperparameter.

Further, the calculation formula of the loss function of the neural network model is as follows: s_loss＝ω₂S_inter+S_intraWherein S is_lossRepresents the loss function, S_intraRepresenting the loss-in-class function, S_interRepresenting the inter-class loss function, ω₂Is the second balance factor.

Further, the training the neural network model including the loss function until the loss function converges comprises:

updating the neural network model containing the loss function with a gradient descent algorithm until the loss function satisfies a stability condition;

initializing a sampling probability for each sample in the training data set, the initialized sampling probabilities for each sample being the same;

iteratively training the neural network model including the loss function based on the training data set;

in an iterative training process, determining correctly classified samples and incorrectly classified samples in the training data set;

decreasing the sampling probability of the correctly classified sample by a first value and increasing the sampling probability of the incorrectly classified sample by a second value;

and deleting the samples with the sampling probability lower than the probability threshold in the training data set, and continuing the iterative training until the loss function is converged.

In a second aspect, an embodiment of the present invention further provides a feature model training apparatus, including:

the data acquisition module is used for acquiring a training data set, each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs;

the characteristic determining module is used for inputting the training data set into a neural network model to obtain a characteristic vector of each sample;

the first construction module is used for constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs;

a second construction module for determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function;

a model training module for training the neural network model including the loss function until the loss function converges.

In a third aspect, an embodiment of the present invention further provides a feature model training device, including:

one or more processors;

a memory for storing one or more programs;

when executed by the one or more processors, cause the one or more processors to implement the feature model training method of the first aspect.

In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the feature model training method according to the first aspect.

According to the characteristic model training method, the device, the equipment and the storage medium, the technical problem that the convolutional neural network cannot expand the distance of characteristic vectors between samples of different types to effectively distinguish the samples of different types in the prior art can be solved by the technical scheme that the training data set with class labels is obtained, then characteristic vectors of all samples in the training data set are obtained according to the neural network model, an intra-class loss function and an inter-class loss function are constructed according to the characteristic vectors and the class to which the samples belong, then a loss function of the neural network model is constructed according to the intra-class loss function and the inter-class loss function, the neural network model containing the loss function is trained until the loss function converges. When the loss function is constructed, the intra-class loss function is constructed by considering the characteristic vectors of all samples, and the inter-class loss function is constructed by considering different classes, so that when the characteristic vectors are extracted by the neural network model, the characteristic vectors of the samples of the same class are closer to each other, the characteristic vectors of the samples of different classes are farther from each other, and a better metric learning result is realized.

Drawings

Fig. 1 is a flowchart of a feature model training method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a feature model training method according to a second embodiment of the present invention;

fig. 3 is a schematic structural diagram of a feature model training apparatus according to a third embodiment of the present invention;

fig. 4 is a schematic structural diagram of a feature model training device according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are for purposes of illustration and not limitation. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a feature model training method according to an embodiment of the present invention. The feature model training method provided in the embodiment may be executed by a feature model training apparatus, which may be implemented in software and/or hardware and integrated in a feature model training device. The feature model training device may be an intelligent device with data processing and analyzing capabilities, such as a tablet computer and a desktop computer, and the feature model training device may be an independent intelligent device or may be composed of a plurality of intelligent devices capable of data communication.

Specifically, referring to fig. 1, the feature model training method specifically includes:

step 110, a training data set is obtained, each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs.

Illustratively, a training data set refers to a set of samples used to train a feature model. In one embodiment, the feature model has a main function of recognizing the features of the sample, that is, the feature model can recognize the features of the sample and map the features into the feature space, and then adjust the parameters of the feature model based on the result of feature recognition, so that in the subsequent working process of the feature model, the features with high similarity are closer to each other in the feature space, and the features with low similarity are farther from each other in the feature space. It is understood that after the feature model identifies the features of the sample, the class of the sample can be determined based on the features, i.e., the sample is classified. Optionally, the number and type of the samples in the training data set may be set according to actual conditions. Typically, the samples in the training data set are of the same type, e.g., the samples in the training data are all picture types. It should be noted that the embodiment does not limit the collection manner of each sample in the training data set.

Typically, there is one label per sample in the training dataset. The label is used for identifying the category to which the sample belongs, wherein the display format embodiment of the category in the label is not limited. It should be noted that each sample has a corresponding category, and there may be at least one sample in the same category. For example, if n samples are included in the training sample set, then n labels are included in the corresponding label set, each sample corresponds to one label, and the same category may be recorded in different labels.

And 120, inputting the training data set into a neural network model to obtain a feature vector of each sample.

In this embodiment, the feature model is a neural network model, wherein the neural network model may be a deep neural network model, a specific network structure of the neural network model may be set according to an actual situation, and in the embodiment, the last layer of the neural network model is described as a full connection layer. Specifically, each sample in the training data set is input into the neural network model, the characteristics of each sample are learned through the neural network model, and data to be input into the full connection layer is obtained, that is, the output of the neural network model except the full connection layer is obtained. In an embodiment, the output is recorded as a feature vector, where each sample corresponds to one feature vector. The feature vector can be understood as a feature learning result of the neural network model for the sample, and the dimension of the feature vector can be set according to the actual situation, which is not limited by the embodiment. It can be understood that when the neural network model learns the characteristics of the sample for the first time, the parameters in the neural network model are initialized randomly, and in the subsequent training process, the parameters of the neural network model can be updated according to the loss function constructed by the characteristic vector.

Optionally, in this embodiment, the full connection layer is configured to identify a category of the feature vector, that is, classify the feature vector, so that in a subsequent training process, a sample that is difficult to be classified is selected based on a classification result for training. And after the training of the neural network model is finished, the full connection layer can be deleted when the neural network model is applied, and at the moment, the neural network model is used for extracting the characteristic vector of the sample.

And step 130, constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs.

The loss function (loss function) is a function that maps the value of a random event or its related random variables to non-negative real numbers to represent the "risk" or "loss" of the random event. In application, the loss function is usually associated with the optimization problem as a learning criterion, i.e. the model is solved and evaluated by minimizing the loss function. Embodiments stabilize the neural network model by converging the loss function in the neural network model. In the embodiment, when the loss function is constructed, an intra-class loss function is constructed according to the distribution of the feature vectors corresponding to the samples under each class in the feature space, an inter-class loss function is constructed according to the distribution of the cluster centers corresponding to each class in the feature space, and then the loss function is determined according to the intra-class loss function and the inter-class loss function. Wherein the position of the cluster center in the feature space is influenced by the feature vectors of the samples under the corresponding category.

The intra-class loss function considers the distribution condition of the feature vectors in the feature space under each class, and aims to enable the feature vectors under the same class to be more compact in the feature space when a sample passes through the neural network model so as to optimize the distance between the feature vectors under different classes. The specific construction mode of the intra-class loss function can be set according to actual conditions. For example, the distance between each feature vector and the center of each cluster is calculated. Optionally, when calculating the distance between the feature vector and the cluster center of the category to which the feature vector belongs, a learnable angle may be added to make the feature vector distance in the same category more compact. Then, an intra-class loss function is constructed by utilizing the softmax loss function according to each distance. Optionally, in order to make the learnable angle take a larger value, a penalty term is added to the intra-class loss function to improve the effect of the intra-class loss function. With the progress of the subsequent training process, the distance of the feature vectors corresponding to the samples belonging to the same category in the training data set in the feature space becomes closer and closer, that is, the feature vectors are gathered into a cluster. Further, each category corresponds to a cluster center, and the determination manner of the cluster center may be set according to an actual situation, for example, weight parameter vectors corresponding to different categories are set in the full connection layer, so that the feature vector under each category is mapped to the vicinity of the corresponding weight parameter vector, and at this time, the weight parameter vector of each category may be used as the cluster center under the corresponding category. It can be understood that in the first learning process of the neural network model, the weight parameter vector is initialized randomly, and in the subsequent process, the weight parameter vector can be updated by combining the feature vector obtained by learning. Optionally, the distance between the feature vector and the cluster center may be calculated by using a vector included angle formula, or the distance between the feature vector and the cluster center may be calculated by using other methods. Optionally, the penalty term may be obtained by calculating a mean of learnable angles corresponding to all the categories, or may be calculated in other manners. When constructing the intra-class loss function, other parameters may be set according to actual conditions, for example, the size of the batch size in the softmax loss function may be set according to actual conditions.

The inter-class loss function considers the distribution condition of cluster centers in the feature space under different classes, and the inter-class loss function can be regarded as a regular term, so that the distance of the feature vectors under different classes in the feature space is longer, and the feature vectors under the same class are more compact in the feature space. The specific construction mode of the inter-class loss function can be set according to actual conditions. For example, assume that there are M classes currently, the distances between the cluster center corresponding to each class and the centers of the other M-1 clusters are calculated, that is, M-1 inter-class distances are obtained, and then for each class, the inter-class distances meeting the set conditions among the M-1 inter-class distances are selected and summed to obtain the total distance corresponding to the current class. The setting condition may be set according to an actual situation, and the inter-class loss function is enabled to make the distances between the feature distances of different classes farther through the setting condition, for example, the setting condition is that the distance is greater than a set distance threshold, and at this time, only the inter-class distance greater than the distance threshold is selected to punish the inter-class distance greater than the distance threshold. Further, after the total distance corresponding to each category is obtained, the average of the total distances is taken as an inter-category loss function.

And 140, determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function.

And constructing a loss function based on the intra-class loss function and the inter-class loss function, wherein the loss function can be constructed in a weighted sum mode, namely the intra-class loss function and the inter-class loss function are subjected to weighted sum to obtain the loss function. It can be understood that, when performing the weighted sum, the magnitude of the weight value can be set according to the actual situation. It should be noted that, constructing the loss function by a weighted sum method is only an optional method, and in practical applications, the loss function may also be constructed by other methods.

Step 150, training the neural network model including the loss function until the loss function converges.

In one embodiment, the constructed loss function is used as a loss function of the neural network model, and the neural network model is trained. Because each sample in the training data set has a label for identifying the category, the training data set can be directly adopted for training when the neural network model is trained. Inputting each sample in the training data set into a neural network model, obtaining an output result of the neural network model, wherein the output result is the category to which the neural network model identifies each sample, then adjusting the corresponding parameter of the neural network model according to the output result and the category identified in the sample label, and repeating the operation until the loss function is stable, namely the output result of the neural network is stable. At this time, the training may be considered to be finished. Optionally, in order to ensure accuracy of the neural network model, after the loss function is stable, a difficult-to-sample mining mode is adopted, a more valuable sample is selected from the training data set to perform iterative training on the neural network model, and corresponding parameters of the neural network model are adjusted again according to an output result of the neural network model until the loss function is converged, at this time, accuracy of an output result of the obtained neural network model is better. The hard case mining mode can be set according to actual conditions, for example, a hard sample is selected, that is, a sample with a high error rate of neural network model class identification is selected as the hard sample, then, each hard sample is taken as the hard case mining sample, and the neural network model is iteratively trained according to the hard sample until the loss function converges. When the loss function is converged, the neural network model is stable, the accuracy rate achieves the expected effect, and the method can be directly applied.

The technical problem that the convolutional neural network cannot expand the distance of the feature vectors between samples of different categories to effectively distinguish the samples of different categories in the prior art can be solved by the technical scheme that the training data set with the category labels is obtained, then the feature vectors of the samples in the training data set are obtained according to the neural network model, the intra-class loss function and the inter-class loss function are constructed according to the feature vectors and the category to which the samples belong, then the loss function of the neural network model is constructed according to the intra-class loss function and the inter-class loss function, the neural network model containing the loss function is trained until the loss function converges. When the loss function is constructed, the intra-class loss function is constructed by considering the characteristic vectors of all samples, and the inter-class loss function is constructed by considering different classes, so that when the neural network model extracts the characteristic vectors, the characteristic vectors of the samples of the same class are closer to each other, the characteristic vectors of the samples of different classes are farther from each other, and a better metric learning result is realized.

Example two

Fig. 2 is a flowchart of a feature model training method according to a second embodiment of the present invention. The present embodiment is embodied on the basis of the above-described embodiments. Specifically, referring to fig. 2, the feature model training method provided in this embodiment specifically includes:

step 210, a training data set is obtained, each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs.

Step 220, inputting the training data set into a neural network model to obtain a feature vector of each sample.

And 230, acquiring a weight parameter matrix of a full connection layer in the neural network model, wherein the weight parameter of the pth column in the weight parameter matrix represents a weight parameter vector corresponding to the pth category.

The method comprises the steps that a weight parameter matrix is arranged on a full connection layer in a neural network model, classification results of samples can be obtained through the weight parameter matrix and feature vectors input to the full connection layer, the weight parameter matrix comprises a plurality of weight parameters, numerical values of the weight parameters can be set according to actual conditions, in the first learning process of the neural network model, the weight parameters are initialized randomly, in the subsequent process, the weight parameters can be updated by combining the feature vectors obtained through learning, further, the weight parameter matrix is set to be W, W ∈ R^K×CThat is, the size of W is K × C, where K is the total dimension of the feature vector, and if the feature vector is 512 dimensions, the value of K in the weight parameter matrix is 512. C is the total number of classes corresponding to each sample in the training data set, and the values of K and C can be set according to the actual situation.

And 240, determining the cluster center corresponding to each category according to the weight parameter matrix.

Specifically, the weight parameter vector corresponding to each category can be determined by the weight parameter matrix. When the neural network model obtains the feature vectors based on the samples, the feature vectors are distributed near the weight parameter vectors of the corresponding categories, so that the weight parameter vector of each category can be used as the cluster center corresponding to the category, namely the cluster centers of all the feature vectors of the category.

And step 250, determining an intra-class loss function according to the centers of the clusters and the feature vectors under the classes.

Specifically, after determining the feature vector of the sample under each category and the cluster center of each category, an intra-class loss function can be constructed. In the embodiment, the intra-class loss function is constructed by calculating the distance between the feature vector and the cluster center, and at this time, the setting of this step specifically includes steps 251 to 255:

step 251, determining a learnable angle according to the category to which the feature vector belongs.

In the embodiment, a learnable angle is added, the specific value of the learnable angle can be continuously updated along with the training process of the neural network model, typically, each category corresponds to one learnable angle, and the learnable angles corresponding to different categories may be the same or different. Learnable angles may be understood as adding a parameter to the angle to make the distance between feature vectors in the same class more compact when mapping the feature vectors. The angle refers to the degree of an included angle between the feature vector and the cluster center corresponding to the category to which the feature vector belongs. Note that, in some cases, the value of the learnable angle may be 0, and for example, when the feature vectors in the same category are already sufficiently compact with each other, the learnable angle may be set to 0.

Step 252, determining a first class inner distance between the feature vector and the target cluster center according to the feature vector, the target cluster center and the learnable angle, where the target cluster center is a cluster center corresponding to a class to which the feature vector belongs, and each feature vector corresponds to one first class inner distance.

Specifically, the first intra-class distance is a distance between a feature vector and a center of a corresponding target cluster, and may be represented by a cosine of an angle, that is, the distance may be understood as an angular distance between the feature vector and the center of the target cluster. And recording the cluster center corresponding to the type to which a certain sample belongs as the target cluster center corresponding to the sample. In one embodiment, a learnable angle is added to the distance. Illustratively, when calculating the distance within the first class, the current distance is calculated firstThe cosine of the angle between the feature vector and the center of the target cluster, i.e., the distance between the current feature vector and the center of the target cluster is calculated first. When calculating the distance, a vector angle formula can be used, and a mode of performing L2 regularization on the feature vector and the target cluster center can be used. For example, the feature vector of the current calculated distance is recorded as the current feature vector, and the feature vector corresponding to the jth sample is described as the current feature vector, at this time, the current feature vector may be recorded as y_jThe current feature vector belongs to the y category, y_jThe corresponding target cluster center is marked as W_y，y_jAnd W_yThe angle (i.e. included angle) between them is recorded as

At this time, the distance between the current feature vector and the corresponding target cluster center may be expressed as:

as can be seen from this formula, the,

can be given by y_jAnd W_yThus obtaining the product. For y_jAnd W_yWhen applying L2 regularization, the modulus of the vector, i.e., | | W_y||₂And y_j||₂Has a value of 1. And after obtaining the distance, adding a learnable angle into the distance, wherein the learnable angle corresponding to the y category is recorded as m_y. At this time, the learnable angle and the hyper-parameter are added to the distance to obtain the first-class inner distance. Representing the first class inner distance corresponding to the jth sample as R_jmAt this time, the process of the present invention,

wherein m is₁To calculate the hyper-parameter used in the first intra-class distance, in the embodiment, m is₁Is recorded as a first hyperparameter. It is understood that the hyper-parameter is a parameter set before the learning process is started, and is not parameter data obtained through training, and the specific value thereof can be set according to actual conditions. Is provided withThe benefit of setting the first hyperparameter is that the angle between the feature vector and the center of the target cluster can be made smaller. It will be appreciated that a smaller angle between two feature vectors indicates that they are more similar, and if the angle between two feature vectors is larger, indicates that they differ more. It should be noted that the first intra-class distance corresponding to each feature vector can be calculated in the above manner.

And 253, determining a second-class inner distance between the feature vector and the other cluster centers according to the feature vector and the other cluster centers, wherein the other cluster centers are cluster centers except the corresponding target cluster center in all the cluster centers, each feature vector corresponds to one group of other cluster centers, and each other cluster center corresponds to one second-class inner distance.

Typically, for the feature vector currently processed, the cluster centers corresponding to other categories in each cluster center except the target cluster center are denoted as other cluster centers. Assuming that the total number of classes is C, the number of other cluster centers is C-1 for the feature vector currently being processed. At this time, each feature vector corresponds to C-1 other cluster centers, and the other cluster centers corresponding to the feature vectors in the same category are the same.

Typically, the feature vector currently being processed is denoted as the current feature vector. Specifically, the distance between the current feature vector and the center of the corresponding other cluster is calculated and recorded as the distance in the second class, and the distance can also be understood as the cosine of the angle between the current feature vector and the center of the other cluster. At this point, there is a second intra-class distance between the current feature vector and the center of each of the other clusters. It will be appreciated that the second class of inner distances may utilize the vector angle formula and the manner of L2 regularization of feature vectors and other cluster centers. For example, the current feature vector is the feature vector corresponding to the jth sample and is denoted as y_j，y_jBelongs to the y category, and the ith other cluster center is marked as W_i，i≠y，y_jAnd W_iThe angle between is recorded as

Typically, the j sampleY of this correspondence_jThe second intra-cluster distance from the ith other cluster center is denoted as R_jiAt this time, the process of the present invention,

as can be seen from this formula, the,

can be given by y_jAnd W_iThus obtaining the product. For y_jAnd W_iWhen applying L2 regularization, the modulus of the vector, i.e., | | W_i||₂And y_j||₂Has a value of 1. At this time, for the current feature vector, each of the other cluster centers corresponds to one of the second intra-class distances. According to the method, the second intra-class distance corresponding to each feature vector can be calculated.

And step 254, determining a penalty term according to the learnable angles of all the categories.

The penalty term can also be understood as a regularization penalty term, and the purpose of adding the penalty term in the embodiment is to make the value of the learnable angle larger. The calculation formula of the penalty term is

Where C is the total number of classes, L_mAs a penalty term, m_yThe learnable angle corresponding to the y-th category.

It is to be understood that the execution sequence among step 252, step 253, and step 254 is not limited, for example, step 252, step 253, and step 254 may be executed simultaneously, or step 254 may be executed after step 252.

And 255, determining an intra-class loss function according to the first intra-class distance, the second intra-class distance and the penalty term.

Adopting a softmax loss function as an intra-class loss function, wherein the calculation formula of the intra-class loss function is as follows:

wherein S is_intraRepresenting loss-in-class functions, MThe number of samples selected for each iteration process in the neural network model training process, s is a scale factor, omega₁Is a first balance factor, R_jmAs the feature vector y of the jth sample_jThe corresponding in-class distance is the first,

W_ydenotes y_jCorresponding target cluster center, y_jBelongs to the y-th class, m₁Is the first hyperparameter, m_yLearnable angles, R, corresponding to the y-th category_jiIs y_jA second intra-class distance from the center of the ith other cluster,

at this time, s, M, ω₁The specific value of s can be set according to actual conditions, for example, s is set to be 32, 64 or 128 or the like in combination with the actual conditions. In the formula

The posterior probability that the jth sample belongs to the category y can be understood, the distance between the feature vector and the center of each cluster is considered, and the distance between the feature vectors of the same category is effectively guaranteed to be compact.

And step 260, determining an inter-class loss function according to the cluster centers.

Because the cluster centers of different classes are distributed unevenly, the distances between some cluster centers are closer, and the distances between some cluster centers are farther, in the embodiment, the distances between the cluster centers closer to each other are farther by punishing the angular distances. At this time, the inter-class loss function can be constructed by the distance between the centers of different clusters. Accordingly, the setting step 260 in the embodiment specifically includes steps 261 to 263:

and 261, calculating a first inter-class distance between different classes according to the cluster centers.

Specifically, the distance between any two cluster centers is calculated and recorded as a first inter-cluster distance, and the distance can be represented by the cosine of the angle between the two cluster centers. The first inter-class distance can also be calculated by utilizing a vector included angle formula and performing L2 regularization on the centers of the two clusters. For example, the cluster center corresponding to the z-th category is denoted as W_zZ is less than or equal to C, and the cluster center corresponding to the qth category is marked as W_qQ is not more than C, and q is not equal to z. At this time, the first inter-class distance between the cluster center corresponding to the z-th class and the cluster center corresponding to the q-th class is

Wherein the content of the first and second substances,

is the first inter-class distance. At this time, there are C-1 first inter-class distances for the cluster center corresponding to each sample. Wherein, for W_zAnd W_qWhen applying L2 regularization, the modulus of the vector, i.e., | | W_z||₂And W_q||₂Has a value of 1.

And 262, selecting a second inter-class distance larger than the distance threshold from all the first inter-class distances corresponding to each class, and calculating the sum of the second inter-class distances of each class.

Specifically, for the purpose of penalizing the angular distance, for the current cluster center, a cluster center farther away from the current cluster center is searched in the first inter-class distance corresponding to the current cluster center, and the first inter-class distance corresponding to the searched cluster center is recorded as the second inter-class distance. Optionally, the second inter-class distance is searched by setting a distance threshold, that is, the first inter-class distance greater than the distance threshold is searched in the first inter-class distance corresponding to the current cluster center and is recorded as the second inter-class distance. The distance threshold value can be set according to actual conditions. Each cluster center corresponds to at least one second inter-class distance, and then the second inter-class distances corresponding to each cluster center are added to obtain a second inter-class distance sum value, wherein each class corresponds to one second inter-class distance sum value.

For example, the current cluster center is the cluster center corresponding to the z-th class, and the corresponding distance and value between the second classes are D_zIs shown to be

Wherein m is₂The specific value of the second hyperparameter can be set according to actual conditions by setting m₂The distance threshold can be derived, i.e. only keeping more than cos (m)₂) And summing to obtain corresponding second inter-class distance sum values. It will be appreciated that each cluster center corresponds to a second inter-class distance and value.

And 263, determining an inter-class loss function according to the second inter-class distance and the second inter-class distance corresponding to each class.

Specifically, the second inter-class distance and the second inter-class distance are averaged to construct an inter-class loss function, and at this time, a calculation formula of the inter-class loss function is as follows:

Step 270, determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function.

Wherein, the calculation formula of the loss function of the neural network model is as follows: s_loss＝ω₂S_inter+S_intraWherein S is_lossRepresents the loss function, S_intraRepresenting the loss-in-class function, S_interRepresenting the inter-class loss function, ω₂The second balance factor can also be understood as a weight of the inter-class loss function, and a specific numerical value of the second balance factor can be set according to actual conditions.

Step 280, updating the neural network model including the loss function by using a gradient descent algorithm until the loss function meets a stable condition.

Specifically, the gradient descent algorithm (gradient device) is an optimization algorithm, which can be used to recursively approximate the minimum deviation model in machine learning and artificial intelligence. And after the loss function is constructed, iterative training is carried out on the neural network model by using a gradient descent algorithm so as to realize training of the neural network model. Wherein, the samples in the training data set are used as training data to train the neural network model. In the training process, parameters (such as weight parameters) in the neural network model can be updated until the loss function meets the stable condition. The stability condition can be set according to actual conditions, and when the loss function meets the stability condition, the output result of the neural network model is stable, and the result deviation is small. It can be understood that, in the training process, a loss value can be calculated according to the loss function in each iteration process, and parameters in the neural network model can be updated reversely according to the loss value, that is, the parameters in the neural network model can be updated in each iteration process.

Step 290, initializing a sampling probability for each sample in the training data set, wherein the initialized sampling probabilities for each sample are the same.

Specifically, after the neural network model is stable, the neural network model is trained in a difficult excavation mode, so that the accuracy of the neural network model is further improved. When difficult mining is carried out, a sampling probability is initialized for each sample in the training data set, and the sampling probability of each sample is the same. The sampling probability represents the probability of sampling the corresponding sample in the training process of the neural network model. The specific value of the initialized sampling probability may be set according to actual conditions, and the embodiment does not limit this.

Step 2100, iteratively training the neural network model including the loss function based on the training data set.

Step 2110, in an iterative training process, determining correctly classified samples and incorrectly classified samples in the training data set.

Specifically, taking a training process as an example, at this time, a classification result of the neural network model for the samples sampled in the training process is obtained. Then, the classification result is compared with the classification described in the label of the corresponding sample, and if the classification result is the same as the classification described in the label, it is determined that the corresponding sample is correctly classified by the neural network model, and if the classification result is different from the classification described in the label, it is determined that the corresponding sample is incorrectly classified by the neural network model. It will be appreciated that during each training process, both correctly classified samples and misclassified samples are available.

Step 2120, decreasing the sampling probability of the correctly classified sample by a first value and increasing the sampling probability of the incorrectly classified sample by a second value.

Specifically, in the one-time training process, when the sample is correctly classified, it is indicated that the neural network model can accurately identify the sample, and thus, the sampling probability of the sample can be reduced. In an embodiment, the sampling probability is reduced by reducing a first value to the sampling probability, where the first value may be set according to an actual situation. Accordingly, in one training process, when the sample is classified incorrectly, it indicates that the neural network model cannot accurately identify the sample, and thus, the sampling probability of the sample can be increased. In the embodiment, the sampling probability is increased by adding a second value to the sampling probability, where the second value may be set according to an actual situation, and may be the same as or different from the first value. In the iterative training process, after each training is finished, the sampling probability of the correctly classified samples is reduced by a first value, and the sampling probability of the incorrectly classified samples is increased by a second value. With the iterative training, the probability that the correctly classified samples are sampled by the neural network model is smaller, and the probability that the incorrectly classified samples are sampled by the neural network model is larger.

And 2130, deleting the samples with the sampling probability lower than the probability threshold in the training data set, and continuing the iterative training until the loss function is converged.

In one embodiment, the value of the probability threshold may be set according to actual conditions, and when the sampling probability is lower than the probability threshold, it indicates that the probability that the sample corresponding to the sampling probability is accurately identified by the neural network model is high. Therefore, the sample in the training data set can be deleted, and the sample which cannot be accurately identified by the neural network model is retained, that is, the sample with high sampling probability is retained, and at this time, the training data set can be considered to be difficultly mined. Optionally, each time training is completed, the sample may be screened once by a probability threshold. In the subsequent iterative training process of the neural network model, training can be performed only on samples with high sampling probability, namely, training is performed on samples which are easy to be identified by mistake gradually in the training process, and parameters of the neural network model are adjusted in the training process until the loss function is converged.

The method comprises the steps of obtaining a training data set, determining a feature vector of each sample in the training data set through a neural network model, then obtaining a weight parameter matrix of a full connection layer in the neural network model, determining a cluster center corresponding to each category according to the weight parameter matrix, then constructing an intra-class loss function according to the cluster center and the feature vector, constructing an inter-class loss function according to the cluster center, constructing a loss function of the neural network model according to the intra-class loss function and the inter-class loss function, and then training the neural network model until the loss function converges. When the loss function is constructed, not only the characteristic vectors of all samples are considered to construct the intra-class loss function, but also the cluster centers of different classes are considered to construct the inter-class loss function. And a learnable angle is added when an intra-class loss function is constructed so as to limit the distance between the feature vectors in the same class, and meanwhile, a penalty term is added so as to increase the learnable angle, so that the feature vectors in the same class are more compact. When constructing the inter-class loss function, the penalty distance is realized by setting a distance threshold value so as to enable the distance between the feature vectors under different classes to be longer. Meanwhile, when the neural network model is trained, the sampling probability of each sample is changed through the recognition result, difficult samples in the iterative training process of the neural network model can be mined, and then the neural network model is trained based on the difficult samples, so that the processing accuracy of the neural network model is further improved.

EXAMPLE III

Fig. 3 is a schematic structural diagram of a feature model training device according to a third embodiment of the present invention. Referring to fig. 3, the feature model training apparatus provided in this embodiment includes: a data acquisition module 301, a feature determination module 302, a first construction module 303, a second construction module 304, and a model training module 305.

The data acquisition module 301 is configured to acquire a training data set, where each sample in the training data set corresponds to a label, and the label is used to identify a category to which the corresponding sample belongs; a feature determination module 302, configured to input the training data set to a neural network model to obtain a feature vector of each sample; a first constructing module 303, configured to construct an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs; a second building module 304, configured to determine a loss function of the neural network model according to the intra-class loss function and the inter-class loss function; a model training module 305 for training the neural network model including the loss function until the loss function converges.

The technical problem that the convolutional neural network cannot expand the distance of the feature vectors between samples of different categories to effectively distinguish the samples of different categories in the prior art can be solved by the technical scheme that the training data set with the category labels is obtained, then the feature vectors of the samples in the training data set are obtained according to the neural network model, the intra-class loss function and the inter-class loss function are constructed according to the feature vectors and the category to which the samples belong, then the loss function of the neural network model is constructed according to the intra-class loss function and the inter-class loss function, the neural network model containing the loss function is trained until the loss function converges. When the loss function is constructed, the intra-class loss function is constructed by considering all the characteristic vectors, and the inter-class loss function is constructed by considering different classes, so that when the characteristic vectors are extracted by the neural network model, the characteristic vectors of the samples of the same class are closer to each other, the characteristic vectors of the samples of different classes are farther from each other, and a better metric learning result is realized.

On the basis of the above embodiment, the first building block 303 includes: the parameter acquisition unit is used for acquiring a weight parameter matrix of a full connection layer in the neural network model, wherein the weight parameter of the pth column in the weight parameter matrix represents a weight parameter vector corresponding to the pth category; the cluster center determining unit is used for determining the cluster center corresponding to each category according to the weight parameter matrix; an intra-class loss function determining unit, configured to determine an intra-class loss function according to each cluster center and the feature vector under each category; and the inter-class loss function determining unit is used for determining an inter-class loss function according to each cluster center.

On the basis of the above embodiment, the intra-class loss function determining unit includes: the learnable angle determining subunit is used for determining a learnable angle according to the category to which the feature vector belongs; a first distance determining subunit, configured to determine, according to the feature vector, a target cluster center and the learnable angle, a first intra-class distance between the feature vector and the target cluster center, where the target cluster center is a cluster center corresponding to a class to which the feature vector belongs, and each feature vector corresponds to one first intra-class distance; a second distance determining subunit, configured to determine, according to the feature vector and other cluster centers, a second intra-class distance between the feature vector and the other cluster centers, where the other cluster centers are cluster centers excluding a corresponding target cluster center from all cluster centers, each feature vector corresponds to a group of other cluster centers, and each other cluster center corresponds to a second intra-class distance; the penalty item determining subunit is used for determining penalty items according to the learnable angles of all the categories; and the intra-class loss function constructing subunit is used for determining the intra-class loss function according to the first intra-class distance, the second intra-class distance and the penalty term.

On the basis of the above embodiment, the calculation formula of the intra-class loss function is as follows:

on the basis of the above embodiment, the inter-class loss function determining unit includes: a third distance determining subunit, configured to calculate a first inter-class distance between different classes according to each cluster center; a fourth distance determining subunit, configured to select a second inter-class distance greater than the distance threshold from all the first inter-class distances corresponding to each class, and calculate a sum of the second inter-class distances for each class; and the inter-class loss function constructing subunit is used for determining the inter-class loss function according to the second inter-class distance and the value corresponding to each class.

On the basis of the above embodiment, the calculation formula of the inter-class loss function is as follows:

On the basis of the above embodiment, the calculation formula of the loss function of the neural network model is as follows: s_loss＝ω₂S_inter+S_intraWherein S is_lossRepresents the loss function, S_intraRepresenting the loss-in-class function, S_interRepresenting the inter-class loss function, ω₂Is the second balance factor.

On the basis of the above embodiment, the model training module 305 includes: a model updating unit for updating the neural network model including the loss function by using a gradient descent algorithm until the loss function satisfies a stable condition; a probability initialization unit, configured to initialize a sampling probability for each sample in the training data set, where the initialized sampling probabilities of each sample are the same; an iterative training unit for iteratively training the neural network model including the loss function based on the training data set; the sample identification unit is used for determining correctly classified samples and incorrectly classified samples in the training data set in the iterative training process; a probability updating unit for decreasing the sampling probability of the correctly classified sample by a first value and increasing the sampling probability of the incorrectly classified sample by a second value; and the sample deleting unit is used for deleting the samples with the sampling probability lower than the probability threshold in the training data set and continuing the iterative training until the loss function is converged.

The feature model training device provided by this embodiment is included in the feature model training device, and can be used to execute the feature model training method provided by any of the above embodiments, and has corresponding functions and beneficial effects.

Example four

Fig. 4 is a schematic structural diagram of a feature model training device according to a fourth embodiment of the present invention. Specifically, as shown in fig. 4, the feature model training apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of the processors 40 in the feature model training device may be one or more, and one processor 40 is taken as an example in fig. 4; the processor 40, the memory 41, the input device 42 and the output device 43 in the feature model training apparatus may be connected by a bus or other means, and fig. 4 illustrates the connection by a bus as an example.

The memory 41, which is a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules in the feature model training method in the embodiment of the present invention (for example, the data acquisition module 301, the feature determination module 302, the first construction module 303, the second construction module 304, and the model training module 305 in the feature model training apparatus). The processor 40 executes various functional applications and data processing of the feature model training apparatus by executing software programs, instructions and modules stored in the memory 41, namely, implements the feature model training method provided by any of the above-described embodiments.

The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the feature model training apparatus, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the feature model training device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 42 may be used to receive input numeric or character information and generate key signal inputs relating to user settings and function controls of the feature model training apparatus. The output device 43 may include a display screen, a speaker, etc. The feature model training device may further comprise communication means (not shown) operable to communicate data with other devices.

The feature model training device includes the feature model training apparatus provided in the third embodiment, and may be used to execute the feature model training method provided in any embodiment of the present invention, and has corresponding functions and beneficial effects.

EXAMPLE five

Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for feature model training, the method comprising:

Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the feature model training method provided by any embodiments of the present invention.

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, and the computer software product may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute the feature model training method according to the embodiments of the present invention.

It should be noted that, in the embodiment of the feature model training apparatus, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A feature model training method is characterized by comprising the following steps:

2. The feature model training method according to claim 1, wherein the constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs comprises:

and determining an inter-class loss function according to each cluster center.

3. The method of claim 2, wherein determining an intra-class loss function according to the cluster centers and the feature vectors under the classes comprises:

determining a penalty item according to the learnable angles of all categories;

4. The feature model training method according to claim 3, wherein the calculation formula of the intra-class loss function is:

wherein S is_intraRepresenting an intra-class loss function, M is the number of samples selected in each iteration process in the training process of the neural network model, s is a scale factor, and omega₁Is a first balance factor, R_jmFor the j sampleFeature vector y_jThe corresponding in-class distance is the first,

W_ydenotes y_jCorresponding target cluster center, y_jBelongs to the y-th class, m₁Is the first hyperparameter, m_yLearnable angles, R, corresponding to the y-th category_jiIs y_jSecond-class inner distance, R, from the center of the ith other cluster_ji＝W_i ^Ty_j，W_iFor the ith other cluster center, C is the total number of classes, L_mIn order to be a penalty term,

5. the feature model training method of claim 2, wherein the determining an inter-class loss function from each of the cluster centers comprises:

6. The feature model training method according to claim 5, wherein the calculation formula of the inter-class loss function is:

7. The feature model training method according to claim 1, wherein the calculation formula of the loss function of the neural network model is: s_loss＝ω₂S_inter+S_intraWherein S is_lossRepresents the loss function, S_intraRepresenting the loss-in-class function, S_interRepresenting the inter-class loss function, ω₂Is the second balance factor.

8. The feature model training method of claim 1, wherein the training the neural network model including the loss function until the loss function converges comprises:

9. A feature model training device, comprising:

10. A feature model training apparatus, characterized by comprising:

one or more processors;

a memory for storing one or more programs;

when executed by the one or more processors, cause the one or more processors to implement the feature model training method of any one of claims 1-8.

11. A storage medium containing computer-executable instructions for performing the feature model training method of any one of claims 1-8 when executed by a computer processor.