CN111553399A - Feature model training method, device, equipment and storage medium


Info

Publication number: CN111553399A
Application number: CN202010319373.0A
Authority: CN (China)
Prior art keywords: class, loss function, inter-class, sample, neural network
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 黄振杰, 李德紘, 张少文, 冯琰一
Current assignee: Guangdong Huazhiyuan Information Engineering Co ltd; Guangzhou Jiadu Technology Software Development Co ltd; Guangzhou Xinke Jiadu Technology Co Ltd; PCI Suntek Technology Co Ltd
Original assignee: Guangdong Huazhiyuan Information Engineering Co ltd; Guangzhou Jiadu Technology Software Development Co ltd; Guangzhou Xinke Jiadu Technology Co Ltd; PCI Suntek Technology Co Ltd
Application filed by Guangdong Huazhiyuan Information Engineering Co ltd, Guangzhou Jiadu Technology Software Development Co ltd, Guangzhou Xinke Jiadu Technology Co Ltd and PCI Suntek Technology Co Ltd
Priority to CN202010319373.0A
Publication of CN111553399A
Priority to CN202110426522.8A

Classifications

    • G06F18/2415: Pattern recognition; classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a feature model training method, device, equipment and storage medium, relating to the technical field of metric learning and comprising the following steps: acquiring a training data set, wherein each sample in the training data set corresponds to a label and the label identifies the category to which the sample belongs; inputting the training data set into a neural network model to obtain a feature vector for each sample; constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the category to which the sample belongs; determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function; and training the neural network model containing the loss function until the loss function converges. This scheme solves the prior-art problem that a convolutional neural network cannot enlarge the distance between the feature vectors of samples from different categories, so that samples of different categories cannot be effectively distinguished.

Description

Feature model training method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of metric learning, in particular to a feature model training method, a device, equipment and a storage medium.
Background
Metric learning may also be understood as similarity learning; its objective is to make the similarity between samples of the same category high and the similarity between samples of different categories low. With the development of deep learning, metric learning can be realized by means of a convolutional neural network. In the prior art, a convolutional neural network obtains a feature vector for each sample during processing and gathers the feature vectors of the same category, that is, feature vectors of samples of the same category are close to each other. However, the convolutional neural network cannot enlarge the distance between the feature vectors of samples from different categories, so those feature vectors remain close together; as a result, samples of different categories cannot be effectively distinguished on the basis of their feature vectors, and the learning effect of metric learning suffers.
Disclosure of Invention
The invention provides a feature model training method, a device, equipment and a storage medium, which aim to solve the technical problem that samples of different classes cannot be effectively distinguished because a convolutional neural network cannot expand the distance of feature vectors between the samples of different classes in the prior art.
In a first aspect, an embodiment of the present invention provides a feature model training method, including:
acquiring a training data set, wherein each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs;
inputting the training data set into a neural network model to obtain a feature vector of each sample;
constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs;
determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function;
training the neural network model including the loss function until the loss function converges.
Further, the constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs includes:
acquiring a weight parameter matrix of a full connection layer in the neural network model, wherein the weight parameter of the pth column in the weight parameter matrix represents a weight parameter vector corresponding to the pth category;
determining cluster centers corresponding to all categories according to the weight parameter matrix;
determining an intra-class loss function according to the cluster centers and the feature vectors under the classes;
and determining an inter-class loss function according to each cluster center.
Further, the determining an intra-class loss function according to the cluster centers and the feature vectors in the categories includes:
determining a learnable angle according to the category to which the feature vector belongs;
determining a first class internal distance between the feature vector and the target cluster center according to the feature vector, the target cluster center and the learnable angle, wherein the target cluster center is a cluster center corresponding to the class to which the feature vector belongs, and each feature vector corresponds to one first class internal distance;
determining a second-class inner distance between the feature vector and the other cluster centers according to the feature vector and the other cluster centers, wherein the other cluster centers are cluster centers except the corresponding target cluster center in all the cluster centers, each feature vector corresponds to a group of other cluster centers, and each other cluster center corresponds to a second-class inner distance;
determining a penalty item according to the learnable angles of all categories;
and determining an intra-class loss function according to the first intra-class distance, the second intra-class distance and the penalty term.
Further, the calculation formula of the intra-class loss function is as follows:

$$S_{intra} = -\frac{1}{M}\sum_{j=1}^{M}\log\frac{e^{s\,R_{jm}}}{e^{s\,R_{jm}}+\sum_{i=1,\,i\neq y}^{C}e^{s\,R_{ji}}}+\omega_1 L_m$$

wherein $S_{intra}$ represents the intra-class loss function, $M$ is the number of samples selected in each iteration of the neural network model training process, $s$ is a scale factor, and $\omega_1$ is a first balance factor. $R_{jm}$ is the first intra-class distance corresponding to the feature vector $y_j$ of the $j$-th sample,

$$R_{jm}=\cos\bigl(\theta_{j,y}+m_1+m_y\bigr),$$

where $\theta_{j,y}$ is the angle between $y_j$ and its corresponding target cluster center $W_y$, $y_j$ belongs to the $y$-th class, $m_1$ is the first hyperparameter, and $m_y$ is the learnable angle corresponding to the $y$-th class. $R_{ji}$ is the second intra-class distance between $y_j$ and the $i$-th other cluster center $W_i$,

$$R_{ji}=\cos\theta_{j,i},$$

where $\theta_{j,i}$ is the angle between $y_j$ and $W_i$. $C$ is the total number of classes and $L_m$ is the penalty term,

$$L_m=-\frac{1}{C}\sum_{y=1}^{C}m_y.$$
further, the determining an inter-class loss function according to each cluster center includes:
calculating first inter-class distances among different classes according to the cluster centers;
selecting a second inter-class distance larger than a distance threshold value from all first inter-class distances corresponding to each class, and calculating the second inter-class distance sum value of each class;
and determining an inter-class loss function according to the second inter-class distance and the value corresponding to each class.
Further, the calculation formula of the inter-class loss function is as follows:

$$S_{inter}=\frac{1}{C}\sum_{z=1}^{C}D_z$$

wherein $S_{inter}$ represents the inter-class loss function, $C$ is the total number of classes, and $D_z$ is the second inter-class distance sum value corresponding to the $z$-th class,

$$D_z=\sum_{q=1,\,q\neq z}^{C}\mathbb{1}\!\left[\cos\theta_{z,q}>\cos m_2\right]\cos\theta_{z,q},$$

where $\cos\theta_{z,q}$ is the first inter-class distance between the $z$-th class and the $q$-th class, and $m_2$ is the second hyperparameter.
Further, the calculation formula of the loss function of the neural network model is as follows: $S_{loss}=\omega_2 S_{inter}+S_{intra}$, wherein $S_{loss}$ represents the loss function, $S_{intra}$ represents the intra-class loss function, $S_{inter}$ represents the inter-class loss function, and $\omega_2$ is the second balance factor.
Further, the training the neural network model including the loss function until the loss function converges comprises:
updating the neural network model containing the loss function with a gradient descent algorithm until the loss function satisfies a stability condition;
initializing a sampling probability for each sample in the training data set, the initialized sampling probabilities for each sample being the same;
iteratively training the neural network model including the loss function based on the training data set;
in an iterative training process, determining correctly classified samples and incorrectly classified samples in the training data set;
decreasing the sampling probability of the correctly classified sample by a first value and increasing the sampling probability of the incorrectly classified sample by a second value;
and deleting the samples with the sampling probability lower than the probability threshold in the training data set, and continuing the iterative training until the loss function is converged.
In a second aspect, an embodiment of the present invention further provides a feature model training apparatus, including:
the data acquisition module is used for acquiring a training data set, each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs;
the characteristic determining module is used for inputting the training data set into a neural network model to obtain a characteristic vector of each sample;
the first construction module is used for constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs;
a second construction module for determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function;
a model training module for training the neural network model including the loss function until the loss function converges.
In a third aspect, an embodiment of the present invention further provides a feature model training device, including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the feature model training method of the first aspect.
In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the feature model training method according to the first aspect.
According to the feature model training method, device, equipment and storage medium described above, a training data set with class labels is obtained, the feature vector of each sample in the training data set is obtained from the neural network model, an intra-class loss function and an inter-class loss function are constructed according to the feature vectors and the classes to which the samples belong, the loss function of the neural network model is then constructed from the intra-class and inter-class loss functions, and the neural network model containing the loss function is trained until the loss function converges. This solves the prior-art problem that a convolutional neural network cannot enlarge the distance between the feature vectors of samples from different classes, so that samples of different classes cannot be effectively distinguished. When the loss function is constructed, the feature vectors of all samples are considered to build the intra-class loss function, and the different classes are considered to build the inter-class loss function, so that when the neural network model extracts feature vectors, the feature vectors of samples of the same class are closer to each other and the feature vectors of samples of different classes are farther apart, achieving a better metric learning result.
Drawings
Fig. 1 is a flowchart of a feature model training method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a feature model training method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a feature model training apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a feature model training device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are for purposes of illustration and not limitation. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a feature model training method according to an embodiment of the present invention. The feature model training method provided in the embodiment may be executed by a feature model training apparatus, which may be implemented in software and/or hardware and integrated in a feature model training device. The feature model training device may be an intelligent device with data processing and analyzing capabilities, such as a tablet computer and a desktop computer, and the feature model training device may be an independent intelligent device or may be composed of a plurality of intelligent devices capable of data communication.
Specifically, referring to fig. 1, the feature model training method specifically includes:
step 110, a training data set is obtained, each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs.
Illustratively, a training data set refers to a set of samples used to train a feature model. In one embodiment, the feature model has a main function of recognizing the features of the sample, that is, the feature model can recognize the features of the sample and map the features into the feature space, and then adjust the parameters of the feature model based on the result of feature recognition, so that in the subsequent working process of the feature model, the features with high similarity are closer to each other in the feature space, and the features with low similarity are farther from each other in the feature space. It is understood that after the feature model identifies the features of the sample, the class of the sample can be determined based on the features, i.e., the sample is classified. Optionally, the number and type of the samples in the training data set may be set according to actual conditions. Typically, the samples in the training data set are of the same type, e.g., the samples in the training data are all picture types. It should be noted that the embodiment does not limit the collection manner of each sample in the training data set.
Typically, each sample in the training data set has one label. The label is used to identify the category to which the sample belongs; the embodiment does not limit the format in which the category is recorded in the label. It should be noted that each sample has a corresponding category and that the same category may contain at least one sample. For example, if the training sample set contains n samples, the corresponding label set contains n labels, each sample corresponds to one label, and the same category may be recorded in different labels.
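As a purely illustrative sketch (not part of the original disclosure), a training data set of this kind, in which every sample carries a label identifying its category, can be represented as follows; the image shapes and number of categories are arbitrary placeholders.

```python
import torch
from torch.utils.data import Dataset

class LabeledImageDataset(Dataset):
    """Toy stand-in for the training data set: each sample carries a label
    identifying the category it belongs to."""
    def __init__(self, images, labels):
        # images: float tensor (N, 3, H, W); labels: long tensor (N,)
        assert len(images) == len(labels)
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

# Example: 100 random picture-type samples spread over 10 categories.
dataset = LabeledImageDataset(torch.randn(100, 3, 112, 112),
                              torch.randint(0, 10, (100,)))
```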
And 120, inputting the training data set into a neural network model to obtain a feature vector of each sample.
In this embodiment, the feature model is a neural network model, wherein the neural network model may be a deep neural network model, a specific network structure of the neural network model may be set according to an actual situation, and in the embodiment, the last layer of the neural network model is described as a full connection layer. Specifically, each sample in the training data set is input into the neural network model, the characteristics of each sample are learned through the neural network model, and data to be input into the full connection layer is obtained, that is, the output of the neural network model except the full connection layer is obtained. In an embodiment, the output is recorded as a feature vector, where each sample corresponds to one feature vector. The feature vector can be understood as a feature learning result of the neural network model for the sample, and the dimension of the feature vector can be set according to the actual situation, which is not limited by the embodiment. It can be understood that when the neural network model learns the characteristics of the sample for the first time, the parameters in the neural network model are initialized randomly, and in the subsequent training process, the parameters of the neural network model can be updated according to the loss function constructed by the characteristic vector.
Optionally, in this embodiment, the full connection layer is configured to identify a category of the feature vector, that is, classify the feature vector, so that in a subsequent training process, a sample that is difficult to be classified is selected based on a classification result for training. And after the training of the neural network model is finished, the full connection layer can be deleted when the neural network model is applied, and at the moment, the neural network model is used for extracting the characteristic vector of the sample.
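The split described above, a backbone that outputs a feature vector for every sample plus a full connection layer that is used only for classification during training and can be discarded afterwards, could be sketched as follows. This is a minimal illustration under assumptions: the backbone architecture and the 512-dimensional feature size are not mandated by the patent.

```python
import torch
import torch.nn as nn

class FeatureModel(nn.Module):
    def __init__(self, feat_dim=512, num_classes=10):
        super().__init__()
        # Any convolutional backbone can be used; a tiny one is shown for brevity.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Full connection layer whose weights provide one vector per class;
        # it is only needed during training and can be dropped for deployment.
        self.fc = nn.Linear(feat_dim, num_classes, bias=False)

    def forward(self, x, return_logits=True):
        features = self.backbone(x)           # feature vector y_j for each sample
        if not return_logits:
            return features                   # deployment: feature extraction only
        return features, self.fc(features)    # training: features + class scores
```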
And step 130, constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs.
The loss function (loss function) is a function that maps the value of a random event or its related random variables to non-negative real numbers to represent the "risk" or "loss" of the random event. In application, the loss function is usually associated with the optimization problem as a learning criterion, i.e. the model is solved and evaluated by minimizing the loss function. Embodiments stabilize the neural network model by converging the loss function in the neural network model. In the embodiment, when the loss function is constructed, an intra-class loss function is constructed according to the distribution of the feature vectors corresponding to the samples under each class in the feature space, an inter-class loss function is constructed according to the distribution of the cluster centers corresponding to each class in the feature space, and then the loss function is determined according to the intra-class loss function and the inter-class loss function. Wherein the position of the cluster center in the feature space is influenced by the feature vectors of the samples under the corresponding category.
The intra-class loss function considers the distribution condition of the feature vectors in the feature space under each class, and aims to enable the feature vectors under the same class to be more compact in the feature space when a sample passes through the neural network model so as to optimize the distance between the feature vectors under different classes. The specific construction mode of the intra-class loss function can be set according to actual conditions. For example, the distance between each feature vector and the center of each cluster is calculated. Optionally, when calculating the distance between the feature vector and the cluster center of the category to which the feature vector belongs, a learnable angle may be added to make the feature vector distance in the same category more compact. Then, an intra-class loss function is constructed by utilizing the softmax loss function according to each distance. Optionally, in order to make the learnable angle take a larger value, a penalty term is added to the intra-class loss function to improve the effect of the intra-class loss function. With the progress of the subsequent training process, the distance of the feature vectors corresponding to the samples belonging to the same category in the training data set in the feature space becomes closer and closer, that is, the feature vectors are gathered into a cluster. Further, each category corresponds to a cluster center, and the determination manner of the cluster center may be set according to an actual situation, for example, weight parameter vectors corresponding to different categories are set in the full connection layer, so that the feature vector under each category is mapped to the vicinity of the corresponding weight parameter vector, and at this time, the weight parameter vector of each category may be used as the cluster center under the corresponding category. It can be understood that in the first learning process of the neural network model, the weight parameter vector is initialized randomly, and in the subsequent process, the weight parameter vector can be updated by combining the feature vector obtained by learning. Optionally, the distance between the feature vector and the cluster center may be calculated by using a vector included angle formula, or the distance between the feature vector and the cluster center may be calculated by using other methods. Optionally, the penalty term may be obtained by calculating a mean of learnable angles corresponding to all the categories, or may be calculated in other manners. When constructing the intra-class loss function, other parameters may be set according to actual conditions, for example, the size of the batch size in the softmax loss function may be set according to actual conditions.
The inter-class loss function considers the distribution condition of cluster centers in the feature space under different classes, and the inter-class loss function can be regarded as a regular term, so that the distance of the feature vectors under different classes in the feature space is longer, and the feature vectors under the same class are more compact in the feature space. The specific construction mode of the inter-class loss function can be set according to actual conditions. For example, assume that there are M classes currently, the distances between the cluster center corresponding to each class and the centers of the other M-1 clusters are calculated, that is, M-1 inter-class distances are obtained, and then for each class, the inter-class distances meeting the set conditions among the M-1 inter-class distances are selected and summed to obtain the total distance corresponding to the current class. The setting condition may be set according to an actual situation, and the inter-class loss function is enabled to make the distances between the feature distances of different classes farther through the setting condition, for example, the setting condition is that the distance is greater than a set distance threshold, and at this time, only the inter-class distance greater than the distance threshold is selected to punish the inter-class distance greater than the distance threshold. Further, after the total distance corresponding to each category is obtained, the average of the total distances is taken as an inter-category loss function.
And 140, determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function.
And constructing a loss function based on the intra-class loss function and the inter-class loss function, wherein the loss function can be constructed in a weighted sum mode, namely the intra-class loss function and the inter-class loss function are subjected to weighted sum to obtain the loss function. It can be understood that, when performing the weighted sum, the magnitude of the weight value can be set according to the actual situation. It should be noted that, constructing the loss function by a weighted sum method is only an optional method, and in practical applications, the loss function may also be constructed by other methods.
Step 150, training the neural network model including the loss function until the loss function converges.
In one embodiment, the constructed loss function is used as a loss function of the neural network model, and the neural network model is trained. Because each sample in the training data set has a label for identifying the category, the training data set can be directly adopted for training when the neural network model is trained. Inputting each sample in the training data set into a neural network model, obtaining an output result of the neural network model, wherein the output result is the category to which the neural network model identifies each sample, then adjusting the corresponding parameter of the neural network model according to the output result and the category identified in the sample label, and repeating the operation until the loss function is stable, namely the output result of the neural network is stable. At this time, the training may be considered to be finished. Optionally, in order to ensure accuracy of the neural network model, after the loss function is stable, a difficult-to-sample mining mode is adopted, a more valuable sample is selected from the training data set to perform iterative training on the neural network model, and corresponding parameters of the neural network model are adjusted again according to an output result of the neural network model until the loss function is converged, at this time, accuracy of an output result of the obtained neural network model is better. The hard case mining mode can be set according to actual conditions, for example, a hard sample is selected, that is, a sample with a high error rate of neural network model class identification is selected as the hard sample, then, each hard sample is taken as the hard case mining sample, and the neural network model is iteratively trained according to the hard sample until the loss function converges. When the loss function is converged, the neural network model is stable, the accuracy rate achieves the expected effect, and the method can be directly applied.
The scheme of obtaining a training data set with class labels, obtaining the feature vector of each sample in the training data set from the neural network model, constructing an intra-class loss function and an inter-class loss function according to the feature vectors and the classes to which the samples belong, constructing the loss function of the neural network model from the intra-class and inter-class loss functions, and training the neural network model containing the loss function until the loss function converges solves the prior-art problem that a convolutional neural network cannot enlarge the distance between the feature vectors of samples from different categories, so that samples of different categories cannot be effectively distinguished. When the loss function is constructed, the feature vectors of all samples are considered to build the intra-class loss function, and the different categories are considered to build the inter-class loss function, so that when the neural network model extracts feature vectors, the feature vectors of samples of the same category are closer to each other and the feature vectors of samples of different categories are farther apart, achieving a better metric learning result.
Example two
Fig. 2 is a flowchart of a feature model training method according to a second embodiment of the present invention. The present embodiment is embodied on the basis of the above-described embodiments. Specifically, referring to fig. 2, the feature model training method provided in this embodiment specifically includes:
step 210, a training data set is obtained, each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs.
Step 220, inputting the training data set into a neural network model to obtain a feature vector of each sample.
And 230, acquiring a weight parameter matrix of a full connection layer in the neural network model, wherein the weight parameter of the pth column in the weight parameter matrix represents a weight parameter vector corresponding to the pth category.
A weight parameter matrix is arranged on the full connection layer of the neural network model; the classification result of a sample is obtained from this weight parameter matrix and the feature vector input to the full connection layer. The weight parameter matrix contains a plurality of weight parameters whose numerical values can be set according to the actual situation: in the first learning pass of the neural network model the weight parameters are initialized randomly, and in subsequent passes they are updated in combination with the learned feature vectors. Further, the weight parameter matrix is denoted $W$, with $W\in\mathbb{R}^{K\times C}$, that is, the size of $W$ is $K\times C$, where $K$ is the total dimension of the feature vector (if the feature vector is 512-dimensional, $K=512$) and $C$ is the total number of classes corresponding to the samples in the training data set. The values of $K$ and $C$ can be set according to the actual situation.
And 240, determining the cluster center corresponding to each category according to the weight parameter matrix.
Specifically, the weight parameter vector corresponding to each category can be determined by the weight parameter matrix. When the neural network model obtains the feature vectors based on the samples, the feature vectors are distributed near the weight parameter vectors of the corresponding categories, so that the weight parameter vector of each category can be used as the cluster center corresponding to the category, namely the cluster centers of all the feature vectors of the category.
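As an implementation note rather than part of the disclosure: in PyTorch, `nn.Linear` stores its weights as a C x K matrix, so transposing it gives the K x C weight parameter matrix W described above, and L2-normalizing each column yields unit-length cluster centers. A minimal sketch, with illustrative values of K and C:

```python
import torch.nn as nn
import torch.nn.functional as F

K, C = 512, 10                           # feature dimension and number of classes (illustrative)
fc = nn.Linear(K, C, bias=False)         # last full connection layer of the neural network model

W = fc.weight.t()                        # weight parameter matrix W, shape (K, C); column p <-> class p
cluster_centers = F.normalize(W, dim=0)  # L2-normalized columns serve as the cluster centers
print(cluster_centers.shape)             # torch.Size([512, 10])
```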
And step 250, determining an intra-class loss function according to the centers of the clusters and the feature vectors under the classes.
Specifically, after determining the feature vector of the sample under each category and the cluster center of each category, an intra-class loss function can be constructed. In the embodiment, the intra-class loss function is constructed by calculating the distance between the feature vector and the cluster center, and at this time, the setting of this step specifically includes steps 251 to 255:
step 251, determining a learnable angle according to the category to which the feature vector belongs.
In the embodiment, a learnable angle is added, the specific value of the learnable angle can be continuously updated along with the training process of the neural network model, typically, each category corresponds to one learnable angle, and the learnable angles corresponding to different categories may be the same or different. Learnable angles may be understood as adding a parameter to the angle to make the distance between feature vectors in the same class more compact when mapping the feature vectors. The angle refers to the degree of an included angle between the feature vector and the cluster center corresponding to the category to which the feature vector belongs. Note that, in some cases, the value of the learnable angle may be 0, and for example, when the feature vectors in the same category are already sufficiently compact with each other, the learnable angle may be set to 0.
Step 252, determining a first class inner distance between the feature vector and the target cluster center according to the feature vector, the target cluster center and the learnable angle, where the target cluster center is a cluster center corresponding to a class to which the feature vector belongs, and each feature vector corresponds to one first class inner distance.
Specifically, the first intra-class distance is the distance between a feature vector and its corresponding target cluster center, and can be represented by the cosine of an angle, that is, it can be understood as the angular distance between the feature vector and the target cluster center. The cluster center corresponding to the class to which a sample belongs is recorded as the target cluster center of that sample. In one embodiment, a learnable angle is added to this distance. Illustratively, when calculating the first intra-class distance, the cosine of the angle between the current feature vector and the target cluster center is computed first, i.e. the distance between the current feature vector and the target cluster center. This distance can be computed with the vector angle formula, applying L2 regularization to the feature vector and the target cluster center. For example, let the feature vector whose distance is currently being calculated be the feature vector of the $j$-th sample, denoted $y_j$, belonging to the $y$-th class; its target cluster center is denoted $W_y$, and the angle (i.e. included angle) between $y_j$ and $W_y$ is denoted $\theta_{j,y}$. The distance between the current feature vector and its target cluster center can then be expressed as

$$\cos\theta_{j,y}=\frac{W_y^{T}y_j}{\lVert W_y\rVert_2\,\lVert y_j\rVert_2}.$$

As can be seen from this formula, $\cos\theta_{j,y}$ can be obtained from $y_j$ and $W_y$. When L2 regularization is applied to $y_j$ and $W_y$, the moduli of the vectors, i.e. $\lVert W_y\rVert_2$ and $\lVert y_j\rVert_2$, are equal to 1. After the distance is obtained, the learnable angle corresponding to the $y$-th class, denoted $m_y$, and the hyperparameter are added to it to obtain the first intra-class distance. Denoting the first intra-class distance corresponding to the $j$-th sample as $R_{jm}$,

$$R_{jm}=\cos\bigl(\theta_{j,y}+m_1+m_y\bigr),$$

where $m_1$ is the hyperparameter used in calculating the first intra-class distance, recorded in the embodiment as the first hyperparameter. It is understood that a hyperparameter is a parameter set before the learning process starts, not parameter data obtained through training, and its specific value can be set according to the actual situation. The benefit of setting the first hyperparameter is that the angle between the feature vector and the target cluster center is made smaller. It will be appreciated that a smaller angle between two feature vectors indicates that they are more similar, while a larger angle indicates that they differ more. The first intra-class distance corresponding to each feature vector can be calculated in the above manner.
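As an illustrative sketch only (not part of the original disclosure), the first intra-class distance described above can be computed as follows; the feature dimension and the values of m1 and m_y are arbitrary placeholders.

```python
import torch
import torch.nn.functional as F

def first_intra_class_distance(y_j, W_y, m1, m_y):
    """R_jm = cos(theta_{j,y} + m1 + m_y), with y_j and W_y L2-normalized."""
    cos_theta = torch.dot(F.normalize(y_j, dim=0), F.normalize(W_y, dim=0))
    theta = torch.acos(cos_theta.clamp(-1.0, 1.0))  # angle between feature and target cluster center
    return torch.cos(theta + m1 + m_y)

y_j = torch.randn(512)   # feature vector of the j-th sample
W_y = torch.randn(512)   # target cluster center of its class
R_jm = first_intra_class_distance(y_j, W_y, m1=0.3, m_y=0.1)
```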
And 253, determining a second-class inner distance between the feature vector and the other cluster centers according to the feature vector and the other cluster centers, wherein the other cluster centers are cluster centers except the corresponding target cluster center in all the cluster centers, each feature vector corresponds to one group of other cluster centers, and each other cluster center corresponds to one second-class inner distance.
Typically, for the feature vector currently processed, the cluster centers corresponding to other categories in each cluster center except the target cluster center are denoted as other cluster centers. Assuming that the total number of classes is C, the number of other cluster centers is C-1 for the feature vector currently being processed. At this time, each feature vector corresponds to C-1 other cluster centers, and the other cluster centers corresponding to the feature vectors in the same category are the same.
Typically, the feature vector currently being processed is denoted as the current feature vector. Specifically, the distance between the current feature vector and each of the other cluster centers is calculated and recorded as a second intra-class distance, which can likewise be understood as the cosine of the angle between the current feature vector and that other cluster center. There is thus one second intra-class distance between the current feature vector and each other cluster center. The second intra-class distance can also be computed with the vector angle formula, applying L2 regularization to the feature vector and the other cluster centers. For example, let the current feature vector be the feature vector of the $j$-th sample, denoted $y_j$, belonging to the $y$-th class, and let the $i$-th other cluster center be $W_i$ with $i\neq y$; the angle between $y_j$ and $W_i$ is denoted $\theta_{j,i}$. Typically, the second intra-class distance between $y_j$ and the $i$-th other cluster center is denoted $R_{ji}$, where

$$R_{ji}=\cos\theta_{j,i}=\frac{W_i^{T}y_j}{\lVert W_i\rVert_2\,\lVert y_j\rVert_2}.$$

As can be seen from this formula, $R_{ji}$ can be obtained from $y_j$ and $W_i$. When L2 regularization is applied to $y_j$ and $W_i$, the moduli of the vectors, i.e. $\lVert W_i\rVert_2$ and $\lVert y_j\rVert_2$, are equal to 1. For the current feature vector, each other cluster center thus corresponds to one second intra-class distance, and the second intra-class distances corresponding to every feature vector can be calculated in the same way.
And step 254, determining a penalty term according to the learnable angles of all the categories.
The penalty term can also be understood as a regularization penalty term; it is added in the embodiment so that the learnable angles take larger values. The calculation formula of the penalty term is

$$L_m=-\frac{1}{C}\sum_{y=1}^{C}m_y,$$

where $C$ is the total number of classes, $L_m$ is the penalty term, and $m_y$ is the learnable angle corresponding to the $y$-th class.
It is to be understood that the execution sequence among step 252, step 253, and step 254 is not limited, for example, step 252, step 253, and step 254 may be executed simultaneously, or step 254 may be executed after step 252.
And 255, determining an intra-class loss function according to the first intra-class distance, the second intra-class distance and the penalty term.
Adopting a softmax loss function as an intra-class loss function, wherein the calculation formula of the intra-class loss function is as follows:
$$S_{intra} = -\frac{1}{M}\sum_{j=1}^{M}\log\frac{e^{s\,R_{jm}}}{e^{s\,R_{jm}}+\sum_{i=1,\,i\neq y}^{C}e^{s\,R_{ji}}}+\omega_1 L_m$$

wherein $S_{intra}$ represents the intra-class loss function, $M$ is the number of samples selected in each iteration of the neural network model training process, $s$ is a scale factor, and $\omega_1$ is a first balance factor. $R_{jm}=\cos(\theta_{j,y}+m_1+m_y)$ is the first intra-class distance corresponding to the feature vector $y_j$ of the $j$-th sample, where $\theta_{j,y}$ is the angle between $y_j$ and its corresponding target cluster center $W_y$, $y_j$ belongs to the $y$-th class, $m_1$ is the first hyperparameter, and $m_y$ is the learnable angle corresponding to the $y$-th class. $R_{ji}=\cos\theta_{j,i}$ is the second intra-class distance between $y_j$ and the $i$-th other cluster center $W_i$, $C$ is the total number of classes, and $L_m=-\frac{1}{C}\sum_{y=1}^{C}m_y$ is the penalty term.

The specific values of $s$, $M$ and $\omega_1$ can be set according to the actual situation; for example, $s$ may be set to 32, 64 or 128 in combination with the actual situation. The fraction $\dfrac{e^{s\,R_{jm}}}{e^{s\,R_{jm}}+\sum_{i\neq y}e^{s\,R_{ji}}}$ in the formula can be understood as the posterior probability that the $j$-th sample belongs to class $y$; it takes the distance between the feature vector and every cluster center into account and effectively ensures that the feature vectors of the same class remain compact.
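The following sketch is one possible implementation of the intra-class loss as reconstructed above. It is an assumption-laden illustration rather than the patent's reference implementation: the default values of s, m1 and omega1, and the sign convention of the penalty term, are placeholders.

```python
import torch
import torch.nn.functional as F

def intra_class_loss(features, labels, weight, m_y, s=64.0, m1=0.5, omega1=0.1):
    """features: (M, K) feature vectors; labels: (M,) class indices;
    weight: (K, C) weight parameter matrix of the full connection layer;
    m_y: (C,) learnable angles, one per class."""
    feats = F.normalize(features, dim=1)     # ||y_j||_2 = 1
    centers = F.normalize(weight, dim=0)     # ||W_p||_2 = 1 for every column
    cos_theta = feats @ centers              # (M, C) cosines to every cluster center
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))

    M = features.size(0)
    idx = torch.arange(M)
    # First intra-class distance: add hyperparameter m1 and the learnable angle of the true class.
    R_jm = torch.cos(theta[idx, labels] + m1 + m_y[labels])
    # Second intra-class distances: plain cosines to all other cluster centers.
    logits = cos_theta.clone()
    logits[idx, labels] = R_jm
    # Softmax-style loss over the scaled distances (posterior of the true class).
    ce = F.cross_entropy(s * logits, labels)
    # Penalty term encouraging larger learnable angles (sign convention is an assumption).
    L_m = -m_y.mean()
    return ce + omega1 * L_m
```

In practice `m_y` would be registered as a learnable parameter (for example `nn.Parameter(torch.zeros(C))`) so that it is updated together with the network weights.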
And step 260, determining an inter-class loss function according to the cluster centers.
Because the cluster centers of different classes are unevenly distributed, some cluster centers are close to each other while others are far apart. In the embodiment, cluster centers that are too close to each other are pushed farther apart by penalizing their angular distances; the inter-class loss function can therefore be constructed from the distances between the different cluster centers. Accordingly, step 260 in the embodiment specifically includes steps 261 to 263:
and 261, calculating a first inter-class distance between different classes according to the cluster centers.
Specifically, the distance between any two cluster centers is calculated and recorded as a first inter-class distance; it can be represented by the cosine of the angle between the two cluster centers. The first inter-class distance can also be calculated with the vector angle formula, applying L2 regularization to the two cluster centers. For example, let the cluster center corresponding to the $z$-th class be $W_z$ ($z\le C$) and the cluster center corresponding to the $q$-th class be $W_q$ ($q\le C$, $q\neq z$). The first inter-class distance between the cluster center of the $z$-th class and the cluster center of the $q$-th class is then

$$\cos\theta_{z,q}=\frac{W_z^{T}W_q}{\lVert W_z\rVert_2\,\lVert W_q\rVert_2},$$

where $\cos\theta_{z,q}$ is the first inter-class distance. Each cluster center therefore has $C-1$ first inter-class distances. When L2 regularization is applied to $W_z$ and $W_q$, the moduli of the vectors, i.e. $\lVert W_z\rVert_2$ and $\lVert W_q\rVert_2$, are equal to 1.
And 262, selecting a second inter-class distance larger than the distance threshold from all the first inter-class distances corresponding to each class, and calculating the sum of the second inter-class distances of each class.
Specifically, for the purpose of penalizing the angular distance, for the current cluster center, a cluster center farther away from the current cluster center is searched in the first inter-class distance corresponding to the current cluster center, and the first inter-class distance corresponding to the searched cluster center is recorded as the second inter-class distance. Optionally, the second inter-class distance is searched by setting a distance threshold, that is, the first inter-class distance greater than the distance threshold is searched in the first inter-class distance corresponding to the current cluster center and is recorded as the second inter-class distance. The distance threshold value can be set according to actual conditions. Each cluster center corresponds to at least one second inter-class distance, and then the second inter-class distances corresponding to each cluster center are added to obtain a second inter-class distance sum value, wherein each class corresponds to one second inter-class distance sum value.
For example, let the current cluster center be the cluster center corresponding to the $z$-th class; its second inter-class distance sum value is denoted $D_z$:

$$D_z=\sum_{q=1,\,q\neq z}^{C}\mathbb{1}\!\left[\cos\theta_{z,q}>\cos m_2\right]\cos\theta_{z,q},$$

where $m_2$ is the second hyperparameter, whose specific value can be set according to the actual situation. Setting $m_2$ yields the distance threshold: only first inter-class distances greater than $\cos m_2$ are kept and summed to obtain the corresponding second inter-class distance sum value. It will be appreciated that each cluster center corresponds to one second inter-class distance sum value.
And 263, determining an inter-class loss function according to the second inter-class distance and the second inter-class distance corresponding to each class.
Specifically, the second inter-class distance sum values are averaged to construct the inter-class loss function, whose calculation formula is:

$$S_{inter}=\frac{1}{C}\sum_{z=1}^{C}D_z,$$

wherein $S_{inter}$ represents the inter-class loss function, $C$ is the total number of classes, $D_z$ is the second inter-class distance sum value corresponding to the $z$-th class, $\cos\theta_{z,q}$ is the first inter-class distance between the $z$-th class and the $q$-th class, and $m_2$ is the second hyperparameter.
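A corresponding sketch of the inter-class loss, again an illustration built on the reconstruction above rather than the patent's own code: the "distance" between two cluster centers is their cosine, and only cosines above cos(m2), i.e. classes whose centers are still too close, contribute to the loss. The default value of m2 is a placeholder.

```python
import torch
import torch.nn.functional as F

def inter_class_loss(weight, m2=1.2):
    """weight: (K, C) weight parameter matrix; m2: second hyperparameter (radians)."""
    centers = F.normalize(weight, dim=0)          # unit-norm cluster centers as columns
    cos = centers.t() @ centers                   # (C, C) first inter-class distances (cosines)
    C = cos.size(0)
    cos = cos - torch.eye(C, device=cos.device)   # zero out the diagonal self-similarity (=1)
    threshold = torch.cos(torch.tensor(m2))
    # Second inter-class distances: keep only cosines above the threshold cos(m2).
    kept = torch.where(cos > threshold, cos, torch.zeros_like(cos))
    D = kept.sum(dim=1)                           # D_z: per-class sum of kept distances
    return D.mean()                               # average over the C classes

# Note: for any m2 < pi/2 the threshold is positive, so the zeroed diagonal never contributes.
```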
Step 270, determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function.
The calculation formula of the loss function of the neural network model is: $S_{loss}=\omega_2 S_{inter}+S_{intra}$, wherein $S_{loss}$ represents the loss function, $S_{intra}$ represents the intra-class loss function, $S_{inter}$ represents the inter-class loss function, and $\omega_2$ is the second balance factor, which can also be understood as the weight of the inter-class loss function; its specific value can be set according to the actual situation.
Step 280, updating the neural network model including the loss function by using a gradient descent algorithm until the loss function meets a stable condition.
Specifically, the gradient descent algorithm is an optimization algorithm that can be used in machine learning and artificial intelligence to recursively approach the minimum-deviation model. After the loss function is constructed, the neural network model is trained iteratively with the gradient descent algorithm, using the samples in the training data set as training data. During training, the parameters (such as the weight parameters) in the neural network model are updated until the loss function satisfies the stability condition. The stability condition can be set according to the actual situation; when the loss function satisfies it, the output of the neural network model is stable and its deviation is small. It can be understood that, in each iteration of the training process, a loss value is calculated from the loss function and the parameters of the neural network model are updated by backpropagation according to that loss value, i.e. the parameters of the neural network model are updated in every iteration.
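An illustrative training loop corresponding to this step, built on the sketches above (FeatureModel, dataset, intra_class_loss, inter_class_loss); the optimizer, learning rate, batch size and the crude stopping rule are placeholders, not values taken from the patent.

```python
import torch
from torch.utils.data import DataLoader

model = FeatureModel(feat_dim=512, num_classes=10)   # from the earlier sketch
m_y = torch.zeros(10, requires_grad=True)            # learnable angles, one per class
omega2 = 1.0                                          # second balance factor (placeholder)

optimizer = torch.optim.SGD(list(model.parameters()) + [m_y], lr=0.1, momentum=0.9)
loader = DataLoader(dataset, batch_size=64, shuffle=True)  # dataset from the earlier sketch

prev_loss, stable = float("inf"), False
while not stable:
    for images, labels in loader:
        features, _ = model(images)
        W = model.fc.weight.t()                       # (K, C) weight parameter matrix
        loss = intra_class_loss(features, labels, W, m_y) + omega2 * inter_class_loss(W)
        optimizer.zero_grad()
        loss.backward()                               # updates backbone, W, and m_y
        optimizer.step()
    stable = abs(prev_loss - loss.item()) < 1e-4      # crude stand-in for the stability condition
    prev_loss = loss.item()
```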
Step 290, initializing a sampling probability for each sample in the training data set, wherein the initialized sampling probabilities for each sample are the same.
Specifically, after the neural network model is stable, the neural network model is trained in a difficult excavation mode, so that the accuracy of the neural network model is further improved. When difficult mining is carried out, a sampling probability is initialized for each sample in the training data set, and the sampling probability of each sample is the same. The sampling probability represents the probability of sampling the corresponding sample in the training process of the neural network model. The specific value of the initialized sampling probability may be set according to actual conditions, and the embodiment does not limit this.
Step 2100, iteratively training the neural network model including the loss function based on the training data set.
Step 2110, in an iterative training process, determining correctly classified samples and incorrectly classified samples in the training data set.
Specifically, taking a training process as an example, at this time, a classification result of the neural network model for the samples sampled in the training process is obtained. Then, the classification result is compared with the classification described in the label of the corresponding sample, and if the classification result is the same as the classification described in the label, it is determined that the corresponding sample is correctly classified by the neural network model, and if the classification result is different from the classification described in the label, it is determined that the corresponding sample is incorrectly classified by the neural network model. It will be appreciated that during each training process, both correctly classified samples and misclassified samples are available.
Step 2120, decreasing the sampling probability of the correctly classified sample by a first value and increasing the sampling probability of the incorrectly classified sample by a second value.
Specifically, in the one-time training process, when the sample is correctly classified, it is indicated that the neural network model can accurately identify the sample, and thus, the sampling probability of the sample can be reduced. In an embodiment, the sampling probability is reduced by reducing a first value to the sampling probability, where the first value may be set according to an actual situation. Accordingly, in one training process, when the sample is classified incorrectly, it indicates that the neural network model cannot accurately identify the sample, and thus, the sampling probability of the sample can be increased. In the embodiment, the sampling probability is increased by adding a second value to the sampling probability, where the second value may be set according to an actual situation, and may be the same as or different from the first value. In the iterative training process, after each training is finished, the sampling probability of the correctly classified samples is reduced by a first value, and the sampling probability of the incorrectly classified samples is increased by a second value. With the iterative training, the probability that the correctly classified samples are sampled by the neural network model is smaller, and the probability that the incorrectly classified samples are sampled by the neural network model is larger.
And 2130, deleting the samples with the sampling probability lower than the probability threshold in the training data set, and continuing the iterative training until the loss function is converged.
In one embodiment, the value of the probability threshold may be set according to actual conditions. When the sampling probability of a sample falls below the probability threshold, the sample is very likely to be identified accurately by the neural network model, so it can be deleted from the training data set, while the samples that the neural network model cannot yet identify accurately, i.e. the samples with high sampling probability, are retained; at this point the training data set can be considered to have been mined for hard examples. Optionally, the samples may be screened against the probability threshold once after every training pass. In the subsequent iterative training of the neural network model, training can be performed only on the samples with high sampling probability, that is, training gradually concentrates on the samples that are easily misidentified, and the parameters of the neural network model are adjusted during this training until the loss function converges.
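The sampling-probability bookkeeping described in the preceding steps could look like the following sketch; the first value, the second value and the probability threshold are arbitrary placeholders, and the function name is hypothetical.

```python
import torch

def update_sampling_probabilities(probs, keep_mask, correct_mask,
                                  first_value=0.05, second_value=0.05, threshold=0.1):
    """probs: (N,) sampling probability per sample; keep_mask: (N,) bool, samples still in the set;
    correct_mask: (N,) bool, True where the model classified the sample correctly this pass."""
    probs = probs.clone()
    probs[keep_mask & correct_mask] -= first_value      # correctly classified: sample less often
    probs[keep_mask & ~correct_mask] += second_value    # misclassified: sample more often
    probs.clamp_(min=0.0, max=1.0)
    keep_mask = keep_mask & (probs >= threshold)        # delete samples that are reliably easy
    return probs, keep_mask

N = 100
probs = torch.full((N,), 0.5)            # initialization: the same probability for every sample
keep = torch.ones(N, dtype=torch.bool)
correct = torch.rand(N) > 0.3            # stand-in for this pass's classification results
probs, keep = update_sampling_probabilities(probs, keep, correct)
```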
In the scheme of this embodiment, a training data set is obtained, the feature vector of each sample in the training data set is determined by the neural network model, the weight parameter matrix of the full connection layer in the neural network model is obtained, the cluster center corresponding to each class is determined from the weight parameter matrix, an intra-class loss function is constructed from the cluster centers and the feature vectors, an inter-class loss function is constructed from the cluster centers, the loss function of the neural network model is constructed from the intra-class and inter-class loss functions, and the neural network model is then trained until the loss function converges. When the loss function is constructed, the feature vectors of all samples are considered to build the intra-class loss function, and the cluster centers of the different classes are considered to build the inter-class loss function. A learnable angle is added when constructing the intra-class loss function to limit the distance between feature vectors within the same class, and a penalty term is added to enlarge the learnable angle, making the feature vectors within each class more compact. When constructing the inter-class loss function, a distance threshold is set so that distances are penalized and the feature vectors of different classes are pushed farther apart. Meanwhile, when the neural network model is trained, the sampling probability of each sample is adjusted according to the recognition results, so that hard samples can be mined during iterative training; training the neural network model on these hard samples further improves its processing accuracy.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a feature model training device according to a third embodiment of the present invention. Referring to fig. 3, the feature model training apparatus provided in this embodiment includes: a data acquisition module 301, a feature determination module 302, a first construction module 303, a second construction module 304, and a model training module 305.
The data acquisition module 301 is configured to acquire a training data set, where each sample in the training data set corresponds to a label, and the label is used to identify a category to which the corresponding sample belongs; a feature determination module 302, configured to input the training data set to a neural network model to obtain a feature vector of each sample; a first constructing module 303, configured to construct an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs; a second building module 304, configured to determine a loss function of the neural network model according to the intra-class loss function and the inter-class loss function; a model training module 305 for training the neural network model including the loss function until the loss function converges.
By acquiring a training data set with class labels, obtaining the feature vector of each sample in the training data set with a neural network model, constructing an intra-class loss function and an inter-class loss function from the feature vectors and the classes to which the samples belong, building the loss function of the neural network model from the intra-class and inter-class loss functions, and training the neural network model containing this loss function until the loss function converges, the above technical solution solves the problem in the prior art that a convolutional neural network cannot enlarge the distance between the feature vectors of samples from different categories and therefore cannot effectively distinguish those categories. When the loss function is constructed, all feature vectors are considered to build the intra-class loss function and the different classes are considered to build the inter-class loss function, so that when the neural network model extracts feature vectors, the feature vectors of samples of the same class are closer together and the feature vectors of samples of different classes are farther apart, achieving a better metric learning result.
On the basis of the above embodiment, the first building block 303 includes: the parameter acquisition unit is used for acquiring a weight parameter matrix of a full connection layer in the neural network model, wherein the weight parameter of the pth column in the weight parameter matrix represents a weight parameter vector corresponding to the pth category; the cluster center determining unit is used for determining the cluster center corresponding to each category according to the weight parameter matrix; an intra-class loss function determining unit, configured to determine an intra-class loss function according to each cluster center and the feature vector under each category; and the inter-class loss function determining unit is used for determining an inter-class loss function according to each cluster center.
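The parameter acquisition unit and the cluster center determining unit described above can be pictured roughly as follows; the L2 normalization of each column is an assumption made here so that columns can be compared by angle, and is not stated in this passage.

    import numpy as np

    def cluster_centers_from_fc(fc_weights):
        """fc_weights has shape (feature_dim, num_classes); the p-th column is the
        weight parameter vector of the p-th class and is used as its cluster center."""
        norms = np.linalg.norm(fc_weights, axis=0, keepdims=True)
        return fc_weights / np.maximum(norms, 1e-12)  # shape (feature_dim, num_classes)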
On the basis of the above embodiment, the intra-class loss function determining unit includes: the learnable angle determining subunit is used for determining a learnable angle according to the category to which the feature vector belongs; a first distance determining subunit, configured to determine, according to the feature vector, a target cluster center and the learnable angle, a first intra-class distance between the feature vector and the target cluster center, where the target cluster center is a cluster center corresponding to a class to which the feature vector belongs, and each feature vector corresponds to one first intra-class distance; a second distance determining subunit, configured to determine, according to the feature vector and other cluster centers, a second intra-class distance between the feature vector and the other cluster centers, where the other cluster centers are cluster centers excluding a corresponding target cluster center from all cluster centers, each feature vector corresponds to a group of other cluster centers, and each other cluster center corresponds to a second intra-class distance; the penalty item determining subunit is used for determining penalty items according to the learnable angles of all the categories; and the intra-class loss function constructing subunit is used for determining the intra-class loss function according to the first intra-class distance, the second intra-class distance and the penalty term.
On the basis of the above embodiment, the calculation formula of the intra-class loss function is reproduced in the original publication only as an image; its symbols are defined as follows: S_intra denotes the intra-class loss function; M is the number of samples selected in each iteration when training the neural network model; s is a scale factor; ω_1 is a first balance factor; R_jm is the first intra-class distance corresponding to the feature vector y_j of the j-th sample and is determined from the angle between y_j and its corresponding target cluster center W_y, the first hyperparameter m_1, and the learnable angle m_y of the y-th class to which y_j belongs; R_ji = W_i^T y_j is the second intra-class distance between y_j and the i-th other cluster center W_i; C is the total number of classes; and L_m is the penalty term constructed from the learnable angles of all classes.
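Because the formula itself is available only as an image, the sketch below merely illustrates one plausible reading of the ingredients listed above: a scaled softmax over the first intra-class distance (the cosine of the angle to the own-class center enlarged by m_1 and the learnable angle m_y) against the second intra-class distances W_i^T y_j, plus a penalty that rewards larger learnable angles. It is an assumption-laden illustration, not the patented formula.

    import numpy as np

    def intra_class_loss(features, labels, centers, m_learn, m1=0.5, s=30.0, omega1=0.1):
        """Illustrative intra-class loss.
        features: (M, d) L2-normalized feature vectors y_j
        labels:   (M,)   class indices
        centers:  (d, C) L2-normalized cluster centers (columns W_p)
        m_learn:  (C,)   learnable angles m_y
        The additive angular margin and the mean-based penalty are assumptions."""
        M = features.shape[0]
        cos_sim = features @ centers                        # (M, C), entries W_p^T y_j
        total = 0.0
        for j in range(M):
            y = labels[j]
            theta = np.arccos(np.clip(cos_sim[j, y], -1.0, 1.0))
            r_jm = np.cos(theta + m1 + m_learn[y])           # first intra-class distance
            r_ji = np.delete(cos_sim[j], y)                  # second intra-class distances
            total += -np.log(np.exp(s * r_jm) /
                             (np.exp(s * r_jm) + np.sum(np.exp(s * r_ji))))
        penalty = -np.mean(m_learn)                          # encourages larger learnable angles
        return total / M + omega1 * penalty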
On the basis of the above embodiment, the inter-class loss function determining unit includes: a third distance determining subunit, configured to calculate a first inter-class distance between different classes according to each cluster center; a fourth distance determining subunit, configured to select the second inter-class distances greater than the distance threshold from all the first inter-class distances corresponding to each class, and to calculate the sum of the second inter-class distances for each class; and an inter-class loss function constructing subunit, configured to determine the inter-class loss function according to the second inter-class distance sum value corresponding to each class.
On the basis of the above embodiment, the calculation formula of the inter-class loss function is reproduced in the original publication only as an image; its symbols are defined as follows: S_inter denotes the inter-class loss function; C is the total number of classes; D_z is the sum of the second inter-class distances corresponding to the z-th class; the first inter-class distance is computed between the z-th class and the q-th class; and m_2 is the second hyperparameter.
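The inter-class formula is likewise only an image in the original. The sketch below follows the described recipe, namely pairwise "distances" between cluster centers, keeping those that exceed the threshold m_2 and summing them per class, under the assumption, suggested by the use of W_i^T y_j above, that the "distance" is an inner-product similarity; minimizing the per-class sums D_z then pushes the centers of different classes apart. The final averaging over classes is also an assumption.

    import numpy as np

    def inter_class_loss(centers, m2=0.3):
        """Illustrative inter-class loss over (d, C) L2-normalized cluster centers."""
        C = centers.shape[1]
        sims = (centers.T @ centers).astype(float)     # first inter-class 'distances' W_z^T W_q
        np.fill_diagonal(sims, -np.inf)                # a class is never compared with itself
        total = 0.0
        for z in range(C):
            over = sims[z][sims[z] > m2]               # second inter-class distances of class z
            total += float(over.sum())                 # D_z: per-class sum
        return total / C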
On the basis of the above embodiment, the calculation formula of the loss function of the neural network model is: S_loss = ω_2 · S_inter + S_intra, where S_loss denotes the loss function, S_intra denotes the intra-class loss function, S_inter denotes the inter-class loss function, and ω_2 is the second balance factor.
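Putting the two illustrative sketches above together with the second balance factor ω_2, the combination rule can be written as follows (again only an illustration that reuses the hypothetical functions defined earlier):

    def total_loss(features, labels, centers, m_learn, omega2=1.0):
        """S_loss = omega2 * S_inter + S_intra, using the two sketches above."""
        return omega2 * inter_class_loss(centers) + \
               intra_class_loss(features, labels, centers, m_learn)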
On the basis of the above embodiment, the model training module 305 includes: a model updating unit for updating the neural network model including the loss function by using a gradient descent algorithm until the loss function satisfies a stability condition; a probability initialization unit, configured to initialize a sampling probability for each sample in the training data set, where the initialized sampling probability of every sample is the same; an iterative training unit for iteratively training the neural network model including the loss function based on the training data set; a sample identification unit for determining correctly classified samples and incorrectly classified samples in the training data set during the iterative training; a probability updating unit for decreasing the sampling probability of the correctly classified samples by a first value and increasing the sampling probability of the incorrectly classified samples by a second value; and a sample deleting unit for deleting the samples whose sampling probability is lower than the probability threshold from the training data set and continuing the iterative training until the loss function converges.
The feature model training apparatus provided by this embodiment can be included in a feature model training device, can be used to execute the feature model training method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.
EXAMPLE IV
Fig. 4 is a schematic structural diagram of a feature model training device according to a fourth embodiment of the present invention. Specifically, as shown in fig. 4, the feature model training apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of the processors 40 in the feature model training device may be one or more, and one processor 40 is taken as an example in fig. 4; the processor 40, the memory 41, the input device 42 and the output device 43 in the feature model training apparatus may be connected by a bus or other means, and fig. 4 illustrates the connection by a bus as an example.
The memory 41, which is a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules in the feature model training method in the embodiment of the present invention (for example, the data acquisition module 301, the feature determination module 302, the first construction module 303, the second construction module 304, and the model training module 305 in the feature model training apparatus). The processor 40 executes various functional applications and data processing of the feature model training apparatus by executing software programs, instructions and modules stored in the memory 41, namely, implements the feature model training method provided by any of the above-described embodiments.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the feature model training apparatus, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the feature model training device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 may be used to receive input numeric or character information and generate key signal inputs relating to user settings and function controls of the feature model training apparatus. The output device 43 may include a display screen, a speaker, etc. The feature model training device may further comprise communication means (not shown) operable to communicate data with other devices.
The feature model training device includes the feature model training apparatus provided in the third embodiment, and may be used to execute the feature model training method provided in any embodiment of the present invention, and has corresponding functions and beneficial effects.
EXAMPLE V
Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for feature model training, the method comprising:
acquiring a training data set, wherein each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs;
inputting the training data set into a neural network model to obtain a feature vector of each sample;
constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs;
determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function;
training the neural network model including the loss function until the loss function converges.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the feature model training method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, and the computer software product may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute the feature model training method according to the embodiments of the present invention.
It should be noted that, in the embodiment of the feature model training apparatus, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A feature model training method is characterized by comprising the following steps:
acquiring a training data set, wherein each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs;
inputting the training data set into a neural network model to obtain a feature vector of each sample;
constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs;
determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function;
training the neural network model including the loss function until the loss function converges.
2. The feature model training method according to claim 1, wherein the constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs comprises:
acquiring a weight parameter matrix of a full connection layer in the neural network model, wherein the weight parameter of the pth column in the weight parameter matrix represents a weight parameter vector corresponding to the pth category;
determining cluster centers corresponding to all categories according to the weight parameter matrix;
determining an intra-class loss function according to the cluster centers and the feature vectors under the classes;
and determining an inter-class loss function according to each cluster center.
3. The method of claim 2, wherein determining an intra-class loss function according to the cluster centers and the feature vectors under the classes comprises:
determining a learnable angle according to the category to which the feature vector belongs;
determining a first intra-class distance between the feature vector and the target cluster center according to the feature vector, the target cluster center and the learnable angle, wherein the target cluster center is the cluster center corresponding to the class to which the feature vector belongs, and each feature vector corresponds to one first intra-class distance;
determining a second intra-class distance between the feature vector and the other cluster centers according to the feature vector and the other cluster centers, wherein the other cluster centers are all cluster centers other than the corresponding target cluster center, each feature vector corresponds to a group of other cluster centers, and each other cluster center corresponds to one second intra-class distance;
determining a penalty item according to the learnable angles of all categories;
and determining an intra-class loss function according to the first intra-class distance, the second intra-class distance and the penalty term.
4. The feature model training method according to claim 3, wherein the calculation formula of the intra-class loss function is reproduced in the original publication only as an image, and its symbols are defined as follows: S_intra denotes the intra-class loss function; M is the number of samples selected in each iteration when training the neural network model; s is a scale factor; ω_1 is a first balance factor; R_jm is the first intra-class distance corresponding to the feature vector y_j of the j-th sample and is determined from the angle between y_j and its corresponding target cluster center W_y, the first hyperparameter m_1, and the learnable angle m_y of the y-th class to which y_j belongs; R_ji = W_i^T y_j is the second intra-class distance between y_j and the i-th other cluster center W_i; C is the total number of classes; and L_m is the penalty term.
5. the feature model training method of claim 2, wherein the determining an inter-class loss function from each of the cluster centers comprises:
calculating first inter-class distances among different classes according to the cluster centers;
selecting a second inter-class distance larger than a distance threshold value from all first inter-class distances corresponding to each class, and calculating the second inter-class distance sum value of each class;
and determining an inter-class loss function according to the second inter-class distance sum value corresponding to each class.
6. The feature model training method according to claim 5, wherein the calculation formula of the inter-class loss function is reproduced in the original publication only as an image, and its symbols are defined as follows: S_inter denotes the inter-class loss function; C is the total number of classes; D_z is the sum of the second inter-class distances corresponding to the z-th class; the first inter-class distance is computed between the z-th class and the q-th class; and m_2 is the second hyperparameter.
7. The feature model training method according to claim 1, wherein the calculation formula of the loss function of the neural network model is: S_loss = ω_2 · S_inter + S_intra, wherein S_loss represents the loss function, S_intra represents the intra-class loss function, S_inter represents the inter-class loss function, and ω_2 is the second balance factor.
8. The feature model training method of claim 1, wherein the training the neural network model including the loss function until the loss function converges comprises:
updating the neural network model containing the loss function with a gradient descent algorithm until the loss function satisfies a stability condition;
initializing a sampling probability for each sample in the training data set, the initialized sampling probabilities for each sample being the same;
iteratively training the neural network model including the loss function based on the training data set;
in an iterative training process, determining correctly classified samples and incorrectly classified samples in the training data set;
decreasing the sampling probability of the correctly classified sample by a first value and increasing the sampling probability of the incorrectly classified sample by a second value;
and deleting the samples with the sampling probability lower than the probability threshold in the training data set, and continuing the iterative training until the loss function is converged.
9. A feature model training device, comprising:
the data acquisition module is used for acquiring a training data set, each sample in the training data set corresponds to a label, and the label is used for identifying the category to which the corresponding sample belongs;
the characteristic determining module is used for inputting the training data set into a neural network model to obtain a characteristic vector of each sample;
the first construction module is used for constructing an intra-class loss function and an inter-class loss function according to the feature vector corresponding to each sample in the training data set and the class to which the sample belongs;
a second construction module for determining a loss function of the neural network model according to the intra-class loss function and the inter-class loss function;
a model training module for training the neural network model including the loss function until the loss function converges.
10. A feature model training apparatus, characterized by comprising:
one or more processors;
a memory for storing one or more programs;
when executed by the one or more processors, cause the one or more processors to implement the feature model training method of any one of claims 1-8.
11. A storage medium containing computer-executable instructions for performing the feature model training method of any one of claims 1-8 when executed by a computer processor.
CN202010319373.0A 2020-04-21 2020-04-21 Feature model training method, device, equipment and storage medium Pending CN111553399A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010319373.0A CN111553399A (en) 2020-04-21 2020-04-21 Feature model training method, device, equipment and storage medium
CN202110426522.8A CN112949780B (en) 2020-04-21 2021-04-20 Feature model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010319373.0A CN111553399A (en) 2020-04-21 2020-04-21 Feature model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111553399A true CN111553399A (en) 2020-08-18

Family

ID=72005834

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010319373.0A Pending CN111553399A (en) 2020-04-21 2020-04-21 Feature model training method, device, equipment and storage medium
CN202110426522.8A Active CN112949780B (en) 2020-04-21 2021-04-20 Feature model training method, device, equipment and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110426522.8A Active CN112949780B (en) 2020-04-21 2021-04-20 Feature model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (2) CN111553399A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931059A (en) * 2020-08-19 2020-11-13 创新奇智(成都)科技有限公司 Object determination method and device and storage medium
CN112015659A (en) * 2020-09-02 2020-12-01 三维通信股份有限公司 Prediction method and device based on network model
CN112132092A (en) * 2020-09-30 2020-12-25 四川弘和通讯有限公司 Fire extinguisher and fire blanket identification method based on convolutional neural network
CN112149717A (en) * 2020-09-03 2020-12-29 清华大学 Confidence weighting-based graph neural network training method and device
CN112435230A (en) * 2020-11-20 2021-03-02 哈尔滨市科佳通用机电股份有限公司 Deep learning-based data set generation method and system
CN112633407A (en) * 2020-12-31 2021-04-09 深圳云天励飞技术股份有限公司 Method and device for training classification model, electronic equipment and storage medium
CN112686295A (en) * 2020-12-28 2021-04-20 南京工程学院 Personalized hearing loss modeling method
CN112766379A (en) * 2021-01-21 2021-05-07 中国科学技术大学 Data equalization method based on deep learning multi-weight loss function
CN113011532A (en) * 2021-04-30 2021-06-22 平安科技(深圳)有限公司 Classification model training method and device, computing equipment and storage medium
WO2021164388A1 (en) * 2020-09-25 2021-08-26 平安科技(深圳)有限公司 Triage fusion model training method, triage method, apparatus, device, and medium
CN113762005A (en) * 2020-11-09 2021-12-07 北京沃东天骏信息技术有限公司 Method, device, equipment and medium for training feature selection model and classifying objects
CN113780378A (en) * 2021-08-26 2021-12-10 北京科技大学 Disease high risk group prediction device
CN113947701A (en) * 2021-10-18 2022-01-18 北京百度网讯科技有限公司 Training method, object recognition method, device, electronic device and storage medium
CN114021630A (en) * 2021-10-28 2022-02-08 同济大学 Ordinal regression problem solving method for category-unbalanced data set
WO2022178775A1 (en) * 2021-02-25 2022-09-01 东莞理工学院 Deep ensemble model training method based on feature diversity learning
WO2022227217A1 (en) * 2021-04-28 2022-11-03 平安科技(深圳)有限公司 Text classification model training method and apparatus, and device and readable storage medium
CN115328871A (en) * 2022-10-12 2022-11-11 南通中泓网络科技有限公司 Evaluation method for format data stream file conversion based on machine learning model
CN116310648A (en) * 2023-03-23 2023-06-23 北京的卢铭视科技有限公司 Model training method, face recognition method, electronic device and storage medium
WO2023229345A1 (en) * 2022-05-25 2023-11-30 Samsung Electronics Co., Ltd. System and method for detecting unhandled applications in contrastive siamese network training

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118370A (en) * 2021-11-19 2022-03-01 北京的卢深视科技有限公司 Model training method, electronic device, and computer-readable storage medium
CN114611694B (en) * 2022-03-16 2022-09-23 上海交通大学 Loss function method and system for improving robustness of image classification network model
CN114549938B (en) * 2022-04-25 2022-09-09 广州市玄武无线科技股份有限公司 Model training method, image information management method, image recognition method and device
CN114792398B (en) * 2022-06-23 2022-09-27 阿里巴巴(中国)有限公司 Image classification method, storage medium, processor and system
CN115035463B (en) * 2022-08-09 2023-01-17 阿里巴巴(中国)有限公司 Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium
CN115809702B (en) * 2022-11-11 2023-07-11 中南大学 ACGAN model construction method, image generation method and garment design method
CN117807434A (en) * 2023-12-06 2024-04-02 中国信息通信研究院 Communication data set processing method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
CN107977609B (en) * 2017-11-20 2021-07-20 华南理工大学 Finger vein identity authentication method based on CNN
CN108009528B (en) * 2017-12-26 2020-04-07 广州广电运通金融电子股份有限公司 Triple Loss-based face authentication method and device, computer equipment and storage medium
CN108681708A (en) * 2018-05-16 2018-10-19 福州精益精科技有限责任公司 A kind of vena metacarpea image-recognizing method, device and storage medium based on Inception neural network models

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931059A (en) * 2020-08-19 2020-11-13 创新奇智(成都)科技有限公司 Object determination method and device and storage medium
CN112015659A (en) * 2020-09-02 2020-12-01 三维通信股份有限公司 Prediction method and device based on network model
CN112149717A (en) * 2020-09-03 2020-12-29 清华大学 Confidence weighting-based graph neural network training method and device
CN112149717B (en) * 2020-09-03 2022-12-02 清华大学 Confidence weighting-based graph neural network training method and device
WO2021164388A1 (en) * 2020-09-25 2021-08-26 平安科技(深圳)有限公司 Triage fusion model training method, triage method, apparatus, device, and medium
CN112132092A (en) * 2020-09-30 2020-12-25 四川弘和通讯有限公司 Fire extinguisher and fire blanket identification method based on convolutional neural network
CN113762005A (en) * 2020-11-09 2021-12-07 北京沃东天骏信息技术有限公司 Method, device, equipment and medium for training feature selection model and classifying objects
CN112435230B (en) * 2020-11-20 2021-07-16 哈尔滨市科佳通用机电股份有限公司 Deep learning-based data set generation method and system
CN112435230A (en) * 2020-11-20 2021-03-02 哈尔滨市科佳通用机电股份有限公司 Deep learning-based data set generation method and system
CN112686295A (en) * 2020-12-28 2021-04-20 南京工程学院 Personalized hearing loss modeling method
CN112686295B (en) * 2020-12-28 2021-08-24 南京工程学院 Personalized hearing loss modeling method
CN112633407B (en) * 2020-12-31 2023-10-13 深圳云天励飞技术股份有限公司 Classification model training method and device, electronic equipment and storage medium
CN112633407A (en) * 2020-12-31 2021-04-09 深圳云天励飞技术股份有限公司 Method and device for training classification model, electronic equipment and storage medium
CN112766379B (en) * 2021-01-21 2023-06-20 中国科学技术大学 Data equalization method based on deep learning multiple weight loss functions
CN112766379A (en) * 2021-01-21 2021-05-07 中国科学技术大学 Data equalization method based on deep learning multi-weight loss function
WO2022178775A1 (en) * 2021-02-25 2022-09-01 东莞理工学院 Deep ensemble model training method based on feature diversity learning
WO2022227217A1 (en) * 2021-04-28 2022-11-03 平安科技(深圳)有限公司 Text classification model training method and apparatus, and device and readable storage medium
CN113011532A (en) * 2021-04-30 2021-06-22 平安科技(深圳)有限公司 Classification model training method and device, computing equipment and storage medium
CN113780378A (en) * 2021-08-26 2021-12-10 北京科技大学 Disease high risk group prediction device
CN113780378B (en) * 2021-08-26 2023-11-28 北京科技大学 Disease high risk crowd prediction device
CN113947701A (en) * 2021-10-18 2022-01-18 北京百度网讯科技有限公司 Training method, object recognition method, device, electronic device and storage medium
CN113947701B (en) * 2021-10-18 2024-02-23 北京百度网讯科技有限公司 Training method, object recognition method, device, electronic equipment and storage medium
CN114021630A (en) * 2021-10-28 2022-02-08 同济大学 Ordinal regression problem solving method for category-unbalanced data set
WO2023229345A1 (en) * 2022-05-25 2023-11-30 Samsung Electronics Co., Ltd. System and method for detecting unhandled applications in contrastive siamese network training
CN115328871A (en) * 2022-10-12 2022-11-11 南通中泓网络科技有限公司 Evaluation method for format data stream file conversion based on machine learning model
CN116310648A (en) * 2023-03-23 2023-06-23 北京的卢铭视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN116310648B (en) * 2023-03-23 2023-12-12 北京的卢铭视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Also Published As

Publication number Publication date
CN112949780A (en) 2021-06-11
CN112949780B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN111553399A (en) Feature model training method, device, equipment and storage medium
CN111191732B (en) Target detection method based on full-automatic learning
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
US11537884B2 (en) Machine learning model training method and device, and expression image classification method and device
CN110738247B (en) Fine-grained image classification method based on selective sparse sampling
CN108681746B (en) Image identification method and device, electronic equipment and computer readable medium
CN110363049B (en) Method and device for detecting, identifying and determining categories of graphic elements
CN111523621A (en) Image recognition method and device, computer equipment and storage medium
CN111667050B (en) Metric learning method, device, equipment and storage medium
WO2017003666A1 (en) Method and apparatus for large scale machine learning
CN112149705A (en) Method and system for training classification model, computer equipment and storage medium
JP6897749B2 (en) Learning methods, learning systems, and learning programs
CN110197207B (en) Method and related device for classifying unclassified user group
US10832036B2 (en) Meta-learning for facial recognition
CN112528022A (en) Method for extracting characteristic words corresponding to theme categories and identifying text theme categories
KR20220116111A (en) Method for determining a confidence level of inference data produced by artificial neural network
CN111783088B (en) Malicious code family clustering method and device and computer equipment
CN111352926A (en) Data processing method, device, equipment and readable storage medium
Altun et al. SKETRACK: Stroke‐Based Recognition of Online Hand‐Drawn Sketches of Arrow‐Connected Diagrams and Digital Logic Circuit Diagrams
Lim et al. More powerful selective kernel tests for feature selection
CN115482436B (en) Training method and device for image screening model and image screening method
Khurana et al. Soft computing techniques for change detection in remotely sensed images: A review
CN110059180B (en) Article author identity recognition and evaluation model training method and device and storage medium
CN114255381A (en) Training method of image recognition model, image recognition method, device and medium
US20210357695A1 (en) Device and method for supporting generation of learning dataset

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200818