WO2019127924A1 - Sample weight allocation method, model training method, electronic device, and storage medium - Google Patents

Sample weight allocation method, model training method, electronic device, and storage medium

Info

Publication number
WO2019127924A1
Authority
WO
WIPO (PCT)
Prior art keywords
distance
sample
sample set
training
distribution
Prior art date
Application number
PCT/CN2018/079371
Other languages
French (fr)
Chinese (zh)
Inventor
严蕤
牟永强
Original Assignee
深圳云天励飞技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司 filed Critical 深圳云天励飞技术有限公司
Publication of WO2019127924A1 publication Critical patent/WO2019127924A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Definitions

  • the present invention relates to the field of artificial intelligence, and in particular, to a sample weight distribution method, a model training method, an electronic device, and a storage medium.
  • in the training of models (such as feature extraction models and face feature expression models), loss functions fall into two categories.
  • the first category is classification-based metrics; since these do not measure the features directly, their performance is limited. The second category is end-to-end methods that directly measure the features; such methods converge well only when samples of suitable difficulty are selected for the network.
  • existing methods mainly obtain samples of suitable difficulty in the following two ways. First, after the model has been trained to a certain stage, samples of moderate difficulty are selected according to the model's feature expression. This approach is cumbersome to operate; moreover, as training continues the difficulty of the selected samples changes, so the originally offline-selected samples are no longer representative and cannot fully express the characteristics of samples added later. Second, during model training, samples of moderate difficulty are selected according to the model at each training round. Although the training samples selected this way are representative and can effectively improve the expressive ability of the model, the computing resources required are too large for this approach to be feasible in actual model training.
  • a sample weight distribution method comprising:
  • Obtaining a training sample comprising a positive sample set and a negative sample set, the positive sample set comprising a positive sample pair and the negative sample set comprising a negative sample pair;
  • the distance distribution of the positive sample set indicating a relationship between a frequency of occurrence of a positive sample pair and a distance
  • the distance distribution of the negative sample set indicating a relationship between a frequency of occurrence of a negative sample pair and a distance
  • determining the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set includes:
  • the weight of each sample pair in the first sample set (the misclassified pairs) is increased, and/or the weight of each sample pair in the second sample set (the correctly classified pairs) is reduced.
  • the weight distribution of the training samples is a normal distribution, and when the maximum distance of the positive sample pairs in the positive sample set is less than or equal to the minimum distance of the negative sample pairs in the negative sample set,
  • the method further includes:
  • the mean of the maximum distance and the minimum distance is determined as the mean of the weight distribution of the training samples.
  • the weight distribution of the training samples is a normal distribution, and when the maximum distance of the positive sample pairs in the positive sample set is greater than the minimum distance of the negative sample pairs in the negative sample set, when determining the weight distribution of the training samples,
  • the method further includes:
  • the distance corresponding to the minimum absolute value of the difference between the frequency at which positive sample pairs appear and the frequency at which negative sample pairs appear is taken as the mean of the weight distribution of the training samples.
  • when determining the mean value of the weight distribution of the training samples, the method further includes:
  • the weight distribution of the training samples is a normal distribution, and when determining the weight distribution of the training samples, the method further includes:
  • the standard deviation of the weight distribution of the training samples in each training process is updated according to the standard deviation of the distance between the positive sample pairs in the positive sample set.
  • a model training method comprising:
  • Model parameters are trained based on the training samples using a loss function and a preset training algorithm, wherein the loss function is associated with a weight distribution of the training samples, and the weight distribution of the training samples is obtained using the sample weight allocation method described in any embodiment.
  • the method further includes:
  • the loss function is used to increase the contribution rate of the misclassified samples to the target loss.
  • An electronic device comprising a memory and a processor, the memory for storing at least one instruction, the processor for executing the at least one instruction to implement the sample weight allocation method as described in any embodiment, and/or the model training method as described in any embodiment.
  • A computer readable storage medium storing at least one instruction, the at least one instruction being executed by a processor to implement the sample weight allocation method as described in any embodiment, and/or the model training method as described in any embodiment.
  • the present invention provides a sample weight allocation method, the method comprising: acquiring a training sample, the training sample comprising a positive sample set and a negative sample set; calculating the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determining the distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set; determining the distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set; and determining the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
  • the invention also provides a model training method, an electronic device, and a storage medium. The invention can increase the weight of misclassified sample pairs; in the model training process, the contribution of the misclassified samples to the target loss is increased, so that the model parameters can be better corrected and the expressive ability of the model can be improved.
  • FIG. 1 is a flow chart of a preferred embodiment of the sample weight allocation method of the present invention.
  • FIG. 2 is a schematic diagram of a distance distribution and a weight distribution of a sample in an example of the present invention.
  • Figure 3 is another schematic illustration of the distance distribution of a sample in an example of the present invention.
  • FIG. 4 is a flow chart of a preferred embodiment of the model training method of the present invention.
  • Figure 5 is a functional block diagram of a preferred embodiment of the sample weight distribution device of the present invention.
  • Figure 6 is a functional block diagram of a preferred embodiment of the model training device of the present invention.
  • Figure 7 is a block diagram showing a preferred embodiment of an electronic device in at least one example of the present invention.
  • Referring to FIG. 1, which is a flow chart of a preferred embodiment of the sample weight allocation method of the present invention.
  • the order of the steps in the flow chart can be changed according to different requirements, and some steps can be omitted.
  • the electronic device acquires a training sample, where the training sample includes a positive sample set and a negative sample set, the positive sample set includes a positive sample pair and the negative sample set includes a negative sample pair.
  • the electronic device configures a training sample set.
  • a part of the samples are first taken out from the configured training sample set for training, and the part of the samples is used as the training sample.
  • the training samples correspond to samples in each mini-batch.
  • the positive sample set comprises one or more positive sample pairs, wherein one of the positive sample pairs represents a sample pair belonging to a same category.
  • the negative sample set includes one or more negative sample pairs.
  • For example, in a face recognition task, a positive sample pair may be two pictures of the same person's face.
  • the electronic device calculates a distance of each positive sample pair in the positive sample set and a distance of each negative sample pair in the negative sample set.
  • the electronic device calculates the Euclidean distance for each positive sample pair, using the Euclidean distance for each positive sample pair as the distance for each positive sample pair.
  • the electronic device calculates an Euclidean distance for each negative sample pair, and uses the Euclidean distance of each negative sample pair as the distance of each of the negative sample pairs.
  • the expression of the distance of each positive sample pair and the distance of each negative sample pair is not limited to the Euclidean distance, and may be other distance forms, and the present invention does not impose any limitation.
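  • As a hedged illustration of this step, the following Python sketch computes the Euclidean distance of each sample pair from its two feature vectors. The function name, array shapes, and the random features in the usage example are assumptions of this sketch, not part of the disclosure.

```python
import numpy as np

def pair_distances(feats_a, feats_b):
    """Euclidean distance of each sample pair: rows i of feats_a and
    feats_b hold the two feature vectors of pair i."""
    return np.linalg.norm(feats_a - feats_b, axis=1)

# Hypothetical mini-batch: 128 positive pairs and 256 negative pairs of
# 512-dimensional features, drawn at random purely for illustration.
rng = np.random.default_rng(0)
pos_d = pair_distances(rng.normal(size=(128, 512)), rng.normal(size=(128, 512)))
neg_d = pair_distances(rng.normal(size=(256, 512)), rng.normal(size=(256, 512)))
```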
  • the electronic device determines a distance distribution of the positive sample set according to a distance of each positive sample pair in the positive sample set, where the distance distribution of the positive sample set represents a relationship between a positive sample pair appearance frequency and a distance.
  • the distance distribution of the positive sample set comprises a plurality of distance points, each distance point corresponding to a positive sample pair appearance frequency.
  • the positive sample set has 100 positive sample pairs, at a distance of 0.2, corresponding to 30 positive sample pairs.
  • the electronic device determines a distance distribution of the negative sample set according to a distance of each negative sample pair in the negative sample set, and the distance distribution of the negative sample set represents a relationship between a negative sample pair appearance frequency and a distance.
  • the distance distribution of the negative sample set comprises a plurality of distance points, each distance point corresponding to a negative sample pair appearance frequency.
  • the negative sample set has 100 negative sample pairs, at a distance of 0.5, corresponding to 20 negative sample pairs.
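  • Continuing the sketch above, the distance distributions can be approximated as normalised histograms on a shared distance grid, so that each distance point corresponds to a pair frequency as in the two examples just given. The grid range and bin count are assumptions of this sketch.

```python
import numpy as np

def distance_distribution(distances, bins):
    """Relative frequency of sample pairs at each distance point,
    computed as a normalised histogram: e.g. 30 of 100 positive pairs
    falling at distance 0.2 gives a frequency of 0.3 at that point."""
    counts, edges = np.histogram(distances, bins=bins)
    centres = (edges[:-1] + edges[1:]) / 2
    return centres, counts / len(distances)

# A shared grid lets the two distributions be compared point by point;
# the distance range and resolution here are illustrative assumptions.
bins = np.linspace(0.0, 2.0, 41)
d_grid, f_pos = distance_distribution(pos_d, bins)
_, f_neg = distance_distribution(neg_d, bins)
```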
  • the electronic device determines a weight distribution of the training sample based on a distance distribution of the positive sample set and a distance distribution of the negative sample set.
  • the present invention when there is no overlapping portion of the distance distribution of the positive sample set and the distance distribution of the negative sample set, it indicates that there is no sample pair of the classification error in the positive sample set and the negative sample set.
  • the distance distribution of the positive sample set overlaps with the distance distribution of the negative sample set, it indicates that there is a sample pair of the classification error in the positive sample set and the negative sample set.
  • a sample pair whose distance falls in the overlapping portion of the distance distribution of the positive sample set and the distance distribution of the negative sample set may be misclassified. Therefore, in the subsequent training process, the weight of the misclassified sample pairs needs to be increased, so that the contribution rate of the misclassified samples to correcting the model parameters and to improving the expressive ability of the model can be increased.
  • Referring to FIG. 2, a schematic diagram of the distance distribution of the positive sample set and the distance distribution of the negative sample set, where the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set are represented by the Euclidean distance. Of course, other distances could also be used; this example does not limit the distance calculation method.
  • the sample pairs whose distances lie between distance A and distance B are all potentially misclassified sample pairs. Suppose the total number of positive sample pairs is 1000 and the total number of negative sample pairs is 2000. If the frequency of positive sample pairs at distance A is 0.02, then 20 positive sample pairs correspond to point A; if the frequency of negative sample pairs at distance A is 0.15, then 300 negative sample pairs correspond to point A. If the distance of a target sample pair equals distance A, the target sample pair may belong to either a positive sample pair or a negative sample pair, and thus the target sample pair may be classified incorrectly.
  • determining the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set includes:
  • increasing the weight of the misclassified sample pairs in the weight distribution of the training samples, and/or reducing the weight of the correctly classified sample pairs. In the subsequent model training process, the loss function is associated with the weight distribution of the training samples and is established based on that weight distribution, which can increase the contribution of the misclassified samples to the network loss, thereby better correcting the model parameters and improving the expressive ability of the model.
  • the weight distribution of the training samples is a normal distribution.
  • the parameters of the normal distribution are configured so as to increase the weight of misclassified sample pairs, and/or to reduce the weight of correctly classified sample pairs.
  • the normal distribution represents the relationship between the distance of the pair of samples and the weight.
  • the parameters of the normal distribution include, but are not limited to, mean, standard deviation.
  • the method further includes: determining the mean of the maximum distance and the minimum distance as the mean of the weight distribution of the training samples.
  • if the maximum distance of the positive sample pairs in the positive sample set is smaller than the minimum distance of the negative sample pairs in the negative sample set, it means there are no misclassified sample pairs in the positive sample set and the negative sample set.
  • the distance of each positive sample pair in the positive sample set is represented by a Euclidean distance
  • the distance of each negative sample pair in the negative sample set is represented by a Euclidean distance
  • other distances may also be used; this example does not limit the way the distance is calculated.
  • the distance corresponding to point C, the maximum distance of the positive sample pairs in the positive sample set, is smaller than the distance corresponding to point D, the minimum distance of the negative sample pairs in the negative sample set. Since there is no crossover portion between the distance distribution of the positive sample set and the distance distribution of the negative sample set, there are no misclassified sample pairs in the positive sample set and the negative sample set.
  • when the maximum distance of the positive sample pairs in the positive sample set is equal to the minimum distance of the negative sample pairs in the negative sample set, when determining the weight distribution of the training samples, the method further includes: determining the mean of the maximum distance and the minimum distance as the mean of the weight distribution of the training samples.
  • the method further includes:
  • the distance corresponding to the minimum absolute value of the difference between the frequency at which positive sample pairs appear and the frequency at which negative sample pairs appear is taken as the mean of the weight distribution of the training samples.
  • if the maximum distance of the positive sample pairs in the positive sample set is greater than the minimum distance of the negative sample pairs in the negative sample set, it means there are misclassified sample pairs in the positive sample set and the negative sample set.
  • as shown in FIG. 2, the distance corresponding to point B, the maximum distance of the positive sample pairs in the positive sample set, is greater than the distance corresponding to point A, the minimum distance of the negative sample pairs in the negative sample set. Since there is a crossover portion between the distance distribution of the positive sample set and the distance distribution of the negative sample set, there are misclassified sample pairs in the positive sample set and the negative sample set.
  • the distance corresponding to the intersection E of the distance distribution of the positive sample set and the distance distribution of the negative sample set is taken as the mean of the normal distribution, as sketched below.
  • at the intersection E, the frequency at which the positive sample pairs appear equals the frequency at which the negative sample pairs appear (the frequency value F corresponding to E), so the absolute value of the difference between the two frequencies is at its minimum there.
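  • The two cases can be summarised in a small hedged sketch, reusing the grids and frequencies from the earlier snippets; the helper name and the shared-grid assumption are not part of the disclosure.

```python
import numpy as np

def weight_mean(d_grid, f_pos, f_neg, max_pos, min_neg):
    """Mean of the weight distribution. Non-overlap case (maximum
    positive distance <= minimum negative distance): the midpoint of
    the two. Overlap case: the distance in the crossover region where
    the two frequency curves are closest, i.e. the intersection E."""
    if max_pos <= min_neg:
        return (max_pos + min_neg) / 2.0
    region = (d_grid >= min_neg) & (d_grid <= max_pos)  # crossover [A, B]
    if not region.any():  # degenerate grid: fall back to the midpoint
        return (max_pos + min_neg) / 2.0
    gaps = np.abs(f_pos[region] - f_neg[region])
    return d_grid[region][np.argmin(gaps)]

mu = weight_mean(d_grid, f_pos, f_neg, pos_d.max(), neg_d.min())
```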
  • sample pairs whose distances lie in the vicinity of the mean of the weight distribution of the training samples are likely to be misclassified.
  • in the weight distribution of the training samples, the weight of sample pairs whose distance is near the mean is larger, thereby increasing the weight of the misclassified sample pairs.
  • the loss function is associated with the weight distribution of the training samples and is established based on that weight distribution, which can increase the contribution of the misclassified samples to the network loss.
  • the sample pairs corresponding to distances between distance A and distance E, and the sample pairs corresponding to distances between distance B and distance E, are misclassified sample pairs. Therefore, in the normal distribution, these sample pairs have a higher weight than the sample pairs that can be correctly classified, as in the sketch below.
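  • A minimal sketch of such a normal-shaped weighting, assuming an unnormalised Gaussian over pair distance; the function name and the example sigma are hypothetical.

```python
import numpy as np

def pair_weight(distance, mu, sigma):
    """Normal-shaped weight over pair distance: pairs whose distance is
    near the mean mu (the hard, possibly misclassified pairs between A
    and B) receive the largest weight, distant (easy) pairs a small one.
    The unnormalised Gaussian form is an assumption of this sketch."""
    return np.exp(-((distance - mu) ** 2) / (2.0 * sigma ** 2))

w_pos = pair_weight(pos_d, mu, sigma=0.1)  # sigma value chosen arbitrarily here
w_neg = pair_weight(neg_d, mu, sigma=0.1)
```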
  • the method further includes:
  • the preset step size step is equal to (maximum distance - minimum distance) / n, where n is a positive number.
  • the preset step size may also be other forms of step size, and the present invention does not impose any limitation.
  • the iterative termination condition includes, but is not limited to, a preset error.
  • the minimum distance is taken as the initial value of the current mean μ, and an iterative search is performed based on the preset step size step.
  • at the distance represented by the current mean μ, it is determined whether the absolute value of the difference between the frequency at which the positive sample pairs appear and the frequency at which the negative sample pairs appear is less than the preset error. If it is not less than the preset error, the current mean plus the preset step size is assigned to the current mean μ, that is, (μ + step) is assigned to μ, and the determination continues.
  • once the absolute value is less than the preset error, the search for the mean is stopped, and the distance value corresponding to the last iteration is output as the mean of the weight distribution of the training samples.
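  • The iterative search just described can be sketched as follows; freq_pos and freq_neg are assumed callables returning the pair frequency at a given distance, and n, eps (the preset error), and the fallback to the best value seen are assumptions of this sketch.

```python
def search_mean(freq_pos, freq_neg, min_d, max_d, n=100, eps=1e-2):
    """Iterative search for the mean of the weight distribution."""
    step = (max_d - min_d) / n             # preset step size
    mu = min_d                             # initial value of the mean
    best_mu, best_gap = mu, abs(freq_pos(mu) - freq_neg(mu))
    while mu <= max_d:
        gap = abs(freq_pos(mu) - freq_neg(mu))
        if gap < best_gap:
            best_mu, best_gap = mu, gap
        if gap < eps:                      # termination condition reached
            return mu
        mu += step                         # assign (mu + step) to mu
    return best_mu                         # best value found if eps is never met
```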
  • as training proceeds, the expressive ability of the model is continuously enhanced, so the weight of the misclassified sample pairs (i.e., the first sample set) should also be gradually increased; that is, the standard deviation of the normal distribution (i.e., of the weight distribution of the training samples) needs to be reduced.
  • in the normal distribution, the smaller the standard deviation, the steeper the normal peak, that is, the more the weight concentrates on sample pairs whose distance is near the mean, so that the weight of the misclassified sample pairs (i.e., the first sample set) is gradually increased.
  • the standard deviation of the normal distribution can be configured according to the standard deviation of the distances of the sample pairs in the positive sample set.
  • the method further comprises: updating a standard deviation of weight distributions of the training samples in each training process according to a standard deviation of distances between pairs of positive samples in the positive sample set. In this way, the standard deviation in the weight distribution of the training samples gradually decreases with the increase of the number of training times in the model training process, so that the weight of the indistinguishable samples gradually increases, and the expression ability and convergence speed of the model are improved.
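  • A one-line sketch of this per-round update, under the assumption that the standard deviation is taken directly from the positive-pair distances of the current mini-batch.

```python
import numpy as np

def update_sigma(pos_distances):
    """Per-round update: the standard deviation of the weight
    distribution is taken from the standard deviation of the
    positive-pair distances. As training tightens the positive pairs,
    this value shrinks, the normal peak sharpens, and the weight of
    the hard (misclassified) pairs grows."""
    return float(np.std(pos_distances))

sigma = update_sigma(pos_d)  # recomputed in each training round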
  • the present invention acquires a training sample, where the training sample includes a positive sample set and a negative sample set, the positive sample set includes positive sample pairs and the negative sample set includes negative sample pairs; calculates the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determines the distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set representing the relationship between the frequency of occurrence of positive sample pairs and the distance; determines the distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set representing the relationship between the frequency of occurrence of negative sample pairs and the distance; and determines the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
  • the invention can increase the weight of the misclassified sample pairs, so that in the subsequent training process, the contribution rate of the misclassified samples to correcting the model parameters and to improving the expressive ability of the model is increased.
  • FIG. 4 is a flow chart of a preferred embodiment of the model training method of the present invention. The order of the steps in the flowchart may be changed according to different requirements, and some steps may be omitted.
  • the electronic device acquires a training sample.
  • the electronic device trains a model parameter by using a loss function and a preset training algorithm based on the training sample, wherein the loss function is associated with a weight distribution of the training sample.
  • the weight distribution of the training samples is obtained by using the sample weight allocation method described in any of the above embodiments. It will not be detailed here.
  • the preset training algorithm includes, but is not limited to: a convolutional neural network algorithm.
  • the loss function increases the contribution rate of the misclassified sample pairs to the target loss through the weight distribution of the training samples.
  • the method further comprises: using the loss function to increase the contribution rate of the misclassified samples to the target loss, thereby improving the contribution rate of the misclassified sample pairs to correcting the model parameters and improving the expressive ability of the model. The increased contribution rate makes the model focus more on the misclassified samples during the training process, which increases the expressive ability and convergence speed of the model.
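  • As a hedged sketch of how such a loss might be associated with the weight distribution: each pair's term in a contrastive-style loss is scaled by its normal-distribution weight, so misclassified (near-mean) pairs contribute more to the target loss. The contrastive form and the margin are assumptions of this sketch; the disclosure only requires the loss function to be associated with the weight distribution of the training samples.

```python
import numpy as np

def weighted_pair_loss(pos_d, neg_d, mu, sigma, margin=1.0):
    """Pair loss where each term is scaled by its normal weight."""
    w_pos = np.exp(-((pos_d - mu) ** 2) / (2 * sigma ** 2))
    w_neg = np.exp(-((neg_d - mu) ** 2) / (2 * sigma ** 2))
    loss_pos = w_pos * pos_d ** 2                            # pull positive pairs together
    loss_neg = w_neg * np.maximum(0.0, margin - neg_d) ** 2  # push negative pairs apart
    return (loss_pos.sum() + loss_neg.sum()) / (len(pos_d) + len(neg_d))
```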
  • the present invention acquires a training sample, and based on the training sample, trains a model parameter using a loss function and a preset training algorithm, wherein the loss function is associated with a weight distribution of the training sample.
  • the weight distribution of the training samples is obtained using the sample weight allocation method described in any of the above embodiments.
  • in the weight distribution of the training samples, the weights of the misclassified sample pairs gradually become larger during model training. Therefore, when the model parameters are trained, the loss function can be used to improve the contribution rate of the misclassified samples to correcting the model parameters and to improving the expressive ability of the model, enabling the model to focus more on the misclassified samples during training, increasing the expressive ability and convergence speed of the model, and improving the accuracy of the model parameters.
  • the face feature expression model is trained using the model training method described in FIG. 4, wherein each positive sample pair in the positive sample set represents a face sample pair representing the same person.
  • the trained face feature expression model is used to extract the features of the image to be detected, so that the accuracy of face recognition can be improved.
  • the to-be-detected picture is obtained, the features of the to-be-detected picture are extracted using the trained face feature expression model, and face recognition is performed on the picture based on the extracted features.
  • the face feature expression model trained by the present invention can increase the weight of the misclassified sample pairs and reduce the weight of the correctly classified sample pairs, thereby increasing the expressive ability and convergence speed of the face feature expression model and improving the accuracy of face recognition.
  • the sample weight distribution device 11 includes an acquisition module 100, a calculation module 101, and a determination module 102.
  • the unit referred to in the present invention refers to a series of computer program segments that can be executed by the processor of the sample weight distribution device 11 and that can perform a fixed function, which is stored in the memory. In the present embodiment, the functions of the respective units will be described in detail in the subsequent embodiments.
  • the obtaining module 100 acquires a training sample, where the training sample includes a positive sample set and a negative sample set, the positive sample set includes a positive sample pair and the negative sample set includes a negative sample pair.
  • the electronic device configures a training sample set.
  • a part of the samples are first taken out from the configured training sample set for training, and the part of the samples is used as the training sample.
  • the training samples correspond to samples in each mini-batch.
  • the positive sample set comprises one or more positive sample pairs, wherein one of the positive sample pairs represents a sample pair belonging to a same category.
  • the negative sample set includes one or more negative sample pairs.
  • For example, in a face recognition task, a positive sample pair may be two pictures of the same person's face.
  • the calculation module 101 calculates the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set.
  • the calculation module 101 calculates the Euclidean distance for each positive sample pair, using the Euclidean distance of each positive sample pair as the distance of that positive sample pair.
  • the calculation module 101 calculates the Euclidean distance for each negative sample pair, using the Euclidean distance of each negative sample pair as the distance of that negative sample pair.
  • the expression of the distance of each positive sample pair and the distance of each negative sample pair is not limited to the Euclidean distance, and may be other distance forms, and the present invention does not impose any limitation.
  • the determining module 102 determines a distance distribution of the positive sample set according to a distance of each positive sample pair in the positive sample set, and the distance distribution of the positive sample set represents a relationship between a positive sample pair appearance frequency and a distance.
  • the distance distribution of the positive sample set comprises a plurality of distance points, each distance point corresponding to a positive sample pair appearance frequency.
  • the positive sample set has 100 positive sample pairs, at a distance of 0.2, corresponding to 30 positive sample pairs.
  • the determining module 102 determines a distance distribution of the negative sample set according to a distance of each negative sample pair in the negative sample set, and the distance distribution of the negative sample set represents a relationship between a negative sample pair appearance frequency and a distance.
  • the distance distribution of the negative sample set comprises a plurality of distance points, each distance point corresponding to a negative sample pair appearance frequency.
  • the negative sample set has 100 negative sample pairs, at a distance of 0.5, corresponding to 20 negative sample pairs.
  • the determining module 102 determines a weight distribution of the training samples based on a distance distribution of the positive sample set and a distance distribution of the negative sample set.
  • the present invention when there is no overlapping portion of the distance distribution of the positive sample set and the distance distribution of the negative sample set, it indicates that there is no sample pair of the classification error in the positive sample set and the negative sample set.
  • the distance distribution of the positive sample set overlaps with the distance distribution of the negative sample set, it indicates that there is a sample pair of the classification error in the positive sample set and the negative sample set.
  • a sample pair whose distance falls in the overlapping portion of the distance distribution of the positive sample set and the distance distribution of the negative sample set may be misclassified. Therefore, in the subsequent training process, the weight of the misclassified sample pairs needs to be increased, so that the contribution rate of the misclassified samples to correcting the model parameters and to improving the expressive ability of the model can be increased.
  • the sample pairs whose distances lie between distance A and distance B are all potentially misclassified sample pairs. Suppose the total number of positive sample pairs is 1000 and the total number of negative sample pairs is 2000. If the frequency of positive sample pairs at distance A is 0.02, then 20 positive sample pairs correspond to point A; if the frequency of negative sample pairs at distance A is 0.15, then 300 negative sample pairs correspond to point A. If the distance of a target sample pair equals distance A, the target sample pair may belong to either a positive sample pair or a negative sample pair, and thus the target sample pair may be classified incorrectly.
  • the determining module 102 determining the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set comprises:
  • increasing the weight of the misclassified sample pairs in the weight distribution of the training samples, and/or reducing the weight of the correctly classified sample pairs. In the subsequent model training process, the loss function is associated with the weight distribution of the training samples and is established based on that weight distribution, which can increase the contribution of the misclassified samples to the network loss, thereby better correcting the model parameters and improving the expressive ability of the model.
  • the weight distribution of the training samples is a normal distribution.
  • the parameters of the normal distribution are configured so as to increase the weight of misclassified sample pairs, and/or to reduce the weight of correctly classified sample pairs.
  • the normal distribution represents the relationship between the distance of the pair of samples and the weight.
  • the parameters of the normal distribution include, but are not limited to, mean, standard deviation.
  • the determining module 102 is further configured to determine the mean of the maximum distance and the minimum distance as the mean of the weight distribution of the training samples.
  • if the maximum distance of the positive sample pairs in the positive sample set is smaller than the minimum distance of the negative sample pairs in the negative sample set, it means there are no misclassified sample pairs in the positive sample set and the negative sample set.
  • the distance corresponding to point C, the maximum distance of the positive sample pairs in the positive sample set, is smaller than the distance corresponding to point D, the minimum distance of the negative sample pairs in the negative sample set. Since there is no crossover portion between the distance distribution of the positive sample set and the distance distribution of the negative sample set, there are no misclassified sample pairs in the positive sample set and the negative sample set.
  • the determining module 102 is further configured to determine the mean of the maximum distance and the minimum distance as the mean of the weight distribution of the training samples.
  • the method further includes:
  • the distance corresponding to the minimum absolute value of the difference between the frequency at which positive sample pairs appear and the frequency at which negative sample pairs appear is taken as the mean of the weight distribution of the training samples.
  • if the maximum distance of the positive sample pairs in the positive sample set is greater than the minimum distance of the negative sample pairs in the negative sample set, it means there are misclassified sample pairs in the positive sample set and the negative sample set.
  • as shown in FIG. 2, the distance corresponding to point B, the maximum distance of the positive sample pairs in the positive sample set, is greater than the distance corresponding to point A, the minimum distance of the negative sample pairs in the negative sample set. Since there is a crossover portion between the distance distribution of the positive sample set and the distance distribution of the negative sample set, there are misclassified sample pairs in the positive sample set and the negative sample set.
  • the distance corresponding to the intersection E of the distance distribution of the positive sample set and the distance distribution of the negative sample set is taken as the mean of the normal distribution.
  • at the intersection E, the frequency at which the positive sample pairs appear equals the frequency at which the negative sample pairs appear (the frequency value F corresponding to E), so the absolute value of the difference between the two frequencies is at its minimum there.
  • sample pairs whose distances lie in the vicinity of the mean of the weight distribution of the training samples are likely to be misclassified.
  • in the weight distribution of the training samples, the weight of sample pairs whose distance is near the mean is larger, thereby increasing the weight of the misclassified sample pairs.
  • the loss function is associated with the weight distribution of the training samples and is established based on that weight distribution, which can increase the contribution of the misclassified samples to the network loss.
  • the sample pairs corresponding to distances between distance A and distance E, and the sample pairs corresponding to distances between distance B and distance E, are misclassified sample pairs. Therefore, in the normal distribution, these sample pairs have a higher weight than the sample pairs that can be correctly classified.
  • the determining module 102 is further configured to:
  • the preset step size step is equal to (maximum distance - minimum distance) / n, where n is a positive number.
  • the preset step size may also be other forms of step size, and the present invention does not impose any limitation.
  • the iterative termination condition includes, but is not limited to, a preset error.
  • the minimum distance is taken as the initial value of the current mean μ, and an iterative search is performed based on the preset step size step.
  • at the distance represented by the current mean μ, it is determined whether the absolute value of the difference between the frequency at which the positive sample pairs appear and the frequency at which the negative sample pairs appear is less than the preset error. If it is not less than the preset error, the current mean plus the preset step size is assigned to the current mean μ, that is, (μ + step) is assigned to μ, and the determination continues.
  • once the absolute value is less than the preset error, the search for the mean is stopped, and the distance value corresponding to the last iteration is output as the mean of the weight distribution of the training samples.
  • as training proceeds, the expressive ability of the model is continuously enhanced, so the weight of the misclassified sample pairs (i.e., the first sample set) should also be gradually increased; that is, the standard deviation of the normal distribution (i.e., of the weight distribution of the training samples) needs to be reduced.
  • in the normal distribution, the smaller the standard deviation, the steeper the normal peak, that is, the more the weight concentrates on sample pairs whose distance is near the mean, so that the weight of the misclassified sample pairs (i.e., the first sample set) is gradually increased.
  • the standard deviation of the normal distribution can be configured according to the standard deviation of the distances of the sample pairs in the positive sample set.
  • the method further comprises: updating a standard deviation of weight distributions of the training samples in each training process according to a standard deviation of distances between pairs of positive samples in the positive sample set. In this way, the standard deviation in the weight distribution of the training samples gradually decreases with the increase of the number of training times in the model training process, so that the weight of the indistinguishable samples gradually increases, and the expression ability and convergence speed of the model are improved.
  • the present invention acquires a training sample, where the training sample includes a positive sample set and a negative sample set, the positive sample set includes positive sample pairs and the negative sample set includes negative sample pairs; calculates the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determines the distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set representing the relationship between the frequency of occurrence of positive sample pairs and the distance; determines the distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set representing the relationship between the frequency of occurrence of negative sample pairs and the distance; and determines the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
  • the invention can increase the weight of the misclassified sample pairs, so that in the subsequent training process, the contribution rate of the misclassified samples to correcting the model parameters and to improving the expressive ability of the model is increased.
  • FIG. 6 is a functional block diagram of a preferred embodiment of the model training device of the present invention.
  • the model training device 61 includes a data acquisition module 600 and the training module 601.
  • the unit referred to in the present invention refers to a series of computer program segments that can be executed by the processor of the model training device 61 and that can perform fixed functions, which are stored in the memory. In the present embodiment, the functions of the respective units will be described in detail in the subsequent embodiments.
  • the data acquisition module 600 acquires training samples.
  • the training module 601 trains model parameters based on the training samples using a loss function and a preset training algorithm, wherein the loss function is associated with a weight distribution of the training samples.
  • the weight distribution of the training samples is obtained by using the sample weight allocation method described in any of the above embodiments. It will not be detailed here.
  • the preset training algorithm includes, but is not limited to: a convolutional neural network algorithm.
  • the loss function increases the contribution rate of the misclassified sample pairs to the target loss through the weight distribution of the training samples.
  • the training module 601 is further configured to: use the loss function to increase the contribution rate of the misclassified samples to the target loss, thereby improving the contribution rate of the misclassified sample pairs to correcting the model parameters and improving the expressive ability of the model. The increased contribution rate enables the model to focus more on the misclassified samples during the training process, increasing the expressive ability and convergence speed of the model.
  • the present invention acquires a training sample, and based on the training sample, trains a model parameter using a loss function and a preset training algorithm, wherein the loss function is associated with a weight distribution of the training sample.
  • the weight distribution of the training samples is obtained using the sample weight allocation method described in any of the above embodiments.
  • in the weight distribution of the training samples, the weights of the misclassified sample pairs gradually become larger during model training. Therefore, when the model parameters are trained, the loss function can be used to improve the contribution rate of the misclassified samples to correcting the model parameters and to improving the expressive ability of the model, enabling the model to focus more on the misclassified samples during training, increasing the expressive ability and convergence speed of the model, and improving the accuracy of the model parameters.
  • the above-described integrated unit implemented in the form of a software function module can be stored in a computer readable storage medium.
  • the above software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present invention.
  • the electronic device 3 comprises at least one transmitting device 31, at least one memory 32, at least one processor 33, at least one receiving device 34 and at least one communication bus.
  • the communication bus is used to implement connection communication between these components.
  • the electronic device 3 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
  • the electronic device 3 may also comprise a network device and/or a user device.
  • the network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud computing-based cloud composed of a large number of hosts or network servers, where cloud computing is a kind of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the electronic device 3 can be, but is not limited to, any electronic product that can interact with a user through a keyboard, a touch pad, or a voice control device, such as a tablet, a smart phone, a personal digital assistant (PDA), a smart wearable device, a camera device, a monitoring device, and other terminals.
  • the network in which the electronic device 3 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
  • the receiving device 34 and the transmitting device 31 may be wired transmission ports, or may be wireless devices, for example, including antenna devices, for performing data communication with other devices.
  • the memory 32 is used to store program code.
  • the memory 32 may be a circuit having a storage function, such as a RAM (Random-Access Memory), a FIFO (First In First Out), or the like, which has no physical form in the integrated circuit.
  • the memory 32 may also be a memory having a physical form, such as a memory stick, a TF card (Trans-flash Card), a smart media card, a secure digital card, a flash memory card, and other storage devices.
  • the processor 33 can include one or more microprocessors, digital processors.
  • the processor 33 can call program code stored in the memory 32 to perform related functions.
  • the various units described in FIGS. 5 and/or FIG. 6 are program code stored in the memory 32 and executed by the processor 33 to implement a sample weight distribution method, and/or model training. method.
  • the processor 33, also known as a central processing unit (CPU), is a very large-scale integrated circuit that serves as the computing core (Core) and control unit (Control Unit) of the electronic device.
  • the embodiment of the present invention further provides a computer readable storage medium having stored thereon computer instructions which, when executed by an electronic device including one or more processors, cause the electronic device to perform the sample weight allocation method described in the above method embodiments.
  • the memory 32 in the electronic device 3 stores a plurality of instructions to implement a sample weight allocation method, and the processor 33 can execute the plurality of instructions to implement:
  • Obtaining a training sample comprising a positive sample set and a negative sample set, the positive sample set comprising positive sample pairs and the negative sample set comprising negative sample pairs; calculating the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determining the distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set representing the relationship between the frequency of occurrence of positive sample pairs and the distance; determining the distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set representing the relationship between the frequency of occurrence of negative sample pairs and the distance; and determining the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
  • the plurality of instructions corresponding to the sample weight allocation method in any of the embodiments are stored in the memory 32 and executed by the processor 33, and will not be described in detail herein.
  • the memory 32 in the electronic device 3 stores a plurality of instructions to implement a model training method, and the processor 33 can execute the plurality of instructions to: acquire a training sample;
  • and, based on the training sample, train model parameters using a loss function and a preset training algorithm, wherein the loss function is associated with the weight distribution of the training samples, and the weight distribution of the training samples is obtained using the sample weight allocation method described in any embodiment.
  • a plurality of instructions corresponding to the model training method are stored in the memory 32 in any of the embodiments and executed by the processor 33 and will not be described in detail herein.
  • the integrated circuit of the present invention is installed in the electronic device, so that the electronic device performs the following functions: acquiring a training sample, the training sample including a positive sample set and a negative sample set, the positive sample set including positive sample pairs and the negative sample set including negative sample pairs; calculating the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determining the distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set representing the relationship between the frequency of occurrence of positive sample pairs and the distance; determining the distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set representing the relationship between the frequency of occurrence of negative sample pairs and the distance; and determining the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
  • the functions that can be implemented by the sample weight allocation method in any of the embodiments can be realized in the electronic device by means of the integrated circuit of the present invention, so that the electronic device can perform the functions of the sample weight allocation method in any embodiment; these are not detailed here.
  • the above-described features of the present invention can also be implemented by an integrated circuit that controls the implementation of the functions of the model training method in any of the above embodiments. That is, the integrated circuit of the present invention is installed in the electronic device, so that the electronic device performs the following functions: acquiring training samples; and training model parameters based on the training samples using a loss function and a preset training algorithm, wherein the loss function is associated with the weight distribution of the training samples, and the weight distribution of the training samples is obtained using the sample weight allocation method as described in any of the embodiments.
  • the functions that can be implemented by the model training method in any of the embodiments can be installed in the electronic device by the integrated circuit of the present invention, so that the electronic device can be implemented by the model training method in any embodiment. Function, no longer detailed here.
  • the disclosed apparatus may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Abstract

A sample weight allocation method, a model training method, an electronic device, and a storage medium. The sample weight allocation method comprises: calculating the distance of each positive sample pair in a positive sample set and the distance of each negative sample pair in a negative sample set (S11); determining the distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set representing the relationship between the frequency of occurrence of positive sample pairs and distance (S12); determining the distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set representing the relationship between the frequency of occurrence of negative sample pairs and distance (S13); and determining the weight distribution of the training samples on the basis of the distance distribution of the positive sample set and the distance distribution of the negative sample set (S14). The sample weight allocation method can increase the weight of misclassified sample pairs and, in the model training process, increase the contribution of the misclassified samples to the target loss, so that the model parameters can be better corrected and the expressive ability of the model parameters improved.

Description

Sample weight allocation method, model training method, electronic device, and storage medium
This application claims priority to Chinese patent application No. 201711480906.8, entitled "Sample weight allocation method, model training method, electronic device, and storage medium" and filed with the Chinese Patent Office on December 29, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a sample weight allocation method, a model training method, an electronic device, and a storage medium.
Background Art
In the field of machine learning, the loss functions used in training models (such as feature extraction models and face feature expression models) fall into two categories. The first category is classification-based metrics; since they do not measure the features directly, their performance is limited. The other category is end-to-end methods that measure the features directly; such methods converge well only if samples of a suitable difficulty level are selected for the network. Existing approaches obtain samples of suitable difficulty mainly in the following two ways. First, after the model has been trained to a certain stage, some samples of moderate difficulty are selected according to the feature expression of the model; this approach is cumbersome to operate, and as training proceeds the difficulty of the selected samples changes, so the samples originally selected offline are no longer representative and cannot fully express the characteristics of subsequently added samples. Second, during model training, samples of moderate difficulty are selected according to the model of each training round; although the training samples selected in this way are representative and can effectively improve the expressive ability of the model, the required computing resources are too large for this to be practical in actual model training.
Summary of the Invention
In view of the above, it is necessary to provide a sample weight allocation method, a model training method, an electronic device, and a storage medium that can increase the weight of misclassified sample pairs and, during model training, increase the contribution of the misclassified samples to the target loss, so that the model parameters can be better corrected and the expressive ability of the model parameters improved.
A sample weight allocation method, the method comprising:
acquiring a training sample, the training sample comprising a positive sample set and a negative sample set, the positive sample set comprising positive sample pairs and the negative sample set comprising negative sample pairs;
calculating the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set;
determining a distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set representing the relationship between the frequency of occurrence of positive sample pairs and distance;
determining a distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set representing the relationship between the frequency of occurrence of negative sample pairs and distance;
determining a weight distribution of the training sample based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
According to a preferred embodiment of the present invention, determining the weight distribution of the training sample based on the distance distribution of the positive sample set and the distance distribution of the negative sample set comprises:
determining a misclassified first sample set based on the distance distribution of the positive sample set and the distance distribution of the negative sample set;
increasing, in the weight distribution of the training sample, the weight of each sample pair in the first sample set; and/or
determining a correctly classified second sample set based on the distance distribution of the positive sample set and the distance distribution of the negative sample set;
reducing, in the weight distribution of the training sample, the weight of each sample pair in the second sample set.
According to a preferred embodiment of the present invention, the weight distribution of the training sample is a normal distribution, and when the maximum distance of the positive sample pairs in the positive sample set is less than or equal to the minimum distance of the negative sample pairs in the negative sample set, in determining the weight distribution of the training sample the method further comprises:
determining the mean of the maximum distance and the minimum distance as the mean of the weight distribution of the training sample.
According to a preferred embodiment of the present invention, the weight distribution of the training sample is a normal distribution, and when the maximum distance of the positive sample pairs in the positive sample set is greater than the minimum distance of the negative sample pairs in the negative sample set, in determining the weight distribution of the training sample the method further comprises:
taking the distance value corresponding to the intersection of the distance distribution of the positive sample set and the distance distribution of the negative sample set as the mean of the weight distribution of the training sample; or
taking the distance at which the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smallest as the mean of the weight distribution of the training sample.
According to a preferred embodiment of the present invention, in determining the mean of the weight distribution of the training sample, the method further comprises:
configuring a preset step size, an initial mean, and an iteration termination condition;
based on the initial mean and the preset step size, iteratively searching the interval formed by the minimum distance and the maximum distance for an optimal distance value satisfying the iteration termination condition, the optimal distance value being the distance at which the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smallest.
According to a preferred embodiment of the present invention, the weight distribution of the training sample is a normal distribution, and in determining the weight distribution of the training sample, the method further comprises:
obtaining, in each training round, the standard deviation of the distances between positive sample pairs in the positive sample set;
updating the standard deviation of the weight distribution of the training sample in each training round according to the standard deviation of the distances between positive sample pairs in the positive sample set.
A model training method, the method comprising:
acquiring training samples;
training model parameters based on the training samples using a loss function and a preset training algorithm, wherein the loss function is associated with a weight distribution of the training samples, the weight distribution of the training samples being obtained using the sample weight allocation method described in any of the embodiments.
According to a preferred embodiment of the present invention, the method further comprises:
using the loss function to increase the contribution rate of misclassified samples to the target loss.
An electronic device, the electronic device comprising a memory and a processor, the memory being configured to store at least one instruction, and the processor being configured to execute the at least one instruction to implement the sample weight allocation method described in any embodiment and/or the model training method described in any embodiment.
A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the sample weight allocation method described in any embodiment and/or the model training method described in any embodiment.
As can be seen from the above technical solutions, the present invention provides a sample weight allocation method comprising: acquiring a training sample, the training sample comprising a positive sample set and a negative sample set; calculating the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determining a distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set; determining a distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set; and determining a weight distribution of the training sample based on the distance distribution of the positive sample set and the distance distribution of the negative sample set. The present invention also provides a model training method, an electronic device, and a storage medium. The present invention can increase the weight of misclassified sample pairs and, during model training, increase the contribution of the misclassified samples to the target loss, so that the model parameters can be better corrected and the expressive ability of the model parameters improved.
Brief Description of the Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on the provided drawings without creative effort.
FIG. 1 is a flowchart of a preferred embodiment of the sample weight allocation method of the present invention.
FIG. 2 is a schematic diagram of the distance distribution and the weight distribution of samples in an example of the present invention.
FIG. 3 is another schematic diagram of the distance distribution of samples in an example of the present invention.
FIG. 4 is a flowchart of a preferred embodiment of the model training method of the present invention.
FIG. 5 is a functional module diagram of a preferred embodiment of the sample weight allocation device of the present invention.
FIG. 6 is a functional module diagram of a preferred embodiment of the model training device of the present invention.
FIG. 7 is a schematic structural diagram of a preferred embodiment of an electronic device in at least one example of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in FIG. 1, which is a flowchart of a preferred embodiment of the sample weight allocation method of the present invention, the order of the steps in the flowchart can be changed according to different requirements, and some steps can be omitted.
S10. The electronic device acquires a training sample, the training sample comprising a positive sample set and a negative sample set, the positive sample set comprising positive sample pairs and the negative sample set comprising negative sample pairs.
In a preferred embodiment of the present invention, the electronic device is configured with a training sample set. During the training of the model parameters, a part of the samples is first taken from the configured training sample set for training, and this part of the samples serves as the training sample. For example, the training sample corresponds to the samples in each mini-batch.
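By way of illustration only, the following Python sketch shows one way a mini-batch of positive and negative sample pairs could be drawn from a configured training set; the function name, the batch size, and the pairing strategy are assumptions, not part of the original disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pair_batch(labels, pairs_per_batch=64):
    """Draw index pairs for one mini-batch: positive pairs share a label,
    negative pairs do not. `labels` is a 1-D array of class labels.
    Assumes the training set actually contains both kinds of pairs."""
    pos, neg = [], []
    n = len(labels)
    while len(pos) < pairs_per_batch or len(neg) < pairs_per_batch:
        i, j = rng.integers(0, n, size=2)
        if i == j:
            continue
        if labels[i] == labels[j] and len(pos) < pairs_per_batch:
            pos.append((i, j))
        elif labels[i] != labels[j] and len(neg) < pairs_per_batch:
            neg.append((i, j))
    return pos, neg
```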
In a preferred embodiment of the present invention, the positive sample set comprises one or more positive sample pairs, where a positive sample pair denotes a pair of samples belonging to the same category. The negative sample set comprises one or more negative sample pairs.
For example, the training sample is used to train a face feature expression model, which is used for feature extraction of subsequently input faces; therefore, a positive sample pair denotes a pair of samples of one face, such as two pictures of the same person's face.
S11. The electronic device calculates the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set.
In a preferred embodiment of the present invention, the electronic device calculates the Euclidean distance of each positive sample pair and takes it as the distance of that positive sample pair, and likewise calculates the Euclidean distance of each negative sample pair and takes it as the distance of that negative sample pair. The expression of the distance of each positive sample pair and of each negative sample pair is not limited to the Euclidean distance and may take other distance forms; the present invention imposes no limitation on this.
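A minimal sketch of step S11, assuming the sample features are rows of a NumPy matrix and pairs are index tuples (the names are hypothetical):

```python
import numpy as np

def pair_distances(features, pairs):
    """Euclidean (L2) distance of each (i, j) sample pair; other distance
    metrics could be substituted, as the disclosure notes."""
    idx_a = [i for i, _ in pairs]
    idx_b = [j for _, j in pairs]
    return np.linalg.norm(features[idx_a] - features[idx_b], axis=1)
```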
S12. The electronic device determines the distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set representing the relationship between the frequency of occurrence of positive sample pairs and distance.
In a preferred embodiment of the present invention, the distance distribution of the positive sample set comprises a plurality of distance points, each distance point corresponding to a frequency of occurrence of positive sample pairs. For example, the positive sample set has 100 positive sample pairs, and 30 positive sample pairs correspond to the distance value 0.2 (a frequency of 0.3).
S13. The electronic device determines the distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set representing the relationship between the frequency of occurrence of negative sample pairs and distance.
In a preferred embodiment of the present invention, the distance distribution of the negative sample set comprises a plurality of distance points, each distance point corresponding to a frequency of occurrence of negative sample pairs. For example, the negative sample set has 100 negative sample pairs, and 20 negative sample pairs correspond to the distance value 0.5 (a frequency of 0.2).
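Steps S12 and S13 can both be read as building a normalized histogram over pair distances. A sketch under that assumption follows; the bin count and the value range are arbitrary choices, not specified in the disclosure.

```python
import numpy as np

def distance_distribution(distances, bins=50, value_range=(0.0, 2.0)):
    """Frequency of sample pairs per distance bin; the frequencies sum to 1,
    so 30 of 100 pairs falling in one bin gives a frequency of 0.3."""
    counts, edges = np.histogram(distances, bins=bins, range=value_range)
    centers = 0.5 * (edges[:-1] + edges[1:])   # representative distance points
    return centers, counts / counts.sum()
```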
S14. The electronic device determines the weight distribution of the training sample based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
In the present invention, when the distance distribution of the positive sample set and the distance distribution of the negative sample set have no overlapping portion, there is no misclassified sample pair in the positive sample set and the negative sample set. When the two distance distributions do overlap, there are misclassified sample pairs in the positive sample set and the negative sample set: the sample pairs whose distances fall in the overlapping portion of the two distributions are the misclassified sample pairs. Therefore, in the subsequent training process, the weight of the misclassified sample pairs needs to be increased, so that the contribution rate of the misclassified sample pairs to correcting the model parameters and to improving the expressive ability of the model can be increased.
For example, FIG. 2 shows the distance distribution of the positive sample set and the distance distribution of the negative sample set in one example, where the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set are expressed as Euclidean distances; of course other distances may also be used, and this example is not a limitation on the distance calculation method. The sample pairs whose distances lie between distance A and distance B are all misclassified sample pairs. Suppose the total number of positive sample pairs is 1000 and the total number of negative sample pairs is 2000. If the frequency of positive sample pairs at distance A is 0.02, there are 20 positive sample pairs at point A; if the frequency of negative sample pairs at distance A is 0.15, there are 300 negative sample pairs at point A. If the distance of a target sample pair equals distance A, the target sample pair may belong to the positive sample pairs or to the negative sample pairs, so the target sample pair can be misclassified.
Preferably, determining the weight distribution of the training sample based on the distance distribution of the positive sample set and the distance distribution of the negative sample set comprises:
determining a misclassified first sample set based on the distance distribution of the positive sample set and the distance distribution of the negative sample set, and increasing, in the weight distribution of the training sample, the weight of each sample pair in the first sample set; and/or
determining a correctly classified second sample set based on the distance distribution of the positive sample set and the distance distribution of the negative sample set, and reducing, in the weight distribution of the training sample, the weight of each sample pair in the second sample set.
In the above embodiment, by increasing the weight of misclassified sample pairs and/or reducing the weight of correctly classified samples in the weight distribution of the training sample, the loss function used in the subsequent model training process is associated with the weight distribution of the training sample. Establishing the loss function based on the weight distribution of the training sample can increase the contribution of the misclassified samples to the network loss, so that the model parameters can be better corrected and the expressive ability of the model parameters improved.
Preferably, the weight distribution of the training sample is a normal distribution, whose parameters are configured so as to increase the weight of misclassified sample pairs and/or reduce the weight of correctly classified samples. The normal distribution represents the relationship between the distance of a sample pair and its weight. The parameters of the normal distribution include, but are not limited to, the mean and the standard deviation.
Further, when the maximum distance of the positive sample pairs in the positive sample set is smaller than the minimum distance of the negative sample pairs in the negative sample set, in determining the weight distribution of the training sample the method further comprises: determining the mean of the maximum distance and the minimum distance as the mean of the weight distribution of the training sample. When the maximum distance of the positive sample pairs in the positive sample set is smaller than the minimum distance of the negative sample pairs in the negative sample set, there is no misclassified sample pair in the positive sample set and the negative sample set.
For example, as shown in FIG. 3, the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set are expressed as Euclidean distances; of course other distances may also be used, and this example is not a limitation on the distance calculation method. The distance corresponding to point C, the maximum distance of the positive sample pairs in the positive sample set, is smaller than the distance corresponding to point D, the minimum distance of the negative sample pairs in the negative sample set. Thus the distance distribution of the positive sample set and the distance distribution of the negative sample set have no overlapping portion, and there is no misclassified sample pair in the positive sample set and the negative sample set.
Further, when the maximum distance of the positive sample pairs in the positive sample set is equal to the minimum distance of the negative sample pairs in the negative sample set, in determining the weight distribution of the training sample the method further comprises: determining the mean of the maximum distance and the minimum distance as the mean of the weight distribution of the training sample.
Further, when the maximum distance of the positive sample pairs in the positive sample set is greater than the minimum distance of the negative sample pairs in the negative sample set, in determining the weight distribution of the training sample the method further comprises:
taking the distance value corresponding to the intersection of the distance distribution of the positive sample set and the distance distribution of the negative sample set as the mean of the weight distribution of the training sample; or
taking the distance at which the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smallest as the mean of the weight distribution of the training sample. When the maximum distance of the positive sample pairs in the positive sample set is greater than the minimum distance of the negative sample pairs in the negative sample set, there are misclassified sample pairs in the positive sample set and the negative sample set.
For example, as shown in FIG. 2, the distance corresponding to point B, the maximum distance of the positive sample pairs in the positive sample set, is greater than the distance corresponding to point A, the minimum distance of the negative sample pairs in the negative sample set. Thus the distance distribution of the positive sample set and the distance distribution of the negative sample set have an overlapping portion, and there are misclassified sample pairs in the positive sample set and the negative sample set. The distance value corresponding to the intersection E of the distance distribution of the positive sample set and the distance distribution of the negative sample set is taken as the mean of the normal distribution; the minimum of the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is the frequency value F corresponding to the intersection E.
It can be seen from the above embodiments that, taking the mean of the weight distribution of the training sample as the axis, the sample pairs whose distances fall in the neighborhood of that mean are the ones most likely to be misclassified. Therefore, in the weight distribution of the training sample (i.e., the normal distribution), the closer the distance of a sample pair is to the mean of the weight distribution, the larger its weight, thereby increasing the weight of misclassified sample pairs and/or reducing the weight of correctly classified samples. In the subsequent model training process, the loss function is associated with the weight distribution of the training sample; establishing the loss function based on this weight distribution can increase the contribution of the misclassified samples to the network loss, so that the model parameters can be better corrected and the expressive ability of the model parameters improved.
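One plausible reading of this weighting, sketched below, is an unnormalized Gaussian over pair distance: pairs whose distance lies near the mean mu (the overlap region where misclassification occurs) get weights close to 1, while easily classified pairs far from mu get weights close to 0. The exact functional form is an assumption.

```python
import numpy as np

def pair_weight(distance, mu, sigma):
    """Normal-distribution weight over pair distance: largest at the mean,
    decaying for pairs the model already separates well."""
    return np.exp(-0.5 * ((distance - mu) / sigma) ** 2)
```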
Combining the two diagrams of FIG. 2 above, the sample pairs whose distances lie between distance A and distance E, as well as those whose distances lie between distance E and distance B, are misclassified sample pairs. Therefore, in the normal distribution, the weights of the sample pairs between distance A and distance E and of those between distance E and distance B are higher than the weights of the sample pairs that can be correctly classified.
Further, it is necessary to search between the minimum distance and the maximum distance for the optimal distance value that minimizes the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs. Preferably, in determining the mean of the weight distribution of the training sample, the method further comprises:
configuring a preset step size, an initial mean, and an iteration termination condition;
based on the initial mean and the preset step size, iteratively searching the interval formed by the minimum distance and the maximum distance for an optimal distance value satisfying the iteration termination condition, the optimal distance value being the distance at which the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smallest.
Further, the preset step size equals (maximum distance - minimum distance)/n, where n is a positive number. Of course, the preset step size may also take other forms, and the present invention imposes no limitation on this.
Further, the iteration termination condition includes, but is not limited to, a preset error.
Specifically, starting the iteration from the initial mean, an iterative search is performed with the preset step size step. In the current iteration, it is checked whether, at the distance represented by the current mean μ, the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smaller than the preset error. If it is smaller than the preset error, the current mean plus the preset step size is assigned to the current mean μ, i.e., (μ + step) is assigned to μ, and the check continues, until the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs becomes greater than the preset error; the search for the mean then stops, and the optimal distance value corresponding to the last iteration is output as the mean of the weight distribution of the training sample.
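The loop described above can be sketched as follows. Here freq_pos and freq_neg are assumed to be callables (for example, interpolated histograms) that return the frequency of positive and negative pairs at a given distance; tracking the best distance seen so far is a defensive detail the text leaves implicit.

```python
def search_optimal_mean(freq_pos, freq_neg, d_min, d_max, n=100, eps=0.01):
    """Step through [d_min, d_max] looking for the distance at which
    |freq_pos(d) - freq_neg(d)| is smallest (at most `eps`)."""
    step = (d_max - d_min) / n           # preset step size
    mu = d_min                           # initial mean
    best_mu, best_gap = mu, float("inf")
    while mu <= d_max:
        gap = abs(freq_pos(mu) - freq_neg(mu))
        if gap < best_gap:
            best_mu, best_gap = mu, gap
        if gap > eps and best_gap <= eps:
            break                        # iteration termination condition
        mu += step
    return best_mu
```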
In the present invention, as the model is trained, its expressive ability keeps increasing, so the weight of the misclassified sample pairs (i.e., the first sample set) should also be gradually increased; that is, the standard deviation of the normal distribution (i.e., the weight distribution of the training sample) needs to be reduced. For a normal distribution, the smaller the standard deviation, the steeper the peak, i.e., the higher the weight of the sample pairs at distances closer to the mean, so the weight of the misclassified sample pairs (i.e., the first sample set) can be gradually increased.
Since the standard deviation of the sample-pair distances in the positive sample set gradually decreases as the model is trained, the standard deviation of the normal distribution can be configured according to the standard deviation of the sample-pair distances in the positive sample set. Preferably, the method further comprises: updating the standard deviation of the weight distribution of the training sample in each training round according to the standard deviation of the distances between positive sample pairs in the positive sample set. In this way, the standard deviation of the weight distribution of the training sample gradually decreases as the number of training rounds increases, so that the weights of hard-to-distinguish samples gradually become larger, improving the expressive ability and the convergence speed of the model.
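A sketch of this per-round update, assuming the positive-pair distances of the current round are available; the lower bound `floor` is an added safeguard against a degenerate (zero) standard deviation, not something the disclosure specifies.

```python
import numpy as np

def update_sigma(pos_pair_distances, floor=1e-3):
    """Use the standard deviation of positive-pair distances as the standard
    deviation of the weight distribution; it shrinks as training proceeds,
    sharpening the weights around the mean."""
    return max(float(np.std(pos_pair_distances)), floor)
```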
As can be seen from the above technical solutions, the present invention acquires a training sample, the training sample comprising a positive sample set and a negative sample set, the positive sample set comprising positive sample pairs and the negative sample set comprising negative sample pairs; calculates the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determines the distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set representing the relationship between the frequency of occurrence of positive sample pairs and distance; determines the distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set representing the relationship between the frequency of occurrence of negative sample pairs and distance; and determines the weight distribution of the training sample based on the distance distribution of the positive sample set and the distance distribution of the negative sample set. The present invention can increase the weight of misclassified sample pairs, so that in the subsequent training process the contribution rate of the misclassified sample pairs to correcting the model parameters and to improving the expressive ability of the model can be increased, improving the accuracy of the model parameters.
As shown in FIG. 4, which is a flowchart of a preferred embodiment of the model training method of the present invention, the order of the steps in the flowchart can be changed according to different requirements, and some steps can be omitted.
S40. The electronic device acquires training samples.
S41. The electronic device trains model parameters based on the training samples using a loss function and a preset training algorithm, wherein the loss function is associated with the weight distribution of the training samples.
Preferably, the weight distribution of the training samples is obtained using the sample weight allocation method described in any of the above embodiments, which is not detailed again here.
Preferably, the preset training algorithm includes, but is not limited to, a convolutional neural network algorithm.
In the present invention, the loss function increases the contribution rate of misclassified sample pairs to the target loss through the weight distribution of the training samples. Preferably, the method further comprises: using the loss function to increase the contribution rate of misclassified sample pairs to the target loss, thereby increasing the contribution rate of the misclassified sample pairs to correcting the model parameters and to improving the expressive ability of the model, so that the model can focus more on misclassified samples during training, increasing the expressive ability and the convergence speed of the model.
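The disclosure does not fix a particular loss form. The sketch below uses a contrastive-style pair loss, one common choice for pair-based metric learning, with per-pair weights taken from the weight distribution; the margin and the loss form itself are assumptions.

```python
import numpy as np

def weighted_pair_loss(d_pos, d_neg, w_pos, w_neg, margin=1.0):
    """Target loss in which misclassified pairs, carrying larger weights,
    contribute more. d_* are arrays of pair distances, w_* the pair weights."""
    pos_term = w_pos * d_pos ** 2                             # pull positives together
    neg_term = w_neg * np.maximum(0.0, margin - d_neg) ** 2   # push negatives apart
    return (pos_term.sum() + neg_term.sum()) / (len(d_pos) + len(d_neg))
```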
As can be seen from the above technical solutions, the present invention acquires training samples and, based on the training samples, trains model parameters using a loss function and a preset training algorithm, wherein the loss function is associated with the weight distribution of the training samples, the weight distribution of the training samples being obtained using the sample weight allocation method described in any of the above embodiments. In the present invention, the weights of misclassified sample pairs in the weight distribution of the training samples gradually become larger during model training; therefore, when training the model parameters, the loss function can increase the contribution rate of the misclassified sample pairs to correcting the model parameters and to improving the expressive ability of the model, so that the model can focus more on misclassified samples during training, increasing the expressive ability and the convergence speed of the model and improving the accuracy of the model parameters.
As an example of an application scenario of the above model training, the following is only an illustration and is not a limitation on the model.
The model training method described in FIG. 4 is used to train a face feature expression model, where each positive sample pair in the positive sample set is a pair of face samples of the same person. The trained face feature expression model is used to extract the features of a picture to be detected, so that the accuracy of face recognition can be improved.
Specifically, a picture to be detected is acquired, the features of the picture to be detected are extracted using the trained face feature expression model, and face recognition is performed on the picture to be detected based on the features of the picture to be detected.
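A sketch of this verification step; the `extract` method, the decision threshold, and the model interface are all hypothetical:

```python
import numpy as np

def same_person(model, img_a, img_b, threshold=0.8):
    """Extract features of two face pictures with the trained face feature
    expression model and decide by their Euclidean distance."""
    feat_a, feat_b = model.extract(img_a), model.extract(img_b)
    return float(np.linalg.norm(feat_a - feat_b)) < threshold
```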
The face feature expression model trained by the present invention can increase the weight of misclassified sample pairs while reducing the weight of sample pairs that can already be correctly classified, thereby increasing the expressive ability and the convergence speed of the face feature expression model and improving the accuracy of face recognition.
As shown in FIG. 5, which is a functional module diagram of a preferred embodiment of the sample weight allocation device of the present invention, the sample weight allocation device 11 includes an acquisition module 100, a calculation module 101, and a determination module 102. A module referred to in the present invention is a series of computer program segments that can be executed by the processor of the sample weight allocation device 11 and that can perform a fixed function, and that are stored in a memory. In this embodiment, the functions of the modules are detailed in the subsequent embodiments.
The acquisition module 100 acquires a training sample, the training sample comprising a positive sample set and a negative sample set, the positive sample set comprising positive sample pairs and the negative sample set comprising negative sample pairs.
In a preferred embodiment of the present invention, the electronic device is configured with a training sample set. During the training of the model parameters, a part of the samples is first taken from the configured training sample set for training, and this part of the samples serves as the training sample. For example, the training sample corresponds to the samples in each mini-batch.
In a preferred embodiment of the present invention, the positive sample set comprises one or more positive sample pairs, where a positive sample pair denotes a pair of samples belonging to the same category. The negative sample set comprises one or more negative sample pairs.
For example, the training sample is used to train a face feature expression model, which is used for feature extraction of subsequently input faces; therefore, a positive sample pair denotes a pair of samples of one face, such as two pictures of the same person's face.
The calculation module 101 calculates the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set.
In a preferred embodiment of the present invention, the calculation module 101 calculates the Euclidean distance of each positive sample pair and takes it as the distance of that positive sample pair, and likewise calculates the Euclidean distance of each negative sample pair and takes it as the distance of that negative sample pair. The expression of the distance of each positive sample pair and of each negative sample pair is not limited to the Euclidean distance and may take other distance forms; the present invention imposes no limitation on this.
The determination module 102 determines the distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set representing the relationship between the frequency of occurrence of positive sample pairs and distance.
In a preferred embodiment of the present invention, the distance distribution of the positive sample set comprises a plurality of distance points, each distance point corresponding to a frequency of occurrence of positive sample pairs. For example, the positive sample set has 100 positive sample pairs, and 30 positive sample pairs correspond to the distance value 0.2.
The determination module 102 determines the distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set representing the relationship between the frequency of occurrence of negative sample pairs and distance.
In a preferred embodiment of the present invention, the distance distribution of the negative sample set comprises a plurality of distance points, each distance point corresponding to a frequency of occurrence of negative sample pairs. For example, the negative sample set has 100 negative sample pairs, and 20 negative sample pairs correspond to the distance value 0.5.
The determination module 102 determines the weight distribution of the training sample based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
In the present invention, when the distance distribution of the positive sample set and the distance distribution of the negative sample set have no overlapping portion, there is no misclassified sample pair in the positive sample set and the negative sample set. When the two distance distributions do overlap, there are misclassified sample pairs in the positive sample set and the negative sample set: the sample pairs whose distances fall in the overlapping portion of the two distributions are the misclassified sample pairs. Therefore, in the subsequent training process, the weight of the misclassified sample pairs needs to be increased, so that the contribution rate of the misclassified sample pairs to correcting the model parameters and to improving the expressive ability of the model can be increased.
For example, FIG. 2 shows the distance distribution of the positive sample set and the distance distribution of the negative sample set in one example; the sample pairs whose distances lie between distance A and distance B are all misclassified sample pairs. Suppose the total number of positive sample pairs is 1000 and the total number of negative sample pairs is 2000. If the frequency of positive sample pairs at distance A is 0.02, there are 20 positive sample pairs at point A; if the frequency of negative sample pairs at distance A is 0.15, there are 300 negative sample pairs at point A. If the distance of a target sample pair equals distance A, the target sample pair may belong to the positive sample pairs or to the negative sample pairs, so the target sample pair can be misclassified.
Preferably, the determination module 102 determining the weight distribution of the training sample based on the distance distribution of the positive sample set and the distance distribution of the negative sample set comprises:
determining a misclassified first sample set based on the distance distribution of the positive sample set and the distance distribution of the negative sample set, and increasing, in the weight distribution of the training sample, the weight of each sample pair in the first sample set; and/or
determining a correctly classified second sample set based on the distance distribution of the positive sample set and the distance distribution of the negative sample set, and reducing, in the weight distribution of the training sample, the weight of each sample pair in the second sample set.
In the above embodiment, by increasing the weight of misclassified sample pairs and/or reducing the weight of correctly classified samples in the weight distribution of the training sample, the loss function used in the subsequent model training process is associated with the weight distribution of the training sample. Establishing the loss function based on the weight distribution of the training sample can increase the contribution of the misclassified samples to the network loss, so that the model parameters can be better corrected and the expressive ability of the model parameters improved.
Preferably, the weight distribution of the training sample is a normal distribution, whose parameters are configured so as to increase the weight of misclassified sample pairs and/or reduce the weight of correctly classified samples. The normal distribution represents the relationship between the distance of a sample pair and its weight. The parameters of the normal distribution include, but are not limited to, the mean and the standard deviation.
Further, when the maximum distance of the positive sample pairs in the positive sample set is smaller than the minimum distance of the negative sample pairs in the negative sample set, in determining the weight distribution of the training sample the determination module 102 is further configured to: determine the mean of the maximum distance and the minimum distance as the mean of the weight distribution of the training sample. When the maximum distance of the positive sample pairs in the positive sample set is smaller than the minimum distance of the negative sample pairs in the negative sample set, there is no misclassified sample pair in the positive sample set and the negative sample set.
For example, as shown in FIG. 3, the distance corresponding to point C, the maximum distance of the positive sample pairs in the positive sample set, is smaller than the distance corresponding to point D, the minimum distance of the negative sample pairs in the negative sample set. Thus the distance distribution of the positive sample set and the distance distribution of the negative sample set have no overlapping portion, and there is no misclassified sample pair in the positive sample set and the negative sample set.
Further, when the maximum distance of the positive sample pairs in the positive sample set is equal to the minimum distance of the negative sample pairs in the negative sample set, in determining the weight distribution of the training sample the determination module 102 is further configured to: determine the mean of the maximum distance and the minimum distance as the mean of the weight distribution of the training sample.
Further, when the maximum distance of the positive sample pairs in the positive sample set is greater than the minimum distance of the negative sample pairs in the negative sample set, when determining the weight distribution of the training samples, the method further includes:

taking the distance value corresponding to the intersection of the distance distribution of the positive sample set and the distance distribution of the negative sample set as the mean of the weight distribution of the training samples; or

taking the distance at which the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smallest as the mean of the weight distribution of the training samples. When the maximum distance of the positive sample pairs in the positive sample set is greater than the minimum distance of the negative sample pairs in the negative sample set, misclassified sample pairs exist in the positive sample set and the negative sample set.
For example, as shown in FIG. 2, the distance corresponding to point B, the maximum distance of the positive sample pairs in the positive sample set, is greater than the distance corresponding to point A, the minimum distance of the negative sample pairs in the negative sample set. The distance distribution of the positive sample set and the distance distribution of the negative sample set therefore overlap, and misclassified sample pairs exist in the positive sample set and the negative sample set. The distance corresponding to intersection E of the two distance distributions is taken as the mean of the normal distribution; at intersection E the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smallest, corresponding to the frequency value F.
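A hedged sketch of locating intersection E numerically: the two distance distributions are approximated by histograms over the overlapping interval, and the bin center at which the positive and negative frequencies differ least is returned. The bin count and the use of normalized histograms are illustrative assumptions.

```python
import numpy as np

def intersection_distance(pos_dists, neg_dists, bins=100):
    """Estimate the distance at the crossing point E of the two distance
    distributions, i.e. where |freq_pos - freq_neg| is smallest over the
    overlapping interval [min(neg_dists), max(pos_dists)]."""
    lo, hi = np.min(neg_dists), np.max(pos_dists)
    edges = np.linspace(lo, hi, bins + 1)
    freq_pos, _ = np.histogram(pos_dists, bins=edges, density=True)
    freq_neg, _ = np.histogram(neg_dists, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return float(centers[np.argmin(np.abs(freq_pos - freq_neg))])
```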
As the above embodiments show, taking the mean of the weight distribution of the training samples as an axis, sample pairs whose distances fall near that mean are the most likely to be misclassified. Accordingly, in the weight distribution of the training samples (i.e., the normal distribution), the closer a sample pair's distance is to the mean, the larger its weight, which increases the weight of misclassified sample pairs and/or reduces the weight of correctly classified sample pairs. In the subsequent model training process the loss function is associated with this weight distribution, and building the loss function on the weight distribution of the training samples increases the contribution of the misclassified sample pairs to the network loss, so that the model parameters can be corrected more effectively and the expressive power of the model improved.
Referring again to the two graphs in FIG. 2, the sample pairs whose distances lie between distance A and distance E are misclassified sample pairs, and the sample pairs whose distances lie between distance E and distance B are likewise misclassified. Therefore, in the normal distribution, the weights of the sample pairs between distance A and distance E and between distance E and distance B are higher than the weights of the sample pairs that can be correctly classified.
Further, it is necessary to search, between the minimum distance and the maximum distance, for the optimal distance value at which the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smallest. Preferably, when determining the mean of the weight distribution of the training samples, the determining module 102 is further configured to:
configure a preset step size, an initial mean, and an iteration termination condition;

based on the initial mean and the preset step size, iteratively search the interval formed by the minimum distance and the maximum distance for the optimal distance value satisfying the iteration termination condition; at the optimal distance value, the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smallest.
Further, the preset step size may equal (maximum distance − minimum distance)/n, where n is a positive number. Of course, the preset step size may also take other forms; the present invention imposes no limitation on this.
Further, the iteration termination condition includes, but is not limited to, a preset error.
Specifically, with the initial mean as the starting point, an iterative search is performed with the preset step size step. In the current iteration, it is determined whether, at the distance represented by the current mean μ, the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smaller than the preset error. If it is, the current mean plus the preset step size is assigned to the current mean μ, i.e., (μ + step) is assigned to μ, and the judgment continues until the absolute value of the difference exceeds the preset error, at which point the search stops and the optimal distance value corresponding to the last iteration is output as the mean of the weight distribution of the training samples.
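Read literally, this stepping procedure could look like the sketch below. The callables freq_pos and freq_neg, which return the frequency of positive and negative pairs at a given distance, and the default values of n and eps are assumptions; the patent leaves their implementation open.

```python
def search_mean(min_dist, max_dist, freq_pos, freq_neg, n=100, eps=1e-3):
    """Iterative search for the mean of the weight distribution.

    Starting from min_dist as the initial mean, advance by a preset step
    of (max_dist - min_dist) / n for as long as |freq_pos - freq_neg| at
    the current mean stays below the preset error eps; when the condition
    first fails, return the distance from the last satisfying iteration.
    """
    step = (max_dist - min_dist) / n
    mu = min_dist
    best = mu
    while mu <= max_dist and abs(freq_pos(mu) - freq_neg(mu)) < eps:
        best = mu
        mu += step
    return best
```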
In the present invention, as the model is trained its expressive power keeps strengthening, so the weight of the misclassified sample pairs (i.e., the first sample set) should also be increased gradually, which requires reducing the standard deviation of the normal distribution (i.e., the weight distribution of the training samples). For a normal distribution, the smaller the standard deviation, the steeper the peak, i.e., the higher the weight of sample pairs whose distances lie near the mean; in this way the weight of the misclassified sample pairs (the first sample set) can be increased gradually.
Since the standard deviation of the pair distances in the positive sample set gradually decreases as the model trains, the standard deviation of the normal distribution can be configured from the standard deviation of the pair distances in the positive sample set. Preferably, the method further includes: updating the standard deviation of the weight distribution of the training samples in each training round according to the standard deviation of the distances between positive sample pairs in the positive sample set. In this way the standard deviation of the weight distribution of the training samples gradually decreases as the number of training rounds grows, so that the weights of hard-to-distinguish samples gradually increase, improving the expressive power and convergence speed of the model.
As can be seen from the above technical solutions, the present invention acquires training samples, where the training samples include a positive sample set and a negative sample set, the positive sample set includes positive sample pairs, and the negative sample set includes negative sample pairs; calculates the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determines the distance distribution of the positive sample set from the distances of the positive sample pairs, the distance distribution of the positive sample set expressing the relationship between the frequency of occurrence of positive sample pairs and distance; determines the distance distribution of the negative sample set from the distances of the negative sample pairs, the distance distribution of the negative sample set expressing the relationship between the frequency of occurrence of negative sample pairs and distance; and determines the weight distribution of the training samples based on the two distance distributions. The present invention can increase the weight of misclassified sample pairs, so that in the subsequent training process the contribution of those pairs to correcting the model parameters and to improving the expressive power of the model is increased, improving the accuracy of the model parameters.
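Putting the pieces together, the following is a compact end-to-end sketch of the weight allocation under stated assumptions: embed is a callable mapping a sample to its feature vector, distances are Euclidean, the histogram crossing point from the earlier intersection_distance sketch serves as the mean in the overlapping case, and the positive-pair distance spread serves as the standard deviation. None of these specifics are mandated by the patent.

```python
import numpy as np

def assign_pair_weights(pos_pairs, neg_pairs, embed, bins=100):
    """End-to-end weight allocation: compute pair distances, derive the
    mean of the normal weight distribution from the two distance
    distributions, take sigma from the positive-pair spread, and weight
    every pair by the resulting Gaussian (normalized to sum to one)."""
    dist = lambda a, b: float(np.linalg.norm(embed(a) - embed(b)))
    pos_d = np.array([dist(a, b) for a, b in pos_pairs])
    neg_d = np.array([dist(a, b) for a, b in neg_pairs])

    if pos_d.max() <= neg_d.min():
        mu = 0.5 * (pos_d.max() + neg_d.min())   # separable: midpoint
    else:
        mu = intersection_distance(pos_d, neg_d, bins)  # overlapping: crossing point E
    sigma = max(float(np.std(pos_d)), 1e-8)      # shrinks as training proceeds

    all_d = np.concatenate([pos_d, neg_d])
    weights = np.exp(-0.5 * ((all_d - mu) / sigma) ** 2)
    return weights / weights.sum()
```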
FIG. 6 is a functional module diagram of a preferred embodiment of the model training apparatus of the present invention. The model training apparatus 61 includes a data acquisition module 600 and a training module 601. A unit as referred to in the present invention is a series of computer program segments, stored in a memory, that can be executed by the processor of the model training apparatus 61 and that perform a fixed function. In this embodiment, the functions of the units are detailed in the subsequent embodiments.
The data acquisition module 600 acquires training samples.

The training module 601 trains the model parameters based on the training samples using a loss function and a preset training algorithm, where the loss function is associated with the weight distribution of the training samples.

Preferably, the weight distribution of the training samples is obtained by the sample weight allocation method described in any of the above embodiments, which is not detailed again here.

Preferably, the preset training algorithm includes, but is not limited to, a convolutional neural network algorithm.
In the present invention, the loss function increases, through the weight distribution of the training samples, the contribution rate of misclassified sample pairs to the target loss. Preferably, the training module 601 is further configured to use the loss function to increase the contribution rate of misclassified sample pairs to the target loss, thereby raising their contribution to correcting the model parameters and to improving the expressive power of the model, so that the model can concentrate more on misclassified samples during training, increasing the expressive power and convergence speed of the model.
As can be seen from the above technical solutions, the present invention acquires training samples and, based on them, trains the model parameters using a loss function and a preset training algorithm, where the loss function is associated with the weight distribution of the training samples, the weight distribution being obtained by the sample weight allocation method described in any of the above embodiments. In the present invention, within the weight distribution of the training samples, the weights of misclassified sample pairs gradually grow during model training; therefore, when training the model parameters, the loss function can raise the contribution of misclassified sample pairs to correcting the model parameters and to improving the expressive power of the model, so that the model concentrates more on misclassified samples during training, increasing the expressive power and convergence speed of the model and improving the accuracy of the model parameters.
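To illustrate how such a weight distribution could enter the loss, the following PyTorch-style sketch scales each pair's loss term by its weight. The contrastive form is one plausible choice for the end-to-end metric setting, not the loss fixed by the patent, and the margin value is an assumption.

```python
import torch

def weighted_contrastive_loss(dists, labels, weights, margin=1.0):
    """Contrastive loss with per-pair weights: labels is 1 for positive
    pairs and 0 for negative pairs; scaling each pair's term by its
    weight makes misclassified pairs near the decision boundary
    contribute more to the total loss and hence to the gradient."""
    pos_term = labels * dists.pow(2)
    neg_term = (1.0 - labels) * torch.clamp(margin - dists, min=0.0).pow(2)
    return (weights * (pos_term + neg_term)).sum()
```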
The integrated units implemented in the form of software function modules described above may be stored in a computer-readable storage medium. The software function modules are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform part of the steps of the methods described in the embodiments of the present invention.
As shown in FIG. 7, the electronic device 3 includes at least one transmitting apparatus 31, at least one memory 32, at least one processor 33, at least one receiving apparatus 34, and at least one communication bus, where the communication bus implements connection and communication among these components.
The electronic device 3 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The electronic device 3 may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud composed of a large number of hosts or network servers, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
The electronic device 3 may be, but is not limited to, any electronic product capable of human-computer interaction with a user via a keyboard, a touch pad, a voice control device, or the like, for example a terminal such as a tablet computer, a smartphone, a personal digital assistant (PDA), a smart wearable device, a camera device, or a monitoring device.
The network in which the electronic device 3 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
The receiving apparatus 34 and the transmitting apparatus 31 may be wired transmission ports, or may be wireless devices, for example including antenna apparatuses, for data communication with other devices.
The memory 32 is configured to store program code. The memory 32 may be a circuit with a storage function that has no physical form in an integrated circuit, such as a RAM (random-access memory) or a FIFO (first-in-first-out) buffer. Alternatively, the memory 32 may be a memory with a physical form, such as a memory stick, a TF card (trans-flash card), a smart media card, a secure digital card, a flash card, or another storage device.
The processor 33 may include one or more microprocessors or digital processors. The processor 33 may call the program code stored in the memory 32 to perform the related functions. For example, the units described in FIG. 5 and/or FIG. 6 are program code stored in the memory 32 and executed by the processor 33 to implement a sample weight allocation method and/or a model training method. The processor 33, also known as a central processing unit (CPU), is a very-large-scale integrated circuit serving as the computing core and the control unit.
An embodiment of the present invention further provides a computer-readable storage medium having computer instructions stored thereon which, when executed by an electronic device including one or more processors, cause the electronic device to perform the sample weight allocation method described in the method embodiments above.
As shown in FIG. 1, the memory 32 in the electronic device 3 stores multiple instructions for implementing a sample weight allocation method, and the processor 33 can execute the multiple instructions to implement:

acquiring training samples, the training samples including a positive sample set and a negative sample set, the positive sample set including positive sample pairs and the negative sample set including negative sample pairs; calculating the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determining the distance distribution of the positive sample set from the distances of the positive sample pairs, the distance distribution of the positive sample set expressing the relationship between the frequency of occurrence of positive sample pairs and distance; determining the distance distribution of the negative sample set from the distances of the negative sample pairs, the distance distribution of the negative sample set expressing the relationship between the frequency of occurrence of negative sample pairs and distance; and determining the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
The multiple instructions corresponding to the sample weight allocation method in any embodiment are stored in the memory 32 and executed by the processor 33, and are not detailed again here.
As shown in FIG. 4, the memory 32 in the electronic device 3 stores multiple instructions for implementing a model training method, and the processor 33 can execute the multiple instructions to implement: acquiring training samples; and training the model parameters based on the training samples using a loss function and a preset training algorithm, where the loss function is associated with the weight distribution of the training samples, the weight distribution being obtained by the sample weight allocation method described in any embodiment.
The multiple instructions corresponding to the model training method in any embodiment are stored in the memory 32 and executed by the processor 33, and are not detailed again here.
The characteristic means of the present invention described above may be implemented by an integrated circuit that controls the realization of the functions of the sample weight allocation method in any of the above embodiments. That is, the integrated circuit of the present invention is installed in the electronic device, causing the electronic device to perform the following functions: acquiring training samples, the training samples including a positive sample set and a negative sample set, the positive sample set including positive sample pairs and the negative sample set including negative sample pairs; calculating the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set; determining the distance distribution of the positive sample set from the distances of the positive sample pairs, the distance distribution of the positive sample set expressing the relationship between the frequency of occurrence of positive sample pairs and distance; determining the distance distribution of the negative sample set from the distances of the negative sample pairs, the distance distribution of the negative sample set expressing the relationship between the frequency of occurrence of negative sample pairs and distance; and determining the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
All the functions realizable by the sample weight allocation method in any embodiment can be installed in the electronic device through the integrated circuit of the present invention, causing the electronic device to perform those functions, which are not detailed again here.
The characteristic means of the present invention described above may likewise be implemented by an integrated circuit that controls the realization of the functions of the model training method in any of the above embodiments. That is, the integrated circuit of the present invention is installed in the electronic device, causing the electronic device to perform the following functions: acquiring training samples; and training the model parameters based on the training samples using a loss function and a preset training algorithm, where the loss function is associated with the weight distribution of the training samples, the weight distribution being obtained by the sample weight allocation method described in any embodiment.
All the functions realizable by the model training method in any embodiment can be installed in the electronic device through the integrated circuit of the present invention, causing the electronic device to perform those functions, which are not detailed again here.
It should be noted that, for brevity, the foregoing method embodiments are all described as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, since according to the present invention certain steps may be performed in other orders or simultaneously. Moreover, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a logical functional division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A sample weight allocation method, wherein the method comprises:
acquiring training samples, the training samples comprising a positive sample set and a negative sample set, the positive sample set comprising positive sample pairs and the negative sample set comprising negative sample pairs;
calculating the distance of each positive sample pair in the positive sample set and the distance of each negative sample pair in the negative sample set;
determining a distance distribution of the positive sample set according to the distance of each positive sample pair in the positive sample set, the distance distribution of the positive sample set expressing the relationship between the frequency of occurrence of positive sample pairs and distance;
determining a distance distribution of the negative sample set according to the distance of each negative sample pair in the negative sample set, the distance distribution of the negative sample set expressing the relationship between the frequency of occurrence of negative sample pairs and distance; and
determining a weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set.
2. The sample weight allocation method according to claim 1, wherein the determining the weight distribution of the training samples based on the distance distribution of the positive sample set and the distance distribution of the negative sample set comprises:
determining a misclassified first sample set based on the distance distribution of the positive sample set and the distance distribution of the negative sample set;
increasing, in the weight distribution of the training samples, the weight of each sample pair in the first sample set; and/or
determining a correctly classified second sample set based on the distance distribution of the positive sample set and the distance distribution of the negative sample set; and
reducing, in the weight distribution of the training samples, the weight of each sample pair in the second sample set.
3. The sample weight allocation method according to claim 1, wherein the weight distribution of the training samples is a normal distribution, and when the maximum distance of the positive sample pairs in the positive sample set is less than or equal to the minimum distance of the negative sample pairs in the negative sample set, when determining the weight distribution of the training samples, the method further comprises:
determining the average of the maximum distance and the minimum distance as the mean of the weight distribution of the training samples.
4. The sample weight allocation method according to claim 1, wherein the weight distribution of the training samples is a normal distribution, and when the maximum distance of the positive sample pairs in the positive sample set is greater than the minimum distance of the negative sample pairs in the negative sample set, when determining the weight distribution of the training samples, the method further comprises:
taking the distance value corresponding to the intersection of the distance distribution of the positive sample set and the distance distribution of the negative sample set as the mean of the weight distribution of the training samples; or
taking the distance at which the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs is smallest as the mean of the weight distribution of the training samples.
5. The sample weight allocation method according to claim 4, wherein, when determining the mean of the weight distribution of the training samples, the method further comprises:
configuring a preset step size, an initial mean, and an iteration termination condition;
iteratively searching, based on the initial mean and the preset step size, the interval formed by the minimum distance and the maximum distance for an optimal distance value satisfying the iteration termination condition, the absolute value of the difference between the frequency of occurrence of positive sample pairs and the frequency of occurrence of negative sample pairs being smallest at the optimal distance value.
6. The sample weight allocation method according to any one of claims 1 to 5, wherein the weight distribution of the training samples is a normal distribution, and when determining the weight distribution of the training samples, the method further comprises:
acquiring, in each training round, the standard deviation of the distances between positive sample pairs in the positive sample set;
updating the standard deviation of the weight distribution of the training samples in each training round according to the standard deviation of the distances between positive sample pairs in the positive sample set.
7. A model training method, wherein the method comprises:
acquiring training samples;
training model parameters based on the training samples using a loss function and a preset training algorithm, wherein the loss function is associated with a weight distribution of the training samples, the weight distribution of the training samples being obtained by the sample weight allocation method according to any one of claims 1 to 6.
8. The model training method according to claim 7, wherein the method further comprises:
using the loss function to increase the contribution rate of misclassified sample pairs to the target loss.
9. An electronic device, wherein the electronic device comprises a memory and a processor, the memory being configured to store at least one instruction and the processor being configured to execute the at least one instruction to implement the sample weight allocation method according to any one of claims 1 to 6 and/or the model training method according to claim 7 or 8.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one instruction which, when executed by a processor, implements the sample weight allocation method according to any one of claims 1 to 6 and/or the model training method according to claim 7 or 8.
PCT/CN2018/079371 2017-12-29 2018-03-16 Sample weight allocation method, model training method, electronic device, and storage medium WO2019127924A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711480906.8 2017-12-29
CN201711480906.8A CN108229555B (en) 2017-12-29 2017-12-29 Sample weights distribution method, model training method, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2019127924A1 true WO2019127924A1 (en) 2019-07-04

Family

ID=62647244

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079371 WO2019127924A1 (en) 2017-12-29 2018-03-16 Sample weight allocation method, model training method, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN108229555B (en)
WO (1) WO2019127924A1 (en)

Families Citing this family (17)

Publication number Priority date Publication date Assignee Title
CN109086207B (en) * 2018-08-02 2023-04-14 平安科技(深圳)有限公司 Page response fault analysis method, computer readable storage medium and terminal device
CN110163053B (en) * 2018-08-02 2021-07-13 腾讯科技(深圳)有限公司 Method and device for generating negative sample for face recognition and computer equipment
CN110084271B (en) * 2019-03-22 2021-08-20 同盾控股有限公司 Method and device for identifying picture category
CN110162995B (en) * 2019-04-22 2023-01-10 创新先进技术有限公司 Method and device for evaluating data contribution degree
CN110309286A (en) * 2019-07-04 2019-10-08 深圳市和合信诺大数据科技有限公司 Improve the method and device of two-way attention machine learning model responsibility
CN110363346A (en) * 2019-07-12 2019-10-22 腾讯科技(北京)有限公司 Clicking rate prediction technique, the training method of prediction model, device and equipment
CN112434839B (en) * 2019-08-26 2023-05-30 电力规划总院有限公司 Distribution transformer heavy overload risk prediction method and electronic equipment
CN110705589A (en) * 2019-09-02 2020-01-17 贝壳技术有限公司 Weight optimization processing method and device for sample characteristics
CN110781922A (en) * 2019-09-27 2020-02-11 北京淇瑀信息科技有限公司 Sample data generation method and device for machine learning model and electronic equipment
CN110837869A (en) * 2019-11-11 2020-02-25 深圳市商汤科技有限公司 Image classification model training method, image processing method and device
CN110856253B (en) * 2019-11-15 2021-03-23 北京三快在线科技有限公司 Positioning method, positioning device, server and storage medium
CN111027442A (en) * 2019-12-03 2020-04-17 腾讯科技(深圳)有限公司 Model training method, recognition method, device and medium for pedestrian re-recognition
CN111507380B (en) * 2020-03-30 2023-10-31 中国平安财产保险股份有限公司 Picture classification method, system, device and storage medium based on clustering
CN113743426A (en) * 2020-05-27 2021-12-03 华为技术有限公司 Training method, device, equipment and computer readable storage medium
CN113378914B (en) * 2021-06-08 2023-06-30 上海壁仞智能科技有限公司 False relevance removing method and device
CN113408804B (en) * 2021-06-24 2023-04-07 广东电网有限责任公司 Electricity stealing behavior detection method, system, terminal equipment and storage medium
CN115858886B (en) * 2022-12-12 2024-02-27 腾讯科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104252627A (en) * 2013-06-28 2014-12-31 广州华多网络科技有限公司 SVM (support vector machine) classifier training sample acquiring method, training method and training system
CN106909981A (en) * 2015-12-23 2017-06-30 阿里巴巴集团控股有限公司 Model training, sample balance method and device and personal credit points-scoring system
CN107273458A (en) * 2017-06-01 2017-10-20 百度在线网络技术(北京)有限公司 Depth model training method and device, image search method and device
CN107292915A (en) * 2017-06-15 2017-10-24 国家新闻出版广电总局广播科学研究院 Method for tracking target based on convolutional neural networks

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN107346448B (en) * 2016-05-06 2021-12-21 富士通株式会社 Deep neural network-based recognition device, training device and method
CN106919909B (en) * 2017-02-10 2018-03-27 华中科技大学 The metric learning method and system that a kind of pedestrian identifies again
CN107145827A (en) * 2017-04-01 2017-09-08 浙江大学 Across the video camera pedestrian recognition methods again learnt based on adaptive distance metric

Cited By (14)

Publication number Priority date Publication date Assignee Title
CN111198938B (en) * 2019-12-26 2023-12-01 深圳市优必选科技股份有限公司 Sample data processing method, sample data processing device and electronic equipment
CN111198938A (en) * 2019-12-26 2020-05-26 深圳市优必选科技股份有限公司 Sample data processing method, sample data processing device and electronic equipment
CN111310660A (en) * 2020-02-14 2020-06-19 开易(北京)科技有限公司 Target detection false alarm suppression method and device for ADAS scene
CN111353282B (en) * 2020-03-09 2023-08-22 腾讯科技(深圳)有限公司 Model training, text rewriting method, device and storage medium
CN111353282A (en) * 2020-03-09 2020-06-30 腾讯科技(深圳)有限公司 Model training method, text rewriting method, device and storage medium
CN111915004A (en) * 2020-06-17 2020-11-10 北京迈格威科技有限公司 Neural network training method and device, storage medium and electronic equipment
CN112508130A (en) * 2020-12-25 2021-03-16 商汤集团有限公司 Clustering method and device, electronic equipment and storage medium
CN112819085A (en) * 2021-02-10 2021-05-18 中国银联股份有限公司 Model optimization method and device based on machine learning and storage medium
CN112819085B (en) * 2021-02-10 2023-10-24 中国银联股份有限公司 Model optimization method, device and storage medium based on machine learning
CN112801221A (en) * 2021-03-24 2021-05-14 平安科技(深圳)有限公司 Data classification method, device, equipment and storage medium
CN112801221B (en) * 2021-03-24 2023-12-22 平安科技(深圳)有限公司 Data classification method, device, equipment and storage medium
CN113361568A (en) * 2021-05-18 2021-09-07 北京迈格威科技有限公司 Target identification method, device and electronic system
CN113407814A (en) * 2021-06-29 2021-09-17 北京字节跳动网络技术有限公司 Text search method and device, readable medium and electronic equipment
CN113407814B (en) * 2021-06-29 2023-06-16 抖音视界有限公司 Text searching method and device, readable medium and electronic equipment

Also Published As

Publication number Publication date
CN108229555A (en) 2018-06-29
CN108229555B (en) 2019-10-25


Legal Events

Date Code Title Description

121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18896975; country of ref document: EP; kind code of ref document: A1.

NENP: Non-entry into the national phase. Ref country code: DE.

32PN (EP): Public notification in the EP bulletin, as the address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 05.10.2020).

122 (EP): PCT application non-entry in European phase. Ref document number: 18896975; country of ref document: EP; kind code of ref document: A1.