CN111160394A - Training method and device of classification network, computer equipment and storage medium - Google Patents

Training method and device of classification network, computer equipment and storage medium

Info

Publication number
CN111160394A
CN111160394A (application CN201911234399.9A)
Authority
CN
China
Prior art keywords
sample data
sample
determining
weight factor
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911234399.9A
Other languages
Chinese (zh)
Inventor
艾江波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201911234399.9A priority Critical patent/CN111160394A/en
Publication of CN111160394A publication Critical patent/CN111160394A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a training method and apparatus for a classification network, a computer device, and a storage medium. The method comprises the following steps: determining a first weight factor for the sample data of each category in a sample set according to the number of sample data of each category in the sample set; determining a second weight factor for each sample data according to the gradient value of each sample data in the sample set; and determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data, and the classification result corresponding to each sample data, and updating the parameters of the classification network according to the loss value. By training the classification network with a loss function that contains both the first and second weight factors, the scheme addresses both class imbalance among the training samples and the imbalance caused by hard or noisy samples, improving the robustness of the trained classification network and the efficiency of its training.

Description

Training method and device of classification network, computer equipment and storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a method and an apparatus for training a classification network, a computer device, and a storage medium.
Background
With the rapid development and gradual maturing of deep learning, more and more scenarios rely on deep learning methods to solve related technical problems. These methods are built on massive amounts of data, and as data volumes grow, the phenomenon of data imbalance becomes increasingly pronounced.
Many methods for addressing data imbalance have emerged in recent years, for example, down-sampling classes with a large amount of data to reduce their data volume. Alternatively, classes with little data are up-sampled, or augmented to increase their data volume.
However, these methods for addressing data imbalance generally leave the model unstable during training, so that training cannot be completed normally.
Disclosure of Invention
In view of the foregoing, there is a need to provide a method and an apparatus for training a classification network, a computer device, and a storage medium, which can effectively solve the problem of sample data imbalance.
In a first aspect, a method for training a classification network, the method comprising:
determining a first weight factor of the sample data of each category in the sample set according to the number of the sample data of each category in the sample set;
determining a second weight factor of each sample data according to the gradient value of each sample data in the sample set;
determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data, and updating parameters in the classification network according to the loss value.
In one embodiment, determining the first weight factor of the sample data of each class in the sample set according to the number of the sample data of each class in the sample set includes:
determining the number of effective samples of each category in the sample set according to the number of sample data of each category in the sample set;
and determining a first weight factor of the sample data of each category according to the reciprocal of the number of the effective samples of each category.
In one embodiment, determining the number of valid samples of each class in the sample set according to the number of sample data of each class in the sample set includes:
and determining the number of effective samples of each category in the sample set according to the number of the sample data of each category in the sample set and a preset constant coefficient.
In one embodiment, determining the second weighting factor for each sample data according to the gradient value of each sample data in the sample set includes:
acquiring gradient values of each sample datum;
determining the number of samples in different gradient distribution areas according to the distribution trend of the gradient values of each sample datum;
and determining a second weight factor of the sample data in each gradient distribution area according to the reciprocal of the number of the samples in each gradient distribution area.
In one embodiment, obtaining the gradient value of each sample data comprises:
obtaining each sample data and a label corresponding to each sample data;
inputting each sample data into a classification network to obtain a classification result corresponding to each sample data;
and obtaining the gradient value of each sample data according to the label corresponding to each sample data, the classification result corresponding to each sample data and a preset gradient calculation function.
In one embodiment, determining the number of samples in different gradient distribution regions according to the distribution trend of the gradient values includes:
dividing the distribution trend of the gradient values of each sample data to obtain a plurality of gradient distribution areas;
and counting the number of samples corresponding to the gradient value in each gradient distribution area to obtain the number of samples in different gradient distribution areas.
In one embodiment, determining a loss value of the classification network according to the first weighting factor, the second weighting factor, the label corresponding to each sample data, and the classification result corresponding to each sample data includes:
multiplying the first weight factor and the second weight factor to obtain a target weight factor;
and determining a loss value of the classification network according to the target weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data.
In a second aspect, an apparatus for training a classification network, the apparatus comprising:
the first determining module is used for determining a first weight factor of the sample data of each category in the sample set according to the number of the sample data of each category in the sample set;
the second determining module is used for determining a second weight factor of each sample data according to the gradient value of each sample data in the sample set;
and the training module is used for determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data, and updating parameters in the classification network according to the loss value.
In a third aspect, a computer device includes a memory and a processor, where the memory stores a computer program, and the processor implements the training method for the classification network according to any one of the embodiments of the first aspect when executing the computer program.
In a fourth aspect, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method for training a classification network according to any one of the embodiments of the first aspect.
The application provides a training method and apparatus for a classification network, a computer device, and a storage medium. The method comprises: determining a first weight factor for the sample data of each category in a sample set according to the number of sample data of each category; determining a second weight factor for each sample data according to its gradient value; and determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data, and the classification result corresponding to each sample data, then updating the parameters of the classification network according to the loss value. Since the first weight factor is determined by the number of sample data in each class, it balances the sample data across classes. The second weight factor is related to the gradient distribution of the sample data, and hence to the hard or noisy samples in the sample set: the gradients of hard or noisy samples are generally large, so the second weight factor can distinguish them and assign them smaller weights, suppressing their influence during training and thereby resolving the sample data imbalance they cause. In summary, training the classification network with loss values obtained from the first and second weight factors solves both class imbalance among the training samples and the imbalance caused by hard or noisy samples, improving the robustness of the trained classification network and the efficiency of its training.
Drawings
FIG. 1 is a schematic diagram illustrating an internal structure of a computer device according to an embodiment;
FIG. 2 is a flow diagram of a method for training a classification network according to an embodiment;
FIG. 3 is a flowchart of another implementation of S101 in the embodiment of FIG. 2;
FIG. 4 is a flow chart of another implementation of S102 in the embodiment of FIG. 2;
FIG. 5 is a flowchart of another implementation of S301 in the embodiment of FIG. 4;
FIG. 6 is a flow chart of another implementation of S402 in the embodiment of FIG. 5;
FIG. 7 is a flowchart of another implementation of S302 in the embodiment of FIG. 4;
FIG. 8 is a flowchart of another implementation of S103 in the embodiment of FIG. 2;
FIG. 9 is a diagram of a training system for a classification network, according to an embodiment;
FIG. 10 is a schematic structural diagram of a training apparatus of a classification network according to an embodiment;
FIG. 11 is a schematic structural diagram of a training apparatus of a classification network according to an embodiment;
FIG. 12 is a schematic structural diagram of a training apparatus of a classification network according to an embodiment;
FIG. 13 is a schematic structural diagram of a training apparatus of a classification network according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The training method of the classification network provided by the application can be applied to the computer device shown in fig. 1. The computer device may be a server or a terminal, and its internal structure may be as shown in fig. 1. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The network interface of the computer device is used to communicate with external terminals over a network connection. The computer program is executed by the processor to implement a method of training a classification network. The display screen of the computer device may be a liquid crystal or electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball, or touchpad on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 1 is merely a block diagram of some of the structures related to the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
At present, data processing methods for addressing sample data imbalance include changing the data volume of different classes by down-sampling or up-sampling, or simply assigning different weights to different classes of samples. However, such methods do not account for the existence of hard or noisy samples: when a noisy sample belongs to a minority class and is given a large weight, model training becomes unstable, falls into a local optimum, or even fails to converge. The application provides a training method and apparatus for a classification network, a computer device, and a storage medium, which address sample data imbalance while mining noisy or hard samples and giving them small weights, so as to avoid their influence on the trained model.
The following describes in detail the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems by embodiments and with reference to the drawings. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart of a method for training a classification network according to an embodiment, where the main execution object of the method is the computer device in fig. 1, and the method relates to a specific process of training the classification network by setting weights of different sample data and by using a loss function including the weights. As shown in fig. 2, the method specifically includes the following steps:
s101, determining a first weight factor of the sample data of each category in the sample set according to the number of the sample data of each category in the sample set.
The value of the first weight factor is related to the category of the sample data, and sample data of different categories can be assigned first weight factors with different values. In this embodiment, when the computer device obtains the sample set, it may classify the sample data in the set to obtain the sample data of each category, and then derive the first weight factor assigned to each category from the number of sample data it contains. Optionally, the computer device may take the reciprocal of the number of sample data in each category as that category's first weight factor; it may instead weight the per-category count before taking the reciprocal and use the resulting value as the first weight factor; or it may derive other related variables from the per-category counts and obtain the first weight factor of each category's sample data from those variables.
S102, determining a second weight factor of each sample data according to the gradient value of each sample data in the sample set.
The value of the second weight factor is related to an attribute of the sample data, for example whether it is a hard sample or a noisy sample, and sample data with different attributes can be assigned second weight factors with different values. In this embodiment, when the computer device obtains the sample set, it may input the set to a preset classification network to be trained. During training, the gradient of each sample data in the set can be computed, and the second weight factor assigned to each sample data is determined by analyzing that gradient. It should be noted that during training the gradient of a sample directly reflects how it trains, that is, its attributes; for example, the gradient values of hard or noisy samples are usually relatively large. Based on this analysis, hard or noisy samples can be distinguished by their gradient values and given smaller weights, avoiding the sample data imbalance they would otherwise cause.
S103, determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data, and updating parameters in the classification network according to the loss value.
The loss value is obtained by substituting variable values into a loss function that the computer device constructs according to a corresponding optimization method, and it is used to train the classification network. In this embodiment, after obtaining the first and second weight factors by the methods in S101 and S102, the computer device may substitute them as variable values into a preset loss function to obtain the value of that function, that is, the loss value. It then updates the parameters of the classification network according to the loss value, training the network until the trained classification network is obtained. It should be noted that the variables of the loss function include variables representing the first and second weight factors.
The training method for the classification network provided by this embodiment comprises: determining a first weight factor for the sample data of each category in the sample set according to the number of sample data of each category; determining a second weight factor for each sample data according to its gradient value; and determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data, and the classification result corresponding to each sample data, then updating the parameters of the classification network according to the loss value. Since the first weight factor is determined by the number of sample data in each class, it balances the sample data across classes. The second weight factor is related to the gradient distribution of the sample data, and hence to the hard or noisy samples in the sample set: the gradients of hard or noisy samples are generally large, so the second weight factor can distinguish them and assign them smaller weights, suppressing their influence during training and resolving the imbalance they cause. In summary, training the classification network with loss values obtained from the first and second weight factors solves both class imbalance among the training samples and the imbalance caused by hard or noisy samples, improving the robustness of the trained classification network and the efficiency of its training.
Fig. 3 is a flowchart of another implementation manner of S101 in the embodiment of fig. 2, and as shown in fig. 3, the step S101 "determining a first weighting factor of sample data of each class in the sample set according to the number of sample data of each class in the sample set" includes:
s201, determining the number of effective samples of each category in the sample set according to the number of sample data of each category in the sample set.
When the computer device obtains the number of sample data of each category in the sample set, it can further apply a corresponding analysis method to each category to exclude invalid or abnormal samples, and determine the number of valid samples for later use.
Optionally, when executing S201, the computer device may determine the number of valid samples of each category in the sample set according to the number of sample data of each category and a preset constant coefficient; specifically, the number of valid samples is obtained by the following relation (1) or a variant of it:

E_n = (1 - β^n) / (1 - β)    (1)

where n is the number of sample data in the category, β is a constant coefficient that can be set according to the actual application requirements (for example, 0.999), and E_n denotes the number of valid samples. In this embodiment, when the computer device obtains the sample data of each category, it can obtain the data amount n of each category and then compute the number of valid samples contained in that category using relation (1).
S202, determining a first weight factor of the sample data of each category according to the reciprocal of the number of the effective samples of each category.
When the computer device obtains the number of valid samples of each category according to the method in S201, it may further take the reciprocal of that number and use it directly as the first weight factor of that category's sample data. Thus the more valid samples a category contains, the smaller its first weight factor, and the fewer valid samples it contains, the larger the factor. The first weight factor provided by this scheme therefore gives small weights to classes with many samples and large weights to classes with few samples; this weight assignment resolves the extreme class-count imbalance present in existing sample sets, and with it the prior-art problem that training suffers because samples of some classes are hard to obtain.
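As an illustration, a minimal sketch of this weighting follows, assuming the effective-number relation (1) with β = 0.999; the function name and the final normalization step are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """First weight factor: reciprocal of the effective number of
    samples E_n = (1 - beta^n) / (1 - beta) for each class."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Assumed normalization so the weights sum to the number of classes.
    return weights * len(n) / weights.sum()

# Example: three classes with a strong imbalance; the minority class
# (50 samples) receives the largest weight.
print(class_balanced_weights([10000, 500, 50]))
```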
Fig. 4 is a flowchart of another implementation manner of S102 in the embodiment of fig. 2, and as shown in fig. 4, the step S102 "determining the second weighting factor of each sample data according to the gradient value of each sample data in the sample set" includes:
s301, obtaining gradient values of each sample datum.
This embodiment relates to the method by which the computer device acquires the gradient value of each sample data in the sample set, specifically: substituting the sample data into a preset gradient calculation function and computing its gradient value.
S302, determining the number of samples in different gradient distribution areas according to the distribution trend of the gradient values of each sample datum.
The distribution trend of the gradient values of the sample data represents how the gradient values are distributed over a preset region, that is, a predetermined range of values, for example the line segment (0, 1). In this embodiment, when the computer device obtains the gradient value of each sample data, it may further divide this region into a plurality of gradient distribution areas and then determine the number of samples in each area by counting the samples whose gradient values fall within it.
And S303, determining a second weight factor of the sample data in each gradient distribution area according to the reciprocal of the number of the samples in each gradient distribution area.
When the computer device determines the number of samples in each gradient distribution area, it may take the reciprocal of that number and use it as the second weight factor of the sample data in that area. Thus the more samples an area contains, the smaller the second weight factor, and the fewer samples it contains, the larger the factor. In a typical training run, most of the data has small gradient values distributed near the 0 end of the gradient range and therefore receives a small second weight factor; meanwhile, most noisy or hard samples have large gradient values distributed near the 1 end and likewise receive a small second weight factor. Assigning different second weight factors to samples according to their gradient distribution area in this way distinguishes hard or noisy samples by where their gradients fall, resolving the sample data imbalance that hard or noisy samples cause in the prior art.
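A minimal sketch of this gradient-density weighting, assuming the (0, 1) range is split into equal-width bins as in the 100-region example discussed further below (the function name and the use of NumPy are illustrative):

```python
import numpy as np

def gradient_density_weights(grad_values, num_bins=100):
    """Second weight factor: split (0, 1) into gradient distribution
    areas, count the samples per area, and weight each sample by the
    reciprocal of its area's count. Dense areas (easy samples near 0,
    noisy or hard samples near 1) receive small weights."""
    g = np.asarray(grad_values, dtype=np.float64)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    # Map each gradient value to an area index in [0, num_bins - 1].
    idx = np.clip(np.digitize(g, edges) - 1, 0, num_bins - 1)
    counts = np.bincount(idx, minlength=num_bins)
    return 1.0 / counts[idx]
```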
Fig. 5 is a flowchart of another implementation manner of S301 in the embodiment of fig. 4, and as shown in fig. 5, the step S301 "acquiring gradient values of each sample data" includes:
s401, obtaining each sample data and a label corresponding to each sample data.
The sample data is used to train the classification network, and the label corresponding to each sample data can be obtained in advance by the computer device labeling the sample data according to user requirements. In this embodiment, the computer device may obtain sample data by scanning the object to be identified with a scanning device, or may download the sample data directly from the network; the way the sample data is obtained is not limited in this embodiment.
S402, inputting the sample data into a classification network to obtain a classification result corresponding to the sample data.
When the computer device obtains each sample data in the sample set and the corresponding labels, it can input each sample data into a preset classification network to obtain the corresponding classification result. Specifically, the classification network may be, but is not limited to, a deep convolutional neural network, and in particular may be any network commonly used for classification, analysis, detection, or segmentation, which is not limited in this embodiment.
and S403, obtaining the gradient value of each sample data according to the label corresponding to each sample data, the classification result corresponding to each sample data and a preset gradient calculation function.
The preset gradient calculation function is a gradient calculation function which is constructed in advance by the computer equipment. After the computer device obtains the label corresponding to each sample data and the classification result corresponding to each sample data, the label corresponding to each sample data and the classification result corresponding to each sample data can be further input to a preset gradient calculation function, and the gradient value of each sample data is obtained through calculation. The gradient calculation function may be an existing gradient calculation function, which is not limited in this embodiment, as long as the gradient value of each sample data can be calculated according to the label corresponding to each sample data and the classification result corresponding to each sample data.
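The patent leaves the preset gradient calculation function open. As one hedged example, with a sigmoid cross-entropy loss the gradient of the loss with respect to a logit is (p - y), so its magnitude can serve as the per-sample gradient value; the function below is an illustrative choice, not the patent's prescribed one:

```python
import torch

def gradient_value(logits, labels):
    """One possible preset gradient calculation function: with a
    sigmoid cross-entropy loss, d(loss)/d(logit) = p - y, so the
    per-sample gradient value |p - y| lies in (0, 1)."""
    probs = torch.sigmoid(logits)
    return (probs - labels.float()).abs().detach()
```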
In practical applications, the classification network may specifically include a feature extraction network and a classifier, as in the training system shown in fig. 9. In one embodiment, the application accordingly provides a method for obtaining the classification result. Fig. 6 is a flowchart of another implementation of S402 in the embodiment of fig. 5; as shown in fig. 6, the step S402 "inputting each sample data into a classification network to obtain a classification result corresponding to each sample data" includes:
and S501, inputting the sample data into a feature extraction network to obtain feature information of the sample data.
The feature extraction network is used to extract feature information in the input data or the image, and the feature extraction network may be a deep convolutional neural network or another type of neural network, such as a V-NET network, a U-NET network, and the like, which is not limited in this embodiment. In this embodiment, when the computer device obtains each sample data, each sample data may be input to a preset feature extraction network for feature extraction, so as to obtain feature information of each sample data.
And S502, inputting the characteristic information into a classifier to obtain a classification result corresponding to each sample data.
The classifier is used to classify or identify each sample data, and the classifier may be composed of a full connection layer or other types of classifiers, which is not limited in this embodiment. In this embodiment, after the computer device performs feature extraction on each sample data to obtain feature information of each sample data, each feature information may be further input to a preset classifier for classification or analysis to obtain a classification result corresponding to each sample data.
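For illustration only, a minimal sketch of such a two-part network follows; the layer sizes and the stand-in convolutional extractor are assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class ClassificationNet(nn.Module):
    """Feature extraction network followed by a fully connected
    classifier, mirroring the structure of fig. 9."""
    def __init__(self, feature_dim=128, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(   # stand-in feature extractor
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, x):
        # S501: extract feature information; S502: classify it.
        return self.classifier(self.features(x))
```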
Fig. 7 is a flowchart of another implementation of S302 in the embodiment of fig. 4. As shown in fig. 7, the step S302 "determining the number of samples in different gradient distribution areas according to the distribution trend of the gradient values of each sample data" includes:
s601, dividing the distribution trend of the gradient values of each sample datum to obtain a plurality of gradient distribution areas.
In this embodiment, when the computer device obtains the gradient value of each sample data, it may obtain the distribution trend of those gradient values and divide it into a plurality of gradient distribution areas, so that the samples falling in each divided area can be obtained. It should be noted that when dividing the distribution trend, the range may be split into several areas with different value ranges, and the number of divisions may be determined by the actual application requirements. For example, the computer device may divide the range from 0 to 1 evenly into 100 areas (each of length 0.01); optionally, it may instead divide the range into areas of unequal size, which is not limited in this embodiment.
S602, counting the number of samples corresponding to the gradient value in each gradient distribution area to obtain the number of samples in different gradient distribution areas.
In this embodiment, after the computer device divides the distribution trend of the gradient value of each sample data to obtain a plurality of gradient distribution areas, the number of samples corresponding to the gradient value in each gradient distribution area may be further counted to obtain the number of samples in different gradient distribution areas for later use.
Fig. 8 is a flowchart of another implementation manner of S103 in the embodiment of fig. 2, and as shown in fig. 8, the step S103 "determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data, and the classification result corresponding to each sample data" includes:
and S701, multiplying the first weight factor and the second weight factor to obtain a target weight factor.
The target weight factor is a value of a variable in the loss function, which is required to be used when calculating the loss value. In practical application, after the computer device obtains the first weight factor and the second weight factor according to the method of the foregoing embodiment, the first weight factor and the second weight factor may be multiplied to obtain a target weight factor, so as to be used later.
S702, determining a loss value of the classification network according to the target weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data.
In this embodiment, when the computer device obtains the target weight factor by the method in S701, it may feed the target weight factor, the label corresponding to each sample data, and the classification result corresponding to each sample data into the preset loss function as the values of its respective variables, finally obtaining the computed value of the loss function, that is, the loss value, which is later used to train the classification network.
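Putting S701 and S702 together, a minimal sketch of the loss value computation follows; the normalization by the weight sum is an assumption, since the patent only requires that the loss be determined from the target weight factor, labels, and classification results:

```python
import torch
import torch.nn.functional as F

def weighted_loss(logits, labels, first_w, second_w):
    """S701: target weight factor = first * second weight factor.
    S702: loss value = weighted per-sample cross-entropy."""
    target_w = first_w * second_w
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    # Assumed reduction: normalize by the total weight.
    return (target_w * per_sample).sum() / target_w.sum()
```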
In summary, in practical applications, after obtaining the loss value by any of the methods in the embodiments above, the classification network must be trained with it. The application therefore also provides a training system for a classification network, shown in fig. 9, which includes a feature extraction network, a classifier, a gradient calculation module, a loss function optimization module, and a category analysis module. The feature extraction network extracts feature information from the input sample data; the classifier classifies the input feature information to obtain a classification result; the gradient calculation module computes the gradient value of each sample data from the input classification result and the corresponding label, and determines the second weight factor of each sample data from its gradient value; the category analysis module determines the first weight factor of each category of sample data; and the loss function optimization module obtains the loss value from the input labels, the classification results, and the first and second weight factors of each sample data. When training the classification network shown in fig. 9, the computer device may train the parameters of the feature extraction network and the classifier according to the obtained loss values, yielding a high-quality classification network with high classification accuracy and precision when classifying or identifying target objects.
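As a hedged end-to-end illustration, one training step could wire the sketches above together as follows. Here class_balanced_weights, gradient_density_weights, weighted_loss, and ClassificationNet are the illustrative functions defined earlier, and using 1 - p_true as the per-sample gradient value is an assumption for a softmax classifier:

```python
import torch
import torch.nn.functional as F

model = ClassificationNet(num_classes=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(images, labels, samples_per_class):
    logits = model(images)                       # classification results
    # Category analysis module: per-sample first weight factor.
    w1 = torch.as_tensor(class_balanced_weights(samples_per_class),
                         dtype=torch.float32)[labels]
    # Gradient calculation module: per-sample gradient value and
    # second weight factor.
    p_true = F.softmax(logits, dim=1).gather(1, labels[:, None]).squeeze(1)
    g = (1.0 - p_true).detach()
    w2 = torch.as_tensor(gradient_density_weights(g.numpy()),
                         dtype=torch.float32)
    # Loss function optimization module: loss value and parameter update.
    loss = weighted_loss(logits, labels, w1, w2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```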
It should be understood that although the steps in the flowcharts of figs. 2-8 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict ordering restriction on these steps, and they may be performed in other orders. Moreover, at least some of the steps in figs. 2-8 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and the order of their execution is not necessarily sequential.
In one embodiment, as shown in fig. 10, there is provided a training apparatus for a classification network, including: a first determination module 11, a second determination module 12 and a training module 13, wherein:
a first determining module 11, configured to determine a first weight factor of sample data of each category in the sample set according to the number of the sample data of each category in the sample set;
a second determining module 12, configured to determine a second weighting factor of each sample data according to the gradient value of each sample data in the sample set;
and the training module 13 is configured to determine a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data, and the classification result corresponding to each sample data, and update parameters in the classification network according to the loss value.
In one embodiment, as shown in fig. 11, the first determining module 11 includes: a first acquisition unit 111 and a first determination unit 112, wherein:
a first obtaining unit 111, configured to determine, according to the number of sample data of each category in the sample set, the number of valid samples of each category in the sample set;
a first determining unit 112, configured to determine a first weighting factor of the sample data of each category according to an inverse number of the number of valid samples of each category.
In an embodiment, the first obtaining unit 111 is specifically configured to determine the number of valid samples in each category in the sample set according to the number of sample data in each category in the sample set and a preset constant coefficient.
In one embodiment, as shown in fig. 12, the second determining module 12 includes: a second acquisition unit 121, a second determination unit 122, and a third determination unit 123, wherein:
a second obtaining unit 121 configured to obtain gradient values of each sample data;
the second determining unit 122 is configured to determine the number of samples in different gradient distribution regions according to the distribution trend of the gradient value of each sample data;
the third determining unit 123 is configured to determine a second weight factor of the sample data in each gradient distribution region according to an inverse of the number of samples in each gradient distribution region.
In an embodiment, the second obtaining unit 121 is specifically configured to obtain each sample data and a label corresponding to each sample data; inputting each sample data into a classification network to obtain a classification result corresponding to each sample data; and obtaining the gradient value of each sample data according to the label corresponding to each sample data, the classification result corresponding to each sample data and a preset gradient calculation function.
In an embodiment, the second obtaining unit 121 is further specifically configured to input each sample data to a feature extraction network, so as to obtain feature information of each sample data; and inputting the characteristic information into a classifier to obtain a classification result corresponding to each sample data.
In an embodiment, the second determining unit 122 is specifically configured to divide a distribution trend of the gradient values of each sample data to obtain a plurality of gradient distribution areas; and counting the number of samples corresponding to the gradient value in each gradient distribution area to obtain the number of samples in different gradient distribution areas.
In one embodiment, as shown in fig. 13, the training module 13 includes: a fourth determination unit 131 and a training unit 132, wherein:
a fourth determining unit 131, configured to perform a multiplication operation on the first weight factor and the second weight factor to obtain a target weight factor;
and the training unit 132 is configured to determine a loss value of the classification network according to the target weight factor, the label corresponding to each sample data, and the classification result corresponding to each sample data.
For the specific definition of the training apparatus of the classification network, reference may be made to the above definition of the training method of the classification network, and details are not described here. The modules in the training apparatus of the classification network may be implemented in whole or in part by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
determining a first weight factor of the sample data of each category in the sample set according to the number of the sample data of each category in the sample set;
determining a second weight factor of each sample data according to the gradient value of each sample data in the sample set;
determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data, and updating parameters in the classification network according to the loss value.
The implementation principle and technical effect of the computer device provided by the above embodiment are similar to those of the above method embodiment, and are not described herein again.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the following steps:
determining a first weight factor of the sample data of each category in the sample set according to the number of the sample data of each category in the sample set;
determining a second weight factor of each sample data according to the gradient value of each sample data in the sample set;
determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data, and updating parameters in the classification network according to the loss value.
The implementation principle and technical effect of the computer-readable storage medium provided by the above embodiments are similar to those of the above method embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments above may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The embodiments above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of training a classification network, the method comprising:
determining a first weight factor of the sample data of each category in the sample set according to the number of the sample data of each category in the sample set;
determining a second weight factor of each sample data according to the gradient value of each sample data in the sample set;
and determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data, and updating parameters in the classification network according to the loss value.
2. The method according to claim 1, wherein determining a first weighting factor for each class of sample data in the sample set based on the number of each class of sample data in the sample set comprises:
determining the number of effective samples of each category in the sample set according to the number of sample data of each category in the sample set;
and determining a first weight factor of the sample data of each category according to the reciprocal of the number of the effective samples of each category.
3. The method of claim 2, wherein determining the number of valid samples for each class in the sample set according to the number of sample data for each class in the sample set comprises:
and determining the number of effective samples of each category in the sample set according to the number of the sample data of each category in the sample set and a preset constant coefficient.
4. The method according to any one of claims 1 to 3, wherein the determining a second weight factor of each sample data according to the gradient value of each sample data in the sample set comprises:
determining a gradient value of each sample data;
determining the number of samples in different gradient distribution areas according to the distribution trend of the gradient values of the sample data;
and determining a second weight factor of the sample data in each gradient distribution area according to the reciprocal of the number of the samples in each gradient distribution area.
5. The method of claim 4, wherein said determining a gradient value for each of said sample data comprises:
obtaining each sample data and a label corresponding to each sample data;
inputting each sample data into the classification network to obtain a classification result corresponding to each sample data;
and obtaining the gradient value of each sample data according to the label corresponding to each sample data, the classification result corresponding to each sample data and a preset gradient calculation function.
6. The method according to claim 4, wherein the determining the number of samples in different gradient distribution areas according to the distribution trend of the gradient values comprises:
dividing the distribution trend of the gradient value of each sample data to obtain a plurality of gradient distribution areas;
and counting the number of samples corresponding to the gradient values in each gradient distribution area to obtain the number of samples in different gradient distribution areas.
7. The method of claim 1, wherein determining a loss value of a classification network based on the first weighting factor, the second weighting factor, the label associated with each sample data, and the classification result associated with each sample data comprises:
multiplying the first weight factor and the second weight factor to obtain a target weight factor;
and determining a loss value of the classification network according to the target weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data.
8. An apparatus for training a classification network, the apparatus comprising:
the first determining module is used for determining a first weight factor of the sample data of each category in the sample set according to the number of the sample data of each category in the sample set;
a second determining module, configured to determine a second weighting factor of each sample data according to the gradient value of each sample data in the sample set;
and the training module is used for determining a loss value of the classification network according to the first weight factor, the second weight factor, the label corresponding to each sample data and the classification result corresponding to each sample data, and updating parameters in the classification network according to the loss value.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN201911234399.9A 2019-12-05 2019-12-05 Training method and device of classification network, computer equipment and storage medium Pending CN111160394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911234399.9A CN111160394A (en) 2019-12-05 2019-12-05 Training method and device of classification network, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911234399.9A CN111160394A (en) 2019-12-05 2019-12-05 Training method and device of classification network, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111160394A (en) 2020-05-15

Family

ID=70556439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911234399.9A Pending CN111160394A (en) 2019-12-05 2019-12-05 Training method and device of classification network, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111160394A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553170A (en) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Text processing method, text feature relation extraction method and device
CN112435714A (en) * 2020-11-03 2021-03-02 北京科技大学 Tumor immune subtype classification method and system
CN112651458A (en) * 2020-12-31 2021-04-13 深圳云天励飞技术股份有限公司 Method and device for training classification model, electronic equipment and storage medium
CN112651458B (en) * 2020-12-31 2024-04-02 深圳云天励飞技术股份有限公司 Classification model training method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110598845B (en) Data processing method, data processing device, computer equipment and storage medium
US20210295162A1 (en) Neural network model training method and apparatus, computer device, and storage medium
CN108833458B (en) Application recommendation method, device, medium and equipment
CN110991649A (en) Deep learning model building method, device, equipment and storage medium
US20140310691A1 (en) Method and device for testing multiple versions
CN111160394A (en) Training method and device of classification network, computer equipment and storage medium
CN110334735B (en) Multitask network generation method and device, computer equipment and storage medium
CN106068520A (en) Personalized machine learning model
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
CN110751175A (en) Method and device for optimizing loss function, computer equipment and storage medium
CN111523678A (en) Service processing method, device, equipment and storage medium
CN110705489B (en) Training method and device for target recognition network, computer equipment and storage medium
CN111401472B (en) Infrared target classification method and device based on deep convolutional neural network
CN114359563A (en) Model training method and device, computer equipment and storage medium
CN113065593A (en) Model training method and device, computer equipment and storage medium
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
CN114168318A (en) Training method of storage release model, storage release method and equipment
CN109308660B (en) Credit assessment scoring model evaluation method, apparatus, device and storage medium
CN114549849A (en) Image recognition method and device, computer equipment and storage medium
CN110276802B (en) Method, device and equipment for positioning pathological tissue in medical image
CN111368837B (en) Image quality evaluation method and device, electronic equipment and storage medium
CN113780666B (en) Missing value prediction method and device and readable storage medium
CN112581250B (en) Model generation method, device, computer equipment and storage medium
CN114580540A (en) Method and device for generating minority class samples for unbalanced data
CN113190658A (en) Method and device for accurately extracting proposal hotspot, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200515